Tuesday, April 17, 2007

The 64-bit saga

I have spent the better part of the last six months porting the product(s) that I work on to the 64-bit architectures of the supported platforms. In this effort, I learnt a lot of new things, though (unfortunately or fortunately) I did not have to deal with endianness issues, due to the nature of the code and its applications. One of the most interesting parts was seeing things that I had only read about in books come into practice (but I'll still swear by Kernighan and Ritchie!).

I have tried to document the problems that I found interesting (the 'interesting' adjective is only in retrospect; at the time I encountered them, they were just plain nasty!). The current focus is on hidden problems in existing code. Most of them are what I call Casting Ouches, but I usually refer to all of them as Coding Malpractice.

Many of the issues that I am going to cite are very straightforward, yet it is surprising how often they are embedded in code. Most of these examples (and none of them is hypothetical) are simply bad coding practices. But they can be suicidal when the scenario changes, which is what happened to me when I compiled the code for a 64-bit architecture. Code that had been working fine on 32-bit architectures started throwing up problems the moment it was run on 64-bit. I was seeing crashes and incorrect results, and I cannot say which is worse.

It goes without saying that on 32-bit architectures pointers (addresses) are 32 bits long, and on 64-bit architectures they are 64 bits long. Further, on the UNIX flavors (Solaris, Linux, AIX, HP-UX) that the code is supported on, both int and long are 32 bits on 32-bit architectures. On 64-bit, int is still 32 bits, while long is 64 bits.
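To make the size difference concrete, here is a minimal sketch in C. It is not taken from the product code; the function names (data_model, survives_int_round_trip, print_sizes) are made up for illustration, and it assumes a C99 compiler that provides uintptr_t. The two models described above are commonly called ILP32 and LP64. The second helper imitates what happens when an address is squeezed through an int-sized variable and then used again.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: names the data model from the sizes the compiler reports. */
const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";   /* int, long and pointers are all 32 bits */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";    /* long and pointers grow to 64 bits, int stays at 32 */
    return "other";
}

/* Hypothetical helper: returns 1 if an address value is unchanged after being
 * stored in an int-sized variable and widened back, 0 if the upper bits were lost. */
int survives_int_round_trip(uintptr_t addr)
{
    unsigned int narrow;

    /* Assumption: int is never wider than a pointer on the platforms discussed here. */
    assert(sizeof(uintptr_t) >= sizeof(unsigned int));

    narrow = (unsigned int)addr;       /* the cast keeps only the low-order bits */
    return (uintptr_t)narrow == addr;
}

/* Prints the sizes that matter for the port. */
void print_sizes(void)
{
    printf("%s: int=%zu long=%zu pointer=%zu bytes\n",
           data_model(), sizeof(int), sizeof(long), sizeof(void *));
}
```

Calling print_sizes() from a 32-bit and a 64-bit build of the same source shows the two models side by side. On a 32-bit build every address survives the round trip, which is exactly why this kind of code can sit unnoticed for years; on a 64-bit build any address with bits set above the low 32 does not.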

P.S. Most of this work was done on Sun Solaris 8, using the Workshop graphical debugging tool. In a few cases, Purify and Valgrind lent a helping hand.
