... that is the question.
This is a problem that I encountered a long time back, and one that reinforced an important lesson.
As I have mentioned before, our software products are supported on different UNIX flavors - Solaris, Linux, AIX and HP. At one time, we were perplexed by a report that came out differently on HP than on the other platforms [it had either additional or missing messages, I do not remember which now; but the behavior on HP was incorrect for sure]. Suspecting memory corruption, we ran Purify, but it did not report any errors. Neither did Valgrind.
Then began the tedious process of debugging. In such cases, the way we usually work is to start two debugging sessions in parallel - one on the port where the behavior is correct, and one on the port where it is incorrect - and compare the execution step by step on the two platforms [which is a cumbersome process, especially if the testcase is non-trivial]. A colleague and I had spent a few hours on this problem when we finally noticed the difference - a variable was being tested for being greater than 0 [variable > 0] - on the other platforms the result of this comparison was TRUE, while on HP the result was FALSE. This was strange, as the variable seemed to have the same value in both places. And then inspiration struck - the variable in question was a single-bit integer bit-field, and the value stored in it was '1'.
Now the question is - what should the value of such a variable be? For signed integers, the MSB [most significant bit] is the sign-bit. In the case of a single-bit integer, should the only bit [which is also the MSB] be treated as the value-bit, or as the sign-bit?
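In the normal case [platforms other than HP], the single bit was treated as the value-bit, and the value of the bit-field was interpreted as '1'. On HP, the single bit was treated as the sign-bit, and the value of the bit-field was interpreted as '-1' [a signed one-bit field in two's complement can hold only 0 and -1], so the comparison [variable > 0] failed.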
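The difference can be reproduced with a few lines of C. This is only a minimal sketch, not the code from our product: the struct `flags` and the helper functions are hypothetical names made up for illustration. It stores a value in a plain `int` one-bit field and in an `unsigned int` one-bit field, and reports whether each compares greater than 0. The result for the plain field depends on the compiler, so no particular output is claimed for it.

```c
#include <stdio.h>

/* Hypothetical struct for illustration only. */
struct flags {
    int plain_flag : 1;             /* signedness depends on the compiler */
    unsigned int unsigned_flag : 1; /* always holds 0 or 1 */
};

/* Store v in the plain one-bit field and test it against 0. */
int plain_flag_is_positive(int v)
{
    struct flags f = {0, 0};
    f.plain_flag = v;               /* reads back as 1 or -1 when v is 1 */
    return f.plain_flag > 0;
}

/* Store v in the unsigned one-bit field and test it against 0. */
int unsigned_flag_is_positive(unsigned int v)
{
    struct flags f = {0, 0};
    f.unsigned_flag = v;
    return f.unsigned_flag > 0;
}

/* Print what this particular compiler does with both fields. */
void report(void)
{
    printf("plain int : 1    > 0 ? %s\n",
           plain_flag_is_positive(1) ? "TRUE" : "FALSE");
    printf("unsigned int : 1 > 0 ? %s\n",
           unsigned_flag_is_positive(1) ? "TRUE" : "FALSE");
}
```

Calling `report()` from `main` on each platform shows which interpretation that compiler picks for the plain field; the unsigned field should report TRUE everywhere.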
Moral of the Story : Always declare bit-fields as 'unsigned' [in C, whether a plain 'int' bit-field is signed or unsigned is left to the compiler, so you cannot rely on either]. And if the bit-field is expected to take negative values, declare it explicitly as 'signed' and give it an additional bit for the sign information.
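As a rough sketch of that advice, assuming a two's-complement platform: the struct `status` and its field names below are hypothetical, chosen only to show the two kinds of declaration. The flag is an explicitly unsigned one-bit field, and the field that needs negative values is explicitly signed with one extra bit.

```c
/* Hypothetical struct for illustration only. */
struct status {
    unsigned int enabled : 1; /* holds 0 or 1 on every compiler */
    signed int delta : 2;     /* one sign bit plus one value bit: can hold -1, 0, 1 */
};

/* Store v in the unsigned flag and read it back. */
int enabled_after_store(unsigned int v)
{
    struct status s = {0, 0};
    s.enabled = v;
    return s.enabled;
}

/* Store v in the signed two-bit field and read it back. */
int delta_after_store(int v)
{
    struct status s = {0, 0};
    s.delta = v;
    return s.delta;
}
```

With these declarations the comparison [enabled > 0] no longer depends on the platform, and the signed field has room for both the sign and a value.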
3 comments:
gosh i take back my comment on ur other blog, if i were given this i'd never be able to debug it.
Very interesting. Never knew about this.
@Maverick: Lol! No, I am sure anyone could have solved it - you just need to be open, and inquisitive.
@Pooja: Neither did I, until I hit it :-))