Well, the title is just a catchy line I "borrowed" from a friend's custom message. The problem I am about to discuss does not have to do with pointers, but it indeed has to do with uninitialized variables.
The software product that I work on is supported on three different UNIX platforms (Solaris, AIX, Linux), and on different flavors of each of these. When a test case starts failing on some of the platforms, especially on a random basis, it is fairly safe to assume that a memory corruption has happened. The primary software tool that we use to analyze memory corruptions is IBM Rational Purify.
A few days back some testcases in our test suite started failing due to messages missing from the log file - the failures were random, mostly on Solaris 9 and 10, and sometimes on AIX (almost never on Solaris 8 and Linux EE and OEE). I was almost certain that a memory corruption had been introduced in the code. What was surprising was that only one particular message went missing, and that the failures showed up in only one stream, though it was not very different from two other streams on which no such occurrences were reported. But such is the nature of memory corruptions.
So, I ran Purify on one such testcase, but it reported no error.
Then, since I was fairly confident that it was nothing but a corruption, I tried Valgrind as well. Valgrind is free software (licensed under the GNU GPL, though it is not a GNU project), which I could use only on Linux (Purify is available for both Solaris and Linux), and it does not have a fancy GUI like Purify. But then, it does not have a fancy price tag either. [My primary development platform is Solaris, and the company buys Purify licenses, so my first preference is to use Purify, rather than any other tool.]
Valgrind did point out a read of uninitialized memory - the value of a bit-field was tested to decide whether to issue the message under analysis, and this bit-field was not initialized in some scenarios.
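To make the failure mode concrete, here is a minimal sketch of the pattern in C. The structure, the field names and the functions are hypothetical stand-ins, not the real product code; the point is only that a bit-field assigned on one path and left alone on the other keeps whatever bits were already in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags; the names are illustrative only. */
struct msg_flags {
    unsigned int emit_message : 1;  /* tested before writing the log line */
    unsigned int verbose      : 1;
};

/* Buggy pattern: emit_message is assigned on one path only, so on the
 * other path it keeps whatever value was already in that memory. */
void init_flags_buggy(struct msg_flags *f, bool important)
{
    if (important) {
        f->emit_message = 1;
    }
    f->verbose = 0;
}

/* Fixed pattern: clear the whole object first, then set fields. */
void init_flags_fixed(struct msg_flags *f, bool important)
{
    memset(f, 0, sizeof *f);  /* writes every byte, padding included */
    if (important) {
        f->emit_message = 1;
    }
}

/* The check that decides whether the log message is issued. */
bool should_log(const struct msg_flags *f)
{
    assert(f != NULL);
    return f->emit_message != 0;
}
```

With `init_flags_buggy(&f, false)`, the result of `should_log(&f)` depends on the previous contents of `f`, which is why the message can come and go between platforms and runs. To reproduce the stale bit deterministically, fill the object with `memset(&f, 0xFF, sizeof f)` before calling the buggy initializer.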
The interesting part is why Purify, which is usually quite accurate, did not report the problem. It comes down to how bit-fields are stored in a structure or a class object, and how they are read back from memory. When a structure (or an object) declares some bit-fields, the compiler packs them together and pads them with unused bits to align the object at the word boundary. When the value of a bit-field is read, the generated code typically loads the complete word and then masks out the field, rather than reading the individual field. Purify works at the granularity of a word, so it would report an uninitialized memory read if any of the bits of the word are not initialized. Now, the unused bits that were padded for alignment will practically always be uninitialized; so, to avoid false warnings, in the default mode Purify suppresses the uninitialized read messages in the case of bit-fields.
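The packing described above can be sketched in C as follows. The layout and all names are hypothetical, and the exact size and bit order are implementation-defined, so treat this as an illustration rather than a description of any particular compiler.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: 6 declared bits inside one storage unit. */
struct packed_flags {
    unsigned int a : 1;
    unsigned int b : 3;
    unsigned int c : 2;
};

/* Bits of the object that no declared field occupies. */
size_t unused_bits(void)
{
    return sizeof(struct packed_flags) * CHAR_BIT - 6;
}

/* Assign every declared field; the unused bits are never written here. */
void set_fields(struct packed_flags *f)
{
    assert(f != NULL);
    f->a = 1;
    f->b = 5;
    f->c = 2;
}

/* Field-by-field comparison: looks only at the declared bits. */
bool fields_equal(const struct packed_flags *x, const struct packed_flags *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c;
}

/* Whole-object comparison: looks at every bit, padding included. */
bool raw_equal(const struct packed_flags *x, const struct packed_flags *y)
{
    return memcmp(x, y, sizeof *x) == 0;
}
```

If one object is filled with `0x00` and another with `0xFF` before `set_fields` is called on both, `fields_equal` returns true, while `raw_equal` is usually false on common compilers (the C standard leaves the padding bits unspecified). A checker that looks at the whole word sees those never-written bits on every bit-field read, which is the source of the false warnings.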
For those who are familiar with Purify, the Purify error code for an uninitialized read is UMR [Uninitialized Memory Read]. For bit-fields, the warning that is issued (and which is suppressed by default) is UMC [Uninitialized Memory Copy].
2 comments:
nice blog and good post! I like the bug-free joke :D
@Sameer: Thanks! And I am glad you found it useful.