Wednesday, October 7, 2009

Snap dumps

While working on the mystery of the extra database read, I added the beginnings of a simple "snap dump" capability to the object-relational mapping library that I maintain at work.

I'm not sure how common the notion of snap dumps is anymore.

I learned about snap dumps 30 years ago, when I was working on mainframes; these dumps were, and perhaps still are, quite common in the mainframe world.

I think that the word "snap" may come from "snapshot", although I remember that IBM mainframe guys also had some sort of tortured acronym for them. The idea of a snap dump is:
  • It's initiated by the application software, not by the operating system itself
  • It contains information about the contents of program memory which is particularly relevant to the application itself (as opposed to an exhaustive dump of all of the known memory)
  • It is often formatted and organized for direct reading by developers; that is, it is emitted in text form, not binary form
  • It is intended for post-mortem diagnosis of serious internally-detected error conditions

In my particular case, since I have been hunting a problem related to my object cache, the snap dump that I produce has lots of information about the state and contents of the object cache. Over time, as I use the snap dump feature for hunting other bugs, its contents will probably change and its legibility should improve.
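
To make the idea concrete, here is a minimal sketch of what an application-initiated snap dump of an object cache might look like. It is not the code from my library: the `ObjectCache` class, the `snap_dump` function, and the output layout are all hypothetical names invented for illustration, and the sketch assumes the cache is just a dictionary with hit and miss counters.

```python
import io
import sys
import time


class ObjectCache:
    """Hypothetical stand-in for an ORM object cache (illustration only)."""

    def __init__(self):
        self.entries = {}  # cache key -> cached object
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached object, calling loader(key) on a miss."""
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = loader(key)
        return self.entries[key]


def snap_dump(cache, reason, out=None):
    """Write a readable, text-form snapshot of the cache state to `out`."""
    if out is None:
        out = sys.stderr
    out.write("=== SNAP DUMP ===\n")
    out.write("time:   %s\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
    out.write("reason: %s\n" % reason)
    out.write("hits=%d misses=%d entries=%d\n"
              % (cache.hits, cache.misses, len(cache.entries)))
    # Sort by repr so the dump is stable from one run to the next.
    for key in sorted(cache.entries, key=repr):
        out.write("  %r -> %r\n" % (key, cache.entries[key]))
    out.write("=== END SNAP DUMP ===\n")


# The application decides when to dump, for example after detecting
# something it considers suspicious.
cache = ObjectCache()
cache.get(("Customer", 1), lambda k: {"id": k[1]})
cache.get(("Customer", 1), lambda k: {"id": k[1]})
buf = io.StringIO()
snap_dump(cache, "unexpected extra database read", out=buf)
print(buf.getvalue())
```

The sketch touches each of the points above: the application triggers the dump, only application-relevant state is written, and the output is plain text meant to be read by a developer after the fact.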

For now, I'm pleased to have some basic infrastructure in place, since a primary rule of agile programming is to get something simple that works, then evolve it later.

And, best of all, I think I found the cache bug! The snap dump didn't directly show me the problem, but it ruled out a number of other possibilities, and finally the (obvious all along) answer was right there, staring me in the face.

It's always a great feeling to find the bug, though I also feel moderately foolish that it eluded me for so long.

That's just the way bugs are.
