How many of us have logged or seen a bug like this?
"System does X randomly."
Oh boy. These are fun for a lot of reasons, but today let's just talk about that one word: "randomly". What on earth does it mean for something to happen "randomly"? Simple:
Almost everything has a pattern. It has one or more sets of circumstances under which it occurs. Sometimes those are very subtle. Maybe, for example, random means "only when the memory returns a single bit error on a specific process that is attempting a write to a bad sector on the disk." That's really rare and very subtle, but it's a pattern.
You may not need to know the full pattern to fix a bug. Often it's helpful, and usually having part or all of the pattern will help you know that a bug is fixed. However, sometimes you're not going to know all the circumstances. You may get part of them, as in "it fails 60% of the time when X", and that's okay. All you need is enough of the pattern to create and confirm a fix.
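Here's a rough way to see why a partial pattern like "fails 60% of the time when X" can be enough to confirm a fix. This is a minimal sketch, not a rule: the function name `runs_needed` and the 1% threshold are made up for illustration, and it assumes each run under condition X fails independently at the observed rate. If the bug were still there, the chance of n clean runs in a row would be (1 - failure rate)^n, so you can ask how many clean runs it takes to make that chance small.

```python
import math


def runs_needed(failure_rate: float, max_false_confidence: float = 0.01) -> int:
    """Consecutive passing runs needed before a fix looks believable.

    Assumes (hypothetically) that each run under the triggering condition
    fails independently with probability `failure_rate` while the bug exists.
    Returns the smallest n such that an unfixed bug would produce n passes
    in a row with probability at most `max_false_confidence`.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_confidence < 1.0:
        raise ValueError("max_false_confidence must be strictly between 0 and 1")

    pass_rate = 1.0 - failure_rate
    # Solve pass_rate ** n <= max_false_confidence for the smallest integer n.
    return math.ceil(math.log(max_false_confidence) / math.log(pass_rate))


if __name__ == "__main__":
    # A bug that fails 60% of the time when X: six clean runs leave
    # less than a 1% chance that the bug is merely hiding.
    print(runs_needed(0.6))
    # A rarer failure needs many more clean runs for the same confidence.
    print(runs_needed(0.1))
```

The less often the bug shows up under the part of the pattern you know, the more clean runs you need, which is one practical reason to keep narrowing the pattern until the failure rate under X is high.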
Don't fear random. A lot of randomness isn't good; it indicates a lack of understanding. But once you know enough of the pattern to fix the bug and verify the fix, whatever randomness remains is acceptable. In other words, define it, don't fret.