How regressions happen

Working with software is frustrating, and working on software doubly so, because things up and break, seemingly without warning or cause. One day the software is fine; the next day it flat-out doesn’t work. Even worse, one day it has a single bug or missing feature, and the next day the bug is fixed or the feature added but there are all-new bugs. What gives?

I was finishing a new feature a couple of weeks ago. The project had reached functional completion, but still needed QA review to find all the little missing details and product review to find any lurking inconsistencies or oddities. Over the course of a few days, I worked with my brave and intrepid teammates to get our software from a state of optimistic, abstract readiness to a more specific, concrete readiness to ship. As we worked through various scenarios, it often came up that something worked on Monday when Alice tried it but didn’t work on Tuesday when Bob tried it.

And thus, I was compelled to try and sheepishly explain why this kind of thing happens so frequently in the process of producing software.

In short, software is a mess of interconnections, ripple effects, and unintended consequences. Largely, software is a bunch of informal systems interacting with other dynamic systems interacting with humans.

Consider a computer as sophisticated as an original iPhone. Given its processor and memory size, it can be in more distinct states than there are stars in the sky. Count the storage too, and it has more possible states than there are atoms on Earth.
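For the curious, here is a rough back-of-the-envelope check in Python. Every figure in it is my own order-of-magnitude assumption rather than a measured value: 128 MB of RAM for an original iPhone, roughly 10^24 stars in the observable universe, and roughly 10^50 atoms on Earth. The only real idea is that n bits of memory can hold 2^n distinct patterns, so it takes surprisingly few bits to outnumber stars or atoms.

```python
import math

# Assumed, order-of-magnitude figures (illustrative, not precise measurements).
RAM_BYTES = 128 * 1024 * 1024        # assumed RAM of an original iPhone
STARS_ESTIMATE = 10 ** 24            # rough guess: stars in the observable universe
ATOMS_ON_EARTH_ESTIMATE = 10 ** 50   # rough guess: atoms on Earth


def bits_needed_to_exceed(count: int) -> int:
    """Smallest n such that 2**n is strictly greater than count."""
    return count.bit_length()


def decimal_digits_of_state_count(bits: int) -> int:
    """Number of decimal digits in 2**bits, without building the huge number."""
    return math.floor(bits * math.log10(2)) + 1


if __name__ == "__main__":
    ram_bits = RAM_BYTES * 8
    print("bits needed to outnumber the stars:", bits_needed_to_exceed(STARS_ESTIMATE))
    print("bits needed to outnumber Earth's atoms:", bits_needed_to_exceed(ATOMS_ON_EARTH_ESTIMATE))
    print("bits of RAM available:", ram_bits)
    print("decimal digits in the number of RAM states:", decimal_digits_of_state_count(ram_bits))
```

Under these assumptions, a couple hundred bits is already enough to beat both estimates, and the RAM alone has over a billion bits. Most of those bit patterns will never occur in practice, but the sheer size of the space is the point.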

Say, unscientifically, that 10% of those states result in your program doing the right thing and 90% result in bugs or indecipherable garbage. Because web/mobile app developers don't work in formalisms, it's impossible to say which of those states our program can be in, or whether any given code change will take you from the 10% of good states into the 90% of bad ones.

Each time you change code, whether it's adding a conditional to prevent a bug or adding new behavior for a wholly new feature, you're getting an entirely new "path" through the good/bad states. Some changes are more chaotic than you’d think: little change, big outcome. And hence, sometimes a thing works fine, you make a change to fix something unrelated, and boom! You're in a bad state.

To make matters worse, these states, the good and bad ones, aren’t determined in isolation. They’re interdependent. A small change in how you go from the 437th state to the 438th could result in an unexpected foray into the land of bad states.
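Here is a deliberately tiny toy in Python to make that concrete. Everything in it is hypothetical and invented for illustration: the "program" is a single integer, `step_v1` and `step_v2` are two versions of the code that differ by one character, and `is_good` stands in for whatever invariant the rest of the system quietly relies on. Real software is nothing this tidy, but the shape of the failure is the same.

```python
from typing import Callable, List, Optional


def is_good(state: int) -> bool:
    """Hypothetical invariant: the rest of the system assumes 0 <= state < 100."""
    return 0 <= state < 100


def step_v1(state: int) -> int:
    """Original code: the state always stays inside the good range."""
    return (state * 7 + 3) % 100


def step_v2(state: int) -> int:
    """One-character 'fix' (100 became 101): the state can now reach 100."""
    return (state * 7 + 3) % 101


def run(step: Callable[[int], int], start: int, n_steps: int) -> List[int]:
    """Return every state visited, beginning with the start state."""
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


def first_divergence(a: List[int], b: List[int]) -> Optional[int]:
    """Index of the first position where two paths differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


def first_bad(path: List[int]) -> Optional[int]:
    """Index of the first state that violates the invariant, or None."""
    for i, state in enumerate(path):
        if not is_good(state):
            return i
    return None


if __name__ == "__main__":
    path_v1 = run(step_v1, start=1, n_steps=20)
    path_v2 = run(step_v2, start=1, n_steps=20)
    print("paths first differ at step:", first_divergence(path_v1, path_v2))
    print("v1 first bad state at step:", first_bad(path_v1))
    print("v2 first bad state at step:", first_bad(path_v2))
```

In this toy, the two versions agree for the first few steps, then quietly diverge, and the changed version only lands in a bad state several steps after that. That gap between the change and the visible breakage is a lot of why "it worked on Monday" is so hard to explain on Tuesday.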

It’s easy to think that the miracle is not that we’ve managed to arrange rocks and sand into microprocessors that seem like they can think, but that the code we run on those rocks and sand ever worked in the first place!

Adam Keys @therealadam