Surprising Efforts: Debug vs Test vs Fix


In the last article on serverless, I referenced the old ad in the New York City subways for a trade school. Their tagline was something like, "Technicians will always be needed, because things will always break."

We technologists are familiar - intimately - with fixing broken things. Sometimes, it is our own software, devices or infrastructure; other times, it is someone else's. Either we have become responsible for it, or we need it to work under certain circumstances where it simply fails.

This is not necessarily bad. Much of the open-source ecosystem has evolved precisely due to the willingness (or necessity) to fix other people's stuff. Look at any project on GitHub, the most active host of open-source repositories, and you will see enormous numbers of "Pull Requests": actual fixes for issues, or additions of features, to someone else's software.

One of the interesting quirks of building technology is that there is a perverse relationship between the time it takes to find the source of a problem ("debug"), the time it takes to define a test to catch this issue in the future ("test"), and the time it takes to fix the issue ("fix").

Most people tend to think that software is like a bridge. When it is broken, a civil engineer takes 1-2 days to figure out the problem and a few more days to design the fix, and then the contractors take weeks or months to implement it.

Software, however, is the other way around. As a general rule of thumb:

Debug >>> Test >>> Fix

It often requires a significant amount of brainpower and effort to reason about the problem, propose multiple hypotheses, and test each one until the root cause is isolated. It then takes less effort, but still some real work, to write new tests that reproduce the scenarios in which the technology breaks down and check for the failure. Finally, the actual fix itself is often a small change to configuration or code.

Here is one salient example.

As part of my participation in the open-source community, I have released elements of useful software under quite liberal usage licenses. Sometimes these are the result of work at a client, wherein they agree to the mutual benefits of releasing the products in a free open-source fashion (why they do so is a subject for another day).

Last week, a gentleman from a California startup who lives in Argentina submitted a "Pull Request" to fix an issue with one of these products.

  • The amount of time it took to reason about and find the problem: sufficient.
  • The number of lines of code required to test the issue: 26.
  • The number of lines of code required to fix the issue: 1.

Actually, it was even less than one, as it required moving just two characters (as in "letters", not "I am quite the character") 6 positions to the left. That was it.

Similarly, last year, I was working with a client who required certain network behaviours. For reasons unknown, they could not get the dynamic network behaviour they wanted, and so statically defined the configuration. This was more rigid and brittle, tied down the operations, required downtime to change, and risked cross-service pollution.

When I helped find the problem, the fix turned out to require precisely a 2-line change in code. The test itself? Almost 50 lines. Reasoning about the problem and testing the hypotheses? Almost 2 days.

The interesting part, however, is the lesson to be learned.

Software engineers, managers and executives, and especially those without hard experience in technology, tend to estimate the amount of time required to build a product, or add features. Those with more experience allocate a "bug budget", some amount of time allotted to the engineers to repair bugs.

Even among those, very few budget the time according to the real effort of debug+test+fix. At best, it is test+fix, with some assumption about parity between those two efforts. Extremely rarely have I seen a budget that fully recognizes "Debug >>> Test >>> Fix".

When allocating time to fix bugs - as the ads said, "things always break" - it is important to recognize that the difficult and time-consuming part is not fixing the problem, but rather developing tests for the problem, and even more so reasoning about the causes and testing those hypotheses.
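As a rough illustration of budgeting along these lines, here is a minimal Python sketch that scales a fix-only estimate up to a debug+test+fix estimate. The names (`BugEstimate`, `estimate_from_fix`) and the ratios (4x for tests, 16x for debugging) are hypothetical placeholders chosen for the example, not measured values; a team would want to replace them with ratios taken from its own bug history.

```python
from dataclasses import dataclass


@dataclass
class BugEstimate:
    """Hours expected for each phase of resolving one bug."""
    debug_hours: float
    test_hours: float
    fix_hours: float

    @property
    def total_hours(self) -> float:
        return self.debug_hours + self.test_hours + self.fix_hours


def estimate_from_fix(fix_hours: float,
                      test_ratio: float = 4.0,
                      debug_ratio: float = 16.0) -> BugEstimate:
    """Scale a fix-only estimate into a debug + test + fix estimate.

    test_ratio and debug_ratio are assumed multipliers (hypothetical
    defaults), chosen only so that debug > test > fix.
    """
    if fix_hours < 0:
        raise ValueError("fix_hours must be non-negative")
    return BugEstimate(
        debug_hours=fix_hours * debug_ratio,
        test_hours=fix_hours * test_ratio,
        fix_hours=fix_hours,
    )


# A "half-hour fix" budgeted with the assumed ratios.
estimate = estimate_from_fix(fix_hours=0.5)
print(f"debug={estimate.debug_hours}h test={estimate.test_hours}h "
      f"fix={estimate.fix_hours}h total={estimate.total_hours}h")
```

With these assumed ratios, a fix that looks like half an hour of work is budgeted at 10.5 hours in total, most of it spent finding the cause.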

Summary

Are your issues and bugs backing up? Is your "fix budget" never sufficient? It might be because it truly isn't large enough, or it might be because you need new methodologies to calculate what that "fix budget" should be. Ask us to help.