Un-Reviewed Code Is Tech Debt

May 31

Reading code is hard. It’s one of those universal truths that, regardless of language or platform, writing code is always easier than reading it. Joel on Software has a now 24-year-old post on this topic which is still as relevant now as it was at the turn of the century: Reading Code is Like Reading the Talmud.

Mostly this is due to lack of context: there’s so much stuff around any piece of code that you really need to hold in your head for this part to make any sense. All the really useful code reviews I’ve been a part of involve asking the original developer questions: “what’s going on here”, “why did you make this decision”, and so on. These don’t always result in changes, but they’re necessary to make sure the context is shared amongst everyone.

One of the reasons I tend to be militant about software comments is to try and get that initial programmer context preserved with the code itself. To run with Joel’s Talmud analogy, it’s to see if we can include the first batch of commentary along with the source right at the start.

Which brings us to code reviews. I’ve been thinking about code reviews a lot lately, for various reasons. One of the things I keep thinking about is that I can’t believe how hard they still are to do well, and they’re hard for many of the same reasons that reading code is hard. But of course, that’s one of the reasons they’re so important: not only to “make sure the code is right”, but also to spread that context around to the team.

I’ve taken to thinking of un-reviewed code as a kind of tech debt—context debt, if you will. And this is the worst kind of debt, in that it’ll build up in the background while you’re not paying attention, and then a couple people get promoted or leave, and you realize you have a whole-ass application that no one on the team has ever seen the insides of. This is kind of rephrasing the “bus factor” problem, but I like treating it as a class of debt because it gives us an existing framework to pay it down.

But that doesn’t solve the basic problem that code review is hard to do, and most of our tools don’t really help. I mean, one of the reasons XP went all-in on pair programming is that it was easier than figuring out how to make code easier to read and reason about.

And so given all that, I’ve also been stewing on how it’s very (not) funny to me that we keep finding new ways to replace “writing code” with “code review.”

One of them is that, on top of all the other reasons not to let the Plagiarism Machine condense you some code out of the æther, you still have to review that code, but that original context not only isn’t available, it doesn’t even exist. So code reviews become even more important at the same time as they get impossibly harder. Sort of instant-deploy tech debt. It’s the copy-paste from Stack Overflow, only amped way up. But, okay, that’s this new toy burning off the fad, hopefully people will knock that off at some point.

The thing I’ve really been thinking about is all that un-reviewed code we’ve been dragging around in the form of open source libraries. This musing, of course, brought to you by that huge near-miss last month (Did 1 guy just stop a huge cyberattack?), along with the various other issues going on over in NPM, PyPI, and then the follow-up discussion like: Bullying in Open Source Software Is a Massive Security Vulnerability.

I mean, this whole thing should be a real wakeup call to the entire OSS world in a “hang on, what the hell are we doing” sort of way. Turns out that sure, with enough eyes all bugs are shallow, but you still have to have someone look. And the fact that it was a guy from Microsoft who found the bug? Because something was too slow? Delicious, but terrifying.

Everyone links to the xkcd about Dependencies with a sort of head-shake “that’s just how it is”. But sooner or later, that guy is going to leave, or need better insurance. You might not be paying the volunteers, but you can bet someone else would be willing to.

Like all of us, I wonder how many of these are out there in the wild? I’m glad I don’t run a Software Dev team that handles sensitive data currently, because at this point you have to assume any FOSS package has a >0% chance of hosting something you don’t want running on your servers.

And to bring it back around to the subject at hand, the real solution is “we need a way to audit and review open source packages”, but after a generation of externalizing that cost, no one even knows how to do that?

But what would I be doing if I was still in charge of something that handled PHI or other sensitive or valuable data? My initial reaction was I’d be having some serious conversations about “what would it take to remove all the open source. No, give me an estimate for all of it. All.”

(And to be clear, it’s not like commercial software is immune either, but that’s a different set of risk vectors and liability.)

I’d want a list of all the FOSS packages in the system, sorted into these buckets:

Stuff we’re barely using, that we could probably replace in a day or two. (The CSV formatter library that we only use to write one file in one place.)
Bigger things that we’re using more of, but could still get our arms around what a replacement looks like. (We pulled in Apache Commons collections because it was easy to use, but we’re using less than 10% of it.)
Big, foundational stuff: Spring, Tomcat, Linux, language standard libraries. Stuff you aren’t going to rewrite.
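
To make that inventory concrete, here is a minimal sketch of the sorting step. Everything in it is an assumption for illustration: the `Dependency` fields, the `Bucket` names, the two-day threshold, and the effort estimates are made up, and the "foundational" flag would be a human judgment call, not something a script can decide.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    TRIVIAL = "barely used, replaceable in a day or two"
    MODERATE = "used more, but a replacement is imaginable"
    FOUNDATIONAL = "not going to rewrite"


@dataclass
class Dependency:
    name: str
    replace_days: float          # hypothetical team estimate of replacement effort
    foundational: bool = False   # hypothetical flag, set by a human reviewer


def classify(dep: Dependency, trivial_days: float = 2.0) -> Bucket:
    """Sort one dependency into a bucket; the threshold is an assumption."""
    if dep.foundational:
        return Bucket.FOUNDATIONAL
    if dep.replace_days <= trivial_days:
        return Bucket.TRIVIAL
    return Bucket.MODERATE


def inventory(deps):
    """Group dependency names by bucket."""
    result = {bucket: [] for bucket in Bucket}
    for dep in deps:
        result[classify(dep)].append(dep.name)
    return result


# Illustrative entries only; the day counts are invented.
deps = [
    Dependency("csv-formatter", replace_days=1),
    Dependency("commons-collections", replace_days=15),
    Dependency("spring", replace_days=0, foundational=True),
]
buckets = inventory(deps)
```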

That third category needs a light audit to make sure there’s an actual entity in charge of it with safety practices and the like. Probably a conversation with legal about liability and whatnot.

But for those first two buckets, I’d want to see an estimated cost to replace. And then I want to see a comparison of “how many hours of effort converted to salary dollars” vs “worst-case losses if our servers got p0ned”. Because the hell of it is, those numbers probably make it a slam dunk to do the rewrite.
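
The comparison itself is just arithmetic. A rough sketch, with the caveat that the function names and every number here are invented for the example, not real estimates:

```python
def replacement_cost(hours: float, hourly_rate: float) -> float:
    """Hours of effort converted to salary dollars."""
    return hours * hourly_rate


def rewrite_is_cheaper(hours: float, hourly_rate: float, worst_case_loss: float) -> bool:
    """True when the rewrite costs less than the worst-case loss."""
    return replacement_cost(hours, hourly_rate) < worst_case_loss


# Made-up inputs: 400 hours at $100/hour against a $5,000,000 worst case.
cost = replacement_cost(400, 100.0)
cheaper = rewrite_is_cheaper(400, 100.0, 5_000_000)
print(f"rewrite: ${cost:,.0f}, cheaper than worst case: {cheaper}")
```

This is the blunt version of the comparison: it sets the rewrite cost against the full worst-case loss, with no attempt to weight that loss by how likely it is.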

But look! I’m committing that same fallacy: it’s easier to write than review, so let’s just rewrite it. And this has been sitting in the drafts folder for a month now because… I don’t know!

The current situation seems untenable, and all the solutions seem impossible. But that review debt is still there, waiting.

Gabriel L. Helman
