Software Forestry 0x07: The Trellis Pattern
In general, rewriting a system from scratch is a mistake. There are times, however, when replacing a system makes sense.
You’ve checked off every type in the Overgrowth Catalogue, entropy is winning, and someone does a back-of-the-envelope cost analysis and finally says “you know, I think it really would be cheaper to replace this thing.”
Part of what makes this tricky is that presumably, this system has customers and is bringing in revenue. (If it didn’t you could just shut it down and build something new without a hassle.) And so, you need to build and deploy a new system while the old one is still running.
The solution is to use the old system as a Trellis.
The ultimate goal is to eventually replace the legacy system, but you can’t do it all at once. Instead, use the existing system as a Trellis, supporting the new system as it grows alongside.
This way, you can thoughtfully replace a part at a time, carefully roll it out to customers, all while maintaining the existing—and working—functionality. The system as a whole will evolve from the current condition to the improved final form.
As you work through each capability, you can either use the legacy system as a blueprint to base the new one on, or harvest code for re‐use.
The great thing about a Trellis is that the Trellis and the Tree are partners. They have the same goal: for the tree to outgrow the trellis. But the trellis can be an active partner. I have a tree right now that still has part of a trellis holding up some branches. It’s one of those trees that got a little too big a little too fast, put on a little too much fruit. A few years ago it had supports and trellises all around; now it’s down to just one whimsically-leaning stick. If things go well, I’ll be able to pull that out next summer.
The old system isn’t abandoned, it transitions into a new role. You can add features to make it work better as a trellis: an extra API endpoint here, a data export job there. The new system calls into the old one to accomplish something, and then you throw the switch and the old system starts calling into the new one to do that same thing.
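Concretely, the switch usually shows up as a seam in the code. Here’s a minimal sketch of what flipping one capability over might look like; all the names in it (the invoice renderer, the legacy client, the feature-flag service) are hypothetical, and the switch itself could just as easily be a router rule or a config entry as a flag.

```java
/** What the legacy system already exposes (or the endpoint you add to it). */
interface LegacyClient {
    byte[] generateInvoicePdf(String invoiceId);
}

/** Any gradual-rollout mechanism works here; a flag service is just one option. */
interface FeatureFlags {
    boolean isEnabled(String flagName, String key);
}

/** The capability being replaced, pulled out behind a single interface. */
interface InvoiceRenderer {
    byte[] render(String invoiceId);
}

/** Wraps the old system: the trellis, still carrying the weight today. */
class LegacyInvoiceRenderer implements InvoiceRenderer {
    private final LegacyClient legacy;
    LegacyInvoiceRenderer(LegacyClient legacy) { this.legacy = legacy; }
    public byte[] render(String invoiceId) {
        return legacy.generateInvoicePdf(invoiceId);
    }
}

/** The new growth, built and tested alongside the old. */
class NewInvoiceRenderer implements InvoiceRenderer {
    public byte[] render(String invoiceId) {
        throw new UnsupportedOperationException("not grown in yet");
    }
}

/** The switch: per capability, per customer, and easy to flip back. */
class RoutingInvoiceRenderer implements InvoiceRenderer {
    private final InvoiceRenderer legacy;
    private final InvoiceRenderer replacement;
    private final FeatureFlags flags;

    RoutingInvoiceRenderer(InvoiceRenderer legacy, InvoiceRenderer replacement, FeatureFlags flags) {
        this.legacy = legacy;
        this.replacement = replacement;
        this.flags = flags;
    }

    public byte[] render(String invoiceId) {
        return flags.isEnabled("new-invoice-renderer", invoiceId)
                ? replacement.render(invoiceId)
                : legacy.render(invoiceId);
    }
}
```

The useful property is that every one of these switches is reversible: if the new growth isn’t ready to take the weight, you lean back on the trellis and the customers never know.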
Eventually the new system pulls away from the Trellis, grows beyond it. And if you do it right, the trellis falls away when its job is done. Ideally, in such a way that you could never tell it was there in the first place.
Sometimes, you can schedule the last switchover and have a big party when you turn the old system off. But if you’re really lucky, there comes a day where you realize the old load balancer crashed a month ago and no one noticed.
There’s a social & emotional aspect to this as well, which goes almost entirely undiscussed.
If we’re replacing a system, it’s probably been around for a while. We’re probably talking about systems with a decade+ of run time. The original architects may have moved on, but there are people who work on it, probably people who have built a whole career out of keeping it running.
There’s always some emotion when it comes time to start replacing the old stuff. Some stereotypes exist for a reason, and the sorts of people who become successful software engineers or business people tend to be the sorts of folks for whom “empathy” was their dump stat. There’s something galling about the newly arrived executive talking about moving on from the old and busted system, or the new tech lead espousing how much better the future is going to be. The old system may have gotten choked out by overgrowth, left behind by the new growth of the tech industry, but it’s still running, still pulling in revenue. It’s paying for the electricity in the projector that’s showing the slide about how bad it is. It deserves respect, and so do the people who worked on it.
That’s the point: The Trellis is a good thing, it’s positive. The old system, and the old system’s staff, have a key role to play. Everyone is on the same team, everyone has the same goal.
There’s an existing term that’s often used for a pattern similar to this. I am, of course, talking about the Strangler Fig pattern. I hate this term, and I hate the usual shorthand of “strangler pattern” even more.
Really? Your mental model for bringing in new software is that it’s an invasive parasite that slowly drains nutrients and kills its host? There are worse ways to go than being strangled, but not by much.
This isn’t an isolated style of metaphor, either. I used to work with someone—who was someone I liked, by the way—who used to say that every system being replaced needed someone to be an executioner and an undertaker.
Really? Your mental model for something ending is violent, state-mandated death?
If Software Forestry has a central thesis, it is this: the ways we talk about what we do and how we do it matter. I can think of no stronger example of what I mean than otherwise sane professionals describing their work as murdering the work of their colleagues, and then being surprised when there’s resistance.
What we do isn’t violent or murderous, it is collaborative and constructive.
What I dislike the most about Strangler Figs, though, is that a Strangler Fig can never exceed the original host, only succeed it. The Fig is bound to the host forever, at first for sustenance, and then, even after the host has died and rotted away, the Fig has an empty space where the host once stood, a ghost haunting the parasite that it can never fully escape from.
So if we’re going to use a real tree as our example for how to do this, let’s use my favorite trees.
Let me tell you about the Coastal Redwoods.
The Redwood forest is a whole ecosystem to itself, not just the trees, but the various other plants growing beneath them. When a redwood gets to the end of its life, it falls over. But that fallen tree then serves as the foundation to a whole new mini-ecosystem. The ferns and sorrel cover the fallen trunk. Seedlings sprout up in the newly exposed sunlight. Burls or other nodes sprout new trees from the base of the old, meaning maybe the tree really didn’t die at all, it just transitioned. From one tree springs a whole new generation of the forest.
There are deaths other than murder, and there are endings other than death.
Let’s replace software like a redwood falling; a loud noise, and then an explosion of new possibilities.
Software Forestry 0x06: The Controlled Burn
Did you know earthworms aren’t native to North America? Sounds crazy, but it’s true; or at least it has been since the glaciers of the last ice age scoured the continent down to the bedrock and took the earthworms with them. North America certainly has earthworms now, but as a recently introduced invasive species. (For everyone who just thought “citation needed”, Invasive earthworms of North America.)
As such, the biomes of North America have very different lifecycles than their counterparts in Eurasia do. In, say, a Redwood Forest, organic matter builds up in a way it doesn’t across the water. Things still rot, there’s still fungus and microbes and bugs and things, but there isn’t a population of worms actively breaking everything down. The biomass decays slower. Some buildup is a good thing, it provides a habitat for smaller plants and animals, but if it builds up too much, it can start choking plants out before it can break down into nutrients.
So what happens is, the forest catches on fire. In a forest with earthworms, a fire is pretty much always a bad thing. Not so much in the Redwoods, or other Californian forests. The trees are fire resistant, the fire clears away the excess debris and frees those nutrients, and many species of cone-bearing conifer trees—redwoods, pines, cypresses, and the like—have what are called “serotinous” cones, which means they only open and release their seeds after a fire. Some are literally covered in a layer of resin that has to melt off before the seeds can sprout. The fire rips through, clears out the debris, and the new plants can sprout in the newly fertilized ground. Fire isn’t a hazard to be endured, it’s been adopted as a critical part of the entire ecosystem’s lifecycle.
Without human intervention, fires happen semi-regularly due to lightning. Of course, that’s a little unpredictable and doesn’t always turn out great. But the real problem is when humans prevent fires from taking hold, and then no matter how much you “sweep the forest,” the debris and overgrowth builds up and builds up, until you get the really huge fires we’ve been having out here.
The people who used to live here (Before, ahh… a bunch of other people “showed up and took over” who only knew how to manage forests with earthworms) knew what the solution was: the Controlled Burn. You choose a time, make some space, and carefully set the fire, making sure it does what it needs to do in the area you’ve made safe, while keeping it out of the places where the people are. In CA at least, we’re starting to adopt controlled burns as an intentional management technique again, a few hundred years later. (The biology, politics, history, and sociology of setting a forest on fire on purpose are beyond our scope here, but you get the general idea.)
I think a lot of Software Forests are like this too.
Every place I’ve ever worked has struggled with figuring out how to plan and estimate ongoing maintenance outside of a couple of very narrow cases. If it’s something specific, like a library upgrade, or a bug, you can usually scope and plan that without too much trouble. But anything larger is a struggle, because those larger maintenance and care efforts are harder to estimate, especially when there isn’t a specific & measurable customer-facing impact. You don’t have a “thing” you can write a bug on. You don’t know what the issues are, specifically, it’s just acting bad.
The problem requires sustained focus, the kind that lasts long enough to actually make a difference. And that’s hard to get.
One of the reasons why Cutting Trails is so effective is that it doesn’t take that much more time than the work the trail is being cut towards. Back when estimating via the Fibonacci Sequence was all the rage, the extra work to cut the trail usually didn’t get you up to the next Fibonacci number.
Furthermore, the effort to get in and actually estimate and scope some significant maintenance work is often more work than the actual changes. It’s wasteful to spend a week investigating and then write up a plan for someone to do later. You’re already in there!
Finally, rarely is there a direct advocate. There’s nearly always someone who acts as the Voice of the Customer, or the Voice of the Business, but very rarely is anyone the Voice of the Forest.
(I suspect this is one of the places where agile leads us astray. The need to have everything be a defined amount of work that someone can do in under a week or two makes it incredibly easy to just not do work that doesn’t lend itself to being clearly defined ahead of time.)
So the overgrowth and debris builds up, and you get the software equivalent of an unchecked forest fire: “We need to just rewrite all of this.”
No you don’t! What you need are some Controlled Burns.
It goes like this:
Most Forests have more than one application, for a wide definition of “application.” There’s always at least one that’s limping along, choked with Overgrowth. Choose one. Find a single person to volunteer. (Or get volun-told.) Clear their schedule for a month. Point them at the app with overgrowth and let them loose to fix stuff.
We try to be process-agnostic here at Software Forestry, but we acknowledge most folks these days are doing something agile, or at least agile-adjacent. Two-week sprints seem to have settled as the “standard” increment size, so a month is two sprints. That’s not nothing! You gotta mean it to “lose” a resource for that much time. But also, you should be able to absorb four weeks of vacation in a quarter, and this is less disruptive than that. Maybe schedule it as one sprint with the option to extend to a second depending on how things look “next week.”
It helps, but isn’t mandatory, to have success metrics ahead of time. Sometimes, the right move is to send the person in there and assume you’ll find something to paint a bullseye around. But most of the time you’ll want to have some kind of measurement you can do a before-and-after comparison with. The easiest ones are usually performance related, because you can measure those objectively, and because they probably aren’t getting handled as part of the normal “run the business” work. Things like “we currently process x transactions per second, we need to get that to 2x,” or “cut RAM use by 10%,” or “why is this so laggy sometimes?”
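If the metric is throughput, the before-and-after measurement doesn’t have to be fancy. Something like this rough sketch is usually enough, run once before the burn starts and once when it ends; the Transaction type is a stand-in for whatever unit of work the system actually processes.

```java
import java.time.Duration;
import java.time.Instant;

/** A crude throughput baseline: capture it before the burn, again after. */
class BurnBaseline {
    interface Transaction { void process(); }

    /** Processes one batch and reports transactions per second. */
    static double transactionsPerSecond(Iterable<Transaction> batch) {
        Instant start = Instant.now();
        long count = 0;
        for (Transaction t : batch) {
            t.process();
            count++;
        }
        double seconds = Duration.between(start, Instant.now()).toMillis() / 1000.0;
        return count / Math.max(seconds, 0.001); // guard against a tiny batch finishing in under a millisecond
    }
}
```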
I did a Controlled Burn once on a system that needed to, effectively, scan every record in a series of database tables to check for things that needed to be deleted off of a storage device. It scanned everything, then started over and scanned everything again. When I started, it was taking over a day to get through a cycle, and that time was increasing, because it wasn’t keeping up with the amount of new work sliding in. No one knew why it took that long, and everyone with real experience with that app was long gone from the company. After a month of dedicated focus, it got through a cycle in less than two hours. Fixed a couple bits of buggy behavior while I was at it. No big changes, no re-architecture, no platform changes, just a month of dedicated focus and cleanup. A Controlled Burn.
This is the time to get that refactoring done—fix that class hierarchy, split that object into some collaborators. Write a bunch of tests. Refactor until you can write a bunch of tests. Fix that thing in the build process everyone hates. Attach some profilers and see where the time is going.
Dig in, focus, and burn off as much overgrowth as you can. And then leave a list of things to do next time. You should know enough now to do a reasonable job scoping and estimating the next steps, so write those up for the to-do list. Plant some seeds for new growth. You shouldn’t have to do a Controlled Burn on the same system twice.
Deploying this kind of directed focus can be incredibly powerful. The average team can absorb maybe one or two of these a year, so deploy them with purpose.
Sometimes, all the care in the world won’t do the trick, and you really do need to replace a system. Next time: The Trellis Pattern
Software Forestry 0x05: Cutting Trails
The lived reality of most Software Foresters is that we spend all our time with large systems we didn’t see the start of, and won’t see the end of. There’s a lot of takes out there about what makes a system “Legacy” and not “just old”, but one of the key things is that Legacy Systems are code without continuity of philosophy.
Because almost certainly, the person who designed and built it in the first place isn’t still here. The person that person trained probably isn’t still here. Given the average tenure time in tech roles, it’s possible the staff has rolled over half-a-dozen or more times.
Let me tell you a personal example. A few lifetimes ago, I worked on one of these two-decade old Software Forests. Big web monolith, written in what was essentially a homebrew MVC-esque framework which itself was written on top of a long-deprecated 3rd party UI framework. The file layout was just weird. There clearly was a logic to the organization, but it was gone, like tears in the rain.
Early on, I had a task where I needed to add an option to a report generator. From the user perspective, I needed to add an option to a combobox on a web form, and then when the user clicked the Generate button, read that new option and punch the file out differently.
I couldn’t find the code! I finally asked my boss, “hey, is there any way to tell which file has which UI pages?” The response was, “no, you just gotta search.”
As they say, Greppability Is an Underrated Code Metric.
(Actually, I’ve worked on two big systems now where I had essentially this exact conversation. The other one was the one where one of the other engineers described it as having been built by “someone who knew everything about how JSP tags worked except when to use them.”)
So you search for the distinctive text in the button, or the combo box, or something on the page. You find the UI. Then you start tracing in. Following the path of execution, dropping into undocumented methods with unhelpful names, bouncing into weird classes with no clear design, strange boundaries, one minute a function with a thousand lines, the next minute an inheritance hierarchy 10 levels of abstraction deep to call one line.
At this point you start itching. “I could rewrite all of this,” you think. “I could get this stood up in a weekend with Spring Boot/React/Ruby on Rails/Elixir/AWS Lambdas/Cool Thing I Used Last”. You start gazing meaningfully at the copy of the Refactoring book on the shelf. But you gotta choke down that urge to rebuild everything. You have a bug you have to fix or a feature to deploy. But it’s not actually going to get better if you keep digging the hole deeper. You gotta stop digging, and leave things better for the next person.
You need to Cut a Trail.
1. Leave Trail Markers.
First thing is, you have to figure out what it does now before you change anything. In a sane, kind world, there would be documentation, diagrams, clear training. And that does happen, but very, very rarely. If you’re very lucky, there’s someone who can explain it to you. Otherwise, you have to do some Forensic Architecture.
Talk to people. Add some logging. Keep clicking and watching what it does. Step through it in a debugger, if you can and if that helps, although I’ve personally found that just getting a debugger working on a live system is oftentimes more work than the information you get out of it is worth. But most of all, read. Read the code closely, load as much of that system’s state and behavior into your mind as you can. Read it like you’re in High School English and trying to pull the symbolism out of The Grapes of Wrath or The Great Gatsby. That weird function call is the light at the end of the pier, those abstractions are the eyes on the billboard—what do they mean? Why are they here? How does any of this work?
You’ll get there, that’s what we do. There’s a point where you’ll finally understand enough to make the change you want to make. The trick is to stop at this point, and write everything down. There’s a “haha only serious” joke that code comments are for yourself in six months, but—no. Your audience here is you, a week ago. Write down everything you needed to know when you started this. Every language has a different way to do embedded documentation or comments, but they all have a way to do it. Document every method or function that your explored call path went through. Write down the thing you didn’t understand when you started, the strange overloaded behavior of that one parameter, what that verb really means in the function name, as much as possible, why it does what it does.
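In a Java codebase, that can be as simple as a doc comment sitting on top of the method you just had to reverse engineer. The method, its quirks, and the wiki page below are all invented for the example; the shape is the point.

```java
/** Stand-in result type; the real one is whatever the system already uses. */
class Report {}

class UsageReports {
    /**
     * Regenerates the monthly usage rollup for one account.
     *
     * Things you need to know before touching this (reconstructed the hard way):
     * - "rollup" here does NOT mean the billing rollup; it is the per-region
     *   aggregation step that runs before invoicing.
     * - includeArchived = true also widens the date window, because archived
     *   accounts only keep thirteen months of history.
     * - The scheduler assumes this is safe to re-run; it is, today, but only
     *   because the delete step happens to be idempotent.
     *
     * See the team wiki page for the call-path diagram.
     */
    Report regenerateRollup(String accountId, boolean includeArchived) {
        // ...the untouched legacy logic lives here...
        return new Report();
    }
}
```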
Take an hour and draw the diagram you wish you’d had. Write down your notes. And then leave all that somewhere that other people can find it. If you’re using a language where the embedded documentation system can pull in external files, check that stuff right on in to the codebase. Most places have an internal wiki. Make a page for your team if there isn’t one. Under that, make a page for the app if it doesn’t have one. Then put all that you’ve learned there.
Something else to make sure to document early on: terminology. Everyone uses the same words to mean totally different things. My personal favorite example: no two companies on earth use the word “flywheel” the same way. It doesn’t matter what it was supposed to mean! Ask. Then write it down. The weird noun you tripped over at the start of this? Put the internal definition somewhere you would have found it.
People frequently object that they don’t have the time to do this, to which I say: you’ve already done the hard part! 90% of the time for this task was figuring it out! Writing it down will take a fraction of the time you already had to spend, I promise. And when you’re back here in a year, the time you save in being able to reload all that mental state is going to more than pay for that afternoon you spent here.
2. Write Tests.
Tests are really underrated as a documentation and exploration technique. I mean, using them to actually test is good too! But for our purposes we’re not talking about a formal TDD or Red-Green-Refactor–style approach. That weird function? Slap some mocks and stubs together and see what it does. Throw some weird data at it. Your goal isn’t to prove it correct, but to act like one of those Edwardian Scientists trying to figure out how air works.
Another Forest I inherited once, which was a large app that customers paid real money to use, had a test suite of 1 test—which failed. But that was great, because there was already a place to write and run tests.
Tests are a net benefit, they don’t all have to be thorough, or fall into strict unit/integration/acceptance boundaries. Sometimes, it’s okay to put a couple of little weird ones in there that exist to help explain what some undocumented code does.
If you’re unlucky enough to run into a Forest with no test runner, trust me, take the time to bolt one on. It doesn’t have to be perfect! But you’ll make that time back faster than you’d believe.
When you get done, in addition to whatever “normal” Unit or Integration tests your process requires or requests, write a really small test that demonstrates what you had to do. Link that back to the notes you wrote, or the documentation you checked in.
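Here’s a hedged sketch of what one of those small, explanatory tests can look like, JUnit 5 style; the class under test and its surprising behavior are made up for the example, but this is the general shape of a characterization test: it pins down what the code does today, not what you wish it did.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// LegacyDateCodes stands in for the real, undocumented class you were exploring.
class LegacyDateCodes {
    static String expand(String code) {
        int yy = Integer.parseInt(code.substring(0, 2));
        String century = yy < 70 ? "20" : "19";   // the surprise
        return century + code;
    }
}

class LegacyDateCodesTest {
    @Test
    void twoDigitYearsBeforeSeventyMeanTwoThousands() {
        // Discovered while tracing the report generator; written down here so
        // the next person doesn't have to rediscover it in a debugger.
        assertEquals("2069-01-01", LegacyDateCodes.expand("69-01-01"));
        assertEquals("1970-01-01", LegacyDateCodes.expand("70-01-01"));
    }
}
```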
3. A Little Cleanup, Some Limited Refactoring
Once you have it figured out, and have a test or two, there’s usually two strong responses: either “I need to replace all of this right now”, or “This is such a mess it’ll never get better.”
So, the good news is that both of those are wrong! It can get better, and you really probably shouldn’t rework everything today.
What you should do is a little cleanup. Make something better. Fix those parameter names, rename that strangely named function. Heck, just fix the tenses of the local variables. Do a little refactoring on that gross class, split up some responsibilities. It’s always okay to slide in another shim or interface layer to add some separation between tangled things.
(Don’t leave a huge mess in the VC diff, though, please.)
Leave the trail a little cleaner than when you found it. Doesn’t have to be a lot, we don’t need to re-landscape the whole forest today.
4. Write the New Stuff Right (as possible)
A lot of the time, you know at least one thing the original implementors didn’t: you know how the next decade went. It’s very easy to come in much later, and realize how things should have been done in the first place, because the system has a lot more years on it now than it used to. So, as much as you can, build the new stuff the right way. Drop that shim layer in, encapsulate the new stuff, lay it out right. Leave yourself a trail to follow when you come back and refactor the rest of it into shape.
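As a sketch of what that shim can look like: the new code states what it needs in a small, well-named interface, and one adapter class absorbs all the legacy weirdness. Every name here is hypothetical, and the “legacy” class is stubbed out so the sketch stands on its own.

```java
/** What the new code actually needs, stated plainly. */
interface CustomerDirectory {
    String emailFor(String customerId);
}

/** The one place that knows about the legacy tangle. */
class LegacyCustomerDirectory implements CustomerDirectory {
    private final LegacyCustomerDao dao; // the existing, oddly shaped class

    LegacyCustomerDirectory(LegacyCustomerDao dao) { this.dao = dao; }

    public String emailFor(String customerId) {
        // The legacy call wants a zero-padded numeric key and returns a
        // pipe-delimited record; keep that knowledge here and only here.
        String record = dao.fetchRecord(String.format("%010d", Long.parseLong(customerId)));
        return record.split("\\|")[3];
    }
}

/** Stand-in for the legacy class, just so this sketch compiles on its own. */
class LegacyCustomerDao {
    String fetchRecord(String paddedKey) {
        return "0000000042|ACME|active|someone@example.com|1997-03-04";
    }
}
```

New features depend on CustomerDirectory; when the day comes to replace the tangle underneath, there’s exactly one class to swap out.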
But the flip side of that is:
5. Don’t be a Jerk About It
Everyone has worked on a codebase where there’s “that” module, or library, or area, where “that guy” had a whole new idea about how the system should be architected, and it’s totally out of place with everything else. A grove of palm trees in the middle of a redwood forest. Don’t be that guy.
I worked on an e-commerce system once where the Java package name was something like com.company.services.store, and then right next to it was com.company.services.store2. The #2 package was one former employee’s personal project to refactor the whole system; they had left years before with it half done, but of course it was all in production, and it was a crapshoot which version other parts of the system called into. Don’t do that.
After you’re gone, when someone looks at the version control change log for this part of the system, you want them to see your name and think “oh, this one had the right idea.”
Software Forestry is a group project, for the long term. Most of the time, “consistency and familiarity” are more valuable than some kind of quixotic quest for the ideal engineering. It’s okay, we’ll get there. Keep leaving it better than you found it. It’ll be worth it.
You’ll be amazed what your overgrown codebase looks like after a couple months of doing this. That tangled overgrowth starts to look positively tidy.
But sometimes, just cutting trails doesn’t get you there. Next Time: The Controlled Burn.
Software Forestry 0x04: Library Upgrade Week
Here at Software Forestry we do occasionally try to solve problems instead of just turning them into lists of smaller problems, so we’re gonna do a little mood pivot here and start talking about how to manage some of those forms of overgrowth we talked about last time.
First up: let me tell you the good news about Library Upgrade Week.
Just about any decently sized software system uses a variety of third party libraries. And why wouldn’t you? The multitude of high-quality libraries and frameworks out there is probably the best outcome of both the Open Source movement and Object-Oriented Software design. The specific mechanics vary between languages and their practitioners’ cultures, but the upshot is that very rarely does anyone build everything from scratch. There’s no need to go all historical reenactment and write your own XML parser.
Generally, people keep those libraries pinned and stay on one fixed version, rather than schlepping in a new version every time an update happens. This is a good thing! Change is risk, and risk should be taken on intentionally. The downside is that those libraries keep moving forward, the version you’re using slips out of date, and now you have a bunch of Overgrowth. And so that means you need to upgrade them.
The upshot of all that is that, on a semi-regular basis, we all need to slurp in a bunch of new code that no one on the payroll wrote and that no one really knows how to test. Un-Reviewed Code Is Tech Debt, and one of the mantras of writing good tests is “don’t test the framework”, so this is always a little iffy.
It’s incredibly easy to just keep letting those weeds grow a little longer. “The new version doesn’t have anything we need”, “there’s no bugs”, “if it ain’t broke don’t fix it”, and so on. They always take too long, don't usually deliver immediate gratification, and are hard to schedule. It’s no fun, and no one likes to do it.
The trick is to turn it into a party.
It works like this: set aside the last week of the quarter to concentrate on 3rd party library upgrades. Regardless of what your formal planning cycle or process is, most businesses tend to operate on quarters, and there’s usually a little dead time at the end of the quarter you can repurpose.
The Process:
Form squads. Each squad is a group of like-minded individuals focused on a single 3rd party library or framework. Squads are encouraged to be cross-team. Each squad will focus on updating that 3rd party library in all applications or places where it is used. The intent is to make this a group event, where people can help each other out. Participation is not mandatory.
Share squad membership and goals ahead of time. Leadership should reserve the right to veto libraries as “too scary” or “not scary enough”. Libraries with high-severity alerts or known CVEs are good candidates.
That week, each squad self organizes and works as a group through any issues caused by the upgrade. Other than major outages or incidents, squad members should be excused from other “run the business” type work for that week; or rather, the library upgrades are “the business.” Have fun!
On that Friday hold the Library Upgrade Week Show-n-Tell. Every squad should demo what they did, how they did it, and what it took to pull it off. Tell war stories, hold a happy hour, swap jokes. If a squad doesn’t finish that’s okay! The expectation is that they’ll have learned a lot about what it’ll take to finish, and that work will be captured in the relevant team’s todo lists. If you’re in a process with short develop-deploy increments (like sprints) you can make the library upgrade(s) a release on its own. Ideally you already have a way to sign off a release as not containing regressions, and so a short release with just a library upgrade is a great way to make sure you didn’t knock some dominos over.
But wait! There's more! All participants will vote on awards to give to squads, for things like:
- Error message with least hits on Stack Overflow
- Largest version number jump
- Most lines changed
- Fewest lines changed
- Best team name
- Best presentation
Go nuts! Have a great time!
Yes, it’s a little silly, but that’s the point. I’ve deployed a version of this at a couple of jobs now, and it’s remarkable how effective it is. The first couple of cycles people hit the “easy” ones—uprev the logging library or a JSON parser or something. But then, once people know that Library Upgrade Week is coming, they start thinking about the harder stuff, and you start getting people saying they want to take a swing at the main framework, or the main language version, or something else load-bearing. It’s remarkable how much progress two or three people can make on a problem that looks unsolvable when they have an uninterrupted week to chew on it. (If you genuinely can’t spare a handful of folks to do some weeding four weeks out of the year, that’s a much larger problem than out-of-date libraries, and you should go solve that problem first. Like, right now.)
There’s an instinct to take the core idea and schedule this kind of maintenance a few times a year, but leave off the part where it’s a party. This is a mistake. This is work people want to do even less than their usual work; the trick is to make everything around it fun.
We’re Foresters, and both we and the Forest are here long term. The long term health of both depends on the care of the Forest being something that the Foresters enjoy, and it’s okay to stack that deck in your favor.
Software Forestry 0x03: Overgrowth, Catalogued
Previously, we talked about What We Talk About When We Talk About Tech Debt, and that one of the things that makes that debt metaphor challenging is that it has expanded to encompass all manner of Overgrowth, not all of which fits that financial mental model.
From a Forestry perspective, not all the things that have been absorbed by “debt” are necessarily bad, nor are they always avoidable. Taking a long-term, stewardship-focused view, there’s a bunch of stuff that’s more like emergent properties of a long-running project, as opposed to getting spendy with the credit card.
So, if not debt, what are we talking about when we talk about tech debt?
It’s easy to get over-excited about Lists of Things, but I got into computer science from the applied philosophy side, rather than the applied math side. I think there are maybe seven categories of “Overgrowth” that are different enough to make it useful to talk about them separately:
1. Actual Tech Debt.
Situations where someone made an explicit decision to do something “not as good” in order to ship faster. There are two broad subcategories here: using a hacky or unsustainable design to move faster, and cutting scope to hit a date.
In fairness, the original Martin Fowler post just talks about “cruft” in a broad sense, but generally speaking “Formal” (orthodox?) tech debt assumes a conscious choice to accept that debt.
This is the category where the debt analogy works the best. “I can’t buy this now with cash on hand, but I can take on more credit.” (Of course, this also includes “wait, what’s a variable rate?”)
In my experience, this is the least common species of Overgrowth, and the most straightforwardly self correcting. All development processes have some kind of “things to do next” list or backlog, regardless of the formal name. When making that decision to take on the debt, you put an item on the todo list to pay it off.
That list of cut features becomes the nucleus of the plan for the next major version, or update, or DLC. Sometimes, the schedule did you a favor, you realize it was a bad idea, and that cut feature debt gets written off instead of paid off.
The more internal or infrastructure-type items become the things you talk about with the phrase “we gotta do something about…”: the logging system, the metrics and observability, that validation system, adding internationalization. Sometimes this isn’t a big formal effort, just a recognition that the next piece of work in that area is going to take a couple extra days to tidy up the mess we left last time.
Fundamentally, paying this off is a scheduling and planning problem, not a technical one. You had to have some kind of an idea about what the work was to make the decision to defer the work, so you can use that same understanding to find it a spot on the schedule.
That makes this the only category where you can actually pay it off. There’s a bounded amount of work you can plan around. If the work keeps getting deferred, or rescheduled, or kicked down the road, you need to stop and ask yourself if this is actually debt or something aspirational that went septic on you.
2. We made the right decision, but then things happened.
Sometimes you make the right decisions, don’t choose to take on any debt, and then things happen and the world imposes work on you anyway.
The classic example: Third party libraries move forward, the new version isn’t cleanly backwards compatible, and the version you’re using suddenly has a critical security flaw. This isn’t tech debt, you didn’t take out a loan! This is more like tech property taxes.
This is also a planning problem, but trickier, because it’s on someone else’s schedule. Unlike the tech debt above, this isn’t something you can pay down once. Those libraries or frameworks are going to keep getting updated, and you need to find a way to stay on top of them without making it a huge effort every time.
Of course, if they stop getting updated you don’t have an ongoing scheduling problem anymore, but you have the next category…
3. It seemed like a good idea at the time.
Sometimes you just guess wrong, and the rest of the world zigs instead of zags. You do your research, weigh the pros and cons, build what you think is the right thing, and then it’s suddenly a few years later and your CEO is asking why your best-in-class data rich web UI console won’t load on his new iPad, and you have to tell him it’s because it was written in Flash.
You can’t always guess right, and sometimes you’re left with something unsupported and with no future. This is very common; there’s a whole lot of systems out there that went all-in on XML-RPC, or RMI, or GWT, or Angular 1, or Delphi, or ColdFusion, or something else that looked like it was going to be the future right up until it wasn’t.
Personally, I find this to be the most irritating. Like Han Solo would say, it’s not your fault! This was all fine, and then someone you never met makes a strategic decision, and now you have to decide how or if you’re going to replace the discontinued tech. It’s really easy to get into a “if it ain’t broke don’t fix it” headspace, right up until you grind to a halt because you can’t hire anyone who knows how to add a new screen to the app anymore. This is when you start using phrases like “modernization effort”.
4. We did the best we could but there are better options now.
There’s a lot more stuff available than there used to be, and so sometimes you roll onto a new project and discover a home-brew ORM, or a hand-rolled messaging queue, or a strange pattern, and you stop and realize that oh wait, this was written before “the thing I would use” existed. (My favorite example of this is when you find a class full of static final constants in an old Java codebase and realize this was from before Java 5 added enums.)
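For that Java case, the before-and-after looks roughly like this; the constants are invented, but you’ve seen this class.

```java
// Before: the pre-Java-5 idiom, a bag of ints with no type safety.
class OrderStatusCodes {
    static final int PENDING = 0;
    static final int SHIPPED = 1;
    static final int CANCELLED = 2;
}

// After: the same idea as an enum, which is what you'd reach for today.
enum OrderStatus {
    PENDING, SHIPPED, CANCELLED
}
```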
A lot of the time, the custom, hand-rolled thing isn’t necessarily “worse” than some more recent library or framework, but you have to have some serious conversations about where to spend your time; if something isn’t your core business and has become a commodity, it’s probably not worth pouring more effort into maintaining your custom version. Everyone wants to build the framework, but no one really wants to maintain the framework. Is our custom JSON serializer really still worth putting effort into?
Like the previous category, it’s probably time to take a deep breath and talk about re-designing; but unlike the previous one, the person who designed the current version is probably still on the payroll. This usually isn’t a technical problem so much as it is a grief management one.
5. We solved a different problem.
Things change. You built the right thing at the time, but now you’ve got new customers, shifted markets, increased scale, maybe the feds passed a law. The business requirements have changed. Yesterday, this was the right thing, and now it isn’t.
For example: Maybe you had a perfectly functional app to sell mp3 files to customers to download and play on their laptops, and now you have to retrofit that into a subscription-based music streaming platform for smartphones.
This is a good problem to have! But you still gotta find a way to re-landscape that forest.
6. Context Drift.
There’s a pithy line that Legacy Code is “code without tests,” but I think that’s only part of the problem. Legacy code is code without continuity of philosophy. Why was it built this way? There’s no one left who knows! A system gets built in a certain context, and as time passes that context changes, and the further away we get from the original context, the more overgrown and weedy the system appears to become. Tests—good tests—are one way to preserve context, but not the only way.
A whole lot of what’s called “cruft” is here, because it’s harder to read code than to write it. A lot of that “cruft” is congealed knowledge. That weird custom string utility that’s only used the one place? Sure, maybe someone didn’t understand the standard library, or maybe you don’t know about the weekend the client API started handing back malformed data and they wouldn’t fix it—and even worse, this still happens at unpredictable times.
This is both the easiest and least glamorous to treat, because the trick here is documentation. Don’t just document what the code does, document why the code does what it does, why it was built this way. A very small amount of effort while something is being planted goes a long way towards making sure the context is preserved. As Henry Jones Sr. says, you write it down so you don’t have to remember.
To put all that another way: Documentation debt is still tech debt.
7. Not debt, just an old mistake.
The one no one likes to talk about. For whatever reason, someone didn’t do A-quality work. This isn’t necessarily because they were incompetent or careless, sometimes shit happens, you know? This is the flip side of the original Tech Debt category; it wasn’t on purpose, but sometimes people are in a hurry, or need to leave early, or just can’t think of anything better.
And so, for whatever the reason, the doors aren’t straight, there’s a bunch of unpainted plywood, those stairs aren’t up to code. Weeds everywhere. You gotta spend some time re-cutting your trails through the forest.
As we said at the start, each of those types of Overgrowth has its own root causes, but also needs a different kind of forest management. Next week, we start talking about techniques to keep the Overgrowth under control.
Software Forestry 0x02: What We Talk About When We Talk About Tech Debt
Tell me if this sounds familiar: you start a new job, or roll onto a new project, or even have a job interview, and someone says in a slightly hushed tone, words to the effect of “Now, I don’t want to scare you, but we have a lot of tech debt.” Or maybe, “this system is all Legacy Code.” Usually followed by something like “but don’t worry! We’ve got a modernization effort!”
Everyone seems to be absolutely drowning in “tech debt”; hardly a day goes by where you don’t read another article about some system with some terrible problem that was caused by being out of date, deferred maintenance, “in debt.” We constantly joke about the fragile house-of-cards nature of basically everything. Everyone is hacking their way, pun absolutely intended, through overgrown forests.
There’s a lot to unpack from all that. Other engineering disciplines don’t beg to rebuild their bridges or chemical plants after a couple of years, but they also don’t need to; they build them to last. How does this happen? Why is it like this?
For starters, I think this is one of those places where our metaphors are leading us wrong.
I can’t now remember when I first heard the term Technical Debt. I think it was the early twenty-teens; the place I was working in the mid-aughts had a lot of tech debt, but I don’t ever remember anyone using that term, while the place I was working in the early teens also had a lot, and we definitely called it that.
One of the things metaphors are for is to make it easier to talk to people with a different background—software developers and business folks, for example. We might use different jargon in our respective areas of expertise, but if we can find an area of shared understanding, we can use that to talk about the same thing. “Debt” seems like a kind of obvious middle-ground: basically everyone who participates in the modern economy has a basic, gut-level understanding of how debt works.
Except, do they have the same understanding?
Personally, I think “debt” is a terrible metaphor, bordering on catastrophic. Here’s why: it has very, very different moral dimensions depending on who’s talking about it.
To the math and engineering types who popularized the term, “debt” is obviously bad, bordering on immoral. They’re the kind of people who played enough D&D as kids to understand how probability works, probably don’t gamble, probably pay off their credit cards in full every month. Obviously we don’t want to run up debt! We need to pay that back down! Can’t let it build up! Cue that scene in Ghostbusters where Egon is talking about the interest on Ray’s two mortgages.
Meanwhile, when the business-background folks making the decisions about where to put their investments hear that they can rack up “debt” to get features faster, but can pay it off in their own time with no measurable penalty or interest, they make the obvious-to-them choice to rack up a hell of a lot of it. They debt-financed the company with real money, why not the software with metaphorical currency? “We can do it, but that’ll add tech debt” means something completely different to the technical and business staff.
Even worse, “debt” as a metaphor implies that it ends. In real life, you can actually pay the debt off; pay off the house, end the car payment, pay back the municipal bonds, keep your credit cards up to date, whatever. But keeping your systems “debt free” is a process, a way of working, not really something you can save up and pay off.
I’m not sure any single metaphor has done more damage to our industry’s ability to understand itself than “tech debt.”
Of course, the definition of “tech debt” expanded and has come to encompass everything about a software system that makes it hard to work on or that the developers don’t like. “Cruft” is the word Fowler uses. “Tech debt”, “legacy”, “lack of maintenance” all kind of swirl into a big mish-mash, meaning, roughly, “old code that’s hard to work on.” Which makes it even less useful as a metaphor, because it covers a lot of different kinds of challenges, each of which calls for different techniques to treat and prevent. In fairness, Fowler takes a swing at categorizing tech debts via the Technical Debt Quadrant, which isn’t terrible, but is a little too abstract to reflect the lived reality.
This is a place where our Forestry metaphor offers up an obvious alternative: Overgrowth. Which gets close to the heart of what the problem feels like: that we built a perfectly fine system, and now, after no action on our part, it’s not fine. Weeds. There’s that sense that it gets worse when you’re not looking.
There’s something very vexing about this. As Joel said: As if source code rusted. But somehow, that system that was just fine not that long ago is old and hard to work on now. We talk about maintenance, but the kind of maintenance a computer system needs is very different from a giant engine that needs to get oiled or it’ll break down.
I think a big part of the reason why it seems so endemic nowadays is that there was a whole lot of appetite to rewrite “everything for the web” in either Java or .Net around the turn of the century, at the same time a lot of other software got rebuilt mostly from scratch to support, depending on your field, Linux, Mac OS X, or post-NT Windows. There hasn’t been a similar “replant the forest” mood since, so by the teens everyone had a decade-old system with no external impetus to rebuild it. For a lot of fields, this was the first point where we had to think in terms of long term maintenance instead of the OS vendor forcing a rebuild. (We all became mainframe programmers, metaphorically speaking.) And so, even though the Fowler article dates to ’03, and the term is older than that, “Tech Debt” became a twenty-teens concern. Construction stopped being the main concern, replaced with care and feeding.
Software Forests need a different kind of tending than the old rewrite-update-rewrite again loop. As Foresters, we know the codebases we work on were here before us, and will continue on after us. There’s the occasional greenfield side project, the occasional big new feature, but mostly our job is to keep the forest healthy and hand it along to the next Forester. It takes a different, longer term, continuous world view than counting down the number of car payments left.
Of course, there’s more than one way a forest can get out of hand. Next Time: Types of Overgrowth, catalogued
Software Forestry 0x01: Somewhere Between a Flower Pot and a Rainforest
“Software” covers a lot of ground. Just as there are a lot of different kinds and ecosystems of forests, there are a lot of kinds and ecosystems of software. And like forests, each of those kinds of software has its own goals, objectives, constraints, rules, needs.
One of the big challenges when reading about software “processes” or “best practices” or even just plain general advice is that people so rarely state up front what kind of software they’re talking about. And that leads to a lot of bad outcomes, where people take a technique or a process or an architecture that’s intrinsically linked to its originating context out of that context, recommend it, and then it gets applied to situations that are wildly inappropriate. Just like “leaves falling off” means something very different in an evergreen redwood forest than it does in one full of deciduous oak trees, different kinds of software projects need different care. As practitioners, it’s very easy for us to talk past each other.
(This generally gets cited in cases like “if you aren’t a massive social network with a dedicated performance team you probably don’t need React,” but also, pop quiz: what kind of software were all the signers of the Agile Manifesto writing at the time they wrote and signed it?1)
So, before we delve into the practice of Software Forestry, let’s orient ourselves in the landscape. What kinds of software are there?
As usual for our industry, one of the best pieces written on this is a twenty-year old Joel On Software Article,2 where he breaks software up into Five Worlds:
- Shrinkwrap (which he further subdivides into Open Source, Consultingware, and Commercial web based)
- Internal
- Embedded
- Games
- Throwaway
And that’s still a pretty good list! I especially like the way he buckets not based on design or architecture but more on economic models and business constraints.
I’d argue that in the years since that was written, “Commercial web-based” has evolved to be more like what he calls “Internal” than “Shrinkwrap”; or more to the point, those feel less like discrete categories than they do like convenient locations on a continuous spectrum. Widening that out a little, all five of those categories feel like the intersections of several spectrums.
I think spectrums are a good way to view the landscape of modern software development. Not discrete buckets or binary yes/no questions, but continuous ranges where various projects land somewhere in between the extremes.
And so, in the spirit of an enthusiastic “yes, and”, I’d like to offer up what I think are the five most interesting or influential spectrums for talking about kinds of software, which we can express as questions sketching out a left-to-right spectrum:
- Is it a Flower Pot or a Sprawling Forest?
- Does it Run on the Customer’s Computers or the Company’s Computers?
- Are the Users Paid to Use It or do they Pay to Use It?
- How Often Do Your Customers Pay You?
- How Much Does it Matter to the Users?
Is it a Flower Pot or a Sprawling Forest?
This isn’t about size or scale, necessarily, as much as it is about overall “complexity”, the number of parts. On one end, you have small, single-purpose scripts running on one machine, on the other end, you have sprawling systems with multiple farms or clusters interacting with each other over custom messaging busses.
How many computers does it need? How many different applications work together? Different languages? How many versions do you have to maintain at once? What scale does it operate at?3 How many people can draw an accurate diagram from memory?
This has huge impacts on not only the technology, but things like team structure, coordination, and planning. Joel’s Shrinkwrap and Internal categories are on the right here, the other three are more towards the left.
Does it Run on the Customer’s Computers or the Company’s Computers?
To put that another way, how much of it works without an internet connection? Almost nothing is on one end or the other; no one ships dumb terminals or desktop software that can’t call home anymore.
Web apps are pretty far to the right, depending on how complex the in-browser client app is. Mobile apps are usually in the middle somewhere, with a strong dependency on server-side resources, but also will usually work in airplane mode. Single-player Games are pretty far to the left, only needing server components for things like updates and achievement tracking; multiplayer starts moving right. Embedded software is all the way to the left. Joel’s Shrinkwrap is left of center, Internal is all the way to the right.
This has huge implications for development processes; as an example, I started my career in what we then called “Desktop Software”. Deployment was an installer which got burned to a disk. Spinning up a new test system was unbelievably easy, pull a fresh copy of the installer and install it into a VM! Working in a microservice mesh environment, there are days when that feels like the software equivalent of Greek fire, a secret long lost. In a world of sprawling services, spinning up a new environment is sometimes an insurmountable task.
A final way to look at this: how involved do your users have to be with an update?
Are the Users Paid to Use It or do they Pay to Use It?
What kind of alternate options do the people actually using the software have? Can they use something else? A lot of times you see this talked about as being “IT vs commercial,” but it’s broader than that. On the extreme ends here, the user can always choose to play a different mobile game, but if they want to renew their driver’s license, the DMV webpage is the only game in town. And the software their company had custom built to do their job is even less optional.
Another very closely related way of looking at this: Are your Customers and Users the same people? That is, are the people looking at the screen and clicking buttons the same people who cut the check to pay for it? The oft-repeated “if you’re not the customer you’re the product” is a point center-left of this spectrum.
The distance between the people paying and the people using has profound effects on the design and feedback loops for a software project. As an extreme example, one of the major—maybe the most significant—differences between Microsoft and Apple is that Microsoft is very good at selling things to CIOs, and Apple is very good at selling things to individuals, and neither is any good at selling things the other direction.
Bluntly, the things your users care about and that you get feedback on are very, very different depending on if they paid you or if they’re getting paid to use it.
Joel’s Internal category is all the way to the left here, the others are mostly over on the right side.
How Often Do Your Customers Pay You?
This feels like the aspect that’s exploded in complexity since that original Joel piece. The traditional answer to this was “once, and maybe a second time for big upgrades.” Now though, you’ve got subscriptions, live service models, “in-app purchases”, and a whole universe of models around charging a middle-man fee on other transactions. This gets even stranger for internal or mostly-internal tools; in my corporate life, I describe this spectrum as a line where the two ends are labeled “CAPEX” and “OPEX”.
Joel’s piece doesn’t really talk about business models, but the assumption seems to be a turn-of-the-century Microsoft “pay once and then for upgrades” model.
How Much Does it Matter to the Users?
Years and years ago, I worked on one of the computer systems backing the State of California’s welfare system. And on my first day, the boss opened with “however you feel about welfare, politically, if this system goes down, someone can’t feed their kids, and we’re not going to let that happen.” “Will this make a kid hungry” infused everything we did.
Some software matters. Embedded pacemakers. The phone system. Fly-by-wire flight control. Banks.
And some, frankly, doesn’t. If that mobile game glitches out, well, that’s annoying, but it was almost my appointment time anyway, you know?
Everyone likes to believe that what they’re working on is very important, but they also like to be able to say “look, this isn’t aerospace” as a way to skip more testing. And that’s okay, there’s a lot of software where, if it goes down for an hour or two, or glitches out on launch and needs a patch, that’s not a real problem. A minor inconvenience for a few people, forgotten about the next day.
As always, it’s a spectrum. There’s plenty of stuff in the middle: does a restaurant website matter? In the grand scheme of things, not a lot, but if the hours are wrong that’ll start having an impact on the bottom line. In my experience, there’s a strong perception bias towards the middle of this spectrum.
Joel touches on this with Embedded, but mostly seems to be fairly casual about how critical the other categories are.
There are plenty of other possible spectrums, but over the last twenty years those are the ones I’ve found myself thinking about the most. And I think the combination does a reasonable job sketching out the landscape of modern software.
A lot of things in software development are basically the same regardless of what kind of software you’re developing, but not everything. Like Joel says, it’s not like Id was hiring consultants to make UML diagrams for DOOM, and so it’s important to remember where you are in the landscape before taking advice or adopting someone’s “best practices.”
As follows from the name, Software Forestry is concerned with forests—the bigger systems, with a lot of parts, that matter, with paying customers. In general, the things more on the right side of those spectrums.
As Joel said 22 years ago, we can still learn something from each other regardless of where we all stand on those spectrums, but we need to remember where we’re standing.
Next Time: What We Talk About When We Talk About Tech Debt
- I don’t know, and the point is you don’t either, because they didn’t say.
- This almost certainly wont be the last Software Forestry post to act as extended midrash on a Joel On Software post.
- Is it web scale?
Software Forestry 0x00: Time For More Metaphors
Software is a young field. Creating software as a mainstream profession is barely 70 years old, depending on when you start counting. Its legends are still, if just, living memory.
Young enough that it still doesn’t have much of its own language. Other than the purely technical jargon, it’s mostly borrowed words. What’s the verb for making software? Program? Develop? Write? Similarly, what’s the name for someone who makes software? Programmer? Developer? We’ve settled, more or less, on Engineer, but what we do has little in common with other branches of engineering. Even the word “computer” is borrowed; not that long ago a computer was something like an accountant, a person who computed.1 None of this is a failing, but it is an indication of how young a field this is.
This extends to the metaphors we use to talk about the practice of creating that software. Metaphors are a cognitive shortcut, a way to borrow a different context to make the current one easier to talk about. But they can also be limiting: you can trap yourself in the boundaries of the context you borrowed.
Not that we’re short on metaphors, far from it! In keeping with the traditions of American Business, we use a lot of terms from both Sports (“Team”, “Sprint,” “Scrum”) and the Military (“Test Fire,” “Strategy vs. Tactics”). The seminal Code Complete proposed “Construction”. Knuth called it both an Art and a branch of Literature. We co-opted the term “Architecture” to talk about larger designs. In recent years, you see a lot of talk about “Craft.” “Maintenance-Oriented Programming.” For a while, I used movies. (The spec is the script! Specialized roles all coming together! But that was a very leaky abstraction.)
The wide spread of metaphors in use shows how slippery software can be, how flexible it is conceptually. We haven’t quite managed to grow a set of terms native to the field, so we keep shopping around looking for more.
I bring this up because what’s interesting about the choice of metaphors isn’t so much the direct metaphors themselves but the way they reflect the underlying philosophy of the people who chose them.
There’s two things I don’t like about a lot of those borrowed metaphors. First, most of them are Zero Sum. They assume someone is going to lose, and maybe even worse, they assume that someone is going to win. I’d be willing to entertain that that might be a useful way to think about a business as a whole in some contexts, but for a software team, that’s useless to the point of being harmful. There’s no group of people a normal software team interacts with that they can “beat”. Everyone succeeds and fails together, and they do it over the long term.
Second, most of them assume a clearly defined end state: win the game, win the battle, finish the building. Most modern software isn’t like that. It doesn’t get put in a box in Egghead anymore. Software is an ongoing effort, it’s maintained, updated, tended. Even software that’s not a service gets ongoing patches, subscriptions, DLC, the next version. There isn’t a point where it is complete, so much as ongoing refinement and care. It’s nurtured. Software is a continuous practice of maintenance and tending.
As such, I'm always looking for new metaphors; new ways of thinking about how we create, maintain, and care for software. This is something I've spent a lot of time stewing on over the last two decades and change. I've watched a lot of otherwise smart people fail to find a way to talk about what they were doing because they didn't have the language. To quote every infomercial: there has to be a better way.
What are we looking for? Situations where groups of people come together to accomplish a goal, something fundamentally creative, but with strict constraints, both physical and by convention. Where there’s competition, but not zero sum, where everyone can be successful. Most importantly, a conceptual space that assumes an ongoing effort, without a defined ending. A metaphor backed by a philosophy centered around long-term commitment and the way software projects sprawl and grow.
“Gardening” has some appeal here, but that’s a little too precious and small-scale, and doesn’t really capture the team aspect.2 We want something larger, with people working together, assuming a time scale beyond a single person’s career, something focused on sustainable management.
So, I have a new software metaphor to propose: Software Forestry.
These days, software isn't built so much as it's grown, increment by increment. Software systems aren't a garden, they're a forest, filled with a whole ecosystem of different inhabitants, with different sizes, needs, uses. It's tended by a team of practitioners who—like foresters—maintain its health and shape that growth. We're not engineers as much as caretakers. New shoots are tended, branches pruned, fertilizer applied, older plants taken care of, the next season's new trees planned for. But that software isn't there for its own sake, and as foresters we're most concerned with how that software can serve people. We're focused on sustainability; we know now that the new software we write today is the legacy software of tomorrow. Also "Software Forestry" means the acronym is SWF, which I find hilarious. And personally, I really like trees.3 Like with trees, if we do our jobs right this stuff will still be there long after we've moved on.
It’s easy to get too precious about this, and let the metaphor run away with you; that’s why there were all those Black Belts and Ninjas running around a few years ago. I’m not going to start an organization to certify Software Rangers.4 But I think a mindset around care and tending, around seasons, around long-term stewardship, around thinking of software systems as ecosystems, is a much healthier relationship to the software industry we actually have than telling your team with all seriousness that we have to get better at blocking and tackling. We’re never going to win a game, because there’s no game to win. But we might grow a healthy forest of software, and encourage healthier foresters.
Software Forestry is a new weekly feature on Icecano. Join us on Fridays as we look at approaches to growing better software. Next Time: What kind of software forests are we growing?
-
- My grandmother was a "civilian computer" during the war; she computed tables describing when and how to release bombs from planes to hit a target. The bombs in those tables were larger than normal, needing new tables computed late in the war. She thought nothing of this at the time, but years later realized she had been working out tables for atomic bombs. Her work went unused; she became a minister.
- Gardening seems to pop up every couple of years; searching the web turns up quite a few abandoned swings at Software Gardening as a concept.
- I did briefly consider “Software Arborists”, but that’s a little too narrow.
- Although I assume the Dúnedain would make excellent programmers.
Un-Reviewed Code Is Tech Debt
Reading code is hard. It's one of those universal truths that, regardless of language or platform, writing code is always easier than reading it. Joel on Software has a now 24-year-old post on this topic which is as relevant now as it was at the turn of the century: Reading Code is Like Reading the Talmud.
Mostly this is due to lack of context: there's so much stuff around any piece of code that you really need to hold in your head before that one part makes any sense. All the really useful code reviews I've been a part of involve asking the original developer questions: "what's going on here", "why did you make this decision", and so on. These don't always result in changes, but they're necessary to make sure the context is shared amongst everyone.
One of the reasons I tend to be militant about software comments is to try and get that initial programmer context preserved with the code itself—to run with Joel's Talmud analogy, to see if we can include the first batch of commentary along with the source right at the start.
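For a flavor of what I mean, here's a tiny hypothetical sketch (everything in it, the function, the upstream service, the constraints, is invented): the code is trivial, but the comment carries the "why" that a reviewer, or a future maintainer, couldn't reconstruct from the code alone.

```python
# A made-up example; the names, services, and numbers are all invented.
def retry_delay(attempt: int) -> float:
    """Seconds to wait before retrying a failed upstream call."""
    # Why exponential backoff, and why cap it at 30 seconds?
    # - The upstream billing API rate-limits aggressively, so linear retries
    #   just dig the hole deeper.
    # - Anything past ~30s trips our load balancer's idle timeout, turning
    #   one slow request into a cascade of dropped connections.
    # If either of those constraints changes, revisit this cap.
    return min(30.0, 0.5 * (2 ** attempt))
```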
Which brings us to code reviews. I've been thinking about code reviews a lot lately, for various reasons. One of the things I keep thinking about is that I can't believe how hard they still are to do well, but they're hard for many of the same reasons that reading code is hard. And of course, that's one of the reasons they're so important: not only to "make sure the code is right", but also to spread that context around to the team.
I’ve taken to thinking of un-reviewed code as a kind of tech debt—context debt, if you will. And this is the worst kind of debt, in that it’ll build up in the background while you’re not paying attention, and then a couple people get promoted or leave, and you realize you have a whole-ass application that no one on the team has ever seen the insides of. This is kind of rephrasing the “bus factor” problem, but I like treating it as a class of debt because it gives us an existing framework to pay it down.
But that doesn't solve the basic problem that code review is hard to do, and most of our tools don't really help. I mean, one of the reasons XP went all-in on pair programming is that it was easier than figuring out how to make code easier to read and reason about.
And so given all that, I’ve also been stewing on how it’s very (not) funny to me that we keep finding new ways to replace “writing code” with “code review.”
One of them: on top of all the other reasons not to let the Plagiarism Machine condense you some code out of the æther, now you still have to review that code, but that original context not only isn't available, it doesn't even exist. So code reviews become even more important at the same time as they get impossibly harder. Sort of instant-deploy tech debt. It's the copy-paste from Stack Overflow, only amped way up. But, okay, that's the fad around this new toy burning off; hopefully people will knock that off at some point.
The thing I've really been thinking about is all that un-reviewed code we've been dragging around in the form of open source libraries. This musing, of course, brought to you by that huge near-miss last month (Did 1 guy just stop a huge cyberattack?), along with the various other issues going on over in NPM and PyPI, and then the follow-up discussion like: Bullying in Open Source Software Is a Massive Security Vulnerability
I mean, this whole thing should be a real wakeup call to the entire OSS world in a “hang on, what the hell are we doing” sort of way. Turns out that sure, with enough eyes all bugs are shallow, but you still have to have someone look. And the fact that it was a guy from Microsoft who found the bug? Because something was too slow? Delicious, but terrifying.
Everyone links to the xkcd about Dependencies with a sort of head-shake “that’s just how it is”. But sooner or later, that guy is going to leave, or need better insurance. You might not be paying the volunteers, but you can bet someone else would be willing to.
Like all of us, I wonder how many of these are out there in the wild? I’m glad I don’t run a Software Dev team that handles sensitive data currently, because at this point you have to assume any FOSS package has a >0% chance of hosting something you don’t want running on your servers.
And to bring it back around to the subject at hand, the real solution is “we need a way to audit and review open source packages”, but after a generation of externalizing that cost, no one even knows how to do that?
But what would I be doing if I was still in charge of something that handled PHI or other sensitive or valuable data? My initial reaction was I’d be having some serious conversations about “what would it take to remove all the open source. No, give me an estimate for all of it. All.”
(And to be clear, it’s not like commercial software is immune either, but that’s a different set of risk vectors and liability.)
I’d want a list of all the FOSS packages in the system, sorted into these buckets:
- Stuff we’re barely using, that we could probably replace in a day or two. (The CSV formatter library that we only use to write one file in one place.)
- Bigger things that we’re using more of, but could still get our arms around what a replacement looks like. (We pulled in Apache Commons collections because it was easy to use, but we’re using less than 10% of it.)
- Big, foundational stuff: Spring, Tomcat, Linux, language standard libraries. Stuff you aren’t going to rewrite.
That third category needs a light audit to make sure there’s an actual entity in charge of it with safety practices and the like. Probably a conversation with legal about liability and whatnot.
But for those first two buckets, I'd want to see an estimated cost to replace. And then I want to see a comparison of "how many hours of effort converted to salary dollars" vs "worst-case losses if our servers got p0ned". Because the hell of it is, those numbers probably make it a slam dunk to do the rewrite; a toy version of that math is sketched below.
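To be concrete about the shape of that comparison, here's a deliberately toy sketch. Every package name, hour estimate, and dollar figure is invented; the real numbers would come from your own dependency audit, your finance people, and your incident history.

```python
# Toy back-of-the-envelope comparison; all numbers are invented placeholders.
DEV_COST_PER_HOUR = 120             # fully loaded salary cost, in dollars
WORST_CASE_BREACH = 5_000_000       # fines, remediation, lost customers
ANNUAL_BREACH_ODDS = 0.02           # chance per year a dependency burns you

# Bucket-one and bucket-two packages, with rough hours to replace each in-house.
packages_to_replace = {
    "csv-formatter": 16,                 # barely used, one call site
    "commons-collections-subset": 120,   # used more, but still scoped
}

rewrite_cost = sum(packages_to_replace.values()) * DEV_COST_PER_HOUR
expected_annual_loss = WORST_CASE_BREACH * ANNUAL_BREACH_ODDS

print(f"One-time rewrite cost: ${rewrite_cost:,.0f}")
print(f"Expected annual loss:  ${expected_annual_loss:,.0f}")
if rewrite_cost < expected_annual_loss:
    print("On these numbers, the rewrite pays for itself inside a year.")
else:
    print("On these numbers, keep the dependency and audit it instead.")
```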
But look! I’m doing that same fallacy—it’s easier to write than review, so let’s just rewrite it. And this has been sitting in the drafts folder for a month now because… I don’t know!
The current situation seems untenable, and all the solutions seem impossible. But that review debt is still there, waiting.
Sometimes You Just Have to Ship
I've been in this software racket, depending on where you start counting, about 25 years now. I've been fortunate to work on a lot of different things in my career—embedded systems, custom hardware, shrinkwrap, web systems, software as a service, desktop, mobile, government contracts, government-adjacent contracts, startups, little companies, big companies, education, telecom, insurance, internal tools, external services, commercial, open-source, Microsoft-based, Apple-based, hosted on various unices, big iron, you name it. I think the only major "genres" of software I don't have road miles on are console game dev and anything requiring a security clearance. If you can name a major technology used to ship software in the 21st century, I've probably touched it.
I don’t bring this up to humblebrag—although it is a kick to occasionally step back and take in the view—I bring it up because I’ve shipped a lot of “version one” products, and a lot of different kinds of “version ones”. Every project is different, every company and team are different, but here’s one thing I do know: No one is ever happy with their first version of anything. But how you decide what to be unhappy about is everything.
Because, sometimes you just have to ship.
Let’s back up and talk about Venture Capital for a second.
Something a lot of people intellectually know, but don’t fully understand, is that the sentences “I raised some VC” and “I sold the company” are the same sentence. It’s really, really easy to trick yourself into believing that’s not true. Sure, you have a great relationship with your investors now, but if they need to, they will absolutely prove to you that they’re calling the shots.
The other important thing to understand about VC is that it’s gambling for a very specific kind of rich person. And, mostly, that’s a fact that doesn’t matter, except—what’s the worst outcome when you’re out gambling? Losing everything? No. Then you get to go home, yell “I lost my shirt!” everyone cheers, they buy you drinks.
No, the worst outcome is breaking even.
No one wants to break even when they go gambling, because what was the point of that? Just about everyone, if they’re in danger of ending the night with the same number of dollars they started with, will work hard to prevent that—bet it all on black, go all-in on a wacky hand, something. Losing everything is so much better than passing on a chance to hit it big.
VC is no different. If you take $5 million from investors, the absolute last thing they want is that $5 million back. They either want nothing, or $50 million. Because they want the one that hits big, and a company that breaks even just looks like one that didn't try hard enough. They've got that same $5 mil in ten places; they only need one to hit to make up for the other nine bottoming out.
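A toy version of that portfolio math, with invented numbers: ten identical bets, nine total losses, and one outsized hit.

```python
# Toy portfolio math; every figure here is made up for illustration.
stake = 5_000_000
outcomes = [0] * 9 + [20 * stake]    # nine wipeouts, one big hit
invested = stake * len(outcomes)     # $50M deployed across ten companies
returned = sum(outcomes)             # $100M back from the single winner

print(f"Invested ${invested:,}, returned ${returned:,} ({returned / invested:.1f}x)")
# One big hit doubles the fund despite a 90% failure rate. Ten companies that
# politely hand back their $5 million each would only ever break even.
```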
And we’ve not been totally positive about VC here at Icecano, so I want to pause for a moment and say this isn’t necessarily a bad thing. If you went to go get that same $5 million as a loan from a bank, they’d want you to pay that back, with interest, on a schedule, and they’d want you to prove that you could do it. And a lot of the time, you can’t! And that’s okay. There’s a whole lot of successful outfits that needed that additional flexibility to get off the ground. Nothing wrong with using some rich people’s money to pay some salaries, build something new.
This only starts being a problem if you forget this. And it’s easy to forget. In my experience, depending on your founder’s charisma, you have somewhere between five and eight years. The investors will spend years ignoring you, but eventually they’ll show up, and want to know if this is a bust or a hit. And there’s only one real way to find out.
Because, sometimes you just have to ship.
This sounds obvious when you say it out loud, but to build something, you have to imagine it first. People get very precious around words like “vision” or “design intent”, but at the end of the day, there was something you were trying to do. Some problem to solve. This is why we’re all here. We’re gonna do this.
But this is never what goes out the door.
There’s always cut features, things that don’t work quite right, bugs wearing tuxedoes, things “coming soon”, abandoned dead-ends. From the inside, from the perspective of the people who built the thing, it always looks like a shadow of what you wanted to build. “We’ll get it next time,” you tell each other, “Microsoft never gets it right until version 3.”
The dangerous thing is, it’s really, really easy to only see the thing you built through the lens of what you wanted to build.
The less toxic way this manifests is to get really depressed. “This sucks,” you say, “if only we’d had more time.”
The really toxic way, though, is to forget that your customers don't have the context you have. They didn't see the pitch deck. They weren't there for that whiteboard session where the lightbulbs all went on. They didn't see the prototype that wasn't ready to go just yet. They don't know what you're planning next. Critically—they didn't buy in to the vision, they're trying to decide if they're going to buy the thing you actually shipped. And you assume that even though this version isn't there yet, wherever "there" is, they'll buy it anyway because they know what's coming. Spoiler: they don't, and they won't.
The trick is to know all this ahead of time. Know that you won't ship everything, know that you have to pick the slice you actually can ship, given the time, or money, or other constraints.
The trick is to know the difference between things you know and things you hope. And you gotta flush those out as fast as you can, before the VCs start knocking. And the only people who can tell you are your customers, the actual customers, the ones who are deciding if they’re gonna hand over a credit card. All the interviews, and research, and prototypes, and pitch sessions, and investor demos let you hope. Real people with real money is how you know. As fast as you can, as often as you can.
The longer you wait, the more you refine, or "pivot", or do another round of ethnography, the more you're just finding new ways to hope, just wasting resources you could have used once you actually learned something.
Time's up. Pencils down. Show your work.
Because, sometimes you just have to ship.
Reviews are a gift.
People spending money, or not, is a signal, but it’s a noisy one. Amazon doesn’t have a box where they can tell you “why.” Reviews are people who are actually paid to think about what you did, but without the bias of having worked on it, or the bias of spending their own money. They’re not perfect, but they’re incredibly valuable.
They’re not always fun. I’ve had work I’ve done written up on the real big-boy tech review sites, and it’s slightly dissociating to read something written by someone you’ve never met about something you worked on complaining about a problem you couldn’t fix.
Here's the thing, though: they should never be a surprise. How much the reviews surprise you is a measure of how well you did keeping the bias, the vision, the hope, under control. The next time I ship a version one, I'm going to have the team write fake techblog reviews six months ahead of time, then see how we feel about them, and use that to fuel the last batch of duct tape.
What you don’t do is argue with them. You don’t talk about how disappointing it was, or how hard it was, or how the reviewers were wrong, how it wasn’t for them, that it’s immoral to write a bad review because think of the poor shareholders.
Instead, you do the actual hard work. Which you should have done already. Where you choose what to work on, what to cut. Where you put the effort into imagining how your customers are really going to react. What parts of the vision you have to leave behind to build the product you found, not the one you hoped for.
The best time to do that was a year ago. The second best time is now. So you get back to work, you stop tweeting, you read the reviews again, you look at how much money is left. You put a new plan together.
Because, sometimes you just have to ship.
“And Then My Reward Was More Crunch”
For reasons that are probably contextually obvious, I spent the weekend diving into Tim Cain's YouTube Channel. Tim Cain is still probably best known as "the guy who did the first Fallout," but he spent decades working on phenomenal games. He's semi-retired these days, and rather than write memoirs, he's got a "stories from the old days" YouTube channel, and it's fantastic.
Fallout is one of those bits of art that seems to accrete urban legends. One of the joys of his channel has been having one of the people who was really there say “let me tell you what really happened.”
One of the more infamous beats around Fallout was that Cain and the other leads of the first Fallout left partway through development of Fallout 2 and founded Troika Games. What happened there? Fallout was a hit, and from the outside it's always been baffling that Interplay just let the people who made it… walk out the door?
I’m late to this particular party, but a couple months ago Cain went on the record with what happened:
and a key postscript:
Listening To My Stories With Nuance
…And oh man, did that hit me where I live, because something very similar happened to me once.
Several lifetimes ago. I was the lead on one of those strange projects that happen in corporate America where it absolutely had to happen, but it wasn’t considered important enough to actually put people or resources on it. We had to completely retool a legacy system by a hard deadline or lose a pretty substantial revenue stream, but it wasn’t one of the big sexy projects, so my tiny team basically got told to figure it out and left alone for the better part of two years.
Theoretically the lack of "adult supervision" gave us a bunch of flexibility, but in practice it was a huge impediment every time we needed help or resources or infrastructure. It came down to the wire, but we pulled it off, mostly by sheer willpower. It was one of those miracles you can sometimes manage to pull off; we hit the date, stayed in budget, produced a higher-quality system with more features that was easier to maintain and build on. Not only that, but the transition from the old system to the new went off with barely a ripple, and we replaced a system that was constantly falling over with one that last I heard was still running on years of 100% uptime. The end was nearly a year-long sprint, barely getting it over the finish line. We were all exhausted, I was about ready to die.
And the reward was: nothing. No recognition, no bonus, no time off, the promotion that kept getting talked about evaporated. Even the corp-standard “keep inflation at bay” raise was not only lower than I expected but lower than I was told it was going to be; when I asked about that, the answer was “oh, someone wrote the wrong number down the first time, don’t worry about it.”
I’m, uh, gonna worry about it a little bit, if that’s all the same to you, actually.
Morale was low, is what I’m saying.
But the real "lemon juice in the papercut" moment was the next project. We needed to do something similar to the next legacy system over, and now armed with the results of the past two years, I went in to talk about how that was going to go. I didn't want to do that next one at all, and said so. I also thought maybe I had earned the right to move up to one of the projects that people did care about? But no, we really want you to run this one too. Okay, fine. It's nice to be wanted, I guess?
It was, roughly, four times as much work as the previous project, and it needed to get done in about the same amount of time. Keeping in mind we barely made it the first time, I said, okay, here's what we need to do to pull this off, here's the support I need, the people, here's my plan to land this thing. There's always more than one way to get something done: I either needed some more time, or more people, and I had some underperformers on the team I needed rotated out. And got told, no, you can't have any version of that. We have a hard deadline, you can't have any more people, you have to keep the dead weight. Just find a way to get four times as much work done with what you have in less time. Maybe just keep working crazy hours? All with a tone that said I couldn't possibly know what I was talking about.
And this is the part of Tim Cain’s story I really vibrated with. I had pulled off a miracle, and the only reward was more crunch. I remember sitting in my boss’s boss’s office, thinking to myself “why would I do this? Why would they even think I would say yes to this?”
Then, they had the unmitigated gall to be surprised when I took another job offer.
I wasn’t the only person that left. The punchline, and you can probably see this coming, is that it didn’t ship for years after that hard deadline and they had to throw way more people on it after all.
But, okay, other than general commiserating with an internet stranger about past jobs, why bring all this up? What’s the point?
Because this is exactly what I was talking about on Friday in Getting Less out of People. Because we didn’t get a whole lot of back story with Barry. What’s going on with that guy?
The focus was on getting Maria to be like Barry, but does Barry want to be like Barry? Does he feel like he's being taken advantage of? Is he expecting a reward and then a return to normal while you're focusing on getting Maria to spend less time on her novel and more time on unpaid overtime? What's he gonna do when he realizes that what he thinks is "crunch" is what you think is "higher performing"?
There's a tendency to think of productivity like a ratchet; more story points, more velocity, more whatever. Number go up! But people will always find an equilibrium. The key to real success is to figure out how to provide that equilibrium to your people, because if you don't, someone else will.
Getting Less out of People
Like a lot of us, I find myself subscribed to a whole lot of substack-style newsletter/blogs1 written by people I used to follow on twitter. One of these is The Beautiful Mess by John Cutler, which is about (mostly software) product management. I was doing a lot of this kind of work a couple of lifetimes back, but this is one of the resources that has stayed in my feeds, because even when I disagree it's at least usually thoughtful and interesting. For example, these two links I've had sitting in open tabs for a month (they're a short read, but I'll meet you on the other side of these links with a summary):
TBM 271: The Biggest Untapped Opportunity - by John Cutler
TBM 272: The Biggest Opportunity (Part 2) - by John Cutler
He makes a really interesting point that I completely disagree with. His thesis is that the biggest untapped opportunity for companies is people who are only doing good work, but give off the indications that they could be doing great work. "Skilled Pragmatists" he calls them—people who do good work, but aren't motivated to go "above and beyond", and are probably bored but are getting fulfillment outside of work. Not risk takers, not big on conflict, probably don't say a lot in meetings. And most importantly, people who have decided not to step it all the way up; they've got agency, and they've deployed it.
In a truly world-class, olympic level of accidentally revealing gender bias, he posits two hypothetical workers, “Maria” who is the prototypical “Skilled Pragmatist” and by comparison, “Barry”, who takes a lot of big risks but gets a lot of big things done.
He then kicks around some reasons why Maria might do what she does, and proposes some frameworks for figuring out how to, bluntly, get more out of those people, how to “achieve more together”.
Not to use too much tech industry jargon here, but my response to this is:
HAHAHAHA, Fuck You, man.
Because let's back way the hell up. The hypothetical situation here is that things are going well. Things are getting done on time, the project isn't in trouble, the company isn't in trouble, there aren't performance problems of any kind. There's no promotion in the wings, no career goals that aren't being achieved. Just a well-performing, non-problematic employee who gets her job done and goes home on time. And for some reason, this is a problem?
Because, I’ll tell you what, I’ll take a team of Marias over a team of Barrys any day.
Barry is going to burn out. Or he’s going to get mad he isn’t already in charge of the place and quit. Or get into one too many fights with a VP with a lot of political juice. Or just, you know, meet someone outside of work. Have a kid. Adopt a pet. That’s a fragile situation.
Maria is dependable, reliable. She’s getting the job done! She’s not going to burn out, or get pissed and leave because the corporate strategy changed and suddenly the thing she’s getting her entire sense of self-worth from has been cancelled. She’s not working late? She’s taking her kids to sports, or spending time with her wife, or writing a novel. She’s balanced.
The issue is not how do we get Maria to act more like Barry, it’s the other way around—how do we get Barry to find some balance and act more like Maria?
I’ve been in the software development racket for a long time now, and I’ve had a lot of conversations with people I was either directly managing, implicitly managing, or mentoring, and I can tell you I’ve had a lot more conversations that boiled down to “you’re working too hard” than I’ve had “you need to step it up.”
Maybe the single most toxic trait of tech industry culture is to treat anything less than “over-performing” as “under-performing”. There are underperformers out there, and I’ve met a few. But I’ve met a whole lot more overperformers who are all headed for a cliff at full speed. In my experience, the real gems are the solid performers with a good sense of balance. They’ve got hobbies, families, whatever.
Overperformers are the sort of people who volunteer to monitor the application logs over the weekend to make sure nothing goes wrong. Balanced performers are the ones that build a system where you don’t have to. They’re doing something with the kids this weekend, so they engineer up something that doesn’t need that much care and feeding.
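As a hedged, hypothetical sketch of what "engineer up something that doesn't need that much care and feeding" might look like: a tiny health check meant to run on a schedule, which stays silent when things are fine and only bothers a human on an actual failure. The endpoint and webhook URLs are invented.

```python
# Hypothetical sketch: a scheduled health check so nobody babysits logs all weekend.
import sys
import urllib.request

HEALTH_URL = "https://example.internal/healthz"    # assumed internal endpoint
ALERT_WEBHOOK = "https://example.internal/alert"   # assumed pager/chat hook


def check() -> str | None:
    """Return None if healthy, otherwise a short description of the problem."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=10) as resp:
            return None if resp.status == 200 else f"unexpected status {resp.status}"
    except Exception as exc:
        return str(exc)


def main() -> int:
    problem = check()
    if problem is None:
        return 0  # healthy: no page, no email, no interrupted weekend
    # Only an actual failure interrupts a human.
    req = urllib.request.Request(ALERT_WEBHOOK, data=problem.encode(), method="POST")
    urllib.request.urlopen(req, timeout=10)
    return 1


if __name__ == "__main__":
    sys.exit(main())
```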
I suspect a lot of this is based on the financialization of everything—line go up is good! More story points, more features, ship more, more quickly. It’s the root mindset that settled on every two weeks being named a “sprint” instead of an “increment.” Must go faster! Faster, programmer! Kill, Kill!
And as always, that works for a while. It doesn’t last, though. Sure, we could ship that in June instead of September. But you’re also going to have to hire a whole new team afterwards, because everyone is going to have quit after what we had to do to get it out in June. No one ever thinks about the opportunity costs of burning out the team and needing a new one when they’re trying to shave a few weeks off the schedule.
Is it actually going to matter if we ship in August?
Because, most of the time, most places, it’s not a sprint, and never will be. It’s a marathon, a long drive, a garden. Imagine what we could be building if everyone was still here five, ten, fifteen years from now! If we didn’t burn everyone out trying to “achieve more”.
Because, here's the secret. Those overperformers? They're going to get tired. Maybe not today, maybe not tomorrow. But eventually. And sooner than you think. And they're going to realize that no one at the end of their life looks back and thinks, "thank goodness we landed those extra story points!" Those underperformers? They actually won't get better. Not for you. It's rude, but true.
The only people you're still going to have are the balanced people, the "skilled pragmatists", the Marias. They've figured out how to operate for the long haul, and they want to work in a place that values going home on time. Let's figure out how to get more people to act like them.
Let’s figure out how to get less out of people.
-
I feel like we need a better name for these, since substack went and turned itself into the “nazi bar”. It’s funny to me that as the social medias started imploding, “email newsletters” were the new hotness everyone seemed to land on. But they’re just blogs? Blogs that email themselves to you when they publish? I mean, substack and its ilk also have RSS feeds, I read all the ones I’m subscribed to in NetNewsWire, not my email client.
Of course, the big innovation was “blog with out-of-the-box support for subscriptions and a configurable paywall” which is nothing to sneeze at, but I don’t get why email was the thing everyone swarmed around?
Did google reader really crater so hard that we’ve settled on reading our favorite websites as emails? What?