Software Forestry 0x05: Cutting Trails
The lived reality of most Software Foresters is that we spend all our time with large systems we didn’t see the start of, and won’t see the end of. There are a lot of takes out there about what makes a system “Legacy” and not “just old”, but one of the key things is that Legacy Systems are code without continuity of philosophy.
Because almost certainly, the person who designed and built it in the first place isn’t still here. The person that person trained probably isn’t still here. Given the average tenure in tech roles, it’s possible the staff has rolled over half a dozen or more times.
Let me tell you a personal example. A few lifetimes ago, I worked on one of these two-decade-old Software Forests. Big web monolith, written in what was essentially a homebrew MVC-esque framework which itself was written on top of a long-deprecated 3rd party UI framework. The file layout was just weird. There clearly was a logic to the organization, but it was gone, like tears in the rain.
Early on, I had a task where I needed to add an option to a report generator. From the user perspective, I needed to add an option to a combobox on a web form, and then when the user clicked the Generate button, read that new option and punch the file out differently.
I couldn’t find the code! I finally asked my boss, “hey, is there any way to tell which file has which UI pages?” The response was, “no, you just gotta search.”
As they say, Greppability Is an Underrated Code Metric.
(Actually, I’ve worked on two big systems now where I had essentially this exact conversation. The other one was the one where one of the other engineers described it as having been built by “someone who knew everything about how JSP tags worked except when to use them.”)
So you search for the distinctive text in the button, or the combo box, or something on the page. You find the UI. Then you start tracing in. Following the path of execution, dropping into undocumented methods with unhelpful names, bouncing into weird classes with no clear design, strange boundaries, one minute a function with a thousand lines, the next minute an inheritance hierarchy 10 levels of abstraction deep to call one line.
At this point you start itching. “I could rewrite all of this,” you think. “I could get this stood up in a weekend with Spring Boot/React/Ruby on Rails/Elixir/AWS Lambdas/Cool Thing I Used Last”. You start gazing meaningfully at the copy of the Refactoring book on the shelf. But you gotta choke down that urge to rebuild everything. You have a bug you have to fix or a feature to deploy. But it’s not actually going to get better if you keep digging the hole deeper. You gotta stop digging, and leave things better for the next person.
You need to Cut a Trail.
1. Leave Trail Markers.
First thing is, you have to figure out what it does now before you change anything. In a sane, kind world, there would be documentation, diagrams, clear training. And that does happen, but very, very rarely. If you’re very lucky, there’s someone who can explain it to you. Otherwise, you have to do some Forensic Architecture.
Talk to people. Add some logging. Keep clicking and watching what it does. Step through it in a debugger, if you can and if that helps, although I’ve personally found that just getting a debugger working on a live system is oftentimes more work than it’s worth for the information you get out of it. But most of all, read. Read the code closely, load as much of that system’s state and behavior into your mind as you can. Read it like you’re in High School English and trying to pull the symbolism out of The Grapes of Wrath or The Great Gatsby. That weird function call is the light at the end of the pier, those abstractions are the eyes on the billboard. What do they mean? Why are they here? How does any of this work?
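For the “add some logging” part, here’s a minimal sketch of one way to do it, assuming you can get at the functions on the call path. Every name in it (`traced`, `build_report`, the `trail` logger) is made up for illustration; nothing here comes from the system described above.

```python
import functools
import logging

# Hypothetical logger name; use whatever your codebase already has.
logger = logging.getLogger("trail")


def traced(func):
    """Log each call to func, with arguments and result, without changing behavior."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the traceback, then let the error continue as before.
            logger.exception("raised in %s", func.__name__)
            raise
        logger.info("exit %s -> %r", func.__name__, result)
        return result

    return wrapper


@traced
def build_report(report_type, options=None):
    # Hypothetical stand-in for a function somewhere on the path you are tracing.
    return f"{report_type}:{sorted((options or {}).items())}"
```

Hang it on a handful of functions, click the button, read the log. It’s meant to be temporary scaffolding, so pull it back out (or turn the log level down) once you have your answer.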
You’ll get there, that’s what we do. There’s a point where you’ll finally understand enough to make the change you want to make. The trick is to stop at this point, and write everything down. There’s a “haha only serious” joke that code comments are for yourself in six months, but—no. Your audience here is you, a week ago. Write down everything you needed to know when you started this. Every language has a different way to do embedded documentation or comments, but they all have a way to do it. Document every method or function that your explored call path went through. Write down the thing you didn’t understand when you started, the strange overloaded behavior of that one parameter, what that verb really means in the function name, as much as possible, why it does what it does.
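As a rough illustration of what that can look like, here’s a sketch in Python, where the embedded documentation is a docstring. The function, its overloaded `mode` parameter, and the sample rows are all invented for the example; the point is the kind of thing worth writing down, not this particular code.

```python
# Hypothetical sample data standing in for whatever the real system reads.
ROWS = [
    {"account": 7, "amount": 10, "archived": False},
    {"account": 7, "amount": 5, "archived": True},
    {"account": 9, "amount": 3, "archived": False},
]


def generate_report(account_id, mode=0, rows=ROWS):
    """Assemble the report rows for one account.

    Trail notes, written after tracing the call path:

    - "generate" means "build rows in memory". Nothing is written to disk here.
    - mode is overloaded:
        0         -> a single summary row with the total amount
        positive  -> every detail row for the account
        negative  -> detail rows, but archived rows are skipped
    - Summary totals include archived rows. That surprised me too.
    """
    mine = [row for row in rows if row["account"] == account_id]
    if mode < 0:
        mine = [row for row in mine if not row["archived"]]
    if mode == 0:
        return [{"account": account_id, "total": sum(row["amount"] for row in mine)}]
    return mine
```

None of those notes are clever. They’re just the three things that would have saved an afternoon.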
Take an hour and draw the diagram you wish you’d had. Write down your notes. And then leave all that somewhere that other people can find it. If you’re using a language where the embedded documentation system can pull in external files, check that stuff right on into the codebase. Most places have an internal wiki. Make a page for your team if there isn’t one. Under that, make a page for the app if it doesn’t have one. Then put all that you’ve learned there.
Something else to make sure to document early on: terminology. Everyone uses the same words to mean totally different things. My personal favorite example: no two companies on earth use the word “flywheel” the same way. It doesn’t matter what it was supposed to mean! Ask. Then write it down. The weird noun you tripped over at the start of this? Put the internal definition somewhere you would have found it.
People frequently object that they don’t have the time to do this, to which I say: you’ve already done the hard part! 90% of the time for this task was figuring it out! Writing it down will take a fraction of the time you already had to spend, I promise. And when you’re back here in a year, the time you save in being able to reload all that mental state is going to more than pay for that afternoon you spent here.
2. Write Tests.
Tests are really underrated as a documentation and exploration technique. I mean, using them to actually test is good too! But for our purposes we’re not talking about formal TDD or Red-Green-Refactor-style approaches. That weird function? Slap some mocks and stubs together and see what it does. Throw some weird data at it. Your goal isn’t to prove it correct, but to act like one of those Edwardian Scientists trying to figure out how air works.
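Here’s a small sketch of that kind of exploratory test, sometimes called a characterization test. `normalize_code` is a made-up stand-in for the weird function; in real life you’d import the actual one. The expected values aren’t a spec. They’re simply what the function was observed to return, written down so the next person doesn’t have to rediscover them.

```python
def normalize_code(value):
    # Hypothetical stand-in for an undocumented legacy function.
    if value is None:
        return ""
    text = str(value).strip().upper()
    return text[:8].replace(" ", "_")


def test_pins_down_observed_behavior():
    # Each pair is (input thrown at the function, output observed).
    cases = [
        ("abc", "ABC"),                    # uppercases
        ("  a b  ", "A_B"),                # trims, then turns inner spaces into underscores
        (None, ""),                        # None quietly becomes an empty string
        (42, "42"),                        # non-strings are stringified
        ("long value here", "LONG_VAL"),   # silently truncated to 8 characters
    ]
    for given, observed in cases:
        assert normalize_code(given) == observed, (given, observed)
```

A function named `test_...` with plain asserts like this can be picked up by a runner such as pytest, or you can just call it by hand.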
Another Forest I inherited once, which was a large app that customers paid real money to use, had a test suite of 1 test—which failed. But that was great, because there was already a place to write and run tests.
Tests are a net benefit. They don’t all have to be thorough, or fall into strict unit/integration/acceptance boundaries. Sometimes, it’s okay to put a couple of little weird ones in there that exist to help explain what some undocumented code does.
If you’re unlucky enough to run into a Forest with no test runner, trust me, take the time to bolt one on. It doesn’t have to be perfect! But you’ll make that time back faster than you’d believe.
When you get done, in addition to whatever “normal” Unit or Integration tests your process requires or requests, write a really small test that demonstrates what you had to do. Link that back to the notes you wrote, or the documentation you checked in.
3. A Little Cleanup, some Limited Refactoring
Once you have it figured out, and have a test or two, there are usually two strong responses: either “I need to replace all of this right now”, or “This is such a mess it’ll never get better.”
So, the good news is that both of those are wrong! It can get better, and you really probably shouldn’t rework everything today.
What you should do is a little cleanup. Make something better. Fix those parameter names, rename that strangely named function. Heck, just fix the tenses of the local variables. Do a little refactoring on that gross class, split up some responsibilities. It’s always okay to slide in another shim or interface layer to add some separation between tangled things.
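To make the shim idea concrete, here’s one possible shape for it, sketched in Python. All of the names (`legacy_gen_rpt`, `ReportGenerator`, `LegacyReportShim`, `export`) are invented for the example, and this is just one way to draw the line, not the way.

```python
from typing import Protocol


def legacy_gen_rpt(t, f, x):
    # Hypothetical legacy function with unhelpful parameter names. Left untouched.
    return f"{t}|{f}|{'1' if x else '0'}"


class ReportGenerator(Protocol):
    """The interface new code is written against."""

    def generate(self, report_type: str, file_format: str, include_archived: bool) -> str:
        ...


class LegacyReportShim:
    """Thin adapter: readable names on the outside, the old call on the inside."""

    def generate(self, report_type: str, file_format: str, include_archived: bool = False) -> str:
        return legacy_gen_rpt(report_type, file_format, include_archived)


def export(generator: ReportGenerator, report_type: str) -> str:
    # New code only knows about the interface, not about legacy_gen_rpt.
    return generator.generate(report_type, "csv", False)
```

The old function keeps working exactly as before, so the diff stays small. When the day comes to replace it, you swap in a different implementation of the interface and the callers don’t need to change.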
(Don’t leave a huge mess in the VC diff, though, please.)
Leave the trail a little cleaner than when you found it. Doesn’t have to be a lot, we don’t need to re-landscape the whole forest today.
4. Write the New Stuff Right (as possible)
A lot of the time, you know at least one thing the original implementors didn’t: you know how the next decade went. It’s very easy to come in much later, and realize how things should have been done in the first place, because the system has a lot more years on it now than it used to. So, as much as you can, build the new stuff the right way. Drop that shim layer in, encapsulate the new stuff, lay it out right. Leave yourself a trail to follow when you come back and refactor the rest of it into shape.
But the flip side of that is:
5. Don’t be a Jerk About It
Everyone has worked on a codebase where there’s “that” module, or library, or area, where “that guy” had a whole new idea about how the system should be architected, and it’s totally out of place with everything else. A grove of palm trees in the middle of a redwood forest. Don’t be that guy.
I worked on an e-commerce system once where the Java package name was something like com.company.services.store, and then right next to it was com.company.services.store2. The #2 package was one former employee’s personal project to refactor the whole system; they had left years before with it half done, but of course it was all in production, and it was a crapshoot which version other parts of the system called into. Don’t do that.
After you’re gone, when someone looks at the version control change log for this part of the system, you want them to see your name and think “oh, this one had the right idea.”
Software Forestry is a group project, for the long term. Most of the time, “consistency and familiarity” are more valuable than some kind of quixotic quest for the ideal engineering. It’s okay, we’ll get there. Keep leaving it better than you found it. It’ll be worth it.
You’ll be amazed what your overgrown codebase looks like after a couple months of doing this. That tangled overgrowth starts to look positively tidy.
But sometimes, just cutting trails doesn’t get you there. Next Time: The Controlled Burn.