Recently I’ve been wondering whether there is value in legacy code and, if so, what that value is. The question came from inheriting a new codebase, coupled with more free time at home due to social distancing because of COVID-19. Below are my findings and conclusions.
What is ‘legacy code’
It’s interesting that in an industry like software engineering, where technology and tooling improve on a monthly basis, one would assume that people develop with the latest and greatest tools. However, if you speak with professional programmers, you get the impression that everyone works with legacy code daily and that all those modern tools are a dream. For example, many people like Rust as a language, but I have yet to meet a person who codes in Rust for a living.
Let’s get out of the way what legacy code actually is, as there is no strict definition. For some time I wrongly thought that it came down to the language the code is written in (COBOL, Fortran 77, Python 2.7, an abandoned frontend framework) or that it meant a large accumulation of untested tech debt. But I’ve come to realize that you can have legacy code written in non-abandoned languages. For example, C++ is far from obsolete or unmaintained, but code written against the C++98 standard in the early 2000s is definitely legacy now. My conclusion is that what classifies something as legacy code is the paradigm it’s written in. You can still run old code in containers, but that doesn’t make it better quality.
How we end up with it
The important part is that the paradigm that defines legacy code also changes over time. One example is testing. It’s not a new concept, but chances are high that you’ve worked with untested code. Twenty years ago, distributed systems weren’t mainstream, so you would likely have had more unit tests than integration tests. Things like fault tolerance were not a top priority, as most software executed on a single machine from start to end. Another example is technical documentation. When you don’t keep the code’s paradigm up to date, whether by rewriting in a language better suited to the problem or by breaking a monolith into microservices, the code decays. People don’t want to work on it, and you end up in a vicious cycle in which the quality of the codebase worsens ever faster. We always prefer our own code, even if it is worse, because it’s written with our own understanding and our vision of code structure. That is also why we prefer starting a new project over maintaining an existing one.
Dealing with complexity
Legacy code usually falls into one of two buckets: code that is hard to understand because of years of piled-up if statements, or something straightforward that needs a refresh. I’ve always believed the complicated kind makes you a stronger developer. It forces you to think and understand what’s happening, and it incentivizes you to find ways to improve it for the next maintainer. You need to grasp what the limitations were at the time it was written. If you want to become a better software engineer, read experienced people’s code. Couple the complexity with a lack of any tests, and this is where a strong developer shines: you have to be twice as careful, because you want to make things better without breaking anything for existing users. The value in untangling a critical system that works, especially a heavily distributed one, is enormous. The value in doing the same for a system that doesn’t work is even greater, as it’s significantly harder.
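One common way to be careful with untested code is to pin down its current behaviour before touching it, often called a characterization test. Here is a minimal sketch of the idea. The function `legacy_shipping_cost`, its rules and the sample inputs are all hypothetical stand-ins I made up for illustration, not code from any real system.

```python
# Hypothetical stand-in for a tangled legacy function with piled-up ifs.
def legacy_shipping_cost(weight_kg, country, is_member):
    cost = 5.0
    if weight_kg > 10:
        cost += 2.5
        if country != "US":
            cost += 7.0
    elif country != "US":
        cost += 4.0
    if is_member:
        if cost > 10:
            cost -= 3.0
    return cost


def record_behaviour(func, cases):
    """Run func on every input tuple and record what it returns today."""
    return {case: func(*case) for case in cases}


# Hypothetical sample inputs; in practice you would pick ones that
# reach as many branches of the old code as you can.
CASES = [
    (1, "US", False),
    (1, "DE", False),
    (12, "US", True),
    (12, "DE", True),
    (12, "DE", False),
]

# The "golden" results are whatever the legacy code does right now,
# whether or not that behaviour was ever intended.
GOLDEN = record_behaviour(legacy_shipping_cost, CASES)


def check_against_golden(func):
    """Return the cases where func disagrees with the recorded behaviour."""
    mismatches = {}
    for case, expected in GOLDEN.items():
        actual = func(*case)
        if actual != expected:
            mismatches[case] = (expected, actual)
    return mismatches


# A candidate refactoring that flattens the nesting.
def refactored_shipping_cost(weight_kg, country, is_member):
    heavy = weight_kg > 10
    abroad = country != "US"
    cost = 5.0
    if heavy:
        cost += 2.5
    if abroad:
        cost += 7.0 if heavy else 4.0
    if is_member and cost > 10:
        cost -= 3.0
    return cost


# An empty dict means the refactoring preserved every recorded result.
print(check_against_golden(refactored_shipping_cost))
```

The recorded results say nothing about whether the old behaviour was right. They only tell you when you have changed it, which is usually what existing users care about most.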
The second bucket is the code that just needs a refresh. It still offers value, just of a different type. Since we assume it’s a straightforward piece, it gives you the freedom to learn and to experiment with newer tools that solve the same problem, or with newer frameworks that improve readability alone, not necessarily performance. Think along the lines of migrating Python 2.7 to Python 3, or using modern C++ syntax in place of the existing, more verbose one.
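To make the Python 2.7 to Python 3 case concrete, here is a small sketch of the kind of changes such a refresh involves. The `summarize` function is a hypothetical example of mine; the Python 2 idioms appear only in comments next to their Python 3 replacements.

```python
def summarize(values):
    """Count occurrences and compute averages, written for Python 3."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1

    total = sum(values)

    # Python 2: `total / len(values)` on two ints did integer division.
    # Python 3: `/` is always true division; use `//` when you really
    # want the old truncating behaviour.
    average = total / len(values)
    whole_average = total // len(values)

    # Python 2: `for key, count in counts.iteritems()`.
    # Python 3: `iteritems()` is gone; `items()` returns a view.
    # Python 2: "%s: %d" % (key, count); f-strings need Python 3.6+.
    lines = [f"{key}: {count}" for key, count in sorted(counts.items())]

    return average, whole_average, lines


average, whole_average, lines = summarize([1, 2, 2])
# Python 2: `print "\n".join(lines)`; in Python 3, print is a function.
print("\n".join(lines))
```

The division change is the one I would watch most closely, because the old code keeps running under Python 3 and quietly returns different numbers.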
Refactor, don’t rewrite
I’m not saying never to rewrite. Sometimes a rewrite is simply faster and better in all aspects than refactoring, for example when converting a Perl script to a Python one. Rewriting is usually the basic instinct and the more exciting choice, but it’s the wrong one in the majority of cases. People tend to underestimate a rewrite, as it comes with other baggage such as new deployment specifics, testing and documentation. Delete code generously. I’ve noticed it’s a practice that almost all developers enjoy, as it usually signifies a new, cleaner way of doing things. The cost isn’t high: you have great version control systems, along with feature flags, in case you get it wrong. The less code there is to read, the less logic you need to keep in your head and the faster it is to understand and troubleshoot, for you and the next person.
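As a rough illustration of how a feature flag keeps the cost of getting it wrong low, here is a minimal sketch. It assumes the flag is read from an environment variable; the flag name `USE_NEW_SLUG` and both slug functions are hypothetical, invented for this example. The old path stays in place until the new one has proven itself, and only then gets deleted.

```python
import os


def flag_enabled(name, environ=None):
    """Treat "1", "true" or "yes" (any case) as an enabled flag."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").lower() in {"1", "true", "yes"}


def legacy_slug(title):
    # Hypothetical old implementation, kept until the flag is retired.
    result = ""
    for ch in title.strip().lower():
        if ch.isalnum():
            result += ch
        elif ch == " ":
            result += "-"
    return result


def new_slug(title):
    # Hypothetical replacement: shorter, and it collapses repeated spaces.
    words = title.lower().split()
    return "-".join("".join(c for c in word if c.isalnum()) for word in words)


def slugify(title, environ=None):
    """Route to the new code only when the flag is switched on."""
    if flag_enabled("USE_NEW_SLUG", environ):
        return new_slug(title)
    return legacy_slug(title)


print(slugify("Hello, World!", {}))                     # old path
print(slugify("Hello, World!", {"USE_NEW_SLUG": "1"}))  # new path
```

If the new path misbehaves, switching the flag off restores the old behaviour without a revert or a redeploy. Once the flag has been on long enough, `legacy_slug` and the flag check can be deleted, and version control still has them if you ever need to look back.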
The cycle never ends
Software evolves constantly. It is unlike building a bridge, where construction is the more resource-intensive part and maintenance is a smaller ongoing effort. Building software is the exact opposite: the initial creation effort is much smaller than the maintenance effort. You may write 100 lines of code, but they will be read by tens of people, or thousands if we talk open source. That’s why investing in ongoing improvement of code readability is always worth it, as the code will continue to be reread. Good technical documentation is hard to write and maintain. Automate as much of it as possible, and never give up on it even if it feels bothersome. The cost of missing documentation is greater than the time spent documenting.
Today’s code is tomorrow’s legacy
If there’s one thing I’d like you to take away, it’s that having your code become legacy is a form of flattery. It has survived generations of maintainers over the years and is still working today. How many people are confident that the code they are writing today will work in 5-10 years’ time without anyone touching it? So the next time you are faced with legacy code, accept the challenge and make the most of it.