How You Can Successfully Ship New Code in a Legacy Codebase

First published at Wednesday, 19 April 2017

Kore Nordmann
Benjamin Eberlei

This blog post has first been published in the Qafoo blog and is duplicated here since I wrote it or participated in writing it.

Warning: This blog post is more then 7 years old – read and use with care.

How You Can Successfully Ship New Code in a Legacy Codebase

The greek philosopher Heraclitus already knew that "change is the only constant" and as software developers we know this to be true for much of software development and business requirements.

Usually the problems software needs to solve get more complex over time. As the software itself needs to model this increased complexity it is often necessary to replace entire subsystems with more efficient or flexible solutions. Instead of starting from scratch whenever this happens (often!), a better solution is to refactor the existing code and therefore reducing the risk of losing existing business rules and knowledge.

A good strategy for this kind of change is called "Branch By Abstraction". While the term is certainly clumsy and overloaded, the idea itself is genious.

Instead of introducing a long running branch in your version control system (VCS) where you spend days and months of refactoring, you instead introduce an abstraction in your code-base and implement the branching part by selecting different implementations of this abstraction at runtime.

Explaining Branch By Abstraction without an example is not a good idea, so lets take a look at three different examples to get an understanding how to make use of branch by abstraction:

Example 1: Replacing the Backend in a CMS

Up to eZPublish version 4, the popular CMS had a very relational-database centric API and model where all the abstractions made it quite clear that the underlying database is relational. In addition the content repository was leaky about another relational abstraction, the fact that one attribute is stored in one row because of the use of a highly sophisticated Entity-Attribute-Value model.

Those assumptions in the API and the inability to implement different data models made it very hard to scale eZPublish 4 beyond a certain number of content objects. With the rise of Solr and ElasticSearch customers wanted to use the powerful search engines, but the API limitations made this nearly impossible.

With eZPublish 5 a new API was introduced with a fully object-oriented interface and a new content model abstraction.

The idea was to allow developers to work towards switching the storage model step by step. You could replace usages of the old API with the new API which would already allow to benefit of NoSQL search based implementations, with a fallthrough to the legacy database schema. After a full switch to the new API it would also be possible to replace the legacy database schema entirely.

Critical in using branch by abstraction in such a scenario is finding the right API that is not a leaky abstraction for the abstracted storage engine anymore.

<?php

$query = new Query([
    'filter' => new Criterion\LogicalAnd([
        new Criterion\ContentTypeIdentifier("article");
        new Criterion\Field("status", Operator::EQ, "xyz"),
        new Criterion\FullText("Hello World");
        new Criterion\Visibility(Criterion\Visibility::VISIBLE),
    ])
]);

$searchResult = $searchService->findContent($query);

You can see how this API exposes query functionality generically, but makes sure to add semantic meaning such as the FullText criterion. This allowed eZ Systems to implement the API both with the old legacy database schema as a first implementation, but also experiment with Solr and other search technologies to achieve much better performance and scalability.

Example 2: Rewriting a submodule without changing public API

Very early in my career I was responsible for maintaining a fairly large newsletter system that had pretty bad APIs and was a large mess of spaghetti code.

One center piece of this system was the parser that took free form HTML, some configuration flags and text input and generated an HTML email that included tracking data and links, inline CSS styles, images uploaded to a CDN and mangled the HTML such that Outlook and the likes rendered it correctly.

The code for this large system was hidden in a single god-function inside a class:

class Mailing
{
    public function generateMail()
    {
        // 2000-3000 lines of PHP parsing goodness!

        $this->save(array('generated_mail' => $parsedHtmlBody));
    }
}

The code was hard to test and was a frequent source of small and nasty bugs. When we onboarded a second customer onto this system and he needed different configuration inputs everything fell apart.

We built a new mail parser from scratch with a nice new API. It was entirely stateless and accepted a large object of inputs and produced a large object of outputs that could then be put into the existing database as required. The code was feature flagged to our new customer first:

class Mailing
{
    public function generateMail()
    {
        if (Config::getTenant() === 'new_customer') {
            $parser = new MailParser(/* some dependencies */);
            $parsedMail = $parser->parse(
                new MailDraft(array(/* tons of inputs */))
            );

            $this->save(array('generated_mail' => $parsedMail->body));

            return;
        }

        // old 2000-3000 lines of PHP goodness!
    }
}

The combination of highly unit tested code and some weeks of production experience with a subset of customers finally gave us confidence to get rid of the old code entirely and we ended up with a shiny nice API within this legacy code base, but entirely decoupled from it.

Example 3: Github reimplements Merge button

In December 2015, Github blogged about their use of branch by abstraction to replace how the merge button works across the whole site. This critical feature of Github needed extensive testing so they used a library called Scientist to devise an experiment.

They refactored the old code into a dedicated method and then wrote new code with the same method signature. The scientist library then allows them to run both new and old code after each other. The library checks if the response of both methods is the same and logs an error if its not the case.

This strategy is a very powerful use-case of branch by abstraction.

The Process

The process of implementing Branch By Abstraction usually follows a certain set of steps while each step already provides you with a sensible outcome. The basic we have to work with looks something like this:

We have several "Clients" (classes, functions) which use and probably implement an implicitly defined concept like in one of the examples above. This might be inline code (Example 2) or already code in other classes which is called using APIs which do not describe the actual domain well (Example 1). Usually it is a combination of both.

Step 1: Refactoring The First Client

We start by picking one of those "Clients" (not the most complex one) and see how this Client uses the implicitly defined concept. We create a new API for this. This is very similar to Extract Methods or Extract Classes refactorings. While moving the old code we also create an abstraction (interface, set of interfaces).

Do not over-analyze in this step. Just extract the Facade which is required by this one Client and do not try to already cover all other Clients concepts. We will get there. It is best to just move the old code behind a Facade (implementing the abstraction) to make sure the system behaves like before and we do not break something. Functional Tests are really useful when doing such refactorings.

Already after this step the code in the first Client will be much more readable. You already did something beneficial to the software.

Step 2: Iterate Across Clients

We continue to do this with every client. You will have to adapt the Facade and the abstraction during this process. There will always be edge cases in in some clients you will not have thought of initially. But this is also the beauty behind this approach. We slowly discover and understand the implicit concept / our domain.

Step 3: Finish The Clients

We do this until all clients are refactored. The clients are done now and so is our abstraction. We now understood the concept and have modelled it with a new set of interfaces. This is great!

But there still is this Facade which contains all the old ugly code. If the code is working well enough and we do not have to change it often it is valid to just keep it there and ignore it. It will not be maintainable, but if there are no requirements to change, adapt or extend it – then this is fine. Feel free to just stop here.

Step 4: Starting a new Implementation

If we decided that we need a new implementation because the old one is just unmaintainable we can start with this now. We have an abstraction which we "just" need to implement. Please do this test-driven. With tests we can be pretty sure that the new implementation will succeed and fulfil all the requirements of our abstraction. Since the Facade, the Verifier and the new implementation all fulfill the same interface we should be able to replace the implementation used by the Clients using our Dependency Injection Container or with some Factory.

But we only really know that the new implementation is a success if we are testing it in production. And with the given abstraction we can implement a Verifier which ensures exactly this.

The Verifier knows the legacy Facade and the new code – it also implements the abstraction. The Verifier now always calls the Facade wrapping the old code and the new implementation and compares their output. It can optionally even compare additional metrics like speed and memory usage. By using this in production for some time and logging all differences we can be pretty sure that the new implementation will be ready for production.

In this step you might also find bugs in the old code, just like Github did in the example linked before.

Step 5: Delete the Old Code

Now comes the most beautiful step: Deleting the old code.

Once we are sure everything works there is no need to keep the Verifier nor the Facade wrapping the old code. You can throw everything away and now only maintain and adapt the new clean code.

Conclusion

Branch by abstraction is not an easy skill to master and the implementation is always extremely specific to the use-case. But the benefits of avoiding long-running branches and complex merges, being able to test old vs new code and gradually rolling out the new functionality to only a subset of users are so large, that you should definately learn about it. To be honest, for me it is one reason why refactoring in a large legacy code can be so much fun!

Subscribe to updates

There are multiple ways to stay updated with new posts on my blog:

A classic RSS feed (for example in Portalific)
I'll toot about it on mastodon
All updates will go to LinkedIn, as well