Testing legacy code

First published on Friday, 30 July 2010

This blog post was first published on the Qafoo blog and is duplicated here because I wrote it or participated in writing it.

Warning: This blog post is more than 14 years old. Read and use with care.

Today we know about the benefits of Test Driven Development, and we normally start new projects using TDD. Most projects from the last couple of years have integrated this method into their daily development process, which often results in good code coverage of 90% and above. But what about all the other, old projects that you still manage in your daily work?

Those projects have existed for several years, do their jobs, and add value for their users every day. They consist of thousands or hundreds of thousands of lines of code, but most or all of it is untested. Estimating the time for modifications and maintenance is nearly impossible and risky, because side effects are always possible. So what can we do with such code bases?

Figure: Uncovered source code

Option one: We can try to write tests for the whole application after the fact. But a developer could spend all of their working years on this task, doing nothing other than writing tests. So this isn't really a realistic solution.

Option two: We can start to write tests only for newly added features. At first glance this seems to be a good compromise: you don't have to invest too much time, and all newly added features are tested. But this solution has one major drawback. What about all that legacy code, the code that makes maintenance a nightmare and estimating a gamble? It is still untested.

This brings us to the third possible solution: We write tests for newly added features, and we also add tests for existing code whenever we have to change it. This approach has the advantage that both new and old source code will be tested over time, without spending too much time and money on legacy code tests.

Following approach three, chances are good that we get a stable application within a few months, because changes to legacy code are normally not spread throughout the whole application. Some parts of an application will never change, either a) because no one uses them or b) because they are so stable that no one has to change them. Other parts will change regularly, either a) because there are so many bugs in that part of the source code or b) because there are continuous requirement changes and improvements.

Sounds good, right? But with a normal coverage report, you may never reach an appealing percentage of 70%, 80% or more. The report will stay red and the share of covered lines will stay stuck at around 25% to 30%. This can be really frustrating for everyone involved, because they work hard on their tests but get no positive and motivating feedback. This is where PHP_ChangeCoverage comes into play.

PHP_ChangeCoverage combines the coverage data collected by PHPUnit with the commit history of your version control system and generates a new coverage report that reflects only those parts of the application that have been changed within a specified time range. You can get a first version of PHP_ChangeCoverage from its GitHub repository.

~ $ git clone git://github.com/manuelpichler/php-change-coverage.git phpccov
~ $ cd phpccov
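To make the idea of combining coverage data with the change history more concrete, here is a minimal Python sketch. It does not show how PHP_ChangeCoverage is implemented; the data shapes, the function names and the file name `Example.php` are assumptions made purely for illustration.

```python
from typing import Dict, Set

# Hypothetical data shapes, chosen for this illustration only:
#   Coverage: file name -> {executable line number -> was the line executed?}
#   Changes:  file name -> line numbers touched since the cut-off date
Coverage = Dict[str, Dict[int, bool]]
Changes = Dict[str, Set[int]]


def line_coverage(coverage: Coverage) -> float:
    """Percentage of executable lines that were executed by the tests."""
    hits = [hit for per_file in coverage.values() for hit in per_file.values()]
    # No executable lines means there is nothing left uncovered.
    return 100.0 * sum(hits) / len(hits) if hits else 100.0


def restrict_to_changes(coverage: Coverage, changed: Changes) -> Coverage:
    """Keep only those executable lines that were changed recently."""
    return {
        name: {
            line: hit
            for line, hit in per_file.items()
            if line in changed.get(name, set())
        }
        for name, per_file in coverage.items()
    }


# Ten executable lines, three of them (7, 8, 9) executed by the tests.
coverage = {"Example.php": {n: n in (7, 8, 9) for n in range(1, 11)}}
# Lines 7 to 10 were modified since the cut-off date.
changed = {"Example.php": {7, 8, 9, 10}}

overall = line_coverage(coverage)
recent = line_coverage(restrict_to_changes(coverage, changed))
print(f"overall: {overall:.0f}%, changed lines only: {recent:.0f}%")
# prints "overall: 30%, changed lines only: 75%"
```

The same test suite yields a discouraging number for the whole code base and a much more motivating one for the code that is actually being worked on.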

In order to run PHP_ChangeCoverage, two additional dependencies must be installed. The first one is PHPUnit and the second one is PHP_CodeCoverage. Both can be installed through PHPUnit's PEAR channel.

~ $ sudo pear channel-discover pear.phpunit.de
~ $ sudo pear channel-discover components.ez.no
~ $ sudo pear install --alldeps phpunit/PHPUnit
~ $ sudo pear install --alldeps phpunit/PHP_CodeCoverage-beta

Figure: Coverage for partially tested code

Now that all required programs are installed, we can start a small example session. For this, check out the small example application, which can be found under docs/example.

~ $ cd docs/example
~ $ cd example
~ $ svn co file://`pwd`/svnrepo checkout

Now let's generate the code coverage report for the project for the first time:

~ $ ../../phpccov --coverage-html coverage \
      checkout/test/PHP/ChangeCoverage/ExampleTest.php

As you can see, the PHP_ChangeCoverage command line interface accepts the same arguments as the PHPUnit version in use, and without any special parameters both tools behave nearly the same. Looking at the generated code coverage report for the partially tested code, we can see that the source has a line coverage of roughly 60% and a method coverage of 50%. Now assume that the methods getBaz() and showBaz() are changed frequently, while the other two methods are just kept because no one knows where, or whether, they are used. So let us rerun the tests, but this time we only want the coverage information for the part of the source that was modified within a time frame starting from a specified date.

Figure: Coverage based on code changes

~ $ ../../phpccov --modified-since 2010/07/27 \
      --coverage-html coverage \
      checkout/test/PHP/ChangeCoverage/ExampleTest.php

This time we got a coverage report that only highlights those lines that were changed since the specified date. All the other lines are flagged as dead code. If you don't like the dead code behavior and you prefer to highlight all unchanged lines as covered, you can add the --unmodified-as-covered option to the phpccov command line call.

~ $ ../../phpccov --modified-since 2010/07/27 \
      --unmodified-as-covered \
      --coverage-html coverage \
      checkout/test/PHP/ChangeCoverage/ExampleTest.php
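The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under assumptions: the labels, the function names and the rule that dead lines are left out of the percentage are mine and are not taken from the PHP_ChangeCoverage source.

```python
from typing import Dict, Set


def classify(
    executed: Dict[int, bool],
    changed: Set[int],
    unmodified_as_covered: bool = False,
) -> Dict[int, str]:
    """Label each executable line of one file as 'covered', 'uncovered' or 'dead'."""
    labels = {}
    for line, hit in executed.items():
        if line in changed:
            # Changed lines keep their real test result.
            labels[line] = "covered" if hit else "uncovered"
        else:
            # Unchanged lines are either hidden as dead code or shown as covered.
            labels[line] = "covered" if unmodified_as_covered else "dead"
    return labels


def percent_covered(labels: Dict[int, str]) -> float:
    """Line coverage over all lines that are not labeled 'dead' (an assumption)."""
    relevant = [label for label in labels.values() if label != "dead"]
    return 100.0 * relevant.count("covered") / len(relevant) if relevant else 100.0


# Four executable lines; only line 3 is executed, lines 3 and 4 were changed.
executed = {1: False, 2: False, 3: True, 4: False}
changed = {3, 4}

default = classify(executed, changed)
lenient = classify(executed, changed, unmodified_as_covered=True)

print(percent_covered(default))  # prints 50.0
print(percent_covered(lenient))  # prints 75.0
```

In the default mode only the two changed lines count, so one tested line out of two gives 50%. With unchanged lines treated as covered, the same data shows three covered lines out of four.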

Currently PHP_ChangeCoverage supports five different version control systems, through the underlying vcs_wrapper.

If PHP_ChangeCoverage cannot detect one of these version control systems, it will fall back to a simple file-based implementation that uses the last modification time to collect only changed files.
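A file-based fallback of this kind could look roughly like the following Python sketch. The function name, the `.php` suffix filter and the example directory `checkout/src` are assumptions for illustration; the actual implementation in PHP_ChangeCoverage may differ.

```python
import os
from datetime import datetime
from typing import List


def files_modified_since(root: str, since: datetime, suffix: str = ".php") -> List[str]:
    """Return all files below root whose last modification time is at or after since."""
    cutoff = since.timestamp()
    result = []
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            if name.endswith(suffix) and os.path.getmtime(path) >= cutoff:
                result.append(path)
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical source directory; adjust it to your own project layout.
    for path in files_modified_since("checkout/src", datetime(2010, 7, 27)):
        print(path)
```

Note that such a fallback only works at file granularity: without a commit history, every line of a recently modified file has to be treated as changed.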

Additionally, PHP_ChangeCoverage uses PHPUnit to execute the test cases and PHP_CodeCoverage to generate the different report formats.

The project is licensed under the New BSD license and is available on GitHub for forking.
