Database Tests With PHPUnit

First published at Tuesday, 4 October 2016

This blog post has first been published in the Qafoo blog and is duplicated here since I wrote it or participated in writing it.

Warning: This blog post is more then 9 years old – read and use with care.

Database Tests With PHPUnit

Most of us do not use PHPUnit solely for Unit Tests but we also write Integration or Acceptance Tests with PHPUnit. One very common question then is how to interact with the database correctly in those tests. Let me show you the different options and their trade offs…

There are multiple aspects of database tests where our decision has impact on test atomicity and test runtime. All decisions boil down to: More test atomicity leads to longer test runs, and we can buy test speed by omitting test atomicity. One might be tempted to immediately favour test atomicity but fast tests are a feature which is often underappreciated. If your tests are slow (more then 30 seconds or even taking minutes) developers will not run those tests before each commit any more – they feel it just takes them too much time. Your tests will still be run on your Continuous Integration server but even when your tests fail it is annoying to debug them, because you have to wait so long until the failing test is reached.

Removing Data versus Schema Reset

You have a schema of your database laying around somewhere, right? It should even be somewhere in your source code repository to be able to initialize a full build of your application. We prefer using something like DBDeploy to maintain and document changes to our schema but the solution does not really matter. With a schema you can drop the entire database to clean it up before a test and re-apply the schema. This works well with MySQL or SQLite but takes a lot of time with systems like Oracle.

The other idea is to reset the tables your test modified. Since this does not do any schema changes but just removes data it is usually faster. The most common implementation is to remove data manually in a tearDown() method in all tables you know the test modified. The problem now is that your application will change. This means that the tables data is written into will also change. At some point you'll forget removing some data in some table and this will cause a side effect on another test – a horribly hard to debug side effect. And this only works for new data, it will not revert changed data.

Some people are adding something like a changed column to each table which automatically recieves the time of the last change to a row. If you are doing this you can execute something like DELETE FROM $table WHERE changed >= $testStartTime on each table. But then again, this only works for data added in a test and would remove all rows which were changed in a test.

Summary

Resetting the full schema is the cleanest approach, but also takes the most time. Resetting a selected number of tables is faster, but also more cumbersome and error-prone. If you have some kind of change management you might be able to use that. Especially for large and complex schemas (where you would prefer a cleaner approach) resetting the full schema often takes far too much time.

Point of Data Reset

Now we know different ways to reset the data in our database, but when should we do the reset. There are basically three options:

Before each test
Before each test class
Once before the whole test run

Before Each Test

The setUp() method in PHPUnit_Framework_TestCase is the right point to run initialisations before each test and we can initialize the database here if our test wants to access the database and clean it up later in the tearDown() method. This way we preserve the atomicity of our tests. But since a single cleanup can take up to some seconds this is only feasible for simple and small schemas.

Before Each Test Class

PHPUnit offers the two (static) methods setUpBeforeClass() and tearDownAfterClass() which are run once before any test in the test case class is executed and respectively after all tests in the test case class have been executed.

These methods allow us to reset or setup the database once for entire test case. If you do something like this you must remember that the tests in this test case depend on each other. The first test might create some row and the second test then accesses this row and uses it for something. I usually add @depends annotations on the tests to make this extra obvious. Also the second test will not work without the first one anyways. You can see an example for such a test case in our TimePlanner demo project.

The initialisation of the database happens in the mentioned methods in a base class for our integration tests. The methods look slightly more complicated because the tests are run against three different databases (MySQL, SQLite and CouchDB) and all are resetting the entire schema using Doctrine.

With this method you cannot easily reorder your tests in a test case anymore, but you can still run your tests cases in any order, since they are still supposed to be independent. If you want to debug a single test it still means that you have to run all tests which it depends on before it – which can be annoying.

Before the whole test run

PHPUnit allows you to specify a bootstrap file, either on the CLI or in the phpunit.xml. Nowadays this is mostly used to just specify the vendor/autoload.php from composer, but you can do more in this file. For example you can initialize your database schema. You can also do this in some build step which is always executed before each test run or somewhere similar.

This allows you to initialize the database once before all tests. This approach is supposed to be the fastest because it minimizes database interaction and thus IO. But there are some drawbacks:

If you don't specify the order of your test cases manually the order they are picked up by PHPUnit might changed depending on your OS or file system. This can lead to very weird and hard to debug test failures.
State is leaked between all your tests across your complete test suite. This again can lead to situations which are very hard to debug.
There will be situations where you can debug one test only when running the entire test suite because it depends on the state of some random test you might not even be aware of.

Summary

Except for very small and simple projects it is usually best to initialize the database before each test class. This seems to be the be best compromise between test stability and speed for most projects – your mileage might vary.

Mocking the Database Away

Another entirely different option which comes to mind is mocking the entire interaction with your database. But such complex mocks are often error prone and a lot of effort to implement and maintain. If your database interaction is very simple it might work for you. In any project with non trivial queries (aggregations, joins, …) you probably do not want to walk this path.

In a MySQL project you can think about using a SQLite in-memory database, but at some point you'll discover differences between these two database management systems. And this could mean production bugs which will not be discovered during testing.

Conclusion

In general you should try to reduce access to global state (like a database) as much as possible in your tests. But at some point you'll want to test the interaction with your database – in this case you must decide for a way which works best for your project. In most projects the best way for us seems to reset the full schema before each test case which interacts with the database.

Subscribe to updates

There are multiple ways to stay updated with new posts on my blog:

A classic RSS feed (for example in Portalific)
I'll toot about it on mastodon
All updates will go to LinkedIn, as well