We had a (very useful) meeting at work today which, at one point, turned to the extent to which our end-to-end tests should extend beyond the software that we are writing and actually invoke our software via our partners’ software. (As opposed to driving our external interfaces through test clients that we’ve written ourselves.)

My boss was strongly in favor of integration tests (i.e. ones that used our partners’ software); I was against doing that as the default (though I certainly agree that some basic integration tests are useful), but admittedly some of that was an emotional reflex instead of a well-considered response. So here’s a try at the latter.

What makes a good test? One answer: the test should be (in no particular order):

  • Precise
  • Fast
  • Easy to write (and maintain)
  • Realistic

Precise: a test is intended to verify certain behavior, and an ideal test will fail every time if your code isn't as intended, and pass every time if it is. Fast: the more (useful) feedback the better, and the quicker you can get back to programming the better. Easy to write: you want to encourage people to write as many tests as is useful, and to keep the tests up to date as the code changes. And realistic is good, too: what matters is how your product will behave when released into the wild.

Software is built out of subsystems, each with its own interfaces. And the first three desiderata all argue for the same behavior: you should write your test at the interface of the subsystem that's in charge of the behavior you're trying to test. Any further in is impossible; and the further out you go, the less likely it is that your test will be precise, fast, and maintainable. The fourth desideratum, on the other hand, suggests that you should write your tests at the largest-scale interface that is practical.
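To make that concrete, here's a minimal sketch (all the names are invented for the example) of a test written directly at a small subsystem's own interface. A failure here points straight at the code in charge of the behavior, it runs in a fraction of a second, and it took a minute to write: exactly what the first three criteria ask for.

    import unittest

    # Hypothetical subsystem, invented for this example: the module in
    # charge of the behavior under test (bulk-order discounts).
    class DiscountCalculator:
        def price_in_cents(self, unit_price_cents, quantity):
            total = unit_price_cents * quantity
            if quantity >= 100:
                total = total * 9 // 10  # 10% off bulk orders
            return total

    # A test at this subsystem's interface: precise, fast, easy to write.
    class DiscountCalculatorTest(unittest.TestCase):
        def test_bulk_orders_get_ten_percent_off(self):
            calc = DiscountCalculator()
            self.assertEqual(calc.price_in_cents(1000, 100), 90000)

        def test_small_orders_pay_list_price(self):
            calc = DiscountCalculator()
            self.assertEqual(calc.price_in_cents(1000, 5), 5000)

    if __name__ == "__main__":
        unittest.main()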

In the past, I've had a bit of a blind spot in this regard: I came to appreciate unit tests sooner than I appreciated acceptance tests, for example. But acceptance tests have saved me several times over the last year, so I really do believe in them now. (Note, however, that acceptance tests don't have to be end-to-end.) And my unit tests aren't perfect: as Phlip periodically points out on the XP newsgroup, if your tests are a bit broader than absolutely necessary, then they can also serve as backup tests for systems other than the one you're focusing on. (This is why he warns against overuse of mocks.)
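Here's a sketch of that tradeoff, again with invented names: the mocked version pinpoints the Teller logic and nothing else, while the version using the real store is slightly broader, and so also acts as a backup test for the store's bookkeeping.

    import unittest
    from unittest import mock

    # Hypothetical collaborators, just for illustration.
    class InMemoryAccountStore:
        def __init__(self):
            self._balances = {}
        def balance(self, account):
            return self._balances.get(account, 0)
        def set_balance(self, account, amount):
            self._balances[account] = amount

    class Teller:
        def __init__(self, store):
            self._store = store
        def deposit(self, account, amount):
            self._store.set_balance(account, self._store.balance(account) + amount)

    class TellerTest(unittest.TestCase):
        def test_deposit_with_mock_store(self):
            # Pinpoints Teller's behavior, but verifies nothing about the store.
            store = mock.Mock()
            store.balance.return_value = 100
            Teller(store).deposit("alice", 50)
            store.set_balance.assert_called_once_with("alice", 150)

        def test_deposit_with_real_store(self):
            # Slightly broader: if the store's bookkeeping breaks, this fails
            # too, so it doubles as a backup test for that subsystem.
            store = InMemoryAccountStore()
            Teller(store).deposit("alice", 50)
            self.assertEqual(store.balance("alice"), 50)

    if __name__ == "__main__":
        unittest.main()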

Anyways, clearly some level of end-to-end testing is required: even if you want to test through the interface of the subsystem controlling the behavior under test, that still implies some amount of end-to-end testing, because sometimes the behavior under test is nothing less than the behavior of the entire system! But not always: usually the behavior you're interested in is more specific, and usually you have a choice of interfaces at which you could test it.

The criteria above give one answer as to which interface to choose: pick the highest-level interface at which it's possible to write the test without seriously sacrificing precision, speed, or ease of creation. Most of the time, this will be a narrow interface; sometimes, though, it will be a broader one. And it's worth investing some amount of effort into building up tools to make it easier to test via that broader interface: there may not be much you can do about the precision and speed metrics, but judicious tool-writing can be an enormous help with the ease of use metric.
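The kind of tool I have in mind is a test driver that knows, in one place, how to stand the system up and operate its broad interface, so that each individual test reads at the level of the behavior it's checking. A sketch, with all the system-specific details faked out:

    import unittest

    # Hypothetical test driver, written once per broad interface. This fake
    # version just records orders; a real one would launch processes, open
    # connections, poll for results, and so on.
    class SystemDriver:
        def start(self):
            self._shipped = []

        def stop(self):
            pass

        def submit_order(self, sku, quantity):
            # Pretend the system processes and ships the order.
            self._shipped.append((sku, quantity))

        def shipped_orders(self):
            return list(self._shipped)

    class OrderFlowTest(unittest.TestCase):
        def setUp(self):
            self.system = SystemDriver()
            self.system.start()
            self.addCleanup(self.system.stop)

        def test_submitted_order_ships(self):
            # The test reads at the level of the behavior under test;
            # the plumbing lives in the driver.
            self.system.submit_order("widget", 3)
            self.assertIn(("widget", 3), self.system.shipped_orders())

    if __name__ == "__main__":
        unittest.main()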

At work right now, we have thousands of unit tests; you can run them all in ten minutes or so, and if you're working on one particular piece of code, you can run the relevant ones in much less time than that. This is necessary to maintain a good workflow: you can run some of them constantly to catch your mistakes quickly, and you can run all of them before you check in your code, just to make sure you didn't guess wrong about which tests are relevant.

We have hundreds of tests that test our system more broadly, typically running our entire system (but not our partners’ systems). We run these every night. This is good, too: if something slips through the unit tests, you’ll find out about it soon. I suspect we could get some mileage out of speeding up those tests, and they’re lacking in precision at times, but I don’t have anything useful to say on those fronts, and I don’t think that’s a pain point.

We have something like ten tests which check integration with our partners; those also run every night, except when they don’t because something has changed the behavior of our partners’ systems in a way that we need help to resolve. (As has been the case for the last couple of weeks.)

And my boss is right in saying that this isn’t enough tests at the integration layer. (And we all agree, of course, that their reliability has to improve.) On the other hand, I’m right in being worried about focusing excessively on that level: tests should be precise, fast, and easy to write, and currently our integration tests satisfy none of those criteria.

This suggests the following courses of action:

  • We should examine our non-integration end-to-end tests, and try to judiciously pick some to move over to the integration layer, in a way that gives us the most bang for our buck.
  • We should invest time in making our integration tests more precise, faster, and easier to write, so that we can move more tests to that level in the future.

To be sure, I have no idea how to balance that latter investment against other forces competing for our time. We certainly have other high-priority tasks (my current favorite of which is getting our Customer ducks in order), but it’s important enough that, even now, it’s worth spending effort on. And it should be an ongoing task for the indefinite future.
