In the spirit of “every long e-mail I send somewhere should be shamelessly recycled on my blog”, I present some random thoughts on testing.

Why do we release products with defects that we weren’t aware of? This is a sign of flaws in our testing; two possible causes are:

  1. We don’t know what to test for.
  2. We do know what to test for, but we’re not able to do enough testing before release.

For 1, how can we figure out where our blind spots are? Some tactics:

  • Defect clusters.

If we can figure out which areas have historically had a large number of post-release bugs, then we can increase our testing in those areas for future products. So if people can come up with suggestions for analyzing post-release data, that would be very useful.
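As a starting point for that kind of analysis, here’s a minimal sketch in Python that just counts post-release bugs per component. The record format (bug ID plus component name) and the component names are hypothetical; a real bug-tracker export would need its own parsing.

```python
from collections import Counter

# Hypothetical post-release bug records: (bug_id, component) pairs.
# A real tracker export would have its own schema; this one is assumed.
POST_RELEASE_BUGS = [
    (101, "playback"),
    (102, "storage"),
    (103, "playback"),
    (104, "admin-ui"),
    (105, "playback"),
    (106, "storage"),
]

def defect_clusters(bugs, top_n=3):
    """Return the components with the most post-release bugs, busiest first."""
    counts = Counter(component for _bug_id, component in bugs)
    return counts.most_common(top_n)

if __name__ == "__main__":
    for component, count in defect_clusters(POST_RELEASE_BUGS):
        print(f"{component}: {count}")
```

Raw counts favor large components, so dividing by something like lines of code or number of changes in each area may give a fairer picture of where the clusters really are.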

  • Different classes of tests.

One of the most interesting testing ideas I’ve seen over the last couple of years is the idea that you can analyze tests along two dimensions: are they business-facing or technology-facing, and are they intended to support engineering or to critique the product? (The idea comes from Brian Marick, I blather on about it elsewhere, and there’s also a section on it in Implementing Lean Software Development.)

This gives four quadrants. Technology-facing tests designed to support engineering are unit tests: tests that narrowly focus on a specific internal interface. Business-facing tests designed to support engineering are tests that focus on a particular aspect of customer-visible behavior. Technology-facing tests designed to critique the product are property tests, i.e. tests for “nonfunctional requirements” such as load testing, security testing, and combinatorial testing. And business-facing tests designed to critique the product are various sorts of manual poking around: usability testing, exploratory testing, etc.

I know that, in the past, I’ve had huge blind spots in these quadrants. And we can gather data to figure out which quadrants we might be missing: if we’re either not implementing known basic requirements or taking too long for the product’s basic functionality to stabilize, then we might be missing tests in the two “support engineering” quadrants. If we’re running into lots of corner-case bugs or stress bugs, we’re missing property tests. And if we’re producing products that behave according to spec but aren’t what the customer wants, then we’re missing tests that are business-facing and designed to critique the product.
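The diagnostic rules in the previous paragraph are simple enough to write down as a lookup table. This is just a sketch: the symptom names are made up for illustration, and in practice deciding whether a symptom applies is the hard part.

```python
from enum import Enum

class Quadrant(Enum):
    """The four test quadrants described above."""
    TECH_SUPPORT = "technology-facing, supports engineering (unit tests)"
    BUSINESS_SUPPORT = "business-facing, supports engineering (customer-visible behavior)"
    TECH_CRITIQUE = "technology-facing, critiques the product (property tests)"
    BUSINESS_CRITIQUE = "business-facing, critiques the product (exploratory, usability)"

# Hypothetical symptom names; the mapping follows the heuristics in the text.
SYMPTOM_TO_MISSING = {
    "basic_requirements_missing": {Quadrant.TECH_SUPPORT, Quadrant.BUSINESS_SUPPORT},
    "slow_to_stabilize": {Quadrant.TECH_SUPPORT, Quadrant.BUSINESS_SUPPORT},
    "corner_case_or_stress_bugs": {Quadrant.TECH_CRITIQUE},
    "meets_spec_but_not_customer_needs": {Quadrant.BUSINESS_CRITIQUE},
}

def likely_gaps(observed_symptoms):
    """Return the union of quadrants suggested by the observed symptoms."""
    gaps = set()
    for symptom in observed_symptoms:
        gaps |= SYMPTOM_TO_MISSING.get(symptom, set())
    return gaps
```

For example, `likely_gaps(["corner_case_or_stress_bugs"])` returns `{Quadrant.TECH_CRITIQUE}`.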

The above assumes that we don’t know what to test for; what if we do know what to test for, but we’re just not doing a good enough job? Here, testing is a bottleneck, and we want to speed it up. Or at least it might be a bottleneck: it may instead be that something else is the bottleneck, creating schedule pressure that isn’t caused by testing, and testing gets unfairly squeezed because it comes at the end of the development cycle. But, for now, let’s assume testing is a bottleneck.

There are certain obvious knobs we can turn here (hire more testers, build more machines to test on), and that may be what we have to do, but those knobs cost money. So we should also look at the testing value stream with lean eyes, to figure out where we can find waste and eliminate as much of it as possible.

To that end, some questions:

  • Are there manual tests that can be turned into automated tests?

Doing this would have three benefits:

  1. If availability of human testers is a bottleneck, this helps alleviate that bottleneck.
  2. Automated tests are generally faster than manual tests.
  3. Engineers developing the product can run the tests more easily, so they can find defects sooner after introducing them, which has no end of benefits.

  • Are there tests that can be sped up?

One technique that works really well on the software side is to directly test the smallest software interface relevant to the issue in question, instead of starting up the whole system: this can turn a 5-minute test into a 5-millisecond test. For example, every time I check in software, I first run a suite of 5000 or so automated tests; if I had to actually run the whole StreamStar system for each test, that would take weeks, but as it is, all 5000 tests run in 15 minutes. (And I wish they were faster than that!)
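To make the “smallest interface” idea concrete, here’s a sketch using Python’s standard `unittest` module. The function `can_admit_stream` is hypothetical (not actual StreamStar code): imagine it’s the piece of a streaming server that decides whether there’s enough bandwidth for one more stream. A whole-system test would have to start the server and open real streams to exercise that decision; calling the function directly is nearly instantaneous.

```python
import unittest

# Hypothetical internal function: decides whether a server can accept one more
# stream without exceeding its bandwidth budget. A whole-system test would start
# the server and open real streams; here we call the decision logic directly.
def can_admit_stream(current_kbps: int, new_stream_kbps: int, capacity_kbps: int) -> bool:
    return current_kbps + new_stream_kbps <= capacity_kbps

class AdmissionTest(unittest.TestCase):
    def test_admits_when_under_capacity(self):
        self.assertTrue(can_admit_stream(6000, 3000, 10000))

    def test_admits_exactly_at_capacity(self):
        self.assertTrue(can_admit_stream(7000, 3000, 10000))

    def test_rejects_when_over_capacity(self):
        self.assertFalse(can_admit_stream(8000, 3000, 10000))
```

Saved as, say, `test_admission.py`, this runs with `python -m unittest test_admission`. The boundary case (exactly at capacity) is the kind of thing that’s cheap to pin down at this level and tedious to reproduce through the whole system.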

To be clear, we do have other tests that run the whole system. But, returning to the four quadrants above, the idea is to move as many tests as possible to the “support engineering” side (by turning them into tests of clear functional requirements), and to move as many of those as possible into the technology-facing quadrant (by shrinking the interfaces they test). You still need all four quadrants, but that last one is where you get the most bang for your time.

  • Is the test analysis taking too long?

Maybe the problem isn’t with running the tests, it’s with making sense of their results. Do the tests give a clear pass/fail result? Failing tests take more time to analyze than passing tests, and they cause other problems too (e.g. one bug can mask another); do we have too many failing tests? Do the tests fail to generate enough information to make analysis easy (e.g. so you can tell different known bugs apart, or known bugs apart from unknown bugs)?
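One way to make failure analysis cheaper is to have each failure checked automatically against the signatures of known bugs, so that humans only look at the failures nobody has seen before. Here’s a minimal sketch; the bug IDs and regular expressions are hypothetical, and it only works if the failure output is informative enough to tell bugs apart.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical known-bug database: bug ID -> regex matching that bug's failure output.
KNOWN_BUG_SIGNATURES = {
    "BUG-1234": re.compile(r"timeout waiting for disk \d+"),
    "BUG-1301": re.compile(r"checksum mismatch in segment \d+"),
}

@dataclass
class Triage:
    test_name: str
    verdict: str                  # "pass", "known", or "unknown"
    bug_id: Optional[str] = None  # set only when verdict == "known"

def triage(test_name: str, passed: bool, output: str) -> Triage:
    """Classify one result so that only unknown failures need a human."""
    if passed:
        return Triage(test_name, "pass")
    for bug_id, pattern in KNOWN_BUG_SIGNATURES.items():
        if pattern.search(output):
            return Triage(test_name, "known", bug_id)
    return Triage(test_name, "unknown")

if __name__ == "__main__":
    print(triage("disk_stress", False, "ERROR: timeout waiting for disk 3"))
    print(triage("seek_test", False, "segfault in seek()"))
```

One caveat: a signature match can hide a new bug that happens to fail the same way, which is the masking problem again, so the signatures should be as specific as the output allows.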

  • Is the test writing taking too long?

If so, we should invest more time in test frameworks.
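A lot of what a test framework buys you is making each additional test case cheap to write. As a small illustration (a hypothetical helper, not any particular framework’s API), here’s a table-driven checker where adding a case means adding one line of data. Real frameworks offer the same idea, e.g. `subTest` in Python’s `unittest` or `pytest.mark.parametrize` in pytest.

```python
from typing import Any, Callable, Iterable, List, Tuple

def check_cases(func: Callable[..., Any],
                cases: Iterable[Tuple[tuple, Any]]) -> List[str]:
    """Call func(*args) for each (args, expected) pair; describe every mismatch."""
    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # an unexpected exception is also a failure
            failures.append(f"{func.__name__}{args}: raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{func.__name__}{args}: expected {expected!r}, got {actual!r}")
    return failures

# Hypothetical function under test: converts bits per second to whole kbps.
def to_kbps(bits_per_second: int) -> int:
    return bits_per_second // 1000

# Adding a test case is one line of data.
CASES = [
    ((64_000,), 64),
    ((999,), 0),
    ((1_500_000,), 1500),
]

if __name__ == "__main__":
    print(check_cases(to_kbps, CASES))  # prints []
```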

  • Are people or machines idle inappropriately?

This is a dangerous issue to approach, because you don’t want to do makework for the sake of makework: for best utilization of a system, you should work your bottlenecks at as close to 100% as possible but explicitly allow slack in all other components. Having said that, sometimes waiting is just plain waste. For example, if you’re low on test machines, you want to separate running tests from analyzing their results as much as possible, so you can keep the machine busy running the next test while you’re still analyzing the previous one. (But if you’re not low on test machines, and hogging a machine for a while longer speeds up writing or analyzing tests, that’s the better choice. Better still is to make writing and analyzing tests as easy as possible, so you don’t have to make that choice!)
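Here’s a minimal sketch of separating running tests from analyzing them, using a thread-safe queue from Python’s standard library: the “machine” loop runs tests back to back and hands each result off, while an analysis thread works through results as they arrive. `run_test` and the simulated failure are hypothetical stand-ins for real test execution; this illustrates the shape of the pipeline, not a real test harness.

```python
import queue
import threading
import time

SIMULATED_FAILURES = {"seek"}  # hypothetical: pretend this test fails

def run_test(name):
    """Stand-in for executing one test on the scarce test machine."""
    time.sleep(0.01)  # pretend the test takes a while
    return name, name not in SIMULATED_FAILURES

def machine_loop(tests, results):
    """Keep the machine busy: run tests back to back and hand off each result."""
    for name in tests:
        results.put(run_test(name))
    results.put(None)  # sentinel: no more results coming

def analysis_loop(results, report):
    """Analyze results as they arrive, without ever blocking the machine."""
    while True:
        item = results.get()
        if item is None:
            break
        name, passed = item
        report.append(f"{name}: {'PASS' if passed else 'FAIL, needs analysis'}")

def run_pipeline(tests):
    results = queue.Queue()  # unbounded, so put() never makes the machine wait
    report = []
    analyst = threading.Thread(target=analysis_loop, args=(results, report))
    analyst.start()
    machine_loop(tests, results)
    analyst.join()
    return report

if __name__ == "__main__":
    for line in run_pipeline(["boot", "seek", "failover"]):
        print(line)
```

As the paragraph above says, this only pays off when machines are the bottleneck; if they aren’t, the simpler run-then-analyze loop may well be the better choice.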

  • Do people have time to think about what they’re doing?

Overworked people make mistakes; even if they don’t make mistakes, it’s hard to devise a method to cut testing time in half in some area if your boss is harping on you to get dozens of things done today.

  • Are good ideas spreading through the group?

We need a way to identify our best ideas and to get them adopted broadly.
