exploratory testing


The Poppendiecks’ latest book gives an interesting analysis of types of testing. (Taken originally from Brian Marick’s blog.) They propose that you divide testing up in two different ways: on the one hand, you can classify tests as either intended to support programming or to critique the product. On the other hand, you can classify tests as either intended to be business facing or technology facing.

This gives you four quadrants. Tests that are technology facing and supporting programming are unit tests. As my loyal readers know, those are the best thing ever, so I won’t go into details.

Tests that are business facing and supporting programming are acceptance tests, or story tests. It took me a little while longer to appreciate these – I looked at tests initially largely through defect prevention goggles, and surely there couldn’t be any bugs left after my unit tests? Well, actually, it turns out that there could be: there are (many) fewer than if I hadn’t been doing pervasive unit testing, but many fewer than a lot is some, not none. Some of those defects are due to legacy code issues, but by no means all. And it’s not like I have a magic wand to get rid of legacy code, anyways.

In both cases, tests have more virtues than just preventing defects. They establish a contract, for one thing. In the unit test case, it might be a contract between programmers, or it just might be a contract between a single programmer’s fingers and the part of the programmer’s brain that cares about things working properly, but it’s a contract either way. In the story test case, it’s ideally a contract between programmers and business types; I still haven’t reached that world (it’s probably the area at work where we’re least agile), alas, but at the least it’s a contract between code and an imagined outsider. And they promote communication (between programmers, between programmers and business, between a programmer and the same programmer years or months or weeks later). And they promote design. In both cases, they’re automated, to make it as easy as possible for the programmer to run as many tests as possible.

Which is all great: better code, fewer defects, shorter debugging cycles, on and on. With all of that goodness, what more could you want?

Quite a bit, it turns out. There are people who say that it’s okay to have a testing department going through manual tests of your product: programmers have a conflict of interest which prevents them from seriously scrutinizing their code, so the only remedy is to have an army of testers to click through your interface to make sure it all works. Those people are wrong on a bunch of levels: for one thing, clicking through interfaces takes forever; for another thing, the programmers are the only people who know the corner cases; for a third thing, programmers aren’t as irresponsible as this suggests; for a fourth thing, the ways having a fast, comprehensive test suite improves your programming are so varied and positive that you’d be crazy to give it up for a slow external test cycle. It is true that having extra eyes doesn’t hurt; that’s why we would like to bring in business types to help with the acceptance tests, and that’s why we pair program and have collective code ownership. Surely all that is good enough?

Well, no: even with a good set of acceptance tests, you’ll still find problems the first time you plop your product in front of a user poking around. A lot of that (at least in my case) can be chalked up to inadequate acceptance testing and inadequate business involvement in test design; still, if you’re like me, it takes a while to learn how to do good acceptance tests, and you’re probably dealing with legacy code which didn’t have proper acceptance tests to start with, and you need some way to learn where your acceptance testing skills need improvements. Playing with the product is a great way to do that.

Which brings us to the business facing / critique product quadrant: exploratory testing. (And usability testing.) People just poking around with your product, seeing what it does, pushing areas that might be limits. Not following a script: if you can script a test, you should work hard to automate it, to help support programming. (And if you find a defect during your exploratory testing, please do automate what you just did, so programmers can learn!) Just trying to look at the product with users’ eyes, seeing how it feels.
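To make the "automate what you just did" step concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the function normalize_username, the defect (a trailing space typed during an exploratory session producing a different account name), and the test names are all invented for illustration, not taken from any real product.

```python
import unittest


def normalize_username(raw: str) -> str:
    """Hypothetical function under test: canonical form used for account lookup."""
    return raw.strip().lower()


class NormalizeUsernameRegressionTest(unittest.TestCase):
    def test_trailing_whitespace_found_in_exploratory_session(self):
        # Replays the exact (hypothetical) input the tester typed by hand.
        self.assertEqual(normalize_username("Alice "), "alice")

    def test_plain_name_is_unchanged(self):
        # Guards against the fix breaking the ordinary case.
        self.assertEqual(normalize_username("alice"), "alice")
```

The point is less the fix than the habit: the thing a person stumbled on by hand becomes a test that runs on every build, so it moves from the critique-the-product side of the chart to the support-programming side.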

Like the earlier categories of tests, exploratory tests have virtues beyond finding defects. Even if you aren’t inserting defects into your code, you may have specification errors: your design may not work as well as you’d hoped when confronted with users. Or, for that matter, you may be playing around with various designs, trying to decide which is best. Or you may just need to communicate to somebody else in a visceral way what your product really does.

At work, we’d been slacking off on exploratory testing until recently: we were very engineering-focused, and the few people on the business side were too busy selling our product to have much time to play around with it. We’re doing better now (learning from our experiences), but we still have a ways to go.

So now I’m happy with three of the quadrants, though I still have a lot to learn. Which strongly suggests that my next revelation will be on the virtue of the fourth quadrant: property testing, from a technology facing point of view. These are performance testing, security testing, and combinatorial error testing. Actually, maybe I got that revelation a year or so ago: we’d been doing performance testing for a while, which was all well and good and helped us catch a few performance regressions. But what was really eye-opening was when we started inserting random errors (deterministically, starting from a seed which changed every night but which allowed us to rerun the tests if problems arose) into the input of one component of the product, which did a lovely job of uncovering defects. Again, if we’d been doing better in our unit testing, we wouldn’t have inserted the defects in the first place, but we’re not perfect, and we need ways to learn how to improve our testing skills.
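For the curious, the seeded random-error idea can be sketched roughly as follows. This is an illustration in Python, not the code we actually run: the names nightly_seed, corrupt, and run_fuzz, the choice of the calendar date as the seed, the 1% error rate, and the rule that a ValueError counts as a clean rejection are all assumptions made up for the example.

```python
import random
from datetime import date


def nightly_seed(today=None):
    """Derive the seed from the calendar date, so each night's run differs."""
    today = today or date.today()
    return today.toordinal()


def corrupt(data: bytes, rng: random.Random, error_rate: float = 0.01) -> bytes:
    """Return a copy of data in which randomly chosen bytes are overwritten."""
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < error_rate:
            out[i] = rng.randrange(256)
    return bytes(out)


def run_fuzz(component, clean_input: bytes, seed: int, rounds: int = 100):
    """Feed corrupted inputs to component and record unexpected failures.

    Each failure is recorded with the seed and round number, which is
    enough to regenerate the same corrupted input later.
    """
    rng = random.Random(seed)  # private generator: same seed, same inputs
    failures = []
    for n in range(rounds):
        bad = corrupt(clean_input, rng)
        try:
            component(bad)
        except ValueError:
            pass  # assumed here: rejecting bad input cleanly is acceptable
        except Exception as exc:
            failures.append((seed, n, bad, repr(exc)))
    return failures
```

A nightly job would call run_fuzz(component, sample, nightly_seed()) and log the seed; when something breaks, rerunning with the logged seed replays exactly the same sequence of corrupted inputs.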

We still have room for improvement on this front, though. We should write random error tests for more components. Our load tests take too long to calibrate, so we haven’t always kept them up to date as we use faster hardware.

A useful analysis; I wish I’d seen it a couple of years ago. (But, if I had, I probably wouldn’t have been able to appreciate it.) I like how it divides up the virtues of a traditional testing group: some of those virtues can better be gotten in other ways, indeed maybe all of them can. But the virtues are real and varied, so there are several kinds of blind spots you should work to avoid.

Post Revisions:

April 27, 2012 @ 21:37:15 [Current Revision] by David Carlton
April 21, 2007 @ 21:45:11 by David Carlton


Published 12/1/2006 & Filed in Lean / Agile, Programming

