A lot of people writing about testing draw a parallel between testing and experimenting. You’ll often hear something like “testing is evaluation through experimentation.” But, as advice to testers, this falls short of helpful if the notion of what being a good experimenter entails is never covered. So let’s talk about that.
If we go by the scientific viewpoint, an experiment is some procedure that you perform to make some discovery, test a hypothesis, or demonstrate something that is (allegedly) a known fact. But what does it mean to perform here? What are you actually doing?
I covered some of this when I talked about reframing test interviews and I’ll repeat the salient bits here about behavior and temperament. This will give me a chance to expand a bit on those ideas.
Performing an experiment, in any capacity, requires a tester to exhibit certain behaviors.
- You need to determine if the experiment is going well.
- You need to determine if the experiment is going poorly.
- You need a way to amplify the parts that are going well.
- You need a way to dampen the parts that are going poorly.
Here “going well” and “going poorly” can be as vague or as specific as you want them to be. For me, a large part of this is the feedback loop that my tests are providing so that I can reason about the experiment. Here the feedback is not just for me but for others. Specifically, I’m concerned about the tightness of that feedback loop.
Behaviors in Service of What?
I want to help others on my projects manage uncertainty and reduce the risk that comes with it. I want to do that in a way that will help all of us progressively discover and deliver an effective solution that matches up with the underlying business goals behind our project.
This requires a feedback loop that allows our design to adjust regularly and to evolve naturally.
This means I want to encourage fail-fast and safe-to-fail experiments in our design and our implementation. These experiments should allow sustained focus (for when things go well) and brief recovery (for when things go badly).
When done well, this approach can lead to a risk-aware, but not risk-fearful, culture, which is one of the key aspects of an environment focused on “trying things out” (experiments).
The “heavier” the test artifacts — which are giving the feedback — become, the harder it is to allow them to evolve at a pace that sustains specification and development. This means testing, as a practice, and tests, as an artifact, have to be able to move at the speed of decision, assuming the “speed of decision” is reasonable.
Tests as Feedback
There’s an important point here that’s easy to say but often gets mired in a specific. Tests are a means of feedback to figure out whether we made mistakes. Here “mistakes” do not just mean “bugs” (which is where the mired in a specific comes in). And here “mistakes” are not necessarily just in code artifacts. Mistakes in reasoning are often more prevalent and are a wider cause of bugs than faulty programming.
The longer it takes us to get feedback from tests — again, of various sorts, not just against functioning code — as to whether we did or did not make a mistake, the more expensive we make our overall process and the less value we get from that feedback. This means every mistake gets more expensive.
This gets into the Theory of Constraints in testing. I talked about this a bit in design pressure, and these are thoughts I borrow from J.B. Rainsberger, who got me thinking along these lines.
The idea is that if you drive the cost of a mistake as close to zero as possible, it doesn’t matter how many mistakes you make. When the costs are low, you can experiment more and make more mistakes. As Rainsberger puts it, “the Cost of Change curve becomes the Cost of Mistake curve.” What this means is that the emphasis is not on the cost of change but rather on the cost of the mistake. It’s not how late in our project cycle we try to change something that necessarily matters; rather, it’s the time between when we make a mistake and when we figure out that we did so. The longer that gap, the more expensive the mistake.
This, to me, is really all that needs to be focused on with concepts like “agile” or “lean” or “being scrappy.” Any of those approaches are really about making better decisions sooner. A corollary to that is being able to make those decisions over the smallest spatial (feature size) and temporal (sprint size) intervals possible.
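To make the shape of that idea concrete, here is a toy sketch in Python. The exponential growth model, the rate, and the numbers are my own illustrative assumptions, not Rainsberger’s; the only point is that cost is driven by detection delay, not by how late in the project the mistake was made.

```python
# Toy model of the "Cost of Mistake" idea: what matters is the time
# between making a mistake and discovering it. The exponential shape
# and the rate below are illustrative assumptions, not measured data.

def cost_of_mistake(base_cost: float, delay_days: float,
                    daily_growth: float = 0.5) -> float:
    """Cost of fixing a mistake, compounding with detection delay."""
    return base_cost * (1 + daily_growth) ** delay_days

# A mistake caught the same day by a tight feedback loop stays cheap...
print(cost_of_mistake(base_cost=10, delay_days=0))   # 10.0

# ...while the identical mistake found two weeks later costs far more,
# regardless of where in the project cycle it was introduced.
print(round(cost_of_mistake(base_cost=10, delay_days=14)))
```

Driving the feedback delay toward zero flattens that curve, which is exactly what makes cheap experimentation possible.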
Performing an experiment, in any capacity, requires a tester to exhibit a certain temperament.
- Be able to question everything and take very little at face value.
- Be able to argue both sides of a point but don’t be needlessly contrarian.
- Believe that knowledge and self-control lead to wise decisions and act accordingly.
- Believe and act like emotional and mental clarity are necessary.
- Be able to maintain focus to rely on intuition that is guided by experience.
These aspects of temperament are important because specialist testers like to be experimenters. But not just that. They like to talk about experimentation itself. Testers like to move fluidly between the theory of experimentation and the practice of it. Further, testers like to build the tools that support experiments. Testers want all this to be effective, efficient … and elegant.
Temperament in Service of What?
But what is this in service to? I believe one key aspect that is not often called out is the service of connecting causes with consequences. That’s why the feedback loop needs to be tight. The further the consequence (or discovery of it) is from its cause, the harder it becomes to reason about our experiments and thus the less value they provide us.
There are three sets of distinctions to be made in connecting causes with consequences:
- Between the immediate, the intermediate, and the distant.
- Between the exceptional and the general.
- Between the factual and the counterfactual.
Digging into those would take us a bit afield of where I’m leading this article.
Abstractions and Mental Models
Effective and efficient testing means finding the right level of abstraction and the right mental model that can describe a complex situation in a way that is concise and coherent. In a strict scientific endeavor we might have:
- Conciseness (of hypotheses)
- Completeness (of experimental scope)
- Coherence (of observational reporting)
I reframe these slightly in the context of specialist testers:
- Conciseness (of specification)
- Completeness (of feature coverage)
- Coherence (of documentation)
I realize we can get into what “specification” means and what “documentation” means. For right now, let’s not worry about that and just recognize what I described as forces.
When we, as specialist testers, find the right balance between each of these forces (conciseness, completeness, coherence), a viable question becomes: Can one, single artifact serve all three purposes effectively? Realistically, the question becomes: What are the minimum sources of truth we can get away with?
Answering that question often provides us with the models we expose to others, which guide our grouping strategies (say, for test artifacts) and our explanation strategies (say, for test reports).
To succeed in this, we need to be clear and precise about the topic (this addresses conciseness) but draw attention to the important relationships and special cases as well as the normal (this satisfies completeness). We also need to provide a context for what we do in the form of a narrative, using that to achieve a balance between general rules and specific examples (this promotes coherence).
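Coming back to the question of a single artifact serving all three forces: one hypothetical illustration (my own, not drawn from any particular project) is an executable docstring using Python’s standard doctest module, where the same text acts as concise specification, executable feature coverage, and readable documentation:

```python
# A sketch of one artifact serving as specification, coverage, and
# documentation at once, via Python's standard doctest module.
# The shipping_cost function is a hypothetical example.

def shipping_cost(weight_kg: float) -> float:
    """Return the shipping cost for a package.

    Normal case: a flat rate of 5.00 plus 2.00 per kilogram.

    >>> shipping_cost(2.0)
    9.0

    Special case: anything at or under half a kilogram ships free.

    >>> shipping_cost(0.5)
    0.0
    """
    if weight_kg <= 0.5:
        return 0.0
    return 5.0 + 2.0 * weight_kg

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # the documentation *is* the test suite
```

The docstring states the rule (conciseness), exercises both the normal and special case (completeness), and reads as a narrative with examples (coherence) — one source of truth rather than three.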
The Forces of Projects
I referred to conciseness, completeness, and coherence as forces and that’s exactly what they are. They are not goals, as they are often treated, but rather influencing forces.
Here’s something important that I think often gets lost. Design pressure, collectively, refers to the forces that exert sudden and unexpected pressures on your work. As specialist testers, we have to become very good at looking for and finding the sources that are the equivalent of friction and gravity in our projects.
We are looking for those things that apply both gradual and sudden pressures to design, some of which are unexpected. Changing requirements. Bugs. Tests that you can’t trust. Test suites that are too large to reason about. Environments that are hard to spin up. Test data that cannot be relied on. Customizations required by different customers. Support for different digital channels. And so on.
This requires a tester to treat testing as equal parts design and execution activity. We, as specialist testers, have to put pressure on design at a pace that matches how product teams specify and how development teams write code. The modern tester has to have feet planted in both worlds, which makes testing one of the most interesting career choices around.
Testing Is a Force!
All of this also means testing becomes its own sort of force.
Specialist testers, particularly in the industry as it is, must rethink what it means to add value and figure out how testing puts pressure on design and surfaces issues related to quality.
Putting pressure on something means acting like a force.
As an experimentalist, this means we have to manage with facts, support with data, and convince with evidence. As testers performing experiments, this means we have to be able to talk about what adds value so we can all agree on what is value threatening.
This means we, as specialist testers, have to be in a position to add value — while dealing with the realistic constraints and variances — and sustain value — even when competing aspects are in place.
Imagine if physicists had to manufacture the very gravity or electromagnetic forces their experiments were designed to study. We specialist testers, in a very real sense, have to provide the very force that we use as the basis for ourselves and others to experiment. The ability to do this is a large part of what makes the testing discipline a unique one.
All of this — literally everything I said here — is what I believe it means to treat testing as a form of experimentation. So while this “testing is experimentation” dogma-becoming-tradition is easy to say, it’s important that specialist testers understand what it actually means.