Do Testers Understand Testing?

I’m asking here: do testers understand testing? By which I mean: do testers truly understand testing? By which I mean … okay, you know what, let’s just dig into the basis of testing for a moment.

Let’s start off with a simple phrase:

Putting things to the test.

That’s easy to say, right? But this notion of “putting something to the test” rests on the basic methods that underlie all good research: experimentation, exploration, and investigation. These methods have been with us for a long time; arguably, for thousands of years. But they were refined into what we sometimes call the “scientific method” not all that long ago, relatively speaking.

It’s that scientific method that I find testers sometimes have only a passing familiarity with. Now, you might argue, “Who cares? Plenty of testers do a good job and don’t require your scientific method.” Perhaps. Let’s leave that aside for a moment and ask this: does having a good understanding of the scientific method help testers become even better specialists in their discipline?

I would argue yes. But let’s talk about this method.

The Scientific Method

“If anything deserves to be called the scientific method, it is the simple but profoundly fundamental process wherein new ideas are put to the test: everything from the most rarefied and grand theoretical constructs to the claims of the experimenter to have discovered some new fact about the natural world.”

That comes from A Beginner’s Guide to Scientific Method by Stephen Carey. Notice that “put to the test” part there?

The notion of the scientific method rests on the concept that any idea that has “a way of working” — whether that be the workings of nature or the workings of a software application — has consequences, and that these consequences provide a basis for testing the idea in question.

So a powerful question for a team of testers is simply this:

Do we all have a similar view of what a test is?

Here’s another way we can ask it:

How do we answer the question “What is a test?”

It can be very interesting to have that discussion with a team. And note: a poor test team won’t even want to have that discussion. They may even dismiss it as just semantics.

Framing Tests and Testing

One way to frame this is that a test is always an experiment that provides an observation that allows people to reason about something. And an experiment is … what? It’s perhaps interesting how many testers can’t provide a good answer to that question. It’s interesting because the answer is tied up in what a test is!

Testing, as the basis of the scientific method, is basically an attempt to understand how, why, and under what conditions things happen the way they do. The important thing here is that any such attempt at understanding reveals a basic underlying methodology.

This is why, in my post Testing Is Like …, I said:

“The very notion of testing undergirds disciplines like physics, geology, historiography, paleobiology, chemistry, social science, archaeology, linguistics, and so on and so forth. Our discipline has an incredibly rich pedigree that can inform us. Let’s start acting like it.”

This is important because innovation in a discipline springs from theoretical understanding. And if that’s the case, does it perhaps explain why developers have done more to promote testing over the last decade than testers have?

Wait! They have? Well, in my opinion, yes. They have.

While testers have been debating the merits of “testing” vs “checking”, developers have been working on concepts like what continuous means in the context of testing, DevOps (and how testing relates to deployment), test-supporting tooling, and so on.

Developers have had more fruitful debates about testing. Consider those debates around “integrated” vs “integration.” Or consider the debates between the RSpec-Rails team and the Rails core team regarding test terminology. What might seem to be nothing more than internecine and specialized conflicts have, in my opinion, done more to shine a light on what testing is and how it can add value than most conversations I see testers having.

Developers have (arguably and in my opinion) written much better books about testing, such as Rails 5 Test Prescriptions, Developer Testing: Building Quality into Software, Continuous Integration: Improving Software Quality and Reducing Risk, and so on. When I say “better” here, I mean better in the sense of being career-enhancing for those who want to specialize in testing while remaining career-relevant at the point where the theoretical rubber meets the practical road.

While test practitioners have written books — and very good ones — the ones that have moved the needle and forced testing to innovate more broadly — if not always in the best of directions — have been those by developers.

Assuming you’ll give me the benefit of the doubt and at least accept my opinion, let’s ask: why is this?

I think it’s because developers tend to approach testing, as a discipline, with more of a theoretical understanding than testers do. And why is that? Because they tend to embrace the “science” part of “computing science” more than testers do. (Yes, I know it’s “computer science”, but many argue it should have been called “computing science.”)

So let’s go back to the science bit: the idea of tests being a form of experimentation.

Tests and Experiments

Just about anyone who works in the sciences will tell you that the goal of a decisive test is to arrange circumstances under which we can be confident that nothing unforeseen or extraneous can invalidate our experiment’s outcome.

And what is that outcome? Well, consider this: a well-designed experiment rules out false confirmation and false rejection.

These are concepts testers routinely get wrong, often talking about how all of science is based not on confirmation but on falsification. That’s entirely wrong and I’ll come back to this momentarily. But for now consider the following:

  • False confirmation: Could the predicted outcome be due to anything other than the explanation we have? If “Yes”, the experiment cannot verify the claim at issue.
  • False rejection: Could the predicted outcome fail to occur even if the explanation we have is correct? If “Yes”, the experiment cannot falsify the claim at issue.

The point is that if an experiment is well designed, the answer to both questions will be “No.” A good test will be designed to rule out the possibility of a false confirmation or a false rejection of the explanation we are operating from.

Here, when I say “explanation at issue”, this can be, for example, our explanation of what we think the application is doing that it should not. Or of what we think the application is not doing that it should. Or of how the feature we are talking about adds value. Or of how our code will likely work if we add certain aspects to it. And so on.
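
To make that concrete, here’s a minimal sketch in RSpec. Everything in it is hypothetical (the Cart class, the “SAVE10” code), invented purely for illustration; the point is the shape of a test designed with both questions above in mind:

    require "rspec/autorun"

    # Hypothetical cart, just enough to make the sketch runnable.
    class Cart
      def initialize
        @prices = []
        @discount = 0.0
      end

      def add_item(price:)
        @prices << price
      end

      def apply_discount(code)
        @discount = 0.10 if code == "SAVE10"
      end

      def total
        (@prices.sum * (1 - @discount)).round(2)
      end
    end

    RSpec.describe "applying a discount code" do
      it "reduces the total because of the code, not something else" do
        cart = Cart.new
        cart.add_item(price: 100)

        # Guard against false confirmation: establish the undiscounted
        # baseline first, so the final assertion can't pass because the
        # total was already low for some unrelated reason.
        expect(cart.total).to eq(100)

        cart.apply_discount("SAVE10")
        expect(cart.total).to eq(90)
      end

      it "leaves the total alone for an unrecognized code" do
        # A control case: if this fails, totals are changing without our
        # explanation (the discount logic) being in play at all.
        cart = Cart.new
        cart.add_item(price: 100)
        cart.apply_discount("BOGUS")
        expect(cart.total).to eq(100)
      end
    end

The baseline assertion and the control case are what do the epistemic work here: without them, a passing result could confirm falsely and a failing one could reject falsely.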

This gets into how you design your tests, at various levels of abstraction, both in terms of their conceptual design and their implementation. So now let’s have our test team ask another question of itself:

Do we all operate under a common set of heuristics for writing and performing tests?

Put another way:

How do we answer the question: “How do you recognize a good test versus a bad test?”

These questions that I’ve posed throughout this post so far get at some of the foundational aspects of testing.

The reason — or at least a major part of the reason — we have such buggy software across the industry is that people don’t use the above ties between test and experiment to discover problems via exploration, experimentation, and investigation. Rather than focusing on discovering problems, test teams tend to focus on confirming behavior.
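
As one hedged illustration of the difference (the parse_quantity function below is hypothetical), compare a purely confirmatory check with checks that deliberately probe for problems:

    require "rspec/autorun"

    # Hypothetical parser standing in for any input-handling code.
    def parse_quantity(input)
      Integer(input, 10)
    end

    RSpec.describe "parse_quantity" do
      # Confirmatory: demonstrates the behavior we already expect.
      it "parses a plain number" do
        expect(parse_quantity("3")).to eq(3)
      end

      # Discovery-oriented: deliberately probe boundary and hostile
      # inputs. Each example encodes a prediction about what should
      # happen; a failure is an observation demanding an explanation.
      it "rejects an empty string rather than guessing" do
        expect { parse_quantity("") }.to raise_error(ArgumentError)
      end

      it "tolerates surrounding whitespace" do
        expect(parse_quantity(" 3 ")).to eq(3)
      end

      it "rejects fractional quantities" do
        expect { parse_quantity("3.5") }.to raise_error(ArgumentError)
      end
    end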

Science Via Tester and Developer

Now, here I’ll say something that is arguable and certainly open to being challenged: developer testing can be more interesting than non-developer testing.

Consider: to explain something is to introduce a set of factors that account for how or why the thing in question has come to be the case. How do we know if our proposed explanation is correct? We look for a consequence of the explanation. We look for something that ought to occur if circumstances are properly arranged and if the explanation is on the right track.

Then we carry out an experiment designed to determine whether the predicted result actually will occur under those circumstances. If we get the results we have predicted, we have good reason to believe our explanation is correct. If we fail to get them, we have some initial reason to suspect that we may be wrong or that we have to modify our explanation.

That is an expansion of the scientific method I already discussed.

Think of how this dynamic plays out, though. Testers aren’t setting out to explain something initially. (Just bear with me here.) They’re just looking for problems. Ideally, anyway; as I said above, many are doing that only as a side effect of confirming behavior. Once those testers find a problem, however, explanation sets in: why is the problem happening? And again, that’s assuming the tester doesn’t just find the bug and report it so developers can do the science part.

So notice there’s a short-circuit. Science is:

  • Carefully observe some aspect of nature.
  • Propose a possible explanation for those observational findings.
  • Test that explanation.

Testers, when they treat testing only as an execution activity, do this against the application after it has been created. Developers do this as the application is being created. Developers by definition have to treat testing as a design activity, even if they don’t refer to it that way. Testers should do this too, but they often don’t.
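
Here’s a sketch of that observe/propose/test loop in executable form, with every name hypothetical. Suppose we observe that account balances occasionally come out negative; we propose an explanation: oversized refunds aren’t being clamped. The tests state the consequences that should hold if the explanation, and the fix, are on the right track:

    require "rspec/autorun"

    # Hypothetical fix under test: clamp refunds so a balance never
    # goes below zero.
    def balance_after_refund(paid, refund)
      [paid - refund, 0].max
    end

    RSpec.describe "balance after a refund" do
      # Observation: some balances showed up negative.
      # Proposed explanation: oversized refunds were not being clamped.
      # Experiment: if the explanation (and the fix) is on the right
      # track, an oversized refund should now floor at zero.
      it "never produces a negative balance" do
        expect(balance_after_refund(100, 150)).to eq(0)
      end

      it "still refunds normally when the amount is covered" do
        expect(balance_after_refund(100, 40)).to eq(60)
      end
    end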

Science, and its use of the scientific method, is the process of observing some aspect of nature, isolating a facet that is not well understood, and then proposing and testing possible explanations. That is exactly how software gets developed, too.

Developers are simply exposed to more of it than most testers are.

Go through my Testability series, as just an example. Most testers, I find, tend to get lost fairly quickly. Most developers, I find, have no such trouble. This is because developers are used to bringing the full weight of scientific thinking — and thus a form of the scientific method — into their work; whereas testers tend to have a compromised and limiting view of this method.

This is why, going back to my earlier point, I believe developers as a whole have been pushing the needle forward on testing more than testers have. A danger to this is that we have developed a bit of a technocracy around testing, which is where we turn testing into a programming problem.

Testers have had a chance to put the brakes on that for decades. They rarely have. And they rarely have, I argue, because of a compromised view of truly understanding testing and how it is situated within the context of the scientific method.

Feel Better Now, Jeff? Got That Off Your Chest? We Done Here?

This is actually a good place to end this post, perhaps.

If you’re so inclined, however, let’s explore just a little more. Let me give one more example of why I see testers often getting confused about the science part of testing, even when they claim to speak for scientists or claim to know “how scientists work.”

The Fallacy of Categorical Falsification

Before I go on, a bit of context. I worked at Fermilab for a few years, I’ve been on archaeological digs, I’ve worked in the context of the reconstruction of ancient texts based on an understanding of history, and I have some published papers in science contexts.

If you’re saying “Good for you. Who cares?”, I agree! I mention all this for literally no other reason than to say that I at least feel I have some justification for speaking as bluntly as I do. I can say all that and still admit that I’m probably dumber than your average rock. But I have worked very closely with scientists of various disciplines over the course of time.

So, that out of the way, I’ll say this: a lot of pundits for testing like to talk as if they understand “how science is done.” They immediately harp on falsification.

An Example of Falsification

Take a trip back in time with me, if you will. Specifically to 1781. This is when the planet Uranus was discovered by William Herschel. When Newton’s mechanics — his science of motion, essentially — were used to predict what Uranus’ orbit should be, the calculation was found to disagree with the observed orbit. Well … crap. That’s not good, right?

Was this example of disagreement between theory and observation taken to falsify the basis of the calculations? Did this example of falsification compromise the entire structure of Newtonian mechanics?

Of course not.

Theories are built out of abstract concepts, such as, say, gravitating bodies treated as though their mass is entirely concentrated at their centers. Think about the theories of quality we hold as something perhaps more relevant to your day-to-day. Regarding Uranus, if you actually study and think about how Newton’s laws are applied to calculations involving planetary orbits, you will come to the conclusion that no direct application is possible without a whole series of what are called auxiliary assumptions. Thus, when faced with potentially falsifying data, the tendency of most scientists is not to throw out an entire theoretical structure.

So what do scientists actually do? They play around with the (often numerous) auxiliary assumptions.

This is what happened in the case of Uranus. The auxiliary assumption that was challenged was the idea that there were only seven planets. Some astronomers proposed that this assumption be abandoned and suggested there was an eighth planet, as yet unobserved, that was perturbing the orbit of Uranus. That planet was discovered in 1846 and named Neptune.

Consider the following from the book The Hunt for Vulcan by Thomas Levenson:

“The enterprise of making sense of the material world turns on a key question: what happens when something observed in nature doesn’t fit within the established framework of existing human knowledge? The standard answer is that scientific ideas are supposed to evolve to accommodate new facts. … Ideas, though, are hard to relinquish, none more so than those of Isaac Newton.”

The author continues:

“Contrary to the popular picture of science, a mere fact — Mercury’s misplaced motion — wasn’t nearly enough to undermine that sturdy edifice. … As Vulcan’s troublesome history reveals, no one gives up on a powerful, or a beautiful, or perhaps simply a familiar and useful conception of the world without utter compulsion — and a real alternative.”

One more quote from the book, talking about yet another bit of “falsification” regarding Jupiter and Saturn:

“The true test of Newton’s science — of any abstract claim — comes when there is a conflict between existing understanding and some fact that doesn’t fit. The failure to match the actual motions of Saturn [which was slowing down] and Jupiter [which had sped up] with what the theory seemed to say should happen posed the question: what does that conflict mean? Is that a problem, or an opportunity?”

How Does This Relate to Testing?

The question being asked with all of the above research was basically this: what to do when calculation and observation disagree?

Here’s the point: as testers, we ask ourselves that question all the time. At least in some form. What happens when what people told me should happen is not happening? What happens when people told me what should happen is happening but it seems wrong? What happens when it seems that in various meetings people are talking about the same feature but in fundamentally different ways? What if I, for the life of me, can’t figure out what value a user is getting from this even though I can confirm it’s working?

Falsification Is Not Robust

Sticking with this planetary example: this does not necessarily mean that a theory can never be falsified, but it does mean that falsifiability is not a robust criterion for a scientific method. For example, consider the sequel to the above search for a new planet. Based on the success of finding Neptune, the exact same auxiliary assumption was questioned when attempting to solve a very specific anomaly in the orbit of Mercury at its closest approach to the Sun. Astronomers followed the same experimental track and proposed another, as yet unobserved, planet between the Sun and Mercury. The planet got the name Vulcan.

And yet … no such planet could be found. We now know that the auxiliary assumption that needed to be challenged was an entirely different one. In this case it was Newton’s theory of universal gravitation itself that needed to be questioned. A bit of “quality” that, at the time, many scientists thought was unassailable.

When confronted by potentially falsifying data, either the theory itself or at least one of the auxiliary assumptions required to apply that theory must be modified. The slight hitch is that the observation or experiment doesn’t always tell us which. It’s like hunting bugs when you have only slight guides, such as the theory (a given feature) and the mechanics of the system (how the feature works in a browser or mobile context). You then have to rely on observations and experiments to guide you toward the discovery of problems and then work out explanations for those problems.
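
To ground that in day-to-day terms, here’s a hypothetical sketch. If the expectation below fails, the observation alone doesn’t tell us whether to revise the “theory” (the vip? logic) or one of the auxiliary assumptions (a stale threshold, an unrealistic fixture):

    require "rspec/autorun"

    # Hypothetical feature logic and fixture for the sketch.
    VIP_THRESHOLD = 1000

    def vip?(customer)
      customer[:lifetime_spend] >= VIP_THRESHOLD
    end

    RSpec.describe "VIP status" do
      it "grants VIP at the documented spend threshold" do
        # Auxiliary assumptions baked into this experiment: the fixture
        # resembles real customer records, and 1000 really is the
        # documented threshold.
        customer = { lifetime_spend: 1000 }

        # If this fails, candidates for revision include the theory
        # (the vip? logic is wrong) and the auxiliaries (a stale
        # threshold, an unrealistic fixture). The failing observation
        # alone doesn't tell us which.
        expect(vip?(customer)).to be(true)
      end
    end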

Consider the words of Sabine Hossenfelder in her excellent book Lost in Math:

“What I learn, however, is that Karl Popper’s idea that scientific theories must be falsifiable has long been an outdated philosophy. I am glad to hear this, as it’s a philosophy that nobody in science ever could have used, other than as a rhetorical device. It is rarely possible to actually falsify an idea, since ideas can always be modified or extended to match incoming evidence.”

Sabine then says something that has long been known to anyone who actually studies how science is done, recognizing that we don’t just verify or falsify. Rather, regarding our ideas, she says:

“We “implausify” them: a continuously adapted theory becomes increasingly difficult and arcane—not to say ugly — and eventually practitioners lose interest. How much it takes to implausify an idea, however, depends on one’s tolerance for repeatedly making a theory fit conflicting evidence.”

Implausify. There’s a term more testers should be aware of.

Jim Baggott says essentially the same thing in his cautionary book Farewell to Reality:

“A theory should be regarded as scientific if it can in principle be falsified by an appeal to the facts. But this won’t do either.”

He follows that up with:

“neither verifiability nor falsifiability provides a sufficiently robust criterion for defining ‘science’.”

Exactly. I trust my point has been made regarding how falsification is not robust.

So why do so many test pundits focus on this as the “way science is done”? It’s because they don’t truly understand how exploration and experimentation work in a scientific context. And if, as I started with, experimentation and exploration are tied directly — inseparably! — with testing, then it follows that they may not truly understand testing either.

And this is why, again in my opinion, developers have more often than not pulled ahead of testers in moving the needle forward on testing.

Again, this sounds like a great place to stop this post. But like that person who just won’t take a hint, I’ll keep plowing on a bit.

Focus on Testability

What all of these scientists — and many others I could quote — are saying is that the important defining criterion is the testability of the theory. And here treat “theory” as “explanation”, such as how we explain what kind and what extent of quality we have in a given feature, relative to various expectations. Whether we seek to verify our theory or falsify it, to qualify as a scientific theory (read: testable explanation) it should in principle be … you guessed it … testable!

This leaves us with a testability principle that I think is core for testers to understand, and it’s what my previously mentioned Testability series was mostly about. What we have is a heuristic. A testability heuristic.

The testability heuristic demands that scientific theories (testable explanations) be actually or potentially capable of providing tests against empirical facts.
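
Here’s a hedged sketch of that heuristic in practice, with SearchService and the time budget as hypothetical stand-ins: the vague claim “search is fast” becomes a testable explanation only once it’s pinned to an empirical, observable consequence:

    require "rspec/autorun"

    # Hypothetical stand-in so the sketch runs; a real suite would
    # exercise the actual service.
    module SearchService
      def self.query(term)
        ["result for #{term}"]
      end
    end

    RSpec.describe "search responsiveness" do
      # "Search is fast" is not yet a testable explanation. Restated
      # against an empirical fact (an agreed time budget), it is.
      it "answers a common query within the agreed budget" do
        budget_seconds = 0.3 # hypothetical, agreed with the team

        started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        results = SearchService.query("testing")
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

        expect(results).not_to be_empty
        expect(elapsed).to be < budget_seconds
      end
    end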

Going back to The Hunt for Vulcan, the author says:

“[Scientists] certainly knew, when the real world confounded theoretical explanation, that could simply mean that the theory was wrong. But there was another option. If something that can be measured doesn’t fit, [those scientists] reasoned, the obvious next step is to look for something else, some other fact, perhaps some new way to understand the math itself, that could haul the real world back into agreement with its mathematical representation. Put another way: something out of whack suggests that there is something else out there to discover, maybe in nature, perhaps within the abstract ideas built to interpret nature’s ways.”

Abstract that idea and see how it applies to what you do as a tester within your day-to-day context. I’ll challenge you a bit here and say that if you can’t extrapolate that to what you do (or should be doing), you probably should find another career specialty.

Previously I talked about what particle physics can teach us about testing and how testing helps us understand physics. I’m sure those who read the articles — all two of you! — wondered if I had lost my mind. The answer: of course I did! But while doing so, I think I had a point.

I’ll let someone else make that point for me. Consider Lee Smolin in his book The Trouble with Physics:

“We physicists have a responsibility to the future of our craft. Science is based on an ethic, and that ethic requires good faith on the part of its practitioners. It also requires that each scientist be the judge of what he or she believes, so that every unproved idea is met with a healthy dose of skepticism and criticism until it is proved. This, in turn, requires that a diversity of approaches to unsolved problems be supported and welcomed into the community of science.”

Replace “physicists” with “testers”, “scientist” with “tester”, and “science” with “testing”, and I believe you can see how the idea maintains its truth. Here’s another from the same author in the same book:

“Science requires a delicate balance between conformity and variety. Because it is so easy to fool ourselves, because the answers are unknown, experts, no matter how well trained or smart, will disagree about which approach is most likely to yield fruit. Therefore, if science is to move forward, the scientific community must support a variety of approaches to any one problem.”

Do you see how that applies to what we do as testers?

Actually, one more for you and then I’ll let you get out of here. This one is from Carlo Rovelli in his book Seven Brief Lessons on Physics:

“Scrutinizing and deducting from the details of reality in order to pursue something which we can’t see directly but can follow the traces of. In the awareness that we can always be wrong, and therefore ready at any moment to change direction if a new track appears; but knowing also that if we are good enough we will get it right and will find what we are seeking. This is the nature of science.”

The nature of science, indeed. And the nature of testing.

This article was written by Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.

5 thoughts on “Do Testers Understand Testing?”

  1. I get what you’re saying here. But does it start to break down when you consider that if I find a bug, it does falsify the idea that the quality is good?

    1. Well, does that truly falsify the idea that quality is good? In some small sense, possibly. But, like science, we have to situate our experiments in a context.

      If you find a bug that crashes the system and you know that system crashes should never happen and be exposed to the user, then this falsifies the idea that we have zero bugs. (But no one likely believed that anyway.) It falsifies the idea that we have a certain level of quality under certain conditions. Do those conditions matter? Do they matter enough? That’s up for debate, which is why falsification is not, by itself, robust enough as the basis of testing.

      You found a problem. Is it a problem that matters? Does it matter enough? Does it matter enough to those who will likely be making the decisions? Which, if you apply this thinking earlier, allows you to ask: am I searching for the right problems? And even earlier: do I have techniques that let me search for a wide variety of problems? And even earlier still: do I know what kinds of problems matter?

      Falsification in the context you describe does not mean the application won’t be released any more than the failure of Uranus to match predicted orbits meant Newtonian mechanics was useless.

      But let’s say it’s not a crash. Let’s say you’ve found other bugs. How much do they degrade the experience such that the value of releasing now and iterating is outweighed by the nature of the bug? It’s always an open question.

      Let’s take it the other way. Let’s say you found no bugs at all. Or at least anything you considered a bug. Does that mean there aren’t any at all? Or just none that matter? Or just none that exposed themselves to your level of investigation?

      This is what I (and those I quoted) mean by falsification not being robust.

  2. I’d be interested to see any references which demonstrate developer understanding of testing (e.g., the references to the Rails Rspec debates or the integrated/integration) or other.
    Testing in the developer world is quite superficial. e.g., ‘…you can ensure your code adheres to the desired functionality…’  https://guides.rubyonrails.org/testing.html
     

    1. In order to answer this it would be better to know what you’ve already searched out, at least to some extent, or what you’ve read.

      As just one example, just about all microservices books I’ve read lately have surprisingly good sections about testing. As another example, I mentioned three books in this post itself that I found very effective at thinking about testing. Those sources alone led me to a lot of other developer-focused things that reinforced the opinion I state in this post.

      You say “testing in the developer world is quite superficial.” I won’t doubt your experience, of course, since I don’t know what it is. Mine is very different, however. And I would argue that many could make the same claim but slightly differently: “testing in the tester world is quite superficial.”

      Any tester who feels confident stating unequivocally that developer testing is superficial presumably would have read thoughts from folks like J. B. Rainsberger or Alexander Tarlinder. There are many others but, again, I could go a long time listing books, blogs, articles, and so on. But none of that may matter to you if you’ve already come across them and those are what led to your viewpoint.

      So what’s more interesting is that you and I have clearly different experiences with developer testing. It would be interesting to see how those formed and upon what basis we came to our conclusions.
