Forgetting How To Test

My original title for this post was “Thinking Clearly About Automation” but I realized there was a wider ambit to that discussion. We have a technocracy that likes to turn testing into a programming problem, suggesting that “manual testing” (testing done by humans) should be automated away as much as possible. That’s a danger. Some testers have combated this by suggesting automation has nothing to do with testing. I believe that’s also a danger.

Testers often argue about testing, which is fine. But I feel they often do so in ways that are ineffective, which is less fine. They engage in frivolous nomenclature battles (as distinct from valuable semantic debates) to the detriment of our discipline as a whole, often stigmatizing something as “not testing” rather than recognizing it as “one facet of testing.” Automation is one of the areas where I think testers are still very muddy thinkers.

Why am I focusing on automation? Because it’s sometimes posited as the reason why we have an industry that is forgetting how to test. Too much of a focus on automation is often blamed for testing being compromised. There’s some truth to that but I feel the arguments used to justify the claim are often faulty. So let’s see if all of us — myself, in particular — are thinking clearly about all this.

First, Some Examples…

Here’s a sentiment I recently saw expressed:

“Automation testing is such a mind blowing silly thing. A tester has to find bugs in his code to make sure he [finds] the bugs in developers code.”

I would respectfully argue (and, in fact, did so argue) that the first statement doesn’t follow from the second. To put it in another context, many testers, including those testing without the use of tooling, miss bugs. This is because they have “bugs” in their thinking: cognitive biases, as just one example; malformed theories of error as another.

Would this mean that “testing is such a mind blowing silly thing”? Presumably not. What it means is we have to learn to apply a skill to testing.

The same applies to automation. Automation, in this context, is another skill in the form of development done specifically with the intent to support testing. Yes, of course we have to find bugs in our frameworks. But why is that presumed to be a bad thing, as the above statement implies?

Doing this also allows us to better understand why bugs creep into code in the first place. We learn to think more like developers in the sense that we can understand how and under what conditions developers are likely to create bugs in the code they write by reflecting on how we created them in ours.

In the technology industry we turn ideas into software via the expression of code. That expression can obviously be flawed and that is part of what testers look for. What better practice is there than in those things we ourselves write?

Another sentiment I saw expressed:

“Test automation is not for testers, it is a development activity.”

Notice again how the first doesn’t follow from the second. Yes, test automation is a development activity. It does not follow that this is not for testers. The presumption seems to be that testers cannot be programmers. I would agree that they cannot be programmers and testers at the exact same time. But they can certainly interleave those activities, shifting between the polarities.

This sentiment also seems to conflate programming with development, which is something else testers routinely seem to do. (Admittedly, the industry as a whole hasn’t exactly helped to frame this distinction.)

There’s a commonality here. Both are limited, and limiting, forms of thinking. Both state a conclusion up front and then follow it with a statement that doesn’t logically support that conclusion.

Reframing the Sentiments

Testers need to combat sentiments like the above. The rest of this post is dedicated to showing how I believe they should do that: by reframing those sentiments and, ideally, by not falling into the trap of making them in the first place.

I do want to stick with an automation focus here so let’s start with this …

Yes, Test Automation is For Testers

Code-based developers who have become specialists in their discipline are very good at considering testability in relation to design. Code-based automation contains the same relations between testability and design, which means there is a career path here for testers as well. This is, in fact, a career path that will help them collaborate with developers better.

This is why automation is very much in the testing wheelhouse and why, in fact, testers are (or should be) developers. They are just developers that have a test specialty rather than a programmatic specialty.

So automation can, and should, be seen as one intersection between testing and development.

Yes, Automation is (a form of) Testing

There is a growing segment of testers that argue that testing is purely an activity that requires cognition and thus automation can’t be considered testing.

Likely we’ve all seen the debates about “testing vs. checking.” For those who have heard or read me before, you know I think this is a flawed argument, although one where I do revisit my thinking from time to time. (I even full-on went with the distinction for an exploration example.)

To me, reframing automation as not being a form of testing is about as silly as arguing that machine learning isn’t at all about learning. It is. It’s just not about human learning. Learning, as a concept and a term, can be applied broadly. The same goes for human intelligence and artificial intelligence. Both can be a form of intelligence; we don’t need to repurpose a term; we just need to qualify the existing term. The same applies to testing. Automation is a form of testing. It’s just not human testing.

We Can Automate Some Testing…

You can automate testing in that, as one example, we know that evolution has performed testing for eons. And that requires no cognition. (So far as we know, anyway.) Yes, it’s true that testing of this sort can’t be done by the tools we, as humans, have built to automate actions against browsers and mobile devices given that we don’t have evolution’s breadth and depth of scope, not to mention time.

What is very much a fact is that you can automate testing as an execution activity. Yes, it’s a very limited form of testing. Yes, it’s purely algorithmic. That doesn’t mean it’s not “testing.” It doesn’t mean we have to use some other word (“checking”) to describe it. But it does mean we can automate one aspect of testing, which is execution.
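
To make that split concrete, here is a minimal sketch in Python. The function under test (`apply_discount`) and the cases are hypothetical, invented purely for illustration. The point is only where the boundary sits: a human chose the inputs and decided what counts as correct, while the loop that applies and compares them is the part that is automated.

```python
def apply_discount(price, percent):
    """Hypothetical function under test: reduce a price by a percentage."""
    return round(price * (1 - percent / 100), 2)


# Design (human): which inputs matter and what the right answer is.
CASES = [
    ((100.0, 0), 100.0),
    ((100.0, 25), 75.0),
    ((19.99, 100), 0.0),
]


def run_checks(func, cases):
    """Execution (automated): apply func to each input and compare results."""
    failures = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


failures = run_checks(apply_discount, CASES)
print(f"{len(CASES) - len(failures)} passed, {len(failures)} failed")
```

Nothing in `run_checks` knows why those three cases were chosen or whether they are the right ones. That knowledge lives entirely outside the algorithm.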

We Can’t Automate All Testing…

What we can’t automate is testing as a design activity.

You could argue evolution figured that part out but it had billions of years and trillions upon trillions of components to operate on. And it still makes lots of mistakes and has lots of inefficiencies. As Robert Ford tells us in Westworld:

“Evolution forged the entirety of sentient life on this planet using only one tool … the mistake.”

But let’s consider that this idea also resonates with developers. Consider this:

“Modern IDEs are getting ‘helpful’ enough that at times I feel like an IDE operator rather than a programmer. The behavior all these tools encourage is not ‘think deeply about your code and write it carefully,’ but ‘just write a crappy first draft of your code, and then the tools will tell you not just what’s wrong with it, but also how to make it better.’”

So wrote Vivek Haldar, a software developer with Google. (See “Sharp Tools, Dull Minds”.) He feels like an operator rather than a programmer. And, likewise, testers can feel more like a programmer than a tester if they don’t think deeply about their automation code and what purpose it’s serving in terms of supporting testing.

What we also can’t automate is testing as a framing activity, such as how testing helps us provide a narrative and generate historical context. This is the testing that helps developers decide on a particular way of crafting an architecture or how testers help business frame requirements to remove ambiguities, inconsistencies and contradictions.

Automation Bias Has Been a Problem

The above notion of “we can automate some but not all” is important because it forces us to decide what that means. This is something that has happened in various disciplines throughout recent history and we see what happens when automation is not thought about clearly and is thus over-relied upon.

Consider this data point: a team of researchers at City University London conducted a review of mammography data. (Mammography is the process of using low-energy radiation to examine the human breast for diagnosis and screening of various anomalies, such as cancers.) This team found clear evidence that automation bias had a much greater effect on radiologists and other image readers than had previously been thought. Specifically, the researchers found that while computer-aided detection did, in fact, tend to improve the reliability of “less discriminating readers” in assessing “comparatively easy cases,” it actually degraded the performance of expert readers in evaluating tricky cases.

That last sentence is critical. What it meant is that when relying on the software (the automation), the experts were more likely to overlook certain cancers.

That research was originally published back in 2013 in the article “How to Discriminate between Computer-Aided and Computer-Hindered Decisions: A Case Study in Mammography” in Medical Decision Making. Consider how Geoffrey Hinton, one of AI’s prominent advocates and researchers, said in 2016 that “it is quite obvious we should stop training radiologists” after the idea that IBM’s Watson Health would provide automation around health care.

This notion of automation compromising framing has existed for quite some time. I could point to dozens of studies in different areas of discipline, all involving this impact of automation on human thinking.

What I rarely see is specialist testers doing likewise.

Reactionary Automation Bias Has Also Been a Problem

The reason I harp on all this is because testers fight a losing battle in the industry when they keep talking about how we can’t automate testing.

We can and we do.

When testers argue otherwise, they don’t seem to realize that they are, in the perception of many non-testers, discounting reality. And reality denial is rarely a winning argument. This is (in part) why the technocracy has risen and why developers have contributed more to test thought than testers over the course of at least a decade, and probably more.

And this is a problem because while it’s important that testing is a core part of development, we do still need test specialists; those who specialize in testing as a craft and a discipline. (I prefer “specialist tester” to “dedicated tester.” Someone can be dedicated to testing and not be a specialist.)

Where the “Checkers” Get It Right

Here’s where I do agree with the “checkers” in terms of the focus of their angst. Automation is becoming a bit of an issue as a mono-focused element of how people see “testing.” Yes, automation can undoubtedly be effective as a multiplier of effort. The question is whether it’s being used to multiply the right effort.

This gets into a long-standing debate about “automation” in general which in turn goes back to an even longer standing debate about “mechanization.” On the latter word, I think Sigfried Giedion, in his 1948 Mechanization Takes Command, defined this quite well:

“Mechanization is an agent — like water, fire, light. It is blind and without direction of its own. Like the powers of nature, mechanization depends on man’s capacity to make use of it and to protect himself from its inherent perils. Because mechanization sprang entirely from the mind of man, it is the more dangerous to him.”

James Bright, in his 1958 Automation and Management, nicely distilled the former term:

“Automation simply means something significantly more automatic than previously existed in that plant, industry, or location.”

The challenge in the software industry in particular is that software provides a mediating influence between us and whatever we are interacting with. That’s an important point for testers to be arguing.

Yes, you could argue that a shovel provides a mediating influence between the human and the ground, compared with just using our hands to dig, and that a bulldozer does so even more than a shovel. In each case, we are drawn a little further away from the mechanics of what we are doing. We don’t see that as harmful in the digging context. But with software? Well, let’s talk about that buffer that exists when automation is brought in.

The Automation Buffer

Let’s consider one of the things that people like to talk about, especially when there’s a rare occurrence of an accident.

With so-called “driverless cars” or autonomous vehicles, the human has shifted from being driver to being passenger. (Remember the above comments about being an operator rather than a programmer? A programmer rather than a tester?)

In the case of these cars, we say that the responsibility has shifted from human to technology or from people to software. And what that really means, to many people, is from a reasoning agent (human) to an unreasoning algorithm. In such a world, the making of judgments is presumed to become nothing more than data-processing. And, to be sure, that’s the fear of many who argue against artificial intelligence.

This, of course, presumes that human judgment is somehow more than just data-processing. I won’t argue that point here but notice how we did tease out the underlying assumption.

The problem isn’t the automation itself; it’s that mediating influence I just mentioned above: the buffer that automation creates between the human and the thing the human is working with.

We reduce the intricacies and contingencies of interactions with a complex thing (software, car, etc.) to a set of instructions. Or, rather, to source code that implements an algorithm. This is an algorithm that doesn’t think or feel or even care. It computes.

Consider that tacit knowledge, which is also sometimes called procedural knowledge, refers to all the things we do without really thinking about them. It’s very hard to describe exactly what you do in many situations without resorting to generalizations and abstractions. The exact processing that occurs mentally is outside of our awareness.

But, again, notice that this doesn’t mean there isn’t some exact processing (i.e., an algorithm). It’s just that the means by which it operates is not something consciously available to us.

Explicit knowledge, which is also sometimes referred to as declarative knowledge, covers the things you do that you can actually write down in a non-ambiguous way. As Nicholas Carr says in The Glass Cage:

“The boundary between the explicit and the tacit has always been a rough one — a lot of our talents straddle the line — but it seemed to offer a good way to define the limits of automation and, in turn, to mark out the exclusive precincts of the human.”

For you physics people, this is sort of like the boundary between the quantum and non-quantum. There’s some line — one that we haven’t actually found or defined very well — where the micro becomes the macro; where the effects of the quantum start or stop being as relevant, depending on which direction you are considering.

Interestingly, I find that has a lot of parallels in terms of how automation is situated within the boundaries of tacit and explicit knowledge and action.

This is perhaps another way that testing can help us understand physics but also how physics helps us understand testing. A delicious symmetry.

So automation provides a buffer of some sort between the explicit and the tacit. That’s an important point to understand because going with what I said earlier, we can (usually) automate the explicit (“as an execution activity”) but not the tacit (“as a design and/or framing activity”).

Replacing Testing with Automation?

In the context of artificial intelligence, people are often interested in asking and answering this question: can computers replicate our ends without replicating our means?

In the testing discipline, we are basically asking something similar: can algorithms (i.e., automation) replicate our ends without replicating our means?

In other words, if I employ automation, can I get the end result of what a tester would provide even if the automation can’t simulate all the means by which the tester would provide it? That is a much better discussion point than harping on about how automation isn’t testing. And I say that because, as I’ve shown above, people are dealing with this challenge in technology in a variety of contexts, from the “help” of IDEs to the concern about cars that drive themselves.

In fact, let’s go back to the cars. When a driverless car makes a left or right turn in traffic, it’s not tapping into some storehouse of intuition and skill regarding what it’s doing. What it is doing is following a program. But … and this is a key point before testers get too smug! … while the strategies are different, the outcomes, for practical purposes, are often the same.

Going back to The Glass Cage, Carr correctly asserts:

“In some cases, the unique strengths of computers allow them to perform what we consider to be tacit skills better than we can perform them ourselves.”

I showed that a bit when I first started talking about testing and AI, in which case AI was discovering bugs that humans found but, crucially, discovering some that humans had not.

Okay, but … does that same concept that Carr is talking about hold true in testing?

Reframe Nomenclature Issues into Semantic Issues

This is the problem. Many test pundits out there seem to be arguing against the automation of testing in the same way others argue against the automation of driving. I keep saying it’s a faulty argument but maybe what I should say is that it’s a faulty framing device.

To reinforce that point, jump back to 1997 when IBM’s Deep Blue chess-playing supercomputer, which could evaluate a billion possible moves every five seconds, beat the world champion Garry Kasparov. By comparison, an autonomous vehicle can usually process a million environmental readings a second. My somewhat oblique point here is that the same people who argued against chess playing computers are now making the exact same arguments against autonomous cars. And those are often the same people making faulty arguments that automation can’t be a form of testing. And no one is listening. And while no one listens, automation becomes that much more entrenched, testers seem that much less relevant, and a technocracy continues to flourish.

Yeah, I bolded that because I can’t stress the point enough.

For anyone who has actually built, worked with, and tested artificial intelligence and learning systems, you know that it is factually accurate to say that when computers get quick enough and can process enough data quickly enough, they do begin to mimic our ability to spot certain patterns and to make judgments based on those patterns. That is, for all practical purposes, learning from experience.

So if you want to argue that “machines can’t learn” and rail against “machine learning”, realize you might be better off just saying that “Learning is not unique to humans. Machines can learn, but not in the way that humans can. It’s a different type of learning which means we have to expand what we mean by the term ‘learning.’”

And that’s exactly the same argument I use with testers when they talk about “checking” and “testing.” The testers who argue that “automation can’t test” sound just like the people who say “machines can’t learn.” And no one is really listening to either group. So, again, let’s change the message; let’s reframe.

Nomenclature is the words we’re using; semantics is about what those words mean. “Learning” and “testing” are broad terms (suitcase words, I would argue). When people say they don’t want to argue semantics, they’re more often referring to a lack of desire to deal with too many nomenclature distinctions.

Automation is Pervasive

I don’t want to lose sight of that buffer I was talking about. Most of us know computers and automation are ubiquitous. We’re using algorithms to diagnose diseases, construct buildings and bridges, tutor students and grade homework, teach us languages, evaluate evidence, decide on how traffic and trains are routed, drive cars, fly and land planes. And, of course, to test software.

With all of those, we can ask what buffer is being put in place between the human and what the human is doing. And we can ask if that buffer is harmful. Let’s just take one example. At the start of 2013, the Federal Aviation Administration released a notice called a SAFO (“Safety Alert for Operators”). This was sent to all United States airlines and other commercial air carriers. The document said, in part:

“This SAFO encourages operators to promote manual flight operations when appropriate.”

Key thing there: manual flight operations. But why was this being called out?

It turns out the FAA had collected evidence (from crash investigations, incident reports, and cockpit studies) that suggested something quite alarming. The evidence was overwhelming in showing that pilots had become too dependent on autopilot mechanisms and other computerized systems in their planes. In other words: automation.

According to the SAFO, this automation had led to and would continue to lead to “degradation of the pilot’s ability to quickly recover the aircraft from an undesired state.”

But notice that no one was denying that these automated systems were flying the plane (execution) any more than those mammography algorithms were automating the discovery of cancers (execution) or that algorithms were in fact driving cars (execution) or that machines were in fact learning (execution).

In order to replace a human (even if only temporarily), an automated system — like that of the plane or a car or a diagnostic tool — first has to replicate a human, or at least some aspect of a human’s ability. That’s what is happening in the above cases.

This happens with testing as an execution activity as well.

And it helps to keep in mind the following:

“Automation does not simply supplant human activity but rather changes it, often in ways unintended and unanticipated by the designers.”

That’s from Raja Parasuraman and colleagues in a 2000 paper called “A Model for Types and Levels of Human Interaction with Automation”.

Nicholas Carr said something similar:

“Rather than opening new frontiers of thought and action to its human collaborators, software narrows our focus. We trade subtle, specialized talents for more routine, less distinctive ones.”

So notice how we can say both things: yes, we can automate testing. But only a part of what testing is.

And like other areas where automation is applied, sole reliance on this automation may lead to a “degradation of our ability to quickly recover from an undesired state.” It may “change human activity” and we need to be aware of what those changes are and whether, and to what extent, that change matters to us.

For example, does that change hinder our ability to determine some aspects of quality? Does it “narrow our focus” in terms of the problems we seek out?

All of this is the key to thinking clearly about automation because it’s making it clear that automation does provide a buffer between what we want to do and the mechanisms by which we try to do it, particularly when those mechanisms come between us and the thing we are working with. That buffer may also impact the decisions we make regarding the outcomes.

But Automation Is Good, Right?

That point about outcomes is a strong one, I think. Consider that automation’s focus in testing is often on enhancing speed and efficiency. As such, it’s a focus that’s determined more by a profit motive than by any particular concern for “how well we are seeking out problems.”

Automation is often touted with the mantra “this frees up testers to do more.” But that only works if testers will actually “do more.” Yet, testers often have very compromised views of testing, which is what I started off this post with. Many testers, especially those building their careers in the technocracy, do only see testing as an execution activity.

And that’s a problem!

Because those testers are more likely to blindly equate automation with all testing rather than just some testing. These are the testers that aren’t going to be swayed just because you decided to call what they do “checking.” And if testers aren’t going to be swayed, I guarantee you non-testers are equally unlikely to be swayed.

There are way too many testers who seem to only talk about automation. Connect up with a lot of testers on, say, LinkedIn, and the vast, vast majority of posts are going to be about automation and only that.

These are the testers that can tell me twenty different ways to work with Selenium effectively but who, at the same time, confuse “random testing” with “ad hoc testing”, essentially assuming the word “ad hoc” means “random.”
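
The distinction is easier to see in code. What follows is a minimal sketch, assuming a hypothetical function under test (`normalize_whitespace`) and an input alphabet I picked for illustration. Random testing, as I’m using the term here, is deliberate: inputs are sampled from a defined input space, checked against a stated property, and (with a fixed seed) reproducible. Ad hoc testing is improvised and unplanned, which is a different thing entirely.

```python
import random


def normalize_whitespace(text):
    """Hypothetical function under test: collapse runs of whitespace."""
    return " ".join(text.split())


def random_test(func, trials=200, seed=42):
    """Random testing: sample inputs from a defined space, check a property."""
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    alphabet = "ab \t\n"       # assumed input space for this sketch
    failures = []
    for _ in range(trials):
        length = rng.randint(0, 12)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        result = func(text)
        # Property: normalizing twice changes nothing, and no double
        # spaces survive in the output.
        if func(result) != result or "  " in result:
            failures.append(text)
    return failures


failures = random_test(normalize_whitespace)
print(f"{len(failures)} failing inputs")
```

Notice that the “random” part is only the sampling. The input space, the property, and the decision that this property is worth checking were all designed.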

These are the testers that can tell me all about locator strategies and SelectorBuilder patterns, but who can’t describe what a domain-to-range ratio is or why it matters for testing or who often don’t understand the concept of semantic fault distance and why that has relevance.
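
For those who haven’t met the term, here is a minimal sketch of the first of those ideas. I’m assuming the definition commonly given in the testability literature: the domain-to-range ratio is the number of distinct possible inputs divided by the number of distinct possible outputs. The two functions below are hypothetical and chosen only to show the contrast over a small, finite domain.

```python
from fractions import Fraction


def domain_to_range_ratio(func, domain):
    """Ratio of distinct inputs to distinct outputs over a finite domain."""
    inputs = set(domain)
    outputs = {func(x) for x in inputs}
    return Fraction(len(inputs), len(outputs))


def is_even(n):
    """Hypothetical example: collapses every integer onto True or False."""
    return n % 2 == 0


def successor(n):
    """Hypothetical example: every input maps to its own distinct output."""
    return n + 1


domain = range(100)
print(domain_to_range_ratio(is_even, domain))    # 100 inputs, 2 outputs
print(domain_to_range_ratio(successor, domain))  # 100 inputs, 100 outputs
```

Why it matters for testing: when many inputs collapse onto few outputs, information is lost along the way, so an incorrect internal computation is more likely to still produce a correct-looking output. A high ratio is a hint that the output alone may hide faults and that more, or different, test inputs may be needed.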

Does Automation Help Us Forget?

Going back to the airline example, let’s take it back even further than 2013. Back in 2011, Rory Kay, who was a long-time United Airlines captain who also served as the top safety official with the Air Line Pilots Association, stated his fear that the aviation industry as a whole was suffering from “automation addiction.” He said something quite interesting:

“We’re forgetting how to fly.”

This was even written about back in 2009 when Matthew Ebbatson talked about “skill fade” in mental and motor abilities when too much automation was relied upon.

So that’s what we, as specialist testers, have to ask about our industry. Are we seeing skill fade in terms of testing? And is this due to an increased focus on automation? And, even if that’s the case, is that tide stemmed in any way by simply saying automation doesn’t do testing but only does checking?

In the end, thinking clearly, and in a nuanced way, about automation means asking what Rory Kay did but in another context:

Are we forgetting how to test?

The answer: yes, quite possibly we are. And while we do so, we’re learning very unproductive ways to talk about testing. And thus to promote testing. And this is happening in a time when there are more pressures than ever to remove humans from testing. And that’s happening at a time when we are ever more dependent on various forms of technology.

Thus we are at a time where forgetting how to test is not just a technical dilemma, but an ethical and moral dilemma.
