Will AI Perform Testing?

The title of this article is a little too simplistic. The real question is closer to “Will AI Truly Perform Testing?” or perhaps “Will AI Perform Actual Testing?”

The context here is that recently Jason Arbon, the CEO of test.ai, made an interesting post on LinkedIn for a STAREAST talk on “The AI Testing Singularity.” He posted the following visual:

Jason was offering this as, in his words, a “rough timeline for when ‘AI’ does most software testing.” To be fair and accurate, this was presented as something he was still thinking about. That said, I appreciated the fact that Jason was willing to share his initial musings on this in a public venue because I think it’s at these phases when we can all help each other see how we’re thinking about this stuff.

A Spectrum of Testing

My response to Jason was that I would rather we look at this idea along the same spectrum that we use for testing done by humans. As one example of that spectrum:

practice-driven --> principle-driven --> intuition-driven --> theory-driven

Everything that people talk about regarding artificial intelligence and machine learning is, so far, firmly at the left end of that spectrum. That is where (arguably) the least interesting problems for testing currently are, and where we already have solutions that require neither the “training” that an AI would need nor elaborate models and algorithms. This is the area where testing is treated primarily as an execution activity.

But as you move toward the right end of that spectrum, you get more into the notion of quality, experience, and value. And that’s where testing is treated as a design activity. That’s where testing helps us put pressure on our design thinking. This is where you think about notions like testability, which is driven by observability and controllability. This is where you deal with different abstraction levels of humans building complex things. That’s where the truly interesting challenges are.

So if we’re talking about AI doing “most software testing” we need to make sure people are not using a very limited (and limiting) view of the word testing.

Testing Abdicated to Tooling

To be very clear here, I do sincerely appreciate the enthusiasm for looking at testing via a lens of artificial intelligence and machine learning. I think that’s important and I don’t want to dampen or dismiss investigations in these directions.

What I’m less enthused about is people forgetting the lessons of the past. We have already seen what happens when humans abdicate the full range of a human activity, like testing, to machines and tooling. Some would argue it ceases to be “testing” any more and becomes “checking.” Others would simply argue that it becomes a much more compromised form of testing, one that helps us see what works rather than helping us discover the numerous ways that things may not work.

And let’s think about where this tooling really doesn’t help us.

There are two times we make the most mistakes when building complex things like software applications. It’s when we are talking about what to build and then when we are building it. Everything after that is essentially playing catch-up. That’s why shortening the feedback loop along cost-of-mistake curves is so important for testing.

Much of the thinking I’ve seen around artificial intelligence and machine learning doing “most testing” is well outside of that curve. That’s not to say that techniques around AI and ML won’t support testing, but it is calling into question how much of testing we think these techniques can do and how much of testing should be entrusted to them, regardless of how much they can do.

So, by all means, be aware of the possibilities. But also beware of the hyperbole.

A Test Algorithm? A Quality Model?

Let’s approach this from another angle. When someone tells you that AI or ML is solving some problem for you, they usually mean they have encoded some aspect of the problem-solving process into a model or an algorithm. So when those same people tell you that AI or ML is solving some testing problem for you, the first thing to ask is: What problem exactly do you mean? But the next thing to ask is: How exactly have you encoded testing?

I say this because lately there’s been a lot of conflation around the idea of what’s a model and what’s an algorithm. And sometimes where and to what extent data fits into either of those gets to be murky.

Murky as all that might be, certainly there has to be an objective, right? If someone is building us some AI/ML tooling, we are certainly going to provide them with an objective by which we will judge how well the tooling is doing.

So consider this objective: test this application.

If I feed that to my AI, with ML operating behind the scenes, what does this mean? To answer this, someone would have to ask: What would it mean to a human?

You will notice that many of the loudest proponents of AI and ML test tooling rarely engage with that question very well, assuming they do so at all. When they do, they often frame it around recognizing elements to interact with in an application.

Be that as it may, let’s say we have an objective. That objective is then translated into a mathematical problem amenable to computing. This computational problem, in turn, is solved using one or more machine learning algorithms: specific mathematical procedures that perform numerical tasks in more-or-less efficient ways. Keep in mind that no data is involved at this point. The algorithms, by themselves, don’t contain data.

The machine learning algorithms are then “trained” on a data sample, usually selected via human discretion from a wider pool of data points. In simple terms, this means that the sample or “training” data is fed into the algorithms to determine patterns. Whether these patterns are useful or not (or, often, whether they have predictive value) is verified using “test” data. This is a data set different from the training sample, though usually selected from the same data pool.

So what just happened here?

We got ourselves a machine learning model. We got an algorithm, along with the training and testing data sets, that is alleged to meet the objective. The model is then turned loose to help fulfill the objective.
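To make that pipeline concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the data pool of response times, the "fast" and "slow" labels, and the very simple nearest-mean algorithm. It is not any vendor's approach, just the smallest thing that shows an algorithm, a training sample, and a separate test sample combining into a model.

```python
import random


def split_sample(points, test_fraction=0.25, seed=0):
    """Shuffle a labelled sample and split it into training and test sets."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_data):
    """The algorithm: learn the mean feature value for each label."""
    sums, counts = {}, {}
    for value, label in training_data:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Assign the label whose learned mean is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


def accuracy(model, test_data):
    """Fraction of held-out points the model labels correctly."""
    hits = sum(1 for value, label in test_data if predict(model, value) == label)
    return hits / len(test_data)


# Hypothetical data pool: (response time in ms, label chosen by a human).
pool = [(float(ms), "fast") for ms in range(100, 200, 5)]
pool += [(float(ms), "slow") for ms in range(800, 900, 5)]

training_data, test_data = split_sample(pool)
model = train(training_data)          # patterns come from the training sample
score = accuracy(model, test_data)    # usefulness is checked on the test sample
print(model, score)
```

Note what the objective actually was here: label a response time as fast or slow. The algorithm contains no data until `train` is called, and the score only tells us how well the model meets that narrow objective.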

So what do I do with it?

Well, as one example, I can have my AI Tester build up a model of my application by looking at widgets and being able to recognize them. But this is a solved problem now for automated tooling. I can have a widget recognized by, say, the attributes that define it. Okay, but what if the attributes change? Then I update my definition of that object. For AI, this means I might have to train again.
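The attribute-based recognition described above can be sketched without any AI at all. The page model, the attribute names, and the helper functions below are hypothetical, invented only to show the shape of the idea: a widget definition is a set of attributes, and when the attributes change, the definition has to be updated.

```python
def matches(definition, element):
    """True when every attribute in the definition has the same value on the element."""
    return all(element.get(key) == value for key, value in definition.items())


def find_widget(definition, page):
    """Return the first element on the page that satisfies the definition, or None."""
    return next((element for element in page if matches(definition, element)), None)


# Hypothetical page model: each element is a dict of attributes.
page_v1 = [
    {"tag": "input", "id": "username"},
    {"tag": "button", "id": "login", "text": "Log in"},
]
login_button = {"tag": "button", "id": "login"}
found_v1 = find_widget(login_button, page_v1)

# A later release renames the id, so the old definition no longer matches.
page_v2 = [
    {"tag": "input", "id": "username"},
    {"tag": "button", "id": "sign-in", "text": "Log in"},
]
found_v2 = find_widget(login_button, page_v2)

# The maintenance step: update the definition of the object.
login_button_updated = {"tag": "button", "text": "Log in"}
found_updated = find_widget(login_button_updated, page_v2)

print(found_v1, found_v2, found_updated)
```

With conventional tooling the fix is an edit to the definition. With a learned recognizer, the analogous step may be gathering new examples and training again.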

But wait: all of what I just described was not the objective I started with. What I just described is a different objective. This different objective was: recognize the widgets in this application.

Which is a far cry from an objective of: understand how the objects in this application work when interacted with by a human.

Which is an even further cry from the objective of: understand how the human is deriving value from what the application does when the objects in the application are interacted with.

The Objective Problem

This is what I call the “objective problem” with AI and ML as it relates to solving problems, such as determining quality or finding problems that degrade quality.

To make this point explicit, consider again our original objective: test this application.

Now frame the objective this way: determine that there are no critical value threats in this application.

Or: determine that there are no significant quality degradations in this application.

How do the model and algorithm I just talked about evolve to handle those objectives? If you have no idea at all, fear not. No one else really does either.

The Problem As I See It

At this point, you might say, “Jeff! You ludicrous Luddite! You bastion of the backwards-thinker! You’re simply not willing to engage with the future.”

Ah, dear reader, but I feel that I have. I have written quite a bit here on examples of thinking about machine learning and artificial intelligence in a testing context. I talked about the tester role in the context of machine learning. I’ve also talked about testing learning systems, asking whether an AI can become a tester, thinking about how to frame automation-based AI and making sure that testers can actually test an AI before they rely on one to do their testing for them.

In those posts, I’ve actually done grunt work. I’ve written algorithms and constructed models. I’ve created tooling that acts as a type of test mechanism against those algorithms and models. As such, I am very well aware of how easy it is to get all of that very wrong.

Yet, as you can hopefully tell, I have a lot of fun engaging with this topic. My concern, however, is that we are witnessing just another form of the technocracy here.

And yet … way too many testers are simply accepting it.

We are seeing another example of the danger of the technocrat tester.

And yet … way too many testers don’t seem all that concerned about it.

As testers, part of our job is to sound the warnings; to dispel the illusions that people have about our technology; to allow people to make better decisions by being better able to reason about things.

And yet … way too many testers aren’t applying these very lessons when they are likely going to be needed the most in the era of so-called “intelligent machines.”

We are on the precipice of a society being led by algorithms that are opaque; by “learners” and “models” that don’t understand ethics and wouldn’t care even if they did. Testing is one of our hopes of keeping us — and our technology! — honest. But if we abdicate testing to these algorithms, learners and models, then how are we going to do this?

Do we really want a world run on technology wherein we can say “humans not required” as the above image seems to indicate?

I hope not.

So I’m encouraging testers to be engaged with AI and ML, study it, play around with it. Actually know what you are talking about rather than just being for or against the technology.

Be excited about what this technology can be but also be realistic about what it can’t do. Or, perhaps, what it shouldn’t do even if it can.

4 thoughts on “Will AI Perform Testing?”

Butch Mayhew says:

18 April 2019 at 1:57 pm

Thanks so much for your thoughts around this topic, and for all the work you’ve put into formulating these thoughts. I look forward to how this conversation plays out across the testing community.

priti says:

23 April 2019 at 2:32 am

Hello,
I am new to testing, but your article is really helpful for me. Thank you so much for this…

Tatiana says:

23 April 2019 at 6:34 am

Love it! I particularly like how you’re welcoming to Jason’s ideas but also being politely critical and skeptical. I think that idea of framing testing as execution and design is interesting because it does draw a distinction between how AI and machine learning in general is conceived. So if the idea is that testing is encoded as a model, as you say, what does that actually mean? Can we encode execution? I’m guessing probably to some extent. But can we encode design? I’m not sure about that.

William Hruska says:

6 October 2020 at 1:48 am

Nice. I just loved it.
