So, not surprisingly, the AI test tooling community didn’t want to engage on my AI test challenge. They saw it as being inherently unfair. And, to a certain extent, it could be. But what this really showcased was that people are talking about AI and test tooling with way too much hyperbole in an attempt to gain traction. So was my test challenge unfair? Is there too much hyperbole? Let’s dig in a bit.
Was the Challenge Unfair?
Regarding the challenge, it could be used in different ways. The way that I would be curious to see, at minimum, is whether an AI-based test tool offers any gains in effectiveness or efficiency over traditional automation tools.
Consider this: I can write automation right now that will execute my Stardate Calculator and my Overlord. In fact, I have.
Here are the feature files for stardates and overlord. Here are the files that contain the execution logic for stardate steps and overlord steps. Here are the models for the execution, in the form of stardate page objects and overlord page objects.
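To make the shape of that stack concrete, here is a minimal page-object sketch in the same spirit. All names and selectors below are hypothetical illustrations, not the actual Stardate Calculator code linked above, and the driver is a stand-in rather than a real WebDriver:

```python
class FakeDriver:
    """Stand-in for a real WebDriver; it just records interactions."""
    def __init__(self):
        self.actions = []

    def fill(self, selector, value):
        self.actions.append(("fill", selector, value))

    def click(self, selector):
        self.actions.append(("click", selector))

    def read(self, selector):
        # A real driver would query the DOM; here we return a canned value.
        return "48315.6"


class StardatePage:
    """Page object: locators and interactions live here, so the step
    definitions layered above it can stay declarative."""
    YEAR_FIELD = "#year"
    CONVERT_BUTTON = "#convert"
    RESULT_LABEL = "#stardate"

    def __init__(self, driver):
        self.driver = driver

    def convert_year(self, year):
        self.driver.fill(self.YEAR_FIELD, str(year))
        self.driver.click(self.CONVERT_BUTTON)

    def stardate(self):
        return self.driver.read(self.RESULT_LABEL)


# A step definition then becomes little more than a thin call-through:
def when_i_convert_the_year(page, year):
    page.convert_year(year)

driver = FakeDriver()
page = StardatePage(driver)
when_i_convert_the_year(page, 2371)
print(page.stardate())
```

The point of the layering is that each piece (feature file, step, page object) has one job; any AI tool claiming to improve on this has to show which of those layers it shrinks or removes.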
Would I be able to have less of all that if I were using an AI-based test solution? Would my code be cleaner? More expressive? More understandable? Would code with an AI-based test tool run faster? Or would it respond better when and if the applications changed?
Key point here to the pundits: Don’t just tell me about it! Show me! I’m willing to help AI pundits experiment and do so in a way that we can all collaborate on. That’s how science works. And we are dealing with a science here. So let’s start acting like it.
So, quite literally, the AI Test Challenge can focus just on the questions I asked above: show that the automation I’m doing above is made more effective or efficient by an AI.
No one took me up on even that minimal challenge.
Yet, according to some pundits, we’re in the “post-modern testing” era and, according to others, we’re “trashing determinism” and, according to yet others, these AI tools are on their way to “replacing human testers.”
But if you can’t even meet me halfway on the above simple part of the challenge, what hope do you have for the much more complicated part of determining if the Stardate Calculator or the Overlord is adding value? This is what my challenge was attempting to highlight by focusing on finding the value (testing) and seeing how people correlated that with the automated execution.
By the way, you might notice that my Veilus application states it is “A Test App for Tapestry” where Tapestry is one of my own (non-AI) test supporting tools that I wrote and made available, along with Veilus, so that people could determine if my experiment was valuable. I expect no less from those promoting AI tooling.
We’re seeing the rise of another technocracy here, folks. And just as most AI pundits don’t learn from the previous two AI Winters, it seems many of those engaging with testing haven’t learned from previous attempts at establishing a technocracy.
Imagine you hired someone. And all that person could do is exactly what you told them. They could do it over and over. But they would never vary from what you told them. And even if they did vary, they would only vary in exactly the way you told them to vary. And they wouldn’t be observing anything except what you specifically told them to observe.
That’s automation, basically.
It simply does exactly what it was told to do and it can look for a range of things that it was exactly told to look for. Put another way: automation doesn’t think. It doesn’t reason. It doesn’t interpret. It doesn’t make value judgments. It won’t be curious about something it sees (or doesn’t see).
It really is as simple as that.
Automation can provide confirmation (this is working) or it can provide falsification (this is not working). It can only do this in the context of being purely formulaic; by doing nothing more than following a script. It may stumble upon a bug. It may even be a new bug. But it will not have reasoned its way to that bug. It will have simply encountered it. And once the bug is encountered, automation will not explore around it or attempt to understand it.
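That scriptedness can be shown in a few lines. This runner is a sketch (not any particular tool’s API): it executes exactly the steps it was given and compares exactly the one observation it was told to compare, and nothing more:

```python
def run_scripted_check(steps, observe, expected):
    """Execute a fixed list of steps, then compare one told-to-look-at
    observation against one told-to-expect value. No judgment involved."""
    for step in steps:
        step()  # do exactly what we were told, in order
    actual = observe()
    # Confirmation or falsification: the only two outcomes it knows.
    return "pass" if actual == expected else "fail"


log = []
result = run_scripted_check(
    steps=[lambda: log.append("open page"), lambda: log.append("submit form")],
    observe=lambda: "Order placed",   # what we were told to look at
    expected="Order placed",          # what we were told it should be
)
print(result)
```

If a bug lives anywhere outside that single `observe`/`expected` pair, this runner sails right past it — which is the point being made above.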
Recently I was talking to someone and I referred to this as the Emmet Effect. I was referring to Emmet Brickowski, protagonist of the two Lego Movie films. In the first film, as the team is figuring out how to save the day, Emmet says:
“Just tell me exactly what to do and how to do it.”
Imagine if your human testers said that to you when you asked them to test. A bit worrisome, no?
Of course, we expect that from our automation because automation doesn’t think or reason or learn. But pundits for AI test tooling are claiming it does at least one (possibly two) of those things. But does that mean it’s “performing testing”? Is it doing so any more than our current non-AI automation?
- If so, we need the AI pundits to clearly articulate how and to what extent.
- If not, then we need the AI pundits to tell me why this tooling will be better than the tooling we already have.
Usually the argument comes down to “Well, there isn’t much value right this second. But imagine as testers are submitting all their training data for all these applications for a decade or more.”
Okay. But that presumes that all of this data will become democratized and available. (So far, AI test tool companies hoard that data.) It also assumes that the training data will remain relevant as applications continue to evolve, particularly in terms of their user interfaces.
I bring this up because we saw the rise of one technocracy that tried to turn testing into a programming problem. We’re seeing another one right now. It’s basically just turning testing into an algorithmic problem. In fact, these are really two sides of the same coin. The commonality is the idea of abdicating responsibility for certain decisions to technology.
Right now we might have this: “Should we release? Well, what does the automation say? Does it show all green?” Later we might have: “Should we release? Well, what’s the AI telling us to do?”
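The “all green” decision is, after all, nothing more than a boolean reduction over check results. A sketch of how thin that decision procedure really is (the check names and statuses here are made up for illustration):

```python
def should_release(results):
    """The 'what does the automation say?' gate, reduced to its logic:
    release if and only if every check reports green."""
    return all(status == "green" for status in results.values())


suite = {"login": "green", "checkout": "green", "stardate": "red"}
print(should_release(suite))
```

Note what this gate cannot express: risk, value, or context. It only aggregates the confirmations and falsifications the scripts were told to produce.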
What Is Post-Modern Testing?
Lately you do have people talking about so-called “post-modern testing.” But do these same people have a grasp of modern testing? It took me a long series of posts to write about that concept. “Modern testing” is by no means settled. So what does post-modern testing actually mean?
We’re hearing about “AIs that will replace human testers,” but these AIs can’t — even with all the training you want to imagine — make value judgments. Or work at different levels of abstraction.
This isn’t to say that there’s nothing to the argument about “testing as algorithm.” In fact, there’s a lot to say about that. I find that many AI pundits seem to be unaware of how to frame this argument.
One of the simplest and most accurate framings I’ve seen comes from the book Algorithms to Live By:
“Algorithms have been a part of human technology ever since the Stone Age.”
This is true. But this also allows us to discuss things that human testers currently do in “modern testing.” Specifically, we don’t just use testing as an execution activity. We also use testing as a design activity and as a framing activity.
Maybe AI pundits are aware of this. But they don’t often talk about their tooling that way when they talk about the removal of human testers.
Consider also Steven Pinker, a professor in the Department of Psychology at Harvard University, in his article “Thinking Does Not Imply Subjugating” where he says:
“Cognitive feats of the brain can be explained in physical terms: To put it crudely (and critics notwithstanding), we can say that beliefs are a kind of information, thinking a kind of computation, and motivation a kind of feedback and control.”
The idea here is that a computational theory of reason opens the door to artificial intelligence — i.e., to machines that “think” in some fashion. It further embraces the notion that by examining cognition we might be able to solve the computational problems posed by our environment; essentially, by recognizing those problems as “computational” in the first place.
This could perhaps change what we think about human rationality. And thus perhaps change how we apply the algorithms of test thinking to our computational environments (i.e., our applications).
But AI pundits around test tooling rarely talk this way. And that’s a shame because it would open some fruitful dialogue with the very people they are claiming will be replaced in the post-modern age they think we’re in. Most of these pundits focus on the tooling rather than on the basis of what that tooling is purported to do.
So Was the Challenge Unfair?
With this challenge, I wanted AI pundits to figure out that there was a way to engage on this without necessarily worrying about whether this was a “Turing style test” for AI.
No, the challenge wasn’t stated that way. But humans are good at reframing the problem to something that can be handled. Note that this is a very human thing to do, in contrast to an AI. So there was a bit of a meta challenge here.
All I really want people to do right now is show me how the AI tooling is different from the non-AI tooling that I described above. Show me that it has the potential to be more effective or efficient at doing two things: (1) running the tests and (2) alerting me to problems.
I already have human testers and I already have machine-based (non-AI) automation. Assuming we’re talking about testing only as an execution activity here, show the value-add of AI-based tooling.
Promote Accurately; Evangelize Responsibly
I’ve said this in other posts and I’ll say it again here: I’m excited about the prospects of machine learning and artificial intelligence as human-supporting and human-enabling technologies. But it’s critical that we know what we’re talking about — and how we’re talking about it — when we promote these tools and particularly as we ask humans to abdicate certain roles and responsibilities to them.
When you have a solution to promote or a tool to sell, you already have a biased interest in getting people to accept your view of what testing is and what these tools are capable of. The rest of us need to be backstops to that.
AI: Steadily Reducing Testing?
Case in point, consider that with my discussion above, and in the previous post, it seems we have already reduced the “testing” that AI can do to the so-called “menial” (by which is usually meant checking for regressions) and the purely formulaic (automated tasks that require no thought).
Given that the vast majority of testing is not menial or formulaic, that’s quite a reduction.
Further, we already have tools that do this: non-AI automation. So AI has to show that it can do this better and with less overhead. And, of course, without being opaque about how it does it because testing is about removing that which is opaque. Testing relies on methods that are entirely demonstrable and, crucially, explicable, just like experimentation in science.
We’ve also reduced the domain of the AI test supporting tools regarding moving across boundaries. For example, many applications these days use components like third-party APIs. Or interconnected and interdependent services, only some of which propagate up to a direct (as opposed to indirect) user interface. Others are connected to various engines — like Vanguard indexes, car dealerships, scientific databases — and so on.
Are we also saying our new AI testing doesn’t deal with those? Again, quite a reduction.
So we already see here that “AI performing testing” has been reduced quite a bit, usually to a graphical user interface and then only after a significant amount of training. And we’ve already agreed (I hope) that AI won’t — for the foreseeable future — be determining value or performing testing as a design activity.
So what then are we actually talking about?
What Is This Test AI We Speak Of?
As Kai Krause states in his article “An Uncanny Three-Ring Test for Machina Sapiens”:
“We use terms like AI too easily, as in Hemingway’s ‘All our words from loose using have lost their edge.’ Kids know it from games — zombies, dragons, soldiers, aliens. If they evade your shots or gang up on you, that is called ‘AI.’ Change the heating, lights, lock the garage — we are told that is a ‘smart home.’ Of course, these are merely simplistic examples of ‘expert systems’ — look-up tables, rules, case libraries. Maybe they should be labeled, as artist Tom Beddard says, merely ‘artificial smarts’?”
So what we need is for the AI pundits to speak honestly about what their technology will actually be doing when they claim it’s going to replace human testers, or when claims are made that we’ve somehow moved into some new “era” of testing.
Because so far all I’ve seen is technology that basically does what current automation does. And even then: only current GUI-based automation.
So perhaps a more accurate statement is not that “AI will replace human testers” but rather that “AI will replace current non-AI automation tools.”
And perhaps I can accept that and then ask: so show me how. What’s the basis for that belief? Demonstrate to me — particularly on something you didn’t custom build for the purpose — how this may work. Something like — oh, I don’t know — a Stardate Calculator or an Overlord provisioning application.
And then, even better, provide the means for me to experiment myself. Automation really opened up when it became open source and moved us away from proprietary and closed tools like QTP and SilkTest. This allowed a broader range of experimentation.
As the physicist Haim Harari has said:
“Some prominent scientific gurus are scared by a world controlled by thinking machines. I’m not sure this is a valid fear. I’m more concerned about a world led by people who think like machines, a major emerging trend of our digital society.”
That’s what I’m seeing more of, particularly in the testing industry.