AI Test Challenge

This is not a challenge for testers to test an AI, although that is a worthy challenge and one I have tackled a bit. For right now, I want to propose a challenge for those promoting tools that claim to perform testing, particularly when the claim is that such tooling stands a chance of replacing human testers.

First, let’s set our stage a bit.

Thinking About AI

Steve Omohundro, co-founder of the Center for Complex Systems Research at the University of Illinois, said something interesting in his article “A Turning Point in Artificial Intelligence”:

“Modern AI is based on the theory of ‘rational agents,’ arising from work on microeconomics in the 1940s by John von Neumann and others. AI systems can be thought of as trying to approximate rational behavior using limited resources. There’s an algorithm for computing the optimal action for achieving a desired outcome.”

The notion of a “rational agent” can be applied to a tester who is engaging with an application in order to determine if aspects of quality are present, missing, or degraded. That is our desired outcome. Is there an algorithm for that? Are there optimal actions for that? If you can’t answer that for a human doing testing, I’m not sure how you could encode it in an AI that purports to do testing.
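To make the "algorithm for computing the optimal action" idea concrete, here is a minimal sketch of the textbook expected-utility rule that usually sits behind the rational agent notion. Everything specific in it is an assumption for illustration: the action names, the probabilities, and the utility values are invented, and nothing here claims to describe how a tester (or any particular tool) actually decides what to do.

```python
from typing import Dict, List, Tuple

# Each action maps to a list of (probability, utility) outcome pairs.
# Every number below is invented purely for illustration.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Probability-weighted sum of the utilities of an action's outcomes."""
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical testing "actions" with made-up odds of learning something.
actions = {
    "re-run the happy path": [(0.05, 10.0), (0.95, 0.0)],
    "probe boundary values": [(0.30, 10.0), (0.70, 0.0)],
    "read the requirements again": [(0.20, 4.0), (0.80, 1.0)],
}

print(best_action(actions))
```

The arithmetic is the easy part. The sketch only works because someone supplied the list of actions, the probabilities, and the utilities up front, and that is exactly the part the questions above are asking about.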

Beyond that, however, I’ll ask my readers: can we agree that the testing we’re talking about here requires human-level intelligence?

Now, this doesn’t rule out AI because human-level intelligence is what many practitioners of AI want to create or at the very least emulate. So what I’m asking us to agree on is that testing is not just some formulaic process that can occur in the absence of human-level intelligence.

Murray Shanahan, a professor of cognitive robotics at Imperial College London, wrote an article called “Consciousness in Human-Level AI” and said this:

“Intelligence is bound up with what philosophers call intentionality.”

That notion of intention is important because, in the context I’m concerned about here, it speaks to the value we intend to deliver through our applications or services. That’s a large part of what we’re testing around. Does our implementation provide our intent? Does our intent actually deliver value to the people we think it does, in the way that we think it does? Aligning that business intent with our activities as testers is crucial to what “good testing” means.

In that same article, Shanahan says:

“All animals, to some degree or other, manifest cognitive integration, which is to say they can bring all their mental resources — perceptions, memories, and skills — to bear on the ongoing situation in pursuit of their goals.”

That’s important. Testing is about experimentation, investigation, and exploration. Some of that is based on our perceptions, memories and skills. Shanahan continues:

“In (healthy) humans, all these attributes come together as a package. But in an AI they can potentially be separated. So our question must be refined. Which, if any, of the attributes we associate with consciousness in humans is a necessary accompaniment to human-level intelligence?”

And that does get to an interesting point. In fact, we can refine that last question even further by asking only about the human-level intelligence required to carry out testing. We already agreed (I hope!) that testing requires human-level intelligence. But there is the question of how much of our conscious experience, which evolves, is necessary to that intelligence. If it turns out certain things are necessary, our AI pundits will need to talk about how those necessities are accounted for in an AI that is said to perform testing.

Before I get to the challenge here, one more thought from Steven Pinker, a professor in the Department of Psychology at Harvard University, from his article “Thinking Does Not Imply Subjugating”:

“Just as inventing the car did not involve duplicating the horse, developing an AI system that could pay for itself won’t require duplicating a specimen of Homo sapiens.”

What does that mean when we talk about tooling that allegedly is performing testing? If we’re not duplicating a human or even the full ambit of human intelligence, then what are we doing when it comes to testing with these technologies?

The Challenge

I want an “AI Tester” (as opposed to the Human Tester) to go to my sample app home page and log in. The application is my Veilus app:

Veilus App: http://veilus.herokuapp.com/

A human with eyes open and brain engaged will be able to figure out how to log in and with what credentials. So should the AI Tester.

Once there and logged in, navigate to my stardate page:

Stardate Calculator: http://veilus.herokuapp.com/stardate

And … test it.

For an extra challenge, navigate to my overlord page:

Overlord Provisioning: http://veilus.herokuapp.com/overlord

And test that too.

If you are asking “test what?”, well, do what a human would do when presented with the above and asked to test it.

What I’d like to know is if the Stardate Calculator delivers value. I would also like to know if the Overlord application is working in a way that will please our users. (Our users for that are mad scientists and I really don’t want them too angry at me.)

The Parameters of the Challenge

Now keep in mind I’m saying: test those things.

We are told that AI is “doing testing.” So I don’t just mean recognizing widgets on the pages. I already have automation that can do that perfectly well. Humans can do that equally well. Nor do I mean just being able to carry out actions on the page. I have automation right now that can do that. And, again, humans can do that as well.
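As a rough illustration of how mechanical widget recognition is, here is a minimal sketch using only Python's standard library. It is not the automation referred to above, and the sample markup and element ids are hypothetical; nothing is assumed about how the real Veilus pages are built.

```python
from html.parser import HTMLParser


class WidgetFinder(HTMLParser):
    """Collects the interactive elements found in an HTML document."""

    WIDGET_TAGS = {"input", "button", "select", "textarea", "a"}

    def __init__(self) -> None:
        super().__init__()
        self.widgets = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WIDGET_TAGS:
            attributes = dict(attrs)
            # Prefer an id, then a name, as a human-readable label.
            label = attributes.get("id") or attributes.get("name") or ""
            self.widgets.append((tag, label))


# Hypothetical markup standing in for a login form.
SAMPLE_PAGE = """
<form>
  <input id="username" type="text">
  <input id="password" type="password">
  <button id="submit">Log in</button>
</form>
"""


def find_widgets(html: str):
    """Return (tag, label) pairs for every widget in the given HTML."""
    finder = WidgetFinder()
    finder.feed(html)
    return finder.widgets


print(find_widgets(SAMPLE_PAGE))
```

A script like this can list every field and button on a page without knowing what any of them are for, which is the gap between recognizing widgets and testing.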

What I mean is exactly what I said: test those two feature areas. They are very self-contained.

Here’s some good news: the applications won’t be changing at all. So you don’t have to worry about that. The interfaces will be staying the same. And the requirements are set and won’t be changing. So I’m not even talking about testing as a design activity here; I’m mostly talking about testing as an execution activity. Also note that my applications aren’t connected to actual real-world systems, such as those in autonomous vehicles, nor are there connections to third-party sources, such as a Facebook API or a trading system or a bidding engine and so forth.

Some more good news: you don’t have to test for security. Or for performance. Or for usability. Or for accessibility. You just have to test for the external quality of functionality but, of course, in service of testing the behavior to understand if value is being delivered. And notice this focus on external qualities means you don’t even have to be thinking about internal qualities, which is where some of the most important testing is done.

And if it helps those who want to try out this challenge, be thankful I’m not asking you to test my Test Quest game, which I do in fact expect testers to be able to engage with and find quality problems in.

What’s The Point?

What this exercise helps me do is see how someone formulates testing. Then it lets me see how they carry out testing. Finally, it helps me see what they think their AI can actually do in terms of testing.

As I close this post, I do want to be specific about this because I don’t want people thinking my criteria are shifting or arbitrary and thus designed to dismiss any AI right out of the starting gate.

One thing that guides testers is the idea of always using precise language and being guided by observed fact. So if you say “AI can perform testing”, then make sure what you mean is precise (about “AI” and “testing”) and allow observed facts to determine if what you say has veracity.

Another thing that guides testers is the idea of taking imagination to its limits but drawing few conclusions without solid experimental proof. So if you say “AI can perform testing”, how does this aspect come into play?

Human testers — the good ones, anyway — do a lot. They explore, observe, experiment, eliminate sources of error, compare some theory with experimental findings, think around problems, and, finally, draw whatever conclusions stand the test of time based on their observations. And even then they have to be open to challenge; they can’t become a prisoner of their own ideas or their own results because quality is a shifting perception of value over time.

Human testers must have faculties of observation, exploration, imagination, and contemplation. Those are combined with experimental skill, often meticulous record keeping, and a good dose of sheer determination.

Human testers embody the full relationship between observation, hypothesis, deduction, induction, confirmatory experimentation, falsifying experimentation, and the act of implausifying ideas (such as the idea that everything is working or that value is being delivered).

I realize all of this may not be what people mean when they say “AI performs testing” or “AI will remove the need for manual testers.”

But if that’s the case, then people need to be clear about what they do mean. And what technologies, like AI, are actually providing. Because right now a lot of people are making a lot of claims. And a lot of those claims are going without too much challenge.

And challenging perceptions and conceptions is a large part of what testing is all about.
