Testing and AI

What’s been interesting in the testing world (at least the part of it that I hang out in) is the application of different AI-based learning algorithms to the act of exploring an application and seeing what (if anything) that tells us regarding the algorithmic and non-algorithmic parts of the testing discipline. Let’s talk about this because I think it is fertile ground for testers to be exploring.

Exploring Games by Learning

Tools like those used by DeepMind, Vicarious and the OpenAI library have shown us how machine learning (reinforcement learning, specifically) can learn to play games like Pong and Space Invaders solely from pixels.

Finding Bugs by Learning

Beyond that, game exploits/glitches (treat those as bugs) in Super Mario Bros., for example, were found by neural networks, specifically via a technique known as neuroevolution, as applied in MarI/O. These bugs were found by humans as well. But, quite interestingly, the algorithms also found glitches that humans had yet to find. This is despite those glitches having existed in the game for decades.

Make sure you are internalizing what I just said. The algorithms explored the system in ways that humans had not. That includes players and testers of the game.

Finding Choices and Flaws by Learning

In my own game, Test Quest, the reinforcement learning algorithms I applied, using techniques called policy gradients and Q-learning, found a particular design choice in the game that no tester I’ve used the game with has found. This design choice could easily be construed as a bug. Further, it’s a bug that only exists for a certain amount of time in the game because later actions make the bug impossible to recreate. But it’s an auto-win bug. Meaning, if you find it and exploit it, you win the game without any effort at all.

The learning algorithms also found a design flaw in my Alien Test Invaders game, on average, much more quickly than humans did. The algorithms did so simply by observing patterns over a series of training sessions. That is no different from what a human would or could do. This is another “win without effort” bug. If you’re not familiar with this issue, see if you can find it! (The context for the game, among other things, is covered in my post Gaming Like a Tester.)

Notice here that in Test Quest the issue was a design choice. I purposely put in that choice because it actually tests the kind of assumptions that testers make in an environment. The Alien Test Invaders issue was a design flaw (and thus could also be considered a bug) because, while it neither stops you from winning (or losing) the game nor prevents a full experience of the game, it does allow you to exploit a flaw where no effort is required.
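To make the idea of an algorithm stumbling onto an auto-win a bit more concrete, here is a minimal sketch of tabular Q-learning on a toy game. This is not the Test Quest code. The six-cell corridor, the POKE action, the rewards and the hyperparameters are all assumptions made up for illustration. The one thing it borrows from the story above is the shape of the problem: an action that wins instantly, but only while you are still in the starting position.

```python
import random

N_STATES = 6          # corridor cells 0..5; reaching cell 5 wins the game
FORWARD, POKE = 0, 1  # POKE is a hypothetical, easily overlooked action


def step(state, action):
    """Return (next_state, reward, done) for the toy game."""
    if action == POKE:
        # The exploit only works from the start cell; elsewhere it does nothing.
        nxt = N_STATES - 1 if state == 0 else state
    else:
        nxt = state + 1
    done = nxt == N_STATES - 1
    reward = 1.0 if done else -0.01  # small cost per step, payoff on winning
    return nxt, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice((FORWARD, POKE))  # explore
            else:
                action = max((FORWARD, POKE), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = train()
best_first_move = max((FORWARD, POKE), key=lambda a: q_table[0][a])
print("Preferred first move:", "POKE" if best_first_move == POKE else "FORWARD")
```

In this sketch the agent has no notion that POKE is "not how the game is meant to be played". It simply tries it now and then, notices the payoff, and ends up preferring it. A human who assumes the corridor must be walked may never try it at all.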

Relevance?

Well, let’s take this out of the gaming context. There’s been a lot of interesting experimentation around seeing whether, and how, those algorithms can learn any sort of user interface from pixels: say your average desktop application or web storefront.

So basically a lot of new experiments are being done to train a policy network to generate a distribution of actions (type here, click this, drag that, look for error messages, etc.) so that interaction with a user interface is attempted via exploration. Then you apply a training algorithm and see what happens in terms of how well it “tests” (note the quotes!) the application once it has learned something about it.
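As a rough illustration of what "generate a distribution of actions" means, here is a heavily simplified sketch. A real policy network would take pixels as input. This one ignores the screen entirely and just learns one preference score per action, updated with a REINFORCE-style policy gradient. The action names, the fake_app_reward function and its probabilities are hypothetical stand-ins, not any real tool's API.

```python
import math
import random

ACTIONS = ["type_text", "click", "drag", "check_for_error"]


def softmax(logits):
    """Turn raw preference scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_app_reward(action, rng):
    """Hypothetical UI: reward 1.0 when an action uncovers something new."""
    chance = {"type_text": 0.2, "click": 0.7, "drag": 0.1, "check_for_error": 0.3}
    return 1.0 if rng.random() < chance[action] else 0.0


def train_policy(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)  # start with no preference at all
    baseline = 0.0                 # running average reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current distribution (this is the exploration).
        idx = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = fake_app_reward(ACTIONS[idx], rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log-probability of the chosen action w.r.t. each logit.
        for i in range(len(ACTIONS)):
            grad = (1.0 if i == idx else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


for name, p in zip(ACTIONS, train_policy()):
    print(f"{name:16s} {p:.3f}")
```

The interesting testing question starts where this sketch stops: what counts as a reward when "testing" an interface, and who decides that?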

And … So What?

These algorithms are examples of working towards artificial general intelligence (AGI). Some testers fear this kind of AI will replace them. It won’t. Not any time soon. But it is interesting in terms of considering what might be purely algorithmic in testing — including exploration — and what is not. This also might be a way to operationalize, visually and otherwise, the oft-stated distinction between testing and checking.

Beyond that, the realms of data science, machine learning and artificial intelligence are more and more becoming part of the operational backbone of much of what we test. Yet I fear the testing industry is (currently) filled with practitioners who can scarcely talk about these concepts, much less test them in a way that sets those testers apart from developers.

My Own Experimenting

This is a topic I plan to expand on. I mentioned above some examples of my own experiments. I want to make some of this a bit more accessible for people to play around with. So, to that end, I’ve written a version of the Berkeley AI Pac-Man program that I’m calling Pacumen. I’ve cleaned up their code, added a few new features, and removed a lot of bugs.

I’ve written this as part of some classes I’m going to be teaching on the intersection of artificial intelligence (broadly speaking) and testing. As just one example, if you are given an application that is using artificial intelligence algorithms or agents, what is your strategy for reasoning about it and then testing it, automated and otherwise? The Pacumen context provides a fun but relevant environment because essentially it models search problems and decision processes. And that is exactly a core component of what we humans do, acting as rational agents, when we explore applications.

Beyond that, however, I think there is the more fascinating idea of possibly being able to explore more deeply what it is we actually do when testing and exploring, by attempting to model that algorithmically, and of seeing where those algorithms break down and can’t replicate exactly what a human does, or can only do so after a massively unreasonable length of time.

This isn’t me saying that everything we, as testers, do is algorithmic enough that it can be encoded in learning algorithms and applied solely via automation. But this is me saying that all of the above context I provided should be fascinating to testers who want to remain relevant in an industry that is seeing some gradual but seismic shifts in the technological substrate.

Look for more posts on Pacumen coming soon. Via my blog I’ll present a condensed view of what I do in my classes and I’ll be curious to hear any and all feedback.

Stories from a Software Tester
