Testing Learning Systems

Let’s continue the from the last post where you saw a working implementation of a learning environment called Pacumen. Here I want to provide you more details of the basis for this kind of work and use that as a springboard for thinking about how testers fit in these situations.

I will note right at the start that some of these images in the post are adapted from the “Intro to AI” course provided by Dan Klein and Pieter Abbeel.

What are we doing when we’re working in the context of machine learning or artificial intelligence environments? Well, we’re thinking about a real problem. Then we’re abstracting that real problem into something that’s a reasonable simplification of that real problem. Then we put agents in the context of that simplified problem and see if we can make them find a solution.

This is a vast oversimplification of a lot of details but it’s largely accurate at a conceptual level.

Agents and Environments

A key word I mentioned there is “agent.” The problems that we’re working on will require agents and the main unifying theme is the idea of an intelligent agent or an agent that acts rationally. Specifically, we want to come up with a formalism to allow agents to plan ahead. Thus, we formalize these kinds of problems as search problems. This could be searching for genetic markers in DNA strands, searching for reviews that reveal specific sentiments, searching data to understand electoral dynamics, and so on. These search problems are very different than learning problems, which I hope to cover in future posts.

And that brings us to Pac-Man.

Pac-Man is an implementation of a “Grid World”, which is a maze-like problem. In this kind of problem, an agent lives in a grid. Every location the agent can be at corresponds to a state. Walls will block the agent’s path. Pac-Man is an environment that allows you to model search problems and decision processes by creating algorithms that the pacman agent carries out.

And with this tight focus, we’ll define “AI” in a constrained way: as the study of agents that receive percepts from an environment through sensors and act upon that environment using actuators.

Any possible outcome — anything that could possibly happen in the environment you are looking at for solving your problem — has a number associated with it. That number is called a utility. Utility measures how good an outcome is. Goals are expressed in terms of the utility of outcomes. So “being rational” and “acting intelligently” means maximizing your expected utility. The focus on expectation refers to taking an average over possible outcomes weighted by how likely those outcomes are.

So let’s refine our definition of agent a bit. An agent is an entity that perceives (gets inputs) and acts (generates outputs). A rational agent selects actions that maximize its (expected) utility.

But there are many sequences of actions to consider. What we ultimately want to do is figure out how to build an agent such that the agent most efficiently considers the most relevant sequence of actions such that the agent can find an optimal sequence of actions that it can then act upon. We want the solution returned to be the “least cost” solution. This means we want to algorithmically constrain the solution.

Tester Hat On?

So let’s say that in your context as a tester in this type of company, you will be facilitating and helping the teams who are building general search algorithms and applying them to pacman scenarios.

I think it’s instructive to stop right now and just ask yourself: “Okay, what am I thinking about in terms of testing given the knowledge I have so far?” It’s always possible to consider testing as a design activity as opposed to just an execution activity. So even with the limited information above, I’m hoping that design activity part of you is already feeling the tug of creativity.

Domain Terminology

Let’s consider some of the terminology at this point so that we can constrain a bit how we talk about all of this.

World State

A world state is everything we can think of that relates to the problem. This can be large.

With the above world state, you must count all the agent positions (the x, y coordinates of pacman), the food dot count and whether a given food dot has been eaten or not, the power pellet count and whether a given pellet has been eaten, the possible ghost positions, whether the ghosts are in a scared state, how long the ghosts will be in a scared state, the possible directions that the pacman agent can be going / facing: north, south, east, west, the layout of the maze itself, a running score, and so on.

State Space

Then there’s the state space. The state space is your abstraction of the world state. Here you abstract away the reality and come up with a reasonable simplified abstraction.

Notice in this simplified state, there are no walls to contend with, except the border walls. Effectively there’s no maze. Also the ghosts are constrained such that they can’t ever kill pacman. You might set up a state space like this, where certain complications are abstracted away, to see how the agent could solve a particular problem, such as “eat all the dots in the quickest time possible.”

There can be a large number of states in a state space and often you won’t enumerate all of them ahead of time.

That’s a visual listing of some possible states that pacman can be in.

Start State

These kinds of search problems require a starting position:

So the start state is just the place in the grid where the pacman agent will start.

Goal State

You also must have a goal state, which serves as a goal test.

Sometimes the goal state may be a specific position but other times it can just be a condition, like “all food dots must be eaten.” In that case, the goal state is any state where all the food dots are gone, regardless of where the pacman agent ends up when that’s the case. A goal test basically checks if that condition has been achieved.

So with that broken down state space visualization shown above, assuming it doesn’t matter where pacman ends up, only that the board is cleared, there are nine goal states – nine ending locations – for pacman where all food pellets have been cleared.

It’s important that the goal test only access the state. The goal test is not accessing the world state, for example. This means that you need to define all states from your state space such that a goal test is possible. The same applies to final component required: a successor function.

Successor Function

A successor function basically prescribes that if the agent is in a particular state, and the agent were to take a particular action, what is the next state going to be.

A successor function will also tell you if, with that action, there is any cost associated with it and, if so, how much is that cost. So with the above visual, from the start state, the pacman agent can go north or east; the cost for both is 1.0. This means “1 time step passes.”

In many of these simulations, you have a living reward which is negative. This means there is a cost to still living in the world and not achieving your goal. In the Pacumen context, this is 1. So each time step — each action taken — costs 1 point. Eating a dot, however, gives you 10 points. So as the successor function is carried out this changing of the score, which reflects the utility, is always being updated.

As with a goal test, the state space must contain enough information that a successor function can be defined because it’s essentially how the agent moves from one state to another. Without that, the agent could never go from the start state to the goal state.

Find Solutions

A solution to a search problem is a sequence of actions (a plan, basically) which transforms the start state to a goal state. That’s what all those components above are put in place to allow you to check.

But keep in mind that the real problem was turned into a model – a simplified version of the problem. Models don’t have perfect fidelity; some things are abstracted away. And therein lies a challenge. If the models are too detailed, you can’t solve the problem. At least not in a reasonable time and possibly not ever. If the models are not detailed enough, you don’t actually solve the problem; instead you solve a very simple thing that isn’t really the problem.

Types of Agents

This brings us back to our agent. There can be different actions that will reach a goal.

Thus the agent can be of different types. One common type is a reflex agent. Reflex agents don’t simulate or plan: they just choose their next action based on the current state of the world.

While such agents can seem profoundly stupid, a reflex agent can still take good actions. The problem is usually that the reflex agent is not choosing those actions in terms of consequences for a longer-range goal. You saw exactly this in the previous post, where you were able to write a reflex agent that did very well in some situations and terribly in others.

There are also planning agents. A planning agent will think about different ways their actions could play out and then pick a plan that has a high chance of success.

In computing terms, this agent will perform computations (“think about different ways”) and then run simulations (“pick a plan based on likely success”).

Planning agents ask “what if”. The focus of the question being “What would the world be like after some sequence of actions?” But there are usually many possible sequences of actions to consider. So a main goal for building systems that solve problems is figuring out how to efficiently consider the most relevant sequence of actions such that we can find an optimal sequence of actions that we can then act upon.

A planning agent, unlike a reflex agent, doesn’t do all this in the world; rather, it does it in the model and then applies what it learned in the model to the world. Which is why choosing that state space abstraction — the model — is so critical. The next critical bit is the algorithms that we will tell the agent to execute in the context of that model.


We’re going to jump into the algorithms in the next post but I do want to set the stage here.

These algorithms need to have two important properties. They need to be complete and they need to be optimal. Complete means that if there is a solution — if there is a path that gets you to the goal — then the algorithm will actually get you to the solution. Optimal means that if we have a solution (from a given algorithm), then that solution is the least cost solution.

So let’s just keep in mind the path to success here:

  1. Set up a state space, a start state, a successor function, and a goal state along with a goal test.
  2. Then call an algorithm that generates a solution.

Fundamentally, that’s about it. The algorithm uses nothing more than what you set up. And, as stated before, a solution to a search problem is a sequence of actions that transforms the start state to a goal state. Or to one of the goal states, if there are multiple.

Tester Hat Still On?

So right there you have a major test condition that you are probably thinking of, right? That test condition critical to thinking about testing in these contexts. I hope I’ve somewhat gently guided you to that realization.

And here’s where we’ll close off this post. As a tester you’ve been given some terminology now and even some moving parts to consider. Beyond the critical test condition I just mentioned, are various data conditions coming to mind? Are you starting to think of possible variations to the primary condition? Are you thinking about how you would figure out what “bad” versus “good” looks like and how you would report that?


This article was written by Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.