This post follows on from the first part. That post, and this one, are building up your understanding of the algorithms that you have to consider when you are a tester in a machine learning or AI environment, particularly when dealing with search problems.
What we’re going to be doing in this post is taking the search problems we’ve already talked about and injecting into them information about where the goals might be so that we can find optimal solutions while exploring less of the search tree. So, with that in mind, let’s take a moment to revisit a few key points.
We have a search problem. This is made up of states in the search abstraction or model. These states are the possible configurations of the world as relevant to your search problem. There are actions that come along with costs. There is a successor function, which basically tells you how the states evolve in response to your actions. The successor function is essentially acting as your knowledge about how the world works. We also have a start state and a goal test.
We have a search tree. This is a data structure being built; nodes represent all the possible plans (sequences of actions) you could execute for reaching states, hopefully extending to reaching a goal. Plans have costs, which are the sum of action costs along the path.
We have a search algorithm. This algorithm systematically builds a search tree. Starting at the top and moving toward a goal, we want to choose an ordering of the fringe, which is the set of unexplored nodes in the search tree, and find the least cost path through the tree to achieving the goal. That’s important: we don’t just want a way of achieving the goal, but a way of doing it for the least cost, where the costs are those attached to the actions by the successor function.
We looked at specific algorithms: depth first search, breadth first search, and uniform cost search. One thing I hope, as a tester, you were able to ferret out is that all of those algorithms mainly differ in how they prioritize what they pick to expand from the fringe. The decision of what nodes are picked from the fringe is the strategy; those strategies are what differentiate the search algorithms. Those search algorithms also have different sensitivities to the cumulative costs of moving within the tree.
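That point, that the strategies differ mainly in how they order the fringe, can be sketched in code. Below is a minimal tree search in Python where the priority function is the strategy; the graph, state names, and costs are made up purely for illustration:

```python
import heapq
from itertools import count

def tree_search(start, goal, successors, priority):
    """Generic tree search: the priority function IS the strategy."""
    tie = count()  # tie-breaker so equal-priority entries pop in push order
    fringe = [(priority([start], 0), next(tie), [start], 0)]
    while fringe:
        _, _, path, cost = heapq.heappop(fringe)
        if path[-1] == goal:
            return path, cost
        for nxt, step in successors(path[-1]):
            new_path, new_cost = path + [nxt], cost + step
            heapq.heappush(fringe, (priority(new_path, new_cost),
                                    next(tie), new_path, new_cost))
    return None, float("inf")

# Illustrative graph: state -> [(successor, action cost), ...]
graph = {"S": [("a", 1), ("b", 4)], "a": [("G", 5)], "b": [("G", 1)], "G": []}

# Uniform cost search orders the fringe by cumulative path cost,
# while breadth first search orders it by path length (number of actions).
ucs_path, ucs_cost = tree_search("S", "G", lambda s: graph[s],
                                 lambda path, g: g)
bfs_path, bfs_cost = tree_search("S", "G", lambda s: graph[s],
                                 lambda path, g: len(path))
```

Note how breadth first finds the plan with the fewest actions (S -> a -> G) while uniform cost finds the cheapest plan (S -> b -> G): same skeleton, different fringe ordering.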
So what you hopefully have here is an understanding of the broad test and data conditions that will apply in this kind of environment.
Let’s do a friendly quiz here and test our understanding. Consider this simple, four-state graph:
Here’s the question: how big is the search tree? I’ll revisit this at the end of the post. Take some time to think about it. This is really just seeing if you understand how search trees can be constructed from state graphs, which is what a lot of the first part of this post was about. For now, let’s get into informed search.
These search algorithms can get quite a bit more tricky to conceptualize than the uninformed search algorithms we looked at in the previous post. As such I’m going to be a little more abstract here. This mirrors those situations in testing where sometimes you find you are swimming in details and other times you are in a desert, bereft of all details. That said, I will note that we are going to stick with the tree search from the previous post but we’re going to improve it a bit.
As mentioned in the last post, informed search algorithms are characterized most often by greedy search and A* (“aay-star”) search. A greedy search and an A* search are tree searches but a key difference is that these are tree searches that will be guided by a heuristic. Uninformed searches explore the search tree in some way that is systematic but unaware of where the goals are. Informed searches, by contrast, do have some awareness of where the goal is.
Greedy refers to the strategy we use in selecting which node to expand first. Greedy doesn’t care about the cost. Like depth first search, greedy tends to dive down into the depths of the tree. But where depth first search just sweeps left to right, greedy is guided by the heuristic. The heuristic allows you to adopt a strategy of expanding the node that you think is closest to a goal state. In fact, that’s what a heuristic basically is: an estimate of the distance to the nearest goal for each state. I’ll talk more about heuristics in a bit. Conceptually this is what that looks like:
Most of the details there you can ignore but treat the pink dot as the goal state. Greedy did some breadth but mainly dove right down to the goal. Unlike depth first or breadth first, greedy expanded very little of the overall search tree.
There is an interesting situation that occurs when you introduce heuristics. Your agent can be given a really bad heuristic. A heuristic, for example, that is high cost close to the goal, but low cost far from the goal. This would mean the strategy would continually explore things far from the goal. You would end up with something like this:
Tester Hat On?
The above is a pattern that, as a tester, you have to be aware of and be able to spot when you are testing such systems. Admittedly, this is about the worst possible heuristic that you are likely to be given and thus it’s actually unlikely the situations you are testing will be this obvious. But therein lies the challenge as well.
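One way to make that failure mode concrete in a test harness is to count node expansions. The sketch below (the grid, goal position, and heuristics are all made up for illustration) runs a greedy search twice on an open grid: once with Manhattan distance and once with a deliberately inverted heuristic that scores states near the goal as expensive:

```python
import heapq
from itertools import count

N, GOAL = 10, (9, 9)  # a 10x10 open grid with the goal in the far corner

def successors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < N and 0 <= y + dy < N]

def greedy_expansions(h):
    """Greedy search ordered purely by h; returns how many states
    were expanded before the goal was pulled from the fringe.
    Already-expanded states are skipped so the demo terminates."""
    tie, seen, expanded = count(), set(), 0
    fringe = [(h((0, 0)), next(tie), (0, 0))]
    while fringe:
        _, _, state = heapq.heappop(fringe)
        if state == GOAL:
            return expanded
        if state in seen:
            continue
        seen.add(state)
        expanded += 1
        for nxt in successors(state):
            heapq.heappush(fringe, (h(nxt), next(tie), nxt))

manhattan = lambda p: abs(p[0] - GOAL[0]) + abs(p[1] - GOAL[1])
good = greedy_expansions(manhattan)               # beelines toward the goal
bad = greedy_expansions(lambda p: -manhattan(p))  # lured away from the goal
```

An assertion like `assert good < bad` in a harness is a cheap way to flag a heuristic that systematically points the search away from the goal.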
A* search goes to the goal like greedy, but a difference is that it backtracks as needed. A* is actually a bit of a combination of algorithms. To understand this, however, let’s go back to uniform cost search for a moment.
Uniform cost search is slow and steady. The algorithm explores every part of the state space that you can reach more cheaply than the cheapest way of reaching the goal. Greedy search, on the other hand, races in the direction that it thinks, looking ahead, is the shortest cost to the goal. A* lets us bolt both of those strategies together in one algorithm.
Okay, wait a minute. Does that mean A*, definitionally, is better than the other algorithms? And if so, why don’t we just always use it? Well, the tricky part about informed search goes back to those heuristics.
Heuristics inform our search and guide us as to where we should be going in our search tree as we are partially building it up and trying to find a goal. A heuristic is a function that maps from states to real numbers. The function ideally tells you how close a given state is to the goal. The trick is that heuristics have to be designed for each problem: you essentially bolt on the heuristic to your search problem.
Depending on which state you are considering, your heuristic will give you a different value: states far away from the goal will ideally have a high score (large number), which means there is a high cost to reach the goal. Those states closer to the goal should give a low score, thus a low estimated cost to reach the goal. Let’s consider this in relation to a pacman agent trying to get to a dot:
Here we see two approaches. In one I have to go to the left 10 and down 5. In the other I go diagonally for 11.2. Did your tester senses kick in there? These aren’t even reasonable, are they?
Well, here’s the thing: the lines are reasonable heuristics even though the strict path they show is not possible. As long as you have something that assigns a value to each state, in terms of how long you think it will take to get to the goal, and how much cost will be incurred, that is a heuristic function.
What you’re seeing in that above visual are two examples of heuristics: the Manhattan distance and the Euclidean distance. The Euclidean distance is the ordinary straight-line metric between two points in Euclidean space. The Manhattan distance measures the distance following only axis-aligned directions: meaning strictly horizontal or vertical.
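These two metrics are easy to pin down in code. A quick sketch, using the points from the pacman figure (10 across and 5 down):

```python
import math

def manhattan(p, q):
    """Axis-aligned distance: total horizontal plus vertical movement."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    """Ordinary straight-line distance between the two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

md = manhattan((0, 0), (10, 5))  # the 10-across, 5-down path: 15
ed = euclidean((0, 0), (10, 5))  # the diagonal: roughly 11.2
```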
I mentioned above that heuristics are estimates. They are an estimated value. And that’s important because if you want the actual value, you have to solve the search problem. But that’s what we’re trying to do in the first place. This means the efficacy of our heuristics really matters. We want what’s called an admissible heuristic. Admissible refers to a heuristic that is an underestimate of the optimal cost to get to the goal. So the heuristic must never exceed the optimal cost of reaching the goal. That needs to hold for every single state in your state space. Every state in your state space has to have an optimistic guess of the cost to the goal.
An admissible heuristic is often called an “optimistic” heuristic because such heuristics can slow down bad plans — paths through the tree — but they will never outweigh the actual costs. An inadmissible — or pessimistic — heuristic essentially breaks optimality. You will not have a true sense of the actual costs with such a heuristic and that means better paths to the goal may never get pulled from the fringe.
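As a tester, admissibility is a property you can check mechanically when the state space is small enough to enumerate: compute the true cost-to-goal for every state (here with Dijkstra's algorithm over reversed arcs) and confirm the heuristic never exceeds it. A sketch, with an illustrative graph and made-up heuristic values:

```python
import heapq

def true_costs_to(goal, edges):
    """Exact cost-to-goal for every state: Dijkstra on reversed arcs."""
    rev = {}
    for s, nbrs in edges.items():
        for t, c in nbrs:
            rev.setdefault(t, []).append((s, c))
    dist, fringe = {goal: 0}, [(0, goal)]
    while fringe:
        d, s = heapq.heappop(fringe)
        if d > dist.get(s, float("inf")):
            continue  # stale heap entry
        for t, c in rev.get(s, []):
            if d + c < dist.get(t, float("inf")):
                dist[t] = d + c
                heapq.heappush(fringe, (dist[t], t))
    return dist

def is_admissible(h, goal, edges):
    """Admissible: h never overestimates the true cost to the goal."""
    return all(h(s) <= cost for s, cost in true_costs_to(goal, edges).items())

# Illustrative graph; true costs to G are S: 3, A: 3, B: 1, G: 0.
edges = {"S": [("A", 1), ("B", 2)], "A": [("G", 3)], "B": [("G", 1)], "G": []}
ok = is_admissible(lambda s: {"S": 2, "A": 3, "B": 1, "G": 0}[s], "G", edges)
bad = is_admissible(lambda s: {"S": 5, "A": 3, "B": 1, "G": 0}[s], "G", edges)
```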
So let’s consider pacman again:
This is using a Manhattan distance heuristic. Is this admissible? It definitely is. You can never get to the goal more cheaply than following the horizontal and vertical direct path, ignoring walls. That’s the cheapest path – when there are no walls. When there are walls, your paths will only get longer.
Okay, but how do you prove admissibility? One way to do it is to come up with a relaxed version of the problem. A relaxed version of the problem is one where you make available more successors in your successor function. If you make available more successors, your problem becomes relaxed. So in pacman’s case that would allow for going through walls like we’re seeing here. And once you allow that, you allow different approaches:
Here we have the Euclidean distance being used. States that were next to walls now have additional successor states available. If, in addition to the original set of actions, you get an extra set of actions, and you solve this new problem that has more actions available, then the optimal solution to that new problem will be at most as expensive as the optimal solution to the original problem, assuming that you assigned costs in a reasonable way for your new actions.
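The relaxation argument is easy to demonstrate on a toy grid. In the sketch below (the layout, wall position, and unit action costs are all made up), letting the agent pass through the wall can only shorten the optimal solution, which is exactly why solving the relaxed problem yields an admissible estimate:

```python
from collections import deque

# Illustrative layout: a 5x3 corridor with one wall cell between
# the start and the goal. All action costs are 1.
SIZE, WALL, START, GOAL = (5, 3), (2, 1), (0, 1), (4, 1)

def successors(p, relaxed):
    """The relaxed version makes more successors available: pacman
    may step onto the wall cell as if it were open floor."""
    x, y = p
    moves = [(x + dx, y + dy) for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
    return [(mx, my) for mx, my in moves
            if 0 <= mx < SIZE[0] and 0 <= my < SIZE[1]
            and (relaxed or (mx, my) != WALL)]

def optimal_cost(relaxed):
    """Breadth-first search; with unit costs this finds the optimal plan."""
    seen, frontier = {START}, deque([(START, 0)])
    while frontier:
        p, d = frontier.popleft()
        if p == GOAL:
            return d
        for nxt in successors(p, relaxed):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))

relaxed_cost = optimal_cost(relaxed=True)   # straight through the wall
real_cost = optimal_cost(relaxed=False)     # must detour around it
```

The relaxed cost (4 here) is a lower bound on the real cost (6), which is precisely the admissibility property we want from a heuristic.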
Tester Hat On?
I’m willing to bet, depending on your experience with these concepts, that you’re finding some of this slightly more difficult to get through than the first post. As you can probably see, from a testing standpoint, these aspects of heuristics can lead to a lot of considerations in terms of making sure functionality is not only working but is working in the most optimal way. That is, after all, a large part of what we are testing for. A large part of what you are reporting about is how these algorithms are finding optimal solutions based on the cost.
Test Considerations for Cost
So now let’s consider a particular state graph:
The numbers for h are the heuristic costs at each point; the other numbers are the action costs. Now let’s consider the search tree for that graph:
Here we have h again, but what is g? Okay, let’s step back and look at this in the context of our algorithms.
- The uniform cost search algorithm orders its fringe exploration by the cumulative path cost, or backward cost, which is often called g(n). So for a node on the search tree which corresponds to a path on the graph, it has some sum of the action costs. That’s g(n) for that node.
- The greedy search algorithm orders its fringe exploration by goal proximity, or forward cost, which is often called h(n). The cost of h is the cost of getting to the goal. So greedy doesn’t care about expensive paths; it only cares about paths that are close to the goal.
- The A* search algorithm, as I mentioned before, bolts both of those strategies together in one algorithm. This means that A* search orders its fringe exploration by the sum: f(n) = g(n) + h(n).
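That bullet list translates almost directly into code. Here is a minimal A* tree search in Python; the example graph and heuristic values are made up for illustration:

```python
import heapq
from itertools import count

def a_star(start, goal, successors, h):
    """A* tree search: order the fringe by f(n) = g(n) + h(n)."""
    tie = count()
    fringe = [(h(start), next(tie), [start], 0)]  # (f, tie, path, g)
    while fringe:
        _, _, path, g = heapq.heappop(fringe)
        if path[-1] == goal:  # goal test happens on pull, not on push
            return path, g
        for nxt, step in successors(path[-1]):
            heapq.heappush(fringe, (g + step + h(nxt), next(tie),
                                    path + [nxt], g + step))
    return None, float("inf")

# Illustrative graph and admissible heuristic values.
graph = {"S": [("A", 1), ("B", 2)], "A": [("G", 4)], "B": [("G", 1)], "G": []}
hvals = {"S": 3, "A": 2, "B": 1, "G": 0}
path, cost = a_star("S", "G", lambda s: graph[s], lambda s: hvals[s])
```

Setting h to zero everywhere turns this into uniform cost search, which is one quick sanity check a tester can run against a known-good implementation.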
Let’s use a video to explain this a bit.
In that video, uniform cost search is shown in blue, greedy in red, and A* in purple. What are we seeing?
- The blue — uniform cost search — goes the cheapest route but that’s not too effective.
- The red — greedy — does expensive things that look good in the future.
What does the purple — A* — do? A* looks at how much the current plan has cost you so far — g(n) — and how much you think it’s still going to cost you — h(n). Because of g, A* knows to avoid paths that are already expensive and because of h it knows to avoid paths that are going to be expensive.
Just to make this clear, let’s consider what’s happening for each. Do this by looking at that search tree.
Uniform Cost Search
For uniform cost search, we have start state S; we expand it to get our only node on the fringe: a. We expand again and we get the following nodes on the fringe:
- S -> a -> b
- S -> a -> d
- S -> a -> e
Which one of those do we expand for uniform cost search? We pick the one with the lowest backward cost, which is denoted by g. The lowest such cost is that for b (g = 2). So we expand that to get c. That leaves our fringe as:
- S -> a -> b -> c
- S -> a -> d
- S -> a -> e
The lowest backward cost on the fringe is now that of c (g = 3), so we expand it. Except that node doesn’t have any successors, so it simply gets removed from the fringe. That leaves us with the d and e paths. So we expand d and that gets us G with g = 6, which is the lowest backward cost. And it’s the goal, so we end up with S -> a -> d -> G. But note how much of the tree we expanded for a uniform cost algorithm.
Greedy Search
For greedy search, this follows the same path until we end up with S -> a -> b, S -> a -> d, S -> a -> e on our fringe. But here the lowest heuristic value (h) is 1 for e and so we expand that to get d. So now we have the following on our fringe (with h values):
- S -> a -> b (h = 6)
- S -> a -> d (h = 2)
- S -> a -> e -> d (h = 2)
Here both of those candidate paths end at d because both h values are 2. What will greedy do? That depends on what’s used to break the tie. And that is where your heuristic, together with the tie-breaking rule, comes in as part of your exploration strategy.
A* Search
Finally, what will A* do? It will start out the same way, taking us to S -> a -> b, S -> a -> d, S -> a -> e on our fringe. A* looks at the sum g + h for each: 8, 6, and 10. Here 6 is the “best,” so d is expanded. That gets us G. So we have the following:
- S -> a -> b (g + h = 8)
- S -> a -> d -> G (g + h = 6)
- S -> a -> e (g + h = 10)
And we’re done.
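The whole walkthrough can be replayed in code with one skeleton and three priority functions. One caveat: the post gives the g and h values but not every individual action cost, so the edge costs below are reconstructed to match the stated numbers (g(b) = 2, g(d) = 4, g(e) = 9, and so on), and the h values for S, a, and c are assumptions:

```python
import heapq
from itertools import count

# Edge costs reconstructed from the walkthrough's g values (assumed).
graph = {"S": [("a", 1)], "a": [("b", 1), ("d", 3), ("e", 8)],
         "b": [("c", 1)], "c": [], "d": [("G", 2)], "e": [("d", 2)], "G": []}
h = {"S": 6, "a": 5, "b": 6, "c": 7, "d": 2, "e": 1, "G": 0}

def search(key):
    """One skeleton, three strategies: key(g, state) orders the fringe."""
    tie = count()
    fringe = [(key(0, "S"), next(tie), ["S"], 0)]
    while fringe:
        _, _, path, g = heapq.heappop(fringe)
        if path[-1] == "G":
            return path, g
        for nxt, step in graph[path[-1]]:
            heapq.heappush(fringe, (key(g + step, nxt), next(tie),
                                    path + [nxt], g + step))

ucs = search(lambda g, s: g)           # backward cost only
greedy = search(lambda g, s: h[s])     # forward cost only (FIFO tie-break)
astar = search(lambda g, s: g + h[s])  # f = g + h
```

All three land on S -> a -> d -> G at cost 6 here, though greedy only does so because the tie between the two d paths happens to break toward the cheaper one; a different tie-break rule could send it through e.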
Tester Hat On?
As a test specialist, you are a trained observer. So what are we observing here?
What we’re seeing here is that uniform cost search orders by the cumulative cost of arcs back to the root of the tree. That’s the backward cost, g(n). This backward cost is computed as you go, as part of maintaining the fringe. Greedy search orders by the forward cost, h(n). This is not cumulative but is instead a function of the state. And thus A* orders by the sum of those two. So looking at the state graph from above:
- A* doesn’t go to state c early — like uniform cost — because the h is high (far from goal).
- A* doesn’t do the S -> a -> e path — like greedy — because g is high (expensive path).
With A* we were guided toward the goal while maintaining enough states on the fringe and expanding the fringe enough in other directions where we might still find the goal. That gave us an optimal path to the goal.
So I’ll go back to what I asked a little bit before: Does that mean A*, definitionally, is better than the other algorithms? And if so, why don’t we just always use it? What do you think? As a tester, this was a challenge presented to me.
First, let’s jump back to some situations I presented earlier. How big was the search tree for that state graph? Did you think it through? Here’s the answer:
It’s effectively infinite!
Algorithms need to avoid building an entire search tree to find a solution. We already know that. But they also need to recognize an infinite loop. They can do this by recognizing repeated structure. Practically speaking, the algorithm has to understand that it’s seeing a repeat of what it has seen before and thus can avoid repeating that over and over.
Failure to detect repeated states causes exponentially more work. In fact, you may design something that never finishes. Which brings us to something I glossed over a bit in the first post. I had presented this:
And I asked you to consider something there. Specifically I said: “Does something jump out at you? Hint: testers often think in terms of equivalence partitions or classes. And that’s sometimes looking for things that are the same or would lead to the same effect. Take some time. Study the graph. Think it through.”
Take some time now to think that through.
What’s the answer I’m looking for? Simply that everything under S -> d -> e is identical to everything under S -> e: the subtree under S -> d -> e repeats the subtree under S -> e. Yet S -> e is a better way of reaching the goal because there would be fewer actions and thus less cost. So, ideally, you’d like your algorithm to skip the duplicated subtree.
Given that, what’s one search algorithm you know for sure you wouldn’t want to use?
This is important! Even though the two subtrees are structurally, and even functionally, identical from the standpoint of the search tree, they are not identical in terms of the efficiency and effectiveness of our search algorithms. For example, you would not get the best path here with a depth first search because it would explore the leftmost subtree entirely before moving on. What about breadth first? Would that be better? Think about it.
Tester Challenge with A*
I hope you’re building up some sensitivities as to what to look for at this point. So now let’s talk about that question I’ve come back to twice already: should we just always use A*? Consider this state graph:
Try to work through this as a tester, given what you know. When should A*, in this case, terminate? I’ll give a hint here and say that this graph is showing a very common bug that can happen.
Let’s work through it together. Start with S on the fringe. That has a g cost of 0, h cost of 3 so the f cost is 3. On the fringe we get:
- S -> A with f cost of (2 + 2) = 4
- S -> B with an f cost of (2 + 1) = 3
So now grab S -> B off the fringe. (Do you see why we grab that one?) We get S -> B -> G with an f cost of 5. And, with that, we pushed something onto the fringe that reaches the goal. If we declare success right now, however, we have not found the optimal path to the goal. We need to wait until we pull the goal from the fringe, not declare success when we push it onto the fringe.
So now let’s pull S -> A off. (Again, do you see why?) We get S -> A -> G with an f cost of 4. We now achieve the goal again and we have found the optimal path. This video shows what I just described:
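This push-versus-pull bug is easy to reproduce and therefore easy to write a regression test for. The sketch below rebuilds the example graph from the stated f costs (the individual edge costs are reconstructions, so treat them as assumptions) and runs A* both ways:

```python
import heapq
from itertools import count

# Reconstructed from the walkthrough: optimal path is S -> A -> G at cost 4.
graph = {"S": [("A", 2), ("B", 2)], "A": [("G", 2)], "B": [("G", 3)], "G": []}
h = {"S": 3, "A": 2, "B": 1, "G": 0}

def a_star(stop_on_push):
    tie = count()
    fringe = [(h["S"], next(tie), ["S"], 0)]
    while fringe:
        _, _, path, g = heapq.heappop(fringe)
        if path[-1] == "G":
            return path, g  # correct: declare success when the goal is pulled
        for nxt, step in graph[path[-1]]:
            if stop_on_push and nxt == "G":
                return path + [nxt], g + step  # the bug: success when pushed
            heapq.heappush(fringe, (g + step + h[nxt], next(tie),
                                    path + [nxt], g + step))

buggy = a_star(stop_on_push=True)    # declares success too early
right = a_star(stop_on_push=False)   # waits for the goal to be pulled
```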
Let’s drive this home with one more example.
Here we have S with f cost of 7. Pull it off the fringe. We have S -> A with an f cost of 7 and we have S -> G with an f cost of 5. Take that last one off the fringe and we declare success. But it’s not optimal! There’s a cost of 4 that we missed. So our heuristic, in this case, is too high and we overestimated the actual cost by 3. Again, here’s a video that can show that in action:
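Here, too, the failure can be captured in a few lines. The edge costs below are reconstructed from the stated f values (assumptions, not numbers given outright): the direct S-to-G arc costs 5, the detour through A costs 4, and h(A) = 6 overestimates A's true cost-to-goal of 3:

```python
import heapq
from itertools import count

graph = {"S": [("A", 1), ("G", 5)], "A": [("G", 3)], "G": []}

def a_star(h):
    tie = count()
    fringe = [(h["S"], next(tie), ["S"], 0)]
    while fringe:
        _, _, path, g = heapq.heappop(fringe)
        if path[-1] == "G":
            return path, g
        for nxt, step in graph[path[-1]]:
            heapq.heappush(fringe, (g + step + h[nxt], next(tie),
                                    path + [nxt], g + step))

# Inadmissible: h(A) = 6 > 3, so the detour through A looks too expensive.
bad = a_star({"S": 7, "A": 6, "G": 0})
# Admissible: every value is at or below the true cost-to-goal.
good = a_star({"S": 4, "A": 3, "G": 0})
```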
How do you avoid those repeating cycles? How do you avoid doing a lot of redundant work? That’s where the concept of a graph search comes in.
The basic idea here is that you never expand a state twice. Your set of already-expanded states is referred to as a “closed set.” When you turn the tree search into a graph search, you expand the search tree node-by-node, just as you have been doing, but before expanding a node, you check to make sure its state has never been expanded before.
If the state is not new (and thus has been expanded before), you skip it. But if that state is new, you add it to the closed set and expand it as usual. This can skip huge parts of the search tree, which might concern you, but this approach can still be complete. This is because anything you skip over is a part of the search tree you are exploring somewhere else.
As with A* search, I bet this sounds great to you, right? In fact, I bet it sounds like with A* and graph search, we’re approaching the pinnacle of search algorithms. Except that above I showed you how this can go wrong with A*. Let’s consider that same thing with graph search. Consider the following state space:
We start with S: g cost of 0, f cost of 2. We expand and we get S -> A and S -> B. Now expand B because of the lower f cost.
We get S -> B -> C so expand that and now we get S -> B -> C -> G. But we don’t declare success yet (remember why?) so we pull A from the fringe. We expand that and we have C again with S -> A -> C. Now what?
Here is where graph search differs from tree search. At this point we have two states on the fringe:
- S -> A -> C with f = 2 + 1
- S -> B -> C -> G with f = 6 + 0
Let’s pull C from the fringe because it has the lowest f cost. But when we go to expand it, we realize we already called our successor function on C before and thus it’s in our closed set. So we don’t expand it again. A video might help you see what’s happening:
What’s happening is that the G with f cost of 5 + 0 – the lower cost path to the goal that we would want to find – never gets generated. It never gets generated because, in this example, we never expand that left-hand C since we already expanded the right-hand C. So what happens is that G — with f of 6 + 0 — gets pulled from the fringe. So we found a goal that is not optimal. This happened because we expanded one way of reaching C before we expanded a better way of reaching C.
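To see this failure in a harness, we can run the same A* twice: once as a tree search and once as a graph search with a closed set. The costs and h values below are reconstructed from the walkthrough (assumptions, not numbers the post states outright); the key property is that h(A) = 4 is inconsistent, since h(A) - h(C) = 3 is larger than the A -> C arc cost of 1:

```python
import heapq
from itertools import count

graph = {"S": [("A", 1), ("B", 1)], "A": [("C", 1)],
         "B": [("C", 2)], "C": [("G", 3)], "G": []}
h = {"S": 2, "A": 4, "B": 1, "C": 1, "G": 0}  # h(A) is inconsistent

def a_star(use_closed_set):
    tie, closed = count(), set()
    fringe = [(h["S"], next(tie), ["S"], 0)]
    while fringe:
        _, _, path, g = heapq.heappop(fringe)
        state = path[-1]
        if state == "G":
            return path, g
        if use_closed_set:
            if state in closed:
                continue  # graph search: never expand a state twice
            closed.add(state)
        for nxt, step in graph[state]:
            heapq.heappush(fringe, (g + step + h[nxt], next(tie),
                                    path + [nxt], g + step))

tree_result = a_star(use_closed_set=False)   # finds the optimal path
graph_result = a_star(use_closed_set=True)   # misses it
```

With the closed set, the cheaper way of reaching C arrives too late to be expanded, so the graph search returns a cost of 6 instead of the optimal 5, which is exactly the failure the walkthrough describes.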
Using terms we’ve been looking at already, we have admissibility but we are lacking optimality. And this means we need a constraint on our heuristic. Constraints on heuristics are very important; they are essentially saying that “it’s not enough that something is right; it must be right while following certain rules.” Specifically, this constraint is that our heuristic must be consistent. The idea here is that the estimated heuristic cost along each arc is less than or equal to the actual cost of that arc.
- ADMISSIBILITY: a heuristic cost less than or equal to actual cost to goal
- CONSISTENCY: a heuristic “arc” cost less than or equal to actual cost for each arc
When I say “arc” that’s just the connector from one state to another. So if we take the difference of the heuristic cost of A and the heuristic cost of C, that’s how much the heuristic value decreases going from A to C. That decrease has to be no larger than the actual cost of going from A to C. If that’s true, what this basically means is that the f value the second time a given state is pulled from the fringe is higher than the f value the first time that state was pulled. Effectively that means the second path to that state was a worse way of getting there.
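Like admissibility, consistency is mechanically checkable when you can enumerate the arcs, which makes it a good candidate for an automated test. A sketch, using the state graph from the C example above (the costs and h values are my assumptions, reconstructed from the walkthrough):

```python
def is_consistent(h, edges):
    """Consistent: for every arc s -> t with cost c, h[s] - h[t] <= c."""
    return all(h[s] - h[t] <= c
               for s, nbrs in edges.items() for t, c in nbrs)

edges = {"S": [("A", 1), ("B", 1)], "A": [("C", 1)],
         "B": [("C", 2)], "C": [("G", 3)], "G": []}

# h(A) = 4 drops to h(C) = 1 across an arc that only costs 1: inconsistent.
bad_h = {"S": 2, "A": 4, "B": 1, "C": 1, "G": 0}
# Lowering h(A) to 2 makes every arc respect the bound.
good_h = {"S": 2, "A": 2, "B": 1, "C": 1, "G": 0}

bad = is_consistent(bad_h, edges)
good = is_consistent(good_h, edges)
```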
Opinionated Tester Stance
I say this next part allowing for the fact that I might not have explained all of this material in the best way. Even with that caveat thrown in, I will boldly say this:
If you found yourself not willing, or not able, to follow the thread of discussion in these last two posts, you will not last very long in many of the most competitive and value-adding of future testing positions, which are those around machine learning, rational agents, and various forms of artificial intelligence. And here I’ve just been talking about problems like pathing in the context of search problems. Wait until you get into reinforcement learning.
The next post is going to have you using Pacumen again. I was a bit vague about the differences of these algorithms in terms of how effective and efficient they are although I think I’ve given a lot of clues. So next time you’re going to apply them to Pacumen. (Don’t worry – they’ll be written for you!) And then I’ll ask you, as a tester, what you expect to observe and then what you actually have observed.
This will get into focusing on you as tester as experimenter. You need to have a hypothesis, have a way to put that hypothesis to the test, and then reason about the outcomes of that test.
And it might not hurt for you to start thinking about this now: what are some test strategies you’ve come up with? What do you think are some complications to these test strategies? What questions would you have for the business or the developers about this software? How black box can you be with all of this and still be effective? How much of being “too white box” would actually compromise your ability to test with the fewest possible assumptions?
This has been a lot! So if you’ve made it this far, congratulations. And … welcome to my world.