Path analysis is often considered an activity that is done only by developers. In fact, path analysis is a technique that can be used by both developers and testers. I find many testers, however, who are unaware of this test technique or are uncertain of how to best apply it. So let’s talk about it.
In the context of development, path analysis is used to identify all possible paths through an application, usually with the intent of finding incomplete paths or discovering portions of the application that are not on any path.
When path analysis is applied as part of a test strategy, it’s sometimes referred to as path testing or even as basis path testing. In this context, the goal is to exploit the control structure of the application. This means the control or flow of the application will be used to design test cases. Various levels of control and flow will be exercised based on the paths the tests traverse.
Whether path analysis is utilized in the context of development or testing, the technique allows you to utilize two types of coverage — decision coverage and condition coverage — as well as focus on paths that are linearly independent. All of these concepts are what I’m going to be talking about here. Along with covering the technique of path analysis, I will apply the technique to a few specific examples. This may take more than one post.
Paths and Risks
As a test technique, path testing refers to testing that is performed to satisfy coverage criteria as well as to focus on risk-based areas of an application. An example of how coverage criteria are satisfied is when each logical path through the application is tested at least once. In terms of risk-based testing, those paths that are considered more risky than others can be called out and focused on for more specific testing.
Path analysis can also support other test techniques, like equivalence partitioning. Paths through the application will be grouped into a finite set of “classes” or “partitions” based on how similar (or even identical) the functionality is. One path from each class is then tested rather than all of them. This can cut down on testing time but it can also increase risk unless you are certain you are dealing with truly equivalent functionality.
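To make that concrete, here’s a minimal sketch, with hypothetical paths and a hypothetical grouping rule, of partitioning paths into classes and then testing only one representative from each class:

```python
# A minimal sketch of equivalence partitioning applied to paths.
# The paths and the grouping rule (group by final outcome) are
# hypothetical; a real partition rests on your judgment about which
# paths truly exercise equivalent functionality.

paths = [
    ("Login", "Home Page", "Search Results", "Add to Cart", "Purchase"),
    ("Login", "Home Page", "Bargain Page", "Add to Cart", "Purchase"),
    ("Login", "Home Page", "Bargain Page", "Add to Wish List"),
]

classes = {}
for path in paths:
    classes.setdefault(path[-1], []).append(path)   # partition by end state

# Test one representative per class instead of every path.
representatives = [members[0] for members in classes.values()]
for rep in representatives:
    print(" --> ".join(rep))
```

The trade-off mentioned above shows up right here: if the grouping rule is wrong, the untested members of a class carry the risk.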
Any time an analysis activity is performed, there may be various goals, but one of them is always to mitigate risk. Any risk-based approach that uses path analysis is done to optimize testing. This means that test conditions should closely resemble actual usage conditions and should include the minimum set of paths that ensure that each usage path is covered at least one time, while avoiding unnecessary redundancy (i.e., testing the same exact path more times than necessary).
That was quite a mouthful, so let’s simplify it: an effective strategy in using path analysis is to find the total number of linearly independent paths and plan a test strategy around covering those paths.
For now just think of a path as literally going from one point to another point. A common example is a login screen that then takes you to a home or landing page. That’s a simple path. Consider a GET request for a list of widgets that goes to a service layer for processing. Again, a simple path. Obviously paths can get much more involved depending upon the functionality of the application being tested.
Here’s an important point: a path through the application is linearly independent from other paths only if it includes some path segment that has not been covered before. The path is truly independent of all other paths, hence the name. The total paths in an application are then combinations of the linearly independent paths through the application.
I’ll come back to linearly independent paths as I go on. For now just understand that considering paths through an application allows you to consider ways to test the application. This lets you start moving towards a very structured notion of testing.
Structured Testing and Structured Systems
Structured testing often gets mired in complex descriptions but really it’s nothing more than an approach used to generate test cases. This is usually done by using a “logical complexity measure” of a given area of an application and then defining a basic set of execution paths through that area. The logical complexity measure helps define the maximum number of test cases required to ensure that all statements in the area of the application under test are executed at least once. I’ll talk more about that logical complexity measure soon but for now realize that structured testing gets its name from the fact that it is used to test structured systems.
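As a preview, and as an assumption on my part since I haven’t formally named the measure yet, the logical complexity measure used in most treatments of structured testing is cyclomatic complexity, computed directly from the edge and node counts of a flow graph. Here’s a minimal sketch with hypothetical counts:

```python
# A minimal sketch of the usual logical complexity measure
# (cyclomatic complexity), computed from a flow graph's counts.
# The counts below are hypothetical.

def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """V(G) = E - N + 2P: the number of linearly independent paths."""
    return edges - nodes + 2 * components

# A small flow graph with 7 edges and 6 nodes would need at most
# this many basis-path test cases:
print(cyclomatic_complexity(edges=7, nodes=6))  # -> 3
```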
And just so I’m clear on the terms I’m using, a structured system is any system that has only one entry point and one exit point. Most applications can certainly be thought of as structured systems, and you can also think of various modules or sections within an application as structured systems in their own right. A structured system is one where the data and the processing are predetermined and well-defined, in contrast to unstructured systems, which have no predetermined form or structure. The reason this matters is that a structured system is amenable to reasoning about paths through it: which path is taken (and what happens during that traversal) depends on the inputs to the system as a whole as well as the pre-defined processing that will occur on those inputs.
Structured Paths
So let’s think of some paths through a hypothetical application.
Login --> Home Page --> Search Results --> Add to Cart --> Purchase
Login --> Home Page --> Bargain Page --> Add to Cart --> Purchase
Login --> Home Page --> Bargain Page --> Add to Wish List
For any structured system, the idea is that you draw each path from the entrance of the system to the exit. Only count the path the first time you walk on it. In the process of traversing all the paths from the entrance to the exit, it may be necessary to retrace some path segments in order to reach a path segment that has never been traversed before, or to reach the exit.
In the example paths above, you’ll notice I traverse “Login --> Home Page” three times. Some people make the mistake of saying that a repeated segment like this should only count toward one linearly independent path. Not true. A linearly independent path is the full path, not just the individual segments within it. Using the term I introduced earlier, in the above scenarios I have three linearly independent paths, even though they happen to share some of the same elements. For example, you can see that the first two paths end up at the same state (“Purchase”) but get there via different routes.
The thing to keep in mind is that a path through the system is linearly independent from other paths only if it includes some path segment that has not been covered before. So there are two paths that end up at “Bargain Page” but from that point on, each path goes to different places (“Add to Cart” and “Add to Wish List”, respectively) thus making both paths independent.
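To see that rule in code, here’s a minimal sketch (reusing the hypothetical shopping paths above) that counts a path toward the linearly independent total only when it contributes at least one segment, that is, one edge, not covered by any previously counted path:

```python
# A minimal sketch of the "new segment" rule: a path counts as
# linearly independent only if it contains at least one edge that no
# previously counted path has covered.

paths = [
    ["Login", "Home Page", "Search Results", "Add to Cart", "Purchase"],
    ["Login", "Home Page", "Bargain Page", "Add to Cart", "Purchase"],
    ["Login", "Home Page", "Bargain Page", "Add to Wish List"],
]

covered_edges = set()
independent_paths = []

for path in paths:
    edges = set(zip(path, path[1:]))       # the segments in this path
    if edges - covered_edges:              # any segment not seen before?
        independent_paths.append(path)
        covered_edges |= edges
    # a path with no new segments adds nothing to the set

print(len(independent_paths))  # -> 3; each path brings a new segment
```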
This seems simple when you consider a linear sequence of actions but there are ways to loop or iterate over actions in applications. The paths that are caused by looping, feedback, and recursion are only counted once as linearly independent paths, even though they may be gone over and over again as different paths are traced through the system.
Now here’s an important point for testers to realize why all this matters: in an application, how many times a loop is traced, along with how many times a recursive call is made, is controlled by data. When test cases are built, testers will add data sets to those linearly independent paths in order to exercise the data boundary conditions and the logic paths accessed by them. Some paths will be traversed many times because of the different data that will be tested each time through a given path.
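Here’s a hypothetical sketch of what I mean. The function and the data sets are made up, but notice how a single path through the loop gets exercised with several data sets chosen around the boundaries:

```python
# A minimal, hypothetical example: the number of times the loop body
# executes is controlled entirely by the data passed in.

def total_price(cart_items):
    total = 0.0
    for item in cart_items:      # loop traced 0, 1, or many times
        total += item["price"]
    return total

# One path through the loop, exercised with data sets chosen around
# the boundary conditions: empty, single, many.
test_data_sets = [
    [],                                        # zero iterations
    [{"price": 9.99}],                         # exactly one iteration
    [{"price": 1.00} for _ in range(100)],     # many iterations
]

for data in test_data_sets:
    print(len(data), total_price(data))
```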
So it’s important to realize that finding these independent paths is where test design comes in: you look at the application as components (or modules) that act as a structured system. This is really important. It doesn’t mean you have to be a system designer or a developer. All you have to do is observe a certain limited set of elements in the application and determine how they can behave.
But what “limited set of elements” am I talking about? Essentially it comes down to nodes and edges that exist within regions. I’ve found this is better presented visually, so here’s a relatively standard way to break down those elements.
Edges: “I go from here to there.” Lines that connect from one part to another. These are essentially control paths.

Decisions (Nodes): “Either I do this or I do that.” More than one edge can be an input. Only two edges are output.

Processes (Nodes): “I do this as a result of that.” More than one edge can be an input. Only one edge can be an output.

Regions: “I do all this stuff here as a result of that.” Chunks of code that execute according to the decision or process nodes.
Those are the elements that make up a structured system. And from these simple little bits, you can apply path analysis and then apply the results of that analysis to your testing. Before getting into the “how do I actually do all this,” I do want to explore a bit about what’s being discussed regarding these graphs. And I want to do this because I see so many sources just throw this stuff out there and expect you to just grok why all of it matters.
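Before digging into the graphs themselves, it can help to see those elements mapped onto something concrete. Here’s a hypothetical little function annotated with which parts act as process nodes, decision nodes, and regions; the edges are the possible hops between them.

```python
# A hypothetical function annotated with the structural elements above.

def apply_discount(order_total, has_coupon):
    # process node: "I do this" (one outgoing edge)
    discount = 0.0

    # decision node: "either I do this or I do that" (two outgoing edges)
    if has_coupon:
        # region: this chunk executes only along the "true" edge
        discount = order_total * 0.10

    # process node reached by both edges coming out of the decision
    return order_total - discount
```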
A Bit About Graphing
Since you may encounter various other blogs or discussions that talk about path analysis and use certain terms to do so, it seems wise to include a bit of what you might come across here. First, consider a program as a directed graph. The program can then be modeled as a control flow graph.
Whoa! What does that mean?
First, a graph is a simple thing, right? It’s just a representation of some components where pairs of components are connected by lines. The components that are connected are usually referred to as vertices (and sometimes as nodes) and the lines that connect them are called edges. Below is an example of a graph object that, as you go from left to right, gains more components/nodes/vertices and thus more edges that connect them.
Each vertex in the graph represents a basic block. When applied to a program’s structure, a basic block is a straight-line chunk of code: a sequence of statements or expressions with a single entry point, a single exit point, and no branching in between. A directed graph, such as those above, is asymmetric. What that means is that the graph has edges that go one way only. An undirected graph, by contrast, is symmetric, meaning there is no inherent direction to the edges. Directed edges are used to represent jumps in the control flow from one basic block to another.
With this context, control flow graphs describe the logic structure of software modules. A “module” here corresponds to a single function, routine, or method. A module has a single entry and exit point and is able to be used as a design component via a call/return mechanism.
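A control flow graph like this is easy to write down as data. Here’s a minimal sketch for a hypothetical module with a single decision in it: each vertex is a basic block and each directed edge is a possible jump in control.

```python
# A control flow graph for a hypothetical module, written as an
# adjacency list. Vertices are basic blocks; directed edges are jumps.

cfg = {
    "entry":          ["check_input"],
    "check_input":    ["valid_branch", "invalid_branch"],  # a decision
    "valid_branch":   ["exit"],
    "invalid_branch": ["exit"],
    "exit":           [],
}

nodes = len(cfg)
edges = sum(len(targets) for targets in cfg.values())
print(nodes, edges)  # -> 5 nodes, 5 edges
```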
Technical Note: I may as well cover this. Directed graphs can have what are called “strongly connected components.” When components are strongly connected, it means there is a path from each component (vertex) to every other component. Another way to word this is that in a strongly connected graph, it’s possible to reach any component starting from any other component by traversing edges in the directions in which they point. The idea of a “strongly” connected graph simply means that no part of it can be “pulled apart” without breaking edges.
Program control flow graphs have two specially designated blocks. One is the entry block, through which control enters the flow graph. The other is the exit block, through which all control flow leaves. This is important because program control flow graphs are not “strongly connected.” However, they become strongly connected when a virtual edge is added connecting the exit node to the entry node.
A “virtual edge” sometimes causes people problems. The idea is that the virtual edge connects the module you are looking at with the rest of the program. In other words, it represents the control flow through the rest of the program in which the module is used. After all, something must call your module. And your module, when done, must return its processing somewhere. The “something” and “somewhere” in that sentence refer to the application as a whole, in which your module is embedded.
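To see the virtual edge doing its work, here’s a sketch that reuses the hypothetical graph from above. Without an exit-to-entry edge you can’t get from the exit back to the entry by following edge directions; add the virtual edge and every node can reach every other node.

```python
# A minimal sketch: adding the virtual edge (exit --> entry) makes the
# hypothetical control flow graph from above strongly connected.

def reachable(graph, start):
    """All nodes reachable from start by following directed edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

def strongly_connected(graph):
    return all(reachable(graph, node) == set(graph) for node in graph)

cfg = {
    "entry":          ["check_input"],
    "check_input":    ["valid_branch", "invalid_branch"],
    "valid_branch":   ["exit"],
    "invalid_branch": ["exit"],
    "exit":           [],
}

print(strongly_connected(cfg))   # False: nothing flows out of the exit

cfg["exit"].append("entry")      # the virtual edge
print(strongly_connected(cfg))   # True: every node reaches every other
```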
So … was this worth it?
This is the point where many testers I encounter start questioning how this is actually relevant to them. Fair enough. So what do we get out of this, besides a little boredom?
We get this (admittedly obvious-in-hindsight) understanding: program graphs are a graphical representation of a program’s source code. The nodes of the program graph represent the statement fragments of the code, and the edges represent the program’s flow of control. What does that mean? It means that by examining a program graph, a tester can gather an important piece of information: is the program structured or unstructured? That will give a very good indication of how difficult that program is going to be to test. The more difficult it is, the riskier it is. The riskier it is, the more tests are likely to be needed.
This post covered a lot of the basis behind path testing as a technique. It may have been a bit of a rough slog, but it was necessary to set up the next post, which will focus more on the practice.