This post follows on from the previous two in this series (theory and path coverage). A key theme of these posts is that of modeling paths through an application and then using tests to exercise those paths. What we’ve seen so far is that while many path segments are covered repeatedly, others may only be touched once. You most likely won’t have time to test them all. Along with that, some paths will be more data-dependent than others. So a selection process is critical. Here I’ll talk more about that selection process.

Using the terminology I’ve already introduced, a path through a program is a node and edge sequence from the starting node to a terminal (ending) node of the control flow graph of a program. That’s how the academics will word it anyway and while it is accurate, it can be hard to see how to apply it. In this post, I’m going to try to give an example.

Before going on I’ll state that writing tests to cover all the paths of a typical application is usually impractical, at least depending on the nature of the application and the various states it can be in based on data that is input, processed and output. This inherent complexity is why testing based on path analysis usually does not require coverage of *all paths* but instead focuses on coverage of the *linearly independent paths*.

## Testing Independent Paths

I talk about this in the other posts, but to briefly recap, a linearly independent path is any path through the application that introduces at least one new edge that is not included in any other linearly independent path. That’s easy enough, right? But now consider this: if a path has one new node compared to all other linearly independent paths, then that path is also linearly independent. The reason for this is because any path having a new node automatically implies that it has a new edge. This means that a path that is a sub-path of another path is *not* considered to be a linearly independent path.

Still with me? It always feels like a challenge to make this stuff understandable. Before I move on to an example where I’ll show you a way to calculate independent paths (which I hope makes it a lot clearer!), you might be saying, *“What if I don’t have access to the source code like you’ve been showing in these posts? What if I’m not performing code-based unit testing?”*

Great question. My answer is: you can still do path analysis in terms of those aspects of code that filter up through a user interface. In the example of either the Greatest Common Divisor or Maximum Integer, which I showed in the last post, if there was a user interface, such as a form, that allowed a user to type in values to get a value back, you could do your path analysis based on that. Think to any web site you’ve used. Certainly you would be able to figure out and model paths through it even without having access to the logic behind it.

So now you might say: *“Well, okay, so if that’s the case, why am I even bothering to read these posts about path coverage and path analysis if I can just figure it out by looking at its user interface?”*

Another great question. My answer is: sometimes the user interface is not always amenable to that kind of analysis. A GUI might be (to some extent) but consider a web service or an algorithm engine. Even in the case of a GUI (whether desktop client, browser application, or mobile app), it’s very possible that certain particular functions are never called directly by any aspect of the user interface. I worked on a hedge fund application and a clinical trial application and, in both cases, there was vast amounts of stuff going on below the GUI that would never have been covered in path analysis if those paths were considered solely from the GUI.

So this is where communication between testers and developers can take place to make sure that effective test coverage is achieved. This is also where a distinction can be made between coverage that’s done at the unit (i.e., code) level and coverage that’s done at the integration level or acceptance levels. As many tester roles have shifted into SDETs (Software Development Engineer in Test) or some variation thereof, the ability to look into code becomes not just possible but also part of your job.

### Calculate the Independent Paths

What I want to show you now is how you can calculate the number of linearly independent paths through *any* structured system. That’s the key to seeing how to apply this yourself on your own applications. My goal here is to come up with a way to find the maximum number of linearly independent paths in a given part of a hypothetical application that I’m testing. That will allow me to structure test cases based on that maximum number.

It’s important to understand the idea of paths, so let’s go through that idea really quick with an isolated example. Say you have these paths:

**path 1:**1-11**path 2:**1-2-3-4-5-10-1-11**path 3:**1-2-3-6-8-9-10-1-11**path 4:**1-2-3-6-7-9-10-1-11

Note that each new path introduces a new edge — meaning, a line to new nodes on the path. Now consider this path:

**path 5:**1-2-3-4-5-10-1-2-3-6-8-9-10-1-11

Path 5 is *not* considered to be an independent path because it’s simply a combination of the already specified paths (paths 2 and 3) and does not traverse any new edges.

Paths 1, 2, 3, and 4 constitute a *basis set* for a possible control flow graph. That is, if tests can be designed to force execution of those initial four paths, which make up a basis set, then *every* statement in the program will have been guaranteed to be executed at least one time and every condition will have been executed on its true and false sides.

Path testing is sometimes referred to as basis path testing and now you know why. A basis set is a set of linearly independent test paths. Any path through the control flow graph can be formed as a combination of paths in the basis set. I should note that the basis set is not unique. In fact, a number of different basis sets can be derived for a given procedural design.

With that out of the way, consider this control flow graph:

Let’s calculate the independent paths through this part of the application. An important thing to take note of here before I start is that I’m going to do this calculation without having to know what the actual logic I’m modeling is. (This, by the way, is a *great* way to impress people you meet in a bar.) [Right now you are no doubt wondering what kind of bars I frequent.]

First you have to know the total number of nodes. This is simple to calculate:

- Nodes = Decisions + Processes

Nodes are where anything can happen and so the full count of nodes are those places where decisions are being made and those places where a specific algorithm or bit of logic is being performed. There are then three equations that you can use to calculate the independent paths.

- Independent Paths = Edges – Nodes + 2
- Independent Paths = Regions + 1
- Independent Paths = Decisions + 1

**Technical Note:** Those three equations are from graphing theory and they are used to calculate the number of linearly independent paths through any structured system. These three equations and the theory of linear independence are the work of a Dutch scholar named Claude Berge who introduced them in his work *Graphs and Hypergraphs*, which was published in 1973. Specifically, Berge’s graph theory defines the *cyclomatic number* v(G) of a strongly connected graph G with N nodes, E edges, and one connected component. This cyclomatic number is the number of linearly independent paths through the system. I bring all this up not because it really matters but mainly because you might have heard the term “cyclomatic complexity” thrown around.

The number of nodes is 6. These are the processes and decisions of the graph (p1, p2, p3, p4 and d1, d2). The number of edges is 7, which are indicated by e1 through e7 on the graph. So, taking the first calculation path above:

Independent Paths = Edges - Nodes + 2 Independent Paths = 7 - 6 + 2 Independent Paths = 3

You could also calculate by the number of regions. Areas bounded by edges and nodes are called regions. There are two regions in the graph. So:

Independent Paths = Regions + 1 Independent Paths = 2 + 1 Independent Paths = 3

You could also calculate by the number of decision nodes. There are two such nodes in the graph. So:

Independent Paths = Decisions + 1 Independent Paths = 2 + 1 Independent Paths = 3

I bring up these variations because different resources out there use different calculations and I’ve found people getting confused when what they were reading didn’t seem to add up. It’s essentially looking at it in three different ways but, note, getting the same answer. In fact, all three equations must be equal to the same number for the “logic circuit” of the application to be valid. If the system is not a valid logic circuit, it can’t work. I should also note that the first equation, regarding edges minus nodes, is usually given as:

Independent Paths = Edges - Nodes + 2P

Here P refers to the number of connected components. In this case, P is the number of unconnected parts of the graph. Usually this is going to be 1.

So now let’s go back to the Greatest Common Divisor example. Here’s the control flow graph again for easy reference:

And here is the code:

1 2 3 4 5 6 7 8 9 10 11 12 |
public class gcd { public static void main(String[] args) { float x = Float.valueOf(args[0]); float y = Float.valueOf(args[1]); while (x != y) { if (x > y) x = x - y; else y = y - x; } System.out.println("GCD: " + x); } } |

At a glance, it’s fairly obvious there are 6 nodes in the example. For the edges, you just count the connecting lines, of which there are 8.

Independent Paths = Edges - Nodes +2 Independent Paths = 8 - 7 + 2 Independent Paths = 3

There are 2 decision points: the IF construct and the WHILE construct.

Independent Paths = Decisions +1 Independent Paths = 2 + 1 Independent Paths = 3

A region is any area that is enclosed by a ‘nodes and edge’ sequence. In the Greatest Common Divisor example, there are 2 regions.

Independent Paths = Regions +1 Independent Paths = 2 + 1 Independent Paths = 3

Thus we see that from a complexity standpoint, the Greatest Common Divisor example and my previous example from above both have three linearly independent paths. A different structure, but the same complexity. Also note that in the one case, you had access to the source code and in the other you did not. Yet we were able to do some basis path analysis both ways.

Okay, but, so what, right? I mean, seriously: if I was on *Star Trek* I’d be like, “Dammit, Jim, I’m a tester not a mathematician!” Yet … consider that what all of this shows us is that if you determine the set of linearly independent paths you can write tests to force execution along each path. Here are some tests:

- x = 1, y = 1
- x = 1, y = 2
- x = 2, y = 1

With that test data set, all statements in the previous Greatest Common Divisor example are executed at least once, as such:

**Test Case 1:**Covers path 1, 6**Test Case 2:**Covers path 1,2,3,4,5,1,6**Test Case 3:**Covers path 1,2,4,5,1,6

Pretty exciting, huh? Now you wish you frequented the same bars I do.

## Data Conditions and Independent Paths

Let’s go back to the earlier example. Here’s the flow graph again for reference:

Keep in mind that I didn’t indicate what the functionality was. So consider Amazon. On the Amazon site …

- … you can attempt to locate a book via a search query (p1).
- … you get a book landing page and then you can decide what to do with that book.
- … one path is that you can decide (d1) to add the book to a wish list (p2).
- … or you could decide (d2) add the book to your shopping cart (p3).

The chart above seems to imply that either path you take will end with you at a certain common point (p4) where you can do further actions. You might argue that’s not the case since where you go after adding to a cart is different than where you go when adding to a wish list. However, consider that this only applies if you are a logged in user. If you are not a logged in user then an attempt at either action (p2 or p3) will require you to log in with your Amazon account (p4).

As you can see, we automatically have some benefit from considering a possible path through the application and realizing we already have four paths to consider:

- Adding to a cart while logged in.
- Adding to a cart when not logged in.
- Adding to a wish list while logged in.
- Adding to a wish list when not logged in.

We have also implicitly decided on some data conditions, such as a user with an account and a user without an account. After all, one of the paths for a user that has no account will be to create one whereas a user with an existing account will have a different path since they will use their existing account.

What gets tricky is when the number of paths can become uncertain and confused based on data conditions that are applied. That’s what the technique of data analysis would help you figure out and is beyond the scope of what I’m talking about here although I’m hoping to cover that in another post.

## Keep the Focus on Testing

What I’ve hopefuly showed you is that while you can calculate complexity based on relatively abstract paths through a given area of the application, you ultimately are going to want to know some specifics because those specifics will determine what data conditions you consider. And data conditions drive test conditions. However, what you also saw is that you can do all of this with the source code (as in the Greatest Common Divisor example) or without (as in the Amazon example).

Either way, you are dealing with the strong connection that exists between complexity and testing. A structured testing approach is predicated upon the idea of making this connection explicit. Complexity is a common source of bugs in any application. Complexity can be used as a direct input to how tests are allocated across the application by leveraging the connection between areas of the application that are more complex and the likelihood of the presence of bugs. This allows the testing process to be concentrated on the most bug-prone areas — or paths — of an application.

A possbile typo:

In the first paragraph of the part Testing Independent Paths, it reads “a linearly independent path is any path through the application that introduces at least one new node that is not included in any other linearly independent path” and then “But now consider this: if a path has one new node compared to all other linearly independent paths, then that path is also linearly independent. The reason for this is because any path having a new node automatically implies that it has a new edge.”.

The logic is a bit strange. Do you want to say “a linearly independent path is any path through the application that introduces at least one new *edge* that is not included in any other linearly independent path”? And it also match the point I find on another web page (https://ifs.host.cs.st-andrews.ac.uk/Books/SE9/Web/Testing/PathTest.html).

You are absolutely correct! My sincere thanks for spotting this and bringing it to my attention. I have amended the portion of the text accordingly.

Hi Jeff

This is a very nice explanation of a concept that has a fundamental flaw. When there are two or more branching nodes in series, it undercounts the number of distinct entry-exit paths by at least 1.

Here’s an example:

Graph with four entry-exit paths and V(G) == 3

A more detailed discussion is provided in my

Testing Object-oriented Systems, pp 376-380.Bob Binder

What if the flowgraph has a shape of two rhombuses having a common node…

i.e. node1 is predicate node, which results into node2 and node3; node2 and node3, both are followed by another predicate node, node4, then node4 is followed by node5 and node6 which in turn are followed by node7;

Now, if we try to calculate cyclomatic complexity, it’ll come out to be 3;

but what, if I include 1-2-4-5-7 and 1-3-4-6-7 only in my basis set of paths?

Will it be incomplete?

So if you look at it this way (based on your description):

Here the (p) refers to the predicate path. What I can end up with is:

Here I’m going down each basis path as per the predicate nodes. The key is that each predicate provides a path and if I want to be complete I need to explore each path that stems from it.

The detail of the “independent path” can be following:

“Each path has an associated row vector, with the elements corresponding to edges in the flow

graph. The value of each element is the number of times the edge is traversed by the path.

Consider the path described in Figure 2-3 through the graph in Figure 2-2. Since there are 15

edges in the graph, the vector has 15 elements. Seven of the edges are traversed exactly once

as part of the path, so those elements have value 1. The other eight edges were not traversed as

part of the path, so they have value 0.

Considering a set of several paths gives a matrix in which the columns correspond to edges

and the rows correspond to paths. From linear algebra, it is known that each matrix has a

unique rank (number of linearly independent rows) that is less than or equal to the number of

columns. This means that no matter how many of the (potentially infinite) number of possible

paths are added to the matrix, the rank can never exceed the number of edges in the graph. In

fact, the maximum value of this rank is exactly the cyclomatic complexity of the graph. A

minimal set of vectors (paths) with maximum rank is known as a basis, and a basis can also be

described as a linearly independent set of vectors that generate all vectors in the space by linear combination. This means that the cyclomatic complexity is the number of paths in any

independent set of paths that generate all possible paths by linear combination.

”

from mccabe-nist235r.pdf which explain the problem occurred in the flowgraph with a shape of two rhombuses and why two basis path is incomplete.