Finding Those Hidden Bugs

One area of testing that often gets neglected is the way bugs can be ferreted out by considering how they “hide.” Testing an application is sometimes like walking a minefield, where you have to trigger the mines in order to see where they are hidden. Another analogy might be a virus hunter, who has to find signs of an outbreak by testing a population, some of whom are hiding the fact that they have the virus. Here I want to present some information about why testers (and developers!) should consider a technique known as hidden-fault analysis and, more importantly, show how hidden faults can skew test results.

I’ll start off by saying I’m not a huge fan of the term “fault” nor do I worry overly much about talking about “failure” as distinct from a “fault” as distinct from “error.” In some cases, perhaps you need those distinctions but I can usually get by just fine with the term “bug.”

The technique of hidden-fault analysis, like a closely related study called sensitivity analysis, is concerned with three things:

  1. Execution
  2. Infection
  3. Propagation

The idea here is that each “location” in an application has three attributes that a tester needs to be concerned with. Those are:

  1. The probability of the location being executed.
  2. The probability of infection occurring due to execution.
  3. The probability of propagation occurring due to infection.

Here just think of “location” however you want to. It could refer to a module (such as a set of source files that provide some functionality), a particular class, or a particular method.
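One hedged way to read these three attributes together, assuming infection can only follow execution and propagation can only follow infection, is as a chain of conditional probabilities. The event names below are illustrative shorthand, not notation the technique mandates:

```latex
\[
\Pr(\text{visible failure at a location})
  = \Pr(\text{exec}) \cdot \Pr(\text{infect} \mid \text{exec}) \cdot \Pr(\text{prop} \mid \text{infect})
\]
```

If any one of the three factors is close to zero for the test data you happen to use, the product is close to zero too, which is another way of saying the bug stays hidden at that location.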

I’ll be honest: I really struggle with how to make this seem interesting and practical. I’m still learning myself. So the most effective approach I can think of right now is just to tailor a little program so that you can see how this concept works. You’ll also be able to see that it’s quite possible for some testing techniques to miss “hidden” bugs — and that’s really my main goal here. So I wrote this little Java program called Quadratic. Here it is:
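What follows is a minimal sketch of such a program, not the exact listing. It assumes only what the discussion below relies on: a check for `a == 0`, a separate discriminant method containing the seeded literal, and an `(int)Math.sqrt(fDiscriminant)` truncation. The method names `integralSolution` and `discriminant`, the substitution check, and the prompt wording are assumptions; the comments mark which lines correspond to the line numbers the article refers to.

```java
import java.util.OptionalInt;
import java.util.Scanner;

// Hypothetical reconstruction of Quadratic; structure and helper names are assumptions.
class Quadratic {

    // Returns an integral root of ax^2 + bx + c = 0, or empty if none is found.
    static OptionalInt integralSolution(int a, int b, int c) {
        if (a == 0) {
            // Linear case: the discriminant method is never called.
            if (b != 0 && c % b == 0) {
                return OptionalInt.of(-c / b);
            }
            return OptionalInt.empty();
        }
        double fDiscriminant = discriminant(a, b, c);    // the article's line 39
        if (fDiscriminant < 0) {
            return OptionalInt.empty();
        }
        int root = (int) Math.sqrt(fDiscriminant);       // the article's line 46
        for (int numerator : new int[] {-b + root, -b - root}) {
            if (numerator % (2 * a) == 0) {
                int x = numerator / (2 * a);
                // Assumption: confirm the candidate by substituting it back in.
                if (a * x * x + b * x + c == 0) {
                    return OptionalInt.of(x);
                }
            }
        }
        return OptionalInt.empty();
    }

    static int discriminant(int a, int b, int c) {
        int iReturn = b * b - 5 * a * c;                 // the article's line 68: seeded bug, 5 should be 4
        return iReturn;                                  // the article's line 69
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Coefficient a: ");
        int a = in.nextInt();
        System.out.print("Coefficient b: ");
        int b = in.nextInt();
        System.out.print("Coefficient c: ");
        int c = in.nextInt();

        OptionalInt x = integralSolution(a, b, c);
        System.out.println(x.isPresent()
                ? "Integral solution: x = " + x.getAsInt()
                : "There is no integral solution");
    }
}
```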

This program will highlight code-level issues that suggest unit testing needs to be considered in terms of hidden bug analysis as well. This is something I’m not seeing many testers advocate but, then again, I don’t see a lot of developers advocating it either.

If you run this program, you will be asked to enter “Coefficient a”, “Coefficient b” and “Coefficient c”. The program then indicates whether there’s an integral solution to the equation. I call this program Quadratic for a good reason: its job is simply to display an integral solution to the general quadratic equation, for any values of a, b, and c:

ax² + bx + c = 0

The only lines that play a major part in my discussion are marked with a line-number comment at the end of the line: lines 39, 46, and 68. Most importantly, the program has been “seeded” with a bug in line 68. Let’s check it out:

If you know how quadratics work, you’ll immediately recognize that the numerical literal in the equation should be 4 instead of 5. I mention this now because it’s relevant to show my next points. The reason for this bug is to highlight the idea of using sensitivity analysis in terms of computation, specifically related to unit level concerns. With that being said, I’ll now consider four separate situations or test execution scenarios.

  • Scenario 1: You run the program and enter the values 0, 3, 6 for the variables a, b, and c. In this case, since the value of a is 0, the line with the bug — which is called by line 39 — isn’t called at all. So here you’ve got a situation where the fault is not executed. Obviously what this shows is that you only potentially find the bugs that are actually executed. Here we were on the path to finding the bug, but our data condition intervened.
  • Scenario 2: You run the program and enter the values 3, 2, 0 for the variables a, b, and c. In this case, the bug is reached because line 39 does call the method that contains the error. However, since c is 0, the faulty term in line 68 (again, the bug line) evaluates to 0 whether the literal is 4 or 5, so the method returns the same discriminant, 4, either way. Thus even though the bug is executed, it has no effect on the computation in this particular case: there is execution but no infection. So here a bug is executed but there are no visible results. Thus you essentially have a false positive. In other words, your test executed the logic that had a bug but the bug stayed hidden.
  • Scenario 3: You run the program and enter the values 1, -1, -12 for the variables a, b, and c. In this case, the value of iReturn in line 69 is actually 61. (In reality, if line 68 did not contain the seeded bug, the value of iReturn should be 49.) That means the error, which is a data-state error, is carried to line 46. However, notice that, in either case, the value of 7 is computed from (int)Math.sqrt(fDiscriminant). In other words, you do have propagation but the propagation didn’t continue to the actual output. As in Scenario 2, you have a bug executed but no visible results and you end up with a false positive.
  • Scenario 4: You run the program and enter the values 10, 0, 10 for the variables a, b, and c. In this case, the bug is certainly executed — just as it was in scenarios 2 and 3 — and the data state is infected, and there is propagation and this time the propagation reaches the output.
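The four scenarios can be replayed side by side. The sketch below is a hedged illustration rather than the article's program: `solve`, `classify`, and the `literal` parameter (4 for the correct formula, 5 for the seeded bug) are hypothetical names, and the solver's substitution check is an assumption. It only compares the discriminant and the final output, so it reports Scenarios 3 and 4 the same way even though the infected value travels different distances inside the program.

```java
import java.util.OptionalInt;

// Hypothetical harness: runs the correct and the seeded formula on the same input.
class HiddenFaultDemo {

    // literal = 4 is the correct formula; literal = 5 is the seeded bug.
    static int discriminant(int a, int b, int c, int literal) {
        return b * b - literal * a * c;
    }

    // Assumed solver shape: returns an integral root, or empty if none is found.
    static OptionalInt solve(int a, int b, int c, int literal) {
        if (a == 0) {
            if (b != 0 && c % b == 0) {
                return OptionalInt.of(-c / b);
            }
            return OptionalInt.empty();
        }
        int d = discriminant(a, b, c, literal);
        if (d < 0) {
            return OptionalInt.empty();
        }
        int root = (int) Math.sqrt(d);
        for (int numerator : new int[] {-b + root, -b - root}) {
            if (numerator % (2 * a) == 0) {
                int x = numerator / (2 * a);
                if (a * x * x + b * x + c == 0) {
                    return OptionalInt.of(x);
                }
            }
        }
        return OptionalInt.empty();
    }

    // Describes how far the seeded bug gets for one input.
    static String classify(int a, int b, int c) {
        if (a == 0) {
            return "not executed";
        }
        if (discriminant(a, b, c, 4) == discriminant(a, b, c, 5)) {
            return "executed, not infected";
        }
        if (solve(a, b, c, 4).equals(solve(a, b, c, 5))) {
            return "infected, output unchanged";
        }
        return "infected, output differs";
    }

    public static void main(String[] args) {
        int[][] inputs = {
            {0, 3, 6},      // Scenario 1
            {3, 2, 0},      // Scenario 2
            {1, -1, -12},   // Scenario 3
            {10, 0, 10},    // Scenario 4
            {1, -5, 6}      // extra input, not from the article
        };
        for (int[] in : inputs) {
            System.out.println(in[0] + ", " + in[1] + ", " + in[2]
                    + " -> " + classify(in[0], in[1], in[2]));
        }
    }
}
```

The extra input 1, -5, 6 is there for contrast. Under the assumptions above, the correct discriminant is 1 and yields x = 3, while the seeded discriminant is -5 and yields no solution, so this is a data condition under which the bug finally becomes visible.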

So what does this tell us?

Well, one of the first things it should tell you is that testing is most often not predicated upon test conditions so much as it is upon data conditions. Those data conditions can hide problems.

Another thing to notice here is that in Scenario 4 the same correct answer is received whether or not line 68 contains the bug. That is also true of Scenario 3, but the two differ in where the bug was masked. In Scenario 3 the infected discriminant (61 instead of 49) was truncated to the same value, 7, so the rest of the computation worked out right even with the bug. In Scenario 4 the computation itself was incorrect and the infected value was used directly. With the bug, the value of the discriminant was -500. Without the bug, the value of the discriminant was -400. Both are negative, so the program reached the same conclusion.

Pay Attention! In either case, the correct answer “There is no integral solution” was reached but it was reached for the wrong reasons and, more importantly, the bug itself did not actually visibly manifest, even though, technically speaking, the output was in error based on the values it had. That means the bug did, in fact, manifest — at least in the strict sense.

What these examples show is what I was talking about previously: execution, infection, propagation. These form the core of sensitivity analysis and this starts to show the basis of hidden bug analysis, something that I feel every good test team should have as part of its toolbox.

One question, of course, is this: “Could this issue have been found during a code review?”

Well, obviously we can say that if the bug is in the code then, almost by definition, it’s possible to find it. Whether you actually do find it or not is a different matter. For example, you could have a bug like this in code that is the result only of integration with another module, in which case a code review might not ferret out the issue.

So there are three main things to consider with code reviews:

  1. Sometimes code reviews are not done.
  2. Sometimes things are missed in code reviews.
  3. Sometimes an issue could not be found in a single code review.

So, on that basis alone, you have bugs that are not going to be found at an earlier stage, even if they should be. A good tester should understand that and then have ways to not only mitigate that but be able to assert and validate those types of situations in a later phase of testing.

What I want to show here is that bugs can hide even when you use varying test data, such as data chosen through equivalence partitioning, and even when you have seemingly excellent coverage. In the example above, I showed four different test runs, each with different test data, and each one displayed a different aspect of bug hiding. Keep in mind that all of this was done with just a simple little program. So imagine how much more relevant this could be with larger, more complex applications with much more inherent interaction and more combinations and/or permutations of inputs.

The other thing I wanted to at least hint at is that there’s a statistical nature to relating test coverage to bug coverage. If you know how to do that and if you know how to apply hidden-fault analysis — again, sometimes called sensitivity analysis — then you can make predictions about two things:

  • The bugs you are not going to find.
  • The bugs that you most likely still have to find.

The first one is a cautionary statement of risk analysis and the second is a proactive statement of test estimation. While I’m not going into detail on this here, I believe that these ideas are how you can tie in risk analysis with test estimation beyond just a simple demarcation of certain areas in an application that are deemed “critical.” Those critical areas will potentially have hidden bugs and what I’m showing here is that using techniques that take into account hidden bugs can be another layer to your risk analysis.
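To make the statistical flavor a little more concrete, here is a minimal sketch that enumerates every input in a small range and counts how often the seeded line is executed, how often it infects the discriminant, and how often the final output actually differs. Everything here is an assumption for illustration: the `solve` method is a hypothetical stand-in for the Quadratic logic, the `literal` parameter switches between the correct 4 and the seeded 5, and the range of -10 to 10 is arbitrary.

```java
import java.util.OptionalInt;

// Hypothetical sketch: exhaustive counts over a small input range.
class HiddenFaultCounts {

    // Assumed solver shape; literal = 4 is correct, literal = 5 is the seeded bug.
    static OptionalInt solve(int a, int b, int c, int literal) {
        if (a == 0) {
            if (b != 0 && c % b == 0) {
                return OptionalInt.of(-c / b);
            }
            return OptionalInt.empty();
        }
        int d = b * b - literal * a * c;
        if (d < 0) {
            return OptionalInt.empty();
        }
        int root = (int) Math.sqrt(d);
        for (int numerator : new int[] {-b + root, -b - root}) {
            if (numerator % (2 * a) == 0) {
                int x = numerator / (2 * a);
                if (a * x * x + b * x + c == 0) {
                    return OptionalInt.of(x);
                }
            }
        }
        return OptionalInt.empty();
    }

    // Returns {total, executed, infected, outputDiffers} for all a, b, c in [-range, range].
    static int[] count(int range) {
        int total = 0, executed = 0, infected = 0, differs = 0;
        for (int a = -range; a <= range; a++) {
            for (int b = -range; b <= range; b++) {
                for (int c = -range; c <= range; c++) {
                    total++;
                    if (a == 0) {
                        continue;               // buggy line never runs
                    }
                    executed++;
                    if (a * c == 0) {
                        continue;               // 4ac equals 5ac only when ac is 0
                    }
                    infected++;
                    if (!solve(a, b, c, 4).equals(solve(a, b, c, 5))) {
                        differs++;              // the bug reaches the visible output
                    }
                }
            }
        }
        return new int[] {total, executed, infected, differs};
    }

    public static void main(String[] args) {
        int[] n = count(10);
        System.out.println("inputs:         " + n[0]);
        System.out.println("executed:       " + n[1]);
        System.out.println("infected:       " + n[2]);
        System.out.println("output differs: " + n[3]);
    }
}
```

Dividing each count by the one before it gives rough execution, infection, and propagation estimates for this input range. They are only as meaningful as the assumption that real inputs resemble the range you enumerated.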

This was my first attempt to talk about hidden bugs and I probably need to refine my approach a bit. I do want to explore this area more later on but if the above just gave you a taste of the idea or even just introduced some new concepts to you, I’m more than happy with that.


This article was written by Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.

6 thoughts on “Finding Those Hidden Bugs”

  1. Interesting article, Jeff. I’m curious how you’d go about calculating the probability of execution, infection, and propagation in the first place. You mention those as three attributes that a tester would have to be concerned with but I don’t really see how you’d go about doing this.

    1. Yeah, this is an area that I didn’t trust myself to write up too much because I sort of go against the grain of many other testers on this. Depending on who you ask, some people get all technical about it and use various prediction algorithms to calculate these probabilities. Sensitivity analysis, which I brought up a couple of times, is one. Other testers use so-called fault exposure ratios. Here’s my take: let’s assume that the probabilities don’t matter as much as simply identifying the bug in the first place.

      With my example program, if you had a requirements document with the proper formula (in my case, the quadratic formula) and the application to validate (in my case, the Quadratic program), how would you properly select test data that would weed out bugs (whether hidden or not)? In this case, if the tester had the proper formula, a code review would weed that out because the formula is patently wrong: it had a 5 instead of a 4. So the probability there becomes 100% that a bug exists!

      But let’s just assume you’re left with selecting effective test data without that crucial piece of knowledge being known. Well, that’s basically the technique of partitioning sets of data that are relevant to testing the domain. In this case, you have to apply data to quadratics based on the logic of how quadratics are supposed to work.

      That’s why there is one test case with a 0 at the end and another with 0 at the front. That’s a traditional test for quadratics. You also try one in the middle (“coefficient b”). Ideally, you should have a negative number at each end, while the other two are positive. Then have a negative in the middle, while the outers are positive. Then have all negatives. That’s just testing for latent problems with the calculation itself. This is really where you cover your boundaries by doing some data analysis. That can be translated into path analysis. You could also do the reverse: start with path analysis and derive the data analysis from that.

  2. So then path analysis and data analysis are techniques for working out whether it’s even possible for bugs to “hide” maybe?

    1. Most certainly! You can start speaking about probabilities that a bug would exist — or could remain hidden — given the paths you are following and the test data you are using.

      But do keep in mind that my example showed that if a code review (or code-based unit testing) is not done then it’s quite possible for a bug to be hidden to the extent that certain test data might not uncover it even when coverage is more or less complete. You can have execution and infection but not propagation. Bugs can be tricky little things and sometimes you need to tailor your testing net accordingly to catch them. So being able to look into the code is a great area for tests; it’s one of the most responsible places in which to put tests. Being able to look at the UI is another great area for tests; but it’s not always the most responsible place for tests. Still, various techniques traverse so-called white-box and black-box boundaries: path analysis, data analysis, sensitivity analysis, and so on.

    2. So couldn’t the execution probability be determined by monitoring the code with coverage tools or even just the execution of specific test cases?

    3. Sure; you could also do path analysis ahead of time to make sure that those paths are likely to be executed. If you are certain you have good path coverage, then you can apply varying data conditions along those paths and see what happens. Again, if you are covering all the paths then the execution probability is 100%. If you have effective test data to permute the path execution, the infection probability can be 100%. The propagation probability then makes little difference (to me, at least) because whether or not the bug manifests to the user interface, I now know whether or not it is there.

      Incidentally, the infection analysis can be done with another technique I brought up regarding mutation testing. You mutate the code and compare the data produced by the mutants to the data created by the base code. If a mutant changes the data, it’s considered an infection. The probability of a mutant infecting data is the infection estimate. The propagation probability is the likelihood of a variable affecting the output and is measured with random perturbations. This suggests the highest infection rate can be accomplished by mutating code that affects variables with a high rate of propagation.
