Domain Testing and the Limits of Intuition

The technique of domain testing goes under several names, such as equivalence partitioning, boundary analysis, and category partitioning, and probably a few others that I’m either forgetting or have never heard of. While this is a formal test design method, many testers apply it somewhat intuitively. So what I want to do here is show the basis behind this intuition, along with a few simple working examples of how the technique is applied, with an emphasis on how intuition can fail.

Domain testing is basically a sampling technique for choosing a reasonable set of test cases from a (usually) very large set of possible test cases. As a technique, domain testing falls under the approach of functional testing where you view the application, or at least one area of it, as a function and test that function by feeding it inputs and evaluating its outputs. Key to this technique is minimum tests for maximum yield. In other words, you don’t want duplications of inputs that lead to the exact same outputs because your test set gets inflated without any increase in effective test coverage.

I should note that this technique can fit in quite nicely with path testing, particularly the notion of looking for and testing independent paths.

What is “equivalence”?

The essence of domain testing is that you partition a domain into sub-domains — equivalence classes — and then select representatives of each sub-domain for your tests. Creating equivalence partitions — another name for equivalence classes — is how a tester can reduce the number of test cases to a manageable size while still maintaining a reasonable amount of test coverage.

Here the term “equivalence” means that a set of inputs or a set of outputs is handled the same way by the application being tested. In other words, how the inputs/outputs are handled is equivalent. This means that, in theory, you can create a set of tests that exercises only a few of the inputs or outputs to see if there is a problem, without having to test all of the possible inputs/outputs. Put another way, all elements within an equivalence class are essentially the same for the purpose of testing. However, what “essentially the same” means can differ.

Equivalence Depends on Context

It’s easier to see this by looking at examples. For this first example, imagine the following two methods exist in an application you are testing:

int findInArray01 (int [] a, int valToFind) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] == valToFind) return i;
    }
    return a.length;
}

and

int findInArray02 (int [] a, int valToFind) {
    for (int i = a.length - 1; i >= 0; i--) {
        if (a[i] == valToFind) return i;
    }
    return -1;
}

These simply find some value in an array. So you can imagine that these methods are sitting behind some search form in your application.

Here’s a question: are these methods functionally the same or different? This isn’t a question of whether or not they are different code because they obviously are. The question is whether you could substitute one implementation for the other. That’s an important question because if you can do that, the implementations are equivalent.

For the curious: Before reading on, try to work through the methods with your own sample data.

If you try to work out tests for these methods, you’ll find that they do have different behavior.

If valToFind is not in the array:

findInArray01 returns the length
findInArray02 returns -1

If valToFind appears twice in the array:

findInArray01 returns the lower index
findInArray02 returns the higher index

If valToFind appears exactly once in the array:

findInArray01 returns the index of the value
findInArray02 returns the index of the value

Important point here: the two methods behave exactly the same in only one of the cases above. So if these bits of logic were in different areas of the application that both tried to find a value in an array, you could only treat them as equivalent in one case.

This means these implementations are not in an equivalence class in all situations. However, let’s say it wasn’t possible for the first two cases to manifest in the application. In that case, it would mean that the two methods are in an equivalence class.
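The three cases above can also be checked mechanically. What follows is a minimal sketch, ported to JavaScript for convenience. The function names mirror the Java methods, `sameBehavior` is a hypothetical helper, and the backward scan is assumed to include index 0, which is what the three cases above take for granted.

```javascript
// Forward scan: returns the first matching index, or a.length if absent.
function findInArray01(a, valToFind) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] === valToFind) return i;
  }
  return a.length;
}

// Backward scan (assumed to include index 0): returns the last matching
// index, or -1 if absent.
function findInArray02(a, valToFind) {
  for (let i = a.length - 1; i >= 0; i--) {
    if (a[i] === valToFind) return i;
  }
  return -1;
}

// Hypothetical helper: do both implementations agree for this input?
function sameBehavior(a, valToFind) {
  return findInArray01(a, valToFind) === findInArray02(a, valToFind);
}

const cases = [
  { label: "value absent", a: [4, 8, 15], v: 99 },
  { label: "value appears twice", a: [4, 8, 4], v: 4 },
  { label: "value appears once", a: [4, 8, 15], v: 8 },
];

for (const c of cases) {
  console.log(
    c.label,
    findInArray01(c.a, c.v),
    findInArray02(c.a, c.v),
    sameBehavior(c.a, c.v)
  );
}
```

Only the last case reports agreement, so substituting one method for the other is only safe if the application can guarantee that every searched value is present exactly once.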

The context here is determined by how these methods can be exercised. The ways they can be exercised are the ways you will be testing them. How many tests you need depends on how equivalent the functionality under test really is: the less equivalent it is, the fewer tests you can safely treat as interchangeable.

This is what makes domain testing a bit more complicated than people — including many testers — give it credit for. And this is exactly where you have to be careful that your intuition does not fail you.

The Conceptual Basis of Equivalence

So let’s back up a bit here and consider a bit of theory. While equivalence classes have a mathematical basis, the conceptual understanding of them is fairly simple. An equivalence class consists of a set of data that is treated the same by a module or that should produce the same results (output) when the module operates on the data. Here “module” can be as granular or as high-level as you want. A data value within a class is said to be equivalent, in terms of testing, to any other value in that same class if the end result of using that data would be the same.

Domain testing — equivalence partitioning — is then the act of creating test cases that are designed to execute representative tests from the equivalence classes.

The test technique of equivalence is equally applicable at the unit, integration, system, and acceptance test levels. All it requires are inputs or outputs that can be partitioned based on the system’s requirements. Equivalence testing can significantly reduce the number of test cases that must be created and executed but still allows you to maintain a high level of coverage. Yet … as you saw with the above example … it can get tricky. So let’s get less tricky for the next example.

A Simple Equivalence Example

Applying equivalence classes to testing means you are partitioning the input value set into classes (subsets) using some equivalence relation. Here I’ll provide the standard example many give when considering equivalence classes and boundary testing. Consider a numerical input variable, x, whose values may range from -200 through +200 and cannot have decimal values. One possible partitioning of the input variable is:

-200 to -101
-100 to 0
1 to 100
101 to 200

Wait! I hear you asking: What about just beyond the boundaries, using values of -201 and 201? Good question. Let’s say the value of x was entered via a form where the form did validation of the number being entered. In that case, the user could not enter -201 or 201. But what if those invalid values could be injected via a service or a database? Well, that impacts your test consideration, doesn’t it? Notice how the context of the implementation matters?
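One way to make the boundary question concrete is to derive the candidate values from the partition itself. The sketch below is illustrative only. `partitions`, `boundaryValues`, and `isValid` are hypothetical names, and whether the out-of-range values belong in the final test set depends on whether they can actually reach the code.

```javascript
// The four partitions of x from the example, as inclusive integer ranges.
const partitions = [
  { min: -200, max: -101 },
  { min: -100, max: 0 },
  { min: 1, max: 100 },
  { min: 101, max: 200 },
];

// Hypothetical validity rule taken from the example: integers in [-200, 200].
function isValid(x) {
  return Number.isInteger(x) && x >= -200 && x <= 200;
}

// Collect both edges of each partition, plus the values just outside the
// overall range.
function boundaryValues(parts) {
  const values = new Set();
  for (const { min, max } of parts) {
    values.add(min);
    values.add(max);
  }
  const lowest = Math.min(...parts.map((p) => p.min));
  const highest = Math.max(...parts.map((p) => p.max));
  values.add(lowest - 1);
  values.add(highest + 1);
  return [...values].sort((a, b) => a - b);
}

const candidates = boundaryValues(partitions);
console.log(candidates);
console.log(candidates.filter((x) => !isValid(x)));
```

The second line printed lists the invalid candidates, which are the ones to keep or drop depending on whether a service or database could inject them.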

If you want to get a little mathematical, an equivalence relation (call it R) is a relation defined on a data set (call it D) that is reflexive, symmetric, and transitive, which means it splits D into non-overlapping classes that together cover all of D. We need an equivalence relation, so let’s say ours is “same sign.” This refers to the sign of the number, which is either positive or negative but never both, with zero as the one standout case that has no sign. Given this, our partitioning will be:

-200 to -1 (negative sign)
0 (no sign)
1 to 200 (positive sign)

The “same sign” relation is an equivalence relation and it happens to be a simple one based on the type of data. You can imagine, of course, that your own business domain may have more complicated relations, like “valid policy with initial premium paid” or “multiple trades with butterfly spread” or “phase 2 clinical trials with therapeutic indications of cancer”.

A sample equivalence test set from the above “same sign” relation could be something like this:

-5; 0; 8
-123; 0; 64
-59; 0; 13
…

Obviously that could be a long list. The point here is that any one of those test cases (one per line) would be equivalent to any other, given the equivalence relation being tested. Which means, practically speaking, I only really need one such test.
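Here is the same idea as code: a sketch that groups arbitrary sample values by the “same sign” relation and keeps one representative per class. The names `signClass`, `partitionBy`, and `representatives` are made up for illustration.

```javascript
// Equivalence relation "same sign": two integers are related when
// signClass() returns the same label for both.
function signClass(x) {
  if (x < 0) return "negative";
  if (x > 0) return "positive";
  return "zero";
}

// Group values into equivalence classes keyed by the classifier's label.
function partitionBy(values, classify) {
  const classes = new Map();
  for (const v of values) {
    const key = classify(v);
    if (!classes.has(key)) classes.set(key, []);
    classes.get(key).push(v);
  }
  return classes;
}

// Keep the first member of each class as its representative.
function representatives(classes) {
  const reps = {};
  for (const [key, members] of classes) reps[key] = members[0];
  return reps;
}

const samples = [-5, 0, 8, -123, 64, -59, 13];
const classes = partitionBy(samples, signClass);
console.log(representatives(classes));
```

Swapping `signClass` for a classifier that encodes a business rule gives the same reduction for a richer domain, provided the classifier really does capture how the application treats its inputs.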

Equivalence At Different Levels

Above I showed you a few examples. One involved two methods that search an array. Being able to see the code helped you determine that the methods were not equivalent in all cases. Another example involved a simple range of numeric values that broke down into specific equivalencies (negative, zero, positive). Now let’s look at another example. Let’s start with two methods that calculate a stardate:

function calculateTNG() {
  stardate = $("#stardateValue").val()
  stardate = Math.abs(stardate);

  var staryear = Math.floor(stardate / 1000);
  var startime = stardate % 1000;

  var outyear = 2323 + staryear;
  var inyear = outyear;
  var length = (leapYear(inyear)) ? 31622400 : 31536000;

  var outtime = startime * length;
  var finalyear = Date.UTC(outyear, 0, 1, 0, 0, 0);
  var finaltime = finalyear + outtime;

  var outdate = new Date();
  outdate.setTime(finaltime);
  outdate = outdate.toGMTString();

  return outdate;
}

and

function calculateTNG_PerYear() {
  origin = new Date("July 5, 2318 12:00:00");
  stardate = $("#stardateValue").val()

  stardatesPerYear = stardate * 34367056.4;
  milliseconds = origin.getTime() + stardatesPerYear;

  result = new Date();
  result.setTime(milliseconds);

  return result;
}

Unlike my previous example with the array methods, this code is much harder to read and judge whether both methods do the exact same thing. Why? Because the result depends on other methods (Date(), Date.UTC(), getTime(), leapYear() and so on) whose own output in turn dictates the output of the methods above.

That said, as with the array example earlier, this isn’t a question of whether the code differs between the two methods, since it clearly does. It’s a question of whether the two are functionally equivalent. Your intuition probably tells you that they are not. To spare you the suspense, I will tell you that your intuition in that case would be correct.

That said, there’s more to consider here from a testing perspective. Consider that these methods are called from my own Stardate Calculator:

As you can see from the implementation (and only somewhat from the code), when you do a TNG calculation, there is an option to do the calculation in a “per year” manner or not. Is this functionally equivalent, though? Well, if you try to convert the provided stardate (42353.7) without using stardates per year your output is:

Mon, 10 May 2365 02:24:43 GMT

With stardates per year, your output is:

Wed Aug 19 2364 09:33:16 GMT-0500 (Central Daylight Time)

Here even if the code was functionally equivalent behind the scenes, which it’s not, the output is entirely different. Now, in this example I gave you the benefit of seeing the code so you could determine functional equivalence. You may not always have that opportunity. But you do have the opportunity to test inputs and outputs. Thus, in this case, what we’ve found is that you do not have an equivalence in the output domain even though the input will be identical. Meaning, the same input (42353.7) is fed in but different outputs result based on the path through the code I take. (Remember earlier how I said this ties into path testing?)
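To compare the two paths without going through the form, the calculations can be lifted into pure functions that take the stardate as a parameter. This is a sketch, not the calculator’s actual code. The names `tngDate` and `tngDatePerYear` are stand-ins, `leapYear` is not shown in the original so a standard Gregorian rule is assumed, and the origin is pinned to a UTC-05:00 offset on the assumption that the sample output above came from a machine on Central Daylight Time.

```javascript
// Assumed Gregorian leap-year rule; the article's leapYear() is not shown.
function leapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Path 1: year from the thousands, fraction of the year from the remainder.
function tngDate(stardate) {
  const value = Math.abs(stardate);
  const year = 2323 + Math.floor(value / 1000);
  const secondsInYear = leapYear(year) ? 31622400 : 31536000;
  // (value % 1000) / 1000 of a year in seconds, times 1000 for milliseconds.
  const offsetMs = (value % 1000) * secondsInYear;
  return new Date(Date.UTC(year, 0, 1, 0, 0, 0) + offsetMs);
}

// Path 2: a fixed number of milliseconds per stardate unit, counted from a
// fixed origin.
function tngDatePerYear(stardate) {
  // Assumption: origin expressed with an explicit -05:00 offset.
  const origin = new Date("2318-07-05T12:00:00-05:00");
  return new Date(origin.getTime() + stardate * 34367056.4);
}

const a = tngDate(42353.7);
const b = tngDatePerYear(42353.7);
console.log(a.toUTCString());
console.log(b.toUTCString());
console.log(a.getUTCFullYear() === b.getUTCFullYear());
```

Returning `Date` objects and formatting them in one place also separates two differences that the form output mixes together: the computed instant and the way it is rendered as text.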

Speaking of input, let’s consider the input domain as a whole. We know with my earlier number range example that you needed negative, zero, and positive. It didn’t matter which negative number you used or which positive number. So applying that thinking to this example, how many stardates do you test? And how many for each path? Would it be okay to run just these two tests:

Convert 42353.7 using stardates per year
Convert 42353.7 without using stardates per year

Is that enough? In this case, unlike the number example, the boundaries are not necessarily clear. What about numbers without a decimal? What about numbers outside the range of 40000?

Here you have to know what the applicable business domain is. As it turns out, TNG stardates must be five digits and can, but do not have to, include a decimal value.
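That business rule can itself be written down as an executable check, which in turn suggests the classes of input worth sampling. This is a sketch only. `isTngStardate` is a hypothetical name, and the rule is read here as exactly five digits before an optional decimal part, with no sign. How many decimal digits are allowed is not stated, so any number is accepted.

```javascript
// Assumed reading of the rule: five digits, optionally followed by a
// decimal point and one or more digits. No sign, no whitespace.
function isTngStardate(text) {
  return /^\d{5}(\.\d+)?$/.test(text);
}

// One representative per input class suggested by the rule.
const inputs = [
  "42353.7", // five digits with a decimal part
  "42353",   // five digits, no decimal part
  "10000",   // smallest five-digit value without a leading zero
  "99999",   // largest five-digit value
  "9999",    // too few digits
  "100000",  // too many digits
  "-42353",  // signed
  "abcde",   // not numeric
];

for (const text of inputs) {
  console.log(text, isTngStardate(text));
}
```

Each accepted class then has to be run through both conversion paths, since the two paths are not functionally equivalent.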

A key thing to notice is that in this example, knowing the code doesn’t really help you much, because even knowing that the code is not functionally equivalent only tells you that you have to test the methods independently. Knowing the implementation — i.e., looking at the form — helps you a bit because a sample input is provided. If you ran that input through the form, even if you had not seen the code, you would know there is a functional difference, but again that just tells you that you have to test the paths independently.

The differences that matter (the values of the input and output domain) are encoded in the business rules. For example, notice how with the output of stardates per year, the output says “(Central Daylight Time)” whereas the non-stardate-per-year approach does not. Does that mean one part of the code will eventually show “(Central Standard Time)” when we are not on daylight saving time? And notice how the year in the output is entirely different between the two approaches? Is this always going to be the case, or are there boundaries where in fact the year should be the same in both cases?

Even playing around with the form to explore inputs and outputs may not tell you everything you need to know, at least not in a reasonable time frame. Once again, you can see that pure intuition will not necessarily help you through even if it does give you a good start.

Domain Testing is a Refined Technique

Testers, and those who rely on them, have to realize that what I show here is only part of what makes testing hard. I provided relatively simple examples and showed that there were potential complicating elements to each. Certainly it should be clear that as the domain of inputs and outputs increases, and as the business rules themselves become more complicated, the difficulty of doing effective and efficient domain testing scales up with them.

So, if nothing else, I hope this article has provided a measure of respect for a particular test technique that many dismiss as something “anyone can do because, after all, it’s really quite intuitive.” It is intuitive to an extent but ultimately the intuition must become more systematic and that requires the craftsmanship that goes along with thinking like a tester.
