Effective Tests, Not Positive and Negative Tests

Written by Jeff Nyman, 5 May 2012

My opinion is that the “positive” and “negative” labels for tests rest on a faulty conceptual distinction. To me, only “old school” testers talk about “positive testing” and “negative testing.” Pretty bluntly stated, huh? I feel strongly about this because the way the terms are promoted, they take focus away from how a tester should be thinking about testing. That’s my belief anyway. I’ll try to defend that.

As a simple exercise, do this. Look up the word “negative” in the dictionary. Consider what it means to put the word “negative” in front of the word “testing” and decide what this actually would mean. One of the things I want to show is that “negative testing” is often treated as a technique but it’s defined as a classification or an approach.

Boris Beizer once defined “negative testing” as “Testing aimed at showing software does not work.” I found that interesting because a lot of people would consider that a positive aim of the testing process. In fact, it’s largely the basis upon which you test an application. Now consider Elisabeth Hendrickson’s interesting distinction:

“Positive Testing: If a tree falls in the forest and someone is there, did it make a noise?”
“Negative Testing: If a tree doesn’t fall in the forest and someone is there, did it make a noise?”

Notions like “functional testing” (a test approach), “system testing” (a test type), or “regression” (a technique) can and should be defined well enough so that everyone knows what is meant by those activities. This works when the conceptual distinction between the terms is clear. I’ve seen testers attempt to define terms like “negative test” and “positive test” as if the conceptual distinction was clear. What I’ve usually found is that this distinction hinders thinking about tests. When I have terms that I use, I feel they should help me dictate the choices that I make in designing and running the tests as well as providing some understanding of the scope of my tests.

Whether or not you have a “positive test case” or a “negative test case,” you’re dealing with an input, an action, and an output. The action acts upon the input to derive a certain output. That means a good test is one that deals with those three things: pure and simple.
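
As a minimal sketch of that idea in Python: every name below (`Check`, `parse_age`, the error text) is hypothetical and invented for illustration. The point is only that a test for valid input and a test for invalid input have exactly the same shape: an input, an action, and an expected output.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Check:
    """One test: an input, an action applied to it, and the output we expect."""
    input_value: Any
    action: Callable[[Any], Any]
    expected_output: Any

    def run(self) -> bool:
        # The action acts upon the input to derive an output,
        # which is compared with what we expected to see.
        return self.action(self.input_value) == self.expected_output


def parse_age(text: str):
    """Hypothetical function under test: digits become an int, anything else an error."""
    if text.isdigit():
        return int(text)
    return "error: not a number"


checks = [
    Check("42", parse_age, 42),                      # some would call this "positive"
    Check("abc", parse_age, "error: not a number"),  # some would call this "negative"
]

for check in checks:
    print(check.input_value, check.run())
```

Nothing in the structure of `Check` needs to know which label someone might attach to it.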

Conceptualizing Positive vs. Negative

What I believe, based on many years of operationally defining terms, is that any conceptualizing about what is a “negative test” and what is a “positive test” entirely depends on how you view (think about) a test.

Ask yourself this: are you saying the design of a given test determines whether it’s a positive or negative test or are you saying it’s the result of a given test that makes this determination?

I think the preference of many testers would be that it’s the thinking behind the test that should determine a notion of “positive” or “negative.” Yet, in my experience, the most effective testers think in terms of “what can I do to establish the level of risk?” If that type of thinking is held to, all concepts of “positive” and “negative” go out the window. I say that because you can’t artificially compartmentalize risk under terms that are more or less ambiguous. Replace “ambiguous” with “flexible” if you prefer a more ‘positive’ characterization to my ‘negative’ spin. (See how easy those terms are to throw around?)

Result-Based

One of the primary goals of a test is to communicate information about how the application is functioning. Whether the application is behaving correctly or incorrectly, by the act of testing you determine what the application was actually doing and those are, by definition, good results. The result tells you about the application and that’s good (without recourse to terms like “positive” and “negative”). If the result tells you nothing about how the application is functioning, those are obviously bad results (and, again, this is without recourse to “positive” or “negative”).

But, even if the distinction is not necessary, is the distinction sufficient? That’s another question, one I’ll try to answer as I go.

Personally, I just apply the term effective to the types of tests that can inform me of what the application is doing and then I can say that all test cases (whether someone calls them “positive” or “negative”) should be effective. If they’re not, why am I running them? If they are, I should be running them. It really is that simple.

Intent-Based

I just talked about the result of the tests. What about the idea of relying on the thinking behind the test? I think this is important but I also think that this kind of concept is often presented in a way that’s a little too vague. I say this because how people think about tests can differ, even on this issue, often depending on what they have been taught regarding these concepts.

That said, I do believe that it’s easy to show how you can transform a “positive test” mentality into a “negative test” mentality just by thinking about the results of the test differently.

This is where intent comes in.

Consider that if negative testing is thought to be solely about “disrupting an application” or “disrupting a piece of functionality,” even a positive test can do that if there’s a bug in the application or in that specific functionality. I suppose I’m being a little flip because with the notion of the thinking behind the test, obviously I would be talking about intent. The intent is to disrupt the application so as to cause a failure condition and that would constitute a “negative test” (by some people’s viewpoint) while a positive test would not be actively trying to disrupt the application — even though a disruption might, in fact, occur. The key differentiator there is the intent.

Now, this starts getting into the design that you’re testing against. What does the application handle? What should it handle? What are its limits? Boundary testing, as just one example, is a testing technique that could be said to make an attempt at disrupting an application. Why? Because you’re seeing if the application can handle boundary violations. That’s one way to look at it. On the other hand, it’s not necessarily an “attempt to disrupt” if what you are testing is that the application handles the boundary violations correctly (which it presumably should).

Yet note how this leads us right back to the results of the tests. That sort of speaks to the intention of what you are hoping to find but also how you view the problem in terms of the results you are expecting to see. If the disruption you tried to cause in the application is, in fact, handled by the application logic then you will get a so-called “positive” test result — an error message of some sort or an automatic filtering of the invalid input. The application did what it should, which is generate an error or handle the faulty input.

Now, if a boundary condition is not handled by the application logic, exercising that condition will potentially severely disrupt the application. While that may not have been the stated intent with this test (at least necessarily), that might be the result. Yet note the key point here: this is also a positive test result! You’ve learned something about the application and found how it can be disrupted. The application did not do what it should.
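
Here is a rough Python sketch of the two situations just described. It assumes a hypothetical field that should accept whole numbers from 1 to 100; the range, the function names, and the message text are all assumptions made for illustration. The same boundary test is run against an implementation that handles the violations and against one that does not.

```python
def handles_boundaries(value: int) -> str:
    """Hypothetical implementation that enforces the assumed 1..100 limit."""
    if not 1 <= value <= 100:
        return "error: value must be between 1 and 100"
    return "accepted"


def ignores_boundaries(value: int) -> str:
    """Hypothetical buggy implementation with no limit check at all."""
    return "accepted"


def boundary_test(field) -> dict:
    """Run the same boundary inputs against any implementation.

    Returns a mapping of input -> whether the observed output
    started with the expected outcome.
    """
    expected = {0: "error", 1: "accepted", 100: "accepted", 101: "error"}
    report = {}
    for value, want in expected.items():
        observed = field(value)
        report[value] = observed.startswith(want)
    return report


print(boundary_test(handles_boundaries))
print(boundary_test(ignores_boundaries))
```

Against the first implementation every check matches. Against the second, the checks for 0 and 101 do not. The test itself never changed, and in both runs it reported how the application actually behaves.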

Intent or Result?

In both cases here, utilizing just one technique (boundary analysis), I believe that I’ve shown how you can derive a “negative” result and a “positive” result — with, I might add, the exact same test! So, once again, this is showing me that the distinction between “negative” and “positive” is not necessary and it’s not even sufficient.

The above example also shows why the distinction, for me, blurs in terms of “positive” and “negative” because what you end up with depends on how the application functions. But what you started with (the intent) is basically always checking to see how the application handles a variety of conditions (such as valid and invalid inputs). I argue that artificially compartmentalizing that into “negative conditions” and “positive conditions” is simply not an effective way to think about testing because it can be difficult to operationally define the concepts in a consistent manner.

Okay, but now, let’s forget all about intent of test case design for a moment and look at the distinction of what the results are in terms of a “positive result” (the application showed me an error when it should have) and what some would call a “negative result” (the application did not show me an error when it should have). What I’ve just argued is that this latter case isn’t a negative result and therefore isn’t a “negative test.”

When you learn something about how the application works, I can’t see how that’s a negative, either by intent or by result.

In a future post, I want to go through a few more examples around what I’m talking about but, for now, the overall point that ties together the intent of the tests and your expected results of the tests is that sometimes our intentions for tests are changed by the reality of what exists and what happens as a result of running the tests. Note, however, that this is the case with all effective tests. It doesn’t matter whether they’re called “positive” or “negative” and, in fact, as I argue, it’s meaningless to distinguish because the one can become the other. More seriously, I believe these distinctions force testers to draw lines in the sand where no such lines should be drawn.

5 thoughts on “Effective Tests, Not Positive and Negative Tests”

Jeroen Mengerink says:

7 May 2012 at 7:01 am

Good post! I couldn’t stop reading and I have to say that I agree with you. Each test that gives information on the SUT is a positive test. I’m going to try to stop using the terms positive and negative testing from now on!

Jonathan Locker says:

8 May 2012 at 11:52 am

I believe it is useful to break tests down into two categories; one where the intended functionality is used in the directed/prescribed way, and the other category where the system is used in a manner that should not result in successful use of the function under test.

Using expected/unexpected use would be incorrect, as many expected user actions are specified to cause a system error.

Perhaps a better categorisation would be prescribed use scenarios and non-prescribed use scenarios. This has the benefit of indicating those tests that should allow the fulfilment of the intent of whatever feature/functions you are testing, and the expected user actions of fulfilling that intent.

Non-prescribed scenarios, meanwhile, would include expected interaction with the system that should result in errors and frustration of intent.

Most product owners want to know whether the system can behave correctly under the prescribed scenarios, and that data and security integrity are maintained under the non-prescribed scenarios. But the advantage of these categorisations is that we are still expecting the system to behave in specified ways in both categories, and that a passing/positive/successful test would include the expected system behaviour.

So now a test where you enter the wrong password during login, and you get the message “wrong password” is no longer a negative test, but actually a positive one as the system behaved as expected. Additionally this allows a tester to explore non-prescribed scenarios, and determine if the actual system behaviour is expected and acceptable to the product owners, even if it does not follow previously marked out routes.

I agree that positive/negative testing may not be the best words to describe prescribed path/non-prescribed path testing, but I still believe that the categorisation is useful and can help testers, developers, and product owners think about how the system should actually work in the many “expected” ways a user might interact with the system.

1. Jeff Nyman says:
  
  8 May 2012 at 12:50 pm
  
  Interesting thoughts. Thanks for sharing.
  
  I guess to me, tests don’t suggest categories unless I break them into “effective” versus “ineffective.” Otherwise, I don’t care about their category, per se. If I categorize them it has to do with what the focus of their testing is: such as performance, security, usability, data integrity.
  
  So for example, with prescribed and non-prescribed, the key for me is that regardless of what we call them, if the application should allow a user to do something, then it should in fact allow it. If the application should not allow a user to do something, then it should in fact not allow it. Now how that action is allowed and how the user is informed that their action is not allowed are implementation details of the test itself.
  
  So my test library, for lack of a better term, has to be filled with effective tests that go through scenarios of likely and unlikely actions on the part of users.
  
  When you get to the level of the test itself, the expectation is not based on what the user is going to do but with what they did and how the system reacts. So whether what the user did was prescribed or non-prescribed, our software should handle the situation. In some cases, the non-prescribed may only be a temporary state. Maybe someone is trying to change data on a form that has to be saved before data is allowed to be modified. So here “modification of data before saving” is an invalid condition that the user may attempt to exercise. My test will prove that if they do they are given a suitable message.
  
  Let’s say my test found that the system didn’t handle this situation. In that case, I write up a bug saying that the system should. Even if I’m told this is a non-prescribed scenario or an unlikely one, the fact is the system should handle it. I don’t need to categorize a test by some distinction to make this clear. It’s an effective test because it tests an aspect of the system that should work in a certain way and it has a clear observable to a defined action. The intent of the user was to modify their data. That intent is foiled given an aspect of how the system works. My test is utilizing the user intent to check the system response (result) to it.
  
  Not sure if we’re agreeing or disagreeing. 🙂
Tatiana says:

10 May 2012 at 10:10 am

This is interesting, Jeff, because when I’ve encountered the term “negative testing” I’ve often found that developers think it’s “testers being too picky” or “testing too many ‘out there’ scenarios that are unlikely to ever happen.” Assuming that a tester has an “I’m going to break it” attitude, I wonder if that’s saying the same thing.

If I have a field that only takes certain values but then I purposely try to put in values that I know the field shouldn’t take, am I attempting “negative testing”? What if it’s a field where you can’t literally type in something too invalid? So maybe I could type in a bad value but I couldn’t type in a bad value that’s more than five characters if the field only accepts five characters. Do you see what I mean? In the one case I have a bad value that can be used in the field and in the other case I have a bad value that can’t be used in the field.

I guess as I think more about it I wouldn’t call those “negative” or “positive” tests either, but I’ve heard testers do that. Now you’ve got me wondering…

1. Jeff Nyman says:
  
  14 May 2012 at 7:15 am
  
  Tatiana: with your example of a field that can accept values, you are also saying that the field cannot accept ALL values. Thus your testing is focusing on various aspects of valid and invalid conditions for that particular field. That is what you are testing: valid and invalid conditions. What you want to see is whether the application handles them.
  
  And the application should handle all of them: all valid and all invalid conditions. There is absolutely no benefit (in my mind) to calling any of these negative or positive. It’s like in science: if you have a concept that is not operationally useful (e.g., the concept of absolute rest or absolute motion), then you discard it.
  
  So let’s say I have that field and it should only accept letters and numbers. It should also check that there is a minimum of eight characters, with at least one number. This is really just a variation on the idea of a field that accepts values between 1 and 100. In that case, I would try to enter 0 and 101. Those are invalid values. Presumably the application will check for valid values and respond accordingly for valid and invalid values.
  
  But what about your idea of invalid values that cannot be used at all. So with my first example, the field only accepts letters and numbers. It will not accept symbol characters. Now you could try typing them; but let’s say the field is designed to not even allow such characters to be entered. So they are invalid — but I can never submit them to the system because the field itself pre-validates its data. So I still have valid and invalid data and I still have effective tests.
  
  It would not be a “negative test” to try to enter symbols into that restricted field because, of course, the test is designed to make sure that the restriction is in place. If it’s not, you’ve found a bug. If it is, great — the field is working.
  
  Calling those tests positive or negative is meaningless, at least in my view. Or, put another way, regardless of whether you call them “positive” or “negative”, they are testing the same thing: certain data conditions. So I would rather focus on the nature of the conditions: valid and invalid because that, to me, is more descriptive and, ultimately, it is more explanatory.
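
The field described in this reply (letters and numbers only, at least eight characters, at least one number) can be sketched as a small Python validator with a list of valid and invalid conditions. This is an illustration under assumptions, not anyone's actual implementation: the function name `validate`, the ASCII-only reading of "letters and numbers," and the message strings are all invented here.

```python
import string

# Assumption: "letters and numbers" means ASCII letters and digits.
ALLOWED = set(string.ascii_letters + string.digits)


def validate(value: str) -> list[str]:
    """Return the list of rule violations for a value (empty list means valid)."""
    problems = []
    if not all(ch in ALLOWED for ch in value):
        problems.append("only letters and numbers allowed")
    if len(value) < 8:
        problems.append("minimum of eight characters")
    if not any(ch.isdigit() for ch in value):
        problems.append("at least one number required")
    return problems


# Conditions described by what they are (valid or invalid),
# not by a "positive" or "negative" label.
conditions = [
    ("abcdefg1", "valid"),
    ("12345678", "valid"),
    ("abcdefgh", "invalid: no number"),
    ("abc1", "invalid: too short"),
    ("abcdef1!", "invalid: contains a symbol"),
]

for value, description in conditions:
    print(value, description, validate(value))
```

Each entry exercises a data condition, and the description says which condition it is. That is all the categorization the list needs.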
