The Intersections of Testing

A lot of testers I know come across Joe Rainsberger and his declaration that integrated tests are a scam. This always leads to interesting discussions, so I figured I would use this post to distill my own thoughts, particularly because opinions about “scam-based testing” usually rest on a profusion of testing terminology.

There’s a video presentation that Joe gave of many of these ideas as well. It’s quite important to note that Joe has specifically stated that he means “integrated tests, and not integration tests, are the scam.” This clarification has been posted a few times on his blog and he has stated his terminology may have been confusing.

Even more specifically in one of his articles Joe mentions:

When I refer to the integrated tests scam, I mean that we shouldn’t use integrated tests to check the basic correctness of our code. I’m talking only about programmer tests: tests designed to give the programmers confidence that their code does what they think they asked it to do. Some people call these ‘technical-facing tests’ or ‘developer tests’ or even ‘unit tests’. Whatever you call them, I mean the tests that only the programmers care about that help them feel confident that their code behaves the way they expect.

I want to help Joe drive his own points home because I agree with some of his thinking. I also think that the way a developer like Joe conceptualizes testing can help testers who don’t work at the code level but who have to interact with developers who do.

This is important because even testers are not always clear on this topic of integration testing. As a case in point, you might consider Reinventing Testing: What is Integration Testing? (part 2), where a good discussion took place regarding how testers view this concept. (Or, rather, part 2 was the distillation of those thoughts. See part 1 for much of the discussion.)

The Isolation Problem

Let’s first consider why “integrated tests” may have been a problem to begin with and thus a scam. I’m not going to repeat all of Joe’s logic or his thought formation. However, regarding the rationale for his overall argument, Joe says:

You write integrated tests because you can’t write perfect unit tests. You know this problem: all your unit tests pass, but someone finds a defect anyway. Sometimes you can explain this by finding an obvious unit test you simply missed, but sometimes you can’t. In those cases, you decide you need to write an integrated test to make sure that all the production implementations you use in the broken code path now work correctly together.

So the general argument there seems to be that you can’t find out what’s actually wrong with integrated tests, at least at a glance, because the integrated test itself is defocused over a set of functionality. So I can’t point to a location — right there! — and say that exact location is where the bug is.

But you can turn this around and say “unit testing is a scam” because of the opposite problem: a failing unit test does not necessarily tell me much about what is failing from a user-experience perspective, or whether the user will be impacted at all. I’ve seen applications that work just fine but that have plenty of failing unit tests. And, of course, I’ve seen applications that deliver a terrible user experience in a particular context and yet have hundreds of passing unit tests.

Unit testing may help with the design of the code itself but it doesn’t necessarily help at all with the design of the application from a user-facing perspective. This has long been a dilemma in the software development industry: how much testing to do and at what level of abstraction?

These kinds of discussions may seem entirely semantic, but they affect the level of testing we consider as teams and how we decide that we have “enough” test coverage, particularly when we consider tests that are unit, integrated, and whatever other terms we throw into the mix.

Micro and Macro Intersection

In trying to clear up the “scam” issue, I think Joe made an interesting distinction but not one you often hear:

The integrated tests scam happens when we use integrated tests for feedback about the basic correctness of our system, and write them in place of microtests that would give us better feedback about our design.

“Microtests,” huh? Okay. Let’s consider what that might mean.

But first I’d like any tester reading this to consider that it’s fantastic to get the insight into the developer way of looking at testing! This is exactly what non-developers need to be doing, particularly testers who often have to work with developers.

Micro Tests

Okay, so micro-tests. Here’s my view: micro-level tests are generally valuable and desired because they can provide the fastest feedback you can get. Why is that? Because micro-level tests don’t (or at least shouldn’t) trigger the use of any external resources, such as files, sockets, databases, screens, etc. At the very least, such tests minimize the amount of external resources they utilize. That’s what makes them so fast. Such tests only test small chunks of logic and, from a technical standpoint, should execute entirely in memory.
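To make that concrete, here is a minimal sketch in Python of what I mean by a micro-level test. The `order_total_cents` function and the test names are hypothetical, invented purely for illustration. The point is only that nothing here touches a file, socket, database, or screen, so the whole thing runs in memory.

```python
def order_total_cents(prices_cents, discount_percent=0):
    """Hypothetical unit under test: a small, pure piece of logic."""
    subtotal = sum(prices_cents)
    # Integer arithmetic keeps the example free of rounding surprises.
    return subtotal * (100 - discount_percent) // 100


def test_total_without_discount():
    assert order_total_cents([1000, 550]) == 1550


def test_total_with_discount():
    assert order_total_cents([10000], discount_percent=25) == 7500


if __name__ == "__main__":
    # No external resources are involved, so these run entirely in memory.
    test_total_without_discount()
    test_total_with_discount()
```

Tests shaped like this are the kind you can run by the thousands, which is where the fast feedback comes from.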

This means that hundreds, thousands, or even tens of thousands of tests can run very quickly. When the tests execute that quickly, it becomes possible to verify a lot of the logic in the application in a very short time. Now, the downside, of course, is that these tests normally don’t verify that a user can actually perform a business-value function.

Also, consider that tests on a micro-level can cause problems if they’re written a certain way. For example, if the tests use knowledge about the structure and implementation details of the code they test, they’re on the same level of detail as that code. This means that for every piece of code that needs to change, several tests have to change as well.
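Here is a small sketch of that coupling problem, again in Python and again with made-up names (`Cart`, `_items`, and both tests are assumptions for illustration). The first test knows how the cart stores its data. The second only knows what the cart is supposed to do.

```python
class Cart:
    """Hypothetical class; the list in _items is an implementation detail."""

    def __init__(self):
        self._items = []  # could just as well become a dict later

    def add(self, name, cents):
        self._items.append((name, cents))

    def total_cents(self):
        return sum(cents for _, cents in self._items)


def test_coupled_to_structure():
    cart = Cart()
    cart.add("tea", 300)
    # Reaches into the internals: this breaks if storage changes,
    # even when the cart still behaves correctly.
    assert cart._items == [("tea", 300)]


def test_behavior_only():
    cart = Cart()
    cart.add("tea", 300)
    cart.add("milk", 120)
    # Only relies on the public behavior, so it survives a refactoring.
    assert cart.total_cents() == 420


if __name__ == "__main__":
    test_coupled_to_structure()
    test_behavior_only()
```

If `_items` were swapped for a dictionary, the first test would have to change along with the code while the second would keep passing untouched.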

So keep this in mind: In order to get more high-level, business-facing feedback, you need a complement to micro-tests. These would presumably be called macro-level tests.

Macro Tests

Macro-level tests typically launch a GUI, click buttons, enter text, and navigate around an application, just like a user would. By doing so, these tests exercise the application in what’s often called an “end-to-end” manner. It’s also very common for these actions to store state in a persistent manner, generally by using a database. As a collection of macro tests is run, that collection verifies the business value of the application.

The downside of such tests also stems from their end-to-end nature: they often take a long time to execute. Yes, they’re able to provide loads of high-quality feedback, but that feedback arrives much more slowly than the feedback from micro-level tests. It’s certainly possible to make efforts to improve the execution speed. It’s also possible to tighten the feedback cycle to some extent. All that being said, these improvements tend to come at the price of the amount and quality of feedback the macro tests provide.

So another point to keep in mind: In order to get faster feedback, you need a complement to macro-tests. Those are the micro-tests we already talked about.

Terminology Intersection

It’s popular for testers and developers to come up with numerous categories that seemingly describe what their testing is actually doing, but this often seems to lead to a certain amount of confusion. In fact, there’s often a lot of confusion in the testing community over test types. These types are often conflated with test approaches and test techniques.

After all, just in the terminology we’ve been talking about here: what’s a unit? What’s too much versus too little integration? When is integration not system testing? When is system testing not integration testing? Is the latter even possible? Joe even brought up the confusion with integrated tests versus integration tests. Integration can mean at the code level (i.e., sets of interacting objects) or it can mean at the level of multiple parts working together (i.e., a database, middleware, and a web server). The distinction being made by Joe (and many others) is that integration tests are those that validate the integration of different components as well as external systems, whereas integrated tests are those that attempt to do the work of validating units without fully isolating those units.

End to End? Or Edge to Edge?

Getting into other terms, an “edge-to-edge test” interacts with some part of the system, stubbing, mocking, or otherwise test-doubling other parts as necessary. An “end-to-end test,” by contrast, interacts with the system only from the outside: through its user interface, by sending messages as if from third-party systems, by invoking its web services, and so on.
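As a rough sketch of the edge-to-edge idea, consider the Python below. Everything in it is hypothetical: `Checkout` stands in for “some part of the system” and `FakePaymentGateway` is a hand-written test double for a third-party system. An end-to-end version of the same check would drive the real application from the outside instead, which is not something a short snippet can show.

```python
class FakePaymentGateway:
    """Test double standing in for a third-party payment system (hypothetical)."""

    def __init__(self):
        self.charges = []

    def charge(self, account, cents):
        # Record the call instead of talking to a real service.
        self.charges.append((account, cents))
        return "approved"


class Checkout:
    """Hypothetical part of the system exercised by the test."""

    def __init__(self, gateway):
        self._gateway = gateway

    def place_order(self, account, item_prices_cents):
        total = sum(item_prices_cents)
        status = self._gateway.charge(account, total)
        return {"total_cents": total, "paid": status == "approved"}


def test_checkout_edge_to_edge():
    gateway = FakePaymentGateway()
    receipt = Checkout(gateway).place_order("acct-1", [300, 120])
    assert receipt == {"total_cents": 420, "paid": True}
    assert gateway.charges == [("acct-1", 420)]


if __name__ == "__main__":
    test_checkout_edge_to_edge()
```

The test goes from one edge of `Checkout` to the other, but it stops at the boundary where the real payment system would begin.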

Are You a Collaborator? Or a Contract?

Then there are so-called “collaboration tests” which are tests that verify the interactions between a given “unit” (whatever that means) and its collaborators. Those collaborators can be mocked or otherwise replaced by test doubles. To complete those tests, there are so-called “contract tests” which verify that a certain “unit”, when given certain inputs, either produces a certain output or produces an internal state change that can be verified, or perhaps both.
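One way to picture the pair is sketched below in Python, using the standard library’s `unittest.mock.Mock`. The `Invoice` and `TaxCalculator` classes, along with the 20 percent rate, are assumptions I made up for the example. The collaboration test checks how `Invoice` talks to its collaborator, and the contract test checks that the real collaborator gives the kind of answer the other test assumed.

```python
from unittest.mock import Mock


class TaxCalculator:
    """Hypothetical collaborator with a flat 20 percent rate."""

    def tax_cents(self, amount_cents):
        return amount_cents * 20 // 100


class Invoice:
    """Hypothetical unit that depends on a tax calculator."""

    def __init__(self, calculator):
        self._calculator = calculator

    def total_cents(self, net_cents):
        return net_cents + self._calculator.tax_cents(net_cents)


def test_collaboration_invoice_asks_calculator():
    calculator = Mock()
    calculator.tax_cents.return_value = 50
    invoice = Invoice(calculator)
    # The invoice should add whatever tax the collaborator reports...
    assert invoice.total_cents(1000) == 1050
    # ...and should ask for it exactly once, with the net amount.
    calculator.tax_cents.assert_called_once_with(1000)


def test_contract_calculator_given_input_produces_output():
    # Given a certain input, the real unit produces a certain output.
    assert TaxCalculator().tax_cents(1000) == 200


if __name__ == "__main__":
    test_collaboration_invoice_asks_calculator()
    test_contract_calculator_given_input_produces_output()
```

Neither test runs `Invoice` and `TaxCalculator` together, which is the whole idea: the two halves are checked in isolation.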

Customer Tests and Programmer Tests

Some people have tried simplifying by just using two categories. “Customer Tests” are those tests that business people write to get confidence that a feature exists. “Programmer Tests” are those tests that programmers write to get confidence that the code does what they think they asked it to do.

Presumably we’d be safe including “unit tests” in Programmer Tests and most would say “acceptance tests” go in Customer Tests. But then where does integration go? How about integrated? How about contract or collaborator? A lot of these terms actually get framed by developers discussing testing at the code level — but wouldn’t these same conceptual terms have validity at the non-code level?

That last question is important and I want to come back to that because I believe there’s an idea out there that code-based terminology for testing does not apply to non-code based testing. This has been harmful in a variety of ways.

Add a Dash of TDD or BDD

On top of all this possible confusion, as an industry, we’ve then slapped overall approaches on top of this soup, TDD and BDD being the most prevalent. I’ve already questioned if we’re defocusing our practices too much so I won’t rehash that argument here.

The one thing I think many testers — and hopefully developers — would agree with is that testing is a means of communication at various intersections. Unit testing (via TDD) has become an intersection of design, coding, and debugging. Behavioral testing (via BDD) has become an intersection of requirements and design.

The intersection of communication part is important, I think, and speaks a bit to how code-based and non-code based testing concepts can be brought further together to allow more fruitful discussions between developers and testers.

Stay Tuned …

This was a lot to cover and it was admittedly more of a “setting up the conversation” type post. I wanted to showcase a lot of intersections of testing concepts and terminology. With that established, I will continue this post with a second, related post. That post will explore a bit more how to operationalize that last idea of “intersection of communication.” I want to look at a few examples and see where those can take us.

One thought on “The Intersections of Testing”

J. B. Rainsberger says:

24 March 2016 at 3:28 pm

I do not include “unit tests” inside Programmer Tests, since I have written and frequently write what James Shore called “customer unit tests”, which amount to tests of great interest to Customers (the XP word I use for approximately “business stakeholder” here) that nonetheless are also microtests. So far, these have always been examples of a calculation of particular interest to the Customer with significant numbers of complicated variations, but that I implement within a single module. I run the examples (provided by or at worst approved by the Customer) by running only and directly the single module that implements the calculation (microtests or “unit tests”). The Customer often enjoys running the examples, and from time to time ends up adding more complicated examples as they think of them. With a system like Fitnesse (in the old days), the Customer added rows to a table and re-ran the tests before sending me a nervous “We forgot something!” email. We will almost certainly have a small number of workflow tests (edge-to-edge or end-to-end) that shows that the calculation occurs at the right time and how success or failure affects the overall feature, but if there are 37 detailed variations, then I prefer to write and run those as microtests that double as Customer and Programmer tests.

This only matters inasmuch as I prefer not to conflate the issue of “microtests or macrotests?” with “Programmer tests or Customer tests?” They overlap a lot, but not entirely. Sometimes I care about the audience of the test (Programmer or Customer?) and sometimes I care about the execution speed of the test or the role the test plays (detect logic errors or unintended consequences? detect integration errors or miscommunication in which problem we’re trying to solve?). As if it might help, I grasp for clearer terms that more precisely articulate my concern at any given moment.

I await the rest of this series with interest!

Stories from a Software Tester
