Integration and Integrated, Part 2

In the previous post in this series, I talked about the counterargument that there is no distinction between integration and integrated. That post ended on a question. In this post, I will start from the presumption that there is a distinction between the terms and explore that a bit.

Consider The Domains

So let’s start off with some specific terminology that I happen to believe is accurate:

  • An integration test is a test that checks several classes or functions together.
  • An integrated test is a test that checks several layers of abstraction, or subsystems, in combination.

There is a distinct difference between the two. The second point is what testers tend to mean when they speak of “integration,” while the first point is what developers tend to mean when they speak of “integration.”
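To make that distinction concrete, here is a minimal sketch in Python. The classes and the (commented-out) endpoint are hypothetical, invented purely for illustration; the point is only where each kind of test draws its boundary.

```python
# Hypothetical domain classes, invented purely for illustration.
class DiscountPolicy:
    def discount_for(self, subtotal):
        return subtotal * 0.10 if subtotal >= 100 else 0

class Order:
    def __init__(self, items, policy):
        self.items = items
        self.policy = policy

    def total(self):
        subtotal = sum(price for _, price in self.items)
        return subtotal - self.policy.discount_for(subtotal)

# An integration test in the developer's sense: several classes checked
# together, but still within a single problem domain and a single layer.
def test_order_total_applies_discount_policy():
    order = Order([("book", 60), ("lamp", 60)], DiscountPolicy())
    assert order.total() == 108  # 120 minus the 10% discount

# An integrated test in the tester's sense would instead cross layers and
# subsystems, say driving a (hypothetical) HTTP endpoint backed by a real
# database:
#
#   def test_placing_an_order_via_the_api_persists_it():
#       response = api_client.post("/orders", json={"items": ["book"]})
#       assert response.status_code == 201
#       assert database.orders.count() == 1
#
# The second test touches the web layer, the domain layer, and the
# persistence layer at once; the first never leaves the domain.
```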

Let’s focus on the term “integrated” for a bit because I’m willing to bet that it will be the one that causes the most cognitive friction. What are we really testing with an integrated test? The business rules? How the UI behaves? How we communicate with external systems? How we publish our business rules as a web API?

The problem — or, rather, the reality — is that an integrated test suite goes across several problem domains. As such it would seem we can’t — or at least shouldn’t — focus solely on any one of them.

Let’s Bring BDD In Right Away

This may seem like I’m going on an odd tangent, but bear with me.

Let’s consider this: in which domain language are we writing our “acceptance tests”, such as in a traditional BDD approach? If we decide to use a single language — for example, the business rules language — we lose a lot of scenarios at the UI level and at the web API level, because we (often) don’t want to express them in the business rules language. If we decide to mix the languages, then we (often) end up with very unreadable and hard-to-maintain BDD test suites.
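To make that language problem a bit more tangible, here is a small sketch in which every name (place_order, the browser steps, and so on) is invented for illustration. The same intent is stated once in the business rules language and once with UI mechanics mixed in.

```python
from dataclasses import dataclass

# A tiny stand-in for the domain, invented purely for illustration.
@dataclass
class Order:
    shipping_cost: int

def place_order(customer: str, total: int) -> Order:
    # Simplified rule: repeat customers ship free.
    return Order(shipping_cost=0 if customer == "repeat" else 5)

# Stated in the business rules language: nothing about how the order
# gets placed, only what the rule should do.
def test_a_repeat_customer_gets_free_shipping():
    order = place_order(customer="repeat", total=50)
    assert order.shipping_cost == 0

# The same intent with the UI language mixed in (shown as a comment,
# since it assumes a hypothetical browser driver):
#
#   browser.visit("/login")
#   browser.fill("email", "repeat@example.com")
#   browser.click("Sign in")
#   browser.click("Add to cart")
#   browser.click("Checkout")
#   assert browser.find(".shipping-cost").text == "$0.00"
#
# The business rule is now buried under navigation details, which is
# exactly the readability and maintenance problem described above.
```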

Now let’s keep in mind an important point: an integrated test checks several problem domains at the same time. This is as opposed to checking a single problem domain in isolation.

That very distinction — coupled with the bullet points earlier — is why there should be an operational distinction between the programmatic notion of “integration” and the business notion of “integrated”.

Note my wording there: “the business notion of integrated.” I don’t say the “test notion of integrated”, to contrast with programmatic, and that’s because the idea of testing is present in both. But it is when a system is integrated that it is able to provide business value. This may drive home some of the points from the last post about particle collider components and different gears for different bicycles.

Now this gets interesting when you consider that you can slice your system into several subsystems, each one taking care of a single problem domain. I say this is “interesting” because it sounds like we’re going back to the isolated components that I was referring to in the last post.

But consider this: each one of these subsystems will have a cohesive language and will serve a subset of your stakeholders. Because of that, it’s (relatively) easy to define a very precise and possibly exhaustive BDD feature set for it. Or, if not doing BDD, a precise and possibly exhaustive test suite.

So that’s one step: slice your system into several subsystems. In the microservice arena, you might call these bounded contexts, but let’s not get specific to any one implementation right now. There is another step. The second step is to slice each subsystem into abstraction layers. There will always be the domain layer, where you implement a set of rules and data models to solve whatever the domain problem is. However, there will be one or more technical layers providing infrastructure services.
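As a rough sketch of that shape, imagine a hypothetical “ordering” subsystem. The names and the rule are invented; what matters is that the domain layer holds only rules and data models while the technical layer stays thin and only translates.

```python
# ordering/domain.py -- the domain layer: rules and data models only.
class ShippingRules:
    FREE_SHIPPING_THRESHOLD = 100

    def shipping_cost(self, order_total: int) -> int:
        return 0 if order_total >= self.FREE_SHIPPING_THRESHOLD else 7

# ordering/api.py -- a technical layer: it translates between the outside
# world (HTTP, in this case) and the domain layer, and nothing more.
class OrderingApi:
    def __init__(self, rules: ShippingRules):
        self.rules = rules

    def get_shipping_quote(self, request: dict) -> dict:
        cost = self.rules.shipping_cost(request["order_total"])
        return {"status": 200, "shipping_cost": cost}

# A "billing" subsystem would repeat the same pattern for its own problem
# domain, with its own language and its own (thin) technical layers.
```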

Scope Tests to Abstraction Layers

If you’ll forgive a bit of simplicity here, one aspect of internal quality is a focus on architecting and designing the system so that these layers can be tested independently of one another. The technical layers should ideally be very thin. So you slice across problem domains to get a set of subsystems, and within each subsystem you slice into abstraction layers.
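Continuing that hypothetical sketch, “testable independently” might look like this: the domain layer is exercised with no infrastructure at all, and the thin technical layer is exercised only for the translation it adds.

```python
# Assumes the ShippingRules and OrderingApi sketched above.

# Domain layer tested on its own: no HTTP, no database, no framework.
def test_orders_at_or_over_the_threshold_ship_free():
    assert ShippingRules().shipping_cost(order_total=150) == 0

def test_orders_under_the_threshold_pay_flat_rate_shipping():
    assert ShippingRules().shipping_cost(order_total=40) == 7

# Technical layer tested only for what it adds: turning a request into a
# domain call and a domain answer into a response.
def test_shipping_quote_endpoint_translates_the_domain_answer():
    api = OrderingApi(ShippingRules())
    assert api.get_shipping_quote({"order_total": 150}) == {
        "status": 200,
        "shipping_cost": 0,
    }
```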

That’s from a pure development perspective. From a testing perspective, for each one of these abstraction layers, you might create a different test suite. While doing this, however, you always face a particular question: Would this test suite give me enough of a return? And you can’t answer this question if you don’t have a clear sense of the cost of building the test suite. The goal should always be to write the tests that really pay off and avoid writing those that don’t give you much value.

A common anti-pattern is what’s often referred to as the inverted pyramid. We’ve all seen this, right? In this context, the anti-pattern is that there are little to no small-scoped tests, with all the coverage being in large-scoped tests. This is an anti-pattern because projects adhering to an inverted pyramid often have extremely slow test runs and thus very long feedback cycles. If these tests are run as part of continuous integration, you won’t get many builds, and the long build times mean that the build can stay broken for a long period when something does break.

All this being said, the idea of scoping the appropriate number of tests can be complicated because it’s hard to have a balanced conversation about the value something adds versus the burden it entails. The intelligent curation and management of larger-scoped, high-burden tests certainly falls into this category.

Yet I do think the test industry as a whole is realizing that you should focus on a small number of core journeys that serve as comprehensive tests for the whole system. Further, any functionality not covered in these core journeys needs to be covered in tests that analyze components or services in isolation from each other. These journeys need to focus on high-value interactions and be very few in number.

So the heuristic here is to separate your domain layers from the technical ones and slice across functional domains. This brings development thinking into testing but without losing the business focus of that testing. And, believe it or not, this actually has kept me on track for the point of this post.

Correctness and The Need for Integration

What I’ve been talking about here is “integrated” and what I hope I’ve shown so far is that the notion of “integrated” is something we, as testers, have been doing, even if we tended to refer to it as “integration.” Whether that matters to you depends only on whether you’ve accepted my bullet point definitions at the start. But I’m going to assume you have not and plow forward a bit.

I’ll repeat here a point that J.B. Rainsberger has made: integrated testing is inefficient and ineffective when such tests are used for assessing basic correctness. This is a role that is much better suited to unit tests and maybe some level of integration tests. But what is “basic correctness”? You can check out this description of basic correctness if you want but the salient point there is this:

By basic correctness I refer to the myth of perfect technology: if I ran the system on perfect technology, would it (eventually) compute the right answer every time?
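Read that way, basic correctness is exactly the kind of question a small, isolated test answers well. A minimal sketch, using an invented tax rule purely for illustration:

```python
# A pure rule: given these inputs, is the answer right? No network, no
# database, no imperfect technology anywhere in sight.
def tax_for(amount: float, rate: float) -> float:
    return round(amount * rate, 2)

def test_tax_is_computed_from_amount_and_rate():
    assert tax_for(100.0, 0.07) == 7.00

def test_tax_is_rounded_to_whole_cents():
    assert tax_for(19.99, 0.07) == 1.40
```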

In the first post I was asking a question: if I could test all units, would I even need to integrate? The notion of basic correctness would argue that, yes, I would.

Although a single service or product may be entirely understandable and “correct”, the overall system is only comprehensible when the connectivity between those services and products is known. “Correctness” is only possible when the impacts of changes on the whole system can be identified and assessed. This applies on the safety side as well, where qualities like runtime scalability are concerned. Further, although individual components may be isolated and made resilient in a modularized or cohesive architecture, the system availability is not assured unless the interdependencies of the components are understood.

I think many of us have been in situations where testing is sometimes totally disconnected from the real value of the software. By the “real value” I refer to the behavior of the system as a whole, above and beyond merely how it functions. Think back to that first post with the mountain bike with the gears of a child’s bicycle. The bike could possibly function in such a configuration and, certainly, each component in isolation would be “correct.” But the value to a mountain biker — due to the overall behavior — would be missing.

You could argue that testing whether a low-level component works according to its technical design is not necessarily useful. You could equally argue that this approach only works in small systems where there is an almost one-to-one correlation between behavior and components. In fact, in small systems TDD (functional focus) and BDD (behavior focus) are very similar. This can prevent people from appreciating the difference between them. I also think this divide has served to mask the distinction between integration (functional focus) and integrated (behavior focus).

Testing to Prove the Value

Going back to value-adding and effective test suites, there are some interesting points to consider (see the sketch after this list):

  • If a test fails, you should know which behaviors — not just functions — are broken.
  • If a test fails, you should detect the cluster of objects that needs to be fixed.
  • If there is a change in an existing behavior, you can easily locate which tests to change.
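As a small sketch of what those points imply in practice, with an invented loyalty rule standing in for a real one, naming and scoping tests around behaviors rather than functions is what makes the first and third points workable:

```python
class LoyaltyPoints:
    # Invented rule for illustration: one point per whole dollar spent.
    def points_for(self, purchase_total_cents: int) -> int:
        return purchase_total_cents // 100

# Named for the behavior, not the function. If this fails, you know which
# behavior is broken; if the earning rule changes, you know exactly which
# test to update.
def test_customers_earn_one_point_per_whole_dollar_spent():
    assert LoyaltyPoints().points_for(2550) == 25

# A function-named test, by contrast -- say, test_points_for_returns_int --
# only tells you that something executed, not which promise to the business
# was broken when it fails.
```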

If you agree with those points, do you see why a distinction between integrated and integration can make sense, particularly in terms of discoverability at different levels of the domain problem (behavior) and the programmatic stack (functional)?

The Integrated Test Scam

I talked a bit about the integrated test scam before in terms of conceptualizing the intersections of testing.

Linking that idea with a few points from this post, the integrated test scam is said to happen when you use integrated tests for feedback about the basic correctness of your system, and write them in place of lower-level tests that would give you better feedback about your design.

The actual “scam” part of this comes in when people try to add integrated tests in order to “fill in the gaps” between those lower-level tests. When the focus is on the larger-scale, higher-level tests, it often becomes easier for developers to write highly interdependent modules that in turn become more difficult to test with the low-level tests.

This is what aggravates the problem of the inverted test pyramid. And that is really important to understand. A lot of testers trot out the inverted pyramid and point to it as “what not to do.” But even more important is being able to articulate why it happens. That’s important because it often happens in a very gradual manner.

So … Is There a Distinction?

The main thing I wanted to do here was make sure you at least understood why I, and others, feel there is an operational distinction between “integration” and “integrated.” But there’s an interesting dilemma as well. We really can’t speak to the basic correctness of the system as a whole without integrated testing. Yet that testing is often the hardest to construct and maintain, not to mention execute in a way that is timely as the suite of integrated tests grows.

Lower-level tests are “unit” (which test objects in isolation) and “integration” (which test some aspects of objects talking to each other). Those are quicker to write and quicker to execute. But they are not enough because while they might point to functionality that is broken, the impact on behavior is what is going to matter more.

Developers often know that and so the “scam” part comes in where “integrated tests” start to plug in supposed gaps in the functional coverage so that we get behavioral coverage. The problem is that this conflation undermines our ability to use tests as a means to put pressure on design in the most responsible way for the abstraction layer we are working at. The fact that “integration” and “integrated” were conflated terms is what allowed this “scam” to perpetuate as it has, at least in my opinion. Personally, I don’t think we should need the distinction, but until we all get some clarity around the ideas, I believe that we do need it.

I’m fully aware that people reading this series of posts may not even see what I’m describing as a problem or, at the very least, feel I’m simply overstating it. I do believe, however, that the kind of thinking that leads to the “scam” is what makes our overall testing not as effective and efficient as it could be. In further posts in this series, I’m hoping that by talking it out I’m going to come to some better understanding of what we, as practitioners, can do about this.
