Part of achieving quality in software means treating testability as a primary quality attribute. Once you do that, you can then adapt your requirements and development styles from that point of view. Whether you call that “agile”, “lean”, “scrappy” or whatever else is largely beside the point. The focus is on testability. But let’s talk about what that means.
Observability and controllability are the two key aspects of testability. Without them, it’s hard to say anything about correctness. And correctness, ultimately, is what we are attempting to talk about. There is the notion of “basic correctness”, which J.B. Rainsberger states well in Interlude: Basic Correctness:
“By basic correctness I refer to the myth of perfect technology: if I ran the system on perfect technology, would it (eventually) compute the right answer every time?”
But that notion of correctness broadens when we bring in the tolerances and sensitivities of both technology and people. Those tolerances and sensitivities influence the perception of quality and the ability to produce it. But how does that notion of “correctness” influence how we consider testability?
Well, I just said that testability is about the ability to observe and control. And any time you need to observe and control something, you have to look at the number of moving parts involved. It would seem, at first glance, that the fewer of these there are, the easier it will be to test.
But is that true?
Minimize the Parts
If that is true, it means, as just one example, that microservices should be much harder to test because there is so much more distributed code connected up by interfaces. That sounds reasonable, right? But now let’s consider that a key aspect of this kind of design is that the distributed code tends to be smaller and more constrained in nature. Thus there are perhaps more places where the code lives, but less code in each individual place.
So is testability, in this context, a case of where the code is located? How much code there is? And — importantly — does that translate into how many features we support with our code?
Well, on that last point, it can’t be just that. And this is mainly because of differences between programming languages. What it takes, programmatically, to support a feature in one language can be very different in another. So the “amount of code” is not really an issue.
Amount of Code
Actually, that’s a bit too simplistic, isn’t it? After all, it’s demonstrably the case that different programming styles can raise the level of abstraction, which in turn can reduce the size of the codebase. Sometimes that reduction comes at the cost of expressiveness, however. (See my post on clarity trade-offs with code.) But the point here is that sometimes that reduction in codebase size can make a difference to comprehension and thus to testability.
Okay, so the code is a concern. And part of that concern is around what I just said: testability.
So let’s ask this: what does code not designed with testability in mind look like? It tends to mean that the various bits of code (classes, functions, whatever) have multiple areas of responsibility. It’s also often the case that code without the testability attribute has elements that operate at different levels of abstraction at the same time. It also tends to be the case that the behavior and data of such code have multiple sources of truth and exhibit high coupling and low cohesion.
To be clear on those last two, you want low coupling (meaning different parts of the code have minimal dependencies on each other) and high cohesion (meaning code that is in the same logical unit, such as a class, is all related).
If the code is designed with testability in mind from the start and each bit of code has a single area of responsibility, then it tends to follow that all interesting abstractions and their functionality will be primary concepts in the code. This was largely the basis of Domain-Driven Design.
Design Pressure
The important point is that there’s a key pressure on design here. Putting it simply, the idea should be to create standalone abstractions with well-defined behavior. This is what TDD — testing as putting pressure on design at the code level — should be helping with.
But what does this mean for a tester?
Well, first of all a tester should be aware of these approaches — like being domain-driven and test-driven.
Secondly, it means those abstractions will operate on their own data types and domains. Why is that important? Because it means the abstractions will have their own boundary values and equivalence partitions. And that tends to mean the abstractions will have their own kind of error and exception handling.
Specialist testers know that you don’t just want tests starting at the boundary of the public interface, but rather at these lower levels because those lower levels will hit the targeted functionality using its own domains and abstractions. By the way, testers being aware of these concepts is largely what I mean when I say specialist testers have to be considered developers, as distinct from programmers.
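Here is a minimal sketch of what that can look like, with hypothetical names throughout: a domain abstraction with its own data type, its own boundary values, and its own error handling, which can be tested directly rather than only through the application's outer public interface.

```python
import pytest

class InvalidQuantity(Exception):
    """Raised when a quantity falls outside this abstraction's own domain."""

class OrderQuantity:
    """A standalone abstraction: order quantities must be between 1 and 999."""
    MIN, MAX = 1, 999

    def __init__(self, value: int):
        if not (self.MIN <= value <= self.MAX):
            raise InvalidQuantity(f"{value} is outside {self.MIN}..{self.MAX}")
        self.value = value

# These tests target the abstraction's own boundary values and error handling,
# not the GUI, API endpoint, or workflow that eventually uses it.
def test_lower_boundary_is_accepted():
    assert OrderQuantity(1).value == 1

def test_below_lower_boundary_is_rejected():
    with pytest.raises(InvalidQuantity):
        OrderQuantity(0)
```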
All that sounds great … but hold on. Let’s back up a bit here.
Project Forces
We have some forces here: testability (akin to gravity) and time pressure (akin to friction). Lack of testability, often combined with time pressure, can and does result in bug-ridden and broken software. This happens because these are project forces.
As testers, we want to look for the sources that are the equivalent of friction and gravity; those things that apply both gradual and sudden pressures, some of which are unexpected. This can be many things beyond what I’m talking about here: changing requirements, bugs, tests that you can’t trust, single points of failure (whether human or technological), test suites that are too large to reason about, environments that are hard to spin up, test data that cannot be relied on, and so on.
These forces, collectively, exert gradual and sudden, often unexpected, pressures on your work, and dealing with them is largely a matter of design. This is another reason why I often talk about testing as a design activity rather than just as an execution activity.
So if we are going to deal with these pressures, we must have a good handle on what testability is.
Breaking Down Testability
Let’s start with a simple aspect. When I say “testable” — that means … what?
Put really simply, it means that something can be verified. That a question can be put to it and the means of answering that question will provide evidence that one can observe and reason about.
Okay, so if software is developed so that its behavior can be verified, what does that mean?
It can mean that it’s easy to confirm that the software supports a certain feature. It can mean that the software behaves correctly given a certain input. Or that it adheres to a specific contract. Or that it fulfills some constraint.
Okay, not bad, I guess. But let’s reduce that a bit further.
Let’s ask the question again: “Testable” means what?
And now let’s answer that with what we just said.
It means that the software can be put in a known state, acted on, and then observed. It also means that this can be done without affecting any other code or architecture elements and without those elements interfering. Thus being “testable” has both a spatial and a temporal component to it. As in spatial coupling and temporal coupling.
Testers! Do you know what I mean by that? You should. Developers will most certainly know what I mean by that and, thus, testers should also. Specifically, we have to consider the dangers of spatial coupling (state) and temporal coupling (invocation).
Spatial Coupling
What this ultimately means is that one of the key aspects of what makes code “testable” — and this includes at the interface level — is the amount of direct and indirect input and output in the code and how it handles state.
Data Conditions
When the behavior of some bit of code is affected solely by values that have been passed in via its public interface, that code is said to operate on direct input. From a testing perspective, it means that the largest concern is to find relevant inputs to pass in as arguments to the interface, without caring about any other aspects or circumstances that may affect the behavior of the code.
Note there that the “public interface” can mean different things depending on your level of abstraction. This can be a method, it can be the messages between objects, the endpoints of an API, or the widgets on a GUI.
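To make the idea concrete, here is a small sketch using a hypothetical function driven purely by direct input: nothing but the arguments passed through its public interface affects its behavior, so a test only has to choose interesting values to pass in.

```python
# A hypothetical function operating solely on direct input: its behavior is
# determined entirely by the arguments passed through its public interface.
def shipping_cost(weight_kg: float, express: bool) -> float:
    base = 5.0 + (2.0 * weight_kg)
    return base * 1.5 if express else base

# Tests only have to supply interesting arguments and check the result.
def test_standard_shipping_uses_base_rate():
    assert shipping_cost(2.0, express=False) == 9.0

def test_express_shipping_applies_surcharge():
    assert shipping_cost(2.0, express=True) == 13.5
```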
Direct Conditions
Direct output, as you would likely imagine, is analogous to direct input. Output is said to be direct if it’s observable through the code’s public interface. This, too, has a great impact on testability. It means that tests only need to query whatever the tested code exposes.
From a so-called black box perspective, testing on these direct input/outputs tends to amount to finding good equivalence classes and boundary values.
But the most important point here is that direct input/output is observable through the code’s public interface at the appropriate level of abstraction.
This makes testing easier, because the tests need only be concerned about passing in interesting arguments and checking the results, as opposed to looking at state changes and interactions with other bits of code, by which I mean other interface elements. And if other interface elements necessarily interfere then that’s most definitely telling you something about your design.
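As a sketch of that black box framing, here is a hypothetical discount rule tested with a few parametrized cases, each representing either an equivalence class or a boundary value of one.

```python
import pytest

# A hypothetical rule: orders of 100.0 or more get a 10% discount.
def discount_rate(order_total: float) -> float:
    if order_total < 0:
        raise ValueError("order total cannot be negative")
    return 0.10 if order_total >= 100.0 else 0.0

@pytest.mark.parametrize("total, expected", [
    (0.0, 0.0),      # lower boundary of the "no discount" class
    (99.99, 0.0),    # just below the discount threshold
    (100.0, 0.10),   # boundary value where the discount kicks in
    (250.0, 0.10),   # representative of the "discounted" class
])
def test_discount_rate_per_partition(total, expected):
    assert discount_rate(total) == expected
```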
Indirect Conditions
Input is considered indirect if it isn’t supplied using the program element’s public interface. Collaborating objects are often the source of indirect input, but there are many other possible sources. Static variables/methods, system properties, files, databases, models, queues, and the system clock are all sources of indirect input.
Indirect output is any kind of output that isn’t observable through the public interface.
Indirect input/output cannot be observed through the public interface of a program element and requires tests to somehow intercept the values coming in to and going out from the tested object. This usually moves tests away from state-based testing to interaction-based testing.
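As a rough sketch of that shift, consider code whose behavior depends on the system clock, which is an indirect input. The names here are hypothetical; the point is that the test has to intercept the collaboration rather than simply pass in arguments.

```python
from datetime import datetime
from unittest import mock

# Indirect input: the current hour comes from the system clock, not from
# anything passed through this function's public interface.
def is_happy_hour() -> bool:
    return 17 <= datetime.now().hour < 19

# The test cannot supply the time as an argument, so it intercepts the
# indirect input instead, moving toward interaction-based testing.
# (This patch assumes the function and the test live in the same module.)
def test_happy_hour_detected_at_five_thirty():
    with mock.patch(f"{__name__}.datetime") as fake_datetime:
        fake_datetime.now.return_value = datetime(2024, 1, 1, 17, 30)
        assert is_happy_hour()
```

The alternative, of course, is to redesign so that the time is passed in as direct input, which is exactly the kind of pressure on design that testability exerts.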
Temporal Coupling
Temporal coupling is a close cousin of state and, in fact, you could argue it’s actually state in disguise. “Temporal” means that something has to do with time. In this case, it’s the time of invocation or, more specifically, the order of invocation.
Temporal coupling arises as soon as one program element needs something to have happened in another program element in order to function correctly. Temporal coupling becomes dangerous if the succession of invocations isn’t apparent.
Examples of this might be calling a method out of order which ends up putting the application in an invalid state. There might be, for example, temporal coupling between some validation logic and any logic that relies on what was validated.
As an example of that, it might be the case that how the data is passed around can be entirely correct, and how the data is validated can be entirely correct, but sequences of actions that rely on the validated data may kick in before the data is validated.
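A minimal sketch of that kind of temporal coupling, with hypothetical names: nothing in the interface makes it apparent that validate() must run before charge().

```python
class Payment:
    def __init__(self, amount: float):
        self.amount = amount
        self._validated = False

    def validate(self) -> None:
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        self._validated = True

    def charge(self) -> str:
        # Temporal coupling: this silently relies on validate() having already
        # been invoked, but nothing enforces or reveals that ordering.
        return f"charged {self.amount}"

payment = Payment(-50.0)
print(payment.charge())   # "charged -50.0": the dependent logic ran before validation
```

Making the ordering explicit, say by having charge() refuse to operate on unvalidated data, is one way to surface the coupling rather than leave it hidden.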
Practical Testing Thoughts
So let’s bring all this around to some practical testing thoughts that most specialist testers already (hopefully) understand outside of the specific context I’ve been talking about.
Types of Data
Consider that tests usually need two kinds of data: reference data and possibly some entity data. So-called “static” entity data, like user credentials, are often treated like reference data.
The main point is that unless something very specific is happening to the reference data, the tests should rely on the data being available and not be concerned with the setup of the data. For entity data, you can start by calling the component/service, such as an API, that can create or inject the entity. Alternatively, the use of builders or factories can be helpful.
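As a sketch of that builder idea, with hypothetical names, a test can get entity data into a known state without dragging all of the setup details into the test itself:

```python
class UserBuilder:
    """Builds entity data in a known default state; tests override only what matters."""

    def __init__(self):
        self._name = "default-user"
        self._active = True

    def named(self, name: str) -> "UserBuilder":
        self._name = name
        return self

    def inactive(self) -> "UserBuilder":
        self._active = False
        return self

    def build(self) -> dict:
        return {"name": self._name, "active": self._active}

def test_inactive_users_cannot_log_in():
    user = UserBuilder().named("casey").inactive().build()
    assert user["active"] is False   # stand-in for whatever login check the system exposes
```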
All of this obviously relates directly to data conditions that are placed into test conditions. And this, as I hope is clear, gets into state and time considerations.
The important point is that the thinking of testers and programmers is aligned here. We are both, in effect, acting as developers when we think about the data.
Types of Outputs
We already know that there are some specific ways to frame test outputs (illustrated in the sketch after this list):
- Single values: Only one value is the correct response.
- Range of values: The correct response is within a known range or interval.
- Set of values: There are multiple correct values, corresponding to a set of finite size.
- Predicate values: The correct response can be determined by a true or false.
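Those four framings map directly onto assertions. Here is a minimal sketch using trivial stand-in functions (all hypothetical) so that each kind of expected output is visible:

```python
def add(a, b):
    return a + b

def risk_score(order_total):
    return min(order_total / 1000.0, 1.0)

def next_state(current):
    return "shipped" if current == "paid" else "cancelled"

def is_valid_email(address):
    return "@" in address and "." in address.split("@")[-1]

def test_single_value():
    assert add(2, 2) == 4                                  # only one correct response

def test_range_of_values():
    assert 0.0 <= risk_score(250.0) <= 1.0                 # response lies within a known interval

def test_set_of_values():
    assert next_state("paid") in {"shipped", "cancelled"}  # one of a finite set of correct values

def test_predicate_value():
    assert is_valid_email("test@example.com")              # correctness reduces to true or false
```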
That’s easy enough. We also know that you can start with the outputs or outcomes. Specifically, you can take the outputs that will deliver the most value, and work back to find the minimum set of features you need in order to produce the outcomes that use those outputs. Then you can expand your understanding so that you can cater for the variety of inputs or behavior that may affect the outcomes.
Once again, tester and programmer (thus developer) thinking aligns.
And this all seemed pretty obvious, I would hope, if you are a specialist tester. If it wasn’t obvious — and if it still isn’t — I urge you to think more about this.
Isolation
Okay, so now let’s consider a framing device we can use in the form of an operational question.
- If I have systems that have isolable components with well-defined responsibilities …
- … and I have a series of comprehensive component and/or service tests …
- … does this allow me to reduce the need for end-to-end or system tests?
Here the notion of “isolability” means being able to isolate a given aspect of code under test, whether that aspect be a function, class, web service, or an entire system, such as a database, API, and so forth.
The above framing question is an interesting one to ask.
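As a sketch of what that isolability can look like in practice (hypothetical names, and a deliberately tiny example), a component with a well-defined responsibility can be exercised against a stand-in for its storage dependency rather than the full end-to-end stack:

```python
class InMemoryAccounts:
    """Test double standing in for a real persistence layer."""
    def __init__(self, balances):
        self._balances = dict(balances)

    def balance_for(self, account_id):
        return self._balances[account_id]

    def save(self, account_id, balance):
        self._balances[account_id] = balance

class WithdrawalService:
    """Well-defined responsibility: decide whether a withdrawal is allowed and apply it."""
    def __init__(self, accounts):
        self._accounts = accounts   # any object offering balance_for/save

    def withdraw(self, account_id, amount):
        balance = self._accounts.balance_for(account_id)
        if amount > balance:
            raise ValueError("insufficient funds")
        self._accounts.save(account_id, balance - amount)

def test_withdrawal_reduces_balance_in_isolation():
    accounts = InMemoryAccounts({"acct-1": 100.0})
    WithdrawalService(accounts).withdraw("acct-1", 40.0)
    assert accounts.balance_for("acct-1") == 60.0
```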
Constraints
Another observable is constraints. As just one example, contracts define constraints that apply throughout the execution of an application. Having to think about the produced code in terms of clients and suppliers (or consumers and producers, if you prefer) and the consequences of formalizing responsibilities is very important.
We need to think about where responsibilities lie and which part of the code should do what. And, again, when I say “code” here this can be talking at different levels of abstraction, including APIs, mobile screens, web pages, etc.
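A minimal sketch of a contract as an explicit constraint, again with hypothetical names: the supplier states what it requires of its clients (preconditions) and what it guarantees in return (postconditions), and that formalized responsibility gives tests something concrete to check.

```python
def reserve_seats(requested: int, available: int) -> int:
    # Precondition: the client must ask for at least one seat and no more
    # than are currently available.
    assert requested >= 1, "client must request at least one seat"
    assert requested <= available, "client cannot request more than are available"

    remaining = available - requested

    # Postcondition: the supplier guarantees the remaining count never goes negative.
    assert remaining >= 0
    return remaining

def test_contract_holds_for_a_valid_request():
    assert reserve_seats(2, available=5) == 3
```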
So That’s Observability
A lot of the above was about the observability aspect of testability.
With that, there’s also an observability-to-intrusion ratio to consider. The level of abstraction helps us classify observability in terms of increasing intrusiveness. To increase observability beyond the application’s obvious and less obvious inputs and outputs, we have to be willing to make some intrusions.
But how many? And where? And to what extent? And does this compromise what we are trying to observe?
And those considerations take us into controllability.
The Ability to Control …
Controllability is the ability to put something in a specific state, usually via a series of invocations. This is clearly of paramount importance to any kind of testing because it leads to reproducibility.
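A small sketch of that, with hypothetical names: the test drives the object into a specific, known state through a short series of invocations, and that control is what makes the later observation reproducible.

```python
class Document:
    def __init__(self):
        self.state = "draft"

    def submit(self):
        if self.state == "draft":
            self.state = "in_review"

    def approve(self):
        if self.state == "in_review":
            self.state = "published"

def test_approval_from_a_controlled_state():
    doc = Document()
    doc.submit()          # series of invocations putting the object
    doc.approve()         # into the exact state we want to observe
    assert doc.state == "published"
```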
… And Thus Reproduce …
Reproducibility is another way of saying “understand under what conditions certain behaviors occur.” If we have that understanding, we can isolate behaviors and conditions, fix them, refactor them, and add to or subtract from them as part of feature development, and so on.
The ability to reproduce a given condition in a system, component, or class depends on the ability to isolate it and manipulate its internal state and to deal with how and when it is invoked.
… And Thus To Predict
The ability to control and verify, along with the ability to reproduce, leads to predictability. And I don’t think the notion of predictability needs too much expansion here in terms of why we would want to be better, rather than worse, at this on our projects.
Wrapping Up
Noel Rappin, in his book Rails 5 Test Prescriptions: Build a Healthy Codebase, says this:
“Tests act as a universal client for the entire codebase, guiding all the code to have clean interactions between parts because the tests, acting as a third-party interloper, have to get in between all the parts of the code to work.”
And this is true whether we consider tests at the direct code level or at any level of abstraction.
For another point, Alexander Tarlinder in his book Developer Testing: Building Quality into Software says:
“Traditional testing has focused on the black box most often. When what has been needed is a focus on building windows into the black box and adding appropriate control levers that introduce as little intrusiveness as possible.”
This, to me, is another way of talking about the abstraction layer we want to work at. And being able to move between these abstraction layers is a key skill of specialist testers, particularly in an industry where they really have to be a developer with a particular set of focused skills working as part of a delivery team.
What we all share in common as part of this delivery team is a focus on testability as a primary quality attribute because, ultimately, that helps us observe and experiment with external and internal qualities, each of which is subject to various tolerances and sensitivities, both from the technology and the humans who benefit from (or suffer due to) that technology.