I’ve gone through a lot of posts on modern testing and I’m nearing the conclusion of my thoughts on this. (Or so we can hope, right?) Here I’ll recap a bit and then push forward.
I started with the idea of design pressure and sources of truth. But let’s break some of that down again.
A large part of what I wanted to do was have readers focus on figuring out how to minimize all the artifacts and tools that they use now. Taking that to a logical extreme, I wanted readers to consider what it would take for documentation or those artifacts to become completely unnecessary. While asking and attempting to answer this question, there are always two paths to consider:
- From customer conversation down to where it intersects with code.
- From code conversation up to where it intersects with the user.
Tests are a mediating influence on these conversations. But tests are not just there to detect changes in behavior. Tests are there to specify and elaborate on behavior, often and most helpfully before the actual behavior is created. Even if not before, these tests must at least be concurrent with the creation of the behavior.
Concurrent with? Hmm. Now, is that how most test teams operate? Keep in mind I’m talking about tests at various levels, not just unit tests that can be created “test first” or “test concurrent.” I’m talking about the evolution of tests at various levels being done along with the development code. But in order for this to happen, I do believe a large portion of those tests will be in code themselves.
But hold on — let’s back up a bit.
Concurrent and Interleaved Testing
Let’s go back to that concurrent part with tests at various levels. Does that make sense? I would argue yes — if you treat testing as a design activity that puts pressure on design.
It’s important that test teams see and understand the value of this. It is equally critical that they recognize that design occurs at various levels and is interleaved with the development activities that turn the design into a working implementation. Collectively, we’re always testing our designs. Anyone who thinks about something to be designed is, by definition, exploring the problem space with an eye towards how specific functionality adds value.
This means testing is done by various people at various times. And it’s not just people with the word “Tester” in their job title who do this testing. But this testing isn’t always applying pressure. And that’s where people with a testing speciality come into the mix: to guide the “natural testing” we all do into a more “systematic testing.”
Any act of testing serves as a means to aid communication, to drive a clean, simple design, to understand the domain, to focus on and deliver the business value, and to give everyone involved clarity and confidence that the value is being delivered. When you can do this, you have the democratization of testing.
In short, we, collectively as a team, make a series of decisions on a project, and those decisions become realized in the features that, taken together, we call a product or service. Those decisions need to be encoded as artifacts. But too many such artifacts become a crutch that can hinder our ability to refine, or even outright throw away, our decisions. Testers, and their tooling, have to move at the speed of those decisions. To allow this to happen, effective and efficient testing must work to minimize those artifacts, thus preventing the crutches from being used. That minimization is what allows not just the democratization but also the resilience that I talked about as part of this series of posts.
Tests as Code
Now let’s bring that back to what I said above about how “a large portion of those tests will be in code themselves.” Here I don’t want readers to take this as me saying all tests are automation code, with the sense of “automation” being strictly in terms of executing checks against a functioning application. But if not that, then what do I mean?
I do believe that testing, as a function, needs to be more push rather than pull. At one level, this means test teams have to put pressure on design at a pace that matches how product teams specify and how development teams write code. This means it is important to interleave product, test, and development activities in a tight cycle. The more tightly you interleave, the less room you have to create artifact crutches.
This kind of thinking means the test code I’m talking about must be capable of reflecting the design as it evolves and capable of being changed rapidly with that design. This means the test code itself, and particularly the artifacts that support it, should be as minimal as possible.
In the previous post I talked about getting a little more prescriptive rather than descriptive. I’m building up to that a bit here, so bear with me.
Key to what I’m saying here is that our production code, that which developers produce, will ultimately encode business intent (‘why’) and demonstrate business understanding (‘what’). Test code should do the exact same thing. Development code shows the how (how it actually works); test code also shows the how, but additionally confirms or validates it.
This allows us to focus on a mantra I brought up before: document ‘Why’, specify ‘What’, automate ‘How’. A key part of that is the ‘how’. No matter what else we produce as part of a project, in the technical disciplines we are in, the ultimate source of truth is going to be that which is directly executable. This is simply a fact.
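To make that mantra a bit more concrete, here is a minimal sketch in Python (pytest style). Everything in it is a hypothetical illustration, including the `price_order` function and its loyalty-discount rule; the point is only that the docstrings carry the ‘why’, the test names specify the ‘what’, and the test bodies automate the ‘how’.

```python
# Hypothetical production code (normally in its own module), kept
# inline so this sketch actually runs.
def price_order(total: float, customer_is_repeat: bool) -> float:
    """Repeat customers receive a 10% loyalty discount."""
    return round(total * 0.9, 2) if customer_is_repeat else total


# Test code: docstring documents 'why', name specifies 'what',
# body automates 'how'.
def test_repeat_customers_receive_a_ten_percent_discount():
    """Why: repeat customers churn less when loyalty is rewarded."""
    assert price_order(100.00, customer_is_repeat=True) == 90.00


def test_first_time_customers_pay_full_price():
    """Why: the discount exists to reward loyalty, not to erode margin."""
    assert price_order(100.00, customer_is_repeat=False) == 100.00
```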
From a resilience standpoint, test teams must be resilient enough to recognize this fact and support the kind of approach that carries this fact to its logical conclusions.
So now let’s start talking about this.
Testing as Bottleneck
Let’s consider — and hopefully agree — that the shape of modern development is definitely leaning towards the idea of constant communication with minimal documentation artifacts. This is certainly what the whole “agile movement” has been leaning towards for some time and the “lean movement” is simply a refinement of that approach. I actually look at agile and lean as means of removing many of the conceptual or tangible crutches we often saddle ourselves with.
Let’s also agree that product and development generally want the flexibility to change, largely on-demand, without causing testing to become a bottleneck.
But testing often is a bottleneck. Even in so-called agile environments. Even in environments that claim to practice BDD. But why is that?
I maintain that people who don’t really get modern testing — or, quite frankly, don’t want to do it — tend to want to turn it into a management problem or a programming problem instead. Thus they put emphasis on tracking tools or automation tools. What I’ll call “traditional testers” are often the ones requesting more artifacts to be created or creating said artifacts themselves. They want copious requirements. They may even want business requirements and then technical requirements. They generally want a slew of test cases. They want all those things to be traceable to each other and a test management system to exist that allows all this to be “managed.”
It might even sound reasonable to some people, given their experiences. The problem I’ve found — and what I believe hampers modern testing — is that these artifacts start to serve as the crutches this post is talking about. And those crutches start to dictate the options, and thus the choices, of the test team.
Reframe Documentation
Consider that each different artifact being created tends to suggest a separate activity to create it. Separate activities induce a lot of waste and lost opportunities. Basically the same knowledge is manipulated during each activity, but in different forms and in different artifacts, probably with some amount of duplication. And this “same” knowledge can evolve during the process itself, which may then cause inconsistencies among the different artifacts.
There is the original source of truth, usually the requirements in a traditional project, and all the copies that duplicate this knowledge in various forms. Unfortunately, when one artifact changes, for example the code, it is hard to remember to update the other documents. As a result the documentation quickly becomes obsolete, and you end up with incomplete documentation that you can’t trust. Documentation that you can’t trust rapidly stops being read, which means it ossifies even more.
Now there are two ways we, as a discipline within an industry, can deal with this.
- Refine the product team + development team notion of discoverable artifacts.
- Refine and treat code+tests as specifications.
If we take the former approach, we tend to get into BDD concepts. What about the second approach? Well, above I said the original source of truth is usually the requirements. But what if the original source of truth is not the requirements, but the code?
I’ve found many test teams resist this. We end up, instead, with an industry that says: what if the source of truth is still the documentation, but the documentation is executable? It’s not really the documentation that’s executable in these schemes, of course. Rather, it’s documentation that’s instrumented in particular ways and that is executable, ultimately, as code.
Okay, fine — but then why not just have code be executable as code rather than as documentation? And, instead, focus on having code serve as documentation — or, rather, a way to generate documentation. That last point is key and goes back to what I’ve talked about in prior posts: push up rather than pull down. Don’t necessarily consume the documentation (i.e., the requirements) but rather be able to generate a representation of the documentation.
This has interesting ramifications because, and this is the important part for me, it means the non-code documentation that serves as a guide to what gets coded can become more and more minimal. It needs to do just enough to document ‘why’. The code, as it is constructed in close contact with the product team and the test team, starts to elaborate on the ‘what’. It reflects the solutions that developers and testers feel they can provide for the business and that deliver the value.
Minimal Documentation, Code as Specification
So now let’s return to a question I posed before: “What does testing look like if we go with the idea of minimal documentation and production/test code as the most reliable specification?”
I really hope it’s clear how we got to this point because this is returning back to how I started this entire series of posts.
Testing, as a communication activity, and tests, as design artifacts, are used as material to trigger conversations and as a support to explain how things are, why they are that way, and to suggest improvements. The goal, at all times, is making every decision explicit, with the consequences reflected throughout the code, both production and test. That code, again both production and test, is the first-class citizen, as is anything that supports its creation. And not just the consequences: the rationale, the context, and the associated business stakes should also be expressed, or even modeled, using all the expressiveness of tests and code as a documentation medium.
But it comes down to that code at all times. And it comes down to the idea that code can generate documentation about what it does and how it does it. And the good thing is that code is always executable. So we can always have up-to-date documentation and we can always tell if that documentation is not accurate, because the code will no longer work. “Ah, but Jeff,” I hear you say, “this is madness you utter. What if the code is not doing what it should? The code may work, as in function, but not be providing the business value.”
True enough! That’s why test code must be a first-class citizen with the production code. Incidentally, this can also be a good argument for what I talked about regarding your test programming language differing from your production programming language. But let’s not lose our thread of discussion here.
Code, both development and test, becomes the specification. The code is ultimately the source of truth. After all, what the code does is what it actually does. You don’t get any more immediate than that. But how do you know that what it’s doing, it is doing correctly? That’s where the tests come in. Going back to my point about varying levels of tests, these could be unit tests, integration tests, integrated tests, system tests, and so on. Basically any test that falls into the category of “functional” or “behavioral”, and that’s pretty much all of them, provides a working specification of the system.
If we minimize what we consider a “specification” — and we make that specification directly executable in its native form — then we have reduced the need for multiple artifacts and reduced the chances of artifacts becoming crutches.
Tests + Code = Source of Truth
In the context of our product development, if somebody comes and asks you how something works, my personal belief is that you should be able to open a test that can answer their question. If you can’t, and if you are treating tests as a form of executable documentation, this means you probably have some missing documentation.
A nice corollary to this is that our existing (code) documentation for the existing functionality doesn’t change unless the functionality changes. You want to be in a state where every test failure means that there is an undocumented change to the system.
But … wait a second! … isn’t this just the Living Documentation approach?
Here’s my concern with that wording. Some people say that the term “Living Documentation” came out of the idea of the Specification by Example approach. This is absolutely not true and, putting on my insensitively blunt hat for a second, it’s somewhat hard to believe anyone thinks the term “living document” only came around as a result of a software practice.
However, in the software development world, the context for “Living Documentation” refers to the idea that an example of behavior can be used for documentation and is also used as an automated test. The proposed benefit of this approach is that whenever the test fails, it signals the documentation is no longer in sync with the code.
However, what you’re really talking about here is documentation that can evolve at the pace of decisions that are made and reflected in code.
That may sound like a semantic nitpick, but living documentation can still fall apart pretty easily. I would argue the BDD conflation of “living documentation” and “executable specification” shows this quite clearly when you deal with large projects, particularly those with many business rules or a lot of logic with complex data conditions. You end up doing what I said before: having teams use these artifacts and tools as crutches.
Tests, Checks, and Encoding Decisions
So perhaps a good question to ask is this: can you produce meaningful living documentation by observing the execution of automated tests as they run against a product or service? I would say yes, but test teams need to take this to the next level: the automation tests, i.e., the tests written as code, can provide that documentation even if they are not running against a product or service.
This requires reframing automation not just as a technique to execute tests but as a technique by which to produce documentation that the tests inherently encode. That documentation is really the collective decisions about behavior that were agreed upon as being value-adding.
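As a rough sketch of what I mean, and purely as an illustration with made-up names, consider pulling a behavior report straight out of test code without ever executing it. The embedded test source below is hypothetical; in practice you would walk real test files on disk.

```python
"""Sketch: generate behavior documentation from test code without
running it, by reading the decisions the tests already encode."""
import ast

# Hypothetical test source; in reality this would be read from files.
TEST_SOURCE = '''
def test_patients_can_print_an_article():
    """Patients keep printed copies for reference between visits."""

def test_doctors_can_deactivate_an_article():
    """Outdated medical guidance must be withdrawn immediately."""
'''


def behavior_report(source: str) -> str:
    """Walk the module's AST and emit one line per encoded decision."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            behavior = node.name[len("test_"):].replace("_", " ").capitalize()
            rationale = ast.get_docstring(node) or "(no rationale recorded)"
            lines.append(f"- {behavior}. Why: {rationale}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(behavior_report(TEST_SOURCE))
```

Nothing about that requires a running product or service; the documentation is pushed up from what the tests already say.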
Even if you buy all this, however, if the test design and writing is not interleaved with development activities, then it’s possible for there to be drift. That’s why test teams and development teams must interleave their activities.
This means you want a testing framework that is capable of handling both manual tests and automated tests. Because let’s face it: not everything can be automated. But remember what I said: this isn’t just automation that deals with execution, it also deals with the documentation of behavior in the form of tests. As such, the distinction between “manual test” and “automated test” is simply one of technique.
Or, rather, maybe this is how we deal with the “check vs test” debate that I jumped into a bit myself, when discussing automation being a technique, not a test. The fact is that a manual test and an automated test are really just checks. What differs is what is doing the execution: a human or a machine. The testing part comes in when we have the conversations and collaborations to decide what in fact should ultimately be checked. Test tooling comes in with how that information is encoded as code and what kind of framework will support that encoding. Tests should be reflective of the work that was done by development. This means the testing framework — whatever that happens to be — must be good enough to quickly encode the business understanding and push the business intent.
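Here is one way that could look in practice, sketched with pytest markers. The `manual` marker is an assumed convention here (you would register it in pytest.ini), not something built into pytest, and the checks themselves are purely illustrative.

```python
"""Sketch: one framework holding both machine-executed and
human-executed checks, so the distinction is only one of technique."""
import pytest


def test_patients_can_send_an_article_link_to_an_email_address():
    """Automated check: a machine executes the steps."""
    # The actual drive-the-application steps are elided in this sketch.
    assert True


@pytest.mark.manual
@pytest.mark.skip(reason="executed by a human; the expected behavior stays encoded here")
def test_a_printed_article_is_legible_on_paper():
    """Manual check: a human executes the steps, yet the behavior is
    still versioned, reportable, and documented alongside the code."""
```

With a layout like this, `pytest -m "not manual"` runs only the machine-executed checks, while `pytest -m manual --collect-only` lists the human-executed ones, so both live in the same encoded catalog of behavior.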
But what about that idea that development teams, or product teams, may need to change their mind at any time?
Test Tooling Design Matters
Well, this need to allow for rapid change means test teams need to stop thinking in terms of monolithic frameworks for automation and start thinking in terms of composable microframeworks. And the logic of these microframeworks must do a few things:
- Provide business-focused separation of concerns
- Place emphasis on a narrative coding style
- Adhere to a strict Single Responsibility Principle
- Adhere to a strict Open-Closed Principle
There would be a whole lot to unpack with each of those statements. But I do believe that if a test team goes in with this focus, they stand a very good chance of not having their automation tool or their automated tests — which are two different things, keep in mind — become artifact crutches.
Just to take a few of those points home, consider the first one. I think good test microframeworks break things down into Goals, Tasks, and Actions. The reason I say this is that this matches the UX/UI design focus as well as the product and business analysis focus. Focusing on that a bit, consider that this has a nice breakdown in terms of how we gather information and how we construct models to help automate that information (I’ll sketch this in code after the list):
- Goals: What are you trying to achieve? (tests or scenarios)
- Tasks: What do you need to do to achieve this goal? (steps)
- Actions: What interactions with the system do you need for each task? (page objects, service objects, etc)
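Here is that sketch. It is a minimal illustration of the layering, assuming entirely hypothetical names (`Storefront`, `SearchForProduct`, and so on) rather than any real framework: the test states the goal, the task speaks the business vocabulary, and the actions hold the raw interactions.

```python
class Storefront:
    """Actions: raw interactions with the system (the page object or
    service object layer). In a real suite these would drive a browser or API."""
    def __init__(self, catalog):
        self._catalog = catalog
        self.results = []

    def enter_search_term(self, term: str) -> None:
        self._term = term.lower()

    def submit_search(self) -> None:
        self.results = [item for item in self._catalog if self._term in item.lower()]


class SearchForProduct:
    """Task: what the user does to reach a goal, phrased in the
    business's vocabulary rather than in UI mechanics."""
    def __init__(self, term: str):
        self.term = term

    def perform_using(self, storefront: Storefront) -> None:
        storefront.enter_search_term(self.term)
        storefront.submit_search()


def test_shoppers_can_find_a_product_by_name():
    """Goal: the test (or scenario) states what we are trying to achieve."""
    storefront = Storefront(catalog=["Blue Teapot", "Red Kettle"])
    SearchForProduct("teapot").perform_using(storefront)
    assert storefront.results == ["Blue Teapot"]
```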
Beyond those particulars, this way of thinking indicates that the means by which we encode the business domain matches the ways by which the business tends to articulate that domain.
Consider the second point about the narrative coding style. Any programmer knows you spend more time maintaining code than you do writing it. This requires careful design. Responsible programmers know that it’s just as important, if not more so, that our programs be readable by humans as it is that they be correctly executed by computers.
Both of these points, taken together, point us, even if only peripherally, to the concepts of domain-driven design. Specifically, we identify the code with the model of the business domain, breaking with the tradition of models kept separately from the code. One consequence is that we expect the code to tell the whole story about the domain. And if the code can push that information up, you are in a good position to do this.
Regarding the last points in that list, the Single Responsibility Principle and the Open-Closed Principle, this is where effective test teams start to meet and meld with effective development teams. Those two principles in particular deal with ideas of coupling and cohesion. I won’t get into all that here but suffice it to say these two principles act to keep you from using code structures as crutches to support a faulty design.
It’s All About Being Lucid
All these modern testing notions are about what I continue to call “becoming lucid” in your overall approach. This is something I’ll cover in a later post.
Ironically enough, I named one of my own testing tools Lucid and, because I simply parroted what tools like Cucumber were doing, I created just another artifact crutch. Worse, I created a crutch that relied on lots of other crutches, like feature files that eventually became part of a “test management” solution, which often meant yet another artifact, whether that be Cucumber Pro, JIRA, Confluence, TestLink, whatever.
What I’ve been arguing for is that the closer the tests sit to the code, and the closer the tests are aligned to the business criteria, the fewer artifacts you have to use to “manage” everything. When you have fewer artifacts, being lucid becomes quite a bit easier because your sight lines to what quality means are not obfuscated by an intermediating set of artifacts stored in a variety of tools, only some of which are directly executable.
But one of the challenges that test teams, and other teams, run up against here is that they get hitched to tools. Tools like, say, JIRA (for test management) and Cucumber (for test execution) or whatever else.
When that happens, any strategy you may come up with is forced to accommodate the tool, as opposed to the opposite approach: figure out the strategy, and then choose tools that support it. So, in these cases, when testing decisions are made, they are made with a latent background fear of how each decision will impact test management and test automation, neither of which usually provides good insight into the state of testing or how effectively you, as a team, are testing.
This Leaves Us Where?
I promised to be more prescriptive rather than solely descriptive in this post. I hope I’ve done that a little bit already, but to further refine the world of “being lucid”, I believe it means, as a starting point, the following:
- Take testing out of the “acceptance criteria” writing business.
- Take out the focus on the “right” way to generate acceptance criteria.
- Don’t (necessarily) focus on Given/When/Then for acceptance criteria.
I want the business to write acceptance criteria in whatever way allows the developers and testers to jointly explore the domain with them, asking good questions, and encoding the answers in gradually evolving sets of interleaved source code: production code and test code. So, taking one example I used in another post, the acceptance criteria might be this:
Doctors provide information about medical conditions to patients via targeted articles.
Patients need to be able to access those articles conveniently.
Patients must be able to print an article
Patients must be able to reprint an article
Patients must be able to send an article link to an email address
Doctors must be able to view activity on the article
Doctors must be able to deactivate an article
Doctors must be able to edit provider and location for a scheduled article
Developers and testers will work to provide a behavioral implementation that satisfies these criteria. The level of detail in the criteria will be up to the team. It may make sense to break out particular conditions for each of the above points in the acceptance criteria. Or it may make sense to just code up the behavior that operates under those conditions and have the code push up that information if requested.
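As a rough sketch of that second option, here is what elaborating one of those criteria, “Doctors must be able to deactivate an article,” could look like as test code. The `Article` class is a hypothetical stand-in kept inline so the sketch runs; the point is that the concrete conditions live in the code rather than in a further document.

```python
"""Sketch: test code elaborating one acceptance criterion into
concrete, executable conditions."""
import pytest


class Article:
    """Hypothetical production code, inlined so the sketch is runnable."""
    def __init__(self, title: str):
        self.title = title
        self.active = True
        self.view_count = 0

    def view(self) -> str:
        if not self.active:
            raise PermissionError("article has been deactivated")
        self.view_count += 1
        return self.title

    def deactivate(self) -> None:
        self.active = False


def test_a_deactivated_article_is_no_longer_accessible_to_patients():
    article = Article("Managing Hypertension")
    article.view()
    article.deactivate()
    with pytest.raises(PermissionError):
        article.view()


def test_deactivation_preserves_the_activity_doctors_can_review():
    article = Article("Managing Hypertension")
    article.view()
    article.deactivate()
    assert article.view_count == 1
```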
If someone asks me “how it works”, whatever feature “it” refers to at a given time, I should be able to point them to the source code (i.e., show them the application in use). The tests provide steps on how to perform the usage. The requirements, the acceptance criteria, simply indicate why this feature exists and why it provides value, with some high-level descriptions of the behavior, as you see above.
This means tickets, in systems like JIRA or whatever you use, are entirely transient. They are there simply to allow us to track and measure various work (tasks, bugs, etc). They are not there to store acceptance criteria or to frame “epics” that have numerous tickets associated with them. Acceptance criteria should live as close as possible to the code, be versioned with the code, and be capable of being elaborated upon by the code pushing out relevant information. The elaboration of that acceptance criteria should be the code.
One last point here: measures of control become their own crutches. Lucid is about being transparent. When our work is transparent to the organization, the need for control is reduced. Visibility into what the product+test+development teams are doing helps build trust with the business stakeholders. That trust, fostered by visibility, reduces the need for “control artifacts”, including numerous metrics.
Where to from here?
I know these posts have been way too long. But they have been streams of consciousness that have allowed me to refine my thinking in the best crucible I know: exposing that thinking to others, warts and all, to see if it has any merit whatsoever. My next post in this series will likely close out this initial round of thinking by focusing concisely on what I mean by “becoming lucid.”