In Testing’s Brave New World, I ended up talking about BDD and the concepts that testing, acting as a design activity, has to work within. That was a post fairly heavy on nomenclature concerns. I sort of skirted around the issue that it’s often the tools we use in our BDD-type activities that force us to consider this terminology. Even if you are not using a specific tool, but rather a technique like spec workshops — which are a “BDD activity” — you are still wrestling with these terms.
So let’s put this in some context.
If testing and BDD practices are about communication and design activities, then the tools and activities we use must support that. I like the following from Ana Asuaga:
Crafting a problem model from information gained through social communication involves abstraction: we extract the important concepts and express them in terms that allow us to use the tools at hand to design and implement a solution.
You’ll often hear that tools (like Cucumber) or activities (like spec workshops) were designed around the core principles of BDD, and one of these principles is to improve stakeholder collaboration through something called a ubiquitous language. This fancy-sounding term essentially just means that both code and executable specifications — usually called “scenarios” in BDD tools — should be written in the language of the business domain.
I keep coming back to tools so let’s just focus on that for a bit. You’ll often hear that these are tools that support behavior-driven development, specification by example, and something called “agile acceptance testing.” What you often won’t hear is that these tools are not testing tools. True, you can use these tools to automate functional validation in a form that is (perhaps) easily readable and understandable to business users, developers and testers. That, by itself, however, does not make these tools “testing tools.”
In spending some time learning Scala, I was pointed to the Structure and Interpretation of Computer Programs lecture series. The first video in the series has a great line:
It’s very easy to confuse the essence of what you’re doing with the tools that you use.
While that’s being said in a programming sense, it most certainly applies to testing activities as well.
I’m an advocate of the idea that any tool use has to be infused with good thinking that recognizes the tool is just a means by which a technique is applied. That hopefully leads you to ask questions like:
- What’s the technique?
- What makes it a good technique?
- What makes it a good technique in your particular context?
- How can the technique be subverted?
- How do you know when you’re stretching the technique beyond its bounds of applicability?
- How do you know when you are using the technique as a crutch to prop up bad or half-formed thinking?
There are really three takeaways here:
- Tests are a communication mechanism, both as input to and output from conversations.
- Tools as a way to manage or write tests are fine, as long as they serve your needs.
- Using tools is not testing.
On that last point, just because I’m using TestLink, Quality Center, QTP, Cucumber, Selenium, and so forth: that’s not testing. Those are tools that are letting me express, manage, and execute some of my tests. The “testing” part is the thinking I did ahead of time in discerning what tests to consider, what tests to write, and so forth.
I’ve found that when tools and technique get conflated, you can subvert that inclination by first rallying around a banner phrase. So let’s use this phrase: test-as-specification. Then use that phrase to make clear distinctions between the essence of what you are doing and the tools that may help you do some part of that. The reason I like the “test-as-specification” phrase is because I’ve found it can help put the focus on teams creating executable specifications that are:
- A goal for development.
- Acceptance criteria for business analysts.
- Manual and automated functional tests for testers.
- Regression checks for future changes.
In this way, most BDD tools — Cucumber, FitNesse, SpecFlow, and so on — do allow teams to create living documentation, by which I mean a single authoritative source of information on application functionality that is always up-to-date because the specifications are capable of being executed directly from the documentation. That execution can be manual or automated.
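Under the hood, tools like these work by binding each line of a scenario to executable code. Here is a minimal sketch of that idea in Python — the `step` decorator and `run` loop are purely illustrative, not any particular tool’s API:

```python
import re

# Registry mapping step patterns to the functions that implement them.
STEPS = []

def step(pattern):
    """Register a function to handle scenario lines matching the pattern."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator

def run(scenario, context):
    """Execute each scenario line against the registered steps."""
    for line in scenario.strip().splitlines():
        line = line.strip()
        for regex, func in STEPS:
            match = regex.fullmatch(line)
            if match:
                func(context, *match.groups())
                break
        else:
            raise ValueError(f"No step matches: {line!r}")

@step(r"Given an account with a current value of (\d+)")
def given_account(context, value):
    context["value"] = int(value)

@step(r"When the account is split")
def when_split(context):
    context["split_value"] = context["value"] * 0.9  # 10% fee applied

@step(r"Then the split account value should be (\d+)")
def then_value(context, expected):
    assert context["split_value"] == int(expected)

context = {}
run("""
Given an account with a current value of 1000
When the account is split
Then the split account value should be 900
""", context)
```

The point of the sketch is that the scenario text is the specification, and the bindings are just plumbing; execution is what makes the documentation “living”.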
In order to give you the flexibility you need, these tools must support different ways of describing executable specifications — including story-like prose or narrative, lists, tabular data, and so forth. Getting a little deeper into topics of instrumentation or orchestration, these tools must also allow scripting, abstraction, and component reuse on several levels so that both technical and non-technical users can efficiently describe specifications as tests. However, those are details I really don’t want to go into here because I want to get into one of the more pernicious problems.
The problem comes in directly at that point where the tools supposedly shine: when the specification part also becomes a test part. As an example, consider the “total of the items” example from this article. I think there’s a good point there.
Let’s consider test specifications that specify exact numbers for calculations. For example, let’s say you have an application that works with bank accounts. Let’s say it’s possible for the system to do an “account split”, which is essentially where the bank account is split off into a separate track so that you now have two similar accounts, but where one is no longer connected to the other except for certain fields.
Now let’s say that in this case the starting percentage in the split account should match the starting percentage in the original account. Further, the current value of the split account should be the value of the original account minus a 10% fee.
If this was all taking place as part of a conversation with business folks and developers, we could write a test like this:
Start with this account:

```
Bank Account 1
Account ID: 12345
Current Value: 5,687,145.28
Starting Percentage: 85%
```

Create a split account. You should now have the following account:

```
Split Bank Account 1
Account ID: 12345
Current Value: 5,118,430.75
Starting Percentage: 85%
```
I could do that. However, my initial instinct is that I would much rather specify the business rules (“split starting percentage will equal original starting percentage”, “current value of split account is 10% less than current value of original account”) since that’s true for any numbers. The actual numbers used would be examples that illustrate the business rule.
In fact, notice how if I don’t specify that second rule regarding the current value, it’s not necessarily immediately obvious from the test itself why the current value went from 5,687,145.28 to 5,118,430.75. You could obviously work out that this is a difference of 568,714.53 and then determine that this was 10% of the original account’s current value. Or the test could have just told you that up front and captured that information.
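The two business rules are simple enough to encode directly. A sketch using Python’s `decimal` module that reproduces the numbers above (the half-up rounding to cents is my assumption, not something stated in the scenario):

```python
from decimal import Decimal, ROUND_HALF_UP

FEE_RATE = Decimal("0.10")  # the 10% split fee, per the business rule

def split_account(original_value, starting_percentage):
    """Apply the two split rules: the starting percentage carries over
    unchanged, and the current value is reduced by the 10% fee."""
    split_value = (original_value * (1 - FEE_RATE)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    return split_value, starting_percentage

value, pct = split_account(Decimal("5687145.28"), 85)
# For this example, value comes out to 5,118,430.75 with percentage 85,
# matching the scenario above.
```

Encoding the rule once means any pair of example numbers can be checked against it, which is exactly the relationship between rule and illustration being argued for here.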
My opinion is that over-specifying detail at the expense of understanding the underlying business rules is not necessarily helpful. One reason I say this is because it can lead you to repeat a lot of information. Another reason is that such a practice may cover up what is actually being tested. Taking the above specific example into account, let’s now write a test like this:
```
When an account is split
Then the following should be the case:
  The Current Value of the Split Account will be the Current Value of the Original Account minus 10%.
  The Starting Percentage of the Split Account will equal the Starting Percentage of the Original Account.
```
Here the capitalized terms refer to domain-specific concepts. That’s still kind of wordy. You could do this:
```gherkin
When an account is split
Then the current value of the split account is 10% less than the original
And the starting percentage of both accounts will be the same
```
The key question to be asking is whether you are leaving room for ambiguity in how you word things. But I can say that, for me, the above is a lot easier to parse than the example with a bunch of numbers, even though I can see how the “bunch of numbers” does serve as a concrete example. That may or may not be valuable given the complexity of the story.
But is this … right?
In the above case, the fact is that the starting percentage in the split account should match the starting percentage in the original account. Likewise, the current value should have a fee of 10% applied. That’s really what someone needs to know from a business perspective. Whether that number happens to be 5,687,145.28 or some other number is less relevant. Is that an accurate statement?
Well, at the very least I think we can agree that there are two concerns: intent (business rule) and implementation (a specific number to validate).
Now, a BDD tool user might say that you absolutely need the example with specific numbers because otherwise you can’t automate. I maintain that’s not true. An automated test is just a manual test that is run via a tool. If a manual tester could run the test, then a tool could as well. Let’s consider an entirely different example. I’m not going to give you context here; rather, I’m just going to give you the scenario:
```gherkin
Given the default trial plan
When the data collection method of that plan is not set
Then the Data Capture Costs should be zero
When the data collection method of that plan is set to "Electronic Data Capture"
Then the Data Capture Costs should be non-zero
When the trial subjects are increased
Then the Data Capture Costs should increase
When the fees and costs are increased
Then the Data Capture Costs should increase
```
Assuming you knew the application well enough to know what a “trial plan” is and how to set things like “data collection method”, do you think you could execute the above test and know when the observable has been met or not been met? Do you understand, from reading the above, what business rules are being executed? At the very least, I think most people would be able to see that:
- Data capture costs are being tested.
- These costs will be either non-zero or zero.
- Costs can increase under specific conditions.
I’m guessing you could glean that even without knowing the application that I’m describing here. The above test scenario — which can be automated — reveals intent and speaks very little to implementation. I also think the above could guide questions, such as:
- Once a data collection method is set, can it be unset? (Meaning, can the costs be “reset” to zero?)
- Does non-zero include negative? (Probably not, but it’s good to ask and even better to know.)
- Are there other parameters besides trial subjects that can cause costs to increase?
- Are there parameters that can cause a cost to decrease?
- Is there an upper limit on cost increase?
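Those intent-level observables (zero, non-zero, increases) can be automated without pinning down exact numbers, by asserting relationships rather than values. A sketch against a hypothetical `TrialPlan` model — the class and its fields are my invention for illustration, not the real application:

```python
class TrialPlan:
    """Hypothetical stand-in for the application's trial plan."""
    def __init__(self, subjects=10, rate=1000, fees=0):
        self.subjects = subjects
        self.rate = rate
        self.fees = fees
        self.collection_method = None

    def data_capture_costs(self):
        # No data collection method means no data capture costs.
        if self.collection_method is None:
            return 0
        costs = self.subjects * self.rate
        if self.fees:
            costs *= self.fees  # fees act as a multiplier when set
        return costs

plan = TrialPlan()
assert plan.data_capture_costs() == 0          # no collection method set

plan.collection_method = "Electronic Data Capture"
assert plan.data_capture_costs() > 0           # non-zero once a method is set

before = plan.data_capture_costs()
plan.subjects += 10
assert plan.data_capture_costs() > before      # increases with trial subjects

before = plan.data_capture_costs()
plan.fees = 100
assert plan.data_capture_costs() > before      # increases with fees and costs
```

Each assertion mirrors one line of the scenario: the business rule is checked relationally, so the test stays valid for any concrete numbers that satisfy it.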
The very fact that the test was speaking to intent is what I believe made it more amenable to questions. Examples, by their very nature, sometimes have the opposite effect, where people simply make sure they have an understanding of what that example or those specific numbers mean, without wider consideration of the example in context.
Agree? Disagree? Well, let’s experiment a bit. Consider this version of the above scenario:
```gherkin
Given a plan where:
  Trial subjects are set to 10
  Fees and Costs are set to 0
  Subject rate cost is set to 1000
When data collection method is "Empty"
Then Data Capture Costs equal 0
When data collection method is "Electronic Data Capture"
Then the Data Capture Costs equal 10,000
When the trial subjects are increased from 10 to 20
Then the Data Capture Costs equal 20,000
When the fees and costs are set to 100
Then the Data Capture Costs equal 2,000,000
```
Now what do you think? It’s perhaps harder at a glance to see the business rules, right? But you do have some interesting additions. For example, the plan has more stipulations on it. What do you think is happening here? What are the specific values telling you?
What’s happening is that the subject rate cost is multiplied by the trial subjects. That gets the data capture costs, assuming there is a data capture collection method set. It’s also indicating that no additional fees and costs are set. Were such costs set, then they act as a multiplier against the data capture costs.
So here you have a better idea of what to expect in specific terms given specific conditions. Prior to this you had an idea of what to expect in general terms given only moderately specific conditions.
You might argue that I stacked the deck a bit. The original non-detailed example scenario left out key details. Yet notice that the original scenario said “Given the default trial plan”. Here the test data might be such that the default trial plan is in fact set up as described in the more detailed example scenario. Yet it is true that this information was not clear from the scenario itself; you had to know that or find it out from another source. The second scenario, which is more detailed, explicitly spelled out the operating context and the numbers.
The less-detailed scenario could be manually executed and automated just as well as the more-detailed scenario but obviously the more details you can provide, the more likely testing is to be unambiguous.
So before I said I liked the idea of a test scenario that told me what the business rule was regardless of the specific values. Yet now I’m saying I like the idea of specific values. So which is it? Well, why not both? You can have a general scenario that is backed up by an illustrative example. Let’s try it out:
```gherkin
Feature: Data Capture Costs

  Explanation:
    Data Capture Costs only apply when there are trial subjects.
    Data Capture Costs will be zero if there is no form of data collection method
      and there are no fees and costs.
    Data Capture Costs will be non-zero if there is a data collection method specified.
    Data Capture Costs will be non-zero if there are fees and costs specified.
    Data Capture Costs = trial subjects * subject rate cost * fees and costs

  Example:
    Given a plan with
      10 trial subjects
      a subject rate cost of 1000
      fees and costs of 0
    When the data collection method is "Empty"
    Then Data Capture Costs should be 0
    When the data collection method is "Electronic Data Capture"
    Then the Data Capture Costs should be 10,000
    When the trial subjects are increased from 10 to 20
    Then the Data Capture Costs should be 20,000
    When the fees and costs are set to 100
    Then the Data Capture Costs should be 2,000,000
```
Here I have an explanation part that covers details that would have come out of a high-level discussion of what the application should be providing. That’s the business need. I then have some examples that capture that need, thus translating the need into a business understanding. What this provides is a shared notion of what we mean by quality for this feature.
Let’s say someone decides that there is an addition to this functionality. Specifically, let’s say that if there are meetings specified, those increase the data capture costs, whereas if no meetings are specified, then data capture costs are not affected. Here is how that information can be incorporated into the living documentation, with the additions marked:

```gherkin
Feature: Data Capture Costs

  Explanation:
    Data Capture Costs only apply when there are trial subjects.
    Data Capture Costs will be zero if there is no form of data collection method
      and there are no fees and costs.
    Data Capture Costs will be non-zero if there is a data collection method specified.
    Data Capture Costs will be non-zero if there are fees and costs specified.
    Data Capture Costs will be non-zero if there are meetings specified.          # added
    Data Capture Costs = trial subjects * subject rate cost
      * fees and costs * meeting cost                                             # changed

  Example:
    Given a plan with
      10 trial subjects
      a subject rate cost of 1000
      fees and costs of 0
      meetings set to "None"                                                      # added
    When the data collection method is "Empty"
    Then Data Capture Costs should be 0
    When the data collection method is "Electronic Data Capture"
    Then the Data Capture Costs should be 10,000
    When the trial subjects are increased from 10 to 20
    Then the Data Capture Costs should be 20,000
    When the fees and costs are set to 100
    Then the Data Capture Costs should be 2,000,000
    When the meetings are set to "Investigation"                                  # added
    And the meeting costs are set to 10000                                        # added
    Then the Data Capture Costs should be 20,000,000,000                          # added
```
I simply added a bit of relevant narrative for the business need and then broke out an example for the business understanding.
You might notice how I’m chaining results here. The fact that data capture costs are specifically 2,000,000 when the fees and costs are set to 100 is because of the previous step that set the trial subjects to 20, which increased the data capture costs to 20,000. Without that intervening step, then the data capture costs would have been 1,000,000 rather than 2,000,000.
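The chaining effect can be made concrete. Treating unset fees as a multiplier of 1 (per the explanation earlier), the two outcomes differ only in whether the earlier step’s change to trial subjects carries forward:

```python
def data_capture_costs(subjects, rate, fees):
    """Illustrative cost rule: fees act as a multiplier when set,
    otherwise they are treated as 1."""
    return subjects * rate * (fees if fees else 1)

# Chained steps: an earlier step raised trial subjects to 20
# before the fees were set, so that change carries forward.
chained = data_capture_costs(subjects=20, rate=1000, fees=100)      # 2,000,000

# Independent steps: each scenario starts from the background's
# original 10 trial subjects.
independent = data_capture_costs(subjects=10, rate=1000, fees=100)  # 1,000,000
```

The factor-of-two difference is exactly the intervening step: whether it counts depends on whether your steps share state or each reset to the background.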
Some would argue that steps should not be chained in this way since it introduces a dependency. To me that’s not such a bad thing since, in this case, I’m testing the evolution of data capture costs as part of a scenario. The alternative is to specify a background context that applies to all tests. That might look like this:
```gherkin
Feature: Data Capture Costs

  Explanation:
    Data Capture Costs only apply when there are trial subjects.
    Data Capture Costs will be zero if there is no form of data collection method
      and there are no fees and costs.
    Data Capture Costs will be non-zero if there is a data collection method specified.
    Data Capture Costs will be non-zero if there are fees and costs specified.
    Data Capture Costs will be non-zero if there are meetings specified.
    Data Capture Costs = trial subjects * subject rate cost * fees and costs * meeting cost

  Background:
    Given a plan with
      10 trial subjects
      a subject rate cost of 1000
      fees and costs of 0
      meetings set to "None"

  Scenario:
    When the data collection method is "Empty"
    Then Data Capture Costs should be 0

  Scenario:
    When the data collection method is "Electronic Data Capture"
    Then the Data Capture Costs should be 10,000

  Scenario:
    When the trial subjects are 20
    Then the Data Capture Costs should be 20,000

  Scenario:
    When the fees and costs are set to 100
    Then the Data Capture Costs should be 1,000,000

  Scenario:
    When the meetings are set to "Investigation"
    And the meeting costs are set to 10000
    Then the Data Capture Costs should be 100,000,000
```
Again both styles are testable via manual or automated execution. So, once again, it falls to you to put the emphasis where it belongs: on what you believe is the effective design of your tests such that they remove ambiguity, clarify understanding, and indicate success criteria.
Did I make a point anywhere along the way? The point is that I believe tests can serve as specifications. You have the business need, which is the set of business rules that ultimately must be satisfied; this is where you tend to have the overall data conditions. You also have the business understanding, which is how those rules can be exercised with specific test conditions. Thus test and data conditions are aligned with the business needs. Together they provide a single specification that can encode a shared notion of quality.
In a future post, I want to work through a single extended example that can provide better context. And, in case you were wondering: yeah, I’m still deciding what I think is effective and efficient in terms of the idea of tests as specification. I’m on a path of discovery here and I hope that, if nothing else, this path is moderately interesting to others who share my interest in how we communicate when we build software.
I’m currently working my way through Gojko’s Specification By Example and The Cucumber Book – and having this series of blog posts to read as well is proving to be really useful in getting me to think about how specs/tests/reqts all fit together.
So yeh, your posts are interesting to at least one reader 🙂
Jeff: I’m definitely liking these posts. I’ve found this is a hard subject to make clear. It really helps me that you’re providing examples and that you’re actually providing more complicated or in depth examples. I’m finding way too many people who use these tools or engage in these discussions tend to be testing against very simple sites with very simple functionality.
I’m still trying to decide if Cucumber and tools like it are an extra layer of complication that isn’t worth it, as opposed to an extra layer of complication that IS worth it.