Abstraction Levels for Tests

Previously I talked about the testing craft and abstraction and here I’ll expand on those thoughts a bit more.

It is now, and always has been, imperative that we can express what we test in English (or whatever your common language is). This is a key skill of testers, and it is a necessity regardless of the test tool solution that you use to manage and/or automate your testing. Writing tests is all about breaking concepts down into conditions and then stating those in a way that is expressive and intent-revealing. But this still seems to be a struggle for the industry, particularly as it continues to conflate developer and tester roles.

This conflation has led to a focus on tools like Cucumber, RSpec, JBehave, SpecFlow, Behat, Mocha, Yadda, and many others which promise to wrap all your tests — at any level! — with natural language statements that make everything clear and obvious.

If only that were so.

These tools, when used as a communication mechanism, very much need the skills of a trained tester rather than those of a trained developer. Or, rather, a trained tester who also has either the training or intuition of a business analyst. I’ll be the first to admit, that’s painting with a very broad brush and raises the question of exactly what “trained” means in this context. Rather than focus on all of that here, I’d rather provide a specific example of what I mean.

I was originally going to title this post “Can BDD Work for Complex Domains?” The reason for that original title was that if you look at examples for many of those tools that I mentioned, you will often come away thinking people out there are not testing anything more complicated than a shopping cart. So the question inevitably comes to me from testers I talk with: “Can a BDD approach constructed with such tools actually work in our complex domain?” Certainly it’s a logical question to ask. Another way to word the question is: “Are there domains in which the overhead of an executable specification is no longer cost-effective?” So let’s talk about that a little here.

By way of showing a related viewpoint into this, you might find this relatively short video interesting: Find the Right Abstraction Level for Your Tests.

Here I’m focusing on tool solutions that adhere to popular structuring formats, such as Gherkin. With these solutions, the test spec — sometimes called a feature file — is a descriptive format; it’s a way to describe a set of example-driven scenarios with either specific or generalized test and data conditions. These specific examples should be usable for capturing the knowledge of business users in the form of tests and allowing those tests to be executable with an understanding of intent and high-level implementation.

The Example

First, a bit of context for the domain I’m taking this example from. This particular example had to do with the ad serving business domain, which is quite a bit more complicated than people give it credit for. As I was talking with my team they were using some domain terms I didn’t know so I had to get those understood. Let me just provide you with that understanding before I jump into the example.

One service in use was AppNexus, which provides a platform specializing in real-time online advertising. There is a concept in the advertising world called “in-app advertising”. This typically involves a relationship between the application developer and an ad network, where the ad network pays the developer to incorporate the ad network’s code into the developer’s application. When the app is running, the ad software allows the ad network to send ads to the user’s device. Finally, the design teams in such environments often refer to a “creative” and that basically means the ad itself.

This was the context in which I was told it was difficult, if not impossible, to do test spec’ing. So let me show you how I approached this.

Start With Your Test Intent

We started with the following as our discussion point:

“Implement inapp strategy on AppNexus.”

That was, for the most part, our business requirement. Here the “strategy” basically meant the test condition. The term “AppNexus” was certainly not vague but “inapp strategy” was. To refine what was meant by this, we decided:

“inapp strategy = place bid and get creative”

This is just one particular inapp strategy; there were certainly others. The above, however, was our high-level statement of what to test. Refining our original statement:

“implement a test to place a bid and get a creative on AppNexus”

So many tests, and thus so much development, falter due to a failure to do this simple thing: make sure the intent is clear and expressive for a very defined unit of functionality. If the phrase “unit of functionality” there bothers you, replace that with “behavior.” But before you do so, take note of the fact that nothing I’ve said here necessarily presumes the level of testing at which this test condition will be hit.

Determine Your Context

We decided that we had to break down the context that allows us to understand what needs to be in place to “place a bid and get a creative”. What follows are the iterations we went through for stating our preconditions. Treat each row here as a progression of our ideas.

1.	browser valid	IP valid	price is “acceptable”	qualify for a campaign
2.	mobile browser	IP valid	price is “acceptable”	qualify for a campaign
3.	mobile browser	geo-IP validated IP address	price is “acceptable”	qualify for a campaign
4.	mobile browser	geo-IP validated IP address		qualify for a campaign
5.	mobile browser	geo-IP validated IP address		user registered for campaign known by AdManager

What that shows is the progression from what we started with (1) to what we ended with (5).

Notice in each case we took any vague elements and made them more operationally specific. In one case — price is acceptable — we decided that it wasn’t relevant and so removed it. Do note that this also started to make it clear we had other data conditions lurking within this test condition. For example, “mobile browser” could certainly be contrasted with “non-mobile browser” and even mobile-browser itself could be broken down into browsers on tablets versus those on smartphones or even watches. The geo-IP validated IP address at least made us think about the possibility of IP addresses in different locales.

Incidentally, I should add that the team, for the most part, had almost no patience for going through this exercise. They felt they had already done this in our tests anyway. Maybe that was true, maybe not. What was undeniable was that all of our tests were currently only written in automated test code and that test code was particularly opaque to anyone not well versed in Java. So while we may have been testing this stuff, it wasn’t in a way that was easily discoverable by various people who were consumers of our tests.

That, by the way, is a perfect example of what happens when companies conflate development with testing.

So, taking that last row, we ended up with this for our setup / context:

Given [AppNexus] is sending an [inapp display request] to the BidManager
And the following is part of the request
  mobile browser
  geo-IP validated IP address
  user registered for campaign known by the AdManager

The square brackets above indicate elements that we felt could be iterated over. Meaning, we knew this should apply for various publishers and different inapp requests. The assumption was that we had an equivalence class. We would test that assumption at the end. For now, those square brackets were a reminder for us to figure that out before we considered this test an executable spec.

Some might argue that the phrase “to the BidManager” could be excluded, particularly if there is nothing else that could receive any such request from a service like AppNexus. This is purely up to individual teams to decide, regardless of how religiously some BDD community members will argue one way or the other.

Determine Your Key Action

We had our context. That’s the setup or the preconditions, if you prefer those terms. We now needed the action, tied to the context we just created so we added the following:

Given [AppNexus] is sending an [inapp display request] to the BidManager
And the following is part of the request
  mobile browser
  geo-IP validated IP address
  user registered for campaign known by the AdManager
When the request is sent to the AdManager from the BidManager

So note what’s happening here. We had two types of requests. AppNexus would send a request to our BidManager. That request would then be sent from the BidManager to the AdManager. Spelling this out made the messaging and interactions that were going on as part of this business rule and test condition very clear.

Determine Your Observable Result

We have a context and we have a key action. Now we just needed an observable to say how we would know that the action took place and did so successfully. We ended up doing this:

Given [AppNexus] is sending an [inapp display request] to the BidManager
And the following is part of the request
  mobile browser
  geo-IP validated IP address
  user registered for campaign known by the AdManager
When the request is sent to the AdManager from the BidManager
Then the AdManager responds with a creative url and a bid price
And the BidManager bids

Again, notice we made it quite clear what components of the system were responding. This was not only important for our business teams to have confidence that we knew what was to be done but it was also important for our development teams. Keep in mind that if you are practicing TDD/BDD, then your test specs will be design-guiding. They will indicate how the system as a whole is going to respond. Nothing in the above said exactly how the AdManager would respond or how the BidManager would bid (implementation), but it was quite clear which components would be doing what actions (intent).
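To show how little the spec commits to, here is a minimal Python sketch of the scenario wired to in-memory stand-ins. The class names mirror the components named in the spec, but every method, field, and value below is a hypothetical of mine for illustration and not our actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BidRequest:
    # Fields mirror the request conditions listed in the Given step.
    service: str
    request_type: str
    browser: str
    ip_geo_validated: bool
    user_in_known_campaign: bool


class AdManager:
    """Hypothetical stand-in: answers with a creative URL and a bid price."""

    def respond(self, request):
        if not (request.ip_geo_validated and request.user_in_known_campaign):
            return None
        # Placeholder values; the spec says nothing about how these are chosen.
        return {"creative_url": "http://ads.example/creative/1", "bid_price": 0.25}


@dataclass
class BidManager:
    """Hypothetical stand-in: forwards requests and bids on a response."""

    ad_manager: AdManager
    bids: list = field(default_factory=list)

    def receive(self, request):
        # When: the request is sent to the AdManager from the BidManager.
        response = self.ad_manager.respond(request)
        if response is not None:
            # Then: the BidManager bids.
            self.bids.append(response["bid_price"])
        return response


def run_scenario():
    # Given: the context from the spec.
    request = BidRequest("AppNexus", "inapp display request", "mobile", True, True)
    bid_manager = BidManager(AdManager())
    # When / Then: act, then hand back what can be observed.
    response = bid_manager.receive(request)
    return response, bid_manager.bids
```

The stand-ins could be rewritten completely without touching the Given, When, or Then wording, which is the separation of intent from implementation that the spec is meant to preserve.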

At this point, we re-asked the question: can the WHEN and the THEN still be valid if we permute the elements in square brackets? For example, if I replace AppNexus with, say, Facebook and “inapp display request” with “inapp video request”, would the test still be valid? The answer was yes and thus we confirmed that we did seem to have an equivalence class.

Provide a Scenario Statement

Since this is the case, it seemed suitable to have a relatively high-level scenario name that described what we were doing:

Scenario "Valid Bid Request Will Lead to Bid Response"

Given [AppNexus] is sending an [inapp display request] to the BidManager
And the following is part of the request
  mobile browser
  geo-IP validated IP address
  user registered for campaign known by the AdManager
When the request is sent to the AdManager from the BidManager
Then the AdManager responds with a creative url and a bid price
And the BidManager bids

Some would argue that the scenario title should have come first. Perhaps it would have, had we known what we were talking about. And, to be sure, in later spec workshops, the scenario title did come first. Again, don’t be too religious about what the community says you should do in order to “do TDD/BDD” correctly. Do what works for you.

By the way, since I said we likely had an equivalence class, what was our answer to that as far as our test writing? This was a case where we investigated a scenario outline. A scenario outline lets you state a high level scenario and then permute either the test conditions or the data conditions in a table that is attached to that scenario. I won’t delve into that too much here, but a start of such an example might be:

Scenario "Valid Bid Requests Will Lead to Bid Responses"

Given <service> is sending an <request type> to the BidManager
And the following is part of the request
  mobile browser
  geo-IP validated IP address
  user registered for campaign known by the AdManager
When the request is sent to the AdManager from the BidManager
Then the AdManager responds with a creative url and a bid price
And the BidManager bids

Examples:
 | service  | request type          |
 | AppNexus | inapp display request |
 | Facebook | inapp video request   |

Once people understand the basis of the scenario, all they have to do is look at the examples table to see how you are permuting the test. A business person, tester or developer should be able to look at that table and ask relevant questions, such as “Can AppNexus make video requests? Can Facebook make display requests? What other services are there? What other request types are there?”
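If it helps to see the mechanics, here is a rough Python sketch of what a tool does with an outline: it substitutes each row of the examples table into the steps. I am assuming angle-bracket placeholders named after the table columns; the function names are mine and this is not how any particular tool implements it.

```python
import re

# Steps with placeholders named after the table columns (assumed notation).
OUTLINE_STEPS = [
    "Given <service> is sending an <request type> to the BidManager",
    "When the request is sent to the AdManager from the BidManager",
    "Then the AdManager responds with a creative url and a bid price",
    "And the BidManager bids",
]

EXAMPLES = """
 | service  | request type          |
 | AppNexus | inapp display request |
 | Facebook | inapp video request   |
"""


def parse_table(text):
    """Turn a pipe-delimited table into a list of row dictionaries."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def expand(steps, examples):
    """Yield one concrete list of steps per example row."""
    for row in examples:
        yield [
            re.sub(r"<([^>]+)>", lambda m: row[m.group(1)], step)
            for step in steps
        ]


scenarios = list(expand(OUTLINE_STEPS, parse_table(EXAMPLES)))
```

Each entry in `scenarios` is a fully concrete scenario, one per table row, so adding a row to the table adds a test without anyone writing a new scenario.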

As a further refactor, it might be nice to come up with a domain phrase to replace the GIVEN-AND clause, which was a bit long, but we left that out for the time being. But what do I mean by that? Perhaps something as simple as this:

Given an inapp display request sent from AppNexus with the following:
  | mobile browser |
  | geo-IP validated IP address |
  | user registered for campaign known by the AdManager |

Or we might even try to condense it further by treating the last two elements in that table as a given (or convention) of our test data and instead only specify the browser type. In that case:

Given a mobile browser inapp display request sent from AppNexus

This is where the level of abstraction comes in. What matters most is whether people understand what’s being said. And if there are conventions, it’s important to make sure those are understood. The convention here, for example, might be: “Unless stated otherwise as part of the context, all inapp display requests are assumed to have a geo-IP validated IP address and a user registered for a campaign known by the AdManager.”
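A condensed phrase still has to be recognized by the automation, so the wording you settle on becomes a small grammar that something has to parse. A minimal sketch of that in Python, assuming a regular expression of my own invention rather than any tool's actual step-matching syntax:

```python
import re

# Hypothetical pattern for the condensed Given step.
STEP = re.compile(
    r"^Given an? (?P<browser>mobile|non-mobile) browser "
    r"(?P<request_type>inapp \w+ request) sent from (?P<service>\w+)$"
)


def parse_given(step):
    """Return the conditions named in the step, or None if it does not match."""
    match = STEP.match(step)
    return match.groupdict() if match else None


conditions = parse_given(
    "Given a mobile browser inapp display request sent from AppNexus"
)
```

Only three conditions come out of the phrase. Everything else the request needs has to come from a convention the team has agreed on, which is why the convention needs to be written down somewhere people can find it.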

Deciding on your conventions has a huge impact on your eventual test data, the level of specificity in your tests, and the way your tests can be automated. I’ll come back to that last point later.

Repeat With Other Test Conditions

There was another strategy brought up by the team:

“inapp strategy = place bid and get creative, price goes down as we get messages”

In this case, it seemed like we could reuse much of what we already did. For example:

Given AppNexus is sending an inapp display request to the BidManager
And the following is part of the request
  mobile browser
  geo-IP validated IP address
  user registered for campaign known by the AdManager
When multiple requests are sent to the AdManager from the BidManager
Then subsequent bid requests cannot have a higher price than the original request

Notice that the WHEN step is different as is the THEN step. Also notice that I don’t necessarily state the same THEN steps from the previous scenario. After all, those were tested in that previous scenario. The point of this particular scenario is to determine that prices go down on subsequent messages. Hence the scenario title should reflect that:

Scenario "Bid prices go down when more messages are sent"

Given AppNexus is sending an inapp display request to the BidManager
And the following is part of the request
  mobile browser
  geo-IP validated IP address
  user registered for campaign known by the AdManager
When multiple requests are sent to the AdManager from the BidManager
Then subsequent bid requests cannot have a higher price than the original request
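Writing the check down in code exposes a small gap worth discussing with the business: the scenario title says prices "go down," while the THEN step only says they cannot go higher. A quick Python sketch of both readings, with function names that are mine and not part of our test code:

```python
def never_exceeds_original(bid_prices):
    """The THEN step as written: no later price is higher than the first."""
    if not bid_prices:
        raise ValueError("need at least one bid price")
    original = bid_prices[0]
    return all(price <= original for price in bid_prices[1:])


def always_below_original(bid_prices):
    """The stricter reading of the title: every later price is lower."""
    if not bid_prices:
        raise ValueError("need at least one bid price")
    original = bid_prices[0]
    return all(price < original for price in bid_prices[1:])
```

A run of prices such as 0.30, 0.30, 0.25 passes the first check and fails the second, so the team has to decide which one the rule actually means.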

Consider Alternative Expressive Formats

We also investigated the possibility of making the automation more “smart” behind the scenes, by allowing very high-level declarative phrases that would take care of the above in a more concise fashion. For example, business liked stating this as such:

Behavior 'Prices decrease on multiple requests'
  example 'subsequent bid request cannot have a higher price than the original request'
  example 'a third subsequent bid request must be lower than the first bid request'

If you have a tool, and programming language, that allows you to create a nice DSL for your testing, the above is no problem in terms of making an executable test specification. But that last example is quite an interesting level of abstraction, no? Can you really automate that? Sure. It’s quite possible to have automation that is “smart enough” to handle all of that. But is that the right level of abstraction? And if so what does that mean for automation that has to handle it? This is another key area where test development, as opposed to enterprise application development, takes over.

Abstraction and Test Automation

It breaks down nicely:

In order to automate, I obviously have to start with the context.
In order to start with the context, I have to know what kind of phrasing I need to support.
“What phrasing I need to support” is directly tied to how data and test conditions are going to be specified.
The more that is specified, the less “smart” I have to make the automation.
The less that is specified, the “smarter” I have to make the automation.

So from the standpoint of test solution development, a few operative questions rear their heads:

Are we going to determine how “smart” the automation should be then write test cases accordingly?
Or are we going to decide how specific the test cases should be, then build the automation accordingly?

Ideally, you decide how specific test and data conditions should be first. That’s a practice that exists separate and distinct from automation. It impacts automation, but it also impacts general test writing.

Think of it this way: the automation framework is a consumer just as much as a human who has to read the test. As a test solution developer, I look at it such that the business analysts, developers, and testers are my customers. They are providing me with the requirements, if you will, that they want supported and thus what the automation will have to consume. What I’m periodically doing is stepping in and just making sure the complications and/or ramifications of those decisions are understood.

That’s something NONE of these tools come even close to doing. And, arguably, they never will nor should they ever. THAT is where test thinking comes in. That is where you want someone with a certain amount of verbal and written ability to step in and guide these kinds of discussions.

All of this leads to really interesting discussions that can get a team excited about their job. For example: How smart do we have to be for the automation to be simple? That’s a question that I like to ask mainly because I think it’s so important. Tying that in with the example I just took you through above, the question can be asked like this: How much do we have to specify for the automation to be simple?

That will be the key point: how much do we want to spell out the data conditions?

The more we specify, the less intelligent the automation will have to be in some ways, simply because it will be told what it needs.
The less we specify, the more the automation has to know certain (unspecified) details before it takes certain actions.

Wow, look at that! Not only are our tests design-guiding for the enterprise application developers but they are design-guiding for the test automation developers as well. Talk about wolfing down your own dog food!

I know I’m harping a bit on the automation here but it is important because a reality of the modern software development arena is that we can’t rely on manual testing all that much anymore. All those tools I mentioned above are predicated upon being able to take those feature files / test specs and execute them by delegating down to step definitions, matchers, and driver libraries.

So, with that in mind, the automation we ultimately create has to be flexible enough to:

Create a data condition from convention (meaning, very little specific data is specified).
Create a data condition from specification (meaning any specified data is used, other required data not specified is from convention).
Create a data condition based on action (where the action may in turn cause a new data condition).
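Those three cases can be sketched in a few lines of Python. The convention values come from the example earlier in this post; the function names and the ten percent price reduction are hypothetical and exist only to make the third case concrete.

```python
# Assumed convention: what every inapp display request has unless stated.
CONVENTION = {
    "browser": "mobile",
    "ip": "geo-IP validated",
    "user": "registered for campaign known by the AdManager",
}


def from_convention():
    """1. Nothing specified: everything comes from convention."""
    return dict(CONVENTION)


def from_specification(**specified):
    """2. Specified values win; anything missing falls back to convention."""
    return {**CONVENTION, **specified}


def after_action(condition, action):
    """3. An action produces a new data condition from the current one."""
    return action(dict(condition))  # copy so the starting condition is kept


def lower_price(condition):
    # Hypothetical action: each message sent reduces the bid price by 10%.
    condition["bid_price"] = round(condition.get("bid_price", 1.00) * 0.9, 2)
    return condition


start = from_specification(browser="non-mobile", bid_price=0.50)
later = after_action(start, lower_price)
```

Here `start` takes its browser and price from the test and the rest from convention, while `later` is a new condition that only exists because an action ran.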

Then there is the observable. If the automation is given something specific (e.g., “must be zero”, “must not be zero”), it can check that. If it’s given something relative (e.g., “things should be populated” or “value must be greater”) the logic can check that as well, but these are cases where the automation may have to check the prior state, then make the change, then check the resulting state to know that a change has occurred. Sometimes it means “check for any values” but it may also mean “check for any values as long as they are not zero” or “check that no values are blank” and so on and so forth.
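The difference between those kinds of observables shows up clearly in code. This is a rough Python sketch with helper names of my own; the point is only that a relative check needs a before and an after, where a specific check needs a single value.

```python
def check_specific(actual, expected):
    """Specific observable, such as "must be zero"."""
    return actual == expected


def check_not_blank(values):
    """Looser observable: "check that no values are blank"."""
    return all(value not in (None, "") for value in values)


def check_relative(read_state, change, comparison):
    """Relative observable: read prior state, act, read again, compare."""
    before = read_state()
    change()
    after = read_state()
    return comparison(before, after)


# Usage: "value must be greater" after the change than before it.
state = {"value": 1}
grew = check_relative(
    read_state=lambda: state["value"],
    change=lambda: state.update(value=state["value"] + 1),
    comparison=lambda before, after: after > before,
)
```

The relative check is the one that makes the automation "smarter": it has to know how to read state and when, which is knowledge the test wording itself never spells out.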

The final takeaway here is that the automation currently has to support a lot of flexibility in handling data conditions and checking end states. This all comes down, remember, to specification. Manual tests are input to automated tests. So how we specify tests is going to be crucial to how we automate them and thus how we support effective and efficient regression testing. The choices we make in test design determine the way the automation has to support our choices.

But all of this thinking is not happening, at least in my experience, in an industry that seems convinced that developers are testers and testers are developers. On the surface, it may seem that way. But I truly do fear the long-term ramifications of how the current industry treats testing and the activities around it. Posts like this one are my attempt to start systematizing my thinking around these issues.

Stories from a Software Tester

Twice upon a time, in another space, no distance in any direction from here …