The BDD Lure and Trap

The whole notion of BDD is something I’ve talked about in a series of posts. I recently did an exercise with a group of business users, testers, and developers. We used a contrived example but the example was certainly a realistic portrayal of the activities that tend to occur on “real” projects. So let’s talk about that.

BDD, as an approach, is often tied in with the idea of executable specifications. And, in some ways, executable specifications can be seen as an extension of test-driven development to business rules. In fact, this is where ATDD (acceptance test-driven development) comes from. (And here I’ll remind you of my post on defocusing our practices.) Painting with broad strokes, this is the process of gathering examples to clarify requirements, deriving tests, and ultimately providing automation for those tests.

So some of the immediate questions that come to mind for most BDD practitioners are: Do the examples replace tests? Are the examples themselves tests or just descriptions that help you write tests? If these examples are part of requirements or user stories, does that mean the tests are now directly part of those as well?

The answers to these questions are subject to a lot of debate, but in essence how I look at it is that your goal is to create a good shared understanding and give people the context they need to turn ideas (represented by those examples) into working software. For me, examples are not necessarily tests or, at the very least, there is not necessarily a one-to-one mapping between them. I say that because a relatively small number of examples that are easy to understand and at the right level of abstraction is much more effective than hundreds of test cases.

Consider also that these examples will tend to fall into scenarios of behavior. Several simple groups of key examples are much easier to understand and implement than a huge list of complex scenarios. Smaller groups of examples and thus smaller groups of scenarios make it easier to evaluate completeness and argue about different data and test conditions. This allows teams to discover and resolve inconsistencies and differences in their understanding.

Stardate Calculator

So now let’s take a look at an example.

I have a working stardate calculator here that I use for demonstration purposes. Try it out. Check the “Enable Form” checkbox, click the “TNG Era” radio, enter “54868.6” in the text field, and click the “Convert” button. You’ll get some output that shows what that stardate value is in standard calendar time.

Issues with Test Writing

So now let’s consider an example of writing up a BDD-style specification:

Ability: Calculate Stardates

  Background:
    Given an authenticated user on the stardate page

  Scenario: Convert Valid TNG Stardate
    When the tng stardate "54868.6" is converted
    Then the full calendar response should contain "Thu Apr 06 2378"

But … what does that tell you? All that does is repeat what you saw on the screen. Should I instead be checking that the actual values of the specific date are accurate rather than just their display? How about this scenario instead:

Ability: Calculate Stardates

  Background:
    Given an authenticated user on the stardate page

  Scenario: Convert Valid TNG Stardate
    When the tng stardate "54868.6" is converted
    Then the calendar year should be "2378"
    And  the calendar month should be "April"
    And  the calendar day should be "6"

Okay, so, presumably my assertions behind the scenes are doing more than just checking a text field on a UI. But — are they? And, if so, has this helped with a shared understanding? How do I know that “54868.6” should lead to “Thu Apr 06 2378”? Has this scenario given you a means to better test this functionality?

Maybe some examples would help? Okay, how about we do it this way:

Ability: Calculate Stardates

  Background:
    Given an authenticated user on the stardate page

  Scenario Outline: Convert Valid TNG Stardates
    When the tng stardate "<stardate>" is converted
    Then the calendar year should be "<year>"

    Examples:
      | stardate | year | comment      |
      | 46379.1  | 2369 | DS9 begins   |
      | 48315.6  | 2371 | VOY begins   |
      | 56844.9  | 2380 | TNG: Nemesis |

So now I have some specific examples and I can see that these correlate to particular events in the Star Trek canon of material. But, still, I don’t feel like I really understand this functionality. Do you? I mean, sure, I know that a stardate somehow becomes a calendar date. But I probably knew that just by playing around with the application. I’m not really testing here. Am I exploring? Sure, to an extent. But how effective can that exploration be if I really have no idea about the functionality itself?

So what’s the appropriate level of abstraction here? Obviously I can’t say this:

  When a valid stardate is converted
  Then the full calendar response should be an accurate representation of the stardate

That would just be ridiculous, right? But go back to one of the original ideas:

  When the tng stardate "54868.6" is converted
  Then the full calendar response should contain "Thu Apr 06 2378"

Is that much different? Certainly it has actual data, which is a difference, but the relationship between the input and the output is not clear. Or, put another way, how would you as a tester start building different test cases here? If I wanted to use data “48315.6”, what do I predict should be the output for the calendar response? Unless I know the calculation, I have no idea.

Stating this in different terms, in order to determine if this calculator is providing business value to our users, I have to know how it provides value. And how it provides value is by giving accurate representations of the stardates as calendar dates. Keep that in mind.

Issues with Automating

Let’s leave that for a second and look at the other part of BDD: the construction of automation logic that executes those scenarios. Here I’m not going to go through a particular tool like JBehave, Cucumber, SpecFlow or whatever else because I want to focus on some key design decisions. These decisions may seem like they only matter to the technical side of things, but let’s consider whether that’s true.

Given the stardate conversion above, here’s an example automated check:

  @Test
  public void convertTNGStardate() {
    stardatePage = new Stardate();
    stardatePage.verifyCalendarDateForTNGStardate(54868.6);
  }

Essentially I’m just creating a context of the stardate web page and then I’m calling a particular method to check a given stardate. Here’s the code that handles this:

  public void verifyCalendarDateForTNGStardate(double value) {
    enableStardateForm();
    setTngEra();
    convertStardateValue(value);

    String result = calendarDate.getAttribute("value");
    Calendar calendar = modelTNGCalculation(value);

    int year = calendar.get(Calendar.YEAR);
    String month = new SimpleDateFormat("MMM").format(calendar.getTime());
    int date = calendar.get(Calendar.DATE);

    assertThat(result).as("Year was not valid").contains(Integer.toString(year));
    assertThat(result).as("Month was not valid").contains(month);
    assertThat(result).as("Date was not valid").contains(Integer.toString(date));
  }

  private Calendar modelTNGCalculation(double value) {
    // Despite its name, this holds the stardate expressed as milliseconds past the origin.
    double stardatesPerYear = value * 34367056.4;
    DateTimeFormatter dtf = DateTimeFormatter.ofPattern("MMMM d, yyyy HH:mm:ss");
    // The origin date is interpreted as UTC here.
    Date origin = Date.from(LocalDateTime.parse("July 5, 2318 12:00:00", dtf).toInstant(ZoneOffset.UTC));
    double milliseconds = origin.getTime() + stardatesPerYear;
    Date dateResult = new Date();
    dateResult.setTime((long)milliseconds);

    // Calendar.getInstance() uses the JVM's default time zone and locale.
    Calendar calendar = Calendar.getInstance();
    calendar.setTime(dateResult);
    return calendar;
  }

I didn’t want to repeat massive amounts of code here, so just know that the first three calls in verifyCalendarDateForTNGStardate (enableStardateForm, setTngEra, and convertStardateValue) are calling out to WebDriver, which exercises the web application in the browser.

One thing to note is that I have a model of how the calculation is being done, which is what the method modelTNGCalculation is providing. That’s allowing me to essentially simulate the actual behind-the-scenes activity that takes place to generate a calendar date based on a stardate.

But consider this: if I took out the WebDriver stuff, I could still execute the model by passing it a value, and checking what it returned. Since the model method encapsulates how the calculation is performed, would this be enough? After all, if I remove the WebDriver parts, I effectively have a unit test.
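To make that concrete, here is a minimal sketch of what the model might look like once it stands on its own. The class name StardateModel and the method name toCalendarDate are hypothetical, and I’m assuming the origin can be treated as a zone-less date-time via java.time rather than going through Calendar as the model above does. The two constants are the ones from modelTNGCalculation.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical stand-alone version of the model; not the code from my test suite.
class StardateModel {
    // Constants taken from modelTNGCalculation. Treating the origin as a
    // zone-less LocalDateTime is an assumption made for this sketch.
    static final LocalDateTime ORIGIN = LocalDateTime.of(2318, 7, 5, 12, 0, 0);
    static final double MILLIS_PER_STARDATE_UNIT = 34367056.4;

    // Converts a TNG-era stardate to a calendar date-time by adding the
    // scaled stardate, in milliseconds, to the origin.
    static LocalDateTime toCalendarDate(double stardate) {
        long offsetMillis = (long) (stardate * MILLIS_PER_STARDATE_UNIT);
        return ORIGIN.plus(Duration.ofMillis(offsetMillis));
    }

    public static void main(String[] args) {
        LocalDateTime result = toCalendarDate(54868.6);
        System.out.println(result.toLocalDate()); // prints 2378-04-06
    }
}
```

No browser, no WebDriver, no page object. Just a value going in and a date coming out.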

Hmm. Interesting, right? I have an abstraction here over what could be considered a unit test but, when I add in my WebDriver logic, I also have that test being executed as what someone might call an integration test (i.e., the integration of the calculation engine and the web interface).

The “Right” Abstraction

This gets into a key area of discussion that I’m not going to cover in this post but I am going to cover in another one regarding the alleged “scam” of integrated tests. However, for now, let’s consider our abstraction level. We all know that UI tests can be brittle, expensive to write, and time consuming to execute. So some might argue: just decouple the tests from the UI. Don’t test if clicking all those widgets and inputting text leads to a valid result. Simply feed a valid value to the calculation and see if that leads to a valid result.
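As a rough sketch of what that decoupling could look like, the examples from the Scenario Outline earlier can be fed straight to anything that converts a stardate to a date. Everything named here (StardateExamples, modelConversion, countMismatches) is hypothetical, and the model is my zone-less java.time approximation of the calculation, not the application’s own code. A WebDriver-backed converter could presumably be passed in the same way, but I have not written one here.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleFunction;

// Hypothetical harness: the same examples, checked against any converter.
class StardateExamples {
    // Stardate -> expected calendar year, copied from the Examples table.
    static final Map<Double, Integer> EXPECTED_YEARS = new LinkedHashMap<>();
    static {
        EXPECTED_YEARS.put(46379.1, 2369); // DS9 begins
        EXPECTED_YEARS.put(48315.6, 2371); // VOY begins
        EXPECTED_YEARS.put(56844.9, 2380); // TNG: Nemesis
    }

    // Model of the calculation, assuming a zone-less origin of July 5, 2318 12:00:00.
    static LocalDate modelConversion(double stardate) {
        LocalDateTime origin = LocalDateTime.of(2318, 7, 5, 12, 0, 0);
        long offsetMillis = (long) (stardate * 34367056.4);
        return origin.plus(Duration.ofMillis(offsetMillis)).toLocalDate();
    }

    // Runs every example through the converter and counts the ones whose year is wrong.
    static int countMismatches(DoubleFunction<LocalDate> converter) {
        int mismatches = 0;
        for (Map.Entry<Double, Integer> example : EXPECTED_YEARS.entrySet()) {
            LocalDate actual = converter.apply(example.getKey());
            if (actual.getYear() != example.getValue()) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(StardateExamples::modelConversion));
    }
}
```

The examples stay the same regardless of what sits behind the converter, which is the whole point of the argument for decoupling.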

That would be unit testing, though, right? That wouldn’t be, say, acceptance testing. And yet you could argue my test above is very much an acceptance test. After all, it’s demonstrably testing if our calculation is acceptable. True, it’s not what the user sees, but it’s important as acceptance criteria.

But it’s not enough, right? First of all, this model isn’t the actual code that gets executed via the web application. The actual code that gets executed happens when the user clicks on that “Convert” button. Then some JavaScript fires. That JavaScript is this:

function calculateTNG() {
  origin = new Date("July 5, 2318 12:00:00");
  stardate = $("#stardateValue").val()

  stardatesPerYear = stardate * 34367056.4;
  milliseconds = origin.getTime() + stardatesPerYear;

  result = new Date();
  result.setTime(milliseconds);

  return result;
}

So my model-driven, code-based test doesn’t really test what the user sees. It’s a representation (in Java) of the code (in JavaScript) that does get executed. So it’s valuable. But it doesn’t show business value because what gets rendered on the web application is what the user is deriving their value from. That, too, is acceptance criteria.

Many testers will tell you that one of their goals is to not unnecessarily inflate test coverage. The logic here often goes that tests at the acceptance level (what some people call the BDD level) should not duplicate verbatim those being done at a unit level. That being said, acceptance tests do more than just specify the elaborated requirements as business level examples. They also provide confidence that the underlying implementation is not only working in a stated context but is robust within that context.

So … doesn’t that mean there’s a balance to deciding what elements of unit test coverage should also be stated in the context of acceptance tests? And if that’s the case, how does that impact BDD, which is really just an extension of TDD, which in turn is just a systematic way of doing unit testing? More specifically, how does this impact how many “BDD scenarios” we write and how many “examples” we provide within them?

In reality, you could argue (as I often do) that much of this is a false dichotomy the industry has painted itself into. Unit tests are just another form of acceptance test when you realize that acceptance testing is an approach to testing, not a type of testing.

Does This Matter for BDD

So I imagine you asking: “Is there a point to all this?” Well, the point is this: in our BDD context, is this functionality being tested at the right level of abstraction? But that forces me to ask: what is the right level of abstraction? If I have the stardate calculation only done as part of a unit test, what are the acceptance tests? But if I assume the UI tests are the acceptance tests, I’m forced to consider that these tests are just showing that the stardate value returns some date, as opposed to actually running code that determines the date is valid.

Let me make sure that last point is clear. If I just run my BDD scenarios from earlier in this post, I’ll be checking that, for example, the output of the convert action is “Thu Apr 06 2378”, which is culled from the output display of “Thu Apr 06 2378 09:51:10 GMT-0500 (Central Daylight Time)”.
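For illustration, here is one way that culling might be done, as a sketch only. The class CalendarDisplay and its method are hypothetical, and I’m assuming the display always starts with a weekday followed by an English month abbreviation, a day, and a year, as in the output above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: pulls the calendar date out of the page's display string.
class CalendarDisplay {
    private static final DateTimeFormatter DATE_PART =
        DateTimeFormatter.ofPattern("MMM dd yyyy", Locale.ENGLISH);

    // Assumes the layout "<weekday> <month> <day> <year> <time> ..." seen in the output.
    static LocalDate parseDatePortion(String display) {
        String[] tokens = display.trim().split("\\s+");
        // Skip the weekday (token 0) and keep month, day, and year.
        String datePart = tokens[1] + " " + tokens[2] + " " + tokens[3];
        return LocalDate.parse(datePart, DATE_PART);
    }

    public static void main(String[] args) {
        String display = "Thu Apr 06 2378 09:51:10 GMT-0500 (Central Daylight Time)";
        System.out.println(parseDatePortion(display)); // prints 2378-04-06
    }
}
```

Notice that all this does is read back what the screen said. It confirms the display, not the calculation.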

But how did you discover what the correct examples are? How did you know “54868.6” means “Thu Apr 06 2378”? That’s pretty much the exact same question I started off with in this post. As far as how you knew it, presumably you talked to the business users along with the developers, right? But, still, there’s a lot of context that goes into these discussions and knowledge. How much of that is, or should be, reflected in your BDD artifacts?

This is the lure of BDD. It’s also the trap of BDD. If you fall for the lure, you risk falling into the trap. Check out Steve’s article on The Cucumber Trap for what is probably a much better, and undoubtedly more concise, view on this than my own. This lure and trap is why people writing and promoting such tools often feel the tools become misunderstood. See my own post on Cucumber’s woes in this regard.

This lure and trap is important for testers to be aware of as they investigate BDD as an operating strategy.
