Driving Design with Code as Specification

This post follows on from my post on how code is a specification. I highly recommend reading that post first for context, because here I’m going to extend its sample code. The aim is to illustrate how test code and production code work together as an executable specification. I’m also going to focus a bit on how this is relevant to the business.

Let’s consider the bit of test code we put in place to put pressure on the design:

RSpec.describe 'a project' do
  it 'that has no tasks is done' do
    project = Project.new
    expect(project).to be_done
  end
end

I’m not showing you the production code that makes this code pass but, again, keep in mind that it’s the two acting in concert that is the basis of practices like TDD. As I argued in the previous post, you could equally well argue this is BDD. By discussing the construction of this code along with developers, testers, and business, pressure was put not just on the design of the code but on the overall design of the behavior.

Incrementally Refine the Design

So as part of our nascent design, we’ve managed to encapsulate the idea that a new project is done. This would then lead to a question of what a “non-done project” is. Let’s say we end up with another test that drives our code, such as this:

  it 'that has an incomplete task is not done' do
    project = Project.new
    task = Task.new
	
    project.tasks << task
    expect(project).not_to be_done
  end

This test is similar to the first one, but now we have a second domain concept introduced: the Task. We now explicitly model the fact that a Project can contain Tasks. From a code perspective we indicate a tasks-related attribute of the Project, but we’re still keeping this at the behavior level without worrying too much about implementation. The important thing to note is how this is forcing us to consider our level of abstraction and our domain terminology.

Also notice an assumption here. The assumption in this code is that a new task is incomplete. That in turn exposes another assumption we are making: a project with an incomplete task is not done. It’s very important when putting pressure on design to raise assumptions to the surface.

All of this, of course, brings up the distinction between complete and incomplete tasks. So now let’s consider this next bit:

  it 'recognizes tasks that have been completed' do
    project = Project.new
    task = Task.new
	
    expect(task).not_to be_complete

    task.mark_complete
    expect(task).to be_complete
  end

Notice here how we are defining, in broad strokes, how the application behaves, but we are doing so without committing to too many details of implementation. This is important because we can imagine a situation where there is a GUI (say, a web site) as well as a service interface (an API). The notion of “mark complete” can have meaning in both, but certainly via different levels of interaction. Here we are just talking about the design.
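I haven’t shown a Task class, so here is a minimal sketch of one that would satisfy the test above. Everything in it is an assumption inferred from the test, in particular the choice to store completion as a simple boolean. The one thing that isn’t an assumption is how RSpec reads it: a predicate matcher like be_complete calls a method named complete? on the object.

```ruby
# Hypothetical Task: this class is not shown in the post, so every detail
# here is an assumption inferred from the tests.
class Task
  def initialize
    # The assumption surfaced earlier: a new task starts out incomplete.
    @complete = false
  end

  def mark_complete
    @complete = true
  end

  # RSpec's `be_complete` matcher calls this predicate method.
  def complete?
    @complete
  end
end

task = Task.new
puts task.complete?   # false
task.mark_complete
puts task.complete?   # true
```

Whether “mark complete” is triggered from a web page or an API call, both would end up at something like this method, which is why the test can stay at the behavior level.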

As a tester or developer, at this point I know an empty project is done. (Perhaps it’s an open question with business as to whether a project with no tasks should be considered “empty.”) We know that tasks can be part of projects and they can be completed or incomplete. This then leads to the next point of discussion and design which we wrap in a test:

  it 'that has all tasks completed is done' do
    project = Project.new
    task = Task.new

    project.tasks << task
    task.mark_complete

    expect(project).to be_done
  end

Let’s again reflect. Am I doing BDD here? TDD? Well, kind of both, I would argue. The above code separates out into what BDD would look like:

# SCENARIO: a project with all tasks complete is done
  # GIVEN a new project and a new task

    project = Project.new
    task = Task.new

  # WHEN the task is marked as complete

    project.tasks << task
    task.mark_complete

  # THEN the project is considered done

    expect(project).to be_done

This is all pretty important. We now have a Project that can be populated with Tasks. Those tasks can be marked as complete. Further, the project recognizes when it has incomplete tasks, thus recognizing that it is not done.

Encode Understanding

We have created tests that have encoded our understanding. Those tests have natural language aspects that can be used to communicate at different levels. Further, these tests operate at a behavioral level — without getting too much into implementation details — that can serve as a good regression test suite that can be updated as business rules change, but stay relatively static if the implementation changes.

If I wanted to push English up at this point, here’s a possible output based entirely on what I’ve shown you in the logic above:

a project
  that has no tasks is done
  that has an incomplete task is not done
  recognizes tasks that have been completed
  that has all tasks completed is done

Since the test logic is wrapped with English statements, those statements can be pushed back up to provide insight.
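With RSpec itself, the documentation formatter (rspec --format documentation) gives this kind of output. To make the mechanism concrete, here is a toy sketch of what “pushing English up” amounts to: walk the nested descriptions and print them, ignoring the test bodies. The SpecOutline class is purely hypothetical and is not how RSpec is implemented.

```ruby
# Hypothetical stand-in for a spec reporter: it records only the English
# descriptions, indented by nesting depth.
class SpecOutline
  def initialize
    @lines = []
    @depth = 0
  end

  # A group ("describe") adds its text and indents everything inside it.
  def describe(text)
    @lines << "#{'  ' * @depth}#{text}"
    @depth += 1
    yield self
    @depth -= 1
  end

  # An example ("it") contributes one line; any block passed is ignored.
  def it(text)
    @lines << "#{'  ' * @depth}#{text}"
  end

  def to_s
    @lines.join("\n")
  end
end

outline = SpecOutline.new
outline.describe('a project') do |group|
  group.it('that has no tasks is done')
  group.it('that has an incomplete task is not done')
  group.it('recognizes tasks that have been completed')
  group.it('that has all tasks completed is done')
end

puts outline
```

The point is that the English is already sitting in the test code; reporting it requires no separate layer.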

Use Tests to Pressure Design

As part of this project, as I mentioned in the first post, we need to be able to calculate how much of a project is remaining and the rate of completion, and then put them together to determine a projected end date. So the project ultimately needs to be able to calculate how much work is remaining. Let’s say the following is created:

RSpec.describe 'a project' do
  describe 'providing estimates' do
    let(:project)        { Project.new }
    let(:done)           { Task.new(size: 2, complete: true) }
    let(:small_not_done) { Task.new(size: 1) }
    let(:large_not_done) { Task.new(size: 4) }

    before(:example) do
      project.tasks = [done, small_not_done, large_not_done]
    end

    it 'accurately calculates the total size' do
      expect(project.total_size).to eq(7)
    end

    it 'accurately calculates the remaining size' do
      expect(project.remaining_size).to eq(5)
    end
  end
end

The book Rails 4 Test Prescriptions (from which I am borrowing the salient aspects of this example) has this to say about constructs like the above:

A couple of minor style choices make the test easier to manage. All the task objects have meaningful names so that at a glance I can tell each object’s reason for being in the test. If the tasks had descriptions or names I’d also give them meaningful data so that if the object gets printed to the terminal it’s easy to tell which object it is. The specific score numbers that I’m using for each are deliberate. Each task has a different score, and neither of the two adds up to the third, which is a very small thing that makes it harder to get a false positive test.
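To see why the choice of numbers matters, here is a small sketch comparing a correct “remaining size” calculation with a plausible mistake: summing the completed tasks instead of the incomplete ones. Plain hashes stand in for Task objects, and the “careless” data set is a hypothetical I made up for contrast.

```ruby
# Correct: sum the sizes of tasks that are NOT complete.
def remaining_size(tasks)
  tasks.reject { |t| t[:complete] }.sum { |t| t[:size] }
end

# Buggy (hypothetical mistake): sums the completed tasks instead.
def buggy_remaining_size(tasks)
  tasks.select { |t| t[:complete] }.sum { |t| t[:size] }
end

# Sizes from the spec above: 2 (done), 1 and 4 (not done).
DELIBERATE = [
  { size: 2, complete: true },
  { size: 1, complete: false },
  { size: 4, complete: false }
].freeze

# Carelessly chosen sizes: the done task equals the sum of the other two.
CARELESS = [
  { size: 2, complete: true },
  { size: 1, complete: false },
  { size: 1, complete: false }
].freeze

puts remaining_size(DELIBERATE)        # 5
puts buggy_remaining_size(DELIBERATE)  # 2, so the spec catches the bug
puts remaining_size(CARELESS)          # 2
puts buggy_remaining_size(CARELESS)    # 2, so the bug passes unnoticed
```

With the careless numbers, a test expecting 2 would pass against the broken code, which is exactly the false positive the book is warning about.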

Use Code to Discuss

The important thing here is that, yes, this is code. But it is code that is understandable. There seems to be this fear of introducing code like this as part of a discussion with business teams. But why is that? After all, business teams certainly (and rightly) expect developers and testers to understand their language and their business domain. And, guess what, that business team is operating in the context of a technical discipline. So there’s no reason they should be unexposed to what makes their business ideas realizable in a technical form.

This is probably one of the most important ideas I ultimately want test teams — and teams they work with — to start embracing. BDD, speaking generally, has worked to try to insulate the business from code and I think that’s a terrible mistake. I’ll also be the first to admit this is an idea I’m coming to after firmly drinking the BDD Kool-aid for quite some time.

Refactor to the Business Domain

There’s one last bit I want to show you here and this time it jumps into some (admittedly simplified) code. Here is what the Project class looks like that satisfies the above tests:

class Project
  attr_accessor :tasks

  def initialize
    @tasks = []
  end

  def done?
    # tasks.reject(&:complete?).empty?
    incomplete_tasks.empty?
  end

  def incomplete_tasks
    tasks.reject(&:complete?)
  end

  def total_size
    tasks.sum(&:size)
  end

  def remaining_size
    # tasks.reject(&:complete?).sum(&:size)
    incomplete_tasks.sum(&:size)
  end
end

The commented lines in the done? and remaining_size methods are there to show how a particular bit of code was refactored as part of the test design. Specifically, an incomplete_tasks method was created. As the book says:

[we wrapped] a slightly opaque functional call containing a negative condition in a method with a semantically meaningful name. And if the definition of completeness changes, we only have to change one location.

This type of refactoring does make the code cleaner but it also makes the code easier to discuss with business. And that gets interesting for another reason entirely.

To explain, looking at that code, you might notice another bit of duplication there. At two points there is a summation of the tasks by calling sum(&:size). If you have a developer mindset, you might wonder if you could refactor that down to a method as well, such as sum_tasks or something like that.

But … does that method make sense on a Project? It certainly doesn’t make sense for the Task class. But it’s not clear that a Project should have this responsibility either. Yet, as the book indicates, this leads us to question whether the Project is even the correct abstraction. Perhaps what we really want is a TaskList for most of these activities. Then a Project would hold very little except references to TaskList objects.
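As a rough sketch of where that line of thinking could lead, here is one possible shape for a TaskList, with Project reduced to asking it questions. None of this comes from the book or the earlier post: the TaskList methods, the Task constructor keywords, and the way Project receives its list are all assumptions made for illustration.

```ruby
# Stand-in Task for this sketch (hypothetical; the post does not show Task).
class Task
  attr_reader :size

  def initialize(size: 1, complete: false)
    @size = size
    @complete = complete
  end

  def complete?
    @complete
  end
end

# Hypothetical TaskList: owns the collection-level questions.
class TaskList
  def initialize(tasks = [])
    @tasks = tasks
  end

  def <<(task)
    @tasks << task
    self
  end

  # Returns another TaskList, so the same questions can be asked of it.
  def incomplete
    TaskList.new(@tasks.reject(&:complete?))
  end

  # The single place where task sizes are summed.
  def total_size
    @tasks.sum(&:size)
  end

  def empty?
    @tasks.empty?
  end
end

# Project now holds little more than a reference to its TaskList.
class Project
  attr_reader :tasks

  def initialize(tasks = TaskList.new)
    @tasks = tasks
  end

  def done?
    tasks.incomplete.empty?
  end

  def total_size
    tasks.total_size
  end

  def remaining_size
    tasks.incomplete.total_size
  end
end

project = Project.new(TaskList.new([
  Task.new(size: 2, complete: true),
  Task.new(size: 1),
  Task.new(size: 4)
]))
puts project.total_size      # 7
puts project.remaining_size  # 5
puts project.done?           # false
```

Because incomplete returns another TaskList, the summation lives in one place and the duplication disappears without inventing a sum_tasks method on Project. Whether “task list” is a term the business would actually use is exactly the kind of question this code lets you ask.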

This matters because deciding those abstraction levels, which reflect the business domain, is what lets business, developers, and testers align along the same axes of discussion. And please note that this is putting pressure on design at both levels (intent and implementation) while still using code (production and test) as the ultimate specification.

Expressive and Intent-Revealing Code

One final thing I’d like to point out. Here I’m showing code, both production and test, written in Ruby. Clearly, having the test language aligned with the production language makes sense when you are talking about testing at the unit and/or integration level. But when you start to get into integrated testing (and thus “system” and “acceptance” testing), you don’t necessarily have to use the same test language as your production language.

As I’ve talked about before, sometimes it makes sense to align your test language with your development language. Other times, however, your test language is not necessarily your development language. And, of course, you can have a resilient strategy wherein you are a bit polyglot in your approach.

I mentioned the “push English up” idea earlier to showcase just the natural language part of the specs. This is exactly what I was talking about regarding “pushing English up” versus “pulling English down” when I showcased the use of a tool like Serenity in the context of Cucumber.

This idea does have some impact on BDD-style tools. Adding Cucumber or whatever else often means adding the pure English abstraction layer (“feature files”), then a secondary layer (“step definitions”) that is used to match the English to methods annotated with regular expressions, and then finally that secondary layer delegates down to some code that performs the actions. Using an approach like what I’m showing here can short-circuit a lot of that. Further, you can output the natural language information from the code-based spec file. I’ve been showing Ruby here but an example of doing this in a Java context might be using a tool like Spock.

Even with all that being said, the test code needs to be written in such a way that it leverages a DSL if you want it to act as a communication mechanism for different people. It may seem that I’ve stacked the deck here with Ruby, given that it is a clean language with very little boilerplate. But as I showed in those posts on Serenity, you can do something like this with Java as well:

Actor jeff = Actor.named("Jeff");
jeff.can(BrowseTheWeb.with(theBrowser));

givenThat(jeff).wasAbleTo(StartWith.anEmptyTodoList());
when(jeff).attemptsTo(AddATodoItem.called("Digitize JLA vol 1 collection"));
then(jeff).should(seeThat(TodoItemsList.displayed(), hasItem("Digitize JLA vol 1 collection")));

The same thing could be done in C#:

IActor jeff = new Actor("Jeff");

jeff.Can(BrowseTheWeb.With(webDriver))
   .WasAbleTo(StartWith.AnEmptyTodoList())
   .AttemptsTo(ToDoItem.AddAToDoItem("Digitize JLA vol 1 collection"));

jeff.AsksFor(TheItems.Displayed()).Should().Contain("Digitize JLA vol 1 collection");

Using code as a specification does mean you want that code to be as boilerplate-free as possible, at least in terms of the test code that you use to communicate with different teams. Ideally your code should help you be expressive and intent-revealing. When you can accommodate such an approach, you will often feel much more comfortable about the idea of code being the ultimate specification.
