The Challenge of Testing

Awhile back I talked about what makes testing complicated. To be honest, that post is embarrassingly written as I look back on it. That said, I think there is some value in what it says. But to show how my thinking has refined, as well as become a bit more operational, let’s piggy-back on my previous “code as specification” posts and look at why testing is challenging.

It might help you to refer to some previous posts (When Code Is The Specification and Driving Design with Code as Specification) just so you have the context. I’ll also remind everyone that I am selectively borrowing code from a book called Rails 4 Test Prescriptions, which is an excellent book, albeit it is focused on a specific language (Ruby) and even a specific tool (Rails) in the context of that language. What I’m doing is distilling some points to put the challenges of testing in more context.

In those first posts those points were distilled in an effort to show how test and production code, acting together, can be the executable specification, particularly as they drive design. Here I want to showcase a bit about how testing can be reflective of the design, but that this can lead people to fall into a few traps.

One final note before we get to it.

While the test code I’m showing from the book is more predicated upon unit testing, these same ideas I’m talking about apply with code at any level of testing. In fact, the complexity I’m talking about increases the further you get from the code. And the further you get from the code, the more intervening “specifications” there likely are. When you stay close to the code, but include the business domain, you start to reduce your sources of truth, as talked about in my modern testing post on that very subject.

Modeling the Domain Concept

From those previous posts, I showed you the beginnings of a Project and Task API and how the testing was putting pressure on two levels of design. The goal, as discussed with the business team in the first post, is to use the API to let people calculate a projected completion date for their projects. The rate of task completion will be the velocity of the project as a whole. So what this means is that the tasks need to be aware of whether they have been completed in a particular window of time that is considered to be “in velocity”.

So let’s say design has begun on the “calculating velocity” part. With that, we can jump to some possible test code here:

  describe 'indicating velocity' do
    let(:task) { Task.new(size: 3) }

    it 'counts a recently completed task toward velocity' do
      task.mark_completed(1.day.ago)
      expect(task).to be_part_of_velocity
      expect(task.points_toward_velocity).to eq(3)
    end

    it 'does not count a long-ago task toward velocity' do
      task.mark_completed(6.months.ago)
      expect(task).not_to be_part_of_velocity
      expect(task.points_toward_velocity).to eq(0)
    end

    it 'does not count incomplete tasks toward velocity' do
      expect(task).not_to be_part_of_velocity
      expect(task.points_toward_velocity).to eq(0)
    end
  end

describe 'indicating velocity' do

let(:task) { Task.new(size: 3) }

it 'counts a recently completed task toward velocity' do

task.mark_completed(1.day.ago)

expect(task).to be_part_of_velocity

expect(task.points_toward_velocity).to eq(3)

end

it 'does not count a long-ago task toward velocity' do

task.mark_completed(6.months.ago)

expect(task).not_to be_part_of_velocity

expect(task.points_toward_velocity).to eq(0)

end

it 'does not count incomplete tasks toward velocity' do

expect(task).not_to be_part_of_velocity

expect(task.points_toward_velocity).to eq(0)

end

I don’t want you to worry about production code right now. Just soak in the tests above. Ruby is pretty readable even if you don’t know Ruby. So, again, just soak it in.

What you should immediately notice is that I’m specifying the dates relative to whenever the tests are executed. This is important because as many of us know, hard-coding time and date values into tests is usually a recipe for disaster somewhere along the way. Notice that having a nice DSL that allows the information to be readable (“6.months.ago”) is a huge boon for making tests understandable.

But beyond the particular trap of absolute dates that we just avoided, notice something else: our specificity above seems to imply we have understanding. Do we? I’m using 6.months.ago for the out-of-velocity task and 1.day.ago for the in-velocity task. But this doesn’t tell us the boundary, does it?

Just looking at the test code, without access to the production code, what is the boundary for “out-of-velocity?” I might assume anything that was six months ago is out of velocity. But if you’ve ever worked in any sort of “agile” environment before, that probably gets your Spidey-Sense to tingling. Six months seems like a long time.

So … what is the boundary between in-velocity and out-of-velocity tasks? Right now, we don’t know. And the tests aren’t helping us. They are not reflective of the design we are trying to achieve. Or, at least, they have not reflected a fairly crucial bit of that design.

TDD … But Not Always

You might argue that if we are doing this in a TDD fashion, our tests would drive the code. But sometimes that’s simply not the case. You are writing tests after the fact. Sometimes the code is your specification, whether you want it to be or not. So let’s dig in to that code. Ah ha! Doing so we see this in the Task class:

  def part_of_velocity?
    return false unless complete?
    completed_at > 3.weeks.ago
  end

  def points_toward_velocity
    if part_of_velocity? then size else 0 end
  end

def part_of_velocity?

return false unless complete?

completed_at > 3.weeks.ago

end

def points_toward_velocity

if part_of_velocity? then size else 0 end

end

So if our eyes are not deceiving us, it seems that if the task was completed less than three weeks ago, it is considered part of velocity. Note how that wording — “less than three weeks ago” — can be confusing given the code, which is talking about “greater than 3 weeks ago.” The notion in code is whether the date value is greater than a value of 3 weeks ago, but the human notion is less than three weeks ago.

This is another trap we can fall into: mixing up our domains by the language constructs.

But, anyway, now we have our boundary: three weeks. So what if I did this to make the tests more reflective of the design:

    it 'does not count a task right at the boundary toward velocity' do
      task.mark_completed(3.weeks.ago)
      expect(task).not_to be_part_of_velocity
      expect(task.points_toward_velocity).to eq(0)
    end

it 'does not count a task right at the boundary toward velocity' do

task.mark_completed(3.weeks.ago)

expect(task).not_to be_part_of_velocity

expect(task.points_toward_velocity).to eq(0)

end

That could work but even that could be confusing. Our human notion of reading that is likely that it was exactly three weeks ago, therefore it should pass. It could be easier to consider this in terms of days. What if I express that first statement like this:

task.mark_completed(21.days.ago) # <-- fails

task.mark_completed(20.days.ago) # <-- passes

task.mark_completed(21.days.ago) # <-- fails

task.mark_completed(20.days.ago) # <-- passes

Yet consider that what this might be suggesting is a problem in our code. We should be checking for whether the date is exactly three weeks ago as well as “greater than”. That may seem obvious in retrospect but there are many traps like this you can fall into, even when testing is guiding your design.

Why Developers Need Testers

The Rails 4 book brings up a very interesting point about the original dates chosen (6.months.ago and 1.day.ago):

There’s a different interesting question about the test design and the dates. Neither six months nor one day is particularly close to the boundary between in-velocity and out-of-velocity tasks. Shouldn’t I test more days or test something closer to the boundary? This question reflects the difference between testing as a design aid and testing for verification. In strict TDD you would avoid writing a test that you expect to pass, because a passing test doesn’t normally drive you to change the code.

The bolded emphasis is mine and I do that because that is such a crucial point! Massively crucial! It is one of the key indicators of why developers “miss tests” that testers have to catch. Developers are not writing the code, or even the tests in TDD style, to catch every problem. They are writing tests to put pressure on design at the code level. In some cases, that will mean a more comprehensive set of tests. But in other cases, it will mean enough to drive the code but no more.

It is this kind of example that I rarely see presented out there as to why testers and developers are both necessary. Developers are often using test code only to suggest the next bit of code they should write. They are not necessarily attempting to be comprehensive. This is why coming up with “all” unit tests during TDD is a fallacy. You will backfill unit tests as you go along and do testing at other levels.

What I’m talking about here is one of the key dynamics of the tester-developer relationship. It’s fundamental to understanding why the conflation of the tester and developer (in so-called SDET roles) has often been less than stellar, at least in my experience.

Test Data Provides the Context

In my previous post, right under the section of “Use Tests to Pressure Design”, I showed a bit of code that was being driven by the tests. I’ll repeat that bit here:

  describe 'providing estimates' do
    let(:project)        { Project.new }
    let(:done)           { Task.new(size: 2, complete: true) }
    let(:small_not_done) { Task.new(size: 1) }
    let(:large_not_done) { Task.new(size: 4) }

    before(:example) do
      project.tasks = [done, small_not_done, large_not_done]
    end

    it 'accurately calculates the total size' do
      expect(project.total_size).to eq(7)
    end

    it 'accurately calculates the remaining size' do
      expect(project.remaining_size).to eq(5)
    end
  end

describe 'providing estimates' do

let(:project) { Project.new }

let(:done) { Task.new(size: 2, complete: true) }

let(:small_not_done) { Task.new(size: 1) }

let(:large_not_done) { Task.new(size: 4) }

before(:example) do

project.tasks = [done, small_not_done, large_not_done]

end

it 'accurately calculates the total size' do

expect(project.total_size).to eq(7)

end

it 'accurately calculates the remaining size' do

expect(project.remaining_size).to eq(5)

end

None of that reflects the date and time considerations at all. When we do reflect the dates, we can actually drive the design by tests that talk about velocity, project rate, projected time remaining, and — perhaps most importantly — an indicator of whether the project is on schedule or not. What would that look like? How about this:

  describe 'providing estimates' do
    let(:project)          { Project.new }
    let(:done_in_time)     { Task.new(size: 3, completed_at: 1.day.ago) }
    let(:not_done_in_time) { Task.new(size: 2, completed_at: 6.months.ago) }
    let(:small_not_done)   { Task.new(size: 1) }
    let(:large_not_done)   { Task.new(size: 4) }

    before(:example) do
      project.tasks = [
        done_in_time, not_done_in_time,
        small_not_done, large_not_done
      ]
    end

    it 'accurately calculates the total size' do
      expect(project.total_size).to eq(10)
    end

    it 'accurately calculates the remaining size' do
      expect(project.remaining_size).to eq(5)
    end

    it 'accurately calculates its velocity' do
      expect(project.completed_velocity).to eq(3)
    end

    it 'accurately calculates its rate of task completion' do
      expect(project.current_rate).to eq(1.0 / 7)
    end

    it 'accurately calculates its projected time remaining' do
      expect(project.projected_days_remaining).to eq(35)
    end

    it 'accurately determines if it is on schedule' do
      project.due_date = 1.week.from_now
      expect(project).not_to be_on_schedule

      project.due_date = 6.months.from_now
      expect(project).to be_on_schedule
    end
  end

describe 'providing estimates' do

let(:project) { Project.new }

let(:done_in_time) { Task.new(size: 3, completed_at: 1.day.ago) }

let(:not_done_in_time) { Task.new(size: 2, completed_at: 6.months.ago) }

let(:small_not_done) { Task.new(size: 1) }

let(:large_not_done) { Task.new(size: 4) }

before(:example) do

project.tasks = [

done_in_time, not_done_in_time,

small_not_done, large_not_done

]

end

it 'accurately calculates the total size' do

expect(project.total_size).to eq(10)

end

it 'accurately calculates the remaining size' do

expect(project.remaining_size).to eq(5)

end

it 'accurately calculates its velocity' do

expect(project.completed_velocity).to eq(3)

end

it 'accurately calculates its rate of task completion' do

expect(project.current_rate).to eq(1.0 / 7)

end

it 'accurately calculates its projected time remaining' do

expect(project.projected_days_remaining).to eq(35)

end

it 'accurately determines if it is on schedule' do

project.due_date = 1.week.from_now

expect(project).not_to be_on_schedule

project.due_date = 6.months.from_now

expect(project).to be_on_schedule

end

That, to my way of thinking, is some incredibly concise test code that is still reflective of the business domain, right? With the exception of Python (maybe) and Groovy (definitely), Ruby makes this more concise than most other languages can hope for.

And yet …

Test Data Can Lead To Magic Numbers

The above code does perhaps introduce some interesting numbers. One of the expectations uses an equation (1.0 / 7) and another uses a hard-coded value (35). Do you know what those numbers represent?

This is another trap many tests fall into: providing numbers based on some underlying implementation. Okay, so what if I show you the code from the Project class?

  def completed_velocity
    tasks.sum(&:points_toward_velocity)
  end

  def current_rate
    completed_velocity * 1.0 / 21
  end

def completed_velocity

tasks.sum(&:points_toward_velocity)

end

def current_rate

completed_velocity * 1.0 / 21

end

Oh, by the way, notice how the code in the project is depending on the tasks attribute? We have some coupled code there.

This is a bit of a digression and this isn’t something I’m going to tackle here but it’s really important because even in this extremely simple example, you can see how coupling begins to happen with code. And when there is coupling, there is more complexity. And when there is more complexity, it is difficult to test things in isolation. And when it’s difficult to test things in isolation, you end up needing heavier integrated and system tests to prove everything out. This is another key area of where testing gets challenging.

Anyway, the point of showing you this code is that it’s showing you why this test line works:

expect(project.current_rate).to eq(1.0 / 7)

1	expect(project.current_rate).to eq(1.0 / 7)

The current rate depends on the completed velocity. That in turn relies on a summation of points_toward_velocity, which is what I showed you earlier in this post. Here it is again:

  def part_of_velocity?
    return false unless complete?
    completed_at > 3.weeks.ago
  end

  def points_toward_velocity
    if part_of_velocity? then size else 0 end
  end

def part_of_velocity?

return false unless complete?

completed_at > 3.weeks.ago

end

def points_toward_velocity

if part_of_velocity? then size else 0 end

end

Following all that? So if we go with all this production code and we look again at our test data:

    let(:done_in_time)     { Task.new(size: 3, completed_at: 1.day.ago) }
    let(:not_done_in_time) { Task.new(size: 2, completed_at: 6.months.ago) }
    let(:small_not_done)   { Task.new(size: 1) }
    let(:large_not_done)   { Task.new(size: 4) }

let(:done_in_time) { Task.new(size: 3, completed_at: 1.day.ago) }

let(:not_done_in_time) { Task.new(size: 2, completed_at: 6.months.ago) }

let(:small_not_done) { Task.new(size: 1) }

let(:large_not_done) { Task.new(size: 4) }

… what does all this tell us? It tells us that we have three tasks, with a total size of 7, that are not part of the velocity and we have one task, with a size of 3, that is part of the velocity. So the current_rate method will be calculating the following:

3 * 1.0 / 21 = 0.1428

And that does equal — as per the test expectation — the value of 1.0 / 7.

So now it’s all clear, right? Err … not so much. But at least we know it makes sense, sort of. We know it now because we did the archaeology necessary. Will we remember it later? Will someone else new to the project know it without having to do that archaeology?

What about that hard-coded value of 35? According to the test expectation, that’s relying on some calculation of projected_days_remaining. Here’s the code for the method along with the methods it relies on:

  def remaining_size
    incomplete_tasks.sum(&:size)
  end

  def current_rate
    completed_velocity * 1.0 / 21
  end

  def projected_days_remaining
    remaining_size / current_rate
  end

def remaining_size

incomplete_tasks.sum(&:size)

end

def current_rate

completed_velocity * 1.0 / 21

end

def projected_days_remaining

remaining_size / current_rate

end

Well, we know that the current rate is 0.1428. We just calculated that. So the remaining size is the summation of the tasks not yet complete, which are sized at 5. So 5 / 0.1428 = 35. Again, though, it’s not very clear, right?

This is another common trap: when you have to do a bunch of software archaeology to figure out not only how things work but why the tests work in the way that they do and why certain values were chosen as they were. And notice, by the way, that this archaeology can be even more confusing when the “magic” numbers are in the source code as well. For example, our current_rate method in the code can seem a little odd. So can the on_schedule? method.

Let’s dig a tiny bit into that.

Code Reflects Design

Consider this:

  def on_schedule?
    (Date.today + projected_days_remaining) <= due_date
  end

def on_schedule?

(Date.today + projected_days_remaining) <= due_date

end

The notion of Date.today + projected_days_remaining really means the projected end date of the project. Here’s a perfect chance to make the code a bit more reflective of the domain concept:

  def projected_end_date
    Date.today + projected_days_remaining
  end

  def on_schedule?
    projected_end_date <= due_date
  end

def projected_end_date

Date.today + projected_days_remaining

end

def on_schedule?

projected_end_date <= due_date

end

When you start to treat code as a specification, this kind of refactoring is important. Notice that this change would require no changes at all to the tests I have been talking about. In fact, that’s the whole point of a refactoring: it does not change the outward behavior.

Let’s consider another little insidious problem that crops up. Consider the current_rate method from the Project class:

  def current_rate
    completed_velocity * 1.0 / 21
  end

def current_rate

completed_velocity * 1.0 / 21

end

And also consider the part_of_velocity? method from the Task class:

  def part_of_velocity?
    return false unless complete?
    completed_at > 3.weeks.ago
  end

def part_of_velocity?

return false unless complete?

completed_at > 3.weeks.ago

end

Here the “21” and the “3.weeks.ago” are actually referring to the same type of value: the boundary of in-velocity tasks. Here’s a bug just waiting to happen. If a change is made — moving to a two week velocity measurement, for example — then both areas need to be changed. The change to the Task class stands out (“3.weeks.ago” to “2.weeks.ago”) but does that same visibility apply to the “21” in the current_rate method? I think not.

What Are You Modeling

Yet where do we put something that both Project and Task can recognize and call on? I want to extract the “21” and the “3.weeks.ago” to one common place. But where? The Rails 4 book says:

It’s not clear what to do with this information. To me the velocity length feels most like a static constant value owned by the Project class, since velocity applied to a single task makes no sense.

That makes sense but the Task class will then have to directly reference this constant from the Project class. We’re back to that coupling again, that I mentioned earlier. See how once it starts, it’s easy to keep promoting it? This is a key aspect of what makes testing challenging, particularly when a change “here” impacts something “there” — even though everyone swears that should not happen.

Regardless of what ultimately gets decided, the point is that these are challenges developers have to face all the time. It’s important for testers to understand that. And when these problems rear their heads, developers are often left wondering if the abstraction level is correct. Again, going with the book:

The way that particular value is needed by both the Project and Task classes makes me wonder if we really just need a VelocityCalculator class.

But what is that change going to do to everything, including tests?!?

This is why it is hard to correct for a lot of domain modeling mistakes that are made early on. This is why testing needs to be putting pressure on the overall design as early as possible. This is why concepts like domain-driven design should probably be given more credence than they usually are.

The Industry Challenge

I’m very critical of what I write. I know it’s never quite good enough. With that said, I’m actually somewhat proud of my last three posts. I think they showcase exactly where my mindset is now in the testing field. But more importantly I think they showcase some of the challenges testers have in terms of using tests to interface with the business and with developers.

As I’ve said in other modern testing posts, I do believe more development-thinking has to be brought into test-thinking, rather than the reverse as is often suggested. I also think that more development-thinking has to be brought into business-thinking. This is a challenging and fun arena for testers to work in, but it does require the industry as a whole to start realizing these challenges.

To be fair, this is a tricky balance. I believe we have to be wary of the technocrat tester, yet still look for technical testers with broad skills all the while making sure we have a healthy focus on test-specialists, but technology-generalists..

There is a lot to consider in all of that. I do believe our testing industry has to reinvent itself a bit, around its craft. I want to help with that but, before I could, I felt I had to reinvent myself in the context of my career. I’m still wrestling with a ton of questions and I still struggle with my writing style. This series of “modern testing” posts have been guiding me down some path. Whether that’s a fruitful path remains to be seen. If nothing else, I hope some insight into my own journey is useful to others in the industry who are on their own journeys.

Stories from a Software Tester

Twice upon a time, in another space, no distance in any direction from here …

Modeling the Domain Concept

TDD … But Not Always

Why Developers Need Testers

Test Data Provides the Context

Test Data Can Lead To Magic Numbers

Code Reflects Design

What Are You Modeling

The Industry Challenge

Leave a Reply Cancel reply