An Ode to Testability, Part 6

We’re continuing to build out our Benchmarker application, putting pressure on design as we go and keeping testability front-and-center as the quality attribute we want to provide, enhance, and maintain. Keep in mind we’re still slogging toward value, assuming that, for the most part, we’re handling correctness as we go along.

In the last post we dealt with the Activity a lot. Let’s now focus on our DigSite.

We have to determine the pace of the archaeological team, which means determining the rate of their work in terms of finishing activities. That should ultimately tell us how many days it will take, assuming that pace, to finish the dig. That will allow our users to ask the question: “Are we on schedule?”

So for the dig site (dig_site_spec.rb) tests, I’ll create a new section of tests with just the setup first. This setup is similar to what we did for the “costs” tests for our DigSite.

Again, as you can see, this is very similar to what we did before when we set up the context for costs.
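As a rough sketch of what that context could look like, here is the data in plain Ruby rather than as an RSpec setup. The costs are the ones discussed later in this post. The `Activity` struct, the constant names, and the exact "1 day ago" and "180 days ago" timestamps are illustrative assumptions, not the series' actual code.

```ruby
# Hypothetical stand-in for the series' Activity class: only the two
# attributes that the pace tests care about.
Activity = Struct.new(:cost, :finished_at, keyword_init: true)

SECONDS_PER_DAY = 86_400
NOW = Time.now

# Two finished activities (one recent, one long ago) and two unfinished ones.
ACTIVITIES = [
  Activity.new(cost: 2, finished_at: NOW - 1 * SECONDS_PER_DAY),   # finished recently
  Activity.new(cost: 1, finished_at: NOW - 180 * SECONDS_PER_DAY), # finished long ago
  Activity.new(cost: 2),                                           # unfinished
  Activity.new(cost: 4)                                            # unfinished
].freeze
```

Only one of these activities was finished inside the last two weeks, which is the detail that matters for pace.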

Evolving the Code: Calculating Pace

What’s the key bit of evolution we need for our code right now? Well, arguably, that would be that the dig site — based on the context given here — should be able to calculate its pace.

Hmm. Notice how that sounds. The dig site is calculating its pace. It might make you wonder if we’re stretching our abstraction a bit too far. These kinds of warning bells often go off and just as often are ignored. Yet, the very fact that there is a warning bell doesn’t mean you are doing something wrong necessarily. But developers and testers do have to be attuned to these warning signals.

So let’s get a test for this ability that we want to provide, which will count as a successful interaction with the feature. Before I show you it, take a minute and look at that context I provided above. What would the project pace be given that data?

Keep in mind that you are going to write a test with an expectation of an answer to that question. So actually having that answer is a good thing. This would be the case even if we weren’t writing code-based tests as we are here. Remember: we’re asking about successful interactions with the feature and our expectations are about the outcomes that we are looking for given that interaction.

We can add this test:

Does that expected value make sense to you, given the context data?

That test requires the addition of the following logic to our dig site (dig_site.rb):

That would pass just fine.
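A minimal sketch of what that logic might look like follows. It assumes the dig site holds a list of activities that each respond to `part_of_pace?` and `cost`. The `StubActivity` struct is a hypothetical stand-in that simply reports whether it counts toward pace, so the sketch can stay focused on the dig site itself.

```ruby
# Hypothetical stand-in for the real Activity: it answers part_of_pace?
# directly instead of working it out from a finish date.
StubActivity = Struct.new(:cost, :counts_toward_pace) do
  def part_of_pace?
    counts_toward_pace
  end
end

class DigSite
  attr_accessor :activities

  def initialize
    @activities = []
  end

  # Total cost of the activities that count toward the team's pace.
  def finished_pace
    activities.select(&:part_of_pace?).sum(&:cost)
  end
end

dig_site = DigSite.new
dig_site.activities = [
  StubActivity.new(2, true),  # finished inside the pace window
  StubActivity.new(1, false), # finished long ago
  StubActivity.new(2, false), # unfinished
  StubActivity.new(4, false)  # unfinished
]
puts dig_site.finished_pace # prints 2
```

Using a stand-in like this is itself a small testability choice: the dig site's arithmetic can be checked without depending on dates at all.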

Evolving the Code: Calculating Rate

Okay, what’s next? Well, the pace is fine and all but in order to do a projection of when our dig might end, we need to make sure we know the rate that the dig team is working at. But … wait. Isn’t the “pace” the rate at which they are working? Interesting, right? I just kind of kept talking about team pace and the rate of the team. Operationally we sort of said that the rate is the number of activities that are being finished. But again … wait. Isn’t that what I just looked at with my test for calculating the pace?

I’m deliberately adding some possible confusion to my description here to couple that with the very empirical fact that we have tests and you’ve seen the code being added for each test. And, further, our tests are passing based on a set of data. So it’s possible to have all this confidence-inspiring stuff in your code — test and production — and yet still run into a little cognitive friction.

Is that friction not clear yet? Okay, for now, let’s add a test for the “rate” as such:

Notice the wording on that test. The “current rate of finishing activities.” And yet notice the code in our two tests so far. The first refers to “finished_pace” and the second test, about rate, refers to “current_pace”. So — what’s the difference? Aren’t both tests about pace? Or is one about pace and one about rate? Or are they both really about rate? After all, isn’t the “rate of finished activities” the “pace of the team”?

It might be unclear from the two tests we just added why these things are different. And that can easily lead to situations where people are talking about the same thing using different terms or talking about different things but using the same term.

This is really important because what I showed you here is something that happens all the time and it’s a key driver of complexity in our projects. I purposely showed you how this happens in “slow motion,” as it were, but you can imagine how amplified this can be in larger code bases, with more complex relationships, and under time pressures.

As far as our current work, I’ll ask you to remember the conversation we had with business in the first post, where it was said “we want to use the rate of activities being finished — the team’s pace”. So pace is being used as a term that means the rate of something. In this case, the rate of activities that are being finished as part of the dig.

Here’s how we have to update our dig site:

I assure you that these tests would pass. But … and this is a key question for you testers! … can you determine why?

Do you feel confident with these tests and with the production code created to make them pass?

If so, do you understand the calculations and why the numbers were chosen?

Please take some time and figure it out. This is making a crucial point.

Understand Your Domain Calculations

In our last test, let’s consider what happens when the current_pace method is called. In that case, the “finished_pace” value would be 2. (Why? Look at our test data.) That calculation is thus “2 * 1.0 / 14”. Why 14? You’ve probably guessed that. What did we say in our part_of_pace? method in our Activity class that we developed up to this point? That method looked like this:

We had “14.days.ago”. So here the 14 is referring to those 14 days of our duration window for activities.
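A sketch of such a method, under a few assumptions: the `finished?` predicate and the constant names are illustrative, and `14.days.ago` (an ActiveSupport helper) is approximated here with plain `Time` arithmetic so the example runs without Rails.

```ruby
SECONDS_PER_DAY = 86_400
PACE_WINDOW_DAYS = 14

class Activity
  attr_reader :cost, :finished_at

  def initialize(cost:, finished_at: nil)
    @cost = cost
    @finished_at = finished_at
  end

  def finished?
    !finished_at.nil?
  end

  # An activity counts toward pace only if it was finished
  # within the last 14 days.
  def part_of_pace?
    return false unless finished?

    # Plain-Ruby approximation of ActiveSupport's 14.days.ago
    finished_at > Time.now - (PACE_WINDOW_DAYS * SECONDS_PER_DAY)
  end
end

# Hypothetical helper for building timestamps in the examples below.
def days_ago(days)
  Time.now - (days * SECONDS_PER_DAY)
end

puts Activity.new(cost: 2, finished_at: days_ago(1)).part_of_pace?   # true
puts Activity.new(cost: 1, finished_at: days_ago(180)).part_of_pace? # false
puts Activity.new(cost: 4).part_of_pace?                             # false
```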

Fair enough. But then why does my test have a calculation of “(1.0 / 7)”?

Look at the test data — the context provided to the test — and see if you can figure out why this is. And please do try this out. It will be important for you to see how certain mistakes happen, not just that they do.

Do you think you have it? Okay, now let’s try something different. I purposely used different cost values for my test setup in the “pace” tests than I did in the “costs” tests. So let’s get our data closer to that which we used in the “costs”, just for the sake of consistency. So in the dig_site_spec, under the “pace” block, the context will change to this:

Note the differences in costs, which are the only changes. If you were to put that in place you would find that the tests would still pass. Does that make sense to you based on how you think the calculation is working?

Think about it. Once you have, change the data slightly again, like this:

Here both of our tests under “pace” would fail. One failure is pretty obvious. The first pace test just needs to be changed as such:

But does it make sense to you why this test is failing?

What number has to go in place of 7 to make this work?

And that is my point. I just took you through a test where it passed with a certain context. Then we changed the context and it still passed. But then we changed one additional part of the context and it failed. That should worry you a bit, at least if you felt you understood why the number 7 was chosen.

So why was the value 7 chosen?

Consider that, in our original test context, we had tasks that were unfinished (cost 2 and 4) and a task finished outside the time window of two weeks (cost 1). Those add up to 7. So this kind of makes sense, right?

But in the modified context, we have tasks that are unfinished (cost 2 and 25) and a task finished outside the time window of two weeks (cost 10). Those add up to 37. So clearly the addition, which is what many people leap to when deciding why 7 was used, is not the explanation. It was pure coincidence that the original test data added up to the same number the calculation produced. That second test, in fact, needs to become this:

So what you see here is that in the test with our first context, it was actually 14 days divided by the total cost of finished activities in the time window: 14 / 2 = 7.

In the case of our modified test context, the finished activity cost in the time window is 5, so the same division gives 14 / 5 = 2.8.

What you’re seeing here is that I made a calculation that I had to figure out in order to make my test pass. It is an accurate calculation but it’s opaque at best if you weren’t involved in understanding why it was made. And, by pure happenstance, the test context data was originally just such that you could be led to an entirely wrong conclusion about how things were working.
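The arithmetic behind those two numbers can be laid out in a few lines of Ruby. This is only an illustration of the calculation described above, not the post's production code: `days_per_cost_unit` and the constant name are made up for this sketch.

```ruby
PACE_WINDOW_DAYS = 14

# Cost finished per day over the pace window. This is the
# "finished_pace * 1.0 / 14" calculation discussed above.
def current_pace(finished_cost)
  finished_cost * 1.0 / PACE_WINDOW_DAYS
end

# The reciprocal view: how many days it takes to finish one unit of cost.
# This is where the 7 and the 2.8 in the expectations come from.
def days_per_cost_unit(finished_cost)
  PACE_WINDOW_DAYS / finished_cost.to_f
end

puts days_per_cost_unit(2) # prints 7.0 (original context: finished cost 2)
puts days_per_cost_unit(5) # prints 2.8 (modified context: finished cost 5)
```

Seen this way, the 7 has nothing to do with summing the other activities' costs. It depends only on the window length and the cost finished inside the window.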

Reasoning About Code Intent

So one point here is another danger that crops up. Whether you did all this work at the start (test-first), concurrently (test-next), or are doing a bit of code archaeology and have to figure these things out from a distance, you can run into situations where you’re trying to reason about the code from the tests (or the tests from the code!) and stumble across elements that seem to be necessary and yet don’t make sense given what you know, or at least what you can figure out.

I just took you through a bit of a long slog, though, so please try to internalize a bit of what I’m saying here and, if necessary, go through that example again to see what I’m talking about.

It’s quite possible for us to introduce bits of opacity into our code. Further, ask yourself this: is it actually clear to you how this is working (we know it’s working correctly) such that you believe it will add value? If things are a bit murky for you now, at this low level, imagine how it will be as we start adding abstractions on top of this, such as APIs or a web interface. Pretty soon those little details are going to be buried pretty deep.

What I just showed you here is very contrived, admittedly, but it does show you how technical debt starts to creep in and how maintainability can take a bit of a hit. My goal was to make sure you saw that in action.

This also shows, albeit a bit epiphenomenally, that no matter how much automation you have at a higher level of abstraction, the level of abstraction we’re working at right now is exactly the place you want to find out these kinds of problems.

Evolving the Code: Remaining Days

Let’s (finally!) get to that key aspect of the value we started out with: projecting how many days of work likely remain on the dig site, given the pace of finished activities.

Let’s add a test:

And there seems to be yet another number pulled out of thin air, right? Take a moment and figure out how 75.6 might be arrived at.

The logic to add to the DigSite is:

So, 75.6 days is nearly 11 weeks, or roughly two and a half months.
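To see where 75.6 comes from, here is a self-contained sketch that puts the pieces together using the modified context (finished cost 5 inside the window, unfinished costs 2 and 25). The method names `remaining_cost` and `projected_days_remaining` are assumptions made for this sketch, as are the timestamps. The idea is simply remaining cost divided by current pace.

```ruby
SECONDS_PER_DAY = 86_400
PACE_WINDOW_DAYS = 14

Activity = Struct.new(:cost, :finished_at) do
  def finished?
    !finished_at.nil?
  end

  def part_of_pace?
    finished? && finished_at > Time.now - (PACE_WINDOW_DAYS * SECONDS_PER_DAY)
  end
end

class DigSite
  def initialize(activities)
    @activities = activities
  end

  # Total cost of everything not yet finished.
  def remaining_cost
    @activities.reject(&:finished?).sum(&:cost)
  end

  # Total cost finished inside the pace window.
  def finished_pace
    @activities.select(&:part_of_pace?).sum(&:cost)
  end

  # Cost finished per day over the window.
  def current_pace
    finished_pace * 1.0 / PACE_WINDOW_DAYS
  end

  # Hypothetical name: days of work left at the current pace.
  def projected_days_remaining
    remaining_cost / current_pace
  end
end

MODIFIED_CONTEXT = [
  Activity.new(5, Time.now - 1 * SECONDS_PER_DAY),    # finished recently
  Activity.new(10, Time.now - 180 * SECONDS_PER_DAY), # finished long ago
  Activity.new(2),                                    # unfinished
  Activity.new(25)                                    # unfinished
].freeze

dig_site = DigSite.new(MODIFIED_CONTEXT)
puts dig_site.remaining_cost           # prints 27
puts dig_site.projected_days_remaining # approximately 75.6 (27 * 14 / 5)
```

Because the result is a floating-point number, a tolerance-based comparison is generally safer than strict equality when checking it.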

Reasoning About Calculations and Values

Notice here a difference in how we express our tests, if you just compare the expectations:

One is a calculation (and one that we determined might be a little opaque) and the other is a direct number that we figured out from a calculation. So we aren’t consistent in how we express the elements of our tests that rely on calculation.

There’s another danger here: when we use a calculation in our test (as we do for current_pace), we are tying the intent of the test more closely to the implementation of the logic than we otherwise would. This can be a very subtle point and it depends very much on the nature of the calculations you are considering. But, nevertheless, this is something to be watchful of.

This is an important point! This is how our tests become tied to a specific implementation rather than an intent and thus this is what locks our tests into evolving with the implementation of the business rules rather than the business rules themselves.

And lest you think this is a code-only problem, as a tester you can consider how you would relate this information in so-called “higher-level” tests. If you were writing BDD style scenarios, for example, in a feature file, what would be the level of expression that you would be going for here? I won’t take you through all that but it is instructive to think about it.

Where to Next?

Okay, I think there’s probably one post left in me for this series. So join me in the next (and final!) post as we at last provide the final bit of value that we talked about way back in the first post, which was determining whether the dig site is on schedule or not.


This article was written by Jeff Nyman

