An Ode to Testability, Part 5

In the previous post we ended up creating tests with a context, and that context let us bridge the gap between correctness and value while keeping the focus on testability. We saw some warning signs along the way but, overall, made progress. Here we’ll continue that progress and also start to see that, while testability is something to strive for, achieving it guarantees very little by itself.

We’ve come pretty far with our Benchmarker software: we have the notion of a dig site and activities on those dig sites modeled as domain objects. We can distinguish between activities that are finished and unfinished, which in turn lets us figure out if a dig is finished or not. We can set costs for an activity. But we’re still a bit away from delivering value, right?

Going all the way back to the first post, one of the core values we wanted to provide was to be able to calculate a projected date for the dig to be finished.

The requirement is to calculate the dig’s end date based on the number of activities finished in some duration of time. Ah, so now here’s another thing we have to figure out. What’s the “duration of time”?

Well, first of all, let’s consider that what we are looking at here is the rate at which activities are finished, which is the pace of the dig team. So, for now, let’s use a yardstick of two weeks. The dig team will consider batches of activities over two-week intervals, giving them a nice way to roughly approximate work based on a month. A month is chosen because it lets the organizations that fund digs more easily determine time frames, since months are what they might use for budgeting purposes.

Boundaries Provide an Impetus

In much of our testing, we evolved our code when we found ourselves at boundaries that we had to deal with. Those boundaries normally corresponded to some aspect of state. And we’re still finding that. What our pressure on design is now telling us is that we need to distinguish activities that concluded in the last two weeks from those that did not.

It’s pretty clear here that our focus is thus on having our Activity instances be aware of whether they have been completed in a two-week window. Well, hold on: is that pretty clear? I mean, we could imagine that this is something stored on the DigSite, couldn’t we? Or is that ridiculous? Much technical debt accrues from answering exactly these kinds of questions wrong.

And that’s relevant to testers and developers because what we’re defining here as we move along is the interface. We are talking about the messages that entities in our domain send to each other. These become a contract by which our code aligns with our requirements. So it’s important to consider what is talking to what; what kind of messages (data) they can send back and forth; and even more importantly what kinds of messages (data) they reject. Those interfaces are all boundaries.

And notice how this is the case regardless of whether you are in a distributed or microservice-based context. I bring that up only because in many discussions of modern testing, people often draw faulty lines around test techniques as being only applicable in, say, a microservices context.

But, for now, let’s say Activity makes the most sense for this.

Evolving the Code: Pace

Our pressure on design has led us to evolving the code such that we can check what kinds of activities count towards pace. Here I’ll show tests that we might start adding for activities. I’ll do this in pieces and then show the full code, similar to what I did in the previous post.

We’ll create a new block of tests (this will be in activity_spec.rb):
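As a minimal sketch of what the first check in such a block might look like: it is written in plain Ruby rather than with a test runner so that it runs on its own, and the `Activity` stand-in with its `cost:` argument is hypothetical, not the class from the earlier posts. Because the check is written before any production code, running it at this point only reports the missing method.

```ruby
# Hypothetical stand-in for the Activity built up in earlier posts.
class Activity
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
  end
end

# The check: an activity that was never finished adds nothing to pace.
def unfinished_activity_adds_nothing?(activity)
  activity.counts_towards_pace == 0
end

# Run the check and report what happened instead of crashing.
def run_check(activity)
  unfinished_activity_adds_nothing?(activity) ? :passed : :failed
rescue NoMethodError
  :not_implemented
end

# Written test-first, so Activity does not yet respond to the message.
puts run_check(Activity.new(cost: 3))
```

In a runner with natural-language descriptions, the same check would sit inside a described example rather than a bare method.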

And this will have the benefit of letting me show you directly how a passing test may not mean anything. Meaning, it may not show correctness and it may not provide value. Let’s modify our Activity class (activity.rb) based on this test accordingly:
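One way the change to Activity could look, offered as a sketch under assumptions: the `cost:` constructor argument is hypothetical, and only the two method names come from the text. A Ruby method with an empty body returns `nil`, which is falsy, so the ternary always takes the `0` branch.

```ruby
# Hypothetical stand-in for the Activity class at this stage.
class Activity
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
  end

  # Deliberately empty: a Ruby method with no body returns nil.
  def part_of_pace?
  end

  # nil is falsy, so for now this always yields 0.
  def counts_towards_pace
    part_of_pace? ? cost : 0
  end
end

activity = Activity.new(cost: 3)
p activity.part_of_pace?       # nil
p activity.counts_towards_pace # 0
```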

That will pass just fine. Even though one of the methods has nothing in it! Just take note of that. I don’t want to go down too many rabbit holes here. But here I can also show you how the developer might have been thinking ahead. What’s the minimum I needed to put in the counts_towards_pace method to get this test to pass? All I had to do was have it return a value of 0, like this:
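The bare-minimum version might be no more than the following sketch (the surrounding class is a hypothetical stand-in). It satisfies the check for an unfinished activity while saying nothing at all about finished ones.

```ruby
# Hypothetical stand-in showing only the method under discussion.
class Activity
  # Smallest body that satisfies "an unfinished activity adds nothing to pace".
  def counts_towards_pace
    0
  end
end

p Activity.new.counts_towards_pace # 0
```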

So what happened here? Well, as I said in a previous post, sometimes developers think ahead based on what their understanding of the value is but they also have an eye on what would evolve the code. In this case, the idea is that just putting “0” as a return value would not really tell us much, beyond prodding us to make the method return a non-zero value. But that wouldn’t help us evolve the part_of_pace? method at all.

So what’s the thinking here? Looking at the code, can you intuit how this might evolve?

Evolving the Code: Durations

What we’re clearly doing is returning either a value of the cost for a given activity or a 0. A value of 0 would seem to imply the cost was 0. But that’s actually not the case here because of the call to “part_of_pace?” and it’s obviously the intent that the call to that method is what determines whether we return ‘cost’ or ‘0’. So, again, how is this code likely going to evolve?

Well, keep in mind that an activity finished in the last two weeks counts toward the pace of the project. We just stated that above. So, from a testing standpoint and thus from a testability standpoint, this implies two cases: an unfinished activity (which we just dealt with) and an activity that was finished. But that data condition of “an activity that was finished” in turn breaks down into activities that were finished within the two-week time frame and those that were finished outside of it.

And that seems to provide us with the point of our part_of_pace? method and thus how to evolve it from its current state.

You might wonder why I just explained all that in such a roundabout way. The reason is that I need you to keep in mind that the code in Activity was written only after the code in the test. So the very expression of the test was guiding the design. And the code is reflective of how much we understood that we needed.

Whether this makes the case for TDD or argues very strongly against it will be very different depending on who you are. What I think it does argue against is there being a “pure” version of TDD.

Evolving The Code Based on the Test

Let’s add one more test:
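A sketch of what this next check might look like, again in plain Ruby. Expressions such as `1.day.ago` come from ActiveSupport; the `days_ago` helper below is a hypothetical standard-library substitute, and the `Activity` stand-in (including a `set_finished` that accepts a time) is an assumption about the code at this point. Run against that stand-in, the check fails, which is what drives the next change.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Plain-Ruby stand-in for ActiveSupport's `n.days.ago`.
def days_ago(n)
  Time.now - n * SECONDS_PER_DAY
end

# Hypothetical Activity as it stands before the next change.
class Activity
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
  end

  def set_finished(at = Time.now)
    @finished_at = at
  end

  # Still empty, so it still returns nil.
  def part_of_pace?
  end

  def counts_towards_pace
    part_of_pace? ? cost : 0
  end
end

# The new check: an activity finished yesterday contributes something to pace.
def recently_finished_counts?(activity)
  activity.set_finished(days_ago(1))
  activity.counts_towards_pace > 0
end

p recently_finished_counts?(Activity.new(cost: 3)) # false until part_of_pace? does real work
```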

This will require a few changes to Activity and I’ll ask you to bear with me here: there’s a reason for going through all this as I am. Let’s get these changes in place:
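One plausible shape for these changes, as a sketch only: the `@finished_at` instance variable and the time arithmetic are assumptions, and the original code may differ in its details. In this version an activity finished yesterday counts toward pace, while an activity that was never finished no longer returns 0 cleanly.

```ruby
# Hypothetical Activity after the changes, before the final fix.
class Activity
  SECONDS_PER_DAY = 24 * 60 * 60
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
    @finished_at = nil
  end

  def set_finished(at = Time.now)
    @finished_at = at
  end

  # Compares the finish time against a cutoff 14 days back.
  def part_of_pace?
    @finished_at > Time.now - 14 * SECONDS_PER_DAY
  end

  def counts_towards_pace
    part_of_pace? ? cost : 0
  end
end

recent = Activity.new(cost: 3)
recent.set_finished(Time.now - Activity::SECONDS_PER_DAY)
p recent.counts_towards_pace > 0 # true: the new check passes

begin
  Activity.new(cost: 3).counts_towards_pace
rescue NoMethodError => e
  puts "unfinished activity: #{e.class}" # the earlier check now errors
end
```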

What’s interesting here is that this code would now pass our newly added test but would cause the previous test to fail. You might want to take some time to think about why that is. Again, don’t see this as an exercise in coding so much as an exercise in diagnostics: an ability to investigate something at a particular abstraction and to understand how the conditions are producing a specific sort of behavior.

Testers! Feeling adrift? Feeling like this is really more about developers (programmers) than testers? Well, here’s a fact: testers, just as much as developers, need to move between abstraction layers. Testers, just as much as developers, need to hone their instincts for investigation and drawing conclusions based upon available evidence.

I won’t spend too much time on this but I will show you what the one line addition is to get all this working. This addition to the part_of_pace? method does the trick:
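A sketch of how such a one-line guard could look. The guard shown here, along with the `@finished_at` variable and the `cost:` argument, is an assumption for illustration; the point is only that an unfinished activity gets an early `false` before any date comparison happens.

```ruby
# Hypothetical Activity with a guard clause in part_of_pace?.
class Activity
  SECONDS_PER_DAY = 24 * 60 * 60
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
    @finished_at = nil
  end

  def set_finished(at = Time.now)
    @finished_at = at
  end

  def part_of_pace?
    return false if @finished_at.nil? # the added guard: unfinished means no pace
    @finished_at > Time.now - 14 * SECONDS_PER_DAY
  end

  def counts_towards_pace
    part_of_pace? ? cost : 0
  end
end

unfinished = Activity.new(cost: 3)
recent = Activity.new(cost: 3)
recent.set_finished(Time.now - Activity::SECONDS_PER_DAY)

p unfinished.counts_towards_pace # 0
p recent.counts_towards_pace     # 3
```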

Let’s add one more test for an activity finished outside of the two-week window:
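A sketch of that third check, under the same assumptions as before: the `Activity` stand-in is hypothetical, and 30 days is used as a plain-Ruby approximation of `1.month.ago`.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical Activity with the guard already in place.
class Activity
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
    @finished_at = nil
  end

  def set_finished(at = Time.now)
    @finished_at = at
  end

  def part_of_pace?
    return false if @finished_at.nil?
    @finished_at > Time.now - 14 * SECONDS_PER_DAY
  end

  def counts_towards_pace
    part_of_pace? ? cost : 0
  end
end

# Finished roughly a month ago, so well outside the two-week window.
old = Activity.new(cost: 3)
old.set_finished(Time.now - 30 * SECONDS_PER_DAY)
p old.counts_towards_pace # 0
```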

That would just pass as-is, with no changes to code.

Expressing Durations

You might notice that the descriptions in these last tests specified “recently finished activities” and “non-recent finished activities”. That does sort of raise the question of what “recently” and “non-recent” mean, doesn’t it? In those tests, those descriptive terms were made prescriptive by “1.day.ago” and “1.month.ago”, respectively. But outside of the business context we don’t know how recent is “recently” and what constitutes “non-recent.”

Or, rather, we do — but not from that code which is, remember, our test. What we do know is that from business we are talking about two week windows. And we might know, or intuit, that the “14.days.ago” in the part_of_pace? method refers to that two weeks, just measured in days.

But you can see here how our knowledge is getting a bit distributed. That can potentially be a problem. For example, in that method, a developer could have used “2.weeks.ago” rather than “14.days.ago”. Should they have? Perhaps. But they didn’t. Why not? A micro-decision was made but the rationale for the micro-decision cannot be recovered just via code archaeology nor via the tests of that code.

Going back to the tests, “1.day.ago” and “1.month.ago” are data conditions that were chosen simply to fall within the ranges we wanted: inside a two week window and outside of it. All of the thinking around boundary testing that a tester would normally do could potentially apply here. A positive aspect to note is that we have helped our testability mandate by making sure that our tests are not using hard-coded dates but, rather, are always reflective of when they are being run. This is providing an internal quality — reliability — to the tests.
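To make the boundary thinking concrete, here is a small sketch that probes the edges of the window. It assumes the rule is a strict "later than 14 days ago" comparison, which the text does not confirm, and it uses a hypothetical pure function with the clock passed in so that the exact edge can be tested without timing noise.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60
WINDOW_DAYS = 14

# Hypothetical pure version of the window rule; `now` is supplied by the caller.
def within_pace_window?(finished_at, now)
  finished_at > now - WINDOW_DAYS * SECONDS_PER_DAY
end

now = Time.now
edge = now - WINDOW_DAYS * SECONDS_PER_DAY

p within_pace_window?(edge + 1, now) # true: one second inside the window
p within_pace_window?(edge, now)     # false: exactly 14 days ago is excluded by ">"
p within_pace_window?(edge - 1, now) # false: one second outside the window
```

Whether an activity finished exactly 14 days ago should count is a business question; the sketch only shows that the code answers it one way or the other.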

Code as Specification

Test code is just code that is not itself tested and, just like production code, it can have issues of unnecessary complexity and technical debt. So here we are using our pressure on design not just at the production code level, but on the level of the test code as well.

And what this does is start exploring an idea of the code — both production and test — as being the primary specifications. I won’t dig into that too much here but I think if you reflect on where we’ve come from so far in these posts, that idea does have a lot of merit. The fact that we are using a test runner that allows encoding of tests with natural language is not a coincidence.

That being said, if these were our primary specifications, I just earlier pointed out some possible ambiguities not explained by the code, such as why certain data conditions or values were chosen.

Beware of Slippery Domain Concepts

Okay, let’s get back on track here.

Looking at our Activity code, it’s clear that our part_of_pace? method is designed to return true if the activity has been finished within the last 14 days.

The counts_towards_pace method works as such: if an activity counts towards the pace, it must have been completed in the last two weeks. In that case, we return the cost of the activity. That’s “how much” it counted toward our pace. If the activity did not count, then we return a cost of 0. It had no cost for us one way or the other within that two-week time window.

But notice how the cost is sort of slipping in there but without any test specifically calling that out. The tests don’t mention anything at all about cost; simply whether or not an activity was finished within a time frame.
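A sketch of how easily this can go unnoticed, assuming the checks only assert zero versus non-zero as described. The `CostBlindActivity` class is a deliberately wrong, hypothetical implementation that ignores cost entirely, yet it satisfies all three checks.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical, deliberately wrong implementation: cost is stored but never used.
class CostBlindActivity
  def initialize(cost: 1)
    @cost = cost
    @finished_at = nil
  end

  def set_finished(at = Time.now)
    @finished_at = at
  end

  def part_of_pace?
    return false if @finished_at.nil?
    @finished_at > Time.now - 14 * SECONDS_PER_DAY
  end

  # Returns 1 for anything in the window, whatever the cost was.
  def counts_towards_pace
    part_of_pace? ? 1 : 0
  end
end

# The three pace checks, parameterized over the class under test.
def pace_checks(klass)
  unfinished = klass.new(cost: 5)
  recent = klass.new(cost: 5)
  recent.set_finished(Time.now - 1 * SECONDS_PER_DAY)
  old = klass.new(cost: 5)
  old.set_finished(Time.now - 30 * SECONDS_PER_DAY)

  {
    unfinished_is_zero: unfinished.counts_towards_pace == 0,
    recent_is_positive: recent.counts_towards_pace > 0,
    old_is_zero: old.counts_towards_pace == 0
  }
end

p pace_checks(CostBlindActivity).values.all? # true, even though cost is ignored
```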

This is a very common situation where details of what’s actually going on get “hidden.” This is how opacity creeps in when we reason about a system. You literally just watched how this can happen and that was in a very simple situation. Imagine how much easier this is in a large code base. And this is the case even if you are being entirely test-driven which, as you’ve seen, I pretty much have been here.

In previous posts, I talked about how we were perhaps oversimplifying the idea of cost by treating each activity as having essentially equal cost. So we ended up adding a cost element to the activity. Yet notice: does the cost really matter after all? So far, with our design pressure, no. The cost gets returned as a value so that our counts_towards_pace logic is not 0. But that’s about it.

I’m going to leave it to you to determine whether you think correctness has been achieved and value has been achieved so far. Obviously you can’t speak to ultimate value yet, because that only comes when the feature is in the hands of users and they are receiving demonstrable value from it. But you could ask some questions about how the value is being represented currently. And this isn’t a leading question, by the way. There are multiple ways to look at this.

I want to make sure you have the full test set for the Activity, so here that is:
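As a consolidated sketch of the three pace checks discussed in this post, in plain Ruby with named checks standing in for natural-language test descriptions. The `Activity` stand-in and the `days_ago` and `finished` helpers are hypothetical, and 30 days approximates `1.month.ago`.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical Activity in its state at the end of this post.
class Activity
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
    @finished_at = nil
  end

  def set_finished(at = Time.now)
    @finished_at = at
  end

  def part_of_pace?
    return false if @finished_at.nil?
    @finished_at > Time.now - 14 * SECONDS_PER_DAY
  end

  def counts_towards_pace
    part_of_pace? ? cost : 0
  end
end

# Plain-Ruby stand-in for ActiveSupport's `n.days.ago`.
def days_ago(n)
  Time.now - n * SECONDS_PER_DAY
end

# Builds an activity that was finished at the given time.
def finished(at)
  Activity.new(cost: 3).tap { |activity| activity.set_finished(at) }
end

CHECKS = {
  "does not count unfinished activities" => -> { Activity.new(cost: 3).counts_towards_pace == 0 },
  "counts recently finished activities" => -> { finished(days_ago(1)).counts_towards_pace > 0 },
  "does not count non-recent finished activities" => -> { finished(days_ago(30)).counts_towards_pace == 0 }
}

CHECKS.each { |name, check| puts "#{check.call ? 'PASS' : 'FAIL'}: #{name}" }
```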

Testability Does Not Guarantee Correctness (or Value)

And while you’re thinking of those multiple ways, keep in mind that idea of testability. We’ve certainly made this testable, right? After all, we can execute tests. We can show these tests correctly pass. We can also show that these tests correctly fail. For example, in the last test we wrote, you could change the “set_finished” line to this:
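One change of that kind, shown as a sketch: finish the activity one day ago instead of a month ago while leaving the expectation of 0 untouched. The specific replacement value, the `Activity` stand-in, and the `non_recent_check` helper are assumptions for illustration.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical Activity in its state at the end of this post.
class Activity
  attr_reader :cost

  def initialize(cost: 1)
    @cost = cost
    @finished_at = nil
  end

  def set_finished(at = Time.now)
    @finished_at = at
  end

  def part_of_pace?
    return false if @finished_at.nil?
    @finished_at > Time.now - 14 * SECONDS_PER_DAY
  end

  def counts_towards_pace
    part_of_pace? ? cost : 0
  end
end

# The "non-recent" check, with the finish time made a parameter.
def non_recent_check(finished_at)
  activity = Activity.new(cost: 3)
  activity.set_finished(finished_at)
  activity.counts_towards_pace == 0
end

p non_recent_check(Time.now - 30 * SECONDS_PER_DAY) # true: the check passes
p non_recent_check(Time.now - 1 * SECONDS_PER_DAY)  # false: the check fails
```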

In that case, the test will fail, just as it should, given the conditions we have set up.

This inspires a bit of confidence. But do keep in mind some of the caveats I’ve shown you as we’ve gone along here: while things are testable, that does not guarantee that we got things right. What it does guarantee is that we have the opportunity to figure out if we’ve got things right. But it’s up to us, putting pressure on our design, to seize that opportunity. There is still a lot of need for the human in the technology here.

For those wondering if “artificial intelligence” is going to do away with development or testing as a design activity, all this may give you some comfort.

With that, I think we can close off this post. In the next post we’ll continue on the path we started here, narrowing the current gap we have between correctness and value.

About Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.