An Ode to Testability, Part 3

Here we’ll continue on from the first and second posts in this series. We made a good start by looking at the idea of correctness and attempting to encode our assumptions about that correctness in test-driven code. So let’s keep evolving our Benchmarker application.

We left off with a test that seemingly tested for the distinction of activities being finished or unfinished but, in fact, did not test for that at all. And yet the very fact that it doesn’t test for that is what allows us to evolve our code. If you remember, that desire to evolve the code from its current state is what putting pressure on design means. And we, in these posts, are doing that via the mechanism of tests.

Evolving Current Code: Activities

So let’s handle that distinction we just talked about for activities. Previously our tests have been focused on the DigSite. Here we’ll shift gears a bit and focus on the Activity itself, adding logic to our activity_spec.rb file (currently empty). Let’s start with this:

Something to note here: in this test and in the previous one for DigSite, I’ve been wrapping these tests in a “state” block. That’s by no means necessary but I do want to call attention to it because if you have read my plea for testability, I talked about state quite a bit. State can be an area where we get into a lot of trouble when we design software.

To make this test pass, we’ll add to our activity.rb code, which was pretty minimal up to this point:
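The behavior described below, an activity that “will always be initialized as false,” suggests an implementation along these lines (a sketch, not necessarily the exact code):

```ruby
# activity.rb (sketch): a new activity starts out unfinished.
class Activity
  def initialize
    @finished = false
  end

  def finished?
    @finished
  end
end
```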

Now, you might notice something here. In the first post, our DigSite also had a finished? method and there we initially just had this:
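As we’re reminded later in this post, that early version of “finished?” simply returned true, so it was presumably little more than this sketch:

```ruby
# dig_site.rb, first-post version (sketch): finished? was hard-coded.
class DigSite
  def finished?
    true
  end
end
```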

Well, we’re doing sort of the same thing here with Activity. An activity, given the code above, will always be initialized as false. I point this out only because from these bits of simplicity can technical debt start to accrue as decisions are made. Most importantly, however, the reason for bringing this up is that it points out that our code still doesn’t handle the distinction we talked about. We have an activity that is unfinished now, but not one that is finished. So let’s evolve that via a test in our activity_spec:

We have to add a little to our Activity to make this pass:
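A sketch of the addition, assuming the activity tracks its state in an instance variable:

```ruby
# activity.rb (sketch): activities can now be marked as finished.
class Activity
  def initialize
    @finished = false
  end

  def finished?
    @finished
  end

  def set_finished
    @finished = true
  end
end
```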

Something to note is that while we’re talking about this in the context of state, we’ve actually introduced an action. In this case, that action is set_finished, which implies there is an ability to set an activity as finished. This is thus really more of an interaction test, according to some, than a state test. I’m not going to harp on this here; I simply wanted to call it to your attention.

One other thing to note is that we are entirely divorced from any higher-level implementation, such as an API call or a widget on a web page or app screen. We are simply defining the rudiments of an interface, specifically that there is a way to make an activity finished, which is by calling the set_finished method.

So we’re saying that a brand-new activity is unfinished and that activities can be set as being finished. Does that suggest any next steps for our code evolution?

Aside: Things Begin to Couple Together

In my plea for testability, I talked about spatial and temporal coupling. Keep a bit of that in mind as we move on here. I trust it will become obvious why I mention this.

Evolving Current Code: Dig Site

So, what about our dig? We want to test the dig’s ability to determine if it is finished or not and, now, finally with our above changes, we can do that because an activity in the context of the dig actually can have a state of “being finished.” And, as we’ve said, a dig with only finished activities is itself considered to be finished.

So let’s throw that into our dig site tests (in the dig_site_spec.rb file):

Tester! What do you notice about that? You can base this on what I’ve said above or just the code itself. Either one provides the same reference point.

What you should notice, just as a developer should, is that we have a dependency. We have coupled the dig to the activity. It could be argued that this is a very loose coupling. In fact, from a pure code perspective, there’s no coupling at all between our DigSite class and our Activity class. Look at our code again for the Activity and the DigSite. There’s nothing in either that directly calls the other. But our tests are now starting to show us that this is likely to start happening.

Our tests are putting pressure on our design! Whether we use that pressure to guide our design is up to us. What I just showed you here is an understated — or even entirely unstated — benefit of using an approach like TDD. Your tests can start to serve as warnings. A warning does not mean you are doing something bad necessarily; it just means that there is something you should probably be aware of and be thinking about.

Now, to get that to pass, we just change one method of our Dig Site (in dig_site.rb):
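Given that we’re replacing the activities.empty? check with one that looks at all activities being in a finished state, the change was presumably along these lines:

```ruby
# dig_site.rb (sketch): a dig is finished only when every one of its
# activities reports itself as finished.
class DigSite
  attr_reader :activities

  def initialize
    @activities = []
  end

  def finished?
    activities.all?(&:finished?)
  end
end
```

Conveniently, all? on an empty list returns true, so a dig with no activities at all is still considered finished, which preserves the earlier behavior.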

Previously the “finished?” method was doing this: activities.empty? That’s what we ended the last post with and that was not good because all we were checking for was an empty list of activities. Do you remember why? This is another important point. Sometimes we make a lot of micro-decisions in code and it can be difficult to remember why we did what we did. This becomes even more difficult if we are reasoning about why a feature in a higher-level abstraction works the way it does.

The answer to the above question of “why did we do that?” is that, basically, if there were no activities associated with the dig — which meant an empty list of activities — the dig was considered to be finished. Now we are checking for all activities that are in a finished state. Specifically, we’re using that state of “finished” or not as a determination for the dig itself being finished.

This may seem really trivial given the nature of this code. But consider what happened here: our pressure on design has led us to a consistent design. Yet, going with my previous point, the dependency between DigSite and Activity has now become a little more overt. (That didn’t take long!)

What Do We Do Next?

Consider how we’ve answered the “what next?” question before. Would there be other immediate tests suggested from what we’ve done here?

The main thing would be whether we could come up with some way to consider the “finished?” method as being the opposite of what it is. Remember earlier we knew “finished?” was returning a value of true so we asked how it could be false. Here we know “finished?” is checking that all activities are marked as finished. So that leads us to start thinking about activities that are not finished. But, of course, those won’t be considered by this method, except peripherally.

And that’s another important point. We used our design to keep our code in line with the single responsibility principle. Our methods are, so far, very short. And they do one thing and one thing only. This means we can reason about our code a bit more easily because there is less to reason about.

Before we move on, what does thinking about “activities that are unfinished” mean? After all, our tests now check that a new activity is unfinished by default. And we check that an activity can be marked as finished. And we have tests for the intersection of finished and unfinished activities in the context of the dig site.

What this is telling us is that we can probably start considering other aspects of our feature set now.

Connecting Value and Correctness

The first post was about value; the second post was about correctness. I purposely kept those a bit isolated so I could focus. Here in this post, we started bringing those together and the next post will really start to join them at the hip. Let’s briefly talk about how.

We need to be able to calculate how much activity for a dig is remaining (“how much work still to do”). If we can know that, we can presumably know the rate of finished activities (“how much work has been done”) for a dig. In the first post, we talked about that rate of finished activities being the pace of the dig team. Both of those aspects can then presumably help us determine a projected end date for the dig itself.
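To make that concrete, here is a purely hypothetical sketch; none of this exists in our code yet, and the names projected_end_date and pace_per_day are illustrative rather than anything from the series:

```ruby
require "date"

# Hypothetical sketch: given how many activities remain and the pace at
# which the team finishes them (activities per day), project an end date.
def projected_end_date(remaining_activities, pace_per_day, from: Date.today)
  days_left = (remaining_activities / pace_per_day.to_f).ceil
  from + days_left
end

# Ten activities left at a pace of two per day suggests five more days.
```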

I somewhat blithely said above that we can “probably start considering other aspects of our feature set” at this point. Fine, but how do we bring together what I just said to suggest our next bit of work? Well, let’s consider something by way of business example:

“I have a dig site. It has two activities associated with it. One is finished, one is not.”

Okay, great, so assume I encoded that as a test — as a way to evolve my code. Does that work? Is it meaningful? What would it tell us? If you think about it, hopefully you’ve come to the conclusion: not much. What are we missing here?

Well, what are we calculating based off of? The finishing of activities, right?

But there’s something subtle going on here. What we’re doing here is normalizing the activity such that any one activity is like any other. So in my example above, one activity is finished and one is not. The reason that doesn’t tell us much is because of two crucial points. The first is that we don’t know the duration between one activity being finished and the other being finished. But also, crucially, there’s no distinction between them when, in reality, there would be. Some activities will simply be “heavier” than others in terms of cost or time.
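A hypothetical sketch, not anything from our codebase, of why the normalized view misleads; the weight field here stands in for whatever cost or time measure we might eventually choose:

```ruby
# Hypothetical sketch: two digs each have one of two activities finished,
# yet they are in very different places once activities carry a weight
# (say, estimated days of effort).
def fraction_complete(activities)
  total = activities.sum { |a| a[:weight] }
  done  = activities.select { |a| a[:finished] }.sum { |a| a[:weight] }
  done.to_f / total
end

even_dig  = [{ weight: 5, finished: true }, { weight: 5, finished: false }]
heavy_dig = [{ weight: 1, finished: true }, { weight: 9, finished: false }]

fraction_complete(even_dig)  # => 0.5
fraction_complete(heavy_dig) # => 0.1
```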

Tests and Value

I’ve mentioned a few areas in these posts where TDD can go a bit off the rails and this is another. Specifically, when we don’t connect value to correctness, we can end up with a series of microtests that pass but that don’t really get us to what we need (value) as efficiently as possible.

I want you to think about that and specifically about how it might play out if we literally took what I just said about activity calculations and wrote a series of microtests in the fashion that we’ve been doing in these posts. If we did exactly that — and we won’t, but if we did — that would be a very quick way for people to get sick of TDD. This is often why development managers find the approach tedious, at best, or non-value-adding, at worst.

From Unit to Integration

So we have to start reframing our tests here a bit. We have to get our tests to focus on those interactions that drive the value we want to see. And that often means considering these tests as acting in a context. So next we’ll talk about how we can create a context, and what this means for when “unit testing” becomes “integration testing.”

That distinction matters quite a bit because it’s often one of those places where, if you think back to the end of the second post, people start asking when “developer (programmer) work” becomes “tester work” or, in this context, when “programmer tests” start becoming “customer tests.”

I’ll leave you with these thoughts and hope you join me in the next post where we’ll get into exactly what all that means.


This article was written by Jeff Nyman

