An Ode to Testability, Part 4

Here we’ll continue on from the previous posts, getting more and more into aligning correctness of implementation with value for the business. We’re also going to look a bit at that line where the “unit test” starts to shade into an “integration test” and thus where a “programmer test” might start becoming a “customer test.”

We ended up with a pretty good idea of what the states of the dig site and activities within a dig site are and can be. We left off with a bit of concern regarding how the activities themselves were not really distinguishable. Why is that a concern? Because it seemed to threaten the link between correctness and value.

Specifically, we need to consider a couple of aspects of activities. One is a measure of them (a cost), which the business team brought up early on in our first post; the other is a measure between them (a duration). This gets us thinking about the value of activities in terms of how quickly they get finished and what each of them costs.

Tester! Any concerns when you hear me bring up a term like “cost”?

What should be of some concern is that the idea of “cost” is generic; it can factor in a lot of things. For our context, generally speaking, the higher-cost activities are likely to be the ones that take the longest, but not always. For example, on a dig site a high cost might manifest as bringing in heavy machinery to plow the ground and remove one layer of ground at a time. Yet, while that carries a high monetary cost, it could actually be relatively cheap in terms of time. There’s also the use of workers to sift through buckets of dirt to find artifacts, including the logging and photographing of artifacts that are in place. These can be relatively cheap in terms of immediate outlay of money but are very slow and thus costly in terms of time.

So here we realize that our properties of cost — monetary and time — may relate but also may not.

That bit of context is clearly important and I’ll come back to more of why that is in a bit.

Contexts and Examples

What you’re seeing here is the start of having to think in terms of contexts. In a programmatic sense, we might call these fixtures. But essentially what we need to start thinking about is example-driven development, where we take a context that serves as an example. That example allows us to evolve our ideas and our code. The existence of that example is part of how we know we are thinking in terms of testability.

For now, let’s set up a new block of tests for our DigSite, called “costs”. This will be added to dig_site_spec.rb:
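A minimal sketch of what such a context might look like. The `add` method, the `cost:` and `finished:` keyword arguments, and the `let`/`before` arrangement are assumptions made for illustration, not the original listing. The `Activity` and `DigSite` stand-ins are there only so the snippet loads by itself, and the `defined?(RSpec)` guard just lets the file load where RSpec isn't installed.

```ruby
# Stand-in classes so this sketch is self-contained. The real ones live in
# activity.rb and dig_site.rb; names and signatures here are assumptions.
class Activity
  attr_reader :cost

  def initialize(cost: 0, finished: false)
    @cost = cost
    @finished = finished
  end

  def finished?
    @finished
  end
end

class DigSite
  # Hypothetical reader, exposed only so the context can be inspected.
  attr_reader :activities

  def initialize
    @activities = []
  end

  # Hypothetical name for registering an activity with the dig.
  def add(activity)
    @activities << activity
  end
end

# The guard only exists so the file also loads without RSpec installed.
if defined?(RSpec)
  RSpec.describe DigSite do
    describe "costs" do
      let(:dig_site) { DigSite.new }

      # The context: one finished activity and two unfinished ones.
      before do
        dig_site.add(Activity.new(cost: 10, finished: true))
        dig_site.add(Activity.new(cost: 2))
        dig_site.add(Activity.new(cost: 25))
      end

      # Examples that rely on this context would go here.
    end
  end
end
```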

Here I’m describing the costs as part of the DigSite tests. I’ll show you the full code momentarily so you can orient yourself but take a look at that above code, which is basically setting up a context. That context is what we want our examples to use.

Now, earlier I said the cost context is important. What you see here is that, as we talk about cost, we’re going to frame it not as currency but rather as a relative size that encapsulates the monetary outlay as well as the time duration over which the cost applies.

But imagine if you didn’t know that and you were reading the above tests. What would those numbers (10, 2, 25) mean to you? Would you just assume they referred to some local currency value ($10, $2, $25)? Or would you treat those numbers as some sort of “time cost” and assume they reflect hours?

My only point here is that no matter how expressive your tests are, it is possible for ambiguity or outright misunderstanding to creep in. These kinds of abstractions can get lost in the code when it’s not clear what we’re talking about or what kind of decisions were made.

Context Sometimes Changes Code

One thing I want to call out is that even just this bit of logic — which is just setup, remember — has actually forced a design change. Think about how an activity was set up before and think about how it is set up now, given the above code. Specifically, the activity now has to take in some parameters that give it a state. So the Activity logic (in activity.rb) would change a bit:
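A sketch of what that change might look like at this stage, assuming keyword arguments named `cost` and `finished` (the names and defaults are guesses, not the original listing). Note that the values are accepted but not yet used, since nothing in the tests has demanded more than that so far.

```ruby
# activity.rb (sketch; parameter names and defaults are assumptions)
class Activity
  # The initializer now accepts state, because the test context passes it in.
  # Neither value is used yet: every activity still starts out unfinished.
  def initialize(cost: 0, finished: false)
    @finished = false
  end

  def finished?
    @finished
  end
end

# Both call styles used by the test context are now accepted.
Activity.new(cost: 10, finished: true)
Activity.new(cost: 2)
```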

Simple enough but do notice how we’ve made a code change for something we’re not doing anything with yet in our actual code. We are putting this in solely as a result of the fact that our tests, putting pressure on design, have forced us to evolve the code in this way.

Some people get terribly nervous about this sort of thing and use it to argue against approaches where tests are written first, as opposed to written concurrently or after the fact. I won’t take a stance on that here but just be aware of this.

Evolving the Code: Costs

Now let’s get back to how we want our code to evolve. What do we want to do with our test in this dig site context? Well, one obvious thing is that we can get the total cost of the dig. Regardless of what activities are finished or unfinished, we can add up their costs to get a total. That seems like a really simple thing to do via a very straightforward calculation.

How about this (again, I’ll show all the code soon):

To get this passing would lead to one extra method on the dig site (dig_site.rb):
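A sketch of that method, assuming the dig site keeps its activities in an `@activities` array and exposes a hypothetical `add` method. `CostedActivity` is a throwaway stand-in that already responds to `cost`, used only so the snippet runs by itself; it is not the article's `Activity` class.

```ruby
# dig_site.rb (sketch; `add` and `@activities` are assumed names)
class DigSite
  def initialize
    @activities = []
  end

  def add(activity)
    @activities << activity
  end

  # Sum the cost of every activity, finished or not.
  def total_cost
    @activities.sum(&:cost)
  end
end

# Throwaway stand-in that already exposes a cost.
CostedActivity = Struct.new(:cost)

site = DigSite.new
[10, 2, 25].each { |cost| site.add(CostedActivity.new(cost)) }
puts site.total_cost # 10 + 2 + 25 = 37
```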

Well, actually, that wouldn’t pass. You would need to make one more addition to Activity:

As an incidental note, here the coupling we talked about in the third post is coming home to roost!

That code addition wouldn’t be enough for the test to pass either. The reason is that the new total_cost method would be trying to sum up the cost attribute on the activities but, in fact, we haven’t made that available as a property on the instance. That change would mean our initialization of an activity becomes this:
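Taken together, those two additions might look like the following. This is a sketch that assumes keyword arguments named `cost` and `finished` and a plain `attr_reader` for the cost; none of those details are confirmed by the original listing.

```ruby
# activity.rb (sketch; names and defaults are assumptions)
class Activity
  # First addition: expose cost so a total can be summed over activities.
  attr_reader :cost

  def initialize(cost: 0, finished: false)
    # Second addition: actually store the value. Without this line,
    # `cost` returns nil and summing the costs raises a TypeError.
    @cost = cost
    @finished = false
  end

  def finished?
    @finished
  end
end

total = [10, 2, 25].map { |c| Activity.new(cost: c) }.sum(&:cost)
puts total # 37
```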

And now the test passes.

Now, you may be wondering why I just bothered to go through all that. My point is that our use of tests to guide our design is tightening up that connection between the DigSite and the Activity. Crucially we are making a lot of low-level — albeit, simple — changes here that are far below what the actual user layer would be. Yet, this — right at the level we’re currently working at — is where complexity ultimately stems from.

That’s a really important point!

This is why development is hard. This is why saying you want testability as a primary quality attribute is really hard. This is why “design” is a discipline unto itself. Too many people use the word “design” as if they knew what it meant but what I want to show you here is how to start building up your intuition for what “design” actually means.

This would apply, by the way, regardless of whether you were using tests to drive your design.

Okay, so from a code perspective, all of that is relatively simple. What next? How do we keep evolving our code?

Look at the Boundaries

We just got the full cost of our dig based on the activities and that was something we could do without considering the distinction between finished and unfinished activities. So, it would seem, evolving that means performing a calculation that is based on that distinction.

And, lucky us, we set up our test context with one finished activity and two unfinished activities. So that means we can calculate the remaining cost for the dig site. That test:

To get that to pass, we need one more simple addition to our dig site:
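A sketch of that addition, assuming an `@activities` array and a hypothetical `add` method on the dig site. `StubActivity` is a throwaway stand-in used so the snippet runs by itself; it is not the article's `Activity` class.

```ruby
# dig_site.rb (sketch; `add` and `@activities` are assumed names)
class DigSite
  def initialize
    @activities = []
  end

  def add(activity)
    @activities << activity
  end

  # Drop the finished activities, then sum the cost of what is left.
  def remaining_cost
    @activities.reject(&:finished?).sum(&:cost)
  end
end

# Throwaway stand-in that reports the finished state it was given.
StubActivity = Struct.new(:cost, :finished) do
  def finished?
    finished
  end
end

site = DigSite.new
site.add(StubActivity.new(10, true))
site.add(StubActivity.new(2, false))
site.add(StubActivity.new(25, false))
puts site.remaining_cost # 2 + 25 = 27
```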

That test would actually fail at this point. It would fail because, if you ran it, you would see that while the test was expecting 27, in fact the logic was producing 37. There is a clear reason for this and based on how we’ve stepped through the code here, you might be able to figure that out. Hint: it has to do with how an activity is initialized.

Don’t worry if you didn’t figure it out. This isn’t a coding challenge. Here’s the slight addition needed to make our test pass:
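In sketch form, and assuming the initializer takes a `finished:` keyword argument (the name is a guess), the fix is to store what was passed in rather than a hard-coded value:

```ruby
# activity.rb (sketch; names and defaults are assumptions)
class Activity
  attr_reader :cost

  def initialize(cost: 0, finished: false)
    @cost = cost
    # Store the argument instead of always starting out unfinished.
    @finished = finished
  end

  def finished?
    @finished
  end
end

activities = [
  Activity.new(cost: 10, finished: true),
  Activity.new(cost: 2),
  Activity.new(cost: 25)
]
puts activities.reject(&:finished?).sum(&:cost) # 27 rather than 37
```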

Previously all activities being created were simply initialized to not being finished (@finished = false). That’s no longer the case. A small thing, but an important one. But notice how long — well, relatively speaking — it took us to find that out. Our tests were passing up until we got to a certain point.

And, once again, please note how we keep shifting between adding code bits to our DigSite and our Activity. Because we only have two classes and we’re early in development, this is easy to manage. But imagine how much more complicated this can get — and the possibilities for issues — when the code base is much larger or the individual code elements being dealt with are much more complex.

The Duplication Enemy

I often talk about two main enemies of quality: intransparency and churn. But there is another one as well: duplication. And, as most any developer will tell you, duplication is also a code smell. So take a look at our DigSite class, which I presented above. Given what I just said here, what do you notice?

Regardless of your facility with code, please do take a look. Reason out what I might be referring to here.

You might notice that we have two places where we reference (&:finished?). What that really represents is a domain concept: finished activities. But the important part is really the “reject” notion, which turns it into the domain concept of unfinished activities. Given how we’ve framed this, it’s pretty clear we could extract that bit of domain knowledge. Check out this revision:
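A sketch of what such a revision could look like. The helper name `unfinished_activities` and its `private` visibility are assumptions, as are the `add` method and the `Activity` stand-in used to keep the snippet self-contained.

```ruby
# dig_site.rb (sketch of the refactored class; names are assumptions)
class DigSite
  def initialize
    @activities = []
  end

  def add(activity)
    @activities << activity
  end

  def finished?
    unfinished_activities.empty?
  end

  def total_cost
    @activities.sum(&:cost)
  end

  def remaining_cost
    unfinished_activities.sum(&:cost)
  end

  private

  # The extracted domain concept: activities that still need doing.
  def unfinished_activities
    @activities.reject(&:finished?)
  end
end

# Stand-in for the article's Activity class.
Activity = Struct.new(:cost, :finished) do
  def finished?
    finished
  end
end
```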

This is what would be referred to as refactoring in a strict programmatic sense but do notice that what “refactoring” really means is “putting pressure on design.” It just happens to mean you do so in a way where the overall behavior already in place does not change. And how do you know that the behavior does not change? Well, that’s what you have your tests for!

Does this really matter, though? Refactoring is one of those things that can be hard to justify for some people. After all, the code is already there. And it is working. And we want to get more features out rather than “tinkering” with existing features. The problem is just that: some people view this as a form of “tinkering” with the code or “gold plating” it.

Yet bear with me here as I take a moment and consider the differences here with our own code. With the “remaining_cost” method, we went from this:


(which basically says, “first reject any activities that are finished and then sum up the cost of those that are left”) to this:


(which basically says, “sum up the cost of all unfinished activities”).

With the “finished?” method, we went from this:


(which basically says, “check if all the activities are finished”) to this:



(which basically says, “Are there any unfinished activities?”).

So that bit of refactoring, simple as it was, actually made the code easier to understand and perhaps even a bit easier to describe.
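To make the comparison concrete, the sketch below puts plausible before and after forms next to each other and checks that they agree on the example context. The exact expressions are inferred from the descriptions above rather than copied from the original listings, and `Activity` here is a stand-in struct.

```ruby
# Stand-in for the article's Activity class.
Activity = Struct.new(:cost, :finished) do
  def finished?
    finished
  end
end

# The extracted domain concept.
def unfinished(activities)
  activities.reject(&:finished?)
end

# remaining_cost, before: reject the finished ones, then sum what is left.
def remaining_cost_before(activities)
  activities.reject(&:finished?).sum(&:cost)
end

# remaining_cost, after: sum the cost of all unfinished activities.
def remaining_cost_after(activities)
  unfinished(activities).sum(&:cost)
end

# finished?, before: check whether all the activities are finished.
def finished_before?(activities)
  activities.all?(&:finished?)
end

# finished?, after: are there any unfinished activities?
def finished_after?(activities)
  unfinished(activities).empty?
end

context = [Activity.new(10, true), Activity.new(2, false), Activity.new(25, false)]

puts remaining_cost_before(context) == remaining_cost_after(context) # true
puts finished_before?(context) == finished_after?(context)           # true
puts finished_before?([]) == finished_after?([])                     # true
```

The last line is the case worth noticing: both forms treat a dig with no activities as finished, which is why the behavior is preserved.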

Oh, and by the way, notice how in “finished?” we are now back to basically checking for an empty list — which is actually what we had started with in the second post! Back then we had this:

It’s interesting when that kind of thing happens.

The Domain Emerges

I say interesting because what you are seeing there is the expression of the domain evolve in the code. Some of the best programmatic developers I know are very good at being able to spot this. Further, some of the adherents of TDD who are very good at “seeing ahead” will have actually leap-frogged past much of what I did in the second (and perhaps even third) post and simply ended up where we’re at now.

However, some people see that as a “violation” of TDD because you didn’t increment “correctly.” In fact, however, what this simply shows is that there is no one right way. And the way that works is often going to be predicated upon the specialist skills of the team you are working with. That said, there is a balance between doing the minimum that’s needed and looking too far ahead. Anticipating what a feature might have to do or become is often how we end up with over-engineering and that can become a form of technical debt.

I can see how, depending on your experience, this may seem trivial but reflecting your domain in your code is a good thing to do. That’s the basis of Domain-Driven Design. Regardless of your adherence or lack thereof to that discipline, it’s still a good idea to be able to reduce ambiguity when and where possible and you generally do that by exposing the domain.

Finally, from a purely programmatic perspective related to the above example, we also put a functional call into a context, which can help for clarity. (See my post on the functional clarity trade-off for more on this.)

Test Expressiveness

I did promise I would show the full test at this point, so here that is:
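A sketch of what the full spec might look like, reconstructed from the example descriptions in the test output and the numbers used throughout. The bodies of the examples, the `add` method, the keyword arguments, and the helper name `unfinished_activities` are all assumptions. The classes are inlined so the snippet is self-contained, and the `defined?(RSpec)` guard only exists so the file also loads where RSpec isn't installed.

```ruby
class Activity
  attr_reader :cost

  def initialize(cost: 0, finished: false)
    @cost = cost
    @finished = finished
  end

  def finished?
    @finished
  end
end

class DigSite
  def initialize
    @activities = []
  end

  def add(activity)
    @activities << activity
  end

  def finished?
    unfinished_activities.empty?
  end

  def total_cost
    @activities.sum(&:cost)
  end

  def remaining_cost
    unfinished_activities.sum(&:cost)
  end

  private

  def unfinished_activities
    @activities.reject(&:finished?)
  end
end

if defined?(RSpec)
  RSpec.describe DigSite do
    let(:dig_site) { DigSite.new }

    it "represents a dig with no activities as finished" do
      expect(dig_site).to be_finished
    end

    it "represents a dig with an unfinished activity as unfinished" do
      dig_site.add(Activity.new)
      expect(dig_site).not_to be_finished
    end

    it "considers a dig as finished if all activities are finished" do
      dig_site.add(Activity.new(finished: true))
      dig_site.add(Activity.new(finished: true))
      expect(dig_site).to be_finished
    end

    describe "costs" do
      # One finished activity and two unfinished ones.
      before do
        dig_site.add(Activity.new(cost: 10, finished: true))
        dig_site.add(Activity.new(cost: 2))
        dig_site.add(Activity.new(cost: 25))
      end

      it "provides a total cost of all activities" do
        expect(dig_site.total_cost).to eq(37)
      end

      it "provides a remaining cost based on unfinished activities" do
        expect(dig_site.remaining_cost).to eq(27)
      end
    end
  end
end
```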

This notion of using an xSpec style (as RSpec does, along with tools like Mocha, Jasmine, Spock, etc.) is something that some people really don’t like. They don’t like the idea of mixing code and natural language. I happen to like it, and I also find it can be a nice way to keep the natural language close to the code rather than separate from it (as you might find in traditional BDD, or xBehave, approaches, using tools like Cucumber, SpecFlow, etc.).

Also of note, when these tests are executed, you get this output:

    represents a dig with no activities as finished
    represents a dig with an unfinished activity as unfinished
    considers a dig as finished if all activities are finished
    provides a total cost of all activities
    provides a remaining cost based on unfinished activities

This is a way to push natural language up (xSpec) rather than pull natural language down (xBehave). This can allow you to have high-level discussions with business about how value is being represented. Also, if you look at the code of our tests, each one is simple enough, and reads closely enough to natural language, that business shouldn’t have too much trouble interacting with it at that level as well.

So, from a testing standpoint, this is a call to choose your test solutions, and thus your test abstractions, wisely. Ruby happens to be really good at this kind of thing. But none of what I’m describing here is particular to Ruby. You can apply all of these techniques with suitable test runners in just about every language out there.

Bringing Ideas Together

This was the longest post in this series so far and this seems like a good place to end for now. We’re ending here with a cautionary note against being too purist in any given approach, but with a recognition that there are various ways to go about evolving code (correctness) based on putting pressure on design (value), and reflecting our domain in our code while adhering to testability.

Putting pressure on design, in this context, is not just about the design of the business feature itself (an external quality) but also about the code itself, in terms of how it is architected to be scalable or modular (internal qualities) as well as reducing the likelihood of certain kinds of technical debt.

Testers — not just programmers — care about both of these kinds of qualities. That’s why, working as part of a delivery team, we’re all developers.

Shading Into Integration

I mentioned a few places above where the DigSite and the Activity were getting more intertwined. That’s a good way to determine when you are starting to move a bit beyond so-called “unit” and getting more into “integration.” Once you need a context, and that context starts to provide the basis for a business scenario, you are leaving the strict “programmer test” (correctness) and venturing more into the “customer test” (value).

I talked about some of these ideas in the integration pact so I won’t revisit too much of that here. All I wanted to do was showcase the line where the distinctions start to happen or at least become visible on your testing horizon.

Onward and Forward!

Join me in the next post as we get into some areas that can introduce a bit more complexity into how we frame our examples. Specifically we’ll get into durations and that gets tricky because you run the risk of temporal coupling (for invocation), brittle tests (that rely on that invocation), and making your code somewhat intransparent regarding its intent.


This article was written by Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.
