An Ode to Testability, Part 2

We’re going to continue on from the first post in this series by starting to build our Benchmarker application. In the first post we considered design pressure on value. Now we’re going to get into correctness. Value and correctness are two sides of the testability coin. So let’s get started.

There is a lot of code in these next posts but I firmly believe that you do not need to be a coder to get value from these posts. At least, that’s my intent. Let’s hope my implementation matches up. I mentioned at the end of the first post how you can follow along if you want to.

Test as Successful Interaction

I hope I bring a lot of thoughts to your mind as you go through this series, but if there’s only one thing you retain I would say make it this: think of a “test” as being anything that helps you understand what a successful interaction with a given feature looks like.

You can call that “behavior” rather than “test”, if such is how you view the world. You can call that a “scenario” or an “example”, if you prefer that terminology. The nomenclature doesn’t matter as much as a key point, which is that correctness depends on that idea of what a successful interaction looks like. And that, in turn, should always be thought of in terms of value. A successful interaction isn’t just correct but is one that provides value.

Note, also, that a “successful interaction” does not just mean the so-called “happy path.” An application handling errors or unexpected situations also counts as a successful interaction. Don’t think about negative and positive tests. That’s a terrible way to think. Well, in my strong opinion, anyway. Instead, think about interactions and the data conditions (invalid and valid) within those interactions.

So now let’s start building this. If you haven’t by this point, I certainly recommend you take a read of the first post in this series since it basically defines what the Benchmarker application is going to be.

Design Pressure: Correctness

So, based on our design pressure on value, we seem to have two entities: a dig site and the activities that can be done on that site. Presumably our application will let a funding organization set up a dig site and set up activities, which makes those the high-level domain objects in our project. When someone starts out with the application for the first time, any dig site, being brand new, will have no activities.

Tester! Any concerns yet? (“Concerns? Are you serious? We literally have barely done anything yet!”)

Well, is that last statement above necessarily true? What if there are standard activities for all dig sites? Perhaps a dig site would be created with some activities. The point here is that you have to build up your intuition when something is being said that could be questioned. When I said that last statement, your tester mind might have immediately gone to, “Okay, I’ll test creating a new dig site and verify it has no activities.” That’s your execution side. Your design side should have been saying: “Okay, so a new dig site has no activities initially. But is that always true?” You are employing here an “always/never” heuristic.

For now, let’s just say that business confirms: a new dig site should have no activities as defaults.

So let’s ask this: “What does an interaction with a newly created dig site look like?”

Presumably, if there are no activities, then there’s nothing to calculate (in terms of activity finish rates or the dig schedule) because, as far as we know, there’s nothing to do. A dig site with nothing to do is finished.

Tester! Any concerns? (See how easily these can pop up? And we haven’t even gotten to a single line of code yet.)

Notice here how we are making that distinction of “a dig site with nothing to do is finished” from the outcome side of things. This is focusing on the output. If a dig site ends up in a state with no activities (nothing to do), it’s considered finished. But what about from the input side of things? You could, for example, specify that a new dig site with no activities is “Not Started” rather than “Finished”.

Yet, as someone might argue, many dig site proposals are put forward and it’s not even clear if they can or will be started at any point because so much goes into the funding side. And the funding side is not part of Benchmarker. So calling such a dig site “Not Started” probably isn’t right either; “New” or “Proposed” might make more sense.

But we don’t know. And business isn’t too sure either, to be honest. So, you know what, let’s keep it simple: a dig site with no activities is considered finished for now. Will this come back to bite us? It really depends on how much time there is between when we start encoding this idea and when (if) we decide it was a bad idea.

The First Interaction

So let’s start with an example of behavior and, again, you can safely ignore that this is Ruby using a test runner called RSpec, if you want. I just wanted a language and a tool that removed as much boilerplate as possible so we could focus on what we’re talking about, not the code.

If you are following along, this code will go in the dig_site_spec.rb file (which is completely empty as we start this).

For any of these tests, if you are playing along with me, you can run the command rspec within the “benchmarker” directory. In case you are not following along, I will indicate when tests pass or fail.

This small example of behavior makes a few design decisions. First of all, we are calling an entity in our system a DigSite. And we can check if instances of that entity are finished. Most crucially here, from a design perspective, we are saying that a brand-new instance of DigSite qualifies as finished. Again, this last expectation is perhaps the most questionable, as I discussed earlier. For example, someone could argue that a dig site can’t be considered finished unless there is at least one completed activity associated with the dig.

An important thing to notice here is that we’ve started to encode these decisions as code. We’ve basically added some friction to our project. It’s only a tiny bit of friction, but it’s worth noting these things when they happen. Another important thing to note is that these are choices being made in terms of the business logic. And they are being guided by a test, that is, by an example of interaction with the feature. We are putting some pressure on what it means to have a dig site and what “finished” means.

Now, here’s where TDD goes off the rails for some people. To make that pass, we have to add some code to the dig_site.rb file (which is empty at the start). And a TDD “purist” — although I’m not too fond of that term — will say you should code only the simplest possible implementation to get that test to pass. Let’s go with this:
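A minimal sketch of such an implementation, assuming nothing beyond what the example asks for (a DigSite class with a finished? method):

```ruby
# dig_site.rb (sketch)
class DigSite
  # Always reports finished: the simplest thing that satisfies the example.
  def finished?
    true
  end
end
```

With that in place, the first example passes.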

This is about the most minimal thing we need to do to get that test to pass. And a lot of people see that as a waste of time. After all, clearly that’s not going to be the final implementation. Even if you are not any sort of coder, it doesn’t take much acumen to realize that a method called “finished?” that always returns “true” is probably not anywhere close to what the real implementation is going to be.

Many coders would, instead of doing the above, rather just start working on the actual logic. Proponents of TDD will suggest that the above has provided the minimum possible design, which admittedly isn’t much. But it also hasn’t constrained us too much either.

And there is a good point lurking there. Another way to say this is that we’ve made the fewest possible assumptions we could. As we’ve seen, we did make assumptions. But we were conscious of making them, we kept them small in number, and we kept their expression in the code fairly minimal so that we can change our mind if we discover information based on how we continue to apply pressure to the design.

This is all really important to understand. Yes, you can perhaps be too “purist” when doing TDD. There is no “right way” or single formula regarding how much to implement for a given test. What’s important is to consider how you are able to put pressure on design, using that pressure to make sure that what is being designed is always testable.

The TDD and BDD Divide

When you are putting pressure on design like this, in a TDD context, the goal of your next example is to write a test that fails — given the current code. This is important and what distinguishes TDD from BDD in many respects. The idea of efficient coding is that you don’t want to write code that you don’t need to write. And if TDD is front-and-center, you don’t need to write code that isn’t in response to a failing test.

In a BDD context, by contrast, you might come up with many scenarios initially. These scenarios, at least by default, are not constrained by the same pressures as TDD. It is really important to understand that fact. Both BDD and TDD are applying pressure to design; the difference is the level of abstraction they are doing it at. However, there is nothing to stop BDD and TDD from intersecting more closely: rather than writing a whole bunch of scenarios at the start, you can, like TDD, write one and make it pass.

But there’s a time pressure there. If business is involved in the BDD process, they don’t necessarily have time (so the argument goes) to wait for implementation to be completed for each and every scenario. Is that true? Well, it depends on how much the business team is considered part of the delivery team and thus, I would argue, on whether they are considered developers. It also depends on how large the scope is. And it depends on how related the functionality is that’s being produced in a given BDD workshop.

The point? There are lots of factors to consider for successful implementation of BDD, and that’s the case whether or not your team is doing some form of TDD. So don’t necessarily listen to the BDD advocates who tell you that you are “doing it wrong.” Instead, figure out the dynamics for your delivery team.

The Second Interaction

So, getting back on track, you need that failing test first. But it’s easy to lose sight of what this is doing: this is putting pressure on the code (and thus the feature) to evolve. So given what we have so far, how do we put pressure on our current code to evolve?

Well, at this point the code for our DigSite says that “finished?” is always true. So you want to think about conditions in which that would not be the case. So you should create a case where “finished?” is false, which would force that code to evolve to be less specific and more situational. So how about this:

We have a second entity now: activity. And we have the idea that a dig has some sort of holder for instances of that. In other words, a dig keeps track of activities associated with it.

Testers, and developers, have to be aware at all times of possible technical constraints. A technical constraint here might be that we are starting to see a schema emerge, wherein a DigSite (table) has a relationship of one to many with an Activity (table). But none of that is reflected in the design yet and none of this code is calling out to any sort of persistence mechanism; not even a purely in-memory database.

The way I worded the database context started suggesting implementation, right? Specifically it was suggesting a relational data store. But what if we eventually decided on a NoSQL implementation? Or what if we felt that the DigSite itself didn’t need to be represented in a database at all? Why might it not? Well, I don’t know. Perhaps because a dig site will not have data of its own. Right now, who knows? That’s the point. The other point is: we have options. Yet another point is: we don’t know which of those options makes the most sense.

One way to view design — important point here — is as the discipline of making sure that change remains cheap at any point in time.

Tester! Beyond all of that, you should notice an assumption here with our test logic. Do you see it? Think about it. There’s not much to reason about yet, so assumptions should (hopefully) be pretty clear.

What do you think?

Well, we’re assuming that a new activity is unfinished. And therefore a dig site with an unfinished activity is itself unfinished. So notice how we’ve actually got two levels of what it means to be finished: an activity can be finished (or not finished) and a dig site can be finished (or not finished). And that is reflected in the wording of our new test, which is perhaps a bit cumbersome (“represents a dig with an unfinished activity as unfinished”).

Perhaps we don’t like that. Perhaps we want to use “Done” or “Complete” for one or the other so that we don’t use the same word (“finished”) for both. Or, perhaps, we like that we’re using just one word, keeping the complication down.

And note also that we are saying a domain object in this context is an “activity.” This would be as opposed to a “task” or “work item” or some other term. In fact, on an actual dig site you might have each of those terms in use! You might also have an “assignment.” And they can all mean different things, depending on how someone runs and organizes their dig. Here we’ve simplified the domain and thus — importantly — the tracking of progress. That may be just fine. Or it may not.

Regardless of what we ultimately decide, we are putting pressure on design. If someone felt that “activity” was too vague, or too specific, now would be the time for them to speak up.

Implementation (Partially) Evolves

Let’s consider some code that would make this test pass. The activity part is pretty easy right now and it would go in the activity.rb file (currently empty).
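Something like the following would do. This is a sketch; the empty class body is deliberate, since nothing in the test asks an activity to do anything yet.

```ruby
# activity.rb (sketch)
# An activity only needs to exist for now; it has no state or behavior.
class Activity
end
```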

Note how this suggests that, beyond mere presence, an activity is not something we’ve made many decisions about. For the dig site logic, our changes are simple:
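A sketch of that change, assuming the holder is a plain array exposed through a reader named activities (the array and the reader name are my assumptions):

```ruby
# dig_site.rb (sketch)
class DigSite
  # Assumption: the holder for activities is a plain array.
  attr_reader :activities

  def initialize
    @activities = []
  end

  # Finished simply means there is nothing in the list.
  def finished?
    activities.empty?
  end
end
```

Both examples pass with this version: a new dig site has an empty list, and a dig site with anything added to the list does not.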

Okay, so it’s pretty clear that we’ve introduced a distinction: activities can be finished or unfinished. Right now a dig being “finished” just means that the list of activities assigned to it is empty.

Wait. Notice what I just said there.

In fact, there is nothing at all in this logic that deals with the activity distinction I just mentioned.
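To make the gap concrete, here is a small sketch that pairs the dig site logic just described with a hypothetical stand-in, CompletedActivity, that reports itself as finished. Nothing like it exists in the project; it is only there to show that the dig site never asks its activities anything.

```ruby
# Dig site logic as it stands: finished only when the list is empty.
class DigSite
  attr_reader :activities

  def initialize
    @activities = []
  end

  def finished?
    activities.empty?
  end
end

# Hypothetical stand-in (not part of the project): an activity that is done.
class CompletedActivity
  def finished?
    true
  end
end

site = DigSite.new
site.activities << CompletedActivity.new

# Every activity on the site is finished, yet the site reports otherwise.
puts site.activities.all?(&:finished?)  # prints "true"
puts site.finished?                     # prints "false"
```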

This is blazingly obvious given how small our code base is right now. But this problem comes up a lot when people use TDD simply to verify something (“hey look, it passed!”) versus using TDD to put pressure on design in terms of correctness. This code right now is not correct. But it does pass our tests. And while this may seem terribly contrived, again given the smallness of the code base, a challenge in developing software is that this can become much more opaque when we have more code to reason about or when that code is more distributed.

So I’ll begin to close out this post with that sobering thought. There are cautionary aspects to focusing on TDD. TDD can lead you astray, just as BDD can. TDD has to be backed up with a discipline of putting pressure on design, using that pressure to evolve code, using that evolution as an impetus to ensure testability, and making sure that the testability is consistent with a stated value. Here the “stated value” can be understood as a “shared understanding of what quality means for this feature.”

Which Kind of Work?

There is one other thing I want to note. I purposely chose to focus on unit-style tests here because they sit at the point where developer work becomes tester work. That’s a key question we grapple with in the industry: “When does developer work become tester work?”

What I’m showing is that it’s all developer work, which is why testers need to be considered a type of developer.

What we’re really asking is when programmer work becomes tester work. And then we can take that away from the roles and we’re asking: “When does programming become testing?” And, in the context of automation, we reverse that and ask: “When does testing become programming?”

I can’t stress these operational questions enough!

There is no need for terms like “checking” when we can just ask the much better questions (in my opinion) that I just did. There’s also a nice symmetry to it all and this, to me, is key to getting us past a lot of our hurdles that question whether testers (as a role) are needed, distinct from whether testing (as an activity) is seen as useful.

With all that being said, join me in the next post as we continue our journey.


This article was written by Jeff Nyman

