Tests – Human Readable, Machine Expressive

In my previous post on human and automated testing working together, I showed an example of test design by humans that could be executed by automation. However, the focus point was making sure that the automation helped the humans adhere to an unambiguous domain model of what is being tested. Here I’ll follow up on that practical example with some of the theory.

In my previous post I had said that one of the obstacles to covering the gap between the principles of testing and the practice of testing is the mechanics of writing tests. This is even more so the case when you realize that you do want your tests to reflect the business domain and how it operates. This is about straddling the line between intent and implementation but it’s also about putting pressure on design.

My Basic Goal

My goal in the previous post, along with this one, is to show something useful that keeps testers relevant in their career but without sacrificing aspects of testing. One of those aspects is choosing the appropriate abstraction so that your automation can consume that abstraction and encode rules about it. This assumes that automation will be one of the techniques in your arsenal to scale testing as an execution activity, freeing up humans to do testing as a design activity.

I think this kind of discussion — the intersection of human meaning with human or technology execution — is much more enlightening than the discussions that testers currently like to embroil themselves in regarding how manual testing has died or even perhaps never existed. Or how automation is really just checking. The goal here is to be relevant in your career while also making sure that you can situate human testing in a technology context.

Being able to provide a working implementation that still adheres to some design ideas around testing, including a test data strategy, will serve you far better, and make you more marketable, than redefining common words. It also helps you avoid falling into the technocracy.

The Mechanics of Abstraction

Let’s go back to what I said earlier regarding that gap between the principles of testing and the practice of testing. I feel that this gap often sees the light of day when it comes to the mechanics of writing tests. I use the word “mechanics” there quite purposely.

These mechanics are focused on abstractions. A lot of the practice of testing comes down to that: finding the right abstractions. When you do that, you figure out your structural and organizational principles. Which, incidentally, might suggest where and how you want to store your tests or whether, in fact, you want to have a suite of tests that exist as some artifact. All of this is helping you decide what your representations are. And deciding on representations is what provides a grouping strategy and an explanation model.

This is all dealing with the human component of testing.

When we bring automation into the mix, I firmly believe that your automation framework should consume your preferred abstractions. That’s accomplished via the use of certain patterns. A test framework can support patterns internally or allow them to be imposed externally.

Description Languages

A description language is a domain language that is relevant and applicable to some domain, in our case the domain of our business. You might hear the idea of a description language referred to as a DSL (Domain Specific Language). That’s not an entirely accurate phrase, however.

A DSL tends to be a programmatic term referring to a computer language that has been specialized for a given domain. SQL (Structured Query Language) is a great example of this. But when we write our tests, we are not writing in a programmatic computer language. We are writing in the natural language that we all use to communicate about our software and the business domain. The result is more like a specification language than a domain specific language.

If it’s natural language, then why refer to it as a “description language”? This is because there will be certain constraints we want to put on this language. In some cases, this will be done to allow ourselves to express with more consistency while, in other cases, this will be done to allow us to harness our language so that it is unambiguously executable, by both human and machine. You saw that in my previous post with the clinical trial application and the idea of a study.

You’ll also hear me refer to a TDL which, in this context, refers to Test Description Language. I have a whole category of posts on the TDL topic. I should note that this is not, to my knowledge, an official term or even a widely used one. In fact, I’m not sure if TDL is used at all.

I use it as a way to put a name to tests that are written for human and machine consumption. For humans, this might be natural language. For automation, this might be some form of code that is perhaps annotated with or surrounded by natural language. I use this terminology in some cases because it more often than not forces people to figure out what we mean by that term. This is in contrast, for example, to just saying we’ll “practice BDD.”

I bring this up because I do want to cover how some people seem to think about these ideas.

Consider James Bach in Behavior-Driven Development vs. Testing. In one of the comments to Sergey Moshnikov, James says:

“What you’re doing with your so-called automated spec is creating a new layer of technology that is a pretense of communication while subtly discouraging it. Users and domain experts aren’t programmers — you are tricking them into thinking they are not programming while in fact they are programming. This trick gives them a false view of both the product and the automated check of the product. I would much rather walk my clients through the code, or else give them a fully realized English specification, than to write a boneheaded set of hundreds of simplistic examples and expect them to understand them, or give them a hideously oversimplified construct in which to express their ideas for the specification. We already have a construct: it’s called English. We also have diagrams. We also have conversation. And we have prototyping. Those things work together to solve the problem.”

There is truth to a lot of this, I think. But I also think it’s very possible to combat the bad things that James sees while also having the good things. You’ll notice this is a theme, by the way. Many test practitioners lately want to set up oppositions with “versus” as a separator. So we have the notion of oppositions and either-or dichotomies when, in fact, the testing landscape is a bit broader than that.

As another data point, let’s consider the words of Luke Daley (a contributing founder of the Spock testing library) from his Foreword to the book Java Testing with Spock:

“Readability as the primary goal takes you down a road of expressing tests in a contorted ‘natural’ language and using barbaric regular expressions based on token extractions in order to turn it into something executable.”

My thought, upon reading this, was “Okay, clearly this is a person with an axe to grind.” The language does not have to be “contorted” unless it’s felt that being concise is contorted. The quotes around ‘natural’, as if this wasn’t human language being expressed, are also odd. Whether regular expressions are “barbaric” is clearly a subjective point of view. Like any tool, regular expressions can be used well and can be used poorly.
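For readers who have not seen the mechanism being argued about, here is a minimal sketch of how a natural-language step can be tied to code through a regular expression. It is not taken from Spock, Cucumber, or any other real tool. The names `step`, `STEP_REGISTRY`, `run_step`, and the dictionary-based `context` are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical registry of (compiled pattern, handler) pairs.
STEP_REGISTRY = []


def step(pattern):
    """Register a handler for any step text that matches `pattern`."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


@step(r'^the message "(?P<text>[^"]+)" appears$')
def message_appears(context, text):
    # A real handler would query the application. This sketch only
    # checks a list of messages stored in a plain dictionary.
    return text in context["messages"]


def run_step(line, context):
    """Drop the leading keyword, find a matching handler, and call it."""
    body = re.sub(r"^(Given|When|Then|And)\s+", "", line.strip())
    for pattern, func in STEP_REGISTRY:
        match = pattern.match(body)
        if match:
            # Named groups in the pattern become keyword arguments.
            return func(context, **match.groupdict())
    raise LookupError(f"No step definition matches: {body!r}")
```

Whether a pattern like this reads as barbaric or as a small, well-contained piece of glue depends largely on how many of them there are and how disciplined the wording of the steps is.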

Luke mentions writing highly expressive and intention-revealing tests (mainly in code, for the developer focus he has):

“The efficiency of authoring and evolving tests is just as important as readability, and this doesn’t necessarily come for free with ‘readable’ tests.”

With that, Luke is, in my view, spot on. Just making something readable does not mean it’s expressive nor that it reveals intent. Further, all the natural language, business-domain focused tests do not necessarily mean that you get benefits of authoring and/or evolving those tests. In fact, it can often seem like the opposite.

So let’s agree on this for right now: before anything else, the bare minimum practice — and I would argue the core practice — of BDD is that it promotes conversations between people and the use of concrete examples (called scenarios), written in the language of the business domain, to detect misunderstandings and ambiguities early. These scenarios then become tests: ways to verify that the scenarios, as described, reflect the reality. The reality not just of discussion but also of implementation. Those tests can be manual and automated; or human-based (“manual”) and machine-based (“automated”).

Yet notice what we have here … two sources of truth!

Descriptions Become Sources of Truth

Scenarios describe the behavior of the application, but the source code of the application also describes this behavior.

Let’s agree that, all things being relatively equal, duplication can be an enemy of quality.

Now, on the one hand, this duplication can seem like a good thing: the scenarios expressed in business domain language, if done properly, are accessible to non-technical audiences like business people who don’t want to read code.

On the other hand, this duplication can also seem like a bad thing: if some scenarios or parts of the code evolve independently, then we have a potential problem. To wit, that problem is the question of what we should trust, the scenarios or the code? Technically the code is what executes and is out there so should it be the primary source of truth? Yes — of how it behaves. The scenarios should be the source of truth for how the code should behave. Hopefully, of course, the two align!

The problem comes in that the scenarios can pass and yet the code still doesn’t work correctly. Or doesn’t work correctly given certain conditions. That’s a problem that is ameliorated by more discussions and the encoding of those discussions as examples.

But there’s also another problem: how do we even know when the scenarios and the code are not in sync? Well, in part, if the scenarios execute against the code and fail. But that only works for the scenarios we have, of course; it doesn’t do a thing for the scenarios we didn’t think to write.

A few things fall out of this:

  • If the scenarios are executable, that means we can reconcile our two primary sources of truth.
  • If the scenarios are executable by machine (automation), this means we can do so faster.

Note that the second is simply a scaling factor. Also note that this doesn’t replace human testing. It’s not a “versus” situation with testing by humans and testing by automation. This is scaling our testing as an execution activity (via tooling) so that we can put more effort into our testing as a design activity, which only happens via humans.

It also recognizes that the testing performed by tools is — obviously and clearly — going to be only the most formulaic and rote aspects of that done by a human. We don’t have to repurpose the word “checking” for this. People are smart enough to get it. And even if they’re not, we are (presumably) smart enough to educate them.

“But they won’t listen!”, I hear someone saying. “They just want automation and feel that human testing is not worth the effort.” I get it. I’ve encountered that as well. I encounter it far less when I present a solution like I did in the previous post.

I should also note that this also still leaves room for exploration. Keep in mind those “scenarios we didn’t think to write.” Well, when do we think to write them? If we explore and learn things, we tend to find possible value threateners. As such, we have performed scenarios. Those scenarios can be encoded. Yes, they become scripts at that point and, yes, that removes some of the exploration component.

But we already did the exploration component! That’s how we got the script. And nothing stops us from continuing exploration around new areas or even the areas we just scripted. What that comes down to is discipline. The practice of automation doesn’t remove our discipline around testing unless we let it. And if you’re letting it, well … then automation isn’t your problem. How you frame automation supporting humans doing testing is the problem.

So don’t set things up in opposition, as in “testing vs automation” or “testing vs checking.” To quote myself, testers are not either-or.

An Example

Let’s look at where this description language / TDL kind of thing goes wrong and how it can go right. I’ll show an example from an actual environment I was in.

Background:
  Given I log in as "MaxManager"

Scenario: Add Survey Form - w/Name Only - Delete From Index
  When I visit "AdminIndex"
  Then I click on "Survey Forms"
  And  I check if form "1" has been deleted
  And  I click on "Add Survey Form"
  When I type form "1" in the name field
  And  I click save survey form
  Then I should see "Create successful!"
  Then I click on "Survey Forms"
  And  I check for form "1"
  When I click the x next to the Survey form to be deleted
  Then I click on "Confirm"
  And  I check if form "1" has been deleted

Hopefully, as a tester, you see this is beyond bad. Then again, testers wrote this in the first place. I’ll spare you all the in-the-moment refactoring that was done and show you what I helped the team end up with:

Background:
  Given a Client Manager is logged in
  
Scenario: Successfully Creating a Survey Form
  Given the Survey Forms page
  When  successfully adding a survey form
  Then  the message "Create successful!" appears
  And   the survey form appears in the list

Scenario: Successfully Deleting a Survey Form
  Given an existing survey on the Survey Forms page
  When  successfully confirming the deletion of the form
  Then  {{ is there a message? }}
  And   the survey form no longer appears in the list

Given you don’t have domain context, I’ll point a few things out. The idea of a “MaxManager” was just a term that the team had made up. There was no such thing in the application or the domain. There was the domain concept of a Client Manager and, in fact, a client manager was someone that had maximum permissions. That’s where “MaxManager” seems to have come from. So instead we used the actual domain terminology.
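A made-up term like this is the kind of thing tooling can help catch. As a rough sketch, and assuming the team keeps a glossary of real domain roles somewhere, a check along the following lines could flag login steps that name a role the domain does not define. The `DOMAIN_ROLES` set, the `LOGIN_STEP` pattern, and the `unknown_roles` function are hypothetical and were not part of the environment described here.

```python
import re

# Hypothetical glossary; in practice this would come from the domain model.
DOMAIN_ROLES = {"Client Manager"}

# Matches the quoted role in steps phrased like: I log in as "SomeRole"
LOGIN_STEP = re.compile(r'log in as "(?P<role>[^"]+)"')


def unknown_roles(scenario_text, known_roles=DOMAIN_ROLES):
    """Return quoted role names from login steps that are not domain roles."""
    found = LOGIN_STEP.findall(scenario_text)
    return [role for role in found if role not in known_roles]
```

Run against the original Background step, this would report "MaxManager" as unknown, which is a prompt for a conversation rather than a verdict.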

Notice the terrible title from the original scenario? Why was it called out that the data was “with name only”? It turns out that didn’t matter at all. It was just a way to indicate that the only required field to be filled out was name.

Why were “add survey form” and “delete from index” (two seemingly different things) called out together? In fact, they were two separate things and, as you can see, the refactored version made that explicit.

Notice how the refactor also led us to consider something: should there be a message when a delete happens? There was for an “Add” action as the original test was clearly calling out a check for that. But there was no similar check for the “Delete” action. Why not? Was that intentional? The “{{ is there a message? }}” in my refactored version would have led to either of these two possibilities:

Then  the message "Delete successful!" appears

or

Then  no confirmation message appears

That last one is important. We have framed our knowledge as a test. If that latter one was the case, then business has apparently stated that no such confirmation message will appear. But notice how we were forced to confront that question and forced to state that, for whatever reason, we are not providing a confirmation of delete.

A detail that was somewhat hidden in the implementation details of the original scenario became more clear — ironically, for its lack of clarity — in the version where we focused on intent.

Also, consistent with my previous post, the automation that consumed the above could have been made to flag decisions like this with a warning.
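As a minimal sketch of what that flagging might look like, assuming open questions are always written between double curly braces as in the refactored scenario, the automation could scan the scenario text before running it. The function names here are hypothetical and are not from any particular framework.

```python
import re
import warnings

# Matches an undecided placeholder such as {{ is there a message? }}.
PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def find_open_questions(scenario_text):
    """Return (line_number, question) pairs for every placeholder found."""
    questions = []
    for number, line in enumerate(scenario_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            questions.append((number, match.group(1)))
    return questions


def warn_on_open_questions(scenario_text):
    """Emit one warning per placeholder rather than failing the run."""
    questions = find_open_questions(scenario_text)
    for number, question in questions:
        warnings.warn(f"line {number}: undecided step: {question}")
    return questions
```

Warning instead of failing is a design choice: the scenario stays visible in the results as a decision that still needs to be made, without blocking everything else.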

The Design Focus

What I want to make sure doesn’t get lost here between these two posts is the idea of putting pressure on design, specifically around how we express our tests, both in terms of human domain expression and the automation code that consumes that human domain expression. This also, as you’ll note, puts some pressure on the design of the code for the automation itself.

To help me make this point, I’ll bring up Roy Fielding, who came up with the idea of Representational State Transfer (REST). Fielding defined REST as a series of constraints imposed upon the interaction between system components. Basically, you start out with the general proposition of machines that can talk to each other, and you start ruling some practices in and others out by imposing constraints that include (among others):

  • Use of a client-server architecture
  • Stateless communication
  • Explicit signaling of response cacheability
  • A uniform interface between components (over HTTP, commonly realized through request methods such as GET, POST, PUT and DELETE)

I believe the exact same concepts can be stated for test frameworks. The general proposition is that you have some representation of a test and that test needs to talk to code. That code will execute the intent. The constraints are:

  • You want a humanizing interface for the tests.
  • The tests should be intent-revealing, business-focused, design-guiding and quality-oriented.
  • Tests should be a balance of expressive and compressible.

Coupled with this:

  • You want a fluent interface for the tests.
  • You want a factory that instantiates domain objects (representations of activities or pages).
  • Elements of interaction should be declared on the domain objects.
  • Actions should be called on the domain objects that query or manipulate the elements of interaction.

Notice here how I smoothly moved from the expression of my tests to the automation that consumes the expression of my tests.
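To make the second list a little more tangible, here is a minimal sketch of those four constraints in code. It assumes a browser-style application and uses a fake driver so that it runs on its own. Every name in it (`Element`, `FakeDriver`, `SurveyFormsPage`, `on`) and both locators are hypothetical, not the framework used with the team described above.

```python
class Element:
    """A named element of interaction, declared on a domain object."""

    def __init__(self, locator):
        self.locator = locator


class FakeDriver:
    """Stand-in for a browser driver; it only records what it was asked to do."""

    def __init__(self):
        self.log = []

    def type_into(self, locator, text):
        self.log.append(("type", locator, text))

    def click(self, locator):
        self.log.append(("click", locator))


class SurveyFormsPage:
    # Elements of interaction are declared on the domain object.
    name_field = Element("#survey-name")
    save_button = Element("#save")

    def __init__(self, driver):
        self.driver = driver

    # Actions manipulate the declared elements and return self so that
    # calls can be chained, which is what makes the interface fluent.
    def enter_name(self, name):
        self.driver.type_into(self.name_field.locator, name)
        return self

    def save(self):
        self.driver.click(self.save_button.locator)
        return self


# Factory: look up a domain object by its business name and instantiate it.
PAGES = {"Survey Forms": SurveyFormsPage}


def on(page_name, driver):
    return PAGES[page_name](driver)
```

With this in place, the code behind a step such as "successfully adding a survey form" could read as `on("Survey Forms", driver).enter_name("Form 1").save()`, which keeps the business name of the page visible even at the code level.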

Also note that I state all these as factual statements. They have been for me in the contexts I’ve worked in. But you may find you have different statements of design as you work with your team. You may come up with different principles and the constraints around them.

This is a way to engage with your delivery team on exactly what this post title is about: tests that are human readable while also being machine expressive.

I ask in all seriousness: wouldn’t you rather be having these kinds of discussions with your delivery team as opposed to just telling them how automation is checking, not testing? Wouldn’t this be a more fruitful discussion than debating whether “manual testing” is dying or is, in fact, already dead?

Clearly your mileage may vary, but I’ve found what I’ve talked about here to be not only much more career-enhancing for me personally but also more helpful in making sure that testing is seen as the craft-driven, specialist discipline that it is and always has been.

About Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.
This entry was posted in Automation, BDD, TDL.

One Response to Tests – Human Readable, Machine Expressive

  1. Rodrigo Martin says:

    Great series of blog posts, Jeff. I also truly believe that the power of automation lies on “augmenting” teams rather than deleting human intervention altogether.
     
    Cheers
