Human Test Design, Automated Test Execution

One of the obstacles to closing the gap between the principles of testing and the practice of testing is the mechanics of writing tests. This is particularly the case if you work in a business domain with a lot of complex business rules, and even more so if you want to use automation. So let’s dig into this a bit with a case study.

This post is going to focus on some implementation to make the ideas I’m talking about immediately practical. There is a bit of conceptual follow-up that I’ll provide in a separate post.

There are a few truisms to be aware of.

  1. You will be working in some business domain.
  2. You will likely be using domain-relevant data for testing.
  3. You will likely have domain-relevant contexts that use the data.
  4. You will likely want to encode some knowledge as tests.
  5. You will likely want to use automation to support testing.

As I worked on a particular project, I found I had a way to wrap up all of those ideas in a somewhat concise example and it’s that example that I’ll be sharing here as my case study.

Of particular importance for me here is exploring the test data context. I find contexts are often the areas that testers struggle with the most. I’m also hoping to provide a slightly wider view of what automation can do and provide. As such, this post is specifically going to look at how data contexts are specified as part of a model that humans and automation can make sense of.

Domain Context

The examples I show here will be for a clinical trial application that I helped a team work with.

This is generalizable, however. I just recently helped a company do what I present here with insurance products, in terms of rules around claims, policy serving, policy aging and claims accounting. Further back, I also helped a company do this with hedge fund products, in terms of profit/loss volatility, valuation engines, and investment fee structures.

In the context of the clinical trials, I will focus on one of the simpler entities in the application, which was a study. A clinical study involves a particular area of research around an indication and a therapeutic area.

An “indication” for a clinical trial refers to a valid reason to use a certain test, medication, procedure, or surgery. A “therapeutic area” refers to the type of disease or situation that would be treated by the tests, medications, procedures, or surgeries. A clinical trial study is distinct from a clinical trial plan in that the study indicates the “what” whereas the plan indicates the “how.”

To support this work, I did two things:

  • I worked with the team to come up with a TDL (Test Description Language) that essentially was encoding the requirements as tests. Specifically, as scenarios that served as examples of successful interactions with a feature such that we could determine if the feature was delivering value.
  • I worked with the team to come up with a DSL (Domain Specific Language) as part of an automation framework. This framework would consume the scenarios just mentioned. As part of that, however, the framework was required to create its own data conditions, based on the test data context, as well as verify if the necessary conditions were already in place.

I’m going to show you a series of test specification statements that provide a context for getting a study in place as part of a test. With each example, I have indicated if the example is valid or invalid. After that, I’ll explain why each is valid or invalid. This is at the TDL level. I will also cover how these statements work behind the scenes. This is at the DSL level. Of import will be how the automation knew if the examples were valid or invalid.

First, let’s take a look at the scenarios.

Scenarios

The scenario titles here are meant to be pedagogical and not what were used in the actual project. I want you to understand why each scenario is different.

Spec Example #1 [VALID]

Scenario: A study with minimal information
  Given a study

Spec Example #2 [INVALID]

Scenario: A study, no conditions, specific phase that is ambiguous
  Given a phase I study

Spec Example #3 [INVALID]

Scenario: A study, no conditions, generic phase
  Given a study with any phase

Spec Example #4 [VALID]

Scenario: A study, no conditions, specific phase
  Given a phase III study

Spec Example #5 [VALID]

Scenario: A study, no conditions, generic phase
  Given a late stage study

Spec Example #6 [VALID]

Scenario: A study, with conditions, specific phase
  Given a phase II study with
    | Therapeutic Area | Mental Disorders Behavior Modification |
    | Indication       | Personality Disorders                  |

Spec Example #7 [INVALID]

Scenario: A study, with conditions, generic phase
  Given a late stage study with
    | Therapeutic Area | Mental Disorders Behavior Modification |
    | Indication       | Personality Disorders                  |

Looking at those, you might wonder what makes some valid and others invalid. Feel free to speculate a bit, but I’m going to go over that, so don’t worry about it overly much.

The Domain Model

What I really want to show, however, is how your automation can help provide a litmus test that tells you, as a test writer, that the above are valid or invalid. This is something that isn’t touted much in test writing circles but it is a way that automation can be broadened to provide a bit of a model.

Or, put more accurately, it’s how automation can be used to encode a model of a business domain and then recognize certain test statements as being valid or invalid in the context of that domain.

In the oft-stated “testing vs. checking” debates, I would say that what I just described would be the actual “checking” (akin to “linting”) that automation does. Here it would be checking that the statements, as written, correspond to a model that the automation has in place, similar to how an experimenter in many situations would make sure that the basis of their experiment is valid or invalid.

Where I differ with the “checker” crowd is that the automated execution itself is still very much testing; it’s just rote testing that is being performed by a machine and not a human. Yes, you can argue all you want that machines don’t test and all that. Or you can work to sound relevant in your career by showing the dividing line where human testing and automated testing intersect a bit.

That’s a key point for me. You have testers out there bemoaning the use of too much automation — and they’re usually right. You have other testers out there claiming that automation isn’t used well enough — and they’re also usually right.

Here I want to give one definition of what “well enough” might look like given that I don’t want my automation to rule out human thinking but I also do want it to encode that thinking (particularly about the domain) as tests.

Those are good discussions to be having. They are focused around the idea of design and people of various specialties can get behind those discussions more than they can or will around whether a “test” is really a “check.”

The Abstraction

Behind the scenes, all of the above test statements would lead to some form of the following action being executed:

on(Study).create

This is an example of the DSL. This is using a very simple context factory pattern.

To briefly explain, there is an on() method that is created as part of a context factory. That method takes a particular definition as an argument. In this case, the definition is called Study.

In the project, there was a Study class as part of our set of definitions that our automation relied on. For a web app this might be a page object, for a mobile app this might be a screen object, for an API this might be a service object, and so forth.

There was also a create action (i.e., method) on that class. This class could be instantiated dynamically at runtime if a specification mentions that it needs to work with a study. And that would happen if the test statements indicated as such, which they do in the TDL that I showed above.
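A minimal sketch of what a context factory like this could look like follows. The project's actual code is not shown in this post, so the body of on(), the SpecWorld host class, and the string returned by create are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the project's Study definition.
class Study
  def create(type = "adhoc")
    "created #{type} study"
  end
end

# A bare-bones context factory: on() instantiates whatever definition it is
# handed and returns the instance so that an action can be chained onto it.
module ContextFactory
  def on(definition)
    context = definition.new
    yield context if block_given?
    context
  end
end

# Hypothetical host object. A test runner would normally mix the factory
# into whatever object executes the steps.
class SpecWorld
  include ContextFactory
end

puts SpecWorld.new.on(Study).create("phase III")
```

Because on() only needs something that responds to new, the same factory would serve a page object, a screen object, or a service object equally well.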

I’m not going to show you too much code beyond that which looks like the above but do note that the signature of the create action would be this:

You can see here that the ‘type’ (as in the specific type of study) will default to “adhoc”, the conditions will always default to nothing, and the ‘who’ (as in what user is doing the action) will default to “clinical administrator”.
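The defaults just described suggest a signature along the following lines. This is a sketch under assumptions: positional arguments are chosen only because the DSL calls in this post pass the type and conditions positionally, and the returned hash is purely illustrative.

```ruby
# Hypothetical stand-in for the project's Study definition.
class Study
  # Defaults follow the prose: type "adhoc", no conditions, and a
  # "clinical administrator" as the acting user.
  def create(type = "adhoc", conditions = {}, who = "clinical administrator")
    # Illustrative return value only; the real action drove the application.
    { type: type, conditions: conditions, who: who }
  end
end

study = Study.new
study.create                                   # every default applies
study.create("phase II", { "Indication" => "Personality Disorders" })
```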

The notion of a ‘type’ can differ based on the entity. How I designed this was to be the most logical way to break up the “type of entity” that is being specified. So in the case of a study in a clinical trial context, the most common way to discuss the type was by what’s called the “phase” of the clinical trial. For a clinical trial plan — which differs from a study — the notion of a type would be different.

The conditions are what let you set or override specific data conditions relevant to whatever entity you are creating; in this case, again, that’s a study. You would use conditions when you want to provide for data that differs from what you would get by default.

All test data was pulled from a template-based system that allowed for more specific conditions to override more generic ones. So, for example, I had a study.yml file where default data is pulled from and that file looked like this:

That template, as you can see, can contain literal values or descriptive values. A literal value is exactly what will be used in a given context. A descriptive value was between (( )) characters and describes how a value will be used or generated. So ((ignore)) means the value will not be used whereas ((random)) means a random value from a given set will be selected.

I’m not going to be showing all those implementation details but you can imagine a set of data stored in an array and a given random value would be pulled from that array. Say, a random indication, for example. But if a specific indication was specified in the test, then that would be used instead, overriding the default template.

Where TDL and DSL Meet

Okay, so given that you now have some context of how the DSL works, let’s look at our individual spec statements. Here I’ll show you the TDL along with the DSL I just described. I will do this in the context of each of the spec examples I mentioned earlier.

Spec Example #1 [VALID]

TDL
Scenario: A study with minimal information
  Given a study

DSL
on(Study).create

What this does is simply create an “ad hoc” study, meaning a study “for the moment.”

Note: Some testers use “ad hoc” as if it were synonymous with either ‘exploratory’ or ‘random’. That is not what ad hoc testing means and likewise it’s not what ad hoc means in this case.

Here is the data that would get used in this situation:

That’s a hash (of the study.yml I showed you earlier) that contains a key-value pair for each field that was on the study screen in the application. What this is telling you is that the name text field will be set to “Lucid Study”, the therapeutic area (which was a drop down) will be set to a random value, the sponsor field will be set to a configured value, and so on.

Note that fields are handled in order. So, for example, with the above data set a random phase for the study would be selected. Only after that is done would a random therapeutic area be selected. And only after that is done would a random indication be selected.

And that matters because in clinical trials, and as it was in the application, a phase must be specified, and only then can a therapeutic area be chosen. And only when both of those are in place can an indication be chosen.

That’s how the above works in terms of making sure no invalid values can be used with a randomized set of data. Each ((random)) will be choosing one value based on the choices that were made available to it by the previous one. This is essentially providing contextual knowledge — and note that no “Automation AI” was needed.
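The ordering rule can be sketched as a chain of draws in which each draw only sees the options left open by the previous one. The catalogue below is an assumption: the phase II entry reuses values that appear in the scenarios, and the “Example” entries are placeholders rather than real domain data.

```ruby
module DependentChoices
  # Assumed catalogue: phase => therapeutic area => indications.
  CATALOG = {
    "phase II" => {
      "Mental Disorders Behavior Modification" => ["Personality Disorders"]
    },
    "phase III" => {
      "Example Area A" => ["Example Indication A1", "Example Indication A2"],
      "Example Area B" => ["Example Indication B1"]
    }
  }.freeze

  # Fields are resolved in order, and each random draw is limited to the
  # values that are valid for what was already chosen.
  def self.pick(rng: Random.new)
    phase = CATALOG.keys.sample(random: rng)
    area = CATALOG[phase].keys.sample(random: rng)
    indication = CATALOG[phase][area].sample(random: rng)
    { "phase" => phase, "therapeutic_area" => area, "indication" => indication }
  end
end

p DependentChoices.pick
```

Since the narrowing happens in the data structure itself, no draw can produce an indication that does not belong to the chosen therapeutic area and phase.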

Spec Example #2 [INVALID]

While the first example was very generic (“a study”), Spec Example #2 contains a bit more specificity:

TDL
Scenario: A study, no conditions, specific phase that is ambiguous
  Given a phase I study

DSL
on(Study).create("phase I")

This specification would not, in fact, execute. Were you to try this, the following error condition would result:

Phase I ambiguously refers to two different types of Phase I studies.
(Errors::NotSpecificEnoughDataError)

This error would be generated by the automation well before the automation attempted execution. And this error is intentional.

The reason for this error is that, in the application, there were two phase I study types, and what is available with one is not necessarily available with the other. In this case, there was no way for the automation model to determine which phase I you meant. Notice the same would apply for a human tester.
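One way such an ambiguity check could work is sketched below. The phase list is assembled from names mentioned in this post and is not necessarily complete, and the prefix-matching rule, the PhaseLookup module, and the wording of the message are assumptions.

```ruby
module Errors
  class NotSpecificEnoughDataError < StandardError; end
end

module PhaseLookup
  PHASES = [
    "phase I (Healthy Volunteers)",
    "phase I (Oncology/Vaccines)",
    "phase II",
    "phase III"
  ].freeze

  # Returns the one phase a phrase refers to. A phrase that is only the
  # leading part of several qualified phases is refused, not guessed at.
  def self.resolve(phrase)
    return phrase if PHASES.include?(phrase)

    candidates = PHASES.select { |phase| phase.start_with?("#{phrase} (") }
    if candidates.size > 1
      raise Errors::NotSpecificEnoughDataError,
            "#{phrase} ambiguously refers to #{candidates.size} different study types."
    end
    candidates.first # nil when nothing matches; unknown phrases are a separate concern
  end
end

begin
  PhaseLookup.resolve("phase I")
rescue Errors::NotSpecificEnoughDataError => e
  puts e.message
end
```

Nothing here touches the application, which is what allows the error to surface before any execution starts.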

As a note, it would have been possible to write the DSL portion as:

on(Study).create("phase I (Oncology/Vaccines)")

That would have specified one of the two phase I study types. However, that would mean the specification statement and the test definition statement were (potentially) inconsistent. Rather than making the change above, the specification statement should be changed instead to:

Given a phase I (Oncology/Vaccines) study

This is an important point. The test definition should not be making assumptions about what the test specification says or meant to say. The test definition will simply take what is said and use that to begin execution.

Spec Example #3 [INVALID]

This example is meant to show how the framework prevents free-form language that is not domain specific.

TDL
Scenario: A study, no conditions, generic phase
  Given a study with any phase

DSL
on(Study).create("any phase")

This does not work and will lead to the following error:

Your domain context ('any phase') is not valid for a study.
(Errors::InvalidDataConditionError)

As with example #2, this error is intentional and would be generated before execution.

Semantically, there would be nothing wrong with handling this phrasing. It would certainly be possible to accommodate this and allow the phase to be randomly chosen. From a domain language perspective, however, we should aim for operational specificity to the extent that it makes sense for us to do so.

Note that another approach here would have been to avoid having a matching test definition at all. However, doing that would then have simply caused a message saying that the test definition should be created. That could lead someone to believe that the definition should in fact exist, it’s just that someone hasn’t gotten around to defining it.

Instead what happens is that the phrase is supported from the TDL but the DSL kicks in to indicate that this phrasing was considered and was found to be invalid.
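The difference between a phrase that was never defined and a phrase that was considered and rejected can be sketched like this. The matcher, the StudySteps module, the list of valid contexts, and the :undefined return value are assumptions for illustration; only the error name and its message come from the example above.

```ruby
module Errors
  class InvalidDataConditionError < StandardError; end
end

module StudySteps
  # Assumed subset of contexts the domain model accepts.
  VALID_CONTEXTS = ["phase II", "phase III", "late stage"].freeze

  # The phrasing is matched on purpose so that it can be rejected with a
  # clear message instead of being reported as a missing definition.
  MATCHER = /\Aa study with (.+)\z/

  def self.run(step)
    match = MATCHER.match(step)
    return :undefined unless match # what a missing definition would look like

    context = match[1]
    return context if VALID_CONTEXTS.include?(context)

    raise Errors::InvalidDataConditionError,
          "Your domain context ('#{context}') is not valid for a study."
  end
end

begin
  StudySteps.run("a study with any phase")
rescue Errors::InvalidDataConditionError => e
  puts e.message
end
```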

This is a way to help people start talking (and writing) more in a way that makes sense for test execution but notice how it also forces us to consider test design.

Spec Example #4 [VALID]

This one is fairly simple and is very much like spec example #2 in structure:

TDL
Scenario: A study, no conditions, specific phase
  Given a phase III study

DSL
on(Study).create("phase III")

This would work for any specifically defined phase: “phase II”, “phase IV no IND”, and so forth. Note, however, that the framework did know what phases are valid in the application. So if you typed a name in the specification that didn’t exist, you would get the same error you saw in spec example #3.

Think back to spec example #1. In that case, an adhoc study was created. Something fairly similar happens here. The following data will be used:

The only difference here is that with spec example #1 the phase was random. Here, since a specific phase was specified, that is the phase that will be used. Otherwise, though, the data is still ad hoc data.

Note that we could allow for synonym phrasings. For example, we could allow “phase III” to be called “phase 3”. With the framework as I wrote it, if you used “phase 3” you would get the error you saw in spec example #3. The reason that the framework was very specific like this is that it was ‘taught’ to recognize domain words and key phrases.

Spec Example #5 [VALID]

Where example #4 was a specific phase, this specification relies on a generic phrase.

TDL
Scenario: A study, no conditions, generic phase
  Given a late stage study

DSL
on(Study).create("late stage")

In this case the descriptive phrase “late stage” is translated into one of the phases that the company designated late stage – essentially, any stage that is not “phase I (Healthy Volunteers)”. In terms of what specific stage is picked, it will be a random draw. So, as an example, here is one possible data set that would be generated:

Here the phase chosen was “IV with IND.”

I should note that this example could also be attempted using “early stage” as the descriptive phrase. In that case, this acts as a synonym for “phase I (Healthy Volunteers)” because those two terms in the domain were considered synonymous and the team wanted that reflected in the model.

In spec example #4, I said we have to be careful with synonym phrasings and that’s still true but, again, in the business domain the phrase “early stage” was in fact another way of saying “phase I (Healthy Volunteers)” and the two were used often enough to justify this.
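The translation of descriptive phrases can be sketched as follows. The phase list is assembled from names mentioned in this post and may be incomplete; “phase IV with IND” is an assumed full name for the “IV with IND” phase, and the StageGroups module is hypothetical.

```ruby
module StageGroups
  EARLY = "phase I (Healthy Volunteers)"

  PHASES = [
    EARLY,
    "phase I (Oncology/Vaccines)",
    "phase II",
    "phase III",
    "phase IV no IND",
    "phase IV with IND"
  ].freeze

  # "early stage" is a plain synonym; "late stage" is a random draw from
  # every phase that is not the early one. Anything else passes through.
  def self.resolve(phrase, rng: Random.new)
    case phrase
    when "early stage" then EARLY
    when "late stage" then (PHASES - [EARLY]).sample(random: rng)
    else phrase
    end
  end
end

puts StageGroups.resolve("early stage")
puts StageGroups.resolve("late stage")
```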

Spec Example #6 [VALID]

This example is one that uses a specification format known as a data table.

TDL
Scenario: A study, with conditions, specific phase
  Given a phase II study with
    | Therapeutic Area | Mental Disorders Behavior Modification |
    | Indication       | Personality Disorders                  |

DSL
on(Study).create("phase II", conditions)

In this case, a specific phase is being specified. What is also being specified is a set of conditions on the study that is being created: a specific therapeutic area and a specific indication.

So what will happen here is that a phase II study will be chosen and then, in place of the ad hoc values that would otherwise be used, the specific values from the specification will be used. The data set will look like this:

Notice here how therapeutic area and indication – which would normally be random from the ad hoc default data – are given specific values.

Spec Example #7 [INVALID]

This one is very similar to what you saw in example #6 and is arguably a harder situation to parse and test for.

TDL
Scenario: A study, with conditions, generic phase
  Given a late stage study with
    | Therapeutic Area | Mental Disorders Behavior Modification |
    | Indication       | Personality Disorders                  |

DSL
on(Study).create("late stage", conditions)

Here the only difference is that you are specifying a late stage study with conditions. However, this phrasing would lead to the following error:

The data context specified can lead to invalid data sets.
(Errors::PossibleDataContextError)

This is an error because “late stage” can include “phase I (Oncology/Vaccines)” and that would not work if the data did not match up. That can be a little confusing if you don’t know the domain. With the above example, the particular therapeutic area and indication chosen are not valid for a “phase I (Oncology/Vaccines)” study but they would be valid for any other late stage study.

So rather than build in a lot of conditional logic to support how the current domain was implemented in the application I was helping test, the framework simply looks for situations that could potentially lead to invalid data conditions and doesn’t allow them to be part of the test design.
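A guard of that kind might look like the sketch below: a set of conditions is accepted only when it is valid for every phase the phrase could resolve to. The rule table, the “Example Oncology Area” placeholder, the shortened late stage list, and the ContextGuard module are assumptions.

```ruby
module Errors
  class PossibleDataContextError < StandardError; end
end

module ContextGuard
  # Assumed subset of the phases that "late stage" can resolve to.
  LATE_STAGE = ["phase I (Oncology/Vaccines)", "phase II", "phase III"].freeze

  # Assumed rule table: therapeutic areas allowed per phase.
  AREAS_BY_PHASE = {
    "phase I (Oncology/Vaccines)" => ["Example Oncology Area"],
    "phase II" => ["Mental Disorders Behavior Modification"],
    "phase III" => ["Mental Disorders Behavior Modification"]
  }.freeze

  # Refuses the context if even one candidate phase would make the
  # requested therapeutic area invalid.
  def self.check!(phases, conditions)
    area = conditions["Therapeutic Area"]
    return phases if area.nil?

    risky = phases.reject { |phase| AREAS_BY_PHASE.fetch(phase).include?(area) }
    return phases if risky.empty?

    raise Errors::PossibleDataContextError,
          "The data context specified can lead to invalid data sets."
  end
end

conditions = { "Therapeutic Area" => "Mental Disorders Behavior Modification" }
begin
  ContextGuard.check!(ContextGuard::LATE_STAGE, conditions)
rescue Errors::PossibleDataContextError => e
  puts e.message
end
```

The check is deliberately pessimistic: it does not wait to see which phase the random draw produces, so the same specification can never pass on one run and fail on the next.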

Test Definition Matchers

With the above examples you saw a lot of statements like this:

  • on(Study).create("phase I")
  • on(Study).create("phase III")
  • on(Study).create("late stage")

That seems like a lot of information that has to be hard coded and thus potentially fragile if there are changes. In fact, this is handled via some generic code templates as part of the framework. Specifically, all of the above phrases (and the valid variations that you can probably think of) were handled by these test definition matchers:

If you’ve used tools like Cucumber before, this will be very familiar.

You can see here that regular expressions are used in some cases to allow for generic matches that will recognize specific situations. In other cases, literal phrases are matched. There is a very fine line to be drawn when using regular expressions like this.

In fact, I am actually not a fan of regular expressions in this context because they can get quite unwieldy and lead to a lot of problems when you have natural language matched against a structured language. However, there are some guidelines that the framework used to allow regular expressions and the above is largely what I wanted to support and not much beyond that.

I say that because I believe the matchers should still be readable. The above regular expressions are essentially serving as simple placeholders and I believe that the surrounding literal text gives enough clues as to what the regular expressions are placeholders for.

This is the area that I was the least comfortable with at the time the project finished but, in discussing with the team, it did seem like a responsible balance between having too many test definition matchers and having just a few matchers that could match just about any combination of phrases you could think of.
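To give a feel for that balance, here is a sketch of a small matcher table in plain Ruby. It is not Cucumber's API and it is not the project's actual matcher set; the patterns and the StepMatchers module are assumptions that show regular expressions acting as simple placeholders inside otherwise literal phrases.

```ruby
module StepMatchers
  # Each pattern captures, at most, the one argument passed to create.
  PATTERNS = [
    /\Aa study\z/,
    /\Aa (phase .+?) study(?: with)?\z/,
    /\Aa ((?:early|late) stage) study(?: with)?\z/,
    /\Aa study with (.+)\z/
  ].freeze

  # Returns the positional arguments for on(Study).create, or nil when no
  # matcher recognizes the step text.
  def self.dsl_args(step)
    PATTERNS.each do |pattern|
      match = pattern.match(step)
      return match.captures if match
    end
    nil
  end
end

p StepMatchers.dsl_args("a phase III study")
p StepMatchers.dsl_args("a late stage study with")
```

Keeping the literal words “a”, “study”, and “with” around each capture is what keeps the patterns readable: the surrounding text says what the placeholder stands for.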

Code Design

One thing I haven’t shown as much is the behind-the-scenes code from the automation side, and a lot of that has to do with non-disclosure agreements. Even without seeing that code, I will say that, just as with test design, code design matters in an automation context as much as it matters in a non-automation context. There were a few principles that I was able to follow with the above code:

  • Embrace small code.
  • Abstraction encourages clarity.
  • No computation is too small to be put into a helper function.
  • No expression is too simple to be given a name.
  • Small code is more easily seen to be obviously correct.

A few other principles fell out of those:

  • We often have to trade elegance of design for practicality of implementation.
  • Embrace brevity, but do not sacrifice readability. Concise, not terse.
  • Prefer elegance over efficiency where efficiency is less than critical.

These were all aspects that I discussed with the team, both developers and testers.

Test Design and Execution Meet

As I talked about in a few places above, test design and test execution were linked up. The knowledge about the domain was encoded. Note that even if the automation wasn’t run against the application, the automation could be run against the specification itself, triggering the errors I mentioned above.

What I’m talking about there is a very different thing than just the “dry run” executions you might do with tools like Cucumber or RSpec to see if you are missing code matchers.

So there are two levels of automation that can be considered here but note that all of that automation is doing nothing less than encoding the domain model that was thought about by humans.
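Running the automation against the specification alone could be as simple as the sketch below, which reads the Given lines of a feature text and reports any study context the model does not know, without touching the application. The SpecLint module, the list of known contexts, and the report format are assumptions. It relies on filter_map, so it needs Ruby 2.7 or later.

```ruby
module SpecLint
  # Assumed subset of the contexts the domain model recognizes.
  KNOWN = [
    "phase I (Healthy Volunteers)",
    "phase I (Oncology/Vaccines)",
    "phase II",
    "phase III",
    "early stage",
    "late stage"
  ].freeze

  STEP = /\AGiven a (.+) study(?: with)?\z/

  # Returns one message per step whose context is not part of the model.
  def self.lint(feature_text)
    feature_text.each_line.with_index(1).filter_map do |line, number|
      context = line.strip[STEP, 1]
      next unless context

      "line #{number}: '#{context}' is not a known study context" unless KNOWN.include?(context)
    end
  end
end

feature = <<~FEATURE
  Scenario: A known phase
    Given a phase III study
  Scenario: A synonym the model was not taught
    Given a phase 3 study
FEATURE

puts SpecLint.lint(feature)
```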

The Human Is Present

Yes, there was a lot about automation here.

But humans had to explore how to talk about the domain and what assumptions should and should not be considered safe. Humans had to think about what default data looked like and what initial set of ad hoc data made sense to serve as a template. Humans had to consider the nature of the ubiquitous language that focused on business intent rather than code implementation.

This type of framework also encourages thinking about design: the design of testing (as activity), the design of tests (as artifacts), and the design of code (that supports execution of the artifacts and supports the activity).

Further, all explorations made by the delivery team were capable of being encoded fairly quickly as scripts. This is important. Exploration is a core aspect of testing. But scripts can also be important once we are sure the exploration has yielded fruit. This is particularly true if you are in an environment, as I was, that required test artifacts for compliance purposes. So having a way to quickly adapt exploration into scripts is helpful.

With that I’ll close this post. I hope this case study was useful to you in thinking about how your own human test design can be situated within the context of technology in the form of automated test execution.

About Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.