In his book The Black Swan, Nassim Taleb talks about "Platonicity," which he defines as the desire to cut reality into crisp shapes. Doing that means dividing a large domain into smaller ones, which, by definition, means establishing certain boundaries. That is a key part of how people experiment, and thus of how they model … and thus of how they ultimately explain things. So let's talk about what this has to do with testing.
Incidentally, I should note as quickly as possible that in this article I’m not talking about the technical aspect of model-based testing, but rather about the human aspect of modeling what we are testing.
Categorizing
Categorizing is, to some extent, necessary for humans. It's a large part of how we make sense of a complex world and decide what to focus on. The problem is that this can become pathological. What this means is simply that the "crisp shapes" become rigid; they become seen as "the way things are." This can prevent people from noticing a certain fuzziness at the boundaries. And that can translate into preventing people from understanding the interfaces at those boundaries.
Testing at boundaries and with interfaces is a core part of the testing skill-set and mind-set. So anything that even potentially compromises those aspects should make testers sit up and take notice.
Another interesting aspect is that categorizing pretty much always reduces the actual complexity that exists. That reduction is, in fact, why we categorize in the first place. As an evolving species, that worked relatively well for us. As a species that has since evolved specialist testers working in a technical context, it has led to some pathologies. Specifically, the reduction in complexity becomes pathological when that categorizing rules out, or at least falsely smooths over, sources of uncertainty. This is often mistaken for "managing uncertainty" when, in fact, it's just shuffling it around a bit.
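To ground that in familiar testing terms, here is a minimal sketch (in Python, against a purely hypothetical validate_age function of my own invention) of categorization at work as equivalence partitioning. Picking one "representative" value per category makes the model look complete while quietly smoothing over the edges between categories, which is exactly where the fuzziness lives.

```python
# A minimal sketch, not anyone's real implementation: a hypothetical age
# validator whose behavior we model as three crisp categories (partitions).
def validate_age(age: int) -> str:
    if age < 18:
        return "minor"
    elif age <= 64:  # the fuzzy edge: is 65 an "adult" or a "senior"?
        return "adult"
    else:
        return "senior"

# Categorizing reduces complexity: one "representative" value per partition.
representative_cases = {5: "minor", 30: "adult", 80: "senior"}

# But that reduction smooths over uncertainty at the edges. Boundary values
# force the question of what the model actually claims at each boundary.
boundary_cases = {17: "minor", 18: "adult", 64: "adult", 65: "senior"}

def run(cases: dict[int, str]) -> None:
    for age, expected in cases.items():
        actual = validate_age(age)
        status = "ok" if actual == expected else "CHECK THE MODEL"
        print(f"age={age}: expected={expected}, actual={actual} -> {status}")

if __name__ == "__main__":
    run(representative_cases)  # all pass; the category model looks "done"
    run(boundary_cases)        # these probe the edges of the categories
```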
Constructing Models
To summarize, and overly simplify, several decades of research: everyone constructs mental models of the world. To model, in other words, is a very human activity. We acquire knowledge by adding information to our models, and that influences our behavior. Collectively, we create explanation models that are essentially an aggregation of what we believe to be facts. Any fact becomes significant in proportion to its relevance to a specific explanation model.
This is just kind of how we’re wired. What makes this interesting is that when the new information we get is in concert with the model, we end up behaving (largely) the same way. Sometimes, however, new information contradicts the model. In that case, we have to adjust our models to accommodate the new information. That can change our behavior.
The key point here is that our brains are always at work, always taking in new information, and thus always adjusting our mental models to fit. But there’s a pathology here too. In pathological cases, we spend time trying to adjust reality to fit our model rather than the reverse.
Behavior Modification via Models
This notion of influencing and changing behavior is important for testers. As a general rule, people change most when they are shown a truth that influences their feelings. When their feelings are engaged, that is when they are most likely to change their behavior.
That’s certainly something to consider when we decide how to engage about a bug or about confusion around a particular user story or attempting to clarify design choices.
Managing Mental Models
The above is mostly fact, even if arguable fact. Here's where I get into my personal interpretations of those facts and how I apply them in my career.
Testing is not the source of all wisdom on a project. Instead testing, as an activity, produces artifacts that allow the activity to be an indirect manager of mental models. We can't manage those models directly but can only create situations in which (1) people have the opportunity to adjust their own models and (2) people have the opportunity to help us refine the models.
I’ve often said — and still believe — that everyone tests, but not everyone is a tester. When I break that statement down into something useful, what I’m stating is my belief that testing, as a discipline, is about creating practices and patterns that help team members in different disciplines collaborate and then learn “just enough” of the specialties from each other. This way we can share each other’s models and yet not be consumed by them. This can help us spot pathological cases.
For specialist testers, what's really happening here is that you are harnessing each person's ability to test while attaching more specific aspects to that generic view of testing. The aim is to make testing an empirical, demonstrable, repeatable discipline, one capable of applying a variety of techniques such that the testing activities are efficient, effective, and perhaps even elegant.
Project Models Are Inherently Temporal
The nature of time on our projects is a critical element for testers not just to be aware of, but to harness. Huh? Okay, let's break this down.
If you think about it, much of what we model is the past: What code currently exists? How does the code actually work? What test artifacts do we already have in place? What did the BAs describe to us in that last spec workshop? What exactly is this user story saying?
As such, we are always put, at least to some degree, in the role of the historian. With no disrespect to the specialized discipline, a "historian" can be anyone who asks an open-ended question about past events and answers it with selected facts which are arranged in the form of an explanatory strategy.
The types of question being asked, and the answers being gathered in relation to a given model, are fitted to each other by a complex cognitive process of adjustment: adjustment of the model, of the way the questions are asked, and of the kinds of answers accepted. This is all done in the context of a feedback loop that allows for a near-continual readjustment of each of these elements.
The explanatory strategy that falls out of all this at any given time may take many different forms (and likely a combination of them). For example, we may create causal models, or narrative models, or statistical models, and so on. We may rely on analogy just as much as we might on a flowchart. The commonality here is that the model we produce is an encoding of project history.
And this aspect of history can be tricky. Why? Because it's entirely subject to the categories of error that we human beings are all prone to. For example, we can fall into the error of confirmation, which is the tendency to look at what confirms our knowledge, not our ignorance. We can also easily commit the narrative fallacy, which is how we fool ourselves with stories and anecdotes, seen most often in team lore and institutionalized knowledge. In particular, we fall prey to the problem of silent evidence, which can be thought of as the "tricks" that temporal distance (i.e., history) can use to make things opaque to us.
That opaqueness is a key point because it leads to intransparency. And intransparency is forever and always the enemy of quality. History can be opaque: there are collective and individual epistemic limitations, obfuscations, and distortions, most of which manifest in our confidence (or lack thereof) in our knowledge.
The Generator of Project History
I genuinely believe it's important for testers to understand why this happens. The challenge, and this is what testing as a discipline helps other teams see, is that you only see what comes out of history. You don't necessarily see the screenplay, if you will, that produced the events. Even if you lived through the events that led you here, you don't actually see the generator of history. This is part of that reduction in complexity that I started off with.
There is a fundamental incompleteness in your grasp of such events. This is the project singularity that I talked about. The future always describes the unknown, the abstract, and, as Taleb describes it, "the imprecise uncertain." The past, on the other hand, describes the possibly known, the concrete and the abstract, and the approximately certain (at some level of precision).
Explanation Models and Explanation Strategy
So I mentioned “explanation models” and “explanation strategies.” These two form a nice feedback loop, where time serves as the mediating influence between them.
To quote the great Tom Servo, "time" is nothing more than "a factor in a space-time model of relativistic causality and determinism." (If the reference is lost, see the Mystery Science Theatre 3000 episode "The Gunslinger.") Tom had it right: it's a model.
Going back to Taleb, with whom I started this post: he talks about something called the "Platonic fold." This is where our representation (our model) of reality ceases to apply. The trick is that we often don't know when that happens.
Why is this relevant? Because testing is ultimately a way to study the flaws and limits of our models, looking for the "Platonic fold" where they break down. And, going back to Tom Servo's point, they tend to break down over the factor of time.
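As a small, hedged illustration of that (using entirely hypothetical names and a made-up expiry rule), here is a model that works right up until time finds its Platonic fold: a check that quietly encodes an assumption about "now," so the test agrees with reality only for as long as the calendar cooperates.

```python
from datetime import date

# Hypothetical sketch, not a real system: a "model" of license validity
# whose crisp shape is a hard-coded cutoff date. Reality keeps moving.
LICENSE_EXPIRY = date(2026, 1, 1)

def license_is_valid(today: date) -> bool:
    return today < LICENSE_EXPIRY

def test_license_is_valid_today() -> None:
    # This assertion encodes an assumption about "now." It holds right up
    # until the calendar crosses the fold; then the model ceases to apply,
    # even though nothing in the code has changed.
    assert license_is_valid(date.today())

if __name__ == "__main__":
    test_license_is_valid_today()
    print("the model and reality still agree (for now)")
```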
It’s important to avoid a certain naïve empiricism that consists of learning from casual, shallow historical facts. A narrow interpretation of past time often leads to a great misunderstanding of risk.
Models and Strategies
Testers, just like historians (and archaeologists and botanists and physicists), seek to explain things by means of many different, but sometimes overlapping, explanation models. We use generalization models, narration models, causation models, motivation models, composition models, analogy models, and many others. The nature of these models is very complex, but they allow us to craft explanation models, and those, in turn, are how grouping strategies are formed. Take "grouping strategies" to mean "categorization" and you'll see how all this relates together.
Some Final Points
Wrapping this all up, let’s consider a few things.
- Testing becomes an effective and efficient feedback system when it puts pressure on design at various levels of abstraction and does so via exploration that guides experimentation.
- Experimentation and exploration are two of the oldest means by which we create models.
- Models are one of the core means by which we understand and incorporate feedback.
Human beings learn through model building. It is literally built into us from both nature and nurture. Board games are models. Video games are models. Sports are, in a very real sense, models. Architecture, scientific classification, the practice of law, the application of medicine: all models. To model is very human. So when we help others model, we help them retain what it means to be human.
And that’s very important in a domain where technology tends to dominate. It’s yet another area where I believe specialist testers can stake out a solid piece of ground that differentiates them from other specialities.