Testing at the Crime Scene, Part 1

As human beings working in complex situations, like software, we know that we vary quite a bit in our abilities. That’s the case whether we’re testers or programmers or analysts. Any role that provides cognitive friction around these complex situations will amplify variations in our abilities. That’s why certain developers are better at some things than others; likewise with testers. That variation impacts quality and how we look for it. So let’s dig into this a bit.

First, a caution: this is going to be a hands-on series of posts, although you can get by with just reading them since I will provide the output of the tools I’ll be showing you. The goal of the series is to show how engineers, of both the test and programmer variety, can get better insight into the evolution of their code base. That evolution is a key determiner and predictor of quality problems. The focus here is on recognizing that a key quality metric is cognitive economy.

Let’s approach this idea of cognitive economy a little obliquely.

Stability and Quality

There is plenty of evidence out there that supports a fairly obvious idea: the importance of change to a given area of code is so high that any other metrics will provide little to no extra predictive value when it comes to quality detractors.

Put another way: there’s a very strong and evidential correlation between the stability of code and the quality of that code.

Here when I say “area of code” treat that pretty much however you want but I tend to mean it in the context of whatever a language specific term is for a group of code. For example, every .py file in a Python source code repository is called a “module.” A bunch of modules become a package and that package is distributed as a binary “wheel.” Languages like C# or Java tend to focus on each source code file as being a “compilation unit” and those too are bundled up in some form of executable binaries. I’ll mainly be focusing here on those discrete, individual source code files and I’ll refer to those generally as “modules.”

Okay, so let’s keep on that stability idea for a second. If we abstract code up, we get features. Different code provides for different features. Different features — and thus the code that supports them — stabilize at different rates. So right there you have a great heuristic: you want to reflect this in your design. Modules related to a given feature should evolve at a similar rate.

When modules aren’t stable, this is one of our warning signs: this is our code base telling us that maybe a crime against quality has been committed. One of the more common reasons for this is when cohesion is lacking. Here “cohesion” generally refers to how much the elements of a module belong together. The idea being that related code should be “close” to each other in some sense so as to make the code highly cohesive. Now, remember, change is lack of stability. That doesn’t mean it’s bad. If a given module keeps changing a whole lot, that could be just because we’re developing a new feature. But it could also mean that we have little or no cohesion. In that case, maybe our module keeps changing because it has too many reasons to change. It can never become stable.

Maybe a given module we work on has, say, four responsibilities. Meaning, the module has four things it is responsible for handling. That means the module has four reasons to change. Maybe refactoring this module into four independent modules would help to isolate those responsibilities. That would then help stabilize the code.

The more of your code you can protect from change, the better. Design is about keeping the cost of change cheap at all times. But it’s also about reducing the need for change in one concentrated place too many times. The idea of cohesion lets you isolate change. It breaks down like this:

You have less code to modify in stable modules.
Less code to modify means less surface area for regressions.
Less surface area for regressions means less to test.

Most important, however, is that cognitive bit I mentioned earlier. What the above recipe means is that someone who has to maintain the code has less knowledge to keep in their head. Thus: cognitive economy.

Crime Scenes as Useful Metaphor

Now, I mentioned a crime against quality. One book I really like is Your Code as a Crime Scene. Much like my “Testing is Like” focus, the idea the book puts forth is that we can use forensic techniques to investigate code, much as we use such techniques to solve crimes. Thus developing is like forensic pathology, perhaps?

A crime scene implies evidence. And that means you need ways to find that evidence and come to decisions about it. In the case of our software development that means we need to mine information from the evolution of our codebase. Most of us are using some sort of version control system that just so happens to keep track of that evolution. Wow! Lucky us, huh?

Each change we make to our code base leaves a trace; a little bit of evidence. Each of those bits of evidence provides a probative path to help us understand what we’ve built. This evolutionary history can show us modification patterns. Those patterns get us to look beyond just the structure of our code base and instead figure out how it got to where it is and how it evolved. This puts focus on humans because it allows us to better understand how the people working on the code base interact with each other.

All of that is basically what the above book tries to show. And what’s great is that it provides some tooling to actually do so. So what I’m going to do in this series of posts is leverage those tools and walk you through how I, as a test specialist (or at least that’s what I call myself), try to work with development teams to understand our code base and thus understand one very key aspect of our quality.

Strategy

The strategy for this series of posts is the following:

Gather the evolutionary data from a version control system.
Augment that evolutionary data with complexity measures.
Use the combined data to identify hotspots.

The idea of hotspots is that they’re supposed to guide your eye and mind towards what you should be paying attention to. We see this all the time when we check for outages in services, for example.

Here our focus will be on seeing if we can improve what we pay attention to in terms of where we think we should concentrate on improvements. And improvements, in this case, means where we think we can make our code base a bit easier to maintain and thus understand.
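As a rough illustration of the third step in that strategy, here is a minimal Python sketch that combines change frequency with a size-based complexity proxy. The function name rank_hotspots, the "revisions times lines of code" score, and all of the numbers are assumptions made up for illustration; none of it is output from the tools used in this series.

```python
def rank_hotspots(revisions, lines_of_code):
    """Rank modules by revisions * lines of code (an assumed, simple score)."""
    scored = []
    for module, n_revs in revisions.items():
        loc = lines_of_code.get(module)
        if loc is None:
            continue  # no complexity figure for this module, so skip it
        scored.append((module, n_revs * loc))
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical numbers, invented purely for illustration.
revisions = {"lib/a.rb": 40, "lib/b.rb": 5, "lib/c.rb": 12}
lines_of_code = {"lib/a.rb": 300, "lib/b.rb": 900, "lib/c.rb": 50}

for module, score in rank_hotspots(revisions, lines_of_code):
    print(f"{module}: {score}")
```

The point of the product is only that a module has to be both large and frequently changed to rise to the top; any other way of combining the two measures would serve the same purpose.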

Tooling and Setup

I’ll present this in such a way that you should be able to run all of this on Windows as well as any POSIX system. If on Windows, I would recommend using the Git Bash shell or Windows Subsystem for Linux. To get the most out of this, you’ll need Git installed as we’re going to be cloning repositories. To run the primary tool, code-maat, you’ll need a Java runtime installed. To run some of the scripts, you’ll need Python installed. The code-maat tool mentioned is written in Clojure but I’m providing a JAR file here called maat.jar that you can use.

It will also help to get a tool called cloc. From the GitHub page, you can see the various ways you can install that on any operating system.

If you are going to follow along, here’s what I recommend.

Create a directory somewhere called crime_scenes
In that directory download the maat.jar file that I provide.
In that directory create a scripts directory.

This directory is where we’ll do all of our work.

If you’re not going to follow along, that’s entirely fine. I will be providing commands to use the tools but I will also indicate the output.

Getting Started

Okay, for this post, let’s start simple. First we need a repo we can play around with. For purposes of this initial example, I’ve chosen site_prism. So in your crime_scenes directory, do the following:

git clone https://github.com/site-prism/site_prism.git

Once you have that cloned, make sure you cd site_prism to go into that directory.

Generate Your Evolution History

The general procedure here is that you first generate a git log file:

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames > evo.log

This is essentially a log showing the evolution of the code base. With that log file in place, we can now run some tooling to get information. Assuming you set things up as I suggested above, you should be able to do this:

Generate Summary Metrics

java -jar ../maat.jar -l evo.log -c git2 -a summary

Obviously source code repositories evolve — that’s the whole point of this — which means that by the time you run these commands, you might have different metrics than I do. That’s fine. But here’s the output I got as I wrote this post:

statistic, value
number-of-commits, 1075
number-of-entities, 206
number-of-entities-changed, 3010
number-of-authors, 74

That tells me some interesting stuff: 74 people have worked on this code repository and there are 206 “entities” that can be considered when trying to understand the evolution of the code base.
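To make those summary numbers less of a black box, here is a minimal Python sketch that parses a log in the format produced by the git log command above and recomputes the same four statistics. The names Change, parse_evo_log and summarize are hypothetical, the sample log is invented, and the way each statistic is counted is my reading of the output rather than a description of how code-maat is implemented.

```python
from typing import NamedTuple


class Change(NamedTuple):
    rev: str
    date: str
    author: str
    added: int
    deleted: int
    entity: str


def parse_evo_log(text):
    """Parse '--hash--date--author' headers followed by numstat lines."""
    changes = []
    rev = date = author = None
    for line in text.splitlines():
        if line.startswith("--"):
            # Header written by --pretty=format:'--%h--%ad--%aN'
            rev, date, author = line[2:].split("--", 2)
        elif line.strip():
            # numstat line: added<TAB>deleted<TAB>path ("-" marks binary files)
            added, deleted, entity = line.split("\t", 2)
            changes.append(Change(
                rev, date, author,
                0 if added == "-" else int(added),
                0 if deleted == "-" else int(deleted),
                entity,
            ))
    return changes


def summarize(changes):
    """Assumed definitions: distinct commits, files, authors; total file changes."""
    return {
        "number-of-commits": len({c.rev for c in changes}),
        "number-of-entities": len({c.entity for c in changes}),
        "number-of-entities-changed": len(changes),
        "number-of-authors": len({c.author for c in changes}),
    }


# Invented sample in the same shape as evo.log.
SAMPLE = (
    "--a1b2c3d--2021-09-24--Ada\n"
    "10\t2\tlib/page.rb\n"
    "3\t0\tREADME.md\n"
    "\n"
    "--e4f5a6b--2021-09-20--Grace\n"
    "1\t1\tlib/page.rb\n"
)

for statistic, value in summarize(parse_evo_log(SAMPLE)).items():
    print(f"{statistic}, {value}")
```

To try it on a real log, read the file first, for example parse_evo_log(open("evo.log", encoding="utf-8").read()). Commits that touch no files produce no rows in this sketch, so its counts may not match the tool exactly.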

Generate Revisions Metrics

Let’s try a variation on the above command:

java -jar ../maat.jar -l evo.log -c git2 -a revisions

Here instead of a “summary”, I’m getting “revisions.” This will be a much longer list so I’ll only provide some of the output here:

entity, n-revs
README.md, 167
lib/site_prism/element_container.rb, 119
site_prism.gemspec, 113
lib/site_prism/page.rb, 86
features/support/env.rb, 68
spec/page_spec.rb, 67
lib/site_prism/section.rb, 64
.rubocop.yml, 60
lib/site_prism/version.rb, 57
lib/site_prism.rb, 53
features/step_definitions/page_element_interaction_steps.rb, 51
spec/spec_helper.rb, 50
lib/site_prism/element_checker.rb, 50
spec/section_spec.rb, 50

Here “n-revs” just means “number of revisions.” It’s strangely nice to see a “README” being updated so much as it implies that it is probably being kept up to date. We can clearly see which files from the code perspective have the most revisions.

Generate Effort Allocation Metrics

So far I’ve been generating the output with an “-a” flag to provide a specific view. By default, the analysis tool provides something a bit more useful, which is the number of authors per module. Let’s take a look at that:

java -jar ../maat.jar -l evo.log -c git2

Again, the list will be long but here’s some partial output:

entity, n-authors, n-revs
README.md, 34, 167
lib/site_prism/element_container.rb, 22, 119
lib/site_prism/page.rb, 21, 86
lib/site_prism/section.rb, 15, 64
spec/page_spec.rb, 14, 67
site_prism.gemspec, 13, 113
lib/site_prism.rb, 13, 53
features/support/env.rb, 12, 68
spec/section_spec.rb, 11, 50
features/step_definitions/page_section_steps.rb, 11, 49
lib/site_prism/exceptions.rb, 11, 25
features/step_definitions/page_element_interaction_steps.rb, 10, 51

The more developers that end up working on a module, the larger the communication challenges ultimately become. Setting aside the “readme” file, the element_container.rb file is an interesting case: twenty-two developers have worked on it and it’s had 119 revisions. None of this is giving us some absolute view on truth but these metrics do provide heuristics; they tell us where a crime is likely to occur. As such, these metrics can be predictors of quality issues.
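The two counts in that table are simple to reason about once you see them as sets. Here is a minimal Python sketch, assuming each change is reduced to a (commit, author, file) triple; the function name authors_and_revisions and the sample data are hypothetical, and the sort order is my guess at matching the listing, not something taken from the tool.

```python
from collections import defaultdict


def authors_and_revisions(changes):
    """Return (entity, n_authors, n_revs) rows, most authors first."""
    revs = defaultdict(set)
    authors = defaultdict(set)
    for rev, author, entity in changes:
        revs[entity].add(rev)        # distinct commits touching the file
        authors[entity].add(author)  # distinct people touching the file
    table = [(e, len(authors[e]), len(revs[e])) for e in revs]
    return sorted(table, key=lambda row: (-row[1], -row[2], row[0]))


# Invented sample data: (commit id, author, file).
changes = [
    ("c1", "Ada", "lib/page.rb"),
    ("c1", "Ada", "README.md"),
    ("c2", "Grace", "lib/page.rb"),
    ("c3", "Ada", "lib/page.rb"),
    ("c4", "Linus", "README.md"),
]

print("entity, n-authors, n-revs")
for entity, n_authors, n_revs in authors_and_revisions(changes):
    print(f"{entity}, {n_authors}, {n_revs}")
```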

Generate Coupling Metrics

I mentioned cohesion earlier, which is a key quality metric for code. There’s another one that’s often mentioned in the same breath: coupling. At its simplest, coupling occurs when one module uses the code of another. This means there is a dependency between them. In our context, coupling refers to modules that tend to change together, meaning a change to one of them leads to a predictable change in the coupled module. Let’s see if we can analyze that:

java -jar ../maat.jar -l evo.log -c git2 -a coupling

Partial output:

entity, coupled, degree, average-revs
element_explicit_waiting.feature, element_explicit_waiting_steps.rb, 84, 10
spec/fixtures/css_page.rb, spec/fixtures/xpath_page.rb, 82, 9
section_explicit_waiting.feature, section_explicit_waiting_steps.rb, 75, 8
section_explicit_waiting.feature, element_explicit_waiting_steps.rb, 66, 9
element_explicit_waiting.feature, section_explicit_waiting.feature, 66, 8
lib/prismatic/element_container.rb, lib/prismatic/section.rb, 66, 8

(To make the content fit nicely, I removed some directory names.)

That second column of “coupled” provides the name of a logically coupled module to the one mentioned in the first column. The “degree” column then indicates the coupling as a percentage. The “average-revs” refers to the average number of revisions of the two modules when taken together. So what we see is that when the file element_explicit_waiting.feature is updated, the file element_explicit_waiting_steps.rb is also updated. Meaning, when we update the former, our history shows there is an 84% chance we will update the latter.

Now, here’s an interesting point: in this case, there’s a great reason those files should change together. Those are test files and the feature file is a specification that delegates down to the Ruby module that contains steps to carry out that specification. So this actually makes a lot of sense. But if there was actually little or no reason those files should change together, the analysis would be pointing us to a part of the code worth investigating.
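To make the “degree” column concrete, here is a minimal Python sketch of one common way to compute logical coupling: the number of commits two files share, divided by the average number of revisions of the two files, expressed as a percentage. I believe this is close to what the tool reports, but treat that as an assumption; the function name coupling and the sample data are hypothetical, and the sketch leaves out the filtering of rarely changed pairs that a real tool would apply.

```python
from collections import Counter
from itertools import combinations


def coupling(changes):
    """changes: (commit, file) pairs -> (file_a, file_b, degree, average_revs) rows."""
    files_by_rev = {}
    for rev, entity in changes:
        files_by_rev.setdefault(rev, set()).add(entity)

    revs = Counter()    # commits per file
    shared = Counter()  # commits per pair of files changed together
    for entities in files_by_rev.values():
        revs.update(entities)
        shared.update(combinations(sorted(entities), 2))

    rows = []
    for (a, b), n_shared in shared.items():
        average = (revs[a] + revs[b]) / 2
        rows.append((a, b, round(100 * n_shared / average), average))
    # Strongest coupling first.
    return sorted(rows, key=lambda r: (-r[2], r[0], r[1]))


# Invented sample data: (commit id, file).
changes = [
    ("c1", "x.feature"), ("c1", "x_steps.rb"),
    ("c2", "x.feature"), ("c2", "x_steps.rb"),
    ("c3", "x.feature"),
    ("c4", "x_steps.rb"), ("c4", "page.rb"),
]

print("entity, coupled, degree, average-revs")
for a, b, degree, average in coupling(changes):
    print(f"{a}, {b}, {degree}, {average}")
```

In the sample, the feature file and its steps file each have three revisions and share two of them, so the degree works out to two divided by three, or roughly 67 percent.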

A challenge with this repo, and many like it, is that it includes tests within it. Notice that, in the above output, only one of those lines (the one with “lib”) is actual source code that delivers our features. All of the rest are tests. We can exclude material like that. Just regenerate your evo.log file with an “exclusion” in place:

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames -- . ":(exclude)spec/*" > evo.log

Now rerun the analysis:

java -jar ../maat.jar -l evo.log -c git2 -a coupling

We’re still getting the features and test_site directories, which are also less than helpful for analysis. You can exclude those too:

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames -- . ":(exclude)spec/" ":(exclude)features/" ":(exclude)test_site/" > evo.log

Rerun the analysis and now we get a much more condensed list!

entity, coupled, degree, average-revs
lib/prismatic/element_container.rb, lib/prismatic/section.rb, 66, 8
lib/site_prism/page.rb, lib/site_prism/section.rb, 42, 75
lib/prismatic/page.rb, lib/prismatic/section.rb, 37, 16
lib/prismatic/element_container.rb, lib/prismatic/page.rb, 36, 17
lib/prismatic.rb, lib/prismatic/page.rb, 32, 16
HISTORY.md, lib/site_prism/version.rb, 31, 45
lib/site_prism.rb, lib/site_prism/exceptions.rb, 30, 39

Notice that element_container.rb file again. Our earlier analysis showed us that twenty-two developers worked on that file for a total of 119 changes. Here we see that when those changes are made, there is a 66% chance that another module (section.rb) will have to be changed as well.

Generate Age Metrics

A lot of the above is looking at a certain amount of what we might call “spatial” information. There are certain modules and they are changed by someone. But “temporal” information can also be quite helpful. After all, spatial and temporal coupling (two concepts I talked about in my plea for testability) are very indicative of quality. In our context, the change frequency of our code base certainly impacts how much it evolves and provides another way to view stability. So let’s try that analysis:

java -jar ../maat.jar -l evo.log -c git2 -a age

This will be another long list so here’s some partial output:

entity, age-months
.rubocop.yml, 1
.ruby-version, 1
lib/site_prism/waiter.rb, 1
CHANGELOG.md, 1
lib/site_prism/deprecator.rb, 3
README.md, 3
lib/site_prism/error.rb, 3
lib/site_prism/dsl_validator.rb, 3
lib/site_prism/dsl.rb, 3
site_prism.gemspec, 3
lib/site_prism/addressable_url_matcher.rb, 3
HACKING.md, 3
Rakefile, 3
lib/site_prism.rb, 3
lib/site_prism/version.rb, 3
lib/site_prism/page.rb, 3

So what is that actually telling us? It’s showing each module based on the date of its last change, with the measurement unit in months. So those are the most recently changed files. Since element_container.rb showed up on a few of our metrics before, where is it here? Well, at the time I generated this, it shows up on the list as:

entity, age-months
lib/site_prism/element_container.rb, 35

So it’s actually been a while since that module has been changed. And while the section.rb file was shown as coupled to it, it shows up on the list as such:

entity, age-months
lib/prismatic/section.rb, 120

So that’s potentially good to know. Our most coupled modules that had the most authors making the most changes actually haven’t been updated in a while. Thus we do seem to have some stability.
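The age measure is just date arithmetic on the most recent change to each file. Here is a minimal Python sketch, assuming “age” means the number of completed months between a file’s last change and a reference date; the function names, the sample data, and that rounding choice are all assumptions, and the tool may count months slightly differently.

```python
from datetime import date


def age_in_months(last_changed, today):
    """Completed calendar months between two dates (never negative)."""
    months = (today.year - last_changed.year) * 12 + (today.month - last_changed.month)
    if today.day < last_changed.day:
        months -= 1  # the current month has not been completed yet
    return max(months, 0)


def entity_ages(changes, today):
    """changes: (ISO date, file) pairs -> (file, age_months) rows, youngest first."""
    last_change = {}
    for day, entity in changes:
        changed_on = date.fromisoformat(day)
        if entity not in last_change or changed_on > last_change[entity]:
            last_change[entity] = changed_on
    rows = [(e, age_in_months(d, today)) for e, d in last_change.items()]
    return sorted(rows, key=lambda r: (r[1], r[0]))


# Invented sample data: (date of change, file).
changes = [
    ("2021-11-26", "lib/waiter.rb"),
    ("2018-12-01", "lib/container.rb"),
    ("2021-09-24", "lib/waiter.rb"),
    ("2021-08-23", "README.md"),
]

print("entity, age-months")
for entity, months in entity_ages(changes, today=date(2021, 12, 15)):
    print(f"{entity}, {months}")
```

Passing the reference date in explicitly, rather than using the current date, keeps the result reproducible when you rerun the analysis later.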

Generate Churn Metrics

Churn is basically change, and we’ve already established that the research shows modules with a higher amount of churn tend to have more quality issues, which shouldn’t be a surprise to anyone. One handy metric is to look at what’s called accumulated churn:

java -jar ../maat.jar -l evo.log -c git2 -a abs-churn

This will be a long list but here’s a partial example:

date, added, deleted, commits
2011-12-12, 25, 0, 1
2011-12-13, 72, 17, 10
2011-12-16, 2, 1, 1
2011-12-18, 94, 12, 8
2011-12-20, 80, 9, 5
2011-12-21, 154, 79, 8
2011-12-22, 304, 253, 4
2011-12-23, 9, 0, 3
2012-01-11, 10, 4, 3
......
2021-08-23, 107, 86, 10
2021-08-24, 4, 8, 2
2021-08-31, 34, 42, 4
2021-09-01, 5, 0, 1
2021-09-14, 1, 1, 1
2021-09-17, 38, 42, 8
2021-09-20, 8, 2, 1
2021-09-24, 44, 12, 2
2021-11-26, 12, 5, 5

This is churn accumulated per date, starting with the birth of the code base and going up to the most recent work. This would be a great bit of data to put into a graph for better visualization. Of course, it’s just a very high-level view over time, not broken down by any other characteristic such as the authors of changes or the modules being changed. We can augment our analysis to get that information:

java -jar ../maat.jar -l evo.log -c git2 -a author-churn

Here’s an example of some output:

author, added, deleted, commits
3coins, 47, 47, 1
Adam Rice, 4, 4, 1
Alan Savage, 56, 39, 5
Andrey Botalov, 30, 20, 4
Andy Waite, 0, 1, 1
Antonio Santos, 4, 4, 1
Anuj Sharma, 25, 14, 4
Ben Lovell, 1, 1, 1
Betsy Haibel, 11, 10, 1
Bradley Schaefer, 3, 2, 2

We can also get a measure of the modules (which the tool refers to as “entities”):

java -jar ../maat.jar -l evo.log -c git2 -a entity-churn

Here’s some partial output of that:

entity, added, deleted, commits
README.md, 2833, 1047, 167
lib/site_prism/element_container.rb, 1390, 1396, 117
CHANGELOG.md, 1379, 153, 46
HISTORY.md, 1277, 1269, 33
lib/site_prism/page.rb, 533, 363, 86
lib/site_prism/dsl.rb, 531, 161, 34

Here we see some files (like those markdown files) that we might want to exclude from our list. Notice also that our friend element_container.rb is back front and center. Keep in mind that our previous temporal analysis showed that file hasn’t been updated much lately. Still, this result gives us confidence in the analysis, because it makes sense that a file with that many revisions and authors would also show a lot of churn.
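The three churn views above (by date, by author, by entity) are the same sum taken under different keys. Here is a minimal Python sketch of that idea, assuming each change is a small dictionary; the function name churn_by, the field names, and the sample rows are hypothetical and only meant to show the grouping, not to reproduce the tool.

```python
from collections import defaultdict


def churn_by(rows, key):
    """Sum added/deleted lines and count distinct commits per value of `key`."""
    totals = defaultdict(lambda: [0, 0, set()])
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += row["added"]
        bucket[1] += row["deleted"]
        bucket[2].add(row["rev"])
    return {k: (added, deleted, len(revs)) for k, (added, deleted, revs) in totals.items()}


# Invented sample rows.
rows = [
    {"rev": "c1", "date": "2021-09-17", "author": "Ada", "entity": "lib/page.rb", "added": 10, "deleted": 2},
    {"rev": "c1", "date": "2021-09-17", "author": "Ada", "entity": "README.md", "added": 3, "deleted": 0},
    {"rev": "c2", "date": "2021-09-17", "author": "Grace", "entity": "lib/page.rb", "added": 1, "deleted": 4},
    {"rev": "c3", "date": "2021-09-20", "author": "Ada", "entity": "lib/page.rb", "added": 5, "deleted": 5},
]

for key in ("date", "author", "entity"):
    print(f"{key}, added, deleted, commits")
    for value, (added, deleted, commits) in sorted(churn_by(rows, key).items()):
        print(f"{value}, {added}, {deleted}, {commits}")
```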

Generate People-Focused Metrics

It can be really useful to see how people’s efforts are distributed. We already know, for example, that many developers were working on one particular module. Let’s look at the idea of ownership, a loose term that just tends to mean “someone tends to work on this module more than other people.”

java -jar ../maat.jar -l evo.log -c git2 -a entity-ownership

Here’s some sample output:

entity, author, added, deleted
lib/prismatic/element_container.rb, Nat Ritmeyer, 114, 114
lib/prismatic/exceptions.rb, Nat Ritmeyer, 7, 7
lib/prismatic/page.rb, Nat Ritmeyer, 205, 205
lib/prismatic/section.rb, Nat Ritmeyer, 31, 31
lib/site_prism.rb, Ricardo Matsui, 2, 1

We can clearly see that one developer was making a lot of changes, which suggests perhaps some centralized knowledge. But we can also look at an individual file from that list. For example, let’s take our friend element_container.rb:

entity, author, added, deleted
lib/site_prism/element_container.rb, Ricardo Matsui, 56, 15
lib/site_prism/element_container.rb, Nat Ritmeyer, 468, 339
lib/site_prism/element_container.rb, tmertens, 35, 30
lib/site_prism/element_container.rb, 3coins, 35, 35
lib/site_prism/element_container.rb, Jonathan Chrisp, 127, 129
lib/site_prism/element_container.rb, Mike Kelly, 16, 5
lib/site_prism/element_container.rb, John Wakeling, 13, 0
lib/site_prism/element_container.rb, remi, 2, 1
lib/site_prism/element_container.rb, Dmitriy Nesteryuk, 23, 21
lib/site_prism/element_container.rb, Michael J. Nohai, 5, 5
lib/site_prism/element_container.rb, Tim Mertens, 5, 5
lib/site_prism/element_container.rb, Jason Hagglund, 10, 6
lib/site_prism/element_container.rb, Betsy Haibel, 4, 3
lib/site_prism/element_container.rb, Travis Gaff, 4, 4
lib/site_prism/element_container.rb, Ivan Neverov, 256, 215

Here we see a more nuanced view of the work on that particular module. We can also refine our view to consider the specific effort that is made by individual developers on the different modules in the code base.

java -jar ../maat.jar -l evo.log -c git2 -a entity-effort

Some output:

entity, author, author-revs, total-revs
lib/prismatic.rb, Nat Ritmeyer, 6, 6
lib/prismatic/element_container.rb, Nat Ritmeyer, 8, 8
lib/prismatic/exceptions.rb, Nat Ritmeyer, 5, 5
lib/prismatic/page.rb, Nat Ritmeyer, 25, 25
lib/prismatic/section.rb, Nat Ritmeyer, 7, 7
lib/site_prism.rb, Ricardo Matsui, 1, 53
lib/site_prism.rb, Nat Ritmeyer, 8, 53
lib/site_prism.rb, tmertens, 2, 53
lib/site_prism.rb, Jonathan Chrisp, 3, 53

This is nice because even if a lot of developers have touched certain modules, we can get a good view of who specifically did what with which modules and this can help us narrow down who we might need to speak with if we have questions about a given module.
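If the goal is to find who to talk to about a module, the effort table can be boiled down to each author’s share of a file’s revisions. Here is a minimal Python sketch of that reduction; the function name entity_effort, the added share column, and the sample data are hypothetical and go slightly beyond what the tool prints.

```python
from collections import defaultdict


def entity_effort(changes):
    """changes: (commit, author, file) -> {file: [(author, author_revs, total_revs, share)]}."""
    revs_by_author = defaultdict(lambda: defaultdict(set))
    for rev, author, entity in changes:
        revs_by_author[entity][author].add(rev)

    report = {}
    for entity, authors in revs_by_author.items():
        total = len(set().union(*authors.values()))  # distinct commits on the file
        ranked = sorted(
            ((author, len(revs)) for author, revs in authors.items()),
            key=lambda item: (-item[1], item[0]),
        )
        report[entity] = [(a, n, total, round(n / total, 2)) for a, n in ranked]
    return report


# Invented sample data: (commit id, author, file).
changes = [
    ("c1", "Ada", "lib/page.rb"),
    ("c2", "Ada", "lib/page.rb"),
    ("c3", "Ada", "lib/page.rb"),
    ("c4", "Grace", "lib/page.rb"),
    ("c5", "Grace", "lib/section.rb"),
]

for entity, rows in entity_effort(changes).items():
    for author, author_revs, total_revs, share in rows:
        print(f"{entity}, {author}, {author_revs}, {total_revs}, {share}")
```

The first row for each file is the person with the most revisions, which is only a starting point for a conversation, in keeping with the heuristic spirit of these metrics.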

Heuristics, Not Truths

What we discovered here are some heuristics that can guide further exploration or further questions. None of this analysis, beyond the raw numbers, is meant to indicate some absolute truth about the code base. The fact that one developer may have worked on a particularly important module that has a lot of bugs really tells us nothing more than a complicated module that changes a lot is likely to have bugs. Would having another developer take over some of that work end up reducing bugs? Actually, probably not.

But we can start looking at questions that, as an industry, we’re still not all that good at answering. For example, do we understand how multiple developers (or multiple teams) influence code quality? We really don’t, even with all the research we have out there. Can we reliably predict which sections of code have the most quality problems and also maybe the steepest learning curves? We’re still not really good at that except in the most obvious ways.

Next Steps …

To see if we have a chance of answering those questions in any way that has more probative than prejudicial value, we have to keep leveraging the analysis we started in this post. In the next post, we’ll start using these tools a little more with some other repos out there and see if we can expand our insights a bit.

Stories from a Software Tester
