Testing at the Crime Scene, Part 4

In the previous post we did some deep dives, using all the techniques of this series so far, to try to get a feel for the overall landscape of a code repo and look for clues. Now let’s narrow our focus again a bit and then wrap up this series with a few points about the journey we’ve taken.

For this post, we’re going to look at an entirely new code repo for a project called RuneLite. This is a repository for a client for the “old school” version of the RuneScape game. “Old school” here was the official term for a previous version of the game that was introduced from a backup of the RuneScape source code from 10 August 2007.

Let’s get our crime scene:

git clone https://github.com/runelite/runelite.git

Make sure to cd runelite to get into the crime scene.

This project has a development history that goes back to 2014 and it gained a lot of contributors in 2018 into 2019.

What we can glean is a lot of activity in 2016, some spikes in 2017, and then some regular cadence in 2018 and 2019 that begins to taper off. This is good. We have a history of the crime scene, albeit one at a very high level.

Let’s get an evolutionary log of everything before 2018.

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --before=2018-01-01 -- . ":(exclude)src/main/java/class*" > evo.log

Then we’ll do a sum over coupling analysis, which I introduced in the previous post:

java -jar ../maat.jar -l evo.log -c git2 -a soc

When you look for modules of architectural significance in the results, you’re going to want to ignore things like project setup files. With this project, you’ll see a whole list of pom.xml files, for example.

entity,                   soc
runelite-client/pom.xml,  1364
pom.xml,                  1364
runelite-api/pom.xml,     1130
cache/pom.xml,            1125
runescape-api/pom.xml,    1003
model-viewer/pom.xml,     881
http-service/pom.xml,     608

You want to be looking at actual code which, in this case, means .java files.

entity,                                                                       soc
runelite-client/src/main/java/net/runelite/client/RuneLite.java,              812
runelite-client/src/main/java/net/runelite/client/plugins/PluginManager.java, 775
....
runelite-api/src/main/java/net/runelite/api/Client.java,                      589
....
runelite-api/src/main/java/net/runelite/api/Actor.java,                       435

There we see some classes that appear to have the most cases of temporal coupling to other modules. Keep in mind that temporal coupling means that some entities change together over time. Given the nature of an MMO-style game like this, finding a class called “Client” in the API makes sense. The RuneLite class, in a “client” directory, also makes sense.
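To make the idea concrete, here’s a minimal sketch in Python of how a temporal coupling degree can be computed from commit history. This is an illustration with hypothetical file names, not what Code Maat actually runs: count the commits that two files share and express that as a percentage of their average revision count.

```python
from collections import Counter
from itertools import combinations

def temporal_coupling(commits):
    """Degree of temporal coupling for each file pair seen in the commits."""
    revs = Counter()    # how many commits touched each file
    shared = Counter()  # how many commits touched each pair together
    for files in commits:
        for f in files:
            revs[f] += 1
        for pair in combinations(sorted(files), 2):
            shared[pair] += 1
    # Degree: shared commits as a percentage of the pair's average revision count.
    return {
        (a, b): round(100 * n / ((revs[a] + revs[b]) / 2))
        for (a, b), n in shared.items()
    }

# Hypothetical commit history: each entry is the set of files in one commit.
history = [
    {"Client.java", "Actor.java"},
    {"Client.java", "Actor.java"},
    {"Client.java", "RuneLite.java"},
    {"Actor.java"},
]
print(temporal_coupling(history))
# {('Actor.java', 'Client.java'): 67, ('Client.java', 'RuneLite.java'): 50}
```

Real tooling layers filtering and thresholds on top of this, but the core counting really is that simple.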

We want to use temporal coupling to track architectural problems. Yet, as stated a few times in the previous post, it’s the trends that matter. So let’s perform the trend analysis step by step so that we can understand what’s happening. The nice thing about this analysis is that each step is nearly identical; the only thing that changes is the time period that the analysis is applied to.

Tracking Trends

To track the architectural evolution of the API’s Client.java, we’re going to perform a trend analysis. The first step is to identify the periods of time that we want to compare. So let’s consider two analysis periods to gather clues from. Based on what we saw above from the GitHub statistics, let’s take one grouping as everything before 2018 and save that data to a file.

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --before=2018-01-01 -- . ":(exclude)src/main/java/class*" > evo_pre2018.log

Now use that data for coupling analysis, also saving that to a file:

java -jar ../maat.jar -l evo_pre2018.log -c git2 -a coupling > coupling_pre2018.csv

To spot trends we need more sample points. We’ll define the second analysis period as the development activity in 2018 until 2020. We just have to change the filenames and exclude commit activity before 2018, once again writing our results to a file.

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --after=2018-01-01 --before=2020-01-01 -- . ":(exclude)src/main/java/class*" > evo_pre2020_post2018.log

And once again run the analysis with, you guessed it, saving the results to a file:

java -jar ../maat.jar -l evo_pre2020_post2018.log -c git2 -a coupling > coupling_pre2020_post_2018.csv

Now we have two sampling points at different stages in the development history of the project. We saved a series of data as a result of this:

  • evo_pre2018.log
  • coupling_pre2018.csv
  • evo_pre2020_post2018.log
  • coupling_pre2020_post_2018.csv

What you would want to do is gather your two data files full of clues — coupling_pre2018.csv and coupling_pre2020_post_2018.csv — in a spreadsheet program. Then you can focus on a given module listed in the report and, across the two timeframes, see if the temporal coupling has increased for a given module.
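If you’d rather not do that comparison by hand in a spreadsheet, a small script can do the merge for you. This is just a sketch, assuming the standard code-maat coupling report columns (entity, coupled, degree, average-revs) and using inline stand-ins for the two CSV files:

```python
import csv

def load_coupling(lines):
    """Map (entity, coupled) pairs to their coupling degree from a report."""
    return {
        (row["entity"].strip(), row["coupled"].strip()): int(row["degree"])
        for row in csv.DictReader(lines)
    }

def coupling_trend(before, after):
    """Per-pair change in degree across two periods; new pairs count from zero."""
    return {pair: degree - before.get(pair, 0) for pair, degree in after.items()}

# Inline stand-ins for coupling_pre2018.csv and coupling_pre2020_post_2018.csv.
pre2018 = ["entity,coupled,degree,average-revs",
           "WidgetID.java,WidgetInfo.java,88,22"]
post2018 = ["entity,coupled,degree,average-revs",
            "WidgetID.java,WidgetInfo.java,76,141",
            "ServiceWorldType.java,WorldType.java,66,9"]
print(coupling_trend(load_coupling(pre2018), load_coupling(post2018)))
# {('WidgetID.java', 'WidgetInfo.java'): -12, ('ServiceWorldType.java', 'WorldType.java'): 66}
```

With real files you’d pass `open("coupling_pre2018.csv")` and so on; positive numbers flag pairs whose coupling grew between the two periods.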

One thing you’re going to notice is that if the project changes quite a bit over time, this can be very hard to do. For example, in coupling_pre2018.csv, you should see:

entity                      coupled                      degree  average-revs
api/widgets/WidgetID.java   api/widgets/WidgetInfo.java  88      22

If you look in coupling_pre2020_post_2018.csv, you should see:

entity                      coupled                      degree  average-revs
api/widgets/WidgetID.java   api/widgets/WidgetInfo.java  76      141

The coupling has remained roughly the same but, crucially, you’ll find that WidgetID.java has not picked up any other temporal coupling in that time frame. That’s a good thing!

Likewise, we can see another module called WorldType.java. If we check that in our second data set (coupling_pre2020_post_2018.csv), we see:

entity                                     coupled              degree  average-revs
http/service/worlds/ServiceWorldType.java  api/WorldType.java   66      9
http/api/worlds/WorldType.java             api/WorldType.java   66      9

That pairing doesn’t even show up in the first data set. Is that because the module didn’t change enough to show coupling? Or because the module didn’t exist at all yet? Either way, we know we can now probably do some analysis after 2020 and see if the WorldType.java coupling has increased, decreased, or stayed the same.

Let’s try it:

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --after=2020-01-01 -- . ":(exclude)src/main/java/class*" > evo_post2020.log

Then run the analysis:

java -jar ../maat.jar -l evo_post2020.log -c git2 -a coupling > coupling_post_2020.csv

And what you should see is that in this report, WorldType.java doesn’t show up at all!

If you do some analysis of your own of the files between the three data sets we have, I think you’ll conclude that the RuneLite project does not have a lot of temporal coupling over time. If I were developing and/or maintaining that project, I would be pretty proud of that.

Code Bases Evolve!

One other thing to note: you can also tell by looking at the first two data sets that the idea of plugins gained a massive amount of significance with this project starting post 2018. What this shows us is that the architecture likely changed significantly. That means comparing this repository prior to that architecture change and after it is unlikely to tell us much. And that’s worth knowing! But we also took a third data set and didn’t see a lot of coupling either. Also worth knowing!

If we did see a lot of data points indicating increasing coupling, what would that mean? It would mean we have crimes being committed! Specifically, we are seeing a bit of architectural decay. And if that were the case, you would then likely want to generate some of the visualizations I’ve shown you in the previous posts to better illustrate the situation.

But wait! There’s something in our analysis to be aware of.

Coupled modules themselves are not necessarily a problem. Even the number of coupled modules may not be a problem. What matters here is a phrase I just used: architectural decay. Think of this as parts of the architecture falling apart, such that the boundaries between them are less solid. Thus work in one architectural area bleeds into another. So what you want to look for in your analysis is coupled modules located in parts of the repository that are (or should be) architecturally distinct.

And this brings up a really good point that I think is worth digging in to. All of this analysis we’ve been doing is really in service of answering two questions:

  1. What should I be concerned about? (What trends have I seen?)
  2. What should I prioritize? (What trends actually matter?)

What To Spend Time On

Essentially we have lots of clues and we can choose to spend time on only some of those. How do we choose?

Research has certainly shown that most code repositories follow what’s called a power law distribution. This means the majority of our code sits in the “long tail” of that distribution which, in our context, contains the code that is rarely touched.

But wait! That means literally what it sounds like: most of our code is rarely touched! If you look at the research on large code bases, that’s very much the case. And what that suggests is that most of our code really just isn’t that important from a cost or quality perspective. What that further means is that most of our development activity will be focused on a relatively small part of the overall code base.
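As a toy illustration of that concentration, here’s a sketch (with made-up frequencies, not RuneLite’s actual numbers) that computes what share of all changes lands on the most-changed ten percent of modules:

```python
def change_concentration(frequencies, top_fraction=0.1):
    """Share of all changes that land on the most-changed top_fraction of modules."""
    counts = sorted(frequencies, reverse=True)
    cutoff = max(1, int(len(counts) * top_fraction))
    return sum(counts[:cutoff]) / sum(counts)

# Made-up change frequencies for ten modules, shaped like a power law.
freqs = [120, 40, 15, 8, 5, 3, 2, 2, 1, 1]
print(round(change_concentration(freqs), 2))  # 0.61: one module takes ~61% of changes
```

Feed it the counts from a real frequency analysis and you can measure just how long your own long tail is.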

So what we’re saying here is that the modules that attract the most changes are the ones central to the value your overall code provides. Yet, as we’ve discussed in these posts, the modules with high change frequencies also suffer the most quality problems. And so we are led to our conclusion: we want to focus our improvement work on those specific areas of high change frequency.

It’s very simple to find this information. Remember: our version control log is a history. So you just count the number of times each module is referenced in your Git log and sort the results. Let’s try it out:

git log --format=format: --name-only | egrep -v '^$' | sort | uniq -c | sort -r | head -5

Reminder for Windows users: the above should work in a Git Bash shell and certainly in a Windows Subsystem for Linux context. It will not work in the standard “command” terminal or PowerShell.

The --format=format: option gives you a plain list of all modules that have been changed. The egrep -v '^$' part cleans up the resulting data by removing any blank lines from the preceding Git command. Then the rest of the shell commands count the change frequencies and deliver the results in sorted order. Worth noting too is that we limited the number of results here with head -5.

Let’s inspect the change frequencies for all code modules and let’s get that data persisted. Change the above command to this:

git log --format=format: --name-only | egrep -v '^$' | sort | uniq -c | sort -r > all_frequencies.txt

You might already intuit a challenge with this sort of analysis: your Git log output can reference modules that no longer exist. Maybe they were deleted; maybe they were renamed. That’s just something to be aware of.
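If those ghost modules become noise, one option is to cross-check the frequency list against what’s currently in the working tree (for example, the output of git ls-files). Here’s a sketch of that filtering, assuming the uniq -c style lines we generated above:

```python
def existing_only(frequency_lines, existing_paths):
    """Keep only 'count path' frequency entries for files still in the tree."""
    kept = []
    for line in frequency_lines:
        count, _, path = line.strip().partition(" ")
        if path in existing_paths:
            kept.append((int(count), path))
    return kept

# Hypothetical inputs: lines from all_frequencies.txt and a git ls-files snapshot.
lines = ["   42 runelite-api/src/Client.java", "    7 old/Removed.java"]
tracked = {"runelite-api/src/Client.java"}
print(existing_only(lines, tracked))  # [(42, 'runelite-api/src/Client.java')]
```

In practice you’d build the `tracked` set from `git ls-files` output; the sketch assumes paths without embedded spaces.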

You no doubt are going to see a lot of those pom.xml type files at the top of that list you just generated. So you might want to generate a list that doesn’t include those. Along with that, in large codebases, you may want to run the analyses on each subsystem within your project. You do that by specifying the path to the root folder of each subsystem. In the context of RuneLite, we could do the following:

git log --format=format: --name-only --before=2020-01-01 --after=2019-01-01 -- runelite-client/src | egrep -v '^$' | sort | uniq -c | sort -r

git log --format=format: --name-only --before=2021-01-01 --after=2020-01-01 -- runelite-api/src | egrep -v '^$' | sort | uniq -c | sort -r

Another little trick is that you can use git rev-list --count HEAD to aggregate all contributions and calculate hotspots on the level of logical components. You can run this kind of analysis on individual directories too, in case they align with the logical components in your codebase. In the RuneLite context, you can try this:

git rev-list --count HEAD -- runelite-client/src

git rev-list --count HEAD -- runelite-client/src/main/java/net/runelite/client/game

You can also dig into specific modules within your code base. This can get a little tricky since obviously at that point you’re also looking at some language-specific details. But in terms of just base data, you can specify the -L option, which instructs Git to fetch each historic revision based on the range of lines of code that make up a given function.

I’ve found this concept is not well known. The funcname is where the language-aware aspect comes in. In order to get this next bit to work, create a .gitattributes file in the main runelite project folder. Put the following line in that file:

*.java diff=java

With that in place, Git is able to detect Java function and/or method declarations, which it then treats as funcnames. Here’s an example that looks at the onItemQuantityChanged function in the LootManager.java module.

git log -L:onItemQuantityChanged:runelite-client/src/main/java/net/runelite/client/game/LootManager.java

Because Git has no real concept of what a Java method is, you’re not really tracking the history of this specific method here. Instead, you’re tracking lines that probably contain a method or function with the same name.

But this is pretty cool stuff because the command outputs the complete state of the code as it looked in each revision. This comes in handy if you want to calculate complexity trends at such a granular level. If you just want a proxy for the likely technical debt in this context, you can count the change frequency of the hotspot function. Here’s an example of what I mean:

git log -L:onItemQuantityChanged:runelite-client/src/main/java/net/runelite/client/game/LootManager.java --before=2019-01-01 | grep 'commit ' | wc -l

Here grep picks out each commit line, wc -l counts them, and the --before option limits the data to a specific development period.

Another little trick to try out is the command git shortlog -s, which gives you a list of all contributing authors, including a count of their commits. You can run this command on a specific directory, too, by specifying a path after a double dash. Here’s an example:

git shortlog -s -- runelite-client/src

Just as in earlier examples, you can limit the depth of your analysis to a specific time period with the --after option. This is useful to get information on the recent amount of parallel development in a given module. You can also summarize the number of unique authors by piping the output like this:

git shortlog -s --after=2019-01-01 -- runelite-client/src/ | wc -l

Code age can also be calculated and that’s done in two steps. First you have to fetch a list of all modules in the repository (with the git ls-files command) and then you have to request the modification date of each module using a variation of git log. Here’s an example:

git log -1 --format="%ad" --date=short -- runelite-client/src/main/java/net/runelite/client/game/LootManager.java
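Gluing those two steps together is mostly plumbing. Assuming you’ve captured one “date path” line per module (say, by looping the output of git ls-files through the git log command above), a sketch that sorts the results oldest-first might look like this:

```python
from datetime import date

def by_age(lines):
    """Sort 'YYYY-MM-DD path' lines oldest-first to surface long-stable code."""
    entries = []
    for line in lines:
        day, _, path = line.strip().partition(" ")
        entries.append((date.fromisoformat(day), path))
    return sorted(entries)

# Hypothetical per-module last-modified dates.
sample = ["2019-05-01 client/Widget.java", "2014-03-02 api/Client.java"]
print(by_age(sample)[0])  # (datetime.date(2014, 3, 2), 'api/Client.java')
```

Old, rarely touched code is often code that has stabilized, which is itself useful information when deciding where testing effort should go.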

I threw a lot of stuff at you here and the one thing I want you to take from the above is that you really should explore your version control system and see what you can get out of it.

Explore Boundaries

In the testing world, boundaries are crucial. They’re where the most bugs tend to congregate; they’re where the edge cases exist. So for our context here, create a file called runelite_boundaries.txt.

What are the boundaries? Well, looking at RuneLite, we see that it’s primarily made up of a RuneLite API, which provides the interfaces for accessing the client. We also see that RuneLite is made up of the game client itself, along with all those plugins. So in the file you just created, put the following:

runelite-client/src/main/java/net/runelite => CLIENT
runelite-api/src/main/java/net/runelite    => API

These are transformations. Each logical name in the transformation corresponds to one filter that you want to apply to your results. This transformation allows you to detect potentially surprising modification patterns that break the architectural principle. Let’s try it:

java -jar ../maat.jar -l evo.log -c git2 -a coupling -g runelite_boundaries.txt

And we get back:

entity, coupled,  degree,  average-revs
API,    CLIENT,   33,      230

This is helping us see the degree of coupling between the client and the API. Now, again, remember that some coupling would be expected in these kinds of applications. You could create more transformations that are more or less granular based on your code repo to start seeing what areas of the repository are coupled together. From that, you could then go into the hotspot and coupling analysis we’ve done in these posts.

Understanding Code Crime Scenes

Let’s do a recap of our journey and what we’ve seen.

Heuristics, Not Proofs

One thing to keep in mind is that none of what we investigated with our crime scenes was necessarily the final truth. It doesn’t even necessarily have to be framed as proof in some absolute sense.

Rather what we saw were heuristics that may guide us toward some version of truth — and provide situational proof — for a given context.

Analysis Fundamentals

Along the way we looked at two primary fundamental concepts:

  • hotspots
  • temporal coupling

Hotspots provide a time dimension by letting us identify code with high interest rates, which we generally frame as our “technical debt.” The time dimension also allows us to see change. Not only do the change frequencies let us identify the code where developers are doing most of the work, they also point towards potential quality problems.

But, as we saw, it’s necessary to add a second dimension to our model in order to improve its predictive power. We need to add a complexity dimension. Ideally, as we did in these posts, a good goal is to take a language-neutral approach, but without losing precision or information.

What we could then do is combine our complexity dimension with a measure of change frequency. This is what allowed us to identify hotspots that represented code with those high interest rates. Keep in mind that a hotspot is just complicated code that tends to get worked on often. As we went through our various analyses, we saw that hotspots are calculated by combining the two metrics we looked at the most:

  1. Calculate the change frequency of each module as a proxy for interest rate.
  2. Use the lines of code as a simple measure of code complexity.
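Those two steps can be sketched in a few lines. The module names and numbers below are hypothetical; the point is just the combination of the two metrics:

```python
def hotspots(change_freq, lines_of_code):
    """Rank modules by change frequency (interest rate) times size (complexity proxy)."""
    scores = {
        module: change_freq[module] * lines_of_code.get(module, 0)
        for module in change_freq
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: revision counts from the git log, sizes from a line counter.
freq = {"Client.java": 120, "Actor.java": 45, "WorldType.java": 12}
loc = {"Client.java": 3400, "Actor.java": 300, "WorldType.java": 80}
print(hotspots(freq, loc)[0])  # ('Client.java', 408000)
```

Any monotonic combination of the two metrics works; multiplication is just a simple way to make modules that are both large and frequently changed float to the top.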

With that in place, we found that we could discover how severe a potential problem is via a complexity trend analysis. This analysis looked at the accumulated complexity of the module over time. The trend is calculated by fetching each historic version of a hotspot and then calculating the code complexity of each historic revision.
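One common language-neutral proxy for that per-revision complexity calculation, used in the crime-scene style of analysis, is indentation: deeply nested code tends to be complicated code. Here’s a minimal sketch, assuming four spaces per logical level:

```python
def indentation_complexity(source, spaces_per_level=4):
    """Sum logical indentation levels over non-blank lines as a complexity proxy."""
    total = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no complexity
        expanded = line.replace("\t", " " * spaces_per_level)
        leading = len(expanded) - len(expanded.lstrip(" "))
        total += leading // spaces_per_level
    return total

snippet = "void f() {\n    if (x) {\n        y();\n    }\n}\n"
print(indentation_complexity(snippet))  # 4
```

Compute this for each historic revision of a hotspot and you have the raw data for a complexity trend, with no parser for any particular language required.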

This showed us something that we tend to know but often can’t prove very well: not all code is equal. And, as we saw, that can be true even at (or perhaps most especially at) the function/method level too. In this post we were able to do some analysis at the system level (the whole code base), at the module level (individual code files), and then elements within the code files (specific methods). You can capitalize on this aspect by running a hotspot analysis on the method level to identify the segments of code that contribute the most to the file being a hotspot.

This Helps Developers and Testers

This all seems very developer focused and, in a way, it is. Developers can use these hotspots to identify maintenance problems and focus code reviews on those areas. Testers, however, can use or generate the same information to decide where the most quality risks seem to lie. This can guide what type of testing should be put in place and relied upon in terms of finding bugs. If there’s a lot of temporal coupling, for example, then an exploratory testing strategy is going to be much more viable than even the most robust automation strategy.

Version Control is History

Version control data is a means of tracking history that you can dig into to glean insights. Being a history, the log shows a time dimension (a series of commits throughout time). It also shows who made the changes. It shows what they did. Or at least what they thought they did. Or at least how they described what they thought they did. Bare minimum: we can look at who made the most changes to a given area of code. That would at least suggest that person has a good knowledge of that code.

You can also see if developers from multiple teams have an equal share in a given area of code. This gives you some idea about how much and to what extent teams have to coordinate on given areas of the code. As just a simple matter: the decision to devote time to refactoring a given module can depend on whether this is one developer in isolation or many developers constantly working on it. If a lot of developers are currently working on the same area of code, refactoring might not be worth it at the moment.

As I hope these posts showed, version control systems like Git support a large number of options to get the historical log output in a format that can tell you interesting things. You can use those options to derive a log format containing the minimum of what you need in order to answer the analysis questions you have.

Keep It Simple

Finally, as you saw in these posts, I followed a few simple rules that guided my exploration of crime scenes and my analysis of them:

  • Focus on language-unaware tooling.
  • Focus on measures that are straightforward to implement.
  • Focus on outputs that are intuitive to reason about and verify.

When all is said and done, I hope that if you’re a developer reading this post, you’ve seen a way to look at your code. If you’re a tester reading this, I hope you’ve seen a different way to engage with developers and to better understand how to determine areas of concern for quality, particularly internal qualities.

I wish you fun times in your sleuthing career!

This article was written by Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.
