Testers and Data Science, Part 2

This post follows on from the first. Here we’ll focus on the skill of interpreting our Pokémon data so that you trust the data enough to make decisions based on it. This matters if you are a tester working in any context where actions against data will be presented to you or where you have to take those actions yourself as part of the testing.

What’s the Goal?

Consider this question: “Which is the best Pokémon character?”

Testers! What’s wrong with that question?

Well, it’s relatively non-specific, right? Yet it can sound very specific to many people. Just find me the “best.” But that’s like hunting “quality”, is it not? What does someone mean by quality in a given context?

Answering a question like this requires a much more specific task that can be precisely addressed and taken on. When testing an application, I might be looking for performance, or security, or basic functionality. Or perhaps a bit of all three. But something will usually have more of my focus on a given exploration. Well, the same applies here. Answering a question like the one above requires specific tasks that can be precisely addressed with a dataset.

Testers! What’s the next question we should be considering? What do we need to think about before we can become more specific?

How about this: who needs to know the answer to this question? Who will be judging whether the quality of our answer is better or worse?

Become Operational

What I’m getting at here, and what is a core part of working in a data science context, is that what we want to do is get to operationalized questions. (I wrote about thinking operationally way back in 2011.) Testers, just like data scientists, seek to refine and clarify questions. We do that until we find explicit links between the data that we can find and the question(s) someone wants answered.

So we want a translation of the high-level question “Which is the best Pokémon character?” into a set of concrete tasks over the data. That, in fact, is what this post will show you.

Testers! What do we specifically want to operationalize here?

We want to operationalize the phrase “best Pokémon”.

As a rough definition, a good Pokémon (probably) has many good stats. But “many good stats” is also not well defined. So maybe that’s what we need to do. Let’s get a look at some stats to figure out what a “good stat” is. Those stats, however, seem to be based on the type of Pokémon. So it sounds like to operationalize this, we have to look at stats in relation to types.

Notice what we did here: we replaced one bit of ambiguity (a “best Pokémon”) with three more bits of ambiguity (“many good stats”, “good stat”, and “stat related to type”).

The Basis For Our Project

As we go through this, I’m not expecting you to know the tools. I’m not expecting you to know how to code up certain things. All of that I’m going to provide for you, similar to what I did in the first post. However, I am going to ask you to reason about the data a bit and I am going to ask you to keep in the back of your mind this question: “As a specialist tester, what value am I bringing to a data science context?”

As part of answering that question, let’s say that at some point in the future there is going to be a UI that sits in front of the Pokémon data and allows users to explore it and maybe there’s even going to be an API that users can query. But the UI and the API will basically just be calling out to Python logic behind the scenes that generates graphs and whatnot. So, arguably, we really want testing to be focused at the data level. In fact, it’s at that data level that our team will be deciding what kind of UI and API to build.

Beyond even that: whether or not you can relate this to your testing career, it’s perfectly valid in my view to simply use these posts as a way of understanding some data science, both in concept and in practice.

Getting Started

Let’s start as we did in the first post. Fire up Jupyter in the project directory (jupyter notebook) and create a file called pokemon-002. (As with the first post, I do have a version of the notebook you can compare with.) In the first code cell, we’ll do pretty much what we did originally:
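Here’s a sketch of what that first cell could look like. I’m assuming the CSV from the first post is named pokemon.csv; the few made-up fallback rows just keep the cell runnable if the file isn’t sitting next to the notebook.

```python
import pandas as pd

# Load the dataset from the first post; adjust the file name to match yours.
try:
    df = pd.read_csv("pokemon.csv")
except FileNotFoundError:
    # A few stand-in rows (invented values) so the cell still runs without the file.
    df = pd.DataFrame({
        "Name": ["Geodude", "Onix", "Magnemite", "Squirtle"],
        "Type_1": ["Rock", "Rock", "Electric", "Water"],
        "Defense": [100, 160, 70, 65],
    })

df.head()
```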

(Remember that you do Shift+Return to process a code cell.)

And we’re off to the races!

Handling Null Data

I mentioned in the first post that handling missing or incomplete data is important. So here let’s check for any null columns in our data:
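One way to run that check, sketched against a tiny stand-in frame with some deliberately missing values; in the notebook you would run it against the df you loaded in the first cell.

```python
import pandas as pd

# Stand-in rows with deliberate gaps in Type_2 and Pr_Male.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type_1": ["Grass", "Fire", "Water"],
    "Type_2": ["Poison", None, None],
    "Pr_Male": [0.875, 0.875, None],
})

# True for every column that contains at least one null value.
nulls = df.isnull().any(axis=0)
nulls
```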

Here the “axis” refers to the axis of data being considered. Just to expand on that a bit, the “axis” refers to a dimension of the array: axis=0 is the dimension that points “downwards” (along the rows) while axis=1 is the dimension that points to the right (along the columns). Another way to frame this is that an operation with axis=0 acts down all the rows in each column; an operation with axis=1 acts across all the columns in each row.

Here we’re looking for any null values in all of the columns in the data set. And it looks like we have some such null values in Type_2, Egg_Group_2 and Pr_Male. So now that we’ve figured it out, what do we do about it? Sometimes you might want to replace missing data in a given column with something generic. I’m not going to do that here but just to show you what that would look like:
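A sketch of that replacement approach, again against a stand-in frame:

```python
import pandas as pd

# Stand-in frame; in the notebook, df is already loaded from the CSV.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type_2": ["Poison", None],
})

# Replace any nulls in Type_2 with a generic placeholder value.
df["Type_2"] = df["Type_2"].fillna("Unspecified")
df
```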

That would replace any null values in the Type_2 column with the value “Unspecified.” In our case, however, we’re just going to drop the column. I’m doing that because, in this post, we’ll just look at the primary type (Type_1) since we know all characters must have that. (We do? Yes. You could have explored the data as we did in the first post to find that out or simply looked at the data.)

So let’s just drop the secondary type column entirely:
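Something like this, with a second statement to confirm the drop (stand-in rows again):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type_1": ["Grass", "Fire"],
    "Type_2": ["Poison", None],
})

# Remove the secondary type from the underlying frame itself.
df.drop("Type_2", axis=1, inplace=True)
df.columns
```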

You don’t really need to do the second statement but that just provides evidence that Type_2 is no longer part of the columns in the dataset, which is another way of saying that Type_2 is no longer a feature we are considering. The “inplace=True” bit makes sure that the Type_2 column is removed from the dataset itself. This means we have affected the underlying data. The default for inplace is False, which means the drop would apply only to the returned copy, leaving the underlying data untouched.

Incidentally, I could have dropped all the null columns like this:
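A sketch of that alternative, which drops every column containing at least one null value:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type_1": ["Grass", "Fire"],
    "Type_2": ["Poison", None],
    "Pr_Male": [0.875, None],
})

# Drop every column that has at least one null value.
df.dropna(axis=1, inplace=True)
df.columns
```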

We’re not going to be using any of those but it doesn’t matter which of the above commands you use. I just wanted to show you some techniques for getting rid of null data or altering that data so that it’s no longer null.


Let’s practice with grouping our data. While there isn’t much here from a pure testing standpoint, try to reason about each thing we do here before you keep reading. Most of all, however, just get familiar with the cadence of entering in the code and seeing the result. In the third post, the kid gloves will be off and we’ll bring this around more to testing.

Grouping By Defense

Let’s say I wanted you to determine which type of character (so what value in Type_1) is most associated with the statistic of Defense. Don’t worry about coding that out. Just think about how you would want to grab that data. A good exercise here is to think about how you would articulate and/or write down the process by which you would explore the data to get an answer.

Have you thought about it? Okay, now let’s code it. Let’s create a DataFrame that is based on defense:
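One way to set that up, using a handful of made-up rows as a stand-in for the real data:

```python
import pandas as pd

# Stand-in rows; in the notebook this is the df loaded from the CSV.
df = pd.DataFrame({
    "Name": ["Onix", "Steelix", "Squirtle", "Abra"],
    "Type_1": ["Rock", "Steel", "Water", "Psychic"],
    "Defense": [160, 200, 65, 15],
})

# A frame holding just the columns relevant to the Defense question,
# ordered so the strongest defenders float to the top.
defense = df[["Name", "Type_1", "Defense"]].sort_values("Defense", ascending=False)
defense
```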

Now let’s get our defense data as a Series.
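For example, the average Defense per primary type, as a Series, highest first (stand-in values):

```python
import pandas as pd

df = pd.DataFrame({
    "Type_1": ["Rock", "Rock", "Steel", "Water", "Psychic"],
    "Defense": [160, 110, 180, 65, 45],
})

# Average Defense per primary type, sorted highest first.
defense_by_type = df.groupby("Type_1")["Defense"].mean().sort_values(ascending=False)
defense_by_type
```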

It’s pretty clear that a high Defense value is a characteristic of Rock and Steel type characters. What that means, of course, is that such characters should be good at defending themselves from attack. Now let’s get a visualization:
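A sketch of that visualization, as a bar chart with the type labels tilted so they stay readable:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Type_1": ["Rock", "Rock", "Steel", "Water", "Psychic"],
    "Defense": [160, 110, 180, 65, 45],
})

defense_by_type = df.groupby("Type_1")["Defense"].mean().sort_values(ascending=False)

# Bar chart of average Defense per type; rotate the x-axis tick labels.
defense_by_type.plot(kind="bar")
plt.xticks(rotation=45)
plt.ylabel("Mean Defense")
plt.show()
```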

That simply confirms visually what we seem to have gotten from our dataset. That shouldn’t be a surprise, by any means, but in data science contexts (just as in test report contexts!), having a good visualization can be nice. That above example also gives you some idea of how you can structure the visualization, such as by providing indications of what an axis represents (the xticks, in this case) or how that axis data appears (rotation, in this case).

Grouping By Attack

Now how about you try. Why not do something similar to what I just did above, but instead of Defense, use Attack as the statistic. There are various ways you could do this. But take a crack at it. You basically have the template for what to do above.

Did you give it a shot?

Here’s one thing you could do:
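The same grouping, only keyed on Attack (stand-in values chosen so Rock leads and Steel lags):

```python
import pandas as pd

df = pd.DataFrame({
    "Type_1": ["Rock", "Rock", "Steel", "Water", "Psychic"],
    "Attack": [95, 100, 55, 65, 50],
})

# Average Attack per primary type, sorted highest first.
attack_by_type = df.groupby("Type_1")["Attack"].mean().sort_values(ascending=False)
attack_by_type
```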

The Rock type is still looking good but notice where Steel ended up. So Steel type characters are pretty good at Defense but a bit lackluster on the Attack, apparently.

So let’s group the characters by their Type_1 characteristic and get a count.
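A minimal sketch of that count (the stand-in rows give Water the most entries, echoing the real data):

```python
import pandas as pd

df = pd.DataFrame({
    "Type_1": ["Water", "Water", "Water", "Rock", "Rock", "Psychic"],
})

# How many characters carry each primary type.
type_counts = df.groupby("Type_1").size().sort_values(ascending=False)
type_counts
```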

Okay, so, while Rock has a fairly good representation in the data set (41 characters with that Type_1), Water types have the highest representation.

Grouping By Special Attack

Our dataset clearly distinguishes between an Attack and a “Special” Attack. So it might be interesting to see if that “Special” Attack correlates in the same way with the Attack. So give it a shot. I recommend trying it and even if you are basically copying your above solution, type it out anyway. This builds up the muscle memory.

You likely did something like this:
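For instance (I’m assuming the special-attack column is named Sp_Atk; match whatever your CSV uses):

```python
import pandas as pd

# Stand-in rows; Sp_Atk is my assumed column name for "Special Attack."
df = pd.DataFrame({
    "Type_1": ["Psychic", "Rock", "Steel", "Water"],
    "Sp_Atk": [135, 30, 40, 60],
})

# Average Special Attack per primary type, sorted highest first.
sp_atk_by_type = df.groupby("Type_1")["Sp_Atk"].mean().sort_values(ascending=False)
sp_atk_by_type
```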

Wow! Psychic types really take the cake on that one. Rock and Steel have pretty poor showings. Water still does pretty well.

Grouping By Speed

You want to try one more? How about this time you try it by the Speed statistic.

Tried it?

Okay, you probably ended up with something like this:
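Something along these lines (stand-in values again):

```python
import pandas as pd

df = pd.DataFrame({
    "Type_1": ["Electric", "Normal", "Psychic", "Rock", "Steel"],
    "Speed": [110, 95, 100, 40, 30],
})

# Average Speed per primary type, sorted highest first.
speed_by_type = df.groupby("Type_1")["Speed"].mean().sort_values(ascending=False)
speed_by_type
```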

So here we see that Electric and Normal (and Psychic, I guess) have pretty good values for Speed. In this case, Rock doesn’t do too well and Steel fares even worse.

Plotting Data

Bar charts, like line charts, are pretty simple to create and read. Let’s try plotting some of our data.

Plotting by Attack / Speed

Let’s plot out a “speed versus attack” distribution.
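A sketch of that cell using seaborn’s regplot, against made-up rows standing in for the full data set:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in rows; in the notebook this runs against the full df.
df = pd.DataFrame({
    "Attack": [95, 100, 55, 65, 50, 55],
    "Speed": [70, 30, 35, 68, 120, 110],
})

# Scatter Speed against Attack and fit a linear regression line.
sns.regplot(x="Attack", y="Speed", data=df)
plt.show()
```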

If you have any sort of data or even statistics background, a regplot like this can make sense. But if not, here’s probably your first example where you need to consider the type of visualizations that are possible. Here “regplot” refers to a “regression plot.”

Regression analysis is used in statistics to find trends in data. In this case, perhaps our thought process is that there’s a connection between a character’s Speed characteristic and their Attack characteristic. So we plot out the data to see if that’s the case.

Regression for many testers means something very different so it’s important to understand here that “regression,” in this context, can be thought of as something like a best guess at using a set of data to make some kind of prediction. That’s a fancy way of saying you are fitting a line to a set of data points.

In the case of the graph generated from the above code, the solid line in that plot is called a “linear regression model fit.” The translucent band, which in this case can be a little hard to see but “surrounds” the solid line, describes a confidence interval generated for the estimate.

It would be hard to talk about every nuance of regression analysis here but, basically, what you’re going for is a statistical measure of how close the data are to the fitted regression line. The closer the data are, the more likely that the correlation — and the data around it — will allow you to make accurate predictions.

Plotting by Defense / Speed

Okay, let’s try this with the following:
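The same regplot pattern, just with Defense on the x-axis (my stand-in rows deliberately lean toward the inverse relationship we’ll discuss later):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in rows; in the notebook this runs against the full df.
df = pd.DataFrame({
    "Defense": [160, 110, 145, 65, 55, 40],
    "Speed": [70, 30, 35, 68, 120, 110],
})

# Scatter Speed against Defense and fit a linear regression line.
sns.regplot(x="Defense", y="Speed", data=df)
plt.show()
```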

So what do you think? What are the differences in the two plots? Which sets of values do you think correlate better with each other? Perhaps you have utterly no idea. How do we start making some progress on that?

Correlate the Data

Okay, first, let’s get a subset of our data.
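A sketch of that subset. I’m assuming combat-stat column names like Sp_Atk and Sp_Def; match your CSV.

```python
import pandas as pd

# Stand-in frame with assumed combat-stat column names.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Type_1": ["Rock", "Steel", "Water"],
    "HP": [50, 60, 70],
    "Attack": [95, 55, 65],
    "Defense": [160, 145, 65],
    "Sp_Atk": [30, 40, 60],
    "Sp_Def": [45, 65, 60],
    "Speed": [70, 35, 68],
})

# A frame holding only the combat characteristics.
combat = df[["HP", "Attack", "Defense", "Sp_Atk", "Sp_Def", "Speed"]]
combat.head()
```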

What we just did there is create a DataFrame composed solely of the combat characteristics. So let’s just check out the correlation:
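Getting the correlation is a one-liner on that frame (shown here with stand-in values):

```python
import pandas as pd

combat = pd.DataFrame({
    "HP": [50, 80, 60, 70, 65, 60],
    "Attack": [95, 100, 55, 65, 50, 55],
    "Defense": [160, 110, 145, 65, 55, 40],
    "Sp_Atk": [30, 45, 40, 60, 135, 90],
    "Sp_Def": [45, 64, 65, 60, 95, 80],
    "Speed": [70, 30, 35, 68, 120, 110],
})

# Pairwise (Pearson) correlations between the combat characteristics.
combat.corr()
```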

Okay, what is that telling you? Well, here’s another concept to get used to: correlation. As a concept, most of us have heard of this. It basically means a mutual relationship or association between some values. Slightly more specifically, you are expressing one quantity in terms of its relationship with other quantities. One handy thing to do is look at the diagonal. You will notice a column in each row that has a value of “1.000000.” You’ll notice this is only where the columns and rows intersect with themselves. So — surprise, surprise — HP and HP are perfectly correlated. Expected, given that they are the same thing.

But that should help you to interpret the data you are seeing a bit more. Clearly the closer to 1.0, the higher the correlation.

Let’s try another way of visualizing that:
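Seaborn can render that same matrix as an annotated heatmap, roughly like so:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in combat stats; in the notebook, reuse the combat frame.
combat = pd.DataFrame({
    "HP": [50, 80, 60, 70, 65, 60],
    "Attack": [95, 100, 55, 65, 50, 55],
    "Defense": [160, 110, 145, 65, 55, 40],
    "Sp_Atk": [30, 45, 40, 60, 135, 90],
    "Sp_Def": [45, 64, 65, 60, 95, 80],
    "Speed": [70, 30, 35, 68, 120, 110],
})

# Same correlation matrix, shown as a heatmap with annotated numbers.
sns.heatmap(combat.corr(), annot=True)
plt.show()
```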

Here it’s the same data, just shown in the form of a “heatmap.” A heatmap is basically exactly what it looks like: a table that has colors in place of numbers. In this case, I’ve annotated the table (“annot=True”) to show those numbers. The colors correspond to the level of the measurement. Here the diagonal cut through the data stands out a bit clearer.

Umm … What Are We Doing Here?

What I’ve been showing you is that there are different ways to represent data. You may be shown any of the above or all of the above. It’s important to recognize “equivalence classes” between representations. Testers are used to dealing with partitioning functionality or conditions by equivalence. So here I’m showing how that skill translates. But beyond even that, what you’ve seen here so far was some simple exploration of the data.

Numerical Data Decisions

So let’s continue our investigation into the numerical aspects of these characters. Let’s also do this along with introducing you to another visualization and also by creating a new data set.
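A sketch of that cell. The column names for generation and legendary status are my assumption here; match whatever your CSV uses.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in frame; Generation and Is_Legendary are assumed column names.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type_1": ["Rock", "Steel", "Water", "Psychic"],
    "HP": [50, 60, 70, 65],
    "Attack": [95, 55, 65, 50],
    "Defense": [160, 145, 65, 55],
    "Speed": [70, 35, 68, 120],
    "Generation": [1, 2, 1, 3],
    "Is_Legendary": [0, 0, 0, 1],
})

# Keep only columns that make sense to plot; the two oddballs stay in
# for now so the box plot can tell us whether they belong.
numeric = df.drop(["Name", "Type_1"], axis=1)
numeric.plot(kind="box")
plt.xticks(rotation=45)
plt.show()
```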

Here I’m getting a new data set and I’m doing so by dropping certain columns that just don’t make sense to be plotted, particularly with the focus on the combat characteristics we’ve been going with so far. Yet notice that two still snuck in there (Generation and Is_Legendary) because I wasn’t sure whether they should be excluded. But this graph actually helps me figure that out.

But what actually is this graph? It’s called a box plot and, while it’s not necessarily obvious from how it looks, what it’s doing is providing a representation of statistical data based on the minimum, first quartile, median, third quartile, and maximum.

So we’ve got a set of values for each characteristic. Each set of values is spaced out along a number line. So, for each set of values, you can draw a line at the median of the data set — meaning, of course, the point or value in the set that divides it evenly in half, with an equal number of points smaller and larger. Then the next thing that happens is you divide each half of the dataset in half again. So you’re dividing it into four even sets of points. These four sections are called quartiles.

If the box plot is oriented horizontally, the leftmost line of the box marks Q1 (first quartile) while the rightmost line marks Q3 (third quartile). When the box plot is oriented vertically, Q1 is at the bottom and Q3 is at the top. The median is sometimes referred to as Q2 (second quartile). So just know that every box plot has lines at Q1, the median (Q2), and Q3. Ah, but this can confuse people. The lines are tricky, so picture a stylized box plot: a box spanning from Q1 to Q3, a line inside that box at the median, and “whiskers” extending out toward the minimum and maximum.

The IQR refers to the “interquartile range.” This is nothing more than the distance between Q1 and Q3. As with regression and other things in this post, I don’t want to go into too much more detail than I have.

Proportional Data

So for now we can see that we have a good picture regarding the distribution of these feature characteristics. It’s impossible to avoid noticing that Generation and Is_Legendary are not proportional with the other values. Should we drop them? Well, let’s check them out.
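One quick way to check them out is to look at their distributions directly, for example with histograms (stand-in rows; the column names are my assumption):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in rows; Generation and Is_Legendary are assumed column names.
df = pd.DataFrame({
    "Generation": [1, 1, 2, 3, 1, 4],
    "Is_Legendary": [0, 0, 0, 1, 0, 0],
})

# Histograms of the two suspect columns to see their distributions.
df[["Generation", "Is_Legendary"]].hist()
plt.show()
```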

Depending on your understanding of data, it may or may not be obvious from this that Generation and Is_Legendary are discrete values. This is as opposed to the continuous ones that make up the other characteristics in our box plot. You could also glean this, of course, just by looking at the csv data. But, yes, this data can probably be dropped. However, we might want it at some other time so let’s create yet another data set from our pokemon data.
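For example, a sketch that sets those columns aside in a new frame while leaving the original frame untouched:

```python
import pandas as pd

# Stand-in frame; in the notebook this is the full df. Total is the sum
# of the combat stats, so it dwarfs any single one of them.
df = pd.DataFrame({
    "Type_1": ["Rock", "Steel"],
    "HP": [50, 60],
    "Attack": [95, 55],
    "Defense": [160, 145],
    "Speed": [70, 35],
    "Total": [375, 295],
    "Generation": [1, 2],
    "Is_Legendary": [0, 0],
})

# A new data set without the out-of-scale columns; no inplace=True here,
# so df itself is untouched and we can come back to those columns later.
pokemon_stats = df.drop(["Total", "Generation", "Is_Legendary"], axis=1)
pokemon_stats.columns
```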

Dropping most of these probably makes sense to you, but think about it: Why drop Total, Generation, and Is_Legendary?

Well, the Total value will be too high because it’s just that: a totalling up of all the characteristics. And, as we’ve seen with our box plot and distribution plots above, the Generation and Is_Legendary values — given that they are discrete — will be too low. Including these in our data is not necessarily invalid but it would make the data that much harder to read because we would have to scale appropriately.

Okay, so let’s graph this. In order to do this, I’m going to set up some colors. This is a bit of an affectation but it is an important one when you want everything distinct enough to be recognizable.

Okay now we can use those colors to create a plot:
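Here’s a sketch of the palette and the plot together. The hex colors and the stand-in stat values are mine; the idea is one distinct color per combat stat so the lines stay tellable apart.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in rows chosen to echo the patterns discussed below.
df = pd.DataFrame({
    "Type_1": ["Dragon", "Steel", "Flying", "Fighting"],
    "HP": [80, 60, 70, 70],
    "Attack": [110, 85, 75, 105],
    "Defense": [80, 150, 65, 70],
    "Sp_Atk": [90, 55, 80, 45],
    "Sp_Def": [85, 80, 70, 65],
    "Speed": [80, 45, 110, 75],
})

# One distinct color per combat stat.
colors = ["#e6194b", "#3cb44b", "#ffe119", "#4363d8", "#f58231", "#911eb4"]

# Average each stat per type and draw one colored line per stat.
type_means = df.groupby("Type_1").mean()
type_means.plot(kind="line", color=colors)
plt.xticks(range(len(type_means.index)), type_means.index, rotation=45)
plt.ylabel("Mean value")
plt.show()
```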

So that’s somewhat helpful because we can see that a character with the Dragon type has the highest Attack value. The highest Defense goes to characters with a Steel type, by a landslide. We can see that characters with a Flying type take the award for highest Speed value. We can also look at some underachievers. Clearly characters with a Fighting type have the lowest Special Attack value (although they do have a high Attack value). Characters of the Bug type clearly have a low pool of HP and those of the Fairy type are distinctly lacking in Speed.

Correlating Data

It’s clear that the graph is showing us some characters are similar in terms of their distribution of statistics, so here again we can try a correlation.
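One way to produce such a type-to-type correlation, sketched here, is to treat each type’s average-stat profile as a column and correlate the columns; the stand-in values lean toward the patterns discussed below.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in rows; in the notebook this runs against the full df.
df = pd.DataFrame({
    "Type_1": ["Rock", "Steel", "Flying", "Water", "Grass"],
    "HP": [55, 60, 70, 70, 65],
    "Attack": [95, 85, 75, 70, 70],
    "Defense": [120, 150, 60, 70, 70],
    "Sp_Atk": [45, 55, 80, 70, 75],
    "Sp_Def": [55, 80, 70, 70, 70],
    "Speed": [45, 40, 110, 65, 60],
})

# Each column of profiles is one type's average-stat profile; correlating
# the columns gives a type-to-type correlation matrix.
profiles = df.groupby("Type_1").mean().T
type_corr = profiles.corr()
sns.heatmap(type_corr, annot=True)
plt.show()
```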

Okay, that’s a slightly more involved heatmap than we looked at previously. But we can definitely start figuring out a few more things from the correlations.

Considering some we looked at earlier, notice for example that Rock has a negative correlation with Flying. Steel also has such a negative correlation. That probably makes sense. A creature of Rock or Steel might have trouble getting aloft. Water and Grass seem to have a very positive correlation, which likely makes sense.

And — testers! — notice that the “makes sense” here is really important. If we started finding things that seemed “odd”, that would be our cue to question things.

I bet you were sitting there reading this thinking “Is this guy ever going to say something relevant to testing?” Well, there you have it! Much of what we did here was test design and then test execution. That is a very viable way to consider what I talked about in this post (which you can imagine being discussed in a meeting with business, a developer, a tester, and a business analyst). The test execution is then the code we put in place to execute our design.

Notice that characters of the Ground type are greatly correlated with Poison types. Does that make sense? You can see via colors and the data annotations that some types have zero correlation with each other. That indicates no linear relationship between those types, though a nonlinear one could still exist. Do those instances make sense?

Learning From Data

Let’s revisit one of the correlations we created earlier. This time, however, we’ll leave in the Total, Generation, and Is_Legendary. We’ll also use some of our newfound knowledge to reason about this. So here’s the heatmap:
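A sketch of that heatmap, this time with Total, Generation, and Is_Legendary kept in the mix (stand-in values; Total is derived as the sum of the combat stats):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in rows with the previously excluded columns left in.
df = pd.DataFrame({
    "HP": [50, 80, 60, 70, 65, 60],
    "Attack": [95, 100, 55, 65, 50, 55],
    "Defense": [160, 110, 145, 65, 55, 40],
    "Sp_Atk": [30, 45, 40, 60, 135, 90],
    "Sp_Def": [45, 64, 65, 60, 95, 80],
    "Speed": [70, 30, 35, 68, 120, 110],
    "Generation": [1, 1, 2, 1, 3, 4],
    "Is_Legendary": [0, 0, 0, 0, 1, 0],
})
df["Total"] = df[["HP", "Attack", "Defense", "Sp_Atk", "Sp_Def", "Speed"]].sum(axis=1)

sns.heatmap(df.corr(), annot=True)
plt.show()
```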

Okay, so as in the previous heatmap there is clearly an inverse correlation between Defense and Speed. But consider what we’ve learned. We know that characters of the Steel type have the highest Defense while characters of the Flying type have the highest Speed. We’ve seen that Steel and Flying types are negatively correlated. Thus that would imply — and the data bears this out — that Defense and Speed have a negative correlation. Once again: makes sense.

Operationalizing Our Data

Now, you know how, as a tester, sometimes you do rely on developers to provide some bits of testability for you. Or perhaps you’re like me and you never have quite the amount of SQL knowledge you should, so you have a friendly DBA write you a good query. Well, you can leverage your friendly data scientist the same way.

What we’ve been doing in this post is feature relation. We’ve been seeing how features of our data relate to each other. It sure would be nice if we had a function of some sort that did a certain amount of work for us and we could just call it with features we want to test. So let’s say your friendly data scientist/developer/whomever provides the following for you to throw into a code cell:
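Here’s a sketch of what such a helper might look like. The name feature_relation and its behavior (per-type averages, bubble size from type counts, matching the plot described below) are my assumptions, not a known library function.

```python
import pandas as pd
import matplotlib.pyplot as plt

def feature_relation(x, y, data):
    """Plot per-type averages of two features against each other.

    Each circle is one Type_1 value; its size reflects how many
    characters in the data have that type.
    """
    grouped = data.groupby("Type_1")
    means = grouped[[x, y]].mean()
    counts = grouped.size()
    plt.scatter(means[x], means[y], s=counts * 80, alpha=0.5)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title(f"{x} vs {y} by primary type")
    plt.show()
```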

Let’s see if that works with some of the stuff we checked way earlier in this post.
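For example, pairing Attack with Defense. I repeat the (assumed) helper here so the cell stands on its own, with made-up rows standing in for the real data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Repeating the helper so this cell runs on its own.
def feature_relation(x, y, data):
    grouped = data.groupby("Type_1")
    means, counts = grouped[[x, y]].mean(), grouped.size()
    plt.scatter(means[x], means[y], s=counts * 80, alpha=0.5)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

df = pd.DataFrame({
    "Type_1": ["Rock", "Rock", "Steel", "Water", "Psychic", "Electric"],
    "Attack": [95, 100, 55, 65, 50, 55],
    "Defense": [160, 110, 145, 65, 55, 40],
})

feature_relation("Attack", "Defense", df)
```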

Uh, well, that works, I guess. But the friendly data scientist/developer/whomever apparently decided to use a scatterplot. If you’re not sure what these are, this kind of plot is used to display the value of two sets of data on two dimensions. Each dot on such a plot represents a single observation. In this case, the dots are more like expanding circles. Each circle represents a character type but the size of the circle represents the count of characters in the data set that have that type.

This scatterplot shows that, as attack (X) for a character increases, the Defense (Y) stays relatively consistent with it, although there are clearly some increases.

Try this one:
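Defense against Speed this time; again the (assumed) helper is repeated so the cell stands alone, and the stand-in rows lean negative on purpose.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Repeating the helper so this cell runs on its own.
def feature_relation(x, y, data):
    grouped = data.groupby("Type_1")
    means, counts = grouped[[x, y]].mean(), grouped.size()
    plt.scatter(means[x], means[y], s=counts * 80, alpha=0.5)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

df = pd.DataFrame({
    "Type_1": ["Rock", "Rock", "Steel", "Water", "Psychic", "Electric"],
    "Defense": [160, 110, 145, 65, 55, 40],
    "Speed": [70, 30, 35, 68, 120, 110],
})

feature_relation("Defense", "Speed", df)
```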

That’s looking like a negative correlation there.

How about this:
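And one last pairing, Special Attack against Special Defense (assumed column names Sp_Atk and Sp_Def; the stand-in rows lean positive on purpose):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Repeating the helper so this cell runs on its own.
def feature_relation(x, y, data):
    grouped = data.groupby("Type_1")
    means, counts = grouped[[x, y]].mean(), grouped.size()
    plt.scatter(means[x], means[y], s=counts * 80, alpha=0.5)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

df = pd.DataFrame({
    "Type_1": ["Rock", "Rock", "Steel", "Water", "Psychic", "Electric"],
    "Sp_Atk": [30, 45, 40, 60, 135, 90],
    "Sp_Def": [45, 64, 65, 60, 95, 80],
})

feature_relation("Sp_Atk", "Sp_Def", df)
```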

That’s looking like a very positive correlation.

Do you see, however, how we have gone along a path of operationalizing that initial question we started this post with?

Relying on Visualizations

Core to all of this has been a focus on visualizations. The book Making Data Visual has a very apropos point:

“It is tempting to believe that there is one beautiful visualization that will show all the critical aspects of a dataset. That the right visual representation will reveal hidden insights. That a perfect, simple, and elegant visualization — perhaps just a line chart or a well-chosen scatterplot — will show precisely what the important variable was and how it varied in precisely the way to illustrate a critical lesson.”

You see in this post we went through many visualizations as we learned more about our data, refining our understanding of that data, and thus better operationalizing the question we started with. Another quote from the same book is equally apropos:

“The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve.”

What we did in this post is provide direct and tangible representations of data that we gathered through exploration. These results should allow people on our team to confirm hypotheses and gain new insights. It’s not too hard to see that we can actually alter the type of questions that people are asking, essentially refining our ability to better communicate and collaborate.

Sounds like what testers do as a matter of course, right?

Relevance For Testers

I just covered the relevance a bit above. But let’s close this post out by making sure the relevance is extremely clear.

What we’ve done in this post is refine a high-level question into some specific, data-driven tasks. The outcome of that process is a set of concise design requirements. Those requirements tell us how to find answers to the question. That, in turn, tends to guide our use of the tools.

As you saw, the process of looking at the data to address the question had us generating some incidental visualizations. Doing so, we found what might be some odd patterns, outliers, or perhaps even surprising correlations. Yet everything still did seem to make sense. (Or as much sense as can be found in Pokémon creatures!)

Also, doing some analysis did lead us to doing a bit of data cleaning. And that’s important when you realize that odd outliers and surprising trends or correlations can very much be the result of data we don’t understand or have not isolated well enough. Think of what I just said there in terms of concepts like “edge cases”, “transient bugs”, “intermittent bugs”, “race conditions”, “poorly understood requirements”, etc.

What I hope you see here is how specialist tester techniques — both practices and ways of thinking — readily apply to data science contexts.

In the third post in this series, we’ll continue the trend of data exploration and test thinking that we started in this post.


About Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.