Applying Test Thinking with Code

I’ve often talked about the idea of tests putting pressure on design. I’ve also talked about this idea in the context of code-based testing. Here I want to revisit those concepts while including a cautionary tale about how testing at the code level has its own interesting challenges.

There will be some code in this post but you don’t necessarily need to understand coding to understand the cautionary aspects I talk about here. That being said, I talked before about testers acting like a developer and this post is very much in line with that. I’ll be using Ruby for my code here and the context of this Ruby code is a new tool I’m writing to encapsulate the actions of WebDriver.

So let’s just jump right in. I’ll break this into two case studies, for lack of a better term.

Case Study 1

The Code Context

I had code like this:
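Reconstructed as a minimal sketch (the module name, method name, and error message all appear later in this post; the exact body is my assumption):

```ruby
module Empiric
  module_function

  # Look up a gem that has been activated as part of the project context
  # and return its version. Gem.loaded_specs is a hash mapping gem names
  # to their specifications for every activated gem.
  def gem_version(name)
    Gem.loaded_specs[name].version
  rescue NoMethodError
    # A gem that isn't loaded comes back as nil from the hash lookup, so
    # calling .version on it raises NoMethodError, which is re-raised
    # here with a friendlier message.
    raise NoMethodError, "No gem loaded for #{name}"
  end
end
```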

Here you can see I have a module that contains a specific method. That method is designed to look for a loaded gem and provide the version of that gem, if the gem is found as part of the context of the Empiric project. Essentially, it can call a gem as if it were a method. In my project, the Watir gem, as one example, is part of the project context. So a command like this would work:
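Given the gem_version method described above, a working call has this shape. Since watir may not be installed wherever you run this, the sketch uses json, which ships with Ruby, to stand in for a gem that is part of the project context:

```ruby
require 'json' # stand-in for watir: any activated gem will do here

module Empiric
  module_function

  def gem_version(name)
    Gem.loaded_specs[name].version
  rescue NoMethodError
    raise NoMethodError, "No gem loaded for #{name}"
  end
end

puts Empiric.gem_version('json') # prints the activated gem's version
```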

“Work” here means it would return a version of the gem that is being used in the project. Capybara is not a gem that’s part of my project, so this would not work:
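Re-sketching the same method so the failing case runs on its own:

```ruby
module Empiric
  module_function

  def gem_version(name)
    Gem.loaded_specs[name].version
  rescue NoMethodError
    raise NoMethodError, "No gem loaded for #{name}"
  end
end

begin
  Empiric.gem_version('capybara')
rescue NoMethodError => e
  puts e.message # "No gem loaded for capybara"
end
```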

Here “would not work” means that the rescue situation would kick in.

The Test Context

Given that there is an exception situation, I certainly wanted to make sure to create an RSpec test for that and I came up with this:

And the test passes.

However, my test coverage indicated that the raise portion of the code was not being covered.

Yet if I changed the above test to instead say .not_to raise_error NoMethodError, the test failed. As it should. So note that I changed the test condition from “to” to “not_to”. Doing that immediately told me that my test was expecting no error but one was raised. Okay, so that showed me the test is, in fact, recognizing that an error was generated.

So … what’s going on here? The test passes, but the test coverage is saying the code allegedly being tested is not being covered. At best, I have a useless test. At worst, I have an inaccurate test.

Does this have to do with the fact that I’m specifically rescuing an error? The rescue statement showed that it was being covered in the test coverage report, but the raise statement was not, even though I’ve verified that it does in fact occur.

The Problem Is …

Wait! Did you spot it?

Notice I’m calling get_version in my test rather than gem_version. Quite the stupid mistake, no? But, of course, one that we often make in the act of development.

And, to be sure, that’s why we have tests, right?

Except, wait. The test was passing.

So why didn’t the test indicate my clearly incorrect code, at the very least by failing, which is what I would have expected? The most obvious explanation would be that there was, in fact, a get_version method. But there was not.

Okay so, as a tester, what do you do when there’s an oddity? You modify your test net, right? Just like in fishing, you might decrease the size of the holes in the net in order to catch the smaller fish. What I tried was adding in the message that the exception provides and thus increasing the granularity of my test condition. To wit, I changed my expectation as such:

Here I just added checking for the specific message I knew should be returned. That execution of the test then told me there was no get_version method defined. Specifically I got this:

expected NoMethodError with "No gem loaded for capybara",
got #<NoMethodError: undefined method `get_version' for Empiric:Module>

Ah ha! Now I see the actual problem: there is no such defined method and it’s easy to see I made a mistake in coding.

So obviously fixing the test to actually call the correct method is what allows my test to pass. Err, well, to pass correctly since it was already passing.

But what’s less clear is why the test was originally indicating as passing when I was clearly calling an incorrect method. Not only an incorrect method, but a non-existent one. I had no get_version method in my code at all.

Can you spot the situation here? Even if you don’t know coding, think about what the logic I’ve shown you is doing and the problem that was occurring.

Notice I’m rescuing a NoMethodError in the first place. So maybe that’s the case here? Maybe the fact of get_version itself being a non-existent method was throwing a NoMethodError but that error was getting swallowed up by the actions of the test when I initially wrote it?

It was only when I put in a specific message for my NoMethodError expectation that RSpec was forced to recognize that the actual NoMethodError, with a different message, was the one occurring.

So a mistake in the test (a call to a non-existent method) just so happened to coincide with the type of error condition being sought in the test (a rescue of a non-existent method) and the one covered up the other.

Lessons Learned?

This kind of situation is admittedly somewhat simple to see once I’ve spelled it out. But imagine a more complex scenario where it wasn’t quite as obvious what was going wrong. That kind of thing happens all the time in our code. And that is why we can have lots of tests passing but still have problems in our application.

This is one of the ways that developers can make mistakes that persist. And notice you can’t even say that I, as the developer here, wasn’t testing. I was. I simply ran into a situation where the test and the code interacted in an interesting way.

So notice that if I just went by passing tests, everything would have appeared fine. I looked at my coverage, however, and found a discrepancy. Which, of course, I was only able to do because I was generating a coverage report and, crucially, I was actually checking it. Thus we see that passing tests alone are not necessarily indicative. Coverage alone by itself may or may not be indicative. But coverage can certainly backfill some information about tests.

Incidentally, this actually can lead to better design because rescuing NoMethodError is a bad idea in general. Pretty much for the very reason that my test (inadvertently) showed me! So what I ended up doing is something a bit more logical with my logic:
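A sketch of that redesign; the fetch-and-KeyError approach is from the description below, while the exact body is my assumption:

```ruby
module Empiric
  module_function

  def gem_version(name)
    # fetch raises KeyError for a missing key instead of returning nil,
    # so a missing gem becomes an explicit, specific failure rather than
    # a rescued NoMethodError on nil.
    Gem.loaded_specs.fetch(name).version
  rescue KeyError
    raise KeyError, "No gem loaded for #{name}"
  end
end
```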

Notice here that I’m now using fetch to get a dictionary and I’m then checking if there is a KeyError, which is a much more robust thing to be checking for and thus a much more robust design.

So what we see here is testing putting pressure on design, albeit at a very low level. But notice I didn’t have to get mired in whether I was “testing first,” “testing next,” “testing concurrent,” and so forth. I simply kept my cost-of-mistake curve tight by making sure the feedback loop between “time of making mistake” and “time of finding mistake” was as short as possible.

Case Study 2

Let’s consider another example. This one is going to be a little more involved but, to kill the suspense, this has to do with mocking and the dangers thereof when they are used in tests.

The Code Context

To use my tool, you basically have to call up an instance of a browser which is controlled by Watir which, in turn, delegates down to WebDriver. So you might have logic like this:
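Something like the following, sketched with stand-in classes (assumptions) so it can run without launching a real browser:

```ruby
# Stand-in for the real Watir gem, so this sketch runs on its own:
module Watir
  class Browser
    def initialize(*args); end
    def quit; end
  end
end

module Empiric
  module_function

  def start_browser(browser = :chrome)
    @browser = Watir::Browser.new(browser)
  end

  def quit_browser
    @browser.quit
  end
end

Empiric.start_browser :chrome
# ... drive the browser ...
Empiric.quit_browser
```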

Simple enough. I’ll show you the code for those methods in a bit.

The Test Context

In order to test my code, however, this would mean I need a browser with WebDriver. But, as we know, at the unit level of testing we certainly don’t want to be calling up actual browsers. This means I have tests where I need to mock a browser. So first, I have a mock driver I set up like this:

This is what I was using to test my logic, such as the start_browser and quit_browser call from above. I had a whole lot of tests in my code base yet I was having problems with just these two tests:

The commented out bits in the second test are on purpose to illustrate what’s going on so bear with me on that. With just that one line in place in the second test, I get the following error on that test:

<Double "watir"> was originally created in one example but has leaked
into another example and can no longer be used. rspec-mocks' doubles
are designed to only last for one example, and you need to create a
new one in each example you wish to use it for

Further, if I entirely comment out the first test above, the above error doesn’t happen. So I know I’ve isolated the two tests that are interacting with each other in some way. And the above error is telling me exactly what’s happening: state is essentially leaking between tests.

Okay, now again notice the final line of my second test that is commented out. The error message I just got seems to be indicating to me that I need to create a new double in that second test. Okay, so I’ll change my second test so that the final line is uncommented:

So here I’ve uncommented the last line so I’m establishing the mock_driver in that test and — presumably — not allowing the code to leak. After all, how would it at this point? I’m starting the browser and quitting the browser in the context of the same test.

That, however, returns exactly the same error on exactly the same test. Hmm.

Maybe it would help you to see the methods that are being called in that test. First is set_browser:
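A sketch of it, with a stand-in for Watir (an assumption) so it runs on its own:

```ruby
# Stand-in for the real Watir gem:
module Watir
  class Browser
    def initialize(*args); end
  end
end

module Empiric
  module_function

  # Stores the driver in an instance variable on the Empiric module
  # itself, which is state that outlives any single caller.
  def set_browser(browser = :chrome, *args)
    @browser = Watir::Browser.new(browser, *args)
  end
end
```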

And here is quit_browser:
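A sketch of that method; it simply quits whatever driver was stored previously:

```ruby
module Empiric
  module_function

  # @browser is whatever the browser-starting method stored earlier.
  def quit_browser
    @browser.quit
  end
end
```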

It might help you to know that any variable with an @ in front of it in Ruby is called an instance variable. And I bring that up because the fact that RSpec thought one test was “leaking” into the other made me think that perhaps my @browser instance was the problem, essentially being what’s persisting between the two tests. But I didn’t see how to get around that.

Perhaps something occurred to you as it did to me at this point. I changed my second test to make sure I started the browser along with quitting it. But notice that in my first test I don’t actually quit the browser, only start it. So I thought that maybe if I quit the browser in the first test, that would help. So I changed the first test to this:

Notice I made sure to quit the browser now in this test. Surely this would stop the problem, right? In both tests I’m now starting and quitting the browser. (Extra points to you if you thought: “But now isn’t that really just the same test? Thus you could delete one of them.”)

That change, however, led to the previous error about “leaking” being shown on both tests! What?!?!?

The Problem Is …

There’s a lot of investigation I ended up doing and I’ll spare you all that. But what I will say is I spent a whole lot of time on a unit test when I knew for a fact the logic was working because I had a script I was running that started up an actual browser, did some stuff, and then closed the browser. Let’s call that an acceptance test.

I ended up doing this for my problematic unit tests:

All of that was needed exactly as it is, or I would get some error or other. So notice what I did here. I removed my mock driver and simply tried to incorporate what seemed to work. The above “contraption” is what I ended up with as the sweet spot. And make no mistake about it, sometimes as developers, our tests end up being exactly that: contraptions designed for one purpose. What’s the purpose? Not always to tell us that our code works but to just get the darn test to pass! This is another way that bugs persist.

Going back to my new tests, this all worked in the sense of giving coverage of the methods in question. What was interesting was that I had to add this to my RSpec configuration:
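That addition was this setting, which is a real rspec-mocks option, reached through the standard configure block:

```ruby
require 'rspec'

RSpec.configure do |config|
  config.mock_with :rspec do |mocks|
    # Stops rspec-mocks from raising when `allow` targets nil, which
    # quiets the symptom rather than addressing it.
    mocks.allow_message_expectations_on_nil = true
  end
end
```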

I needed to do this because RSpec was reporting that I was calling allow on something that was nil (which in Ruby basically means nothing). My tests were allowing that thing that was set to nil to receive a value. So what the above configuration did is simply tell RSpec not to flag that as a problem.

Uh, yeah, but … that is a problem, isn’t it? I mean, if the thing, in this case the Watir::Browser instance, is nil, then it’s not really a browser instance at all. It’s just a nothing that I’m allowing to receive messages so that my test passes.

Yet, that being said, if I looked at my code coverage, everything was 100% covered. On the other hand, notice that in my first case study the cost-of-mistake curve was kept tight and thus my feedback loop was short enough that I could make decisions. The above is a case where I’m potentially allowing that loop to be broken. Given how the above tests are written, combined with the test configuration, it’s quite possible for a mistake to last longer in the system than it should because I’m always testing a thing that can be nil in the test context but which most certainly cannot be nil in a real (acceptance) context.

Or, rather, my cost of mistake curve would have suffered if I didn’t have other tests in place, which I do. For example, beyond the acceptance test I mentioned before, I have tests like these:

With tests like those in place, if my “watir_browser” isn’t working, I would be alerted immediately. But then that calls into question why I need my more complicated tests above. Actually, I don’t. The only reason to keep those problematic tests I showed you is because they do directly test start_browser and quit_browser and thus make sure I have 100% code coverage.

Lessons Learned?

This brought up some interesting things, if you think about it. I have a test that is clearly passing. And it adds to my code coverage and thus my test coverage. But is it actually testing the start or quit action on a browser? Well, not really, since it was testing start and quit actions on something that was, in fact, nil.

But — it does work. And it must be calling the lines of code in question because the code coverage, as reported by my coverage tool, indicates that the statements in question have been checked. So should I have just kept using my mock? Well, actually, what I showed you above as my solution is really no different than using the mock in the first place. The mock would have told me just as much — and thus just as little — as my above tests.

Key question: would I be confident releasing my code based just on the fact that the above tests are passing? Absolutely not! So the race to 100% code coverage should not be why tests are added at the code level. I have other tests, such as the “allows navigation” one I just showed you, that implicitly test the start and quit of a browser. But because it’s implicit, it does not show up on explicit test coverage.

This is where you start to have discussions about feature coverage rather than solely test coverage or code coverage. This is also where you have discussions about some tests needing to be at different levels of abstraction. In this case, there must be at least one test that explicitly calls up a browser and then quits that browser. This allows us to start talking about the ratio between heavier, longer-to-run acceptance tests and lighter, shorter-to-run code-based tests.

As with the first case study, I understand this can seem a little contrived. But I believe you can at least see the details and see why the testing put in place is potentially problematic. Now imagine how easy it is for these details to be hidden when you have a very large code base that has many things that are being stubbed or mocked to various degrees.

All This Tells Us What?

What I wanted this post to show is exactly why bugs creep in even when we have unit tests or, in some sense, integration tests at the code level.

Sometimes our test tooling gets in the way a little bit. Sometimes it doesn’t tell us what it should be telling us based on what’s going on, allowing errors to percolate in our system longer than they should. Sometimes our tests are hiding the very thing we’re trying to find. Sometimes those tests are simply abstracting away the real thing to such an extent that it’s questionable what we’re actually testing.

But notice how we never left the domain of testing here. The idea of putting our code into the context of an executable experiment and observing the result was front-and-center to everything talked about here. This is what developers deal with because they are dealing with the human level of thinking around testing and turning that into coding. This is a very difficult thing to do.

The Developer + Test Symbiosis

A key point here is not getting distracted by our technology. I actually think that’s a large aspect regarding what being a true developer means. “Being a developer” is about providing experiences that add value. Those experiences occur as a result of design-based thinking tied to outcomes. And testing exists (in part) to put pressure on design at various abstraction levels to make sure those outcomes are achieved not only in the way that we expect but also looking for those things that we don’t expect.

That, right there, is the intersection between a “developer” and a “tester.” This is what I meant when I said I felt testers should become developers and why those people we now call “testers” must be capable of working across the technical abstraction stack because, after all, that’s what the people we call “developers” already do.

So what we need are generalists with specialist tendencies who can cross both domains with a certain degree of fluidity. That ability and that fluidity are what, in fact, make us all developers; that is, specialists in providing experiences. What I’ve found this type of thinking does is get people beyond asking “are you a developer who can test?” or “are you a tester who can develop?” and instead focuses on being part of a delivery team, having skills in multiple realms.

Now, some argue: “Well, maybe … but the proper level of skills in both realms is really hard to achieve.”

Yes, it is. This is one of the very reasons for trying to achieve it in the first place!


About Jeff Nyman

Anything I put here is an approximation of the truth. You're getting a particular view of myself ... and it's the view I'm choosing to present to you. If you've never met me before in person, please realize I'm not the same in person as I am in writing. That's because I can only put part of myself down into words. If you have met me before in person then I'd ask you to consider that the view you've formed that way and the view you come to by reading what I say here may, in fact, both be true. I'd advise that you not automatically discard either viewpoint when they conflict or accept either as truth when they agree.
This entry was posted in Testing.
