In my post “Can I Function Test My Units?”, I talked about how testers often promote a misuse of the phrase “functional testing.” The same thing happens with the phrase “performance testing.” Many testers don’t realize that performance testing, just like functional testing, is a test approach, which means there are various testing techniques you can apply within it. Without that kind of focus, I believe the purpose of performance testing can be lost.
At its most basic level, testing for performance applies any time an application has to do any form of processing. That’s true whether the processing is an internal formula (such as a Microsoft Excel sheet calculating a formula for a group of cells) or the interaction of a client with a server, such as a Web page making HTTP requests to a Web server or a traditional client/server business application performing database queries against a database server. What you’re really considering is transactions. A transaction can be thought of, perhaps loosely, as any action (or set of actions) that a user takes to initiate some form of processing by the application under test. Note that these transactions don’t necessarily have to come through a graphical user interface.
There are three general techniques of the performance testing approach:
- Load Testing
- Stress Testing
- Volume Testing
Load Testing tends to measure stability, responsiveness, and throughput. With this kind of testing, you use a varying workload that might be representative or unrepresentative of actual traffic. User behavior can include variations in the order in which transactions are done as well as in how long they take to complete. Load tests can be quite repeatable, simply by using the same load parameters each time, but they can also be designed to exhibit variation that leads to a more randomized or conditional simulation.
To make this a little more concrete, let’s consider a Web site that in the course of one hour normally sees, on average, forty visitors. So, with that:
- A representative load test would be running anywhere from thirty to fifty visitors during a time span of sixty minutes.
- An unrepresentative load test would be only running ten visitors in that same sixty minutes or running ninety visitors in that same sixty minutes.
Load testing is, at heart, realistic testing, at least to the best level you can determine and achieve. Thus the actual focus of the load testing technique is to find degradation points that show up as bottlenecks when load is placed upon a system. A secondary goal is to make reasonable assumptions about possible increased visitor load and see what happens. Note, also, that when you do consider unrepresentative loads, you’re often trying to see whether the system has the capacity to handle large numbers of transactions during peak periods and to show what a server can handle in terms of overall bandwidth.
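To make the representative/unrepresentative distinction a bit more tangible, here’s a minimal Python sketch of the forty-visitors-per-hour example. The function names (generate_load_profile, is_representative) and the thirty-to-fifty tolerance band are my own illustrative choices, not a standard API:

```python
import random

def generate_load_profile(visitors, duration_minutes, seed=None):
    """Generate randomized visitor arrival times (in minutes) across
    the test window, giving a more conditional, less rigid simulation."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0, duration_minutes) for _ in range(visitors))

def is_representative(visitors, expected=40, tolerance=10):
    """A load is 'representative' if it falls within the expected band:
    thirty to fifty visitors per hour, per the example above."""
    return abs(visitors - expected) <= tolerance

# Representative: forty-five visitors over sixty minutes.
arrivals = generate_load_profile(45, 60, seed=1)
print(len(arrivals), is_representative(45))   # 45 True

# Unrepresentative: ninety visitors in that same sixty minutes.
print(is_representative(90))                  # False
```

In a real harness each arrival time would trigger an actual transaction against the system under test; here the profile itself is the point.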
Stress Testing is the aspect of performance testing where you’re testing the system’s functionality under abnormal circumstances: too little disk space or memory, a power or hardware failure, extreme workloads, and so forth. In general, this is testing conducted to evaluate a system or component at or beyond the limits of its specified requirements to see how it copes. To a certain extent, of course, stress testing is really nothing more than a type of load test. A stress test is not just an extreme unrepresentative load test, however. While it is true that a stress test is a type of load test designed to determine how much load a given application (or system) can handle, the load itself may be secondary.
True, I may be stressing a system by placing a considerable load on the system as quickly as possible. But it’s equally possible that I’m using a representative load on a “compromised” system — meaning one that has been set up to have a small amount of RAM, or a poorly configured Apache, or a JVM with poor settings for the allocation of memory.
Within the technique of stress testing, I think it’s helpful to consider two categories of stress testing:
- Sustained Stress
- Maximal Stress
The difference is usually the length of time the stress test is designed to run: a sustained stress test has a longer execution time than a maximal stress test. If you want to adhere to those concepts, then it’s important to realize that stress testing can accomplish its goals via two methods:
A maximal stress test tends to concentrate on intensity; in other words it sets up much more intense situations than would otherwise be encountered but attempts to do so in a relatively short period of time. As an example, a maximal stress test may have one hundred visitors (which the server could normally take without a problem) do a very data-intensive query search at the exact same second. Thus the intensity is much greater than normal. Conversely, a sustained stress load tends to concentrate on quantity because the desire is to run much more in the way of visitors and/or functionality than would normally be encountered. So, for example, here the idea would be to have six thousand visitors on the site — probably doing relatively little, though — instead of the one hundred.
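One rough way to see the intensity-versus-quantity distinction is to compare the transaction schedules each kind of test would generate. This is just a sketch using the visitor counts from the examples above; the function names are hypothetical:

```python
def maximal_stress_schedule(visitors=100, at_second=0):
    """Maximal stress: intensity. Every visitor fires the same
    data-intensive transaction at the exact same second."""
    return [at_second] * visitors

def sustained_stress_schedule(visitors=6000, duration_seconds=3600):
    """Sustained stress: quantity. Far more visitors than normal,
    spread evenly over a long window, each doing relatively little."""
    return [i * duration_seconds / visitors for i in range(visitors)]

maximal = maximal_stress_schedule()
sustained = sustained_stress_schedule()
print(len(maximal), max(maximal) - min(maximal))    # 100 0   (all at once)
print(len(sustained), sustained[1] - sustained[0])  # 6000 0.6 (spread out)
```

Same general shape of test, very different emphasis: one compresses the work into an instant, the other stretches a much larger quantity of work across time.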
Volume Testing is predicated on finding weaknesses in a given system with respect to its handling of large amounts of data during short or extended time periods. In fact, it’s this kind of general definition that has merged stress testing and volume testing as one entity, at least in some people’s eyes. (Prior to that merger, according to some, volume testing was more often known as stability testing and was defined as a type of load test. See how confusing this can get if you don’t operationally define your terms?) In any event, a volume test is a type of test that attempts to determine how dependable and robust a given application is, rather than focusing on the responsiveness of the application or the throughput of the network that supports it. A lot of times this means you are focusing on one aspect of the system (such as the database) and seeing how it handles a large amount of what it normally handles — data.
With the above distinctions in mind, there are a few broad categories of performance testing that can be talked about as part of a performance testing effort. Here are what I think those categories are and a brief blurb about what they mean in operational terms:
- Low-Load: Executed at not more than fifteen percent of the expected production user load. Used to identify gross performance issues that would negate the value of testing at higher loads and/or provide a basis of comparison for future tests. Examples of low-load tests are baseline, benchmark, and component-based tests.
- Load: Executed at expected (or theoretically possible) production user loads. Used to validate the actual performance of the system as experienced by users in production. Examples of load tests are response-time, scalability, and component-based tests.
- Heavy-Load: Executed at greater-than-expected user loads. Most often used to test application and system stability and recovery, and to collect data for capacity planning. Examples of heavy-load tests are stress, spike, and hammer tests.
- Specialty/Isolation: Generally created for a specific purpose to help resolve a particular performance bottleneck by further isolating what is actually causing the bottleneck.
I mention component-based tests above and those are examples of where memory-leak testing, as just one example, may be performed. The focus there is on determining if memory that the application is using is not being released and thus gradually (or perhaps not so gradually!) lowering the pool of available memory. This can be done under load but that can also be a load of one. Determining whether a memory leak exists is a form of isolated test, usually based on specific components. Determining when and to what extent the memory leak becomes problematic is when load considerations come in. In all cases, however, you are performance testing.
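As a concrete sketch of that kind of isolated, component-based check with a “load of one,” here’s one way to look for a leak signature in Python using the standard tracemalloc module. The handle_transaction function and its deliberate leak are invented purely for illustration:

```python
import tracemalloc

leaked = []  # deliberately hold references so memory is never released

def handle_transaction(payload_size=10_000):
    """A hypothetical transaction handler with a deliberate leak: it
    appends to a module-level list that is never cleared."""
    leaked.append(bytearray(payload_size))

tracemalloc.start()
before = tracemalloc.take_snapshot()

for _ in range(100):  # a single-user "load of one", repeated
    handle_transaction()

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, "lineno")
growth = sum(s.size_diff for s in stats)
print(f"net allocation growth after 100 transactions: {growth} bytes")
```

Steady allocation growth across repeated identical transactions is the leak signature; determining when that growth becomes problematic is where the load considerations come in.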
Any of the above test techniques can be conducted at various connection rates. Simulating connection rates slower than the network over which the simulation is executing will always produce best-case results, but it gives a good idea of what the system will “feel” like to a user connected over, say, a LAN with a particular bandwidth or a 56.6 kilobits-per-second dial-up modem. Typically, only user experience tests are executed at various connection rates. It is possible to get valuable results by conducting the same test under the same load several times, each with a different connection rate, and then comparing the results.
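For a back-of-the-envelope feel for those connection-rate differences, the best-case transfer time for a page is just its size divided by the bandwidth. The page size and rates below are hypothetical examples, and the calculation ignores latency, handshakes, and protocol overhead:

```python
def transfer_seconds(page_bytes, rate_bits_per_second):
    """Best-case transfer time for a page at a given connection rate
    (ignores latency, handshakes, and protocol overhead)."""
    return page_bytes * 8 / rate_bits_per_second

page = 200_000  # a hypothetical 200 KB page
for label, rate in [("56.6k dial-up", 56_600),
                    ("10 Mb LAN", 10_000_000),
                    ("100 Mb LAN", 100_000_000)]:
    print(f"{label:>13}: {transfer_seconds(page, rate):.2f} s")
```

Even this crude arithmetic shows why the same test run at different connection rates can “feel” like entirely different systems to the end user.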
I mentioned that a “low-load” test is any test executed at not more than fifteen percent of the expected production user load. While that may be accurate, I should also mention that conducting low-load tests can result in a collection of baselines and benchmarks that serve as a basis of comparison for multi-user tests and for verifying that the developed load simulation scripts are working properly. It’s important to remember that load tests themselves simulate expected real-world loads, from best-case to worst-case scenarios, and are used to determine what the actual performance of the system will be when the application is in production. If the tests are designed properly, end-to-end response times will represent what actual users will experience when accessing the system, at least to a defined level of approximation.
It’s also useful to monitor all of the system resources under this kind of load to determine if they are adequate to support the expected user load. “Heavy-load” tests are executed at greater-than-expected user loads, generally far heavier than the system is ever expected to have to handle. They’re usually used not to determine if a system will fail, but where it will fail first, how badly, and why. You can then scale a heavy load back to a more reasonable one and test whether the problem you noticed under the heavy load also occurs under the regular load, perhaps while accounting for slightly more users or slightly more time.
Here’s the important point: all of this is performance testing.
What I’m trying to show here is that performance testing is a varied approach with a series of techniques that can be applied. Each technique answers its own questions, but the way you structure your overall performance approach also answers a question. The major question that a performance testing approach answers can be stated like this:
P = C / L
Here P is “performance”, C is “capacity”, and L is “load”. In other words, the performance of a system is a function of its capacity divided by the load that is placed on it. The general formula that’s sort of a mantra for a performance testing function is
P = [ C_B + C_P + C_S ] / [ L_B + L_P + L_S ]
Here the subscripts B, P, and S refer to bandwidth, processing, and storage. Each type of capacity is matched by its corresponding load type, just like in real life. And, also like real life, it’s the sum of the three parts that determines performance: if a limit is reached in any one part, it will limit the performance of the whole.
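To make the formula concrete, here’s a small sketch with entirely hypothetical capacity and load numbers. The per-component ratio check at the end is my own way of expressing the point that whichever part hits its limit first constrains the whole:

```python
def performance(capacity, load):
    """P = [C_B + C_P + C_S] / [L_B + L_P + L_S], where B, P, and S
    are bandwidth, processing, and storage (hypothetical units)."""
    return sum(capacity.values()) / sum(load.values())

capacity = {"bandwidth": 100, "processing": 80, "storage": 120}
load     = {"bandwidth": 50,  "processing": 75, "storage": 40}

print(f"overall P = {performance(capacity, load):.2f}")

# The likely bottleneck is the component whose own capacity/load
# ratio is lowest, even when the overall P looks comfortable:
ratios = {k: capacity[k] / load[k] for k in capacity}
print("bottleneck:", min(ratios, key=ratios.get))
```

In this made-up case the aggregate P is well above one, yet processing is running at nearly its full capacity, which is exactly the kind of limit the formula’s parts can hide if you only look at the whole.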
Ultimately, of course, the performance of any given application is and will always be limited by one of these three types of capacity. But which one, and under what conditions? How can you predict that limit beforehand and how can you prove it afterward? Those are the main questions that the performance testing function should be in place to answer.
I believe this can most effectively be done when testers (and everyone else) realize that performance testing — like functional testing — is an approach. Like all good approaches, there are certain techniques that are effective under different conditions, depending upon what you are trying to find out.