In this post, I’m going to talk about how I approach testing from the standpoint of internal and external qualities. I’m also going to indicate why I think automation is a form of testing but certainly cannot be all of testing. However, bear with me, as I’m going to approach this via a specific scenario around games.
Why games? Well, in the past I’ve talked about testing games for graphics and performance as well as testing games for fairness or difficulty. As anyone who follows the game industry knows, lots of so-called “triple A” games have serious quality issues upon release. It’s worth considering how tricky game testing is.
I’m not excusing the state of certain game releases just because testing games is hard. But, in fact, testing games is hard and it’s at least worth understanding why.
At a glance, a reader might think that this post is of limited interest to them if they don’t test games or even care about games. But I think it’s instructive to consider how testing is carried out in different contexts. I also think it’s interesting to consider how much technical testing is required in this context and how much automation is often required. If you stick with me, I’ll show why I think approaching this via games helps us with thinking about testing in any context.
This post will match up a bit with my testing like it was 1980 post, which was not about games, but where there was quite a bit of focus on testers being proficient with code and where “automation” of some sort was front-and-center. Not, I should note, as a replacement for humans but as a way to assist. As a testing industry, allowing for some generalization here, we’ve disconnected from the development industry in many ways and enabled an often fruitless set of debates that set up an ineradicable opposition between “manual” and “automated” testing.
Our Game Test Scenario
Imagine this: you’re a tester at a game development studio. Your development team has implemented a highly anticipated feature: a subsurface skin scattering shader.
Yikes, what does that even mean?
When light interacts with translucent materials — like marble or wax — it doesn’t just reflect off the surface. It penetrates a short distance beneath the surface, scatters within the material, and then exits at a different point. This creates a softer, more diffused appearance compared to purely surface reflections. Here’s an example of how developers might set this up in the popular Unreal Engine.

From the standpoint of skin specifically, consider that if you shine a flashlight on your hand, the red glow you see is caused by light scattering through your skin and interacting with blood and tissue.

A subsurface skin scattering shader is a piece of rendering technology designed to replicate the effect I just described in a virtual environment. By simulating how light interacts with layers beneath the surface of a character’s skin, it adds depth and realism to digital characters, making them appear more lifelike.

A very good, and in-depth, article on the subject is An Introduction To Real-Time Subsurface Scattering.
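Purely to make the idea concrete, here’s a tiny standalone sketch of one diffusion profile from the graphics literature (the Christensen-Burley normalized profile). This is illustrative only; it’s not the shader implementation discussed in this post, and real engines layer a great deal more on top of something like this.

#include <cmath>
#include <cstdio>

const double kPi = 3.14159265358979323846;

// R(r): the fraction of light exiting at distance r from where it entered the
// skin, with s controlling how tightly the scattering is concentrated.
double burleyProfile(double r, double s) {
    return s * (std::exp(-s * r) + std::exp(-s * r / 3.0)) / (8.0 * kPi * r);
}

int main() {
    // Numerically integrate R(r) over the disk around the entry point. A
    // well-behaved profile integrates to roughly 1: it redistributes light
    // without creating or destroying any, which ties into the energy
    // conservation idea that comes up later in this post.
    const double s = 10.0;
    const double dr = 1e-4;
    double total = 0.0;
    for (double r = dr; r < 10.0; r += dr) {
        total += burleyProfile(r, s) * 2.0 * kPi * r * dr;
    }
    std::printf("integrated profile energy: %f\n", total);  // expect ~1.0
    return 0;
}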
As you can probably guess, this technique greatly enhances the realism of character models by simulating how light penetrates and scatters through the layers of skin. This is particularly nice for characters intended to be viewed up close. Here are two examples from the game Horizon Zero Dawn that I covered in one of my previously mentioned posts:


Quick Aside on Test Complexity
Incidentally, purely as an aside and not related to subsurface skin shaders: to give you an idea of all the things you have to test for, the recent game Dragon Age: Veilguard had a bug where your character’s chosen body shape would change! Here’s the female character as created:

Here’s your female character after a specific situation:

Essentially your character’s female model changes to a male body. This only happens if you engage in a particular conversation path with a particular non-player character named Emmerich.
So there’s a case where you presumably have to look for any situation over any possible set of conditions whereby your body shape may change for some reason. How do you test for that? How do you test for that repeatedly with any changes to the game? Better question: would you have even thought to test for that condition?
Let’s now jump back to our subsurface skin shader implementation. How do you, as a tester, propose to verify this over the course of the entire game world? How do you find any possible regressions or areas where this technique is not working? Well, as with any test, first you have to answer what regressions or “not working” would look like. Thus, we have to say what we would be testing for.
Breaking Out the Test Conditions
To determine test conditions, I need to explore and, when doing that, I like to think up questions that provide a good framework for identifying key areas of concern for testing. These questions, in effect, provide the basis for experiments. Each experiment will have a technique that will make carrying out the experiment possible.
Test Condition 1
- Condition: Does the skin shader actually work? Does it actually diffuse the incoming light?
- Experiment: Verify the visual effect of light scattering beneath the skin.
- Technique: You could use a controlled lighting setup to observe how the shader performs. You would want to compare the rendered results against a reference or known-good implementation, such as a physically based rendering benchmark or even what’s called an artistic intention, which is sort of like a mock-up, if you will. (A sketch of the comparison step follows below.)
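To make that comparison step concrete, here’s a minimal sketch: diff a rendered capture against a known-good reference and fail if the difference exceeds a tolerance. The Image type is a hypothetical stand-in for whatever capture format the project actually uses.

#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Image {
    int width = 0, height = 0;
    std::vector<float> pixels;  // RGB values, row-major
};

// Mean absolute per-channel difference between two same-sized captures.
double meanPixelDifference(const Image& a, const Image& b) {
    assert(a.pixels.size() == b.pixels.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.pixels.size(); ++i) {
        sum += std::fabs(a.pixels[i] - b.pixels[i]);
    }
    return a.pixels.empty() ? 0.0 : sum / a.pixels.size();
}

bool matchesReference(const Image& rendered, const Image& reference,
                      double tolerance = 0.01) {
    // A tolerance matters: different GPUs, drivers, and harmless shader edits
    // all introduce small, acceptable pixel-level noise.
    return meanPixelDifference(rendered, reference) <= tolerance;
}

int main() {
    Image rendered, reference;
    rendered.width = reference.width = 2;
    rendered.height = reference.height = 1;
    rendered.pixels  = {0.5f, 0.5f, 0.5f, 0.2f, 0.2f, 0.20f};
    reference.pixels = {0.5f, 0.5f, 0.5f, 0.2f, 0.2f, 0.21f};
    return matchesReference(rendered, reference) ? 0 : 1;  // tiny drift: passes
}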
Test Condition 2
- Condition: Does the skin shader preserve energy conservation?
- Experiment: “Energy conservation” has a specific meaning in this context. Here you want to ensure that the total light reflected and transmitted through the skin never exceeds the incoming light.
- Technique: You would want to measure light intensity before and after scattering under various diffusion profiles. Clearly this would require tools or debugging views to capture a measure of light energy.
A diffusion profile describes how light scatters below the surface of a material before it exits. In gaming, it’s particularly relevant for simulating soft and translucent materials, like skin, wax, or marble, giving them a realistic glow or softness under lighting. For example, it’s why characters in games look lifelike under sunlight or artificial light.
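To make the measurement idea from Test Condition 2 concrete, here’s a rough sketch of what an energy check could look like, assuming the team can expose the total incoming light energy and the per-pixel outgoing energy through some kind of debug capture. The CapturedFrame type is entirely hypothetical; the point is the comparison at the end.

#include <cassert>
#include <numeric>
#include <vector>

struct CapturedFrame {
    double incomingLightEnergy = 0.0;      // total energy fed into the scene
    std::vector<double> outgoingPerPixel;  // energy leaving each pixel
};

bool conservesEnergy(const CapturedFrame& frame, double epsilon = 1e-3) {
    double outgoing = std::accumulate(frame.outgoingPerPixel.begin(),
                                      frame.outgoingPerPixel.end(), 0.0);
    // Reflected plus transmitted light must never exceed what came in.
    return outgoing <= frame.incomingLightEnergy + epsilon;
}

int main() {
    CapturedFrame frame;
    frame.incomingLightEnergy = 1.0;
    frame.outgoingPerPixel = {0.30, 0.25, 0.20, 0.15};  // sums to 0.90: fine
    assert(conservesEnergy(frame));
    return 0;
}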
Test Condition 3
- Condition: Does the skin shader respect the specified diffusion profile? (Notice how this builds on the question above.)
- Experiment: Here you want to check if the shader adheres to artist-defined parameters, such as scattering radius or color absorption settings.
- Technique: You would want to render the skin under different profiles and verify outputs match expectations using side-by-side comparisons or data visualizations.
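As a sketch of what “verify outputs match expectations” could mean numerically, here’s one way to check a scattering radius setting: sample a slice of scattered intensity moving away from a light, estimate where the response effectively falls off, and compare that against the artist-configured radius. The capture step is assumed to exist elsewhere; the slice below is synthetic.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// intensities[i] = scattered intensity at distance i * stepSize from the light.
double measuredFalloffRadius(const std::vector<double>& intensities,
                             double stepSize, double threshold = 0.01) {
    double peak = intensities.empty() ? 0.0 : intensities.front();
    for (std::size_t i = 0; i < intensities.size(); ++i) {
        if (intensities[i] < peak * threshold) return i * stepSize;
    }
    return intensities.size() * stepSize;
}

int main() {
    const double configuredRadius = 0.5;  // what the artist asked for
    // Synthetic slice standing in for a real capture: exponential falloff.
    std::vector<double> slice;
    for (int i = 0; i < 200; ++i) {
        slice.push_back(std::exp(-10.0 * i * 0.01));
    }
    double measured = measuredFalloffRadius(slice, 0.01);
    std::printf("configured %.2f, measured %.2f\n", configuredRadius, measured);
    return std::fabs(measured - configuredRadius) < 0.1 ? 0 : 1;
}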
Test Condition 4
- Condition: Does the skin shader produce reasonable values? (Obvious related question: what counts as “reasonable”?)
- Experiment: You would want to make sure that the shader outputs values within valid ranges for the rendering pipeline. Anything outside the valid range would be “unreasonable.”
- Technique: Here again you would want to use debugging tools to examine the pixel data for anomalies, and probably do the equivalent of a “stress test” with extreme parameter values to uncover edge cases.
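A minimal sketch of that kind of scan, assuming the pixel data can be read back from a debug capture, might look like this: walk the buffer and flag NaNs, infinities, negatives, or values above the pipeline’s expected maximum.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Anomaly {
    std::size_t index;
    float value;
};

std::vector<Anomaly> findAnomalies(const std::vector<float>& pixels,
                                   float maxValid = 1.0f) {
    std::vector<Anomaly> anomalies;
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        float v = pixels[i];
        // NaN, infinity, negative, or out-of-range values are all "unreasonable."
        if (!std::isfinite(v) || v < 0.0f || v > maxValid) {
            anomalies.push_back({i, v});
        }
    }
    return anomalies;
}

int main() {
    std::vector<float> pixels = {0.2f, 0.9f, -0.1f, std::nanf(""), 1.5f};
    auto anomalies = findAnomalies(pixels);
    for (const auto& a : anomalies) {
        std::printf("anomalous pixel at %zu: %f\n", a.index, a.value);
    }
    return anomalies.empty() ? 0 : 1;  // nonzero exit means the gate failed
}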
Test Condition 5
- Condition: Does the diffusion stop at significant depth discontinuities?
- Experiment: What this means is you want to make sure that light scattering does not “bleed” across abrupt changes in depth, such as at the edges of facial features or between fingers.
- Technique: This would require using models with clear depth discontinuities and evaluating the results visually and numerically to confirm correct behavior (see the sketch below). Testers should note that this is a good example of boundary value testing!
Depth discontinuities occur where there’s a sudden change in depth between two adjacent surfaces in a scene. These typically arise at edges or borders of objects in 3D space. For shader testing, ensuring smooth transitions or avoiding artifacts at these points is crucial to maintaining visual realism.
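Here’s a simplified, hypothetical sketch of the bleed check on a single scanline: find large jumps in the depth buffer and confirm the lit image stays just as sharp across each jump after scattering is applied. A real implementation would work over whole frames and use better heuristics, but the shape of the check is the same.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Scanline {
    std::vector<float> depth;      // per-pixel depth
    std::vector<float> direct;     // lighting before scattering
    std::vector<float> scattered;  // lighting after scattering
};

bool noBleedAcrossDiscontinuities(const Scanline& s,
                                  float depthJump = 0.5f,
                                  float minContrastRatio = 0.9f) {
    for (std::size_t i = 0; i + 1 < s.depth.size(); ++i) {
        if (std::fabs(s.depth[i + 1] - s.depth[i]) < depthJump) continue;
        // At a depth edge, scattering should not soften the lighting contrast.
        float directContrast = std::fabs(s.direct[i + 1] - s.direct[i]);
        float scatteredContrast = std::fabs(s.scattered[i + 1] - s.scattered[i]);
        if (directContrast > 0.0f &&
            scatteredContrast < minContrastRatio * directContrast) {
            std::printf("possible bleed at pixel %zu\n", i);
            return false;
        }
    }
    return true;
}

int main() {
    // Two surfaces: the edge of a face against a distant background.
    Scanline s;
    s.depth     = {1.0f, 1.0f, 1.0f, 9.0f, 9.0f};
    s.direct    = {0.80f, 0.80f, 0.80f, 0.10f, 0.10f};
    s.scattered = {0.75f, 0.75f, 0.75f, 0.08f, 0.08f};  // edge stays sharp
    return noBleedAcrossDiscontinuities(s) ? 0 : 1;
}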
The Test Conditions Multiply
The above was just a rough idea to get started. Clearly there would be many other aspects that can be tested. For purposes of this blog post, I just want to give an idea of the landscape rather than being comprehensive.
The key point here is that the above questions form a robust checklist for ensuring the shader meets both technical and artistic requirements. Each of these test conditions represents a potential failure point and an opportunity to refine the shader’s implementation.
Framing the Test Approach
Let’s consider some approaches to testing. You could, of course, just have humans sit and play the game, constantly, all the time, looking for any areas where the shader may fail one of the above conditions. Clearly that would be ineffective and inefficient.
You could instrument the system under test. In this case, that would be the environment where you can play the game. You could have the developers create a virtual camera that’s placed in the world while an orchestrator script takes screenshots and pairs them with diagnostic data about the graphics parameters. But, in that case, where is the camera placed? How and when is it moved? How often are screenshots taken? How closely will the graphical parameters align with the screenshot? This is potentially no better than the human testing approach.
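For what it’s worth, here’s a rough sketch of what such an orchestrator might look like, with entirely hypothetical engine hooks (placeVirtualCamera, captureScreenshot, captureGraphicsDiagnostics), stubbed out so the sketch compiles. It also makes the open questions concrete: the camera placements and the capture interval are exactly the decisions someone still has to make.

#include <chrono>
#include <cstdio>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Stubbed stand-ins for hooks the developers would have to expose.
void placeVirtualCamera(const Vec3&, const Vec3&) {}
std::string captureScreenshot() { return "shot_0001.png"; }
std::string captureGraphicsDiagnostics() { return "sss_radius=1.2"; }

void runCaptureSession(const std::vector<std::pair<Vec3, Vec3>>& placements,
                       std::chrono::milliseconds interval) {
    for (const auto& [position, lookAt] : placements) {
        placeVirtualCamera(position, lookAt);
        // Capture the diagnostics as close to the screenshot as possible; any
        // delay weakens the link between the image and the parameters.
        std::string shot = captureScreenshot();
        std::string diagnostics = captureGraphicsDiagnostics();
        std::printf("archived %s with %s\n", shot.c_str(), diagnostics.c_str());
        std::this_thread::sleep_for(interval);
    }
}

int main() {
    std::vector<std::pair<Vec3, Vec3>> placements;
    placements.push_back({Vec3{0.0f, 1.7f, 0.0f}, Vec3{0.0f, 1.7f, 1.0f}});
    runCaptureSession(placements, std::chrono::milliseconds(10));
    return 0;
}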
To systematically test these properties, you’re going to want to create a robust test framework. And that test framework is going to be automated. Note, however, that just as with any automation, the framework will only be as good as the conditions it’s designed to execute. How good those are depends on the human test design process, which we just looked at above.
Note that these would not be deterministic tests, even if automated, because deterministic tests are immune to perturbations that may occur in the operating environment. The point of our tests is to find out whether any such perturbations are occurring. Deterministic tests support a predictable system, but our system is not predictable.
This is an area some testers seem to get confused by. What we can do is determine what we will test and how we will do it. We can predict the types of errors we are likely to see if there are problems. But we can’t guarantee that we will find those problems. We can only try to put in elements that will observe those problems if they exist under the conditions we execute.
This next bit gets into some code. I’m going to show an implementation I’ve written for this. You are not expected to understand the code, and certainly not to run it. I provide this because I wish more game testers who blogged actually got into these details.
Test Harness
Let’s start with a first step, which is creating a test harness of sorts. This step involves creating a test file that integrates seamlessly with the game’s build system and allows for straightforward testing of the subsurface skin scattering shader. I’ll call this TestSkinShader.cpp.
#include "SkinShader.h"
#include "TestFramework.h"

class TestSkinShader : public TestCase {
};

int main() {
    TestSkinShader testSuite;
    TestFramework framework;

    framework.AddTestCase(&testSuite);
    return framework.Run();
}
Here SkinShader.h is some hypothetical header for the skin shader code while TestFramework.h would be some hypothetical test framework’s base functionality. Thus, the test suite file includes the necessary headers for the shader and the testing framework. The TestFramework manages the test cases. The tests are added using the AddTestCase function and executed with the Run() function.
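Since TestFramework.h is hypothetical, here’s one minimal way it could look, just so the harness above has something concrete behind it. A real project would almost certainly reach for an existing framework, and a fuller version would discover and invoke the individual test methods, which is where the macro idea I mention later comes in.

// TestFramework.h (hypothetical)
#pragma once
#include <vector>

class TestCase {
public:
    virtual ~TestCase() = default;
    virtual void SetUp() {}     // acquire resources before a test runs
    virtual void TearDown() {}  // release them afterwards
    virtual void Run() { SetUp(); TearDown(); }
};

class TestFramework {
public:
    void AddTestCase(TestCase* testCase) { testCases.push_back(testCase); }
    int Run() {
        for (TestCase* testCase : testCases) {
            testCase->Run();
        }
        return 0;  // a fuller version would count and report failures
    }
private:
    std::vector<TestCase*> testCases;
};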
I can now fill out the class with some setup and teardown behavior for the shader itself.
class TestSkinShader : public TestCase {
public:
    void SetUp() override {
        shader = new SkinShader();
        shader->Load("/assets/shaders/skin_shader.glsl");
    }

    void TearDown() override {
        delete shader;
        shader = nullptr;
    }

private:
    SkinShader* shader;
};
The TestCase base class provides the SetUp and TearDown lifecycle methods to initialize and clean up resources. The skin_shader.glsl refers to an OpenGL Shading Language shader. The idea here is to provide a simulated shader load. This would sort of be like a mock in a non-game context.
Now let’s add some tests.
#include <cassert>

class TestSkinShader : public TestCase {
public:
    ...

    void TestNoNegativeValues() {
        // Arrange (Given)
        Light invalidLight(-1.0f, -1.0f, -1.0f);
        Material testMaterial = Material::Skin("default_profile");
        shader->SetMaterial(testMaterial);

        // Act (When)
        auto result = shader->ComputeDiffuseLight(invalidLight, testMaterial);

        // Assert (Then)
        assert(result.Intensity >= 0 &&
               "Shader output should never produce negative intensity values");
    }

    void TestDiffusionProfileAdherence() {
        // Arrange (Given)
        Material customMaterial = Material::Skin("custom_profile");
        shader->SetMaterial(customMaterial);

        // Act (When)
        auto diffusionResult = shader->ComputeDiffusionProfile("custom_profile");

        // Assert (Then)
        assert(diffusionResult.ProfileName == "custom_profile" &&
               "Shader must respect the specified diffusion profile");
    }

    void TestDepthDiscontinuityHandling() {
        // Arrange (Given)
        Mesh testMesh = Mesh::WithDepthDiscontinuities();
        shader->SetMesh(testMesh);

        // Act (When)
        auto scatteringResults = shader->ComputeScattering(testMesh);

        // Assert (Then)
        assert(scatteringResults.EdgeBleed == false &&
               "Light scattering must not bleed across depth discontinuities");
    }
};
Notice that each test focuses on one specific behavior. As a tester, I create a framework like this because it should allow for very clear and focused testing of specific shader behaviors, making it easier for developers to pinpoint issues with the shader logic. And if more tests are needed, I can add them simply by defining new test behaviors.
A better developer than me pointed out that in a C++ context, I could really improve this by using macros, creating perhaps a TEST_BEHAVIOR macro along with some ASSERT macros. Since this isn’t a C++ tutorial, I’ll skip that for this post. Just know that this is something a tester in this context should be aware of, at least in terms of being able to engage developers that specialize in programmatic logic or patterns.
In fact, wait a minute. What the heck am I doing here? Shouldn’t the developers be writing that kind of stuff? Well, some people think so, but keep in mind that I think testers have to, at times, act like developers. In fact, it’s my belief that testers should be just another type of developer.
Yes, this puts me at cross-currents with some of the testing industry!
Testing for Correctness
While I presented everything here in a game context, there are two points that I think are relevant. The first is that testers should be technical resources who understand the development ecosystem and context in which the thing they are testing is situated.
The second is that testers should rely on automation and not fear that somehow the automation is “not testing.” Automation is certainly not all of testing. But it is a massive help to testing. It is a viable technique. It is one we have used since the 1980s (as per my previously mentioned blog post) and, in fact, even earlier.
But let’s not lose sight of a key aspect of tests, which is the idea of verifying correctness. I talked about basic correctness in my plea for testability and when talking about the whole “integrated vs integration” debate.
If your input is some data — and it pretty much always will be — and that data “looks basically okay,” is this truly verifying correctness? Consider our skin shader testing. Say we do this on the body mesh of a character, as I showed earlier with some of the game images.
The problem is that even if something “looks basically okay,” you’re not really validating any of the implementation assumptions that were put in place by the design. Yet, won’t players just be judging if it “looks basically okay”? Yes, they will and they won’t be concerned about many aspects of the shader and its implementation.
But the engineers on the team absolutely will want to evaluate the shader’s technical performance and correctness. There are entire sets of questions (conditions) that cannot be answered by simply eyeballing the output. Tests must go beyond surface-level validation to ensure that the system functions correctly, handles edge cases, and adheres to theoretical expectations.
Internal and External Qualities
This, to me, is a great example of a focus on internal qualities and external qualities. The external qualities are what customers actually see. And I would argue that this type of testing must be broadly experiential and exploratory, thus carried out by humans.
The internal qualities are often what we have to maintain in order to deliver those external qualities. And those internal qualities require that heavy technical focus and benefit greatly from as much automation as is responsible.
That, right there, is a discussion I wish I saw more testers having. Or even being capable of having. I see too many vocal folks on social media driving wedges between developers and testers by saying the former simply don’t care about testing while the latter are always suffering under an industry that despises them. In a previous post, I said:
The specialized discipline of testing is in most danger from its own practitioners.
I still very much believe that. Previously I talked about the breadth of the game testing speciality. I think that breadth, however, applies to the testing speciality more broadly. That speciality can be framed around not just narratives of what testing is and how it works, but also particular concepts. One of the particular concepts I tried to articulate in this post was the internal quality and external quality idea.
I talked about these types of qualities briefly when discussing whether testers should own quality.
Playing at Testing? or Testing at Play?
I’ll let you decide which of those I’ve done here. I’ve said in various places that some of the best testers I have ever worked with were those who had tested games. (And I do mean tested; not just played!) I stand by that sentiment and I hope this post, along with the others I’ve referenced, framed a bit of why I think that.
To be clear on that, this is because game testing makes it very hard to avoid thinking about internal and external qualities. Game testing, and the expense it incurs for studios, makes it difficult to avoid focusing on a cost-of-mistake curve. Game testing puts a heavy emphasis on technical acumen combined with an experimental mindset. And game testing makes it very clear that not only is automation a form of testing but it’s a crucial form of testing in the industry. It just can’t be all of testing.
Ultimately, testing is a balance between understanding the internal qualities of a system — how it behaves, calculates, and renders — and its external qualities — how those behaviors are perceived and experienced by the user.
In game development, this interplay is particularly vivid, as elements like those diffusion profiles and depth discontinuities I mentioned reveal the subtle interplay between physical simulation and artistic intent. By navigating these qualities with an approach that embraces exploration, creativity, and precision, you can not only uncover potential issues but also deepen your appreciation for the systems you test. I argue that part of that deepened appreciation requires and demands that you be involved with the code and, in many cases, writing code that tests the code.
While game testing can make all this more obvious and demonstrable, it’s crucial to remember this applies to anything you test. Testing, at its best, is not just about finding bugs but about playing an active role in shaping experiences that provide authenticity and immersion.