Guarding Quality: From Drift to Discipline

Quality doesn’t collapse overnight. It drifts. It drifts in the seams between teams, in the silence between a feature being built and a feature being tested, in the gap between what we meant to cover and what we actually did. That drift is often invisible. Until it isn’t. That’s why testing can’t live in a corner of the organization. It has to be democratized, distributed, and deliberately practiced. And if we’re serious about doing that, we need ways to see the drift before it becomes damage. Let’s dig in!

Many people say that quality is a team responsibility. (Although, see my thoughts on that.) But if you look closely, you’ll find it’s often still treated like a department. Or a phase. Or worse, an afterthought. True distributed quality isn’t a slogan. It’s a practice. It requires understanding cost, acting experimentally, stabilizing deliberately, and recognizing when your testability is eroding. Without that, “quality as a team responsibility” becomes just another platitude.

To make sense of this drift, and do something meaningful about it, I tend to organize my thinking along a particular narrative arc.

Conceptual anchor: democratizing testing.
Economic rationale: cost-of-mistake.
Behavioral discipline: experimental thinking, habit formation.
Operational cadence: scrutinize → stabilize → sustain → scale.
Technical reality check: feature coverage, testability.
Strategic amplifier: automation.

I find it helps to frame this kind of conversation as a flow: from why it matters, to how we know, to what we do about it.

Quality Doesn’t Live in a Corner

When quality assurance is distributed, testing is democratized. Quality isn’t something people “hand off” to a team; it’s a set of questions they can ask themselves as they work. Developers can reason about testability. Designers can anticipate variation. Product can articulate acceptance criteria in a way that exposes gaps, not just features. And testers, those of us who live and breathe this craft, become accelerators of that shared language, not gatekeepers of it.

In line with my thoughts on tester extinction events, a few points follow:

True distribution isn’t about removing specialists. It’s about establishing a shared language of quality that everyone can speak fluently.
True democratization doesn’t eliminate specialists. It elevates them.

Specialists model the questions, heuristics, and critical habits that make quality (as opposed to “quality assurance”) everyone’s domain. They translate what might otherwise remain tacit into something the team can reason about.

So how do we know if we’re actually democratizing testing? We should be able to see it in the texture of conversations. In the stories people tell when they explain their decisions. In whether testability is being considered at the point of design, not just after code merge. If those signals aren’t visible, then democratization isn’t happening; it’s just being declared.

If you want more rambling on this, you can see my post on my perceived evolution of testing as well as how I described my role as a quality and test specialist.

Mistakes Age Poorly

The cost of a mistake isn’t just about money. It’s about time. Specifically, the duration between when a mistake is made and when it’s found. The longer that gap, the worse the impact. That’s the curve we should care about.

This isn’t just a “cost of change” curve. That framing focuses on when work is done. The cost-of-mistake curve focuses on when truth is discovered. It’s not about where in the process you fix something; it’s about how long you let a problem hide.

If quality is truly distributed, then detection latency isn’t just the tester’s burden. It’s a shared signal. A long gap between mistake and discovery means our detection surface is too narrow or too late. A short gap means feedback is closer to the work.

The cost-of-mistake curve tells us that quality activities need to be front-loaded. We need to be asking hard questions early: What are the risks? What could go wrong? What assumptions are we making? Waiting until the end of a development cycle to think about quality is like waiting until you’ve built a house to think about the foundation.

So are we adhering to this principle? And how do we know?

We should see it in our cycle times. In when issues are discovered, not just when they’re fixed. If we can’t track that gap (or worse, if we’re not even looking at it), then we’re not managing quality. We’re just reacting to it. And the best way to shorten that gap, between mistake and discovery, is to stop treating software like a plan and start treating it like an experiment.
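One way to make that gap visible is to record, for each issue, when the mistake was introduced and when it was discovered, and then look at the spread. What follows is only a minimal sketch in Python. The `Issue` record, its field names, and the sample dates are all assumptions made up for illustration; in practice the dates would come from wherever your team tracks such things.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class Issue:
    """Hypothetical record; the field names are assumptions for illustration."""
    key: str
    introduced: date  # when the mistake entered the work (a commit, a decision)
    discovered: date  # when someone first noticed it


def detection_latency_days(issue: Issue) -> int:
    """Days the problem stayed hidden: discovery date minus introduction date."""
    return (issue.discovered - issue.introduced).days


def summarize(issues: list[Issue]) -> dict[str, float]:
    """Median and worst-case detection latency, in days."""
    latencies = [detection_latency_days(i) for i in issues]
    return {"median_days": median(latencies), "max_days": max(latencies)}


# Made-up sample data, not real measurements.
issues = [
    Issue("A-1", date(2024, 3, 1), date(2024, 3, 2)),
    Issue("A-2", date(2024, 3, 4), date(2024, 3, 18)),
    Issue("A-3", date(2024, 3, 10), date(2024, 4, 9)),
]
print(summarize(issues))
```

The point of the sketch is the shape of the measurement, not the numbers: it tracks when truth was discovered relative to when the mistake was made, rather than when the fix landed.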

If anyone is curious, I dedicated a post to cost of mistake curves.

Every Feature Is a Hypothesis

Software development is fundamentally an exercise in managing uncertainty. We’re building systems with internal qualities (code structure, maintainability, performance) and external qualities (usability, reliability, functionality). Both are constantly at risk of degrading. The best way to manage that uncertainty is to think experimentally. Every feature is a hypothesis. Every change is an experiment. We’re not just building software. We’re testing assumptions about what users need, how systems will behave, and whether our solutions actually work.

That means asking constantly: What qualities matter here? Are they degrading? How do we know?

Internal quality degradation shows up as increasing difficulty in making changes, more bugs in previously stable areas, slower development cycles. External quality degradation shows up as user complaints, performance issues, or gaps between what users expect and what the system delivers.

Thinking experimentally means building in feedback loops. It means instrumenting your systems so you can observe behavior. It means treating every deployment as a chance to learn. And crucially, it means being willing to act on what you learn, even if that means admitting something isn’t working.

I talk about this whole idea in my focus on the quality constant as well as when I discussed navigating qualities.

From Scrutiny to Scale

Not all features or systems are at the same level of maturity, and they shouldn’t be treated as if they are. I think about quality in four broad phases, for lack of a better term:

Scrutinize: When something is new, scrutinize it heavily. Poke at it. Challenge assumptions. Look for the ways it could fail. This is where you’re establishing whether the idea even works.
Stabilize: Once the core concept is proven, focus on making it reliable. Remove the obvious bugs. Smooth out the rough edges. Get it to a point where it consistently does what it’s supposed to do.
Sustain: Now the question becomes: can we keep it working over time? As the system evolves and new features are added, does this remain reliable? This phase is about proving durability.
Scale: Only after you’ve shown you can sustain quality should you consider scaling: more users, more features, more complexity.

This four-phase model has real clarity for me.

Scrutinize: Establish viability.
Stabilize: Prove reliability.
Sustain: Demonstrate durability.
Scale: Expand confidently.

I don’t want that to feel like some corporate maturity model. I want it to feel lived in. The verbs are strong and active. Also, they’re alliterative without sounding contrived, which (hopefully!) makes the structure sticky in people’s minds. And as systems move through these phases, what we cover (and just as importantly, what we don’t!) becomes critical. You can’t stabilize or sustain what you can’t see. That’s where coverage comes in.

I talked a little about these phases before, if you are curious.

Walking the Edges

Code coverage and test coverage are useful metrics, but they’re not sufficient. You can have 100% code coverage and still ship a broken feature because you didn’t test the right things. What matters is feature coverage: have we actually explored the permutations and variations that users will encounter?

Really good testing isn’t just about executing test cases. It’s about thinking through the possibility space. Every feature exists in a context of user behaviors, system states, and environmental conditions, and we need to explore the meaningful combinations. This is where test thinking becomes crucial. Anyone can run a script. The skill is knowing what to test and why.

This is also why I prefer to talk less about “positive” and “negative” testing, and more about test conditions and data conditions. Within those conditions, some will be valid and some invalid. They are all part of the space users might encounter.
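To make the idea of data conditions concrete, here is a small sketch in Python. The feature under test, `parse_quantity`, is hypothetical and exists only for this example, as do the specific conditions listed. Valid and invalid conditions sit side by side in one table, since both are part of the space a user might wander into.

```python
def parse_quantity(text: str) -> int:
    """Hypothetical feature under test: accept a whole-number quantity from 1 to 99."""
    value = int(text.strip())  # raises ValueError on non-numeric or empty input
    if not 1 <= value <= 99:
        raise ValueError(f"quantity out of range: {value}")
    return value


# Each row is one data condition: (description, input, expected value).
# An expected value of None means the input should be rejected.
CONDITIONS = [
    ("lower boundary", "1", 1),
    ("upper boundary", "99", 99),
    ("surrounding whitespace", " 42 ", 42),
    ("just below range", "0", None),
    ("just above range", "100", None),
    ("negative", "-5", None),
    ("not a number", "abc", None),
    ("empty", "", None),
]


def check(text: str, expected) -> bool:
    """Return True when the feature behaves the way the condition says it should."""
    try:
        result = parse_quantity(text)
    except ValueError:
        return expected is None
    return result == expected


results = {name: check(text, expected) for name, text, expected in CONDITIONS}
for name, ok in results.items():
    print(f"{'ok  ' if ok else 'FAIL'} {name}")
```

Nothing in the table is labeled “positive” or “negative.” Each row is simply a condition with an expectation, which keeps attention on whether the interesting edges have been listed at all.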

I show examples of applying feature coverage when I talk about applying test thinking in the context of code as well as in my “Test Doing” series, which starts with the intersection of testers, code, and automation. I also talked about my lack of desire for positive and negative test terminology.

So how do we know if we’ve achieved meaningful feature coverage? Not by chasing a percentage in a dashboard, but by understanding where the edges of the system are, and whether we’ve walked those edges.

All this said, feature coverage isn’t just about what we see. It’s about how we think. And that’s where instinct, habit, and intuition come in.

From Theory to Instinct

Here’s a truth about expertise: it moves from theory to instinct. When you’re learning to drive, you consciously think about every action. Years later, you’re doing it without thinking; the actions have become habit. The same is true for quality thinking. The goal isn’t just to know testing techniques; it’s to internalize them so deeply that they become reflexive.

I think about this in four dimensions: theory, principle, practice, and intuition. You learn the theory (what is boundary testing; what are basis paths). You extract principles (edge cases often hide bugs). You practice applying those principles. And eventually, you develop intuition. You look at a feature and immediately see where the risks are.

But this isn’t a straight line. It’s more like a conceptual diamond that flexes and shifts depending on where someone is strong. Some people have solid principles but haven’t yet developed intuition. Others have strong practice but weak theoretical grounding. The shape of that diamond tells you where a team’s instincts are, and where they’re thin. The conceptual diamond idea allows for:

Asymmetry (people are strong in different dimensions).
Flexibility (growth doesn’t require climbing a rigid ladder).
Diagnosis (teams can see where they need depth vs breadth).

Let’s break that down by some examples.

A senior developer may have high intuition but low theoretical vocabulary.
A new tester might have strong theoretical grounding but not yet have developed an intuitive sense of risk.
A product designer might not know testing theory but naturally applies good principles through user empathy.

This multidimensional framing encourages leaders to shape growth rather than just sequence training. Where are people on your team across these dimensions? And how are you helping them deepen the areas where their instinct isn’t yet fully formed? Note that this isn’t about making everyone an expert tester. It’s about making quality thinking a natural part of how people approach their work. The goal isn’t uniform mastery. It’s a shared, adaptive instinct.

I talk a little about this in my post on heuristics around hiring specialist testers.

Interlude

We’ve now traced a conceptual arc. Up to this point, it has covered six things.

Shared language: democratizing testing (shared responsibility).
Shared economic signal: cost-of-mistake.
Shared discipline: a behavioral stance of experimental thinking.
Shared cadence: operational maturity (scrutinize → scale).
Shared exploration: depth of feature coverage.
Shared instinct: human cognition (instinct and intuition).

All of that (possibly!) sounds great. But how easily can our systems even support this kind of thinking and testing?

When the System Pushes Back

Instinct alone isn’t enough. Even the sharpest intuition is useless if you can’t act on it, and you can’t act on it if the system itself resists being tested. That’s where testability becomes the make-or-break factor: it either gives you that ability … or takes it away.

Quality doesn’t collapse overnight; it drifts. And one of the first places it drifts, often quietly, is testability. By the time the pain is visible, the system is already pushing back against your instincts. Here’s an uncomfortable truth: if your system isn’t testable, none of the other quality practices matter. You can’t democratize testing if the system is a black box. You can’t shift left on the cost curve if you can’t verify changes quickly. You can’t think experimentally if you can’t observe what’s happening.

Testability is the foundation that enables everything else.

However, testability is itself a quality and, like all qualities, it degrades. It degrades when we take shortcuts. When we add dependencies without thinking about how to isolate them. When we build complex coupling into systems. When we prioritize speed of delivery over clarity of design. Every one of these decisions trades away testability, and with it, our ability to maintain quality.

The questions we need to ask constantly: Where is testability degraded? Where is it degrading? What internal qualities are we sacrificing? How do we know? And what can we do about it?

A system with good testability has clear interfaces, manageable dependencies, observable behavior, and components that can be tested in isolation. A system with poor testability requires elaborate setup, has hidden dependencies, produces opaque outputs, and can only be tested as a whole. The tragedy is that testability problems compound. A slightly untestable system becomes harder to change safely, which leads to more shortcuts, which further degrades testability. It’s a vicious cycle.
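The difference between a hidden dependency and a manageable one can be small in code and large in consequence. Here is a minimal sketch in Python, assuming a hypothetical trial-expiry rule; the function names and the 14-day figure are invented for illustration. The first version reads the system clock internally, so it can only be tested against whatever time it happens to be. The second takes the clock as a parameter, so it can be tested in isolation.

```python
from datetime import datetime, timezone
from typing import Callable


def is_trial_expired_hidden(started: datetime, trial_days: int = 14) -> bool:
    """Hard to test: the current time is a hidden dependency read inside the function."""
    return (datetime.now(timezone.utc) - started).days >= trial_days


def is_trial_expired(
    started: datetime,
    trial_days: int = 14,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> bool:
    """Easier to test: the clock is an explicit dependency that a test can replace."""
    return (now() - started).days >= trial_days


# With the clock supplied by the caller, both sides of the boundary
# can be checked deterministically, without waiting or patching.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
on_day_13 = is_trial_expired(start, now=lambda: datetime(2024, 1, 14, tzinfo=timezone.utc))
on_day_14 = is_trial_expired(start, now=lambda: datetime(2024, 1, 15, tzinfo=timezone.utc))
print(on_day_13, on_day_14)
```

Passing the clock in is only one of several ways to isolate a dependency. What matters is that the answer to “how would we test this?” no longer involves elaborate setup.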

Breaking this cycle means treating testability as a first-class concern. In code reviews, ask: “How would we test this?” In architecture discussions, ask: “Can we observe what this is doing?” In sprint planning, allocate time for improving testability, not just adding features.

Testability isn’t free, but the alternative, a system you can’t confidently change, is far more expensive.

I’ve already made my plea for testability!

If testability is the foundation, automation is the amplifier. But like any amplifier, it only makes clear what’s already there.

Amplify, Don’t Outsource

Here’s the thing about test automation that a disturbing number of managers seem to miss: automation doesn’t do the thinking. It reflects the thinking we’ve already done. A test automation suite is only as good as the test thinking that went into it. If you automate bad tests, you just get bad tests that run faster.

This has a few important implications:

Automation must be integrated into sprint work, not treated as a separate phase. If your automation is always lagging behind your features, it’s not providing value.
Automation must be easy to update. If changing a test is painful, people won’t do it, and your suite will rot. Maintainability isn’t a nice-to-have; it’s survival.
Automation captures and scales human insight. The value isn’t in the automation itself. It’s in the thinking that the automation preserves and repeats.

Again, automation isn’t a substitute for good testing. It’s a force multiplier for it.

Guarding Against Drift

Drift (generally) doesn’t announce itself. It shows up in the gaps between what we intended and what we actually built. This is why quality thinking can’t be an afterthought. It has to be embedded: in how we design, how we build, how we test, and how we automate.

All these principles I’ve talked about aren’t independent. They reinforce each other. When you protect testability, you enable everything else. When you democratize testing, you need people with quality instincts. When you respect the cost-of-mistake curve, you think experimentally about risks. When you focus on feature coverage, you apply scrutiny before stabilization. And when you treat automation as captured thinking, you ensure that thinking is good thinking.

The common thread is intentionality and evidence. Don’t just claim you’re doing quality work: prove it. Don’t just hope things are working: know it. Don’t just follow processes: understand why they matter and whether they’re achieving the goal.

Here’s something I’ve come to believe over time: there isn’t one singular, capital-Q Quality. Quality isn’t one thing. It’s many things, working in tension. Internal qualities like structure, clarity, and stability. External qualities like reliability, usability, and value. We don’t chase capital-Q Quality directly. Instead, we shape and protect the qualities that create it. The way we think about Quality, in the big-picture sense, is the way we hold all of those small-q qualities together and how we keep them aligned, visible, and intentional.

It’s in that sense that I say that quality isn’t a checklist. It’s a mindset, a set of habits, and a continuous conversation about what matters and whether we’re achieving it. It’s also in this sense that I refer to quality not collapsing overnight. It drifts. And the way you fight drift isn’t with good intentions. It’s with clarity, evidence, and shared responsibility. It’s in that big-picture sense that I can say that quality isn’t guaranteed. It’s guarded. Quality doesn’t defend itself. We do. Quality is never proven once. It’s proven every day. Quality isn’t a phase. It’s a posture.

Ultimately, quality isn’t what we say. It’s what survives drift.

Stories from a Software Tester
