Testing: The Art of Unlearning

In two earlier posts I traced the path from Aristotelian to Galilean thinking as a way of understanding how testing developed as a discipline: how competing models of quality, and the slow maturation of experimental method, gave rise to something we might actually recognize as testing today. This post sits in that same current of thought, but takes a step back to ask a prior question: what is it that makes any of that development so difficult in the first place?

There’s a common assumption about how knowledge grows: we learn things, we add them to what we already know, and understanding accumulates. It’s a tidy picture. It also happens to be largely wrong, at least when it comes to the kind of understanding that actually changes how we see the world.

A more accurate picture is this: the hardest part of learning something genuinely new is not acquiring the new idea. It’s retiring the old one. And the old idea is hardest to retire precisely when it feels most obviously true; when it seems less like an assumption and more like the simple shape of reality itself.

This has direct consequences for testing. A tester who can’t retire their assumptions about how a system works will systematically miss the failures that violate those assumptions. And those are often the most consequential failures of all. The bottleneck, in other words, is rarely evidence. It’s unlearning.

The Dialogue That Wasn’t About What You Think

Galileo’s Dialogue Concerning the Two Chief World Systems, published in 1632, is often described as a defense of the Copernican heliocentric model. That description is technically accurate and almost entirely misleading.

The bulk of the Dialogue is not spent arguing that the Earth moves. It’s spent dismantling the felt certainty that the Earth’s movement is inconceivable. Galileo understood something crucial: you cannot argue someone into a new model while they are still standing inside the logic of the old one.

The geocentric model didn’t persist because people lacked access to better evidence. It persisted because the Earth’s stillness felt like bedrock; not a theory, but a precondition for thought itself. Galileo’s real target was that feeling. His method was to make the geocentric intuition feel, from the inside, like the assumption it actually was.

The Dialogue is structured as a three-way conversation between Salviati (the Copernican), Simplicio (the Aristotelian), and Sagredo (the open-minded layman). The choice of format was deliberate: Galileo wanted to show that the geocentric position couldn’t survive honest cross-examination, not merely that it contradicted new data. The Inquisition’s subsequent objection wasn’t primarily to the astronomy. It was to the implication that orthodoxy was debatable at all.

This is the key intuition for testing: evidence is necessary but not sufficient. What a tester brings to evidence, the assumptions through which they interpret it, determines what they can and cannot see. The question is never just “what does this system do?” It’s also “what am I already certain it does, and how would I know if that certainty were wrong?”

Seven Retirements

The physicist Carlo Rovelli, in his 2023 book White Holes, sketched a compressed history of how humanity arrived at Einstein’s theory of general relativity, which, among other things, is the discovery that time itself is not fixed but bends under the influence of gravity. What strikes me about his account is that it reads, on closer inspection, less like a sequence of discoveries and more like a sequence of retirements. Each figure in the chain doesn’t simply add something new. They remove something that everyone had assumed was permanently load-bearing.

If you walk through each part of the chain in order, the pattern becomes hard to miss. So let’s do that now.

Anaximander (sixth century BCE) notices that if the Sun, Moon, and stars revolve around the Earth, there must be empty space below the Earth as well as above it. Which means the Earth hovers. Which means the Earth needs no foundation. At the time, this was not just counterintuitive; it was conceptually vertiginous. After all, everything rests on something. Right? I mean, that’s not a theory, is it? That’s just how things are. Well, Anaximander retired it anyway.

Anaximander is often underappreciated in the history of science, perhaps because his conclusion sounds so obvious to us now. But the obviousness is the point: we’ve already done the unlearning he did. From inside the pre-Anaximander intuition, the Earth hovering on nothing was not a bold hypothesis. It was a category error. It was like asking what’s north of the North Pole. The assumption that things require support was so fundamental it didn’t feel like an assumption at all.

Aristotle (fourth century BCE) observes that during lunar eclipses, the Earth’s shadow falling across the Moon is only slightly larger than the Moon itself. Which means the Moon is not a small bright disc in the sky. It’s a large body, a world, only somewhat smaller than the Earth. The Moon as a distant lamp was retired. The Moon as a place became thinkable.

Aristarchus (third century BCE) contributes what may be the most elegant argument in the ancient world. When the Moon is exactly half-lit, and thus a quarter Moon, the angle between the Sun and the Moon as seen from Earth is nearly a right angle. A triangle with two near-right angles has a very distant third vertex. So the Sun is vastly farther away than the Moon. Yet the Sun and Moon appear the same size in the sky. Therefore the Sun must be enormously larger than the Moon and, Aristarchus reasoned, much larger than the Earth itself. Now, if that’s true, it’s far more natural to suppose that the small Earth orbits the gigantic Sun than the reverse.
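The half-Moon argument is easy to put into numbers. What follows is a minimal sketch, not a reconstruction of Aristarchus’s own working: it assumes the 87 degree Sun-Moon angle traditionally attributed to him and an approximate modern value of 89.85 degrees, and the function name is my own invention for illustration.

```python
import math

def sun_to_moon_distance_ratio(elongation_deg: float) -> float:
    """How many times farther the Sun is than the Moon.

    At half Moon the angle at the Moon (Earth-Moon-Sun) is a right angle,
    so cos(elongation) = d_moon / d_sun, and the ratio is 1 / cos(elongation).
    """
    return 1.0 / math.cos(math.radians(elongation_deg))

# Assumption: 87 degrees is the figure traditionally attributed to Aristarchus.
aristarchus_ratio = sun_to_moon_distance_ratio(87.0)

# Assumption: 89.85 degrees is an approximate modern value for the same angle.
modern_ratio = sun_to_moon_distance_ratio(89.85)

# The Sun and Moon look the same size in the sky, so the ratio of their
# true sizes equals the ratio of their distances.
print(f"Aristarchus: Sun about {aristarchus_ratio:.0f}x farther (and larger) than the Moon")
print(f"Modern:      Sun about {modern_ratio:.0f}x farther (and larger) than the Moon")
```

Notice how sensitive the result is near ninety degrees: an error of under three degrees changes the answer by a factor of about twenty. His number was off, but his conclusion survived, because the geometry only needed the Sun to be much bigger than the Earth.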

Aristarchus arrived at heliocentrism nearly eighteen centuries before Copernicus, using nothing but naked-eye observation and geometry. His argument required no instruments and no mathematics beyond what any educated Greek would have known. What it required was the willingness to follow the geometry wherever it led, even when it led somewhere that felt absurd. He was largely ignored. This is itself a data point about how unlearning works: having the correct argument is not the same as having the argument accepted. The intuition that the Earth is still and central was too strong to be dislodged by logic alone.

Copernicus, Kepler, and Galileo (sixteenth and seventeenth centuries) finally make the heliocentric model stick, not just as a geometric convenience but as a description of how things actually are. And the core retirement here is perhaps the most psychologically demanding of all: the intuition that motion is something we would necessarily feel. We don’t feel the Earth move. Therefore the Earth doesn’t move. Galileo’s inclined planes and falling bodies established the principle of inertia (that uniform motion produces no felt sensation) and in doing so, retired the intuition that stillness and motion are experientially distinguishable from the inside.

Newton (seventeenth century) builds modern physics on the work of his predecessors and makes a move that his contemporaries found almost as unsettling as heliocentrism: he proposes that objects can exert force on each other across empty space, with nothing in between. Gravity acts at a distance. The intuition that forces require contact, that cause and effect must touch, is retired. Something else is present in the world beyond material bodies. Newton called it force. He couldn’t explain how it worked. He simply demonstrated, with devastating mathematical precision, that it did.

Faraday and Maxwell (nineteenth century) discover that Newton was almost right. Forces are not quite instantaneous. There is a lag: small, because light is fast, but real. Something propagates through space, carrying the force from one body to another. Faraday called this something the physical field. It was considered a somewhat mystical idea by many of his contemporaries, some invisible medium filling all of space, transmitting influences. Maxwell wrote the equations for it. The retirement here is the assumption that cause and effect are functionally simultaneous. They are not. The gap is just usually too small to matter.

Faraday’s intuition about fields is worth pausing on because it echoes something from my earlier posts on these topics. Kepler, similarly, intuited that the Sun was the motive power of the universe but couldn’t explain the mechanism. Both men were working at the edge of what their conceptual frameworks could support, sensing that something was there without being able to say precisely what. In both cases, the intuition preceded the formalism by decades. This is a recurring pattern: the unlearning happens before the new learning can be properly articulated.

Einstein (twentieth century) finds, while searching for the equations governing the gravitational field, something he didn’t expect: that the geometry of space and time is not fixed. It’s shaped by the gravitational field itself. Time doesn’t pass at a uniform rate everywhere. Clocks run slower in stronger gravitational fields. Falling is not a force acting on a body; it’s a body following the straightest possible path through spacetime that has been curved by mass. The retirement here is double: the Euclidean geometry we learned in school, and the universal tick of time. Both turn out to be approximations, accurate enough for ordinary experience, quietly wrong at the edges.

What Testers Can Recognize Here

What the above sequence traces, viewed through the lens of testing, is a taxonomy of the kinds of assumptions that become invisible through familiarity. Each retirement maps onto a class of assumption that testers routinely carry into their work without examining it.

There’s the Anaximander problem: the assumption that the system rests on a foundation that may not actually exist. I’ve seen people assume, for instance, that the infrastructure beneath the application is stable, that dependencies behave consistently, that the environment in which we test resembles the environment in which users operate. These assumptions often go untested precisely because they feel like preconditions rather than hypotheses. This is something testers can correct for.

There’s the Aristarchus problem: the assumption that we have correctly estimated the scale and distance of things. I’ve seen people regularly misjudge how far a failure propagates, how large the affected surface area actually is, or how remote a particular edge case truly is from normal operation. Aristarchus followed the geometry and found that the Sun was far larger and farther than anyone had supposed. The same discipline, following the logic rather than the intuition, often reveals that a defect’s blast radius is not where we assumed it would be. This, too, is something testers can correct for.

There’s the Copernican problem: the assumption that we, as observers, are at the center of the interaction. Testers construct test cases from their own mental model of how a user approaches a system. That model is always, to some degree, geocentric: organized around the tester’s own position and perspective. The users who find the most interesting failures are often the ones who approach the system from angles the tester never inhabited.

There’s the Newtonian problem: the assumption that effects are local. A change here shouldn’t affect behavior there. After all, there’s nothing connecting them. Newton’s insight was that the absence of visible connection doesn’t mean the absence of actual connection. Testers encounter this constantly in complex systems, where a change in one component produces failures in a component that appears entirely unrelated. The field exists even when we can’t see it.
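One way to go looking for action at a distance is to probe the supposedly unrelated component before and after exercising the one you changed. Here is a minimal sketch of that idea. Everything in it is hypothetical and exists only for illustration: the shared `_settings` dictionary, both functions, and the `probes_disturbed_by` helper.

```python
from typing import Callable, Dict, List

# Hypothetical shared state that neither component visibly "owns".
_settings = {"decimal_places": 2}

def format_price(amount: float) -> str:
    """Checkout component: formats a price for display."""
    return f"{amount:.{_settings['decimal_places']}f}"

def enable_detailed_reports() -> None:
    """Reporting component: looks like a purely local change."""
    _settings["decimal_places"] = 6

def probes_disturbed_by(
    action: Callable[[], None],
    probes: Dict[str, Callable[[], object]],
) -> List[str]:
    """Run each probe before and after the action; return names that changed."""
    before = {name: probe() for name, probe in probes.items()}
    action()
    after = {name: probe() for name, probe in probes.items()}
    return [name for name in probes if before[name] != after[name]]

disturbed = probes_disturbed_by(
    enable_detailed_reports,
    {"checkout_price": lambda: format_price(9.5)},
)
print(disturbed)  # ['checkout_price']
```

The probe only catches couplings you thought to probe for, of course. The value is less in the helper than in the habit of asking which distant behaviors a change could reach.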

There’s the Einsteinian problem: the assumption that the frame of reference is stable. We test in a particular environment, at a particular load, with a particular dataset, and we treat the results as if they describe the system in general. Einstein showed that the geometry of spacetime is not fixed but contextual. A tester’s results are similarly contextual: not wrong, but local. The question is always how far the local result generalizes, and that question is harder to answer than it looks.
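To see how local a result can be, it helps to run the same check under more than one frame of reference. The sketch below is illustrative only: `dedupe_emails` is a hypothetical function under test, and the two datasets are invented stand-ins for a tidy fixture and a messier, more production-like import.

```python
from typing import Dict, List

def dedupe_emails(emails: List[str]) -> List[str]:
    """Hypothetical function under test: case-insensitive de-duplication."""
    seen = set()
    unique = []
    for email in emails:
        key = email.lower()
        if key not in seen:
            seen.add(key)
            unique.append(email)
    return unique

def no_duplicate_mailboxes(emails: List[str]) -> bool:
    """The property we care about: no mailbox appears twice in the output."""
    normalized = [e.strip().lower() for e in dedupe_emails(emails)]
    return len(normalized) == len(set(normalized))

# Invented datasets: the same check, two different contexts.
contexts: Dict[str, List[str]] = {
    "tidy fixture": ["a@example.com", "A@example.com", "b@example.com"],
    "messy import": ["a@example.com", " a@example.com", "b@example.com "],
}

results = {name: no_duplicate_mailboxes(data) for name, data in contexts.items()}
print(results)  # {'tidy fixture': True, 'messy import': False}
```

The pass on the tidy fixture is not wrong. It is a true statement about that fixture, and only about that fixture.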

Unlearning as Method

It’s worth pausing on a contemporary example before returning to Galileo, because the unlearning problem is not just a historical curiosity. It’s happening right now in testing’s own present.

A recent series of posts on this blog explored what it means to test AI-driven systems, by which I mean non-deterministic systems, where the same input does not reliably produce the same output. The challenge that series kept circling was not primarily technical. It was conceptual: testers reading those posts had to retire the assumption that a test result is a stable, binary thing. Pass and fail, in a non-deterministic system, are not fixed categories. They are probabilistic ones. And that retirement turns out to be genuinely difficult, not because testers lack the intelligence to grasp the idea, but because pass/fail is so foundational to how testing has always felt that questioning it seems less like updating a technique and more like pulling on a load-bearing wall.

That is precisely the Anaximander problem, wearing contemporary clothes. The assumption isn’t wrong because it was foolish. It was reasonable, well-supported by decades of practice, and invisible precisely because it worked so well for so long. Unlearning it required the same thing Anaximander required: following the logic of the new situation wherever it led, regardless of how vertiginous the destination felt.
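What a probabilistic pass/fail looks like in practice can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended harness: `flaky_classifier` is a made-up stand-in for a non-deterministic system that is right roughly ninety percent of the time, and the 0.8 bar and 500 runs are arbitrary choices.

```python
import random
from typing import Callable

def flaky_classifier(text: str, rng: random.Random) -> str:
    """Hypothetical non-deterministic system: correct about 90% of the time."""
    return "positive" if rng.random() < 0.9 else "negative"

def pass_rate(check: Callable[[], bool], runs: int) -> float:
    """Run the same check many times and return the fraction that passed."""
    passes = sum(1 for _ in range(runs) if check())
    return passes / runs

# Seeded so the sketch is repeatable; a real system offers no such seed.
rng = random.Random(42)

rate = pass_rate(
    lambda: flaky_classifier("great product", rng) == "positive",
    runs=500,
)

# The verdict is a statement about a rate, not about a single run.
REQUIRED_RATE = 0.8  # arbitrary bar chosen for illustration
meets_bar = rate >= REQUIRED_RATE
print(f"pass rate: {rate:.3f}, meets bar: {meets_bar}")
```

Any single run of that check can fail while the system is behaving exactly as intended. The sketch ignores sampling error entirely; a more careful version would put a confidence interval around the rate before comparing it to the bar.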

Galileo’s inclined planes, which I looked at in some of the earlier posts in this History and Science series, are worth returning to here. What made Galileo’s experimental method distinctive was not just that he observed carefully. It was that he designed observations specifically to defeat his own expectations. He recognized that an experiment conducted inside an unexamined assumption will tend to confirm that assumption, not because the experimenter is dishonest but because the assumption shapes what counts as a result worth recording.

The mature version of this, for a tester, is not cynicism about one’s own work. It’s a structured practice of asking: what would have to be true for this test to be misleading? What assumption am I making about the system that, if false, would invalidate this result? What would the Aristarchus version of this test look like, the version that follows the geometry wherever it leads, regardless of where I expected it to go?

The sequence above ends with Einstein retiring the fixed geometry of space and time. That is, in a sense, the ultimate unlearning: discovering that even the stage on which all events occur is itself a participant in those events, bending and warping in response to what happens on it. The frame of reference is never neutral.

Neither is the tester’s. The assumptions we bring to a system are not a transparent window onto its behavior. They are part of what we are testing. Or ought to be. The discipline of testing, understood in its fullest sense, is not just the discipline of observing carefully. It’s the discipline of becoming a rigorous skeptic of one’s own certainties.

That is a harder thing to learn than any technique. It’s also the thing that, historically, has made the difference between testers (and scientists!) who merely confirmed what they already believed, and those who found something genuinely new.

Stories from a Software Tester
