It’s very clear that artificial intelligence has become more democratized than at any other time in history. It’s also fairly clear that this democratization will not only continue but likely accelerate. What is the mandate for quality and test specialists in this context?

A few years back I talked about an ethical mandate for “mistake specialists.” A little more recently I talked about keeping people in computing, particularly when there is increased democratization of computing technologies that impact people. Artificial intelligence — whether you like it, love it, hate it or tolerate it — has clearly reached a saturation point for democratization, similar to what we’ve seen for other computing technologies in our past.
Jason Arbon, CEO of Checkie.AI and Testers.AI, recently asked a fantastic question: “Where are the experienced testers in this crucial domain?” It’s a good question. I’ve tried to provide my own answer to that question with a fairly extensive set of posts around entire categories of artificial intelligence and machine learning. Implied in my overall set of posts is that artificial intelligence is not just another innovation that testing has to learn to accommodate. Artificial intelligence is a meta-technology. By which I mean, it’s a foundational technology that drives and amplifies other technologies, and it does so across different fields.
A book I recommend on that broad idea of meta-technology is Mustafa Suleyman’s The Coming Wave: Technology, Power, and the Twenty-first Century’s Greatest Dilemma.
The use of artificial intelligence is quite literally starting to shape the trajectory of various industries, like medicine, finance, energy, education, and — crucially! — even its own development. This unparalleled technological scope certainly magnifies its potential benefits but equally certainly magnifies its potential risks.
The Evolution of Risk
Testing, as a discipline in a software or hardware context, has always been to some extent about risk. Or at least understanding risk. However, risks evolve as technology platforms afford and constrain what is and isn’t risky. This means testing has always been an evolving discipline, adapting to the challenges of emerging technologies, from industrial machines to digital computing. However, AI presents a qualitatively different challenge. AI transforms the systems it integrates with, making many traditional validation techniques insufficient. AI has the capacity to influence its own outputs and even the methods by which it is tested.
I truly believe that as AI continues to democratize, it will redefine what technology means, challenging us to reconsider how we approach verification, falsification, and — crucially! — implausification. While the foundational principles of testing, rooted in rigorous empirical investigation, will remain our guiding compass, the methods and tools we use to carry out that investigation must evolve. And that means testers must be front and center to drive that evolution. This should resonate well with what I talked about regarding the evolution of testing.
Kostiantyn Gitko, a Founder and General Manager at Devox Software, asked me an excellent question: how can we tweak traditional Quality Assurance and Testing methods to handle the unique challenges AI brings, especially when it’s being used in critical areas of concern, such as health care systems?
When I talked previously about the basis of testing, I mentioned ontogeny, ontology, and epistemology. I believe these will, more than ever, shape the future of testing. And that means the challenge for quality assurance and testing in AI-driven systems isn’t just tweaking traditional methods; in fact, I would say it’s not even about rethinking the philosophy of testing. Rather, it’s about simply going back to the first principles of the philosophy of testing.
Here I’ll see if I can articulate a bit of what I mean by that, which might go some way towards providing my answer to Kostiantyn.
Epistemology of Testing
Much of current testing operates under the assumption that systems are static, with deterministic inputs leading to predictable outputs. AI, however, functions probabilistically, meaning we’re dealing with degrees of confidence rather than binary true/false outputs.
So, what’s the shift we need here? Clearly, we must redefine what it means to “know” that a system is correct. This involves borrowing from epistemology to assess not just whether an AI system works but how we know it works. What evidence is sufficient? Is the system’s reasoning explainable? Is it auditable by human experts? How might testers shift to accommodate this? One example would be to implement model interpretability tests as an expansion of the wide-ranging term “functional tests.” These aren’t just tests of outputs, but tests of the explanations behind outputs. This is critical in healthcare, as just one example, where doctors and regulators need to trust and understand the AI’s decisions.
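To make that a bit more concrete, here is a minimal sketch (my own illustration, not a prescription) of what a model interpretability test might look like in Python. It uses a crude perturbation-based attribution as a stand-in for whatever explanation method a team actually uses, and the feature names and the “clinically expected drivers” are entirely hypothetical.

```python
# A minimal sketch of an "interpretability test": we assert not just on the
# model's output but on the explanation behind it. The model, feature names,
# and expected-driver list here are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["fever", "cough", "age", "blood_pressure"]  # hypothetical

def perturbation_attribution(model, x, baseline):
    """Score each feature by how much moving it toward a baseline value
    changes the predicted probability (a crude, model-agnostic explanation)."""
    base_prob = model.predict_proba([x])[0, 1]
    scores = []
    for i in range(len(x)):
        x_perturbed = x.copy()
        x_perturbed[i] = baseline[i]
        scores.append(abs(base_prob - model.predict_proba([x_perturbed])[0, 1]))
    return np.array(scores)

def test_explanation_matches_clinical_expectation():
    rng = np.random.default_rng(0)
    # Synthetic training data where "fever" and "cough" genuinely drive the label.
    X = rng.normal(size=(500, 4))
    y = (X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(int)
    model = LogisticRegression().fit(X, y)

    patient = np.array([2.0, 1.5, 0.1, -0.2])   # a hypothetical positive case
    baseline = X.mean(axis=0)                    # "typical patient" reference
    scores = perturbation_attribution(model, patient, baseline)

    # The test passes only if the explanation, not just the prediction,
    # is dominated by clinically plausible features.
    top_two = {FEATURES[i] for i in np.argsort(scores)[-2:]}
    assert top_two == {"fever", "cough"}
```

The point of the sketch isn’t the attribution technique; it’s that the assertion targets the reasons behind the output, which is the kind of evidence doctors and regulators would actually need.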
Ontology of Systems
Traditional software testing assumes a fixed ontology: the system exists within well-defined parameters, operating according to predefined rules. AI, however, adapts and changes its “understanding” of the world based on data, often shifting its ontology dynamically.
What’s a possible shift here? We need to test how an AI constructs its internal models of reality and whether those models remain valid over time. This means testing not only outputs but the framework within which the AI makes decisions. How might testers use techniques to accommodate this shift? They could apply ontological drift detection, essentially monitoring the AI system over time to detect when its model of the world diverges from reality. Sticking with the healthcare example, this might be where shifts in patient demographics, new diseases, or evolving medical knowledge can introduce unseen risks.
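As one narrow, concrete proxy for that idea, here is a hedged sketch of feature-distribution drift detection using a two-sample Kolmogorov-Smirnov test. Real “ontological drift” covers far more than input distributions, and the features and alerting threshold below are hypothetical.

```python
# A minimal sketch of one narrow proxy for "ontological drift": comparing the
# distribution of incoming production data against the data the model was
# built on. Feature names and thresholds are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

FEATURES = ["age", "blood_pressure", "bmi"]  # hypothetical
P_VALUE_THRESHOLD = 0.01                      # hypothetical alerting threshold

def detect_feature_drift(reference: np.ndarray, current: np.ndarray) -> dict:
    """Flag features whose production distribution has diverged from the
    reference (training-time) distribution via a two-sample KS test."""
    drifted = {}
    for i, name in enumerate(FEATURES):
        stat, p_value = ks_2samp(reference[:, i], current[:, i])
        if p_value < P_VALUE_THRESHOLD:
            drifted[name] = {"ks_statistic": stat, "p_value": p_value}
    return drifted

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(loc=[50, 120, 25], scale=[15, 15, 4], size=(5000, 3))
    # Simulate a demographic shift: the patient population gets older.
    current = rng.normal(loc=[62, 121, 25], scale=[15, 15, 4], size=(1000, 3))
    print(detect_feature_drift(reference, current))  # expect "age" to be flagged
```

A check like this only tells you that the world has moved; deciding whether the system’s internal model still holds up against that movement is the harder, human part of the job.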
Ontogeny of Testing
Ontogeny — how a system develops over time — is crucial for AI because the system’s lifecycle doesn’t end at deployment. AI systems learn, adapt, and even degrade as they encounter new data or adversarial inputs.
What’s the shift here? Testing becomes a continuous process of developmental validation, where we monitor and validate the system not just pre-deployment but throughout its operational life. How? Well, we could implement adversarial testing frameworks that simulate worst-case scenarios, such as feeding adversarial examples designed to exploit weaknesses in the model. Additionally, counterfactual testing — asking “What if?” questions — can help explore edge cases that static testing would miss. For instance, in that healthcare scenario, “What if this patient’s symptoms slightly change?” or “What if a rare disease occurs in conjunction with a common one?”
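Here is a minimal sketch of the counterfactual side of that, assuming a scikit-learn-style classifier: perturb a single case within small, supposedly insignificant bounds and measure how often the predicted diagnosis flips. The model, tolerances, and “patient” are all hypothetical.

```python
# A minimal sketch of a counterfactual check: ask "what if this patient's
# inputs changed slightly?" and flag diagnoses that flip under tiny,
# clinically insignificant perturbations. All values are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def counterfactual_flips(model, x, epsilons, n_samples=200, seed=0):
    """Return the fraction of small random perturbations of x (bounded per
    feature by epsilons) that change the model's predicted class."""
    rng = np.random.default_rng(seed)
    original = model.predict([x])[0]
    noise = rng.uniform(-1.0, 1.0, size=(n_samples, len(x))) * epsilons
    perturbed = x + noise
    flipped = model.predict(perturbed) != original
    return flipped.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 5))
    y = (X[:, 0] - X[:, 3] > 0).astype(int)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    patient = np.array([0.05, 0.4, -0.1, 0.02, 0.3])  # near a decision boundary
    epsilons = np.full(5, 0.1)                          # a "slight" symptom change
    flip_rate = counterfactual_flips(model, patient, epsilons)
    # A high flip rate signals a brittle region worth human review.
    print(f"Diagnosis flipped in {flip_rate:.0%} of counterfactual scenarios")
```

Run continuously against production traffic rather than once before release, something like this becomes part of the developmental validation I’m describing, not a one-time gate.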
Adversarial Testing & Counterfactuals
The last point of discussion might warrant some elaboration since I haven’t seen many people talking about this. Adversarial testing isn’t just about breaking the system; it’s about teaching it to resist being broken. This is critical in healthcare, where an adversarial input could be the difference between a correct diagnosis and a fatal one. But I don’t think it’s hard to imagine scenarios outside of healthcare.
This is, to me, a crucial shift. We have to move beyond traditional test cases (which we should have done long ago) and design adaptive test scenarios that mimic real-world adversarial conditions. This includes testing against malicious inputs and unexpected combinations of data. Arguably, we do some of that now. But what about ethical dilemmas where the AI must prioritize competing outcomes? A specific example here might be that we test an AI-driven diagnostic system with synthetic datasets that introduce biases, outliers, and noise to evaluate the system’s robustness. Similarly, we could use counterfactuals to challenge the system’s understanding: “If symptom X were present, would the diagnosis change?”
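A hedged sketch of that robustness evaluation might look like the following: a synthetic evaluation set is corrupted with noise and outliers, and the accuracy gap against the clean baseline is the signal of interest (a bias shift could be layered in the same way). Every number here is a placeholder.

```python
# A minimal sketch of robustness evaluation with deliberately corrupted
# synthetic data: inject noise and outliers, then compare accuracy against
# the clean baseline. All parameters here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def corrupt(X, rng, noise_scale=0.3, outlier_fraction=0.02):
    """Add Gaussian noise everywhere and extreme outliers to a few rows."""
    X_noisy = X + rng.normal(scale=noise_scale, size=X.shape)
    n_outliers = int(outlier_fraction * len(X))
    rows = rng.choice(len(X), size=n_outliers, replace=False)
    X_noisy[rows] *= 10.0  # crude outlier injection
    return X_noisy

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(2000, 6))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    model = LogisticRegression().fit(X[:1500], y[:1500])

    X_test, y_test = X[1500:], y[1500:]
    clean_acc = accuracy_score(y_test, model.predict(X_test))
    corrupted_acc = accuracy_score(y_test, model.predict(corrupt(X_test, rng)))

    # A large gap between the two is the signal we care about: the system
    # is accurate only under idealized conditions.
    print(f"clean: {clean_acc:.3f}  corrupted: {corrupted_acc:.3f}")
```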
Cross-Discipline Thinking Must Happen
In our industry, we currently talk a lot about “cross-functional” teams and what I’m talking about here would really just extend that to interdisciplinary testing that is embedded within teams. Given the complexity of AI, testing can no longer be the sole domain of QA engineers. It never should have been, in my opinion.
Testing must evolve from being a technical task to being a cross-disciplinary inquiry. This aligns with the epistemological idea of distributed knowledge: no single person or discipline can “know” the whole system, but together, they can evaluate it comprehensively. This goes back to what I talked about way back in 2013 regarding being cross-discipline associative.
The Evidentiary Standard for Testing
Ultimately, AI forces us to think beyond the traditional “Does it work?” and ask deeper questions like:
- “How do we know it works?”
- “Under what ontological assumptions does it work?”
- “How does it develop and change over time?”
Testing for AI becomes less about finding bugs and more about ensuring trustworthiness in dynamic, adaptive systems that can influence human lives. This requires new tools, but more importantly, new ways of thinking about the nature of testing itself.
Also, I should add something here to clarify my above question of “How do we know it works?” This is a question we (hopefully!) ask now of any software. So it’s a bit more than just that, right? It’s almost like the epistemological ambit has expanded since the standard of evidence may be more diffuse or, at the very least, more fractalized. This is not so much because it’s all algorithms — since everything has been about algorithms — but rather about self-modifying and self-adapting algorithms, where what was developed may not be what’s running any longer and where the way the system works may be opaque even to those who developed it.
As AI systems become more self-modifying and self-adapting, the epistemological questions we face are no longer limited to whether a system works in a given moment, but whether we can ever truly know how or why it works in any given moment. Traditional models of software testing, which assume a static system with predictable behaviors, are increasingly inadequate for the dynamic, evolving nature of AI. As these systems continue to change in real time, often beyond the understanding of even their creators, the standard of evidence must evolve. And thus how we search for the evidence must evolve as well.
As quality and test specialists, we must develop new frameworks (human-focused and technology-based) for testing that account for this epistemological ambiguity, ensuring that AI systems remain trustworthy, transparent, and accountable, even as they adapt and evolve in ways that challenge our traditional methods of validation. I would argue one of the primary quality attributes of any business has always been trustability. That trustability could, in part, be conceptualized around how well and to what extent we considered various types of risks that degraded various kinds of qualities. The democratization of AI has simply amplified an already existing reality.
The Ethical Call Is Clear
I believe what we have right now is just a broadening of the scope of the already existing ethical mandate for testers. And that means we are led back to Jason’s crucial question: Where are the experienced testers in this crucial domain? We need more people talking substantively about this. We are living through a technocracy where people are relegating more and more of what they do to machines and thus, really, to algorithms.
There are various ways this kind of technocracy can evolve and thus shape world society. It’s incumbent on us humans to decide the nature of that shaping. Testing has a very large role to play in that. The question is: are we going to avail ourselves of the opportunity or, like much else lately, just relegate it to machines and hope it all works out?
About:
* “Much of current testing operates under the assumption that systems are static, with deterministic inputs leading to predictable outputs.” and
* “Traditional software testing assumes a fixed ontology: the system exists within well-defined parameters, operating according to predefined rules.”
In my opinion, this stopped being true even earlier. Just think about loosely coupled systems, for example ones with many independently developed and released microservices, or about real-time data processing pipelines. Testing one microservice is easy, but testing an evolving microservices system is clearly already challenging as well. Testing of loosely coupled systems and data pipelines has already evolved by introducing new test approaches (e.g., contract tests, pipeline dry-run tests) and by focusing more on a shift right with incremental releases and synthetic monitoring. Adapting to an evolving system under test is indeed much more challenging for AI-based systems, but the need to question traditional test approaches started earlier.
Indeed, what I probably should have said here was that “Much of current testing operates under a longstanding assumption that systems are …” While certain evolutions in test thinking have been effective — the contract testing you mention — that’s still not anywhere near enough of a shift to reliably and effectively test artificial intelligence, except in the most simplistic cases. The challenge is when you have systems that can evolve themselves and can, in fact, have an event horizon relative to their original design. You have systems, for example, that can change the nature of the contract. A synthetic monitor is great, as long as the system itself can’t change aspects of itself such that the semantic nature of what is being synthetically monitored is changed.
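To illustrate that last point (my own sketch, not part of the original exchange): a schema-level synthetic monitor can keep passing even after the system has quietly changed what its outputs mean, which is why a semantic assertion against a pinned synthetic case is also needed. The names, fields, and expected band below are hypothetical.

```python
# A minimal sketch of why a schema-level synthetic monitor can keep passing
# while the semantics drift. The "service" response here is a stand-in for
# any adaptive system; names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class DiagnosisResponse:
    patient_id: str
    risk_score: float  # contract says: a float between 0 and 1

def schema_monitor(response: DiagnosisResponse) -> bool:
    """The classic synthetic monitor: shape and ranges only."""
    return isinstance(response.risk_score, float) and 0.0 <= response.risk_score <= 1.0

def semantic_monitor(response: DiagnosisResponse, expected_band=(0.7, 1.0)) -> bool:
    """A semantic check: for a pinned synthetic high-risk case, the score must
    stay in the clinically expected band, whatever the model has become."""
    low, high = expected_band
    return low <= response.risk_score <= high

if __name__ == "__main__":
    # Imagine the system retrained itself and quietly inverted its scale,
    # so 0.05 now means "high risk". The contract is intact; the meaning is not.
    drifted = DiagnosisResponse(patient_id="synthetic-001", risk_score=0.05)
    print("schema monitor passes:", schema_monitor(drifted))     # True
    print("semantic monitor passes:", semantic_monitor(drifted)) # False
```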