I really enjoyed the book Ubiquity: Why Catastrophes Happen by Mark Buchanan. The book doesn’t talk about quality assurance or testing at all, but it does talk about how things can change very quickly and how a given context can dictate how things change. I definitely recommend it for anyone working in software development, because I found I could abstract the book’s ideas out of their original context and apply them to change initiatives, faulty software, and the social dynamics of the software engineering environments I’ve worked in.
I’ll distill a little bit of what I got out of the book here, in the hopes of encouraging you to read it.
The general theme of Ubiquity is that certain processes are subject to unpredictable upheavals, and that these same processes can be routinely and drastically altered by even the least significant of events. Put another way, given the way some systems are organized, it’s possible for a small shock to trigger a response out of all proportion to itself. It is, in fact, these characteristics of systems that are considered ubiquitous, hence the title of the book.
First let’s talk about the nature of “upheavals.”
The global ecosystem is sometimes visited by abrupt episodes of collapse or near-collapse. There is evidence of at least five so-called mass extinction events, along with many smaller ones. Economic systems can undergo massive collapses (such as those we saw in 1929, 1987, and 2008). Political systems can undergo failures that lead to events like world wars. The crust of the Earth itself appears to undergo sporadic cataclysms, which often produce massive earthquakes that in turn generate tsunamis. Epidemics can sometimes spiral staggeringly out of control, leading to pandemics (such as the 1918 influenza pandemic). The commonality here is that in organized systems, complex networks of interactions allow upheavals to emerge. This is due to the very nature of the organization itself.
There are two aspects that are important to consider: the notion of history in the system and the notion of a critical state.
Let’s back up for a second and first consider the role of chaos. When you’re dealing with systems and processes that are extraordinarily sensitive to tiny influences along the path (or history) of their development, you have to take into account the potential for chaos. Chaos is, in fact, one way that complexity can grow out of simplicity. So-called chaotic processes look wildly erratic even when the underlying rules of the system are very simple. Yet chaos by itself does not give rise to events called “upheavals.” So what does? Well, that brings us to equilibrium. This is a term that refers to a system that’s in a state of balance. It’s actually when systems are out of equilibrium that you start to get the notion of history because the system does not exist under unchanging conditions. This imbalance leads directly to the critical state because, as Mark puts it in the book:
The key idea is the notion of the critical state, a special kind of organization characterized by a tendency toward sudden and tumultuous changes, an organization that seems to arise naturally under diverse conditions when a system gets pushed away from equilibrium.
Some people refer to the sum of these ideas as complexity theory. Yet others refer to it all as non-equilibrium physics. Some even call it historical physics because the notion of complexity is derived from the accumulation of what are basically “historical accidents.”
The idea of the critical state is more diffuse than just a given location in a system; it’s a result of the structural aspects of a system. The manifestation of the critical state can be thought of as a sort of skeleton of instability, for lack of a better phrase, that runs through interconnected systems. An event or action hitting one area of instability can cause a ripple effect at other nearby points on the instability chain. What you get is a domino-like effect.
A common example involves objects with a crystalline structure. These objects tend to have amazing strength, often being very hard to break. Yet the very same crystalline structure that gives them their strength also provides what are known as shatter points. These are spots where a precise application of a carefully measured force will break the object to pieces. To find these shatter points, and thus to use them to shape (or deform) the structure, requires a lot of study, an intimate understanding of the structure itself, and rigorous practice to train the hand in the perfect combination of strength and precision needed to produce the desired cut.
You can probably ferret out some obvious conclusions from the above.
- If the pockets of instability in a system are sparse, and all trouble spots are well isolated from each other, then a single effect or event could have only limited repercussions.
- If spots of instability come to riddle a system, the consequences of actions can become very unpredictable.
Thought about in terms of things we design, that means we have a chance to avoid creating fault lines of instability within the systems we build. We can’t do that with the Earth, of course, and so we’re going to have earthquakes. We probably can’t do that with our economic system either, subject as it is to how people react emotionally to news that affects financial outlooks. But what about the systems we build? What about bridges, buildings, and software systems? What about the teams we build? Those are all things we have more control over.
Clearly what we want to avoid are systems that come to be configured into so-called hypersensitive and thus potentially unstable conditions, in which even small events can trigger a response of essentially any size whatsoever. When we talk about designing large systems, it’s worth remembering that the designing is done by groups of people, yet each individual person is doing something at a given point. So can the actions of just one person create these instabilities? Does it require the whole group acting together? Mark talks about how a given individual fits into this:
For if the world is organized into a critical state, or something much like it, then even the smallest forces can have tremendous effects. In our social and cultural networks, there can be no isolated act, for our world is designed — not by us, but by the forces of nature — so that even the tiniest of acts will be amplified and registered by the larger world. The individual, then, has power, and yet the nature of that power reflects a kind of irreducible existential predicament. If every individual act may ultimately have great consequences, those consequences are almost entirely unforeseeable.
This is interesting because in systems where many people are free to choose between many options, a small subset of the whole will tend to get a disproportionate amount of attention. Think of this in relation to team dynamics that you’ve seen. The very act of making choices, spread widely enough and freely enough, creates what’s called a power law distribution.
Yeah, okay, but what does all that mean?
Power law distributions are found in human systems. Some economists, for example, have made observations that wealth follows a distribution where 20% of the population holds 80% of the wealth. Linguists have made the point that word frequency falls in a power law pattern, with a small number of high frequency words, a moderate number of common words, and then a huge number of low frequency words. If you want to get all mathematical about it, a power law is any curve for which the height changes in proportion to the horizontal distance raised to some power — that is, multiplied by itself a certain number of times. For example, consider this equation:
height = (distance)²
This represents a curve that bends upward ever more steeply. That’s a power law with power equal to 2.
The important thing to understand is that a power law implies that small occurrences are extremely common, whereas large instances are extremely rare.
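To make that concrete, here’s a rough sketch in Python, my own illustration rather than anything from the book: it draws a large number of hypothetical “event sizes” from a power law (Pareto) distribution and counts how many land in each size range. The exponent and the bucket boundaries are arbitrary choices.

```python
import numpy as np

# Draw 100,000 hypothetical "event sizes" from a power law (Pareto) distribution.
# The exponent here is an arbitrary choice, purely for illustration.
rng = np.random.default_rng(42)
sizes = rng.pareto(a=1.5, size=100_000) + 1   # shift so the smallest size is 1

# Count events by order of magnitude: small events swamp large ones.
for low, high in [(1, 10), (10, 100), (100, 1_000), (1_000, 10_000)]:
    count = int(np.sum((sizes >= low) & (sizes < high)))
    print(f"events of size {low:>5} to {high:>6}: {count}")
```

Run something like this and the first bucket dwarfs all the others: small occurrences are overwhelmingly common and the giant ones are vanishingly rare, which is exactly the signature a power law produces.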
The perpetually unstable organization of the critical state makes it always possible for the next event or action to trigger a cataclysm of just about any size. This has nothing to do with the original cause or with some “special situation” in the system itself. What it does have to do with is the “fragility” of the system: a fragility that comes from how the system was constructed and the instabilities that were allowed to be built into it. Again, think of designing software here.
When you apply power laws to a system, what all this tells you is that even the greatest events do not necessarily have exceptional causes. Another important thing to consider is that the only way to generate a power law pattern is by some process that has a history — that is, in which the future emerges out of a string of particular events taken at a particular time, each leaving its indelible trace on the course of events. This history makes things complex rather than simple. Yet, while history can lead to complexity, it can also lead to a special, hidden simplicity.
“Hidden simplicity” tends to show up in systems where you have what’s called scale invariance or self-similarity. If that doesn’t mean much to you, don’t worry about it for now. In a very simplified nutshell, this just means that the power law is telling you that your system looks the same at all scales.
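If it helps, here’s one quick way to see what “looks the same at all scales” means. This is my own illustration, not something from the book: for a power law, doubling the input always shrinks the output by the same factor no matter where on the curve you are, while for something like an exponential that factor keeps changing.

```python
import math

def power_law(x):
    return x ** -2          # a power law with exponent -2

def exponential(x):
    return math.exp(-x)     # an exponential decay, for contrast

# For the power law, the ratio f(2x)/f(x) is 0.25 at every scale.
# For the exponential, the same ratio depends entirely on where you look.
for x in [1, 10, 100]:
    print(x, power_law(2 * x) / power_law(x), exponential(2 * x) / exponential(x))
```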
What may not be intuitive to you from reading this is that if the distribution of something follows a power law, then words such as “normal”, “typical”, “abnormal”, and “exceptional” simply don’t apply. (It took me a while to really get the implications of that and to see why it made sense.) What matters in a system is the length of the particular “finger of instability” on which the first tiny “slipping event” takes place. When a system is tuned to be in a critical state — meaning, when the system acquires the special organization of the critical state — then the system lives on the edge of upheaval. What Mark is discussing in Ubiquity in this context are those ubiquitous patterns of change and organization that run through our world at various levels and within various systems.
So now here’s an interesting point: to place something on the very edge of instability and to keep it there seems to require careful tuning and continual adjustment. But in many systems it seems that the critical condition develops quite naturally. This has been referred to as “self-organized criticality.” What counts in the critical state are not complex details but extremely simple underlying features of “geometry” that control how influences can propagate. Self-organized criticality seems to show up only in things that are driven very slowly away from equilibrium, and in which the actions of any individual piece are dominated by that piece’s interactions with other elements.
Pay Attention! Doesn’t that last bit sound a whole lot like team dynamics? Doesn’t it at the very least cause some parallels to spring to mind about software development?
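To make “self-organized criticality” a bit more concrete before moving on: the canonical toy model, and as I recall the one Buchanan leans on throughout the book, is a sandpile. Grains drop one at a time, and any spot that gets too tall topples onto its neighbors, which can topple in turn. Below is a minimal sketch of that kind of model in Python (my own simplification, not code from the book); the grid size and toppling threshold are arbitrary.

```python
import random

SIZE = 20                                    # a small, arbitrary grid
grid = [[0] * SIZE for _ in range(SIZE)]

def drop_grain():
    """Drop one grain at a random cell and return the resulting avalanche size."""
    x, y = random.randrange(SIZE), random.randrange(SIZE)
    grid[x][y] += 1
    topples = 0
    unstable = [(x, y)] if grid[x][y] >= 4 else []
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < 4:                   # may already have been relieved
            continue
        grid[i][j] -= 4                      # topple: shed four grains
        topples += 1
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < SIZE and 0 <= nj < SIZE:    # grains at the edge fall off
                grid[ni][nj] += 1
                if grid[ni][nj] >= 4:
                    unstable.append((ni, nj))
    return topples

# Drive the system slowly: one grain at a time, for a long while.
avalanches = [drop_grain() for _ in range(50_000)]
print("largest avalanche:", max(avalanches))
print("avalanches bigger than 100 topples:", sum(1 for a in avalanches if a > 100))
```

Nothing in that code tunes the pile toward a special state; the slow driving and the purely local interactions do that on their own, and every so often one ordinary grain sets off an avalanche wildly out of proportion to itself.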
It seems that when it comes to understanding something in a critical state, most of the details just don’t matter. Physicists call this property of the critical state universality. At the critical point, pockets of organization are just about to break out at any place at any moment and, further, are continually breaking out as factions grow and then disappear. How large do the factions grow? How quickly do they dissolve? These questions all come down to the basic geometrical issue of how easy it is for an ordering influence at one point to bring similar order to another point nearby.
What Mark shows us in Ubiquity is that in the critical state, the forces of order and chaos battle to an uneasy balance, neither ever fully winning or losing. And the character of the battle, and the perpetually shifting and changing strife to which it leads, is the same regardless of almost every last detail of the things involved. The physical dimension of the thing in question matters, as does the basic shape of its elements. But nothing else matters. There’s the notion here of universality and universality classes. The idea of universality is that any two substances, real or imaginary, that fall into the same class will necessarily have exactly the same critical-state organization, regardless of how utterly dissimilar they may otherwise seem to be.
The overall idea that I took away from the book is that many systems share the character of the critical state, reflected in remarkably simple statistical laws: the scale-free power laws that reveal a profound hypersensitivity built into the system and the lack of any expected size for the next event. So while chains of events in those systems may not be predictable, it’s not the case that nothing is predictable. It’s in the statistical pattern that emerges over many chains of events that you can hope to discover the laws for all things historical about that system. Such laws capture the general properties of many narratives, rather than just one, and thus reflect the character of the deeper historical process that operates behind individual chains of events.
I can just hear some of you now. “Whoa! Whoa, there! Isn’t this a tad too … theoretical … for quality assurance and testing?” I would maintain that it’s not. The systems we work with as practitioners (systems of people, systems of processes, as well as systems of technology) are all subject to the power laws mentioned in the book. Further, all such systems have their own “fingers of instability” and their own potential for “upheavals.” Understanding these concepts has definitely helped me keep an eye on how my work both shapes and is shaped by the systems around it.
I can just hear some of you now. “Yeah, okay, whatever. Still seems a bit out there from my day to day.” I can buy that. It’s a bit far from my day to day too, but maybe that’s part of the point. After all, those patterns of instability are ultimately built up from decisions that are made day to day. That’s sort of the insidious nature of ubiquity, perhaps.
If you’ll bear with me for a bit, I’ll at least try to justify this post a bit.
As we know, software is incredibly complex. Even simple applications tend to take on a high degree of complexity at a rapid rate. The problem is that the complexity comes not just from the sheer number of instructions or statements in the application but also from the interactions among those instructions and statements.
Come to think of it… This is part of what makes testing those applications so challenging.
So the drive is usually on to reduce this complexity and that usually occurs by breaking up those instructions and statements. The goal is to isolate related groupings of instructions that have to do with storing data and manipulating data. Those groupings become objects. Those objects are then combined into large elements called components. These components are meant to be chunks of code that can be used without necessarily understanding the internal details of how the code all works. The complexity is still there, but it’s hidden behind relatively simple interfaces.
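As a trivial sketch of that idea, with hypothetical names of my own invention: a small component that exposes a couple of simple methods while keeping its storage and bookkeeping details to itself.

```python
class OrderHistory:
    """A toy component: callers see two methods, not the internal bookkeeping."""

    def __init__(self):
        self._orders = []        # internal storage detail
        self._total = 0.0        # cached aggregate, also internal

    def record(self, amount):
        """Store an order amount and keep the running total current."""
        self._orders.append(amount)
        self._total += amount

    def average(self):
        """Return the average order amount, or 0.0 if there are none yet."""
        return self._total / len(self._orders) if self._orders else 0.0

history = OrderHistory()
history.record(19.99)
history.record(5.00)
print(history.average())         # callers never touch _orders or _total directly
```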
Here’s a visual example of reducing complexity:
This hypothetical “system” is composed of 12 items (of some kind) that interact with each other. By dividing these items into four smaller assemblages, the total number of interactions has been reduced from 66 to 18, and the system is now much less complex. That being said, it is still very complex. In fact, the most significant issue of software is always that inherent complexity.
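The arithmetic behind those numbers is just counting pairs. Here’s a quick sketch of one way to arrive at the same figures, assuming each of the four assemblages talks to every other assemblage through a single connection:

```python
from math import comb

# 12 items all interacting directly: every pair is a potential interaction.
fully_connected = comb(12, 2)            # 66

# Split into 4 assemblages of 3 items each: pairs inside each assemblage,
# plus one connection between each pair of assemblages.
within = 4 * comb(3, 2)                  # 12
between = comb(4, 2)                     # 6
print(fully_connected, within + between) # 66 18
```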
Software is a codification of a huge set of behaviors: if this occurs, then that should happen, and so on. We can visualize individual behaviors, but we have great difficulty visualizing large numbers of sequential and alternative behaviors.
The same things that make it hard to visualize software make it hard to draw “blueprints” or even specifications of that software. By way of contrast, a road plan can show the exact location, elevation, and dimensions of any part of the structure. The map corresponds to the structure, but it’s not the same thing as the structure. Software, on the other hand, is just a codification of the behaviors that the programmers and users want to take place. The map, in this case, is the same as the structure. Once the system has been completely described, then the software has been created. This means that software can only ever be described accurately at the level of individual instructions. To summarize is to remove essential details, and such details can cause the software to fail catastrophically or — even worse — subtly and insidiously.
It is true that the practice of isolating various sections of code and encapsulating data and behavior within objects can reduce the overall complexity of a system. But that doesn’t release us from the burden of having to individually define each and every one of these instructions (or behaviors); it just helps us to organize them better.
And that’s where the individual human element can come into play.
So what this speaks to is the fact that software is abstract in nature. A map or a blueprint for a piece of software must greatly simplify the representation in order to be comprehensible. But by doing so, it becomes inaccurate and ultimately incorrect. This is an important realization: any architecture, design, or diagram we create for software is essentially inadequate. If we represent every detail, then we’re merely duplicating the software in another form, and we’re wasting our time and effort.
This is such an important point I’m going to say it again:
Come to think of it… It is impossible to accurately blueprint software, or draw up a complete set of requirements before the software — or at least some parts of it — have been completed in some form or another.
This means that any specification of requirements for software is likely to be incomplete in important ways. The users will gain new insights into their needs as the software starts to take shape. The software becomes less abstract to them once they can get hands-on experience and try out a variety of scenarios. This is where the stream of “change requests” ultimately comes from. It’s not that users don’t know what they want: it’s that they’re just not able to visualize a system of this complexity until it’s at least partially complete. What this means is that any belief that you have, or ever can have, a comprehensive and finalized set of requirements is a form of self-deception.
This is essentially why development approaches that are focused on “agile” processes are said to deal with complexity better than waterfall does. Some people doubt that or treat it as hype, but I found that Ubiquity made a compelling argument for it, even though the book wasn’t talking about software development.
The requirements for a piece of software will invariably be incomplete. There will be conceptual gaps that must be filled, and there will be assumptions that aren’t justified and aspects that just won’t work. Because clients aren’t software experts, they won’t always be able to distinguish between what’s possible and what’s not, or know what trade-offs are available. They need to work with the product team (analysts, testers, and developers) to discover this.
This means that the development process is a process of discovery — progressively finding out the exact character of the software that will meet the customer’s needs. Developers must combine analytical and creative skills to find out what their customer really wants (even if the customer’s description is confused and incomplete) and invent ways to combine these requirements into a system that’s logically consistent and easy to use. Software development is also a process of discovering whether and how the chosen technology can be made to perform the role that’s required of it. Sometimes it will work as expected. Sometimes it won’t, but there’s a workaround that takes more effort than originally planned. And sometimes the technology just can’t do what’s needed.
The best way I’ve heard this worded is as such:
Software development isn’t just a process of creating software; it’s also a process of learning how to create the software that is best suited for its purpose.
Another way to look at it:
Programming is more than just writing code. Each step requires the developer to analyze some portion of the problem and design some aspect of the solution.
Software development is a process of research, so at no point can absolutely definitive plans be drawn up.
Come to think of it… The same applies to test plans in the testing arena. This is why I focus more on the test strategy than on the test plan.
The more definitive you try to make the plans, the more flawed they’ll be, and the more labor there will be in revising them to meet changing circumstances. As the shape of the software becomes increasingly clear over the course of a project, the design of that software must be revised again and again. To completely redesign the solution each time around would be onerous and wasteful, so we should think rather of a process of ongoing design.
The question is: does this help us avoid building in those patterns of instability? As it turns out: nope! Software is more fragile than ever in some cases. The major difference is we tend to find out about it quicker and can maybe even react a bit quicker to mitigate problems. On the other hand, you could argue modern software practices simply allow us to make more mistakes even faster. Given the continuing studies into software development, I think the ideas of how software is developed and the ideas presented in the book Ubiquity do intersect.