This is a chapter by David Woods from the book Resilience Engineering: Concepts and Precepts.
This chapter covers immediate vs long term goals (“chronic vs acute”), taking examples from both NASA and healthcare.
It’s one that I’m revisiting after some time, as I seem to do with a few of his works.
It contains a lot of good reminders and some more succinct ways to answer some common questions about resilience, adaptability, and robustness.
I’d love to hear if this week’s issue helped you and how. Please reply and let me know!
Essential characteristics of resilience
What is resilience?
“When one uses the label ‘resilience,’ the first reaction is to think of resilience as if it were adaptability, i.e., as the ability to absorb or adapt to disturbance, disruption and change”
I know that I tend to think this way sometimes. It’s something I can fall into without realizing.
I like that Woods locates adaptive capacity at the unplanned-for, undesigned edges of the system.
When evidence of holes in the organization’s model builds up, the risk is what Ian Mitroff called, many years ago, the “error of the third kind”: solving the wrong problem.
Woods acknowledges that this is subject to understanding what the performance envelope is and is perhaps vulnerable to disputes about what that is.
Of course in order to be able to examine and discuss what the non-“textbook” performance envelope is, we must have some idea what disturbances the system was designed to handle.
Woods goes on to explain that the area of “textbook competence” can be viewed as a model of uncertainty along with what plans were developed to handle those uncertainties.
Given the previous, resilience is about adjusting that model to fit different challenges.
We should focus, he says, not just on whether an organization has adaptive capacity (since all things can adapt), but on its adaptive capacity relative to the challenges being considered.
resilience engineering devotes effort to making observable the organization’s model of how it creates safety, in order to see when the model is in need of revision.
Lest we be left with just that, Woods offers some high level how-tos.
What to monitor
He tells us that we must monitor how the organization makes decisions so that we can assess where it is operating in relation to safety boundaries. This monitoring should then allow for changes that influence the adaptive capacity of the system, allowing it to face new challenges.
Some properties that can be helpful to be aware of and understand in the system are:
- Buffering capacity: How big of a disturbance can the system absorb before breaking down?
- Flexibility vs. stiffness: How much or little can the system reorganize or restructure itself as the environment changes?
- Margin: How close is the system to some performance boundary?
- Tolerance: When the system is near a boundary does it degrade rapidly or gracefully?
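To make these properties a bit more concrete, here is a small sketch of my own (not from the chapter) that imagines how some of them might show up as rough metrics for a queue-backed software service. All the names and thresholds here are hypothetical, and flexibility vs. stiffness is omitted because it resists a simple numeric measure:

```python
from dataclasses import dataclass


@dataclass
class ServiceSnapshot:
    """A hypothetical point-in-time view of a queue-backed service."""

    queue_depth: int     # requests currently waiting
    queue_capacity: int  # buffering capacity: how much disturbance can be absorbed
    sheds_load: bool     # tolerance: does the service degrade gracefully near the edge?

    def margin(self) -> float:
        """How close is the system to its performance boundary?

        1.0 means far from the boundary, 0.0 means right at it.
        """
        return round(1.0 - self.queue_depth / self.queue_capacity, 3)

    def degradation_mode(self) -> str:
        """Near the boundary, tolerant systems shed load; brittle ones collapse."""
        if self.margin() > 0.2:  # arbitrary illustrative threshold
            return "nominal"
        return "graceful" if self.sheds_load else "brittle"


snap = ServiceSnapshot(queue_depth=90, queue_capacity=100, sheds_load=True)
print(snap.margin())            # close to the boundary
print(snap.degradation_mode())
```

The point isn’t the specific numbers; it’s that each of Woods’s properties suggests something observable you could actually instrument, rather than leaving “resilience” as an abstraction.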
It’s also important to look at the system at various scales. Improvements in some areas of the system may harm other areas. Woods calls these “cross-scale interactions”: any level of a system has scales above and below it that influence it.
Looking downward, resilience can be affected by how an organization handles goal conflicts and automation design. If these are poorly managed, authority-responsibility double binds can occur, where someone has responsibility for an outcome but no authority to make changes that influence that outcome.
Looking upward, resilience is influenced by people developing workarounds that then affect strategic goals. Woods gives the example of workarounds for operational bottlenecks that eliminate management’s ability to effectively command compliance using broad standards.
It’s important to note that all systems have _some_ amount of resilience. This is true regardless of whether or not a negative outcome has occurred.
In fact, negative outcomes can help show sources of resilience that were previously masked or display complexity that was hidden.
Risks, goals, and accidents
An accident is a “fundamentally surprising” event that reveals a mismatch between the organization’s risk model and responses, and reality.
Once an accident happens, though, an organization has a window, obtained at a high price, during which learning can occur. Ideally, however, learning and model recalibration will take place before an accident occurs.
Resilience Engineering aims to provide support for cognitive processes of reframing an organization’s model of how safety is created before accidents occur.
This happens by determining what to measure so as to gain insight into the things that provide resilience (like the properties listed above).
Woods goes on to talk about the tension between acute and chronic goals. He uses NASA’s faster, better, cheaper (FBC) and the Institute of Medicine’s quality goals as examples.
Both of these systems contain individual goals that when fulfilled can remove resources from the others. Woods uses a great Erik Hollnagel quote to describe the problem:
If anything is unreasonable, it is the requirement to be both efficient and thorough at the same time — or rather to be thorough when with hindsight it was wrong to be efficient.
This is obviously a complicated situation, the handling of which will vary by organization, but Woods points us in the right direction. He explains that the way to begin balancing these tensions is to focus on the longer-term, chronic goals. Then, once those can be delivered, work to make them fit the acute goals like efficiency or cost.
To do the opposite means continually sacrificing the long-term goals. To achieve this shift, the organization will need to begin to see the chronic goals more as values than as sets of goals to be measured.
Woods also warns us against a common story we hear. When attempts (similar to mine here) are made to link stories and lessons across domains, often the wrong point gets the focus. The typical story seems to be about a lone, brave practitioner who speaks up to management, holds the line, and turns out to be right. But perhaps it’s more important to ask what would happen if they were wrong. How would the organization handle that?
If the organization never lets up on the production pressure, then it’s likely operating in a much riskier position than it realizes and is losing the ability to respond to warning signs.
- Resilience can be seen as more than just the capacity to adapt; it is the capacity to adapt to _unexpected_ inputs to a system
- Resilience engineering can help support people’s cognitive work by finding and measuring sources of resilience
- Accidents are expensive windows through which previous understanding can be questioned
- Focusing on chronic goals can help acute goals come to pass, whereas the reverse rarely will
- Treating chronic goals as values can help this shift
- All systems have some amount of resilience
**Who are you?** I’m Thai Wood, and I help teams build better software and systems
Want to work together? You can learn more about working with me here: https://ThaiWood.IO/consulting
Can you help me with this [question, paper, code, architecture, system]? I don’t have all of the answers, but I have some! Hit reply and I’ll get right back to you as soon as I can.
**Did someone awesome share this with you?** That’s great! You can sign up yourself here: https://ResilienceRoundup.com
Want to send me a note? My postal address is 304 S. Jones Blvd #2292, Las Vegas, NV 89107