Resilience Roundup - Issue #5

Welcome back!

So how’d it go last week? Did having fewer things to watch and read help you learn or focus more?

I hope so! I’m following a similar pattern this week with two things: a chapter from the Patient Safety Handbook and a roughly one-hour talk.


Mistaking Error

If you’ve been looking for somewhere to start in moving from “if only we could keep those pesky, mistake-making humans out of the system” to “I wonder how we can actually make things safer,” then boy, do Richard Cook and David Woods have a chapter for you.

This chapter is from the Patient Safety Handbook, which, as you may have surmised, is about healthcare, but it applies to almost every industry. I think it applies especially to software, where we so often see this sort of model, with analysis stopping at “human error” as the cause.

Richard and David explain that much of the earlier research not only failed to answer the question “What is error?” but that the question itself is a dead end. Despite what committees and organizations’ action items may have said, defining and then tallying up all the “human error” isn’t a useful exercise and doesn’t bring us any closer to safer processes or systems.

Instead, we need to look at the problem differently, in what they call “The New Look”: drawing on several disciplines (including resilience engineering, of course) to examine how systems fail and how the people who work in and operate those systems are part of both their success *and* their failure.

They systematically work through a number of myths that may sound familiar and begin to dispel them. They also point out that when we speak of human “error,” we often aren’t talking about the same thing. Sometimes we’re even talking about several things at once, or bouncing between them. Are we saying error caused the failure? That error is the failure? That the process, or straying from it, was the error?

As they point out, what really tends to get examined, and cared about, is the negative outcome. If we say error is the cause, we’ve ruined almost any chance of learning by lumping everything together. As they put it, “the label error should be the starting point of study and investigation, not the ending point.”

This is a good litmus test for your thinking or process. When you do post-incident review and analysis, do you find yourself stopping when you reach the human who committed an “error”? If that is the moment that makes you say “ah, this is it! I did it!” then you’re likely cutting the learning short. Instead, that moment should be a trigger to keep looking and asking questions.

We can’t sort human work and performance into two simple buckets, one labeled “error” and the other “not error,” without blinding ourselves to the real factors at work. We need to look at “the behaviors…incentives, opportunities, and demands that are present in the workplace”.

Further, we can’t really even speak of “preventable” errors without making a lot of assumptions about what “preventable” means, and we often fall into the trap of hindsight bias along the way. When we look closer, we find that “‘error’ is a piece of data about reactions to failure” that “serves as a placeholder for a set of socially derived beliefs about how things happen.”

Instead, let us realize “that failure represents breakdowns in adaptations directed at coping with complexity.”

Finally, I want to leave you with my favorite quote from this chapter, which is actually displayed in my office:

“help people cope with complexity under pressure to achieve success.”

Are trade-offs necessary/important/useful for resilience engineering? Erik Hollnagel - YouTube [51m, auto-generated captions only]

Here Erik Hollnagel gives a talk at the Resilience Engineering Association Symposium from a few years ago, where he looks at trade-offs (“a situation where the outcome of an action means that one quality or aspect of something is lost in return for another to be gained”) through the lens of disasters, both natural and otherwise, to help establish what types of trade-offs exist.

Some trade-offs, as in the case of the Ford Pinto, were calculated (wrongly) and then committed to.

Other trade-offs happen purely out of habit, so automatically that we don’t even notice them, and they were never thought through. This is a strategy, and often an effective one, in many situations involving information overload.

He also describes his own work on the Efficiency-Thoroughness Trade-Off (ETTO), which says you can describe people’s behavior as if they were making a trade-off between efficiency and thoroughness. He emphasizes that this doesn’t mean they actually do, only that it can be helpful to think about or describe behavior this way.

Some trade-offs are built into institutions, typically encoded in policy, for example in a risk assessment matrix (example pictured below).

Erik argues that because trade-offs are everywhere, they are important but not useful for explaining things. Saying a trade-off was made is itself a trade-off that can mask other meaning: an ETTO trade-off. It gains speed of communication at the cost of effectiveness, and in the process loses more than it gains when it comes to explaining resilience engineering.

I’d be remiss if I didn’t also tell you that he suggests you get beer in Munich :)

