Most of this is likely familiar to you if you’ve read Sidney Dekker’s books, but I think having a shorter, direct instruction on how to improve these investigations is really important. Thanks to John Allspaw for tweeting about this article.
Reconstructing human contributions to accidents:
the new view on error and performance
In this article he goes over a number of very tactical, useful ways to directly improve accident investigation, especially as it applies to human involvement and performance. He walks us through the creation of two deliverables: a timeline that uses domain-specific language, and another where we talk in more generally applicable terms about the possible psychology of the participants.
Dekker starts off with a reminder about the difference between the “new view” and the “old view” on safety and human error. In the old view, humans are seen as something to work around, and their errors or fallibility are something to protect a system against, perhaps by removing humans from the loop entirely.
In contrast the new view sees humans as a source of resilience and their behaviors, when labeled an “error”, as a symptom of failure of the system but not as a cause. This acknowledges that safety is not a built in part of a system, that people themselves have to create safety.
Further, when we label the cause of something as “human error”, we obscure a lot of the investigation and learning that could have resulted. It also prevents us from developing effective countermeasures because we never answer why it is that a given action made sense to someone at the time.
Dekker acknowledges that while there are lots of people potentially interested in using this new view to get better answers and more accurately understand human performance, it can be difficult to actually do so. “It is easy to retreat into the old view: seeking out the bad apples and assuming that with them gone, the system will be safer than before.”
Dekker, similar to his work in his book “Beyond Human Error”, warns us here in this paper about utilizing hindsight in our investigation. Hindsight is something that can actually poison our ability to be good investigators because of the bias that it introduces.
Lest we think that this is a small undertaking, Dekker acknowledges that reconstructing how someone experienced a sequence of events, especially when it leads up to an accident, is not easy to do. However, it is crucial to truly understanding an event and learning from it. Dekker warns that it’s not just hindsight that can work against investigators but also other pressures and constraints, including political or practical ones.
As Dekker reminds us, we can be almost certain that as an investigator or observer we know far more about the incident than the people who were involved in it. Because of this hindsight, we have a huge amount of resources that tell us what the situation really was instead of just how they experienced it. This is good information to have, but it can make it difficult for us to see things from their unfolding perspective.
Dekker lists the mechanisms by which we can be confused or misled in our investigations by hindsight:
Mechanism one: “making tangled histories linear by cherry picking and regrouping evidence”
Dekker quotes Karl Weick: “people who know the outcome of a complex prior history of tangled, indeterminate events, remember that history as being much more determinant, leading ‘inevitably’ to the outcome they already knew”.
This means that just by knowing how the series of events unfolded, our brains automatically start to create a linear history, even though that is in no way how the participants experienced it. This also makes it tempting to draw conclusions from “individual fragments”, small pieces of information we use to jump to a large conclusion.
For example, Dekker mentions an aviation accident in which investigators used a four-second exchange between a pilot and a first officer, in which they discussed what runway to use, together with an earlier discussion about wanting to hurry the arrival at their destination following a delay two hours earlier. Those carefully chosen fragments alone were used in the investigation to conclude that the crew was in a rush.
“Each fragment is meaningless outside the context that produced it: each fragment has its own story, background, and reasons for being”. People change behavior and take actions in between these fragments. The behavior that changes in between these fragments also represents decision-making and changing assessments of how the event is unfolding.
When we reconstruct the events as a linear story, we lose sight of this behavior and the thinking that is taking place. The linearity that was constructed does not actually exist in the fragments; it is completely artificial, created by the investigator.
Mechanism two: “finding what people could have done to avoid the accident”
Counterfactuals are often present in accident investigations and post-incident reviews. It’s likely that you’ve seen these before, perhaps even said or written them yourself. These are things like “if only they’d done X” or “they could have done Y”.
A counterfactual, as Dekker likes to point out, is literally that: counter to fact. You are describing a world that did not happen. As he continually reminds us, this does not answer the question of why people did what they did. And that is the ultimate point of the paper: answering the questions “Why is it that people did what they did? Why did it make sense to them?”
Describing how people did not behave will never yield the answer of why they behaved the way they did.
Counterfactual reasoning can be useful when we’re trying to discover countermeasures against similar failures. This is quite different, though, from using a counterfactual as an explanation for behavior.
Using counterfactuals is a good way to be swept up by hindsight. It leads us to take a bunch of events and actions that were indeterminate and, instead of respecting their overlapping, interconnecting nature, produce a linear set of branching pathways. But of course the participants did not experience it that way.
Those very clear branching pathways did not exist for them; they were not presented with a choice to make an error or not to make an error. Dekker describes their likely reality, full of uncertainty, “like cracks in the window in the ever denser fog of futures not yet known”.
In reality there was unlikely to be any event or motivational reason for the participants to relook at a situation, reassess it, or decide against any particular behavior, or else they would have. It’s important to keep in mind the local rationality principle: these people were doing what they thought at the time was the right thing given what they understood of the system, their situation, and the pressures they were facing.
Mechanism three: “judging people for what they did not do but should have done”
Even though counterfactuals can act as a lazy escape hatch for an investigation, they also require an explanation themselves. If someone chose a path that led to “error” and we believe that was so obvious, then we’re forced to ask the question “how is it that the participants or others missed it?”.
This question is typically answered with “organizational archaeology”, where one will dig through any amount of minutiae of regulatory details or procedural framework to come up with some violation of them. This is then used to say that something should or should not have happened.
If investigators are unable to find such a violation, they simply resort to something more general like best practice or airmanship. Dekker cites both Suchman and Woods: “there is virtually always a mismatch between actual behavior and written guidance that can be located in hindsight”.
This can also manifest as an investigator trying to pick apart cues that were present in a situation but perhaps were not noticed by the participants, cues that only in hindsight we can tell were important.
Of course, we only know now that they’re critical because we have a whole view of the incident. This doesn’t keep many investigators from citing the lack of awareness or knowledge of these cues as a cause. There is a difference between having data in front of you and the amount that you actually observe. Dekker calls this “micro-matching”: taking very tiny pieces of behavior or cues and holding them up against this false world that we created in hindsight.
None of this considers whether or not this information was likely to be easily observable and noticeable given the actual complexities of the situation. Dekker continually emphasizes that judging people from this position that we constructed with our hindsight still does not answer the core question of why the people in the situation did what they did given what they experienced.
This leads us to the local rationality principle again. Dekker points out that in many accidents in these complex systems the participants were doing the things that they’d always done, the things that have actually led them to safety and success.
Accidents are most often the result of everyday influences on decision-making and very rarely of humans just behaving erratically or bizarrely beforehand. Dekker reminds us of the new view again: that failures by humans are actually part of the work of people and organizations and are symptoms of problems in the system itself, as opposed to problems with that person. He suggests that instead of using human failure as an explanation for the failure of outcomes, we search to make sense of human action.
This is “the reconstruction of unfolding mindset”. He describes the processes as similar to how a field researcher may experience problems: the challenge of taking very context specific information and extrapolating it into concepts that can be interpreted and verified or falsified by others. “To leave a trace that others can follow” so other investigators can verify our work.
There are a few steps that Dekker provides that we can take:
Step one: “laying out the sequence of events in context-specific language”
“The goal is to examine how people’s mindset unfolded parallel with the situation evolving around them, and how people, in turn, helped influence the course of events.”
If we find actions or decision-making hard to understand, then looking at what information was available to the people at the time is likely to reveal why it made sense to them.
When we begin to reconstruct the mindset of individuals, we don’t actually start with the mind; we start with the situation that the mind experienced. Some suggestions on how to follow the traces of what people experienced and what cues they saw:
“Shifts in behavior”: depending on the data available, we can look and see that people either say something or change their actual actions and behavior. These changes are indicators for us to dig further to understand how the situation was unfolding.
“Actions to influence the process”: we can also look and see what changes people made in their process and what actions they took.
“Changes in the process”: “a significant change in the process the people manage must serve as an event”. Not all changes in the process are instigated or even managed by people; some come from automation.
This is likely familiar to us as software practitioners with advanced monitoring and automation systems. We must be careful to remember that even though actions occur automatically, they don’t happen in a vacuum. An alarm may fire automatically when a data point is over a certain threshold, but people somewhere in the system helped get to that threshold. Whether or not people noticed these changes and whether or not they responded to them are big clues about what they knew at the time and how they understood the state of the system.
We need to capture all of these things in context-specific language, without drawing any psychological or mind-state conclusions. We’ll use the same words and terms that practitioners themselves would use to talk about their work. We want to be as accurate as possible here and not just jump to broad descriptions about how humans perform.
We’ll construct a timeline with this because this allows people’s actions and the assessments they made to become more obviously attached to the state of the system and potentially the physical location where they occur. Organizing in this way can also give us even more clues about why things made sense to the participants.
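For a software incident, one way to picture such a timeline is as a list of entries that each tie what someone said or did to the state of the system at that moment. The sketch below is only an illustration and not something from Dekker’s paper: the `TimelineEntry` fields, the `Kind` categories (named after the three suggestions above), and the sample incident are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Kind(Enum):
    # The three kinds of trace suggested in the text above.
    SHIFT_IN_BEHAVIOR = "shift in behavior"
    ACTION_ON_PROCESS = "action to influence the process"
    CHANGE_IN_PROCESS = "change in the process"


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime
    actor: str         # a person or an automated system
    kind: Kind
    what: str          # in the practitioners' own words, no psychology
    system_state: str  # what the system looked like at that moment


def build_timeline(entries):
    """Return the entries ordered by time."""
    return sorted(entries, key=lambda e: e.at)


# Hypothetical incident data, invented for illustration.
timeline = build_timeline([
    TimelineEntry(datetime(2024, 1, 5, 14, 7), "on-call engineer",
                  Kind.ACTION_ON_PROCESS, "restarted the queue workers",
                  "queue depth 40k and rising"),
    TimelineEntry(datetime(2024, 1, 5, 14, 2), "monitoring",
                  Kind.CHANGE_IN_PROCESS, "queue depth alarm fired",
                  "queue depth 25k, threshold 20k"),
])

for entry in timeline:
    print(f"{entry.at:%H:%M} [{entry.kind.value}] {entry.actor}: "
          f"{entry.what} ({entry.system_state})")
```

Keeping `what` and `system_state` side by side is the point of the structure: each action stays attached to what the system looked like when it was taken.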
Step two: “divide the sequence of events into episodes, if necessary”
“Accidents do not just happen; they evolve over a period of time.”
In cases where the time periods over which an accident occurred are large, he suggests that we chunk things into different episodes and then examine them individually.
Choosing where to divide these episodes can be difficult. Even deciding what counts as the beginning of the sequence of events is difficult. Since there isn’t such a thing as a root cause, it’s difficult or impossible to say when the beginning of an accident occurred.
But in practical terms we still need to start somewhere. So instead of just assuming a start place everyone agrees with, we can choose a start place and document what we chose and why.
In the absence of clear options Dekker leaves us with a concrete suggestion to help us get started “Take as the beginning of your first episode the first assessment, decision, or action by people or the system close to the mishap—the one that, according to you, set the sequence of events in motion. This assessment or action can be seen as a trigger for the events that unfold from there.”
Even this trigger has its own properties that we could investigate. That’s actually the entire point of choosing somewhere to start: so that we can begin to identify these points and then do that investigation.
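The idea of choosing a start point and documenting why can be sketched in a few lines. This is a minimal illustration under assumptions of my own: the `Episode` type, the `split_into_episodes` helper, and the sample events are hypothetical, and events are plain strings to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    start_index: int
    start_rationale: str  # why we chose to begin the episode here
    events: list = field(default_factory=list)


def split_into_episodes(events, starts):
    """Split a time-ordered list of events into episodes.

    `starts` maps the index of each episode's first event to the reason
    that event was chosen as a starting point. Events before the earliest
    chosen start are left out, which is itself a documented choice.
    """
    if not starts:
        raise ValueError("choose at least one start and say why")
    boundaries = sorted(starts)
    episodes = []
    for i, begin in enumerate(boundaries):
        end = boundaries[i + 1] if i + 1 < len(boundaries) else len(events)
        episodes.append(Episode(begin, starts[begin], events[begin:end]))
    return episodes


# Hypothetical events, invented for illustration.
events = [
    "deploy started",
    "error rate alarm fired",
    "rollback attempted",
    "rollback failed",
    "database failover begun",
]
episodes = split_into_episodes(events, {
    0: "first action close to the mishap, taken as the trigger",
    3: "the plan changed once the rollback failed",
})

for ep in episodes:
    print(f"Episode from event {ep.start_index} ({ep.start_rationale}):")
    for event in ep.events:
        print(f"  {event}")
```

Requiring a rationale for every boundary means nobody can pick a start silently; the reasoning travels with the episode.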
Step three: “find out how the world looked or changed during each episode”
The goal of this step is to determine what the state of the systems and processes was and what information was available to participants. This is how we can begin to weave together human behavior and the situation itself.
There may be an extensive amount of domain specific knowledge that is needed to know what parts of the data might be important, but it doesn’t necessarily need to be us as investigators that have this knowledge. Once we know what data may have been available to those involved in the incident we can move on to the next step.
Step four: “identify people’s goals, focus of attention and knowledge active at the time”
Dekker calls this “re-establishing people’s local rationality”. That is, reconstructing what they thought was a good idea at the time and what they saw. People’s attention will be directed by two main things: what they understand the situation to be and what’s happened in the world.
We can then say there are two types of “errors” that can occur when we’re dealing with evolving situations. Either people are constantly changing their interpretation of events or they can get locked into one interpretation.
Step five: “step up to a conceptual description”
Start to build an account that is parallel to the one we created in the first step, but this time use different language: specifically, language that talks about the event in general terms, typically psychological ones. The point here is to make the jump from data to our interpretation and, of course, leave those breadcrumbs of how we got here along the way.
This step is really so that others can come after us: so that other investigators, other researchers, or other people can learn from this event, this failure, or this incident. This gives others the opportunity to see similarities between these failures and hopefully learn from them.
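One way to keep those breadcrumbs honest is to require every conceptual claim to point back at the context-specific entries it rests on. The sketch below assumes a hypothetical `ConceptualClaim` type and `untraceable` helper of my own; neither comes from the paper, and the timeline strings are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualClaim:
    concept: str     # general, psychological language
    evidence: tuple  # indexes into the context-specific timeline


def untraceable(claims, timeline):
    """Return claims that cite no evidence or cite entries that do not exist."""
    return [
        claim for claim in claims
        if not claim.evidence
        or any(i < 0 or i >= len(timeline) for i in claim.evidence)
    ]


# Hypothetical context-specific timeline, invented for illustration.
timeline = [
    "14:02 queue depth alarm fired",
    "14:07 on-call restarted the queue workers",
    "14:15 on-call restarted the queue workers again",
]
claims = [
    ConceptualClaim("stayed locked into one interpretation", (1, 2)),
    ConceptualClaim("attention was elsewhere", ()),
]

for claim in untraceable(claims, timeline):
    print(f"No trace for others to follow: {claim.concept}")
```

A claim that shows up in this list is an interpretation that another investigator would have no way to verify or falsify.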
“There may be a need for stronger appreciation among investigators of the methodical challenges and pitfalls associated with retrospective analysis of human performance. Even clearer is the need for further development of ways in which investigators can systematically reconstruct the human contribution to accidents and avoid the biases of hindsight”