Resilience Weekly - Process tracing methods - Issue #13

Welcome back! This week I’m taking a look at a chapter from Decision making in action: Models and methods from 1993.

This is one of the sources Sidney Dekker cited in the article from last week’s issue, which is what drew my attention to it.

At first, with such a title, I wasn’t sure how applicable it would be to practitioners in the field, let alone to software, but it turns out there is a lot to learn here.


Process tracing methods for the study of cognition outside the experimental psychology laboratory

Wow, that’s quite a mouthful.

Before we dive in, here are some quotes that jumped out at me:

  • “The fundamental point of a protocol analysis is to specify the process an individual (or a team) used to solve a particular problem, for example, to extract their strategies. The investigator’s first responsibility is to be able to report these strategies.”
  • “We need to develop new ways of seeing in these rich situations-rich in the knowledge of the participants, rich in the diversity of strategies by which one can achieve satisfactory performance, rich in the tools available to assist human performance.”

Woods explains that “outside of the lab” means he’s interested in “human cognition as it occurs in its natural setting”. That is exactly the setting I’m interested in too, and the one where we respond to incidents and conduct post-incident reviews.

In a laboratory, you might typically look at a small slice of time, a subset of a process, or only some of the connections between complex events or variables. In real life, we get a chance to see the whole picture instead of just a single variable, so we don’t miss how the system operates as a whole.

The section I got the most out of is the one on process tracing methodologies. In other research you may have seen the term “protocol analysis”, but it seems to be one of those overloaded terms that means different things to different people.

Methods that are associated with process tracing all have a goal of mapping out how it is that this incident took place and how it unfolded. This includes the point of view of the participants, what they noticed, what they didn’t, how they interpreted things, focusing on how the outcome came to be.

That sounds a lot like a goal for investigations and postmortems.

Woods tells us that, among other uses, “process tracing techniques can be used to address critical incidents that have already occurred and retrospective analysis”. He points out that one type of process tracing is based on data from verbal reports made by the actual participants about how they solved the problem.

This is a lot like what we might get in an effective post-incident review or some other retrospective analysis.

Additionally, Woods discusses behavioral protocols, where we use the way people behave as a data source instead of relying only on what they say. This could mean actually watching the behavior or looking at data that resulted from it. It could also include records of behavior, such as recordings of which variables changed in a critical process, or the communication that happened among the team during an event, like a Slack channel.

In this approach we would take all the data from these different sources and put them together and correlate them. From this we’d come up with a record from which we could infer things like:

  • What participants did
  • What data they gathered
  • What knowledge they activated
  • What they were expecting
  • What they were intending to do

and see all this as the incident or the events unfold over time. In this sort of behavioral protocol it is up to the investigator to actively go through and cross-reference all these different forms of evidence so that we have some traces of behavior and activity to go by.
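To make the cross-referencing step a little more concrete, here is a minimal sketch in Python of merging evidence from several sources into one chronological record. Everything in it is an assumption made for illustration: the `Event` fields, the three source names, and the sample entries are invented and do not come from the chapter.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import chain


@dataclass(frozen=True)
class Event:
    at: datetime   # when it happened (UTC)
    source: str    # where this piece of evidence came from
    actor: str     # person or system involved
    detail: str    # what was said, done, or changed


def ts(hhmm: str) -> datetime:
    """Build a timestamp on one made-up day for the sample data."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Hypothetical evidence from three different kinds of sources.
chat = [
    Event(ts("10:02"), "chat", "alice", "Seeing elevated 500s on checkout"),
    Event(ts("10:09"), "chat", "bob", "Rolling back the last deploy"),
]
commands = [
    Event(ts("10:05"), "shell", "alice", "checked load balancer status"),
    Event(ts("10:10"), "shell", "bob", "ran rollback"),
]
metrics = [
    Event(ts("10:01"), "metrics", "system", "error rate crossed alert threshold"),
    Event(ts("10:12"), "metrics", "system", "error rate back to baseline"),
]


def build_timeline(*sources):
    """Combine all sources and order them by time into a single record."""
    return sorted(chain.from_iterable(sources), key=lambda e: e.at)


timeline = build_timeline(chat, commands, metrics)
for e in timeline:
    print(f"{e.at:%H:%M} [{e.source:<7}] {e.actor}: {e.detail}")
```

The merged record is only the raw material. Inferring what people expected or intended from it is still the investigator’s job.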

A typical barrier in protocol analysis is the need for a lot of domain-specific knowledge to take the raw data and see what someone may have been trying to do or look for. In software, though, we’re often already part of the team or have similar domain knowledge, which makes these techniques even more accessible to us. Where we don’t, Woods suggests consulting those who do and noting their interpretation as another piece of the data.

Because incidents change and evolve, so do the participants’ intents, world view, and mindset. We need to understand how these changes affect the participants as events trigger new thinking or spark new knowledge. Signals can get crossed, and the knowledge that is activated can be incomplete or irrelevant, but this mismatch between the participants’ perception and the actual state can be useful in conducting behavioral analysis successfully.

Woods gives a few questions to help guide us such as:

  • “what did the signal indicate to the problem solver about process state”
  • “given a particular action, in what perceived process state or context is this action reasonable?”

Woods reminds us that when we reconstruct what we can of how people worked through a problem, we can often identify points where limited knowledge and certain processing gave rise to actions that, in hindsight, we can say were not right.

He does warn us, though, that “any reconstruction is a fictional story,” and that no reconstruction can be said to be set in stone. There’s always a possibility that a later investigator will bring more evidence that pokes holes in our fictional story, sheds different light on motives, or offers a new account.

On “field observations”, Woods tells us “Meaningful investigations of complex behavioral situations where the domain practitioner’s performance and skill is the focus of the study will require a significant amount of personal knowledge acquisition and experience with the domain, and especially with the role and point of view of the practitioners within the domain.” Again, this is knowledge and experience that many of us may already have.

A problem with field-study reports is that we can’t really replicate, restudy, or retrace them and reinterpret the conclusions as we might with a laboratory study. We’re just left to accept or reject the conclusions.

“There tends to be a great leap from data collected to interpretive conclusions, with a vast wasteland in between”.

I think this is a good reminder for us to write our post-incident reports in a way that supports re-examination as much as possible. That doesn’t mean we have to share them outside of the organization; even our future selves or future teammates should be able to retrace our steps.

The next step is to move from being very context specific to being context independent in our conclusions and reporting. There is a spectrum here: at one end is a very exacting, close-to-the-ground report or data set tied to this exact situation. At the other is a very high-level, generalized report or set of conclusions.

There’s a trade-off as we work between these two: the closer we are to the former the less applicable it is to not just other people, but even the next incident. If we go too far to the latter then it’s difficult for us to draw actionable conclusions.

Woods explains Hollnagel’s method of dealing with this: take small steps along the path of analysis. Start at the very low level, directly in the situational context, and analyze it using domain-specific language. At this stage there aren’t really any general concepts; we’re just saying, for example, that this thing happened at this time or this other thing occurred in the system at that time.

After that, once you have the literal description of the actual on-the-ground events, you can steadily introduce more concepts with less context. This is where we move away from domain-specific language, so that we eventually arrive at principles we can use for the next incident or at least for similar situations, perhaps even across different domains.

Remember that the two descriptions we arrive at, one close to the ground and one at a higher level, do not substitute for each other; we’re not choosing one over the other. They exist in parallel, each as an account of the incident.
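One way to picture the two accounts living side by side is a record that holds both descriptions for each observation. This is only a sketch of the idea, not Hollnagel’s or Woods’s notation: the `Observation` class, its field names, and the sample entries are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    time: str
    concrete: str      # domain-specific: what literally happened
    general: str = ""  # less context-bound concept, added in a later pass


# First pass: only the literal, close-to-the-ground account (made-up data).
observations = [
    Observation("10:02", "Alice saw 500s on the checkout dashboard"),
    Observation("10:09", "Bob rolled back the deploy before checking the load balancer"),
]

# Later pass: add a more general description without overwriting the first.
observations[0].general = "anomaly detected via a monitoring display"
observations[1].general = "most recent change treated as the likely cause"

for o in observations:
    print(f"{o.time}  concrete: {o.concrete}")
    print(f"       general:  {o.general}")
```

Because the general description is an added field rather than a replacement, a later reader can still check each concept against the literal event it was drawn from.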

Woods tells us that adopting an experimental framework to organize observations in this way helps us conduct our investigation. Doubts are allowed to form so we can challenge or change our beliefs.

Woods summarizes these approaches: “In other words, we must be able to improve human performance through a variety of means for supporting human problem solving and decision making.” That is a goal many of us share, whether through education, tooling, or system and process development.

There are some tactics we can use to generate observable data so that we can make inferences. This idea can directly influence how we develop systems. Our current systems sometimes don’t leave us many breadcrumbs or make it easy to get observable data about either system state or decision-making. Approaching their design through the lens of improving the ability to generate observable data can help.

This is especially true if the system currently places the burden on operators to preserve or capture evidence for future incident reviews. It is hard for them to do two different tasks at once: engage with the incident and also capture information for the future.
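As one possible illustration of shifting that burden from the operator to the tooling, here is a small Python sketch in which operator-facing actions record themselves as structured events. The decorator name `record_action`, the in-memory `ACTION_LOG`, and the `restart_service` action are all hypothetical; a real system would write to whatever log pipeline it already has.

```python
import functools
import json
import time

# Stand-in for a file or log pipeline (an assumption for this sketch).
ACTION_LOG = []


def record_action(func):
    """Append a structured record every time the wrapped action is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {
            "at": time.time(),
            "action": func.__name__,
            "args": [repr(a) for a in args],
            "kwargs": {k: repr(v) for k, v in kwargs.items()},
        }
        try:
            result = func(*args, **kwargs)
            entry["outcome"] = "ok"
            return result
        except Exception as exc:
            entry["outcome"] = f"error: {exc}"
            raise
        finally:
            # Runs on success and on failure, so every attempt leaves a trace.
            ACTION_LOG.append(json.dumps(entry))
    return wrapper


@record_action
def restart_service(name, *, force=False):
    # Hypothetical action; a real tool would perform the restart here.
    return f"restarted {name}"


restart_service("checkout", force=True)
print(ACTION_LOG[-1])
```

The operator just runs the action; the trace of what was attempted, with which arguments, and how it turned out is captured as a side effect and is there for the later review.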