Learning From the Real World - Examining Technical Work

It was great seeing some of you folks attend the chat I had with Matt Davis about on-call programs and incident response! If you missed it, you can check it out here.

The Messy Details: Insights From the Study of Technical Work in Healthcare

Why study technical work at all? Most of the work that we’ll encounter is going to be technical work. In order to learn from it, we need to study it. Whether we’re seeking to improve a system, building tooling for a team (including our own), or just looking to learn from and understand how a part of the system or a team works, we need to study the technical work happening there.

A primary function of technical work studies is to make these messy details visible and, if possible, amenable to intervention.

Christopher Nemeth, Richard Cook, and David Woods discuss “the messy details” that arise when you study technical work and what to do about them. Though they’re looking at healthcare, they offer a lot of advice that we can apply to our world of software.

To start, the test of the success or usefulness of studying technical work is “did you discover the significance of small details?” But there is a catch: most of those details are not going to be useful.

You’ll have to sift through them and examine patterns in order to know which ones are going to help you understand and possibly improve the system.

Getting lost in the details

It can be easy to get lost in those details. One way people work around this is by avoiding the details altogether: not diving into them and staying on the surface. We can see this in software too, where counting or tabulating is substituted for actually looking at the details. But it’s only these details that can tell us about the patterns at work and the ways in which we might influence or change them.

The study of cognitive systems in context is fundamentally a process of discovery. Through it, the researcher learns how practitioners adapt their behavior and strategies to the various purposes and constraints of the field of activity.

The Law of Fluency (people fill gaps and adapt, which then hides what makes the work challenging) makes the study of technical work more difficult. It hides the details and difficulty, even from other “insiders” and certainly from “outsiders”. Eventually insiders don’t see these adaptations as something special or separate, but simply as the normal work that happens every day.

This means that those who seek to understand this work must “deliberately and carefully uncover and disentangle these conflicts and uncertainties in order to understand how operators cope with such complexity.” And what is SRE, except a way of coping with complexity? The complexity of our systems and the complexity of their continued operation.

We can see even more parallels between the world of the authors and ours in software:

the machinery about which they are expert [physicians, nurses, and pharmacists] is not manufactured in the traditional sense but is instead assembled ad hoc to fit each procedure’s unique needs.

We could say the same of SRE, except that it’s assembled ad hoc to fit each production environment’s unique needs.

Don’t avoid the details

Focusing on one specific device or part of a procedure is another way that someone studying cognitive systems and technical work can cope with a complex domain. It’s efficient and can still produce “fine grained, precise improvements, especially in those domains in which human factors involvement is mature and models of the relevant cognitive work is more developed” (nuclear power control rooms are an example).

Software is certainly not one of those domains. Additionally, focusing this narrowly gives a “nearsighted view that misses key aspects of how adaptations make technical work work.”

Technical work here is based on the knowledge of illness and response, as well as a host of messy details about how to get things done, where things happen, how they can be configured, what is likely to happen, and, most of all, how to make what is needed happen and happen quickly.

Though this is about healthcare, we need only substitute “symptoms” for “illness” and we have a pretty good description of our world in software.

Staying in the real world

Unlike the laboratory, the real world has ongoing, interconnected streams of activity that fluctuate as the tempo of operations varies. Multiple goals and perspectives come into conflict and must be resolved through integration or coordination. Expertise and failure intermingle.

This is very much our world also. It helps explain why processes and tools that seem to work for others or in isolation don’t always work for us. People are adapting in order to bridge gaps.

Ultimately, this takes work. Investigating and examining cognitive systems is hard, and it requires the cooperation of the people who are doing the work.

This can be exhausting for everyone involved and can also produce friction. However, intellectual friction ultimately produces opportunities for new insights and innovations.


  • Studying technical work and the context that it takes place in allows you to better understand that part of the system.
    • This understanding is needed in order to make effective improvements in the system, whether in that part or somewhere that influences it.
  • There are a lot of “messy details” when you examine technical work and cognitive systems.
    • As a result, it can be tempting to ignore them or work around them by tabulating and zooming out, but this removes the benefit of the investigation.
  • You encounter messy details because you’re working with the real world, real people, and the real system as you find it.
    • This is more accurate and more useful than studying bits and pieces in an isolated lab environment.
      • “Unlike the laboratory, the real world has ongoing, interconnected streams of activity that fluctuate”
    • The ‘catch’ is that not all of the details, and perhaps not even many of them, will be important. The test of the success or usefulness of your examination is, “did you discover the significance of small details?”
  • It can be tempting to avoid or work around the messy details by abstracting away from them, by just creating tabulations of data, but this approach loses the benefits of those details and can mislead your understanding of the work and the system.
  • Another way you might try to work around this is by focusing on just a small part or procedure, but this risks giving you a “nearsighted view that misses key aspects of how adaptations make technical work work.”

Subscribe to Resilience Roundup