Resilience Roundup Issue 3

Welcome back! This week I’m featuring post-incident reviews from a variety of organizations.


A chance to learn from a meeting of researchers and tech companies. I recently read this through for the second time, and I strongly recommend it. One thing that stuck with me this time is the line it draws between the two types of surprise that responders experience: situational surprise and fundamental surprise.

Creating Foresight: How Resilience Engineering Can Transform NASA’s Approach to Risky Decision Making

David Woods again brings his systems safety experience, this time to NASA. He explores not just a specific incident, but also recommendations for the organization as a whole.

Review of the System Failure That Led to the Tax Day Outage

Another opportunity to see how other teams and agencies do post-incident review. This one comes from the Office of the Treasury and covers their outage on Tax Day. One very relatable finding: “While the response team’s substantial efforts allowed the IRS to resume tax processing operations the same day, improvements are needed to help mitigate or prevent outages.”


A suggestion from John Allspaw on things to read. I’m already starting to go through these, but if you want to “read ahead,” here’s your chance. As he says in the README: “This is a collection of readings, talks, and other bits regarding the field of Resilience Engineering.”
