How do I know when to call for help in incident response?

Last week I wrote about when to intervene, specifically when overseeing a process. Today we'll address the same question as it applies to being on call.

Usually, when I get this question, it's because either A) the asker is new to incident response and wants to do it "right," or B) the asker wants to set up or otherwise improve an incident response program.

In the case of B, the asker tends to realize that they have internalized some way of making this judgment call to such a degree that it is difficult for them to articulate, and they want to be able to provide something more helpful and/or standardized for others.

Like a lot of questions in this space, the answer is "it depends." But the people who operate at the sharp end, especially at the pace typically involved in incident response, are the ones who ultimately have to resolve the uncertainties and goal conflicts. So I'll give some more specific guidance, with the understanding that you'll have to adapt it to the ways your situation or organization might differ.

The "right" thing to do mostly depends on expectations. These expectations come from a few different places:

  • Other team members and teams
  • Policies
  • Management

These aren't the only places these expectations can come from, but they are the most common. One of the reasons this question exists at all is that, although all of these sources introduce expectations about how the process will work at large, most of them don't explicitly specify the answer to it.

Expectations from other team members

There are norms around what other team members are likely to expect. These are the sorts of expectations you might worry about violating: being seen as not having done enough before calling for help, or being seen as incapable. These, often more than any others, are the biggest sticking point for newer team members.

Though this is a common concern, and I understand the feeling, I firmly believe and teach that it is incumbent upon teams to address this collectively, especially if more experienced members are available. They can help make these expectations more visible and explicit.

Expectations from policy and management

There are also expectations from management and leadership that may or may not be explicit. There's probably some written and invisible pressure to get things fixed "quickly," but probably not a lot of definition around what "quickly" means.

Additionally, it'll often be assumed that responders will trade off between efficiency and thoroughness in whatever way the policy's writer thinks is correct.

One approach that can help is to set an explicit upper limit on how long you want someone to wait before calling for help. This is usually not enough on its own, though, as expectations will differ based on how bad things seem or on specific norms in teams or divisions.
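As a rough illustration of such an upper limit, here is a minimal sketch that varies the limit by how bad things seem. The severity labels, the minute values, and the names MAX_SOLO_TIME and should_call_for_help are all assumptions made up for this example, not recommendations; your team would need to agree on its own.

```python
from datetime import timedelta

# Hypothetical limits per severity. These numbers are placeholders,
# not recommendations; each team has to agree on its own.
MAX_SOLO_TIME = {
    "low": timedelta(minutes=60),
    "medium": timedelta(minutes=30),
    "high": timedelta(minutes=10),
}


def should_call_for_help(severity: str, elapsed: timedelta) -> bool:
    """Return True once the responder has worked alone for the agreed limit."""
    # An unknown severity falls back to the strictest (shortest) limit,
    # so ambiguity pushes toward asking for help sooner.
    limit = MAX_SOLO_TIME.get(severity, min(MAX_SOLO_TIME.values()))
    return elapsed >= limit
```

Even a table this small makes the expectation visible: a newer responder can point at it instead of guessing what "quickly" means.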

What does "being on-call" actually mean?

And finally, and perhaps most importantly, there's the expectation around what it even means to be on call.

Don't worry, I'm not going to get too meta or philosophical about this here, despite the way that may sound. But it is important to have a clear idea of what you want the person receiving the page to do and what role you want them to be operating in.

I worked with a team that was on call for a financial product that helped people compute taxes. They were trying to understand why their process seemed very effective in some cases but ineffective in others, and they hadn't found an obvious cause. After exploring their process and talking with them about their issues, I discovered that they had never settled what they wanted the person receiving the page to do.

When they were only a few people, they'd all had a similar idea of what to do, how to do it, and approximately when to do it. As the team grew, this was no longer the case, though they didn't realize that at the time. They ran new team members through some of the "typical" stuff: how to start an incident channel, how to use some of the internal tools, when to start a call or not to, that sort of thing.

These are all good things! I don't want you to get the idea that they're not.

But they never talked about what the core goals and responsibilities were of the person receiving the page.

This tends to break down into two different approaches that folks use when developing on-call incident response programs at this level. Unfortunately, from the outside or from enough distance, they both look the same, especially to newer folks.

  1. The shield model - in this approach, your job as the person who answers the call is to keep this problem from disturbing others, most often by fixing it or otherwise mitigating it until a time when addressing it is less disturbing.
  2. The gathering model - in this approach, your job as the person who answers the call is to gather the right people, help them move toward a solution, and coordinate the moving parts.

It boils down to the question "is your job primarily to solve the problem directly or to orchestrate a response?"

I want to acknowledge that if you were to ask this question around your organization, some folks are likely to say "both." I don't think this is really a wrong answer, though when setting up an incident management or on-call program, I think it's unhelpful.

It's true that we often want folks to balance these and ultimately apply their judgment, but they need some basis for developing that judgment and the ability to trade off. They can only really develop this if they are given a starting stance, guidelines, and the chance to practice, which means it has to be OK to get it "wrong" and have the opportunity to adjust.
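One way to give people a starting stance is to write it down next to the rotation itself. The following is only a sketch under that assumption: the Stance and Rotation types, the first_move helper, and the rotation name "payments-oncall" are hypothetical and exist just to show the idea of an explicit default.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    SHIELD = "shield"        # fix or mitigate so others are not disturbed
    GATHERING = "gathering"  # gather people and coordinate the response


@dataclass(frozen=True)
class Rotation:
    name: str                # hypothetical rotation identifier
    default_stance: Stance   # the agreed starting point, not a hard rule


def first_move(rotation: Rotation) -> str:
    """Describe the responder's starting point when a page arrives."""
    if rotation.default_stance is Stance.SHIELD:
        return "Try to fix or mitigate the problem yourself first."
    return "Gather the right people and coordinate the response."


# Example with a made-up rotation that defaults to the gathering model.
payments = Rotation(name="payments-oncall", default_stance=Stance.GATHERING)
print(first_move(payments))
```

The default is a place to start from, not a replacement for judgment; the point is that a new responder no longer has to infer which model the team has in mind.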

Takeaways

  • Lots of norms, pressures, and expectations can exist from various sources with varying levels of specificity.
  • Individuals and teams can help improve their incident response programs, and address this question specifically, by working to make some of the norms and expectations more explicit and visible.
  • Ultimately, it is the person at the sharp end, usually the one answering the call, who has to resolve ambiguity and goal conflicts.
  • Making the primary goals of the first responders explicit can be a critical point in helping them operate in the "correct" mode.
  • Skillful trade-off between working in and on the problem requires time and space to practice and learn.