Resilience Roundup - Vigilance Latencies to Aircraft Detection among NORAD Surveillance Operators - Issue #63

Hey folks, if you received last week’s email without the link to the paper, you can find it here.


Vigilance Latencies to Aircraft Detection among NORAD Surveillance Operators

Today we’re taking a look at a paper by R. A. Pigeau, R. G. Angus, P. O’Neill and I. Mack where they worked with the North American Aerospace Defense Command (NORAD) in Canada to study “vigilance latency.”

Vigilance latency is a roundabout way of saying that it can take longer for someone to notice something.

This sort of process is important for responders and operators of software systems because many of our dashboards and other tools make assumptions about how our attention or vigilance works, correct or otherwise.

There is some lab research suggesting that after a certain point (20 minutes is one commonly cited threshold), people become slower to notice things. In the case of this study, the thing the operators were to notice was radar information that could indicate unidentified or unauthorized aircraft in their airspace.

But in software, we tend to set up similar things. Network Operation Centers where people have graphs on a display, for example. Or have you ever tried to debug something by watching its tons (and it’s always tons, isn’t it?) of log output, looking for the aberration?

The authors note that they had 3 main concerns that made them pursue this investigation:

  1. laboratory vigilance tasks rarely resemble their real-world counterparts;
  2. when real-world tasks are adequately simulated, the results often differ from laboratory findings; and
  3. there are “few, if any, troublesome vigilance decrements in the operational tasks of the real world”

So they decided to look at a task in real life, instead of simulating one. NORAD has operators who sit at consoles and look at a combined view of what all the radar in their region is seeing. When they find an aircraft, they assign it a tracking number, and other people then use the data to match it against known flight plans. If it’s a known plane, say a commercial jet, it’ll have a transponder that gets picked up, and the operators see a little diamond icon with some data; they call this a beacon track.

If it’s something they don’t know, they get a little dash icon - and this is called a search track. Here’s the especially tricky bit. Pretty much anything the radar sees that isn’t a known aircraft gets that - icon. Spray from the ocean that is in radar range? Yep, gets a dash. Some mountains? Also a dash. Storm passing over? You guessed it, a dash. Hostile inbound aircraft that won’t identify itself? Also a dash.

So that’s the hard part. It turns out that pretty much no matter what they did, everyone was pretty quick with the beacon tracks, the known aircraft. But the most important task to the operator, their command, and arguably their continent, is identifying stray dashes on the radar view.

Their view updates with new radar info every 2 seconds from the seven most recent radar views, where each individual radar dish is giving info back every 12 seconds. That’s a lot to keep track of. So they divide the area into two big regions, Canada East and West. Each squadron operates on its own in a completely different place. This makes it a good place to study, since you can create various groups out of this arrangement.

It turns out that NORAD computer systems (at the time, at least) had the ability to inject data that would then show up on the console. This was indistinguishable from live air traffic and was how operators were trained.

This is how they were able to study the crews. The computer would log when the injection occurred and when the item got tagged; from this they could figure out a “response time.” Unfortunately, the computer didn’t log when live tracks showed up on console, so they couldn’t just analyze the data from non-injected events.
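The measurement itself is simple: pair each injected track with the moment an operator tagged it. Here’s a minimal sketch of that pairing; the log format, track IDs, and timestamps are all invented for illustration, not from the study.

```python
from datetime import datetime

# Hypothetical log records: the computer logged when an injection
# appeared and when the operator tagged it. Track IDs and times are
# made up for illustration.
injections = {"T1042": datetime(1987, 3, 2, 23, 14, 0)}
tags = {"T1042": datetime(1987, 3, 2, 23, 15, 34)}

def response_times(injections, tags):
    """Pair each injected track with its tag event; latency in seconds."""
    return {
        track: (tags[track] - injected_at).total_seconds()
        for track, injected_at in injections.items()
        if track in tags  # an untagged injection has no measurable latency
    }

print(response_times(injections, tags))  # → {'T1042': 94.0}
```

Note the caveat in the filter: this only measures what eventually got tagged, which mirrors the study’s limitation that live (non-injected) tracks weren’t logged at all.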

After a few pilot studies to determine if this larger study was feasible (one unpublished), they decided to look at three different variables to see if they changed response times:

  1. The number of zones the region was divided into. The smaller the number of zones, the more area an individual had to cover. The larger the number of zones, the more people are needed to cover a region.
  2. Time on task or what happened after an operator had been working on the same thing for a while.
  3. Shift time, whether they were an evening shift or a midnight shift.

Number 2 in particular was to see if lab findings held up. They created a group that would do 20 minutes working with 20 minutes rest and another group that would work for 60 minutes with 60 minutes rest. If the lab results held, they’d see this time increase the longer people were at the console. As you may have figured out by now, it didn’t.

Running the study

They took 16 operators who were told the general purpose of the study. They were promised that their individual performance would be kept confidential and that they’d get to see the final results.

They then made the groups: some had 2 zones, others 4; some had 20-minute work/rest periods, others 60. Each shift, evening and midnight, would see each arrangement twice, 1 month apart, just as they were starting their 4-day work cycle after having 3 days off.

The shifts lined up fairly close to what I’ve seen in other industries for 8 hour swing and midnight shifts, 1500-2300 for evening and 2300-0700 for midnight. This shift schedule was a part of how they already functioned, not something created by the researchers, but still a variable they wanted to account for.

They had 4 experienced air force personnel actually doing the injection, since they would know what it should look like.

The results

Unfortunately, some of the data was lost due to computer problems (unsurprising to some of us in software, right?), but they still had quite a bit to work with.

When looking at 2 zones vs 4, response time with 4 zones (4 people) averaged ~75 seconds, whereas with 2 zones (2 people covering twice as much area) it was ~94 seconds.

Reading this, I wasn’t all that surprised, though I know nothing about their work firsthand; the operators themselves, however, were quite surprised. The researchers noted that almost every single operator had expressed the idea that 2 operators could do the job just as well as 4.

When they looked at time on task (how long someone worked without a break), there wasn’t any effect on its own. What was curious, though, is what happened when they combined some of the views: on the midnight shift, and only with 4 zones, the 60/60 group’s response time was about 10 seconds slower than the 20/20 group’s.
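The kind of slicing that surfaces an interaction like this is just a grouped average over the experimental factors. A minimal sketch, with invented placeholder numbers (not the study’s data) chosen only to show the shape of the analysis:

```python
from collections import defaultdict

# Invented records: (shift, zones, schedule, response_time_seconds).
# Values are placeholders to illustrate the grouping, not real results.
records = [
    ("midnight", 4, "20/20", 75), ("midnight", 4, "20/20", 77),
    ("midnight", 4, "60/60", 85), ("midnight", 4, "60/60", 87),
]

def mean_by(records, key):
    """Average response time per group, where key() picks the grouping factor."""
    groups = defaultdict(list)
    for shift, zones, schedule, rt in records:
        groups[key(shift, zones, schedule)].append(rt)
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Slice down to the condition where the effect appeared:
# midnight shift, 4 zones, then compare work/rest schedules.
subset = [r for r in records if r[0] == "midnight" and r[1] == 4]
print(mean_by(subset, lambda s, z, sch: sch))
# → {'20/20': 76.0, '60/60': 86.0}
```

The point is that neither factor alone showed the difference; it only appears once you condition on the specific shift-and-zone combination.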

Looking further there seemed to be some small effect from circadian rhythms, most people don’t function their best in the middle of the night, but it wasn’t enough to explain the discrepancy. It’s also especially strange that this happened in the 4 zone arrangement, not the 2 zone arrangement. I would have expected the “easier” one to have no difference, and the harder 2 zone one to show it.

We can’t ever really know what happened to cause it, but there were a few theories. One was that since the researchers had seen the midnight shift talk more to each other than other shifts, perhaps this was an adopted strategy to help keep them awake, but was at the cost of a bit of time. Alternatively, it turned out that when there were only 2 people, one for each zone, they ended up being right next to the shift supervisor (who was not part of the study). Perhaps that had some influence.

It’s also possible that having the two zones, the harder mode, was just busy or difficult enough to keep them engaged.

At any rate, this was only observed in this small group, and doesn’t seem to generalize especially well.

Another interesting finding was that when they looked at how the different timed groups performed every 10 minutes, the 20/20 group was very consistent before and after their break. But the 60/60 group had a small drop for the first 10 minutes after their break. After that they were right back where you’d expect. It seems that after an hour off they needed some time to warm up and get back in the zone.
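That per-10-minute view is just latencies binned by how far into the work period they occurred. A sketch of the binning, with invented sample points shaped to show a warm-up pattern like the one described:

```python
# Invented samples: (minutes_into_work_period, response_time_seconds).
# Shaped to show a slow first bin, then a steady level; not real data.
samples = [(3, 88), (8, 86), (14, 79), (22, 78), (35, 77), (48, 78)]

def bin_means(samples, width=10):
    """Average response time per `width`-minute bin of the work period."""
    bins = {}
    for minute, rt in samples:
        b = (minute // width) * width  # bin start: 0, 10, 20, ...
        bins.setdefault(b, []).append(rt)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

print(bin_means(samples))
# → {0: 87.0, 10: 79.0, 20: 78.0, 30: 77.0, 40: 78.0}
```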

I know I’ve felt this with a lot of different work, so I can totally understand it. I really liked seeing it show up in the data though.

So what can we make of this? I think we can get a few things from it. One, similar to last week’s discussion of macrocognition, not all views of human performance are going to be useful to us. Also, our systems can have an impact on how people perform; we can help or we can hurt. When we consider the system as a whole, the people in it included of course, we have something complex that can’t always be tested accurately in a lab. This is why it’s important, where possible, to observe how people actually work, to learn what’s actually happening.

If you had an SRE team that said they built tools based on guesses about what they thought other engineers needed, you’d probably think there was something very wrong. But I think we can also fall into a similar trap in software, where we’re gathering data but not really considering the system as a whole, or at least not taking the joint cognitive system view. So our data leads us astray.

Takeaways

  • Many of the lab results showing that people miss more things as time goes on don’t stand up to how people work in the real world.
  • There were some effects on response times seen for people toward the end of a midnight shift in less challenging conditions.
  • Circadian rhythms can affect response times, so it’s possible graveyard shifts may have a bit more difficulty in some cases, especially if there isn’t as much to do (less stimulation).
  • After a longer break (1 hr), it took experienced operators about 10 minutes to warm back up and get back to their normal, high performance.
    • If your software or system demands that people very quickly notice discrepancies, like in the study, it might be a good idea to schedule around this. Perhaps have some overlap time where incoming people can warm up.
  • There may be other influences on lower-paced work, like having supervisors nearby.
  • Also, the rate at which the task provides stimulation (1 thing every hour? 1 thing every minute?) seems to play a role in keeping people engaged.

Don't miss out on the next issue!