Resilience Roundup - Cognitive Systems Engineering: New wine in new bottles - Issue #32

Thanks to everyone who came to say hi at DevOps Days Austin, it was great meeting and learning from you folks in person!

This is a paper from Eric Hollnagel and David Woods where they are introducing the idea of cognitive systems engineering.

It’s a really cool look at the field and has some good advice on how we can integrate its ideas into our thinking and system designs.

In the paper, Hollnagel and Woods discuss complex man machine systems (MMSs) and explain why previous approaches to man machine systems are not sufficient to create effective ones, given today’s complexities.

This is because they do not include the cognitive part of the operator or the machine in their models. Without doing so, it’s possible to create an MMS, but that does not necessarily mean that it functions properly.

This comes about because of the increased complexity made available through computers and other forms of automation. The tasks that humans do are shifting away from doing things with their hands and toward using cognitive skills, either to monitor and supervise or to problem solve and make decisions. Because of this, the human machine interface is shifting into being made up of the cognitive systems of both the humans and the machines.

The need to have a good interface between the operator and the machine is not a new idea. This has been the case since the Industrial Revolution, but because older machines were simply an extension of physical functions, those systems were designed to make up for physical deficiencies, with the goal being to maximize total output and pretty much nothing else. Now machines are no longer just reacting to direct operator input; they’re processing information so they can do increasingly complex things and communicate in a way that might be called intelligent. The authors are quick to note that when they say intelligent here, they mean simply that a “naïve observer” might say the machines appear to behave in a way you could call intelligent. Further, in some contexts an operator may treat the machine as if it were intelligent.

Traditional approaches, everything from human factors to ergonomics, did not consider, or were unable to solve, the problems of the cognitive interface between the operator and the machine. The other issue with traditional engineering approaches to this problem is that those fields don’t really possess the concepts, models, or tools needed to understand the system from a cognitive viewpoint. The underlying approach in engineering psychology had looked at humans as processing information in a linear series of fixed stages; however, research has shown that that’s not how we work.

So at this point a different approach is coming about, one which describes cognitive function instead as a recursive set of operations. This includes data-driven analysis and concept-driven analysis. It creates an emphasis on methods of studying a single person’s performance rather than some idea of the average person’s performance.

An overview of CSE

What they mean by “new wine in new bottles” is that cognitive systems engineering is not just taking old ideas and looking at them differently. This is a completely new and different interdisciplinary creation to meet the new challenge that has developed around human machine relationships.

The central tenet of CSE is that an MMS needs to be conceived, designed, analyzed, and evaluated in terms of a cognitive system.

The MMS is much more than the sum of its parts; the way it’s configured and organized affects the whole.

Another goal for CSE is to be able to give those designing these systems a more useful and more realistic model of how humans actually function. This is because the models that describe the physical world and the psychological world differ. We can’t just take the rules we use to describe the behavior of the physical world and apply them to humans.

“It is simply that man functions according to a psycho-logic rather than to a logic”.

This just means that when humans are solving problems or diagnosing things, you can describe the process with principles of psychology, not logic. As a result, someone who creates a system must not hold this image of the operator as just an information processor, just a meat calculator.

Continuing to think that an effective MMS can be created just by decomposing parts doesn’t work. The authors give an example of an MMS where 90% of function could be automated. The whole that is created will be different if the task split between the machine and the operator is 95:5 rather than 80:20. They note that 80:20 is not necessarily less efficient than 95:5, because boredom and stress vary depending on how the tasks are distributed.

“Faultfinding performance is a nonlinear function of the architecture of man and machine tasks, e.g. whether the operator is an active element in the control loop or functions only as a monitor or supervisor of system operation. The operator detects failures better when he participates in system control as opposed to functioning only as a monitor when workload is low. When workload is high, the relationship is reversed.”

As a result, the design should allow for some amount of change in how these tasks are distributed. This represents a view of CSE that the system should be seen as adaptive and the goal is to improve the function of the system as a whole, not just replace as many human functions as possible.

“The operator may eventually be freed from any kind of work which can be described algorithmically, but this does not mean that a simple substitution of machine for man will improve the function of the total system”

I think that this is something that gets lost a lot in software. In our industry it is very common to hear “We’ll just automate that away, computers are good at that.” And it’s true there are often things that computers are better at, but as the authors point out if we don’t consider how the resultant whole will function, we are not necessarily improving the system.

This was shown in an evaluation of a steel plant. They found that the operators’ cognitive tasks changed drastically after they had implemented some large-scale automation in steel milling. The report that they wrote showed that:

“The need for the operator to intervene directly in the process is much reduced, the requirements to evaluate information and supervise complex systems is higher.”

Despite this, the designers didn’t take that difference into account and ended up harming total system performance. The automation either did everything on its own or dropped into a manual mode where the operator did everything. There was no way for the operator to be supported; they either did nothing or everything. The authors call this “an impoverished cognitive coupling”.
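To make the all-or-nothing problem concrete, here’s a small sketch of my own (not from the paper): instead of flipping between fully automatic and fully manual, a system can run at intermediate levels of support, such as suggesting actions or acting only after operator confirmation. The mode names and function are hypothetical illustrations.

```python
# Hypothetical levels-of-automation sketch: rather than "machine does everything
# or operator does everything", intermediate modes keep the operator supported
# and in the loop.

from enum import Enum

class Mode(Enum):
    MANUAL = 1     # operator does everything unaided
    ADVISORY = 2   # machine suggests, operator acts
    CONSENT = 3    # machine acts, but only after operator confirms
    AUTO = 4       # machine does everything

def handle_event(mode, proposed_action, confirm):
    """Decide who acts on an event, given the current coupling mode.

    confirm is a callable so the operator is only asked when it matters.
    """
    if mode is Mode.MANUAL:
        return "operator acts unaided"
    if mode is Mode.ADVISORY:
        return f"suggest: {proposed_action}"
    if mode is Mode.CONSENT:
        return f"execute: {proposed_action}" if confirm() else "hold for operator"
    return f"execute: {proposed_action}"  # AUTO

print(handle_event(Mode.ADVISORY, "reduce rolling speed", lambda: True))
# → suggest: reduce rolling speed
```

The point of the middle modes is exactly the support the steel mill operators lacked: the automation still evaluates and proposes, but the operator stays engaged with the process.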

Further, thinking of an MMS as just the sum of component tasks becomes an even more relevant problem when we consider human error. This is a point where traditionally designers would just automate away a task or add some other machine around whatever area they felt the human had failed in. But, from the cognitive systems viewpoint, that’s not the right solution. That one exact error might go down as a result, but others will now occur because of the new things being introduced, since the cognitive system still hasn’t been considered. They also note that human performance problems are probably a symptom of poor cognitive interface design. They quote Don Norman saying:

“Forcing people to interact on the machine’s terms is not only inconvenient – more importantly, because it is an unnatural mode of interaction, it is a primary cause of human error.”

Looking at something we talked about last week, getting lost in the data of multiple displays, this often gets attributed, in other views, to the short-term memory limitations of humans. As a result, the solution that then gets generated is to give the user some sort of memory aid. But memory limitation is not the cause of getting lost in the data; that’s just a symptom. When operators get lost in these complex displays, it’s because of a system design that does not match the human cognitive system.

What is a cognitive system?

The authors define a cognitive system as one that produces “intelligent action”, is goal oriented, manipulates symbols, and uses heuristic knowledge for guidance. It is also adaptive and can look at a problem in more than one way. It operates using knowledge about both the environment and itself so that it is able to plan and change its actions; it is data-driven and concept-driven. In this era, they say, machines are potentially, if not actually, becoming cognitive systems. I think that today that is certainly true. Regardless, an MMS as a whole is certainly a cognitive system.

Part of concept-driven behavior is that intelligent action is produced by way of an internal model or representation of the environment. This is the model that is used to plan and decide things, for example, deciding what messages to send and how to interpret messages that are received. This idea has been around for a long time in various forms; the authors cite some cybernetics work from the 1950s, for example. They even cite research suggesting that rats are capable of using some sort of internal representation, not just trial and error.

But we can take that further, and CSE does, to then say that the system that is designed could also have a model of its environment and of its operator.

“Based on training, experience, instructions, and the nature of the interface, the operator develops an internal model that describes the operation and function of the machine. Similarly, designers build into artificial systems a model of the user’s characteristics although they may not always fully realize that.”

To distinguish these two ideas of models, the authors refer to the human as having a “model,” whereas the machine has an “image” of the operator.

Examining the system image of the user

There are different levels to the system’s image of the user. For one, the designer has encoded some things about the physical characteristics of the user. The authors give the example of a guitar. A guitar typically assumes a user who is right-handed, has a certain number of fingers, and has a certain amount of strength in their hands. This shows us that the image of the user doesn’t have to be explicitly stated anywhere; it is still there, implied by the way the machine functions.

Next, the machine is also going to make assumptions about the operator’s cognitive ability. For example, a machine configuration will often make assumptions about how much data a person can remember. All systems assume something about a user’s cognitive function. The problem comes when that image of the user is never explicitly designed with a goal of enhancing the joint function. Instead, it’s just implied, or even buried in the function of the machine.

This problem can grow in complex systems where there are different components that all affect the operator but have been designed independently, none considering the whole system or its cognitive function. The system components can include anything from the instruments to the training programs, the procedures, or the other personnel. These are all components of the total system needed to operate it, but the communication among these different components can be very low or even absent.

This isn’t really a new idea, but I think it is really important to emphasize in software. This idea is arguably part of where devops comes from: an attempt at creating a cultural shift that would allow more of the components in the total system to have increased crosstalk.

“Viewing the total operational system as a cognitive system (for instance, the problem-solving or decision-making tasks which must be accomplished in order to handle abnormal events) provides a mechanism to integrate all of the control resources – people, facilities, instrumentation, procedures, and training – into a coordinated system.”

When a mismatch occurs between the human and the machine this is often the result of the designer not explicitly looking to create a system that addresses the human element. This is why one of the goals of CSE is to give the people the tools they need so they can provide a match between the system image they’re encoding and the cognitive characteristics of the operator.

There is also potentially a third level to a system’s image of the user. This is when the machine is also a cognitive system, or at least mimics the functions of the human cognitive system. Then, on some level, the machine is assuming something about how two different cognitive systems interact, in addition to its assumptions about the user’s cognitive system.

This means that the system’s image should be explicitly specified and matched to the user’s cognitive characteristics, but should also be able to change or have some flexibility. The user may change over time: they may get better at their job, the nature of the task may change, or there may be different users entirely.

“The goal for design in MMS should be to make the interaction between the operator and the machine as smooth and efficient as the interaction between two persons. But it is an essential part of human communication that each participant is able to continuously modify his model of the other.”

How to engineer cognitive systems

First, coordination is needed between the system’s image of the user, the user’s model of the system, and the actual properties of the system. The first part is usually handled with design, the latter with training.

This is where cognitive task analysis is required, so that the cognitive activities that need to be accomplished are understood and those requirements can be translated into a form the designer can implement.

There is a lot of different research that looks at cognitive activities in systems. It’s mostly domain specific, so what you choose is going to vary a lot based on the actual activity. One example the authors give is Jens Rasmussen’s work on process control operators’ strategies for diagnosis, based on studies of faultfinding behavior. One of the search strategies he found is called “topographic search”. This is where the system is mapped as good or bad against some sort of reference, and the potentially “bad” region is pared down until the problem area becomes clear.

In order for “topographic search” to be effective an operator uses:

  • a model of the structure of the system to guide the search
  • the model varies in level of abstraction (physical components to functional relationships) depending on the specific goals
  • tactical search rules or heuristics
  • a model of the normal operating state of the system
  • relationships among the data rather than just magnitude of variables
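As a rough illustration of the paring-down idea (my own sketch, not from the paper, and assuming a single fault in a linear chain of components whose effects propagate downstream), a topographic search narrows the suspect region by comparing observation points against a model of the normal operating state:

```python
# Hypothetical topographic-search sketch: find the faulty stage in a chain
# of components by comparing readings against a reference model of normal
# operation, halving the suspect region with each good/bad judgment.

NORMAL = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]  # expected reading after each stage

def reading_ok(observed, stage, tolerance=0.5):
    """Judge one reading good/bad relative to the normal-state model."""
    return abs(observed[stage] - NORMAL[stage]) <= tolerance

def topographic_search(observed):
    """Binary-search the chain: a good reading puts the fault downstream,
    a bad reading puts it at this stage or upstream."""
    lo, hi = 0, len(observed) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if reading_ok(observed, mid):
            lo = mid + 1   # everything up to mid looks normal
        else:
            hi = mid       # fault is at mid or earlier
    return lo  # first stage whose reading deviates from normal

# A fault at stage 5 corrupts that reading and everything downstream:
observed = [5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.1, 9.2]
print(topographic_search(observed))  # → 5
```

Note how each of the bullets above appears: the chain and the `NORMAL` list are the structural and normal-state models, the halving rule is the tactical heuristic, and the good/bad judgment is made against a reference rather than by staring at raw magnitudes.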

Taking the example of nuclear power plant control rooms, the authors say that the displays don’t actually support those needs, because:

  • There is only one level of representation of plant state
  • The operator must construct other levels mentally
  • There is little explicit training or instructions on how to diagnose the problems; only the specific signs associated with specific failures are provided
  • There are few indications of normal states, particularly under dynamic conditions; the operator must rely on his memory of reference states
  • The one measurement-one indicator display philosophy does not show relationships between data; the operator must integrate data mentally

This mismatch between what efficient diagnosis requires and the actual characteristics of the interface drastically increases the operators’ mental workload, which increases the possibility of error. I think this is something we have all experienced in the realm of software. Our monitoring and diagnostic tools have a lot of the same characteristics, and there is little training in how to diagnose problems. Perhaps there is a knowledge base, but it only gives specific signs paired with specific failures. This happens a lot: you collect a list of error codes, and every time you see a new one you make a new wiki page with its text in hopes that someone can find it later. That might be a valid approach for certain parts of diagnosis, but it clearly fails the topographic search requirement.
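On the “one measurement-one indicator” point specifically, here’s a hedged sketch of my own (the scenario and names are invented, not from the paper): instead of rendering two raw gauges and making the operator integrate them mentally, a display can compute the relationship between them, like a flow balance that directly signals a possible leak:

```python
# Hypothetical integrated display: instead of showing flow_in and flow_out
# on two separate gauges, show the relationship between them (a balance),
# which is what the operator actually needs for spotting a leak.

def balance_indicator(flow_in, flow_out, tolerance=0.05):
    """Collapse two raw measurements into one derived status.

    tolerance is the acceptable relative imbalance (assumed value).
    """
    imbalance = flow_in - flow_out
    relative = imbalance / flow_in if flow_in else 0.0
    status = "BALANCED" if abs(relative) <= tolerance else "IMBALANCE"
    return status, imbalance

print(balance_indicator(100.0, 99.0))   # small mismatch, within tolerance
print(balance_indicator(100.0, 80.0))   # large mismatch: possible leak
```

The same move applies to software monitoring: a dashboard panel showing requests-in versus acks-out as a single ratio supports diagnosis better than two separate counters the operator has to subtract in their head.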

Principles of man machine system design

There are a few principles the authors lay out for designing these systems. They call them principles because they’re not guidelines you can apply directly, but they are things you can consider to come up with specific guidelines you can incorporate.

First is the relationship between the field of attention and the level of abstraction. This is just saying that there are different levels of representation or detail, with various zoom levels within each of those levels changing how abstract or concrete a view might be, while at the same time there are different levels in the field of attention.

Different tasks all require different views of the same process. The authors warn that this concept is required to be able to have a successful MMS, but the way you implement it can vary. They suggest a technique to provide a set of displays at the different levels of goals, functions, and the physical system components. And then each of those levels is represented in terms of goals, functions, and physical systems, recursively. There may be other ways to do this, but it is most important to consider what the level of abstraction is, what area is being focused on, and what span of attention is expected.

“If the designer is to build an interface compatible with human cognitive characteristics rather than force the human to adapt to the machine, he must be provided with a clear description of these characteristics and with tools and principles that allow him to adapt the system’s properties to the human.”

“The characteristics of man as a cognitive system, primarily his adaptability, should not be used as a buffer for bad designs, but rather as a beacon for good designs.”


  • In order to design an effective cognitive interface, it’s important to consider the system as a whole, not just the component parts
  • The operator will develop a model of the machines and system, but the machine will also have an image of the operator, encoded by the designer
  • Using cognitive task analysis can make for much more effective human machine couplings
  • Effective training should include how to diagnose problems, not just error state/suspected problem pairings
  • For more effective troubleshooting, normal states as well as failure states should be known or learnable
  • Displays that integrate data, instead of just mapping one measurement to one gauge, can also enhance troubleshooting

Don't miss out on the next issue!