This week’s article is another from ACM Queue’s special edition, written by Richard Cook (a principal at the current sponsor, Adaptive Capacity Labs). He writes about how we as software people interact with our systems and what the implications of that are.
If you’ve followed much of Cook’s work, you’ll likely find the title, “Above the Line, Below the Line,” familiar. The idea is that one can draw a line through our system. Above the line are the people. Below the line is much of the technical stuff, which we can never truly touch directly. We can interact with it only through representations, like our dashboards and IDEs, that make up the line itself, called the line of representation.
Be sure to check out the diagram Cook provides in the article.
It might be tempting to use that line to say above and below are separate, distinct systems, but that isn’t really the case. It is one system, where each part continually affects and changes the other part. Separating the two isn’t accurate or very meaningful.
Though we talk a lot here about learning from other domains, we don’t often talk about what makes us different from other high-consequence, fast-paced domains. This is one of those differences: we can never directly get at much of our system. Another is that, even though we can’t directly interact with anything below the line, we can still look deeply into parts of the system, since we have access to things like code and config files.
Cook points out that we as software people also differ in that the people who respond to keep the system running are often the same people who built that part of the system. If they’re not, they probably at least worked with some of the people who did. This isn’t true of pilots or nurses.
When parts of the system begin to fail, our attention gets drawn to the area in which our understanding was incomplete or faulty. We are consistently being shown areas where problems occur. Cook quotes Beth Long, who calls these the “explodey bits.” This gives us some good starting points where we might want to investigate further and learn more. Cook points out that this is why deeply investigating individual incidents can be useful, whereas looking at large groups of incidents over time isn’t really: past performance doesn’t guarantee future success, so aggregate views of incidents don’t teach us much.
As much as we may think of incidents as taking place in all those technical parts of the system below the line, incidents actually take place above it. What an incident is, when it’s begun, and when it’s over are all judgement calls made by people, and people are above the line. As Cook says, “incidents are constructed.”
Things can, though, go wrong in similar ways above and below the line. Much as Conway’s law suggests, the things below the line typically look a lot like the things above it. I think we’ve all experienced this: there is often no technical reason why a system is divided the way it is, but the division mirrors some organizational structure. Because the structures are similar, things like saturation or cascading failure can happen both above and below the line.
Though people occupy the space above the line together, that doesn’t mean they have the same understandings or mental models of the things below the line. Responding to an incident very often requires collaboration, potentially with lots of other people, who are all testing and updating their mental models.
As we’ve discussed before, coordinating that collaboration can be difficult and costly.
Understanding these different parts of our system and their implications can help us as we go about almost any interaction with that system. We can use this understanding to look at places we don’t understand and also potentially find expertise in people who do. We can examine places where dysfunction is mirrored on both sides. We can keep in mind that our mental models are not complete and they need constant updating.
- We can’t touch the technical parts of the system directly; we interact with them through representations like dashboards and IDEs.
- Above this line of representation is where the people are.
- Below the line of representation is all the technical and machine parts of the system.
- Though the line of representation goes through the middle of the system, there is still only one system. People and the technical bits are not separate systems.
- They each affect and change the other; we cannot truly separate the two.
- Software is different from many other domains because though we can’t interact with the things below the line directly, we can look at many parts of the system by examining things like code and config files.
Last week we talked about things like:
- Experiences with incident frameworks
- Improvisation vs planning
- Response structures
- Use and harm of ICS
If you want to be part of the discussion for this paper, make sure you’re signed up and I’ll send you a meeting invite for this Friday, February 21st at 0900 PST (1700 UTC).