Ten Challenges for Making Automation a 'Team Player' in the age of AI

With all the talk of AI agents lately, including at conferences like SRECon (whose videos are now available), I thought it would be a good idea to revisit Ten Challenges for Making Automation a Team Player in Joint Human-Agent Activity by Gary Klein, David Woods, Jeffrey Bradshaw, and Paul Feltovich.

If you haven't read my analysis of the original, I recommend you do so first. This time, I'm going to go through each of the challenges and examine what, if anything, has changed when it comes to agents in the LLM sense, whether for better or worse.

Challenge 1: "To be a team player an intelligent agent must fulfill the requirements of a Basic Compact to engage in common-grounding activities"

If you want to learn more about common ground and the Basic Compact see my previous article.

The authors remind us that "Team members must be alert for signs of possible erosion".

Much of the advice or guidance that I've heard around using LLMs falls under this category: the human is essentially told to "be alert for signs of possible erosion," often with a similar lack of specificity or tactics for doing so. Unfortunately, so does much of the policy that I've seen at companies around the usage of AI, even when (or perhaps especially when) it's encouraged or mandated. In these cases, though, the LLM is not treated as a team member and isn't expected to do similar monitoring.

Challenge 2: "...intelligent agents must be able to model the other participants' intentions and actions vis-a-vis the joint activity's state and evolution..."

"For example, are they having trouble? Are they on a standard path proceeding smoothly? What impasses have arisen? How have others adapted to disruption in the plan?"

Many agents can't do this, and some of the ways they "try" actually hinder it. They're anchored in the "typical," so in that sense they're very much "aware" of the standard path, but they often don't or can't shift gears. Sometimes they do the opposite, pulling harder to try to get the operator back on the standard path.

For example, consider trying to do something that is similar to a "typical" task yet meaningfully different. Some models will insist on the typical approach even when it isn't right, and may need many rounds of correction.

Challenge 3: "Human-agent team members must be mutually predictable"

I think a lot of "prompt engineering" and incantation sharing is an attempt to limit and control this aspect, seeking to increase predictability.

The authors address this very directly in a way that was true then and is perhaps more true than ever: "Currently, however, agents' 'intelligence' and autonomy work directly against the confidence that people have in their predictability."

"Ironically by making agents more adaptable, we might also make them less predictable."

Challenge 4: "Agents must be directable"

Agents are certainly directable, but that's not the complete picture of this challenge: "The nontransparent complexity and inadequate directability of agents can be a formula for disaster."

This is very much still true today. Perhaps directability has increased, though transparency has severely decreased. "Reasoning" outputs are attempts at addressing transparency issues, but they do nothing to increase directability.

The authors' discussion of policy control applies to methods such as the various markdown files (e.g. AGENT.md, various rules-type files, CLAUDE.md, etc.), since the intent behind them is very similar:

"Policies are a means to dynamically regulate a system’s behavior without changing code or requiring the cooperation of the components being governed. Through policy, people can precisely express bounds on autonomous behavior in a way that’s consistent with their appraisal of an agent’s competence in a given context. Their behavior becomes more predictable with respect to the actions controlled by policy."

It's worth noting, though, that those markdown files do not bind agents as firmly as the policies the authors describe, which leaves an additional challenge in directability.
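To make that difference concrete, here is a minimal Python sketch. It is not taken from the paper or from any real agent framework; the names `Action`, `Policy`, and `authorize` are hypothetical. The idea it illustrates is that a policy check sits outside the agent and runs on every proposed action, whereas an instruction file is just more text in the prompt that the model may or may not follow.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """A hypothetical action an agent proposes, e.g. editing a file."""
    tool: str
    target: str


@dataclass
class Policy:
    """Hypothetical bounds on autonomous behavior, set by the human operator."""
    allowed_tools: set[str] = field(default_factory=set)
    protected_prefixes: tuple[str, ...] = ()

    def authorize(self, action: Action) -> tuple[bool, str]:
        """Return (allowed, reason). Runs outside the agent, on every action."""
        if action.tool not in self.allowed_tools:
            return False, f"tool '{action.tool}' is not permitted"
        # str.startswith accepts a tuple; an empty tuple never matches.
        if action.target.startswith(self.protected_prefixes):
            return False, f"target '{action.target}' is protected"
        return True, "within policy"


# Assumed example bounds: the agent may read and edit, but not touch prod infra.
policy = Policy(
    allowed_tools={"read_file", "edit_file"},
    protected_prefixes=("infra/prod/",),
)

for proposed in (
    Action("edit_file", "src/app.py"),
    Action("edit_file", "infra/prod/db.tf"),
    Action("shell", "rm -rf build"),
):
    allowed, reason = policy.authorize(proposed)
    print(f"{proposed.tool} {proposed.target}: {'ALLOW' if allowed else 'DENY'} ({reason})")
```

The denial reason is returned alongside the decision so that, in this sketch at least, the bound is both enforced and visible to the operator, which is closer to what the authors mean by predictable behavior under policy.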

Challenge 5: "Agents must be able to make pertinent aspects of their status and intentions obvious to their teammates."

This certainly isn't true of AI today. LLMs are often good at making their proposed action visible to the operator, but that output often lacks the context that would reveal their actual status and the ultimate intention behind the change.

Challenge 6: "Agents must be able to observe and interpret pertinent signals of status and intentions."

I'd say this is true to a limited degree. We know that AI/LLMs can "observe" more signals than "traditional" automation, but the interpretation and (correct) resolution, as well as the intention, are in question.

Challenge 7: "Agents must be able to engage in goal negotiation."

"If agents are unable to readily represent, reason about, or modify their goals, they will interfere with coordination and the maintenance of common ground. Traditional planning technologies for agents typically take an autonomy-centered approach, with representations, mechanisms, and algorithms that have been designed to ingest a set of goals and produce output as if they can provide a complete plan that handles all situations. This approach isn't compatible with what we know about optimal coordination in human-agent interaction."

This challenge is about being able to adapt when things change. The agent needs to be able to communicate its goals as well as modify them in the face of that change.

This is, at best, a little bit true of agents today. Often the model will run away and do something different, though (hopefully, maybe) somewhat related. The situation the authors describe with "traditional planning technologies" isn't that far off from how some of the "plan" modes work today.

Challenge 8: "Support technologies for planning and autonomy must enable a collaborative approach."

I think this one is in question given the "planning" modes of some models and workflows. While this is very obviously a "support technology for planning," often planning modes are more about turn taking than true collaboration.

I tell the agent something. It writes a plan. I tell it something else. It refines the plan. We're not really collaborating on it, so much as I'm waiting to see if it got it right and then adjusting.

Challenge 9: "Agents must be able to participate in managing attention."

I think this is only true to a very limited degree today. Of course you get a notification when the response is ready or similar things, but I've yet to see a workflow that isn't almost entirely "the agent is done, interrupt the human."
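One way to picture an agent that takes part in managing attention, rather than only announcing that it is done, is a small triage step before any notification. The following is a hypothetical sketch: the `Event` fields, the `Urgency` levels, and the rules in `triage` are assumptions made for illustration, not a description of any existing tool or of the authors' proposal.

```python
from dataclasses import dataclass
from enum import Enum


class Urgency(Enum):
    INTERRUPT = "interrupt now"
    BATCH = "hold for next check-in"
    LOG = "record only"


@dataclass(frozen=True)
class Event:
    summary: str
    blocks_progress: bool   # the agent cannot continue without the human
    irreversible: bool      # the next step cannot be undone
    human_is_focused: bool  # assumed signal, e.g. a do-not-disturb status


def triage(event: Event) -> Urgency:
    """Decide how much of the human's attention an event deserves."""
    if event.irreversible:
        # Always worth an interruption, whatever the human is doing.
        return Urgency.INTERRUPT
    if event.blocks_progress:
        # Blocked but safe: wait if the human is busy, otherwise ask now.
        return Urgency.BATCH if event.human_is_focused else Urgency.INTERRUPT
    # Routine progress, including "I'm done", need not interrupt at all.
    return Urgency.LOG


events = [
    Event("About to drop a database table", False, True, True),
    Event("Need a decision between two API designs", True, False, True),
    Event("Finished refactoring tests", False, False, False),
]
for e in events:
    print(f"{e.summary}: {triage(e).value}")
```

Even a rule set this crude changes the default from "the agent is done, interrupt the human" to a decision that weighs the cost of the interruption.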

Challenge 10: "All team members must help control the costs of coordinated activity."

While some agent orchestration patterns attempt to do this across multiple agents, they primarily focus on coordination among agents and not between humans and agents. Some even claim to reduce cognitive load, though this is usually accomplished by increasing the agents' autonomy. Thus the reduced load comes from less human input and often brings increased opacity.

Takeaways

  • The 10 challenges still exist today and are still relevant in making agents (LLM or otherwise) behave like team players when working with humans.
    • The degree to which they are difficult or solved varies: some are getting easier, some more difficult in our current era of AI.
  • Some places that are especially challenging are:
    • Controlling the costs of coordination
    • Directability
    • Adaptation and adjusting to changes
  • Much of the current advice, policies, and workflows for working with LLMs require the human operator to take on additional monitoring and coordination burdens.