AI has the same dysfunctions as humans

We've all seen the conflicting posts about the success, or otherwise, of agent teams for tech delivery. This paper gives some interesting insights into why the results are such a mixed bag.

The key result was that every agent structure tested was less efficient, and less effective, than a single agent. Clearly that's not always true: a single agent will eventually hit its context limits, and, unlike real life, the test case here was one a single agent could always handle. But the really interesting findings are in WHY the teams were less successful.

Basically, the agent teams behaved in the same dysfunctional ways as human ones, despite being told not to in their prompts. In one case, the traditional gated pipeline (hello, waterfall), the structure never got beyond planning, which of course no human org has ever done.

In particular, Goodhart's law (a measure ceases to be useful once it becomes a target) had a big impact. Agents told to review produced a significant number of low- or no-value review findings, which reduced the overall success of the team.

But Conway's law (system designs end up mirroring the organization's communication structure) also hurt effectiveness, and Brooks' law is implied too (apparently The Mythical Man-Month also applies to agents): communication complexity and miscommunication between agents created rework.

So, as usual, what's true for humans is true for agents. The difference is that humans can look beyond the immediate incentive to the ultimate goal and outcome, and consciously adjust. Agents aren't world-aware and can't do that, hence the importance of keeping a human in the lead.

The implication is that we need to design agent organizations with the same principles and intentionality we'd apply to a human one, because they're going to fail in the same ways. We should be even more careful in thinking through our incentives, and we should minimize the inherent costs of distributed responsibility and of distributional complexity.

I've seen symptoms like this in my own experiments with agent teams. I'm going to try tuning my agent definitions around some of this study's findings to see how it plays out.

Originally published on LinkedIn.