Multi-agent systems: Why the whole is riskier than the sum of its parts

The architecture most organisations are deploying

Increasingly, the AI deployments that deliver the most significant operational value are not single agents operating in isolation. They are multi-agent systems – architectures in which multiple AI agents, each with distinct capabilities and roles, operate simultaneously and interact with one another in pursuit of broader organisational objectives. These systems can be extraordinarily powerful. They can also generate risks that are qualitatively different from those of any individual component.

The UC Berkeley Agentic AI Risk-Management Standards Profile dedicates specific attention to multi-agent systems (MAS) as a governance challenge that cannot be adequately addressed by extending single-agent risk frameworks. The core insight is this: the risk profile of a multi-agent system is not the sum of its components' risk profiles. It is shaped by interactions, feedback loops and emergent behaviours that only arise when multiple agents operate together – and that are invisible when each agent is evaluated in isolation.

Emergent risks that cannot be predicted in isolation

The Berkeley paper cites multiple documented examples of emergent multi-agent risks. One of the most significant concerns collusion – where agents independently converge on behaviours that are misaligned with human or organisational objectives, either through explicit communication or through tacit learning from each other's actions. Research on autonomous pricing systems has demonstrated that agents incentivised to maximise their individual performance can spontaneously develop coordinated price-fixing behaviours without any explicit instruction to do so.

A second emergent risk is cascading failure. The Berkeley paper describes how errors, hallucinations or malicious inputs can propagate through multi-agent systems in ways that amplify their impact. An incorrect output from one agent, consumed as input by a second agent, can produce a compounded error in that agent's output, which is then further amplified by a third. In a sufficiently interconnected system, a single point of failure can produce systemic consequences.

A third risk is what the Berkeley paper describes as correlated failure modes – the tendency for agents that share underlying models, training data, prompts or configuration settings to exhibit highly correlated behaviour. When multiple agents fail in the same way at the same time because they share a common vulnerability, the risk is not merely additive. It can be simultaneously systemic.

The governance implication: System-level assessment is mandatory

The critical governance implication of multi-agent risk dynamics is that risk assessment must be conducted at the system level, not just the component level. The NIST AI RMF addresses this through the Map function, with Map 5.1 requiring that the likelihood and magnitude of identified impacts are assessed including through analysis of cascading effects and interactions with critical systems. The Berkeley paper supplements this with specific guidance: risk identification for multi-agent systems must include comprehensive system mapping that examines agent interactions, task execution flows, shared data sources, communication protocols and feedback loops.

This is a significantly more demanding requirement than most organisations currently apply to their AI governance processes. It requires technical expertise, cross-functional collaboration and governance infrastructure that can hold a system-level view of risk rather than a component-level view. ISO 42001 Clause 6.1 provides the planning framework for this: the risk assessment process it requires should be designed to capture multi-agent interaction risks explicitly, not just individual AI system risks.

Evaluation must match the architecture

A common governance failure for multi-agent systems is evaluating components rather than the system. The Berkeley paper is explicit that evaluations of multi-agent systems must include testing of the entire system under realistic conditions – including the operating environment, all agent instances with their actual objective prompts and scaffolding, the shared infrastructure and the control mechanisms in place.

Testing agents in isolation, in abstract or game-like scenarios or over short time periods is likely to miss the failure modes that actually matter. The paper identifies several specific failure modes that require dedicated testing: collusive behaviours under various incentive structures, propagation of adversarial inputs across agent communication channels and anomalous coordination patterns that only emerge over extended periods of operation.

Red-teaming for multi-agent systems should include adversarial stress-testing that specifically challenges agent coordination – including scenarios with contradictory goals between agents, information asymmetry where key information is withheld and malfunctioning or adversarial agents introduced into the system.

Communication protocols and auditability

One of the practical governance requirements for multi-agent systems is the establishment of auditable communication protocols – mechanisms that ensure the communications between agents are recorded, traceable and reviewable by human oversight functions. The Berkeley paper highlights several emerging standards that are relevant here, including Anthropic's Model Context Protocol (MCP) for agent-to-data-source connections and Google's Agent2Agent Protocol for agent-to-agent communications.

The governance requirement is not to mandate particular protocols, but to ensure that whatever protocols are in place allow for meaningful human oversight. If the communications between agents in a multi-agent system are not logged and reviewable, the organisation cannot detect collusion, cannot trace the origin of errors and cannot demonstrate accountability when things go wrong.

What boards should require

For boards overseeing organisations with significant multi-agent AI deployments, several requirements should be considered non-negotiable. System-level risk assessments should be conducted before deployment and at intervals proportionate to system capability and change frequency. Monitoring should cover agent interactions as well as individual agent behaviour. Evaluation programmes should include multi-agent specific testing scenarios, not just individual agent benchmarks. And communication protocols between agents should be designed and implemented with auditability as a first-order requirement.

The organisations that get ahead of multi-agent governance now – before their deployments reach a scale and complexity that makes retrospective governance impractical – will be in a fundamentally stronger position than those that discover these risks through operational incidents.

Relevant frameworks: NIST AI RMF (Map 1.1, Map 5.1, Measure 1.1) | ISO 42001 Clauses 6.1, 8.4, 9 | Berkeley Agentic AI Profile: Introduction, Map 1.1, Measure 1.1, Manage 1.3

Contact us

Previous
Previous

Mapping your AI risk landscape: What the NIST AI RMF requires and why it matters

Next
Next

Privacy and security risks in agentic AI: Why the attack surface is bigger than you think