Privacy and security risks in agentic AI: Why the attack surface is bigger than you think

A fundamentally expanded attack surface

When an AI system transitions from responding to queries to autonomously accessing systems, executing tasks and interacting with external environments, its security and privacy risk profile changes fundamentally. The attack surface expands. The potential for data leakage increases. And the consequences of a compromise become significantly more serious.

The UC Berkeley Agentic AI Risk-Management Standards Profile devotes substantial attention to privacy and security risks as a distinct and high-priority risk category for agentic systems. For executives and boards responsible for enterprise risk management, understanding these risks – and the governance responses they require – is not optional.

Memory, long-term state and data leakage

One of the defining characteristics of more sophisticated agentic AI systems is the addition of memory – the capacity to retain and reference information across interactions, building up context about users, processes and environments over time. This capability significantly enhances the agent's usefulness. It also significantly increases the privacy risk.

The Berkeley paper identifies memory as a specific risk factor because agentic systems store and work with sensitive data in contexts that may not have been anticipated during design. Information shared with an agent in one context can persist and be surfaced in another. Sensitive data that a user did not intend to be retained may be stored and subsequently exposed through prompt injection attacks or other exploitation techniques.

The paper also notes that even comprehensive logging – itself a necessary governance control for agentic systems – can function as a form of continuous surveillance when it captures sensitive user behaviour, creating data overreach risks that organisations need to manage explicitly.

Prompt injection and the confused deputy

Prompt injection is one of the most significant security risks specific to AI agents and one of the least well understood by non-technical leaders. It occurs when malicious instructions are embedded in content that an agent processes – an email, a document, a web page – causing the agent to execute those instructions as if they were legitimate commands.

The implications are significant. An agent with access to an organisation's email system, calendar, document store and communication tools can, through a prompt injection attack, be directed to exfiltrate sensitive information, modify documents, send communications impersonating authorised users or grant access to external parties. The Berkeley paper cites documented cases where prompt injection attacks were used to collect victims' location data, email content and calendar information.

Related to this is what the Berkeley paper describes as the confused deputy attack – where an agent is tricked into misusing its legitimate authority. Because agents often operate with access to multiple systems and are trusted by those systems to act on behalf of authorised users, a compromised agent can cause significant harm while appearing to act legitimately.

Multi-agent propagation risks

In multi-agent environments, security risks propagate in ways that single-agent risk frameworks do not capture. The Berkeley paper draws an analogy to computer worms: a malicious prompt injected into one agent in a network can spread to other agents as they communicate and share information, evolving and adapting as it propagates. The paper describes this as analogous to a polymorphic virus, capable of evading detection as it spreads through interconnected agent systems.

This has direct implications for how organisations assess security risk in multi-agent architectures. The security of a multi-agent system cannot be assessed by evaluating each agent individually. The system must be assessed as an interconnected whole, with specific attention to the communication protocols between agents, the trust assumptions each agent makes about instructions received from other agents and the mechanisms in place to detect and contain anomalous propagation.

The principle of least privilege

The single most impactful security control for agentic AI systems is one that is well-established in information security but inconsistently applied to AI deployments: the principle of least privilege. Agents should be granted the minimum access to data, systems and tools necessary to perform their intended function – and no more.

This principle is explicitly endorsed in the NIST AI RMF under the Map function (Map 3.5) and in the Berkeley paper's risk mitigation guidance (Manage 1.3). Its implementation requires that organisations explicitly define, for each deployed agent, the scope of data access, system permissions and tool access it is authorised to use – and that those definitions are enforced technically, not merely stated in policy.

ISO 42001 provides the governance framework within which least-privilege principles can be operationalised. Clause 8 (operational planning and control) requires that operational controls are implemented for identified risks. For agentic AI security, that means permission management systems, access logging and regular review of agent authorisations to ensure they remain proportionate to current operational requirements.

Privacy by design for agentic systems

The Berkeley paper recommends privacy-protecting logging practices as a specific control for agentic systems: logging only information necessary for safety, security and accountability; encrypting logged data in transit and at rest; establishing maximum retention periods based on need and regulatory requirements; and anonymising data by filtering personally identifiable information.

For organisations operating under GDPR, the PDPA or equivalent data protection regimes, these practices are not optional – they are obligations that apply to AI agents just as they apply to other data processing systems. The challenge is that agentic AI systems are often deployed without the same data protection impact assessment rigour that would be applied to a conventional data processing system. Closing that gap is an immediate priority for compliance and risk functions.

The organisations that approach agentic AI security and privacy governance proactively – building minimum privilege principles, prompt injection defences, multi-agent security monitoring and privacy-protective data practices into their deployment frameworks – will be significantly better positioned than those that address these risks reactively.

Relevant frameworks: NIST AI RMF (Map 1.1, Map 3.5) | ISO 42001 Clauses 6.1, 8.4, 8 | Berkeley Agentic AI Profile: Map 1.1 (Privacy and Security), Manage 1.3, Measure 2.7

Contact us

Previous
Previous

Multi-agent systems: Why the whole is riskier than the sum of its parts

Next
Next

Human oversight in the age of AI agents: Designing for accountability