NewDeepKeep launches Vibe AI Red Teaming- a new approach to AI securityRead more
DeepKeep

Agentic AI Security: The Attack Surface Nobody Mapped Yet

June 15, 2025

AI agents don't just answer questions. They make decisions, call tools, browse the web, write and execute code, send emails, and trigger downstream systems. That is a fundamentally different threat model, and most security teams are still treating it like a smarter chatbot.

The shift from LLMs to agentic AI isn't a feature update. It's an architectural change. When a model can act, the blast radius of a security failure expands from a bad output to a bad outcome: leaked credentials, deleted records, unauthorized API calls, data exfiltrated through a perfectly legitimate-looking workflow. The compliance checkbox that covered your LLM deployment covers approximately none of this.

What Makes Agents Dangerous

The core problem with agentic AI security is that agents inherit trust they haven't earned.

When an agent reads a webpage, processes a document, or receives a response from another tool, it treats that content as context. If that content contains instructions, the agent may follow them. This is indirect prompt injection, and it's the defining vulnerability of agentic systems. The model has no reliable way to distinguish between data it was told to process and instructions it was told to follow. Researchers at Microsoft and ETH Zurich demonstrated this in 2023, and the problem has only gotten more relevant as agents have gotten more capable.

It gets more complex with multi-agent architectures. Frameworks like LangChain, CrewAI, and OpenAI's Agent SDK let agents spawn sub-agents, delegate tasks, and share results. When Agent A trusts Agent B's output without verification, an attacker who can influence Agent B can compromise the entire workflow. The security boundary between agents is, in most current implementations, essentially nonexistent. Which is a fun thing to discover after you've granted your agent access to production systems.

Then there are the tools themselves. Agents are given access to real systems: file systems, email, calendars, databases, cloud APIs. The agent doesn't know what it shouldn't do. It knows what it can do. Without explicit access controls scoped to the task at hand, that distinction matters enormously.

Why Traditional Security Doesn't Transfer

Perimeter security protects infrastructure. Input validation catches known-bad patterns. Guardrails filter specific content categories. None of these were designed for a system that reasons about what to do next based on whatever it last read.

Traditional penetration testing is also insufficient. AI systems are non-deterministic: the same input, at different times or with slightly different context, can produce different behavior. A test that passes today may fail tomorrow after a model update, a new system prompt, or a change in connected tools. You can't snapshot an agent and call it secured. Security teams that have done exactly this are in for a surprise.

The attack surface isn't a list of endpoints. It's the model's behavior across every possible input, tool response, and environmental state. That requires a different approach entirely, which is something we've written about in the context of why built-in guardrails aren't enough.

The New Attack Patterns You Should Know

Prompt injection via tool output: An attacker embeds instructions in a webpage, email, or document that an agent is likely to read. The agent follows those instructions, often without any indication to the user that something has gone wrong.

Privilege escalation through tool access: Agents are often granted broad permissions to do their jobs. An attacker who can influence an agent's reasoning can redirect those permissions toward unintended actions. Think of it as social engineering, but the target has root access and no skepticism.

Agent-to-agent manipulation: In multi-agent systems, a compromised or spoofed agent can feed malicious instructions to other agents, causing lateral movement through the workflow.

Exfiltration via legitimate channels: Because agents use real integrations (Slack, email, cloud storage), data can be exfiltrated through channels that look completely normal in the logs. No anomaly detected. No alert fired.

Unintended destructive actions: Agents can delete, overwrite, or modify data as part of their normal operation. Without runtime controls that scope what an agent is allowed to do in a given task context, there is no reliable safety net.

What Securing Agents Actually Requires

Agentic AI security requires controls at the behavioral layer, not just the perimeter. That means monitoring what agents are doing in real time, not just what they're outputting — getting that visibility across every agent in your environment is exactly what DeepKeep's AI Agent Scanner is built for. It means scoping tool permissions to the minimum required for each task. It means treating every external input an agent processes as potentially adversarial.

It also means red teaming agents the way you'd test any autonomous system: with adversarial inputs designed to manipulate reasoning, not just trigger keyword filters. Static tests against a fixed prompt are not red teaming. They're a warm-up.

The attack surface is real, it's growing, and most of it is unmapped. The organizations that treat agentic AI as a new security category, rather than a variation on existing LLM risk, are the ones that won't be reading about themselves in a post-incident report.