← Back to blog posts

Agentic AI Security: The Attack Surface Nobody Mapped Yet

June 15, 2025

AI agents don't just answer questions. They make decisions, call tools, browse the web, write and execute code, send emails, and trigger downstream systems. That is a fundamentally different threat model, and most security teams are still treating it like a smarter chatbot.

The shift from LLMs to agentic AI isn't a feature update. It's an architectural change. When a model can act, the blast radius of a security failure expands from a bad output to a bad outcome: leaked credentials, deleted records, unauthorized API calls, data exfiltrated through a perfectly legitimate-looking workflow. The compliance checkbox that covered your LLM deployment covers approximately none of this.

What Makes Agents Dangerous

The core problem with agentic AI security is that agents inherit trust they haven't earned.

When an agent reads a webpage, processes a document, or receives a response from another tool, it treats that content as context. If that content contains instructions, the agent may follow them. This is indirect prompt injection, and it's the defining vulnerability of agentic systems. The model has no reliable way to distinguish between data it was told to process and instructions it was told to follow. Researchers at Microsoft and ETH Zurich demonstrated this in 2023, and the problem has only gotten more relevant as agents have gotten more capable.

It gets more complex with multi-agent architectures. Frameworks like LangChain, CrewAI, and OpenAI's Agent SDK let agents spawn sub-agents, delegate tasks, and share results. When Agent A trusts Agent B's output without verification, an attacker who can influence Agent B can compromise the entire workflow. The security boundary between agents is, in most current implementations, essentially nonexistent. Which is a fun thing to discover after you've granted your agent access to production systems.

Then there are the tools themselves. Agents are given access to real systems: file systems, email, calendars, databases, cloud APIs. The agent doesn't know what it shouldn't do. It knows what it can do. Without explicit access controls scoped to the task at hand, that distinction matters enormously.

Why Traditional Security Doesn't Transfer

Perimeter security protects infrastructure. Input validation catches known-bad patterns. Guardrails filter specific content categories. None of these were designed for a system that reasons about what to do next based on whatever it last read.

Traditional penetration testing is also insufficient. AI systems are non-deterministic: the same input, at different times or with slightly different context, can produce different behavior. A test that passes today may fail tomorrow after a model update, a new system prompt, or a change in connected tools. You can't snapshot an agent and call it secured. Security teams that have done exactly this are in for a surprise.

The attack surface isn't a list of endpoints. It's the model's behavior across every possible input, tool response, and environmental state. That requires a different approach entirely, which is something we've written about in the context of why built-in guardrails aren't enough.

The New Attack Patterns You Should Know

Prompt injection via tool output: An attacker embeds instructions in a webpage, email, or document that an agent is likely to read. The agent follows those instructions, often without any indication to the user that something has gone wrong.

Privilege escalation through tool access: Agents are often granted broad permissions to do their jobs. An attacker who can influence an agent's reasoning can redirect those permissions toward unintended actions. Think of it as social engineering, but the target has root access and no skepticism.

Agent-to-agent manipulation: In multi-agent systems, a compromised or spoofed agent can feed malicious instructions to other agents, causing lateral movement through the workflow.

Exfiltration via legitimate channels: Because agents use real integrations (Slack, email, cloud storage), data can be exfiltrated through channels that look completely normal in the logs. No anomaly detected. No alert fired.

Unintended destructive actions: Agents can delete, overwrite, or modify data as part of their normal operation. Without runtime controls that scope what an agent is allowed to do in a given task context, there is no reliable safety net.

What Securing Agents Actually Requires

Agentic AI security requires controls at the behavioral layer, not just the perimeter. That means monitoring what agents are doing in real time, not just what they're outputting — getting that visibility across every agent in your environment is exactly what DeepKeep's AI Agent Scanner is built for. It means scoping tool permissions to the minimum required for each task. It means treating every external input an agent processes as potentially adversarial.

It also means red teaming agents the way you'd test any autonomous system: with adversarial inputs designed to manipulate reasoning, not just trigger keyword filters. Static tests against a fixed prompt are not red teaming. They're a warm-up.

The attack surface is real, it's growing, and most of it is unmapped. The organizations that treat agentic AI as a new security category, rather than a variation on existing LLM risk, are the ones that won't be reading about themselves in a post-incident report.

InkJect: The Visual Prompt Injection That Text Defenses Were Never Built to Stop

A hidden instruction inside an image. An LLM that follows it. InkJect is a new visual prompt injection vulnerability confirmed on OpenAI and Anthropic's latest models.

What is AI Red Teaming? A Practical Guide

Red teaming AI systems isn't the same as traditional pen testing. The attack surface is different, the methods are different, and a one-time exercise won't keep you safe. Here's what it actually involves.

What Is Prompt Injection? How It Works and How to Stop It

Prompt injection is the most exploited vulnerability in AI systems today, and one of the hardest to fully fix. Here's what it is, why it's structural, and how to build a defense that actually holds.

DeepKeep Selected as EIC Accelerator Winner: Europe Bets on AI Security

DeepKeep has been awarded €2.5M in blended finance through the EIC Accelerator's October 2024 cut-off. The co-funded project: Multimodal Models with AI-Native Security and Trustworthiness - a recognition that securing AI across LLMs, computer vision, spatial sensing, and multimodal systems isn't a nice-to-have. It's infrastructure.

DeepKeep Launches Vibe AI Red Teaming: A New Approach to AI Security

DeepKeep is introducing Vibe AI Red Teaming, a new approach that combines human expertise with AI-driven execution.

The 45-Minute AI Lobotomy: Why Built-In Guardrails Are Dead

With open-source tools like Heretic performing a 45-minute lobotomy to effortlessly erase an AI's built-in safety guardrails, organizations must abandon the illusion that models can police themselves.

The AI Red Teaming Reality Check: How DeepKeep Delivers on OWASP

The OWASP v1.0 AI Red Teaming standard is the new benchmark for enterprise resilience. Read how DeepKeep ditches static jailbreaks for dynamic, context-aware testing across your entire agentic workflow.

A Rotten Apple Spoils the Image Generation

Poisoned training samples can turn ControlNet into a hidden backdoor. From a security perspective, this is not a noisy exploit. It is a sleeper agent waiting for the right signal.

Why LLM-as-a-Judge Isn't Enough

Let one AI keep an eye on another AI feels like putting a referee in the game. In reality, LLM-as-a-judge isn’t the silver bullet some people wish it was.

Multimodal AI is Smarter. Unfortunately, so are The Attacks.

AI has gotten good at understanding not just what we type, but what we show. This shift has made AI more powerful. Unfortunately, it has also made it more vulnerable.

You Can’t “Detect” a Jailbreak. Here’s What to Do Instead

Everyone is looking for an efficient way to detect and block jailbreaks, but here’s the uncomfortable truth: you can’t reliably detect every jailbreak, and trying to chase them all is a losing game.

Two Smart AI Models. Zero Common Sense.

AI is no longer a one-trick tool. It writes reports, analyzes photos, answers complex questions, and even kicks off real-world actions. Most of this power comes from two areas working side by side: Generative AI and Computer Vision.

Top Three Scenarios for PII Leakage in GenAI

Comprehensive PII detection combines scanning of data, penetration testing and a real-time AI firewall

DeepKeep Launches GenAI Risk Assessment Module

Evaluating model resilience is paramount, particularly during its inference phase in order to provide insights into the model's ability to handle various scenarios effectively

DeepKeep Comes out of Stealth to Safeguard GenAI with AI-Native Security and Trustworthiness

DeepKeep offers AI-Native security and trustworthiness that secures AI throughout its entire lifecycle

Meta’s LlamaV2 7B LLM Suffers from Susceptibility to DoS and Data Leakage

DeepKeep's evaluation of LlamaV2 7B's security and trustworthiness found strengths in task performance and ethical commitment, with areas for improvement in handling complex transformations, addressing bias, and enhancing security against sophisticated threats

View all

Related posts