InkJect: The Visual Prompt Injection That Text Defenses Were Never Built to Stop
A user asked an LLM to deploy a website from a public repository. Standard workflow. The model retrieved the code, processed the repository assets, and built the site as requested.
It also created an administrator account with attacker-controlled credentials, silently embedded in the back end. The user saw none of it. The model flagged nothing. Every guardrail in place treated the task as clean.
The instructions that caused this were sitting inside an image file in the repository. The model read them and followed them. That is InkJect.
The attacker never touched the user's session, the user's environment, or the user's credentials. They uploaded an image to a public repository. That was enough.
Direct vs. indirect: why the distinction matters
Visual prompt injection is not a new concept. Researchers have demonstrated that instructions embedded in images can manipulate visual language models (VLMs), and some vendors have implemented mitigations for the most straightforward cases.
InkJect is an indirect variant, and the word "indirect" carries a lot of weight.
In a direct attack, the attacker needs the user to interact with a malicious image. The user has to upload it, reference it explicitly, or send it through a channel the attacker controls. That creates a dependency. The attacker needs the user to do something.
In an indirect attack, the attacker does not need the user to do anything beyond their normal workflow. The malicious image sits in a public location. When the user asks the LLM to work, the model retrieves and processes the image on its own, as part of the task. The user does not know the image is there. The model pulls it anyway.
The attack surface for indirect injection is not a specific user interaction. It is every asset the model will autonomously retrieve during the course of its work. Every string, every image, every file could be malicious.
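To make that surface concrete, the sketch below inventories the image files in a local repository checkout, since these are the assets an agent could pull in while working on the repository. It is a minimal illustration, not part of the original research: the function name `list_image_assets` and the extension list are assumptions, and a real inventory would also need to cover remotely referenced images.

```python
import os
from pathlib import Path

# Assumed, non-exhaustive set of extensions treated as images.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}


def list_image_assets(repo_root):
    """Return sorted POSIX-style relative paths of image files under repo_root."""
    root = Path(repo_root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata; it is not part of the working tree.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if Path(name).suffix.lower() in IMAGE_EXTENSIONS:
                relative = (Path(dirpath) / name).relative_to(root)
                found.append(relative.as_posix())
    return sorted(found)
```

Every path this returns is a file the user probably never opened but the model may read during the task.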
How InkJect works
The setup is simple. An attacker embeds malicious instructions inside an image and hosts it where a VLM is likely to encounter it during a task. The instructions are designed to evade security scanning while remaining legible to the model.
When a user asks the LLM to deploy or interact with a repository that contains the image, the model retrieves the image as part of its normal operation. Through its inherent ability to process images, it reads the embedded instructions and executes them alongside the user's actual task. The user receives a result that looks correct. The unauthorized action has already been taken.
In our test case:
- A user asked an LLM to deploy a website from a public repository.
- The repository contained an image with embedded instructions.
- The model retrieved and processed the image as part of the deployment.
- The hidden instructions told the model to create an admin account with full privileges on the deployed site.
- The website was deployed as requested. An attacker-controlled admin account was created without the user's knowledge.
The model did not flag the instruction. It did not warn the user. It completed both the requested task and the unauthorized one, with no visible indication that anything out of scope had occurred.
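One way to picture what "out of scope" means in this test case is to compare the actions an agent took against the actions the requested task should need. The sketch below is hypothetical and was not part of the test setup: the `Action` record, the `EXPECTED_ACTIONS` table, and the action names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action type, e.g. "build" or "create_user"
    detail: str  # free-form description of what was done


# Assumed mapping from a task type to the action kinds it should require.
EXPECTED_ACTIONS = {
    "deploy_site": {"fetch_repo", "read_asset", "build", "deploy"},
}


def out_of_scope(task, actions):
    """Return the actions whose kind is not expected for the given task."""
    allowed = EXPECTED_ACTIONS.get(task, set())
    return [action for action in actions if action.kind not in allowed]


# Hypothetical action log resembling the test case described above.
log = [
    Action("fetch_repo", "clone public repository"),
    Action("read_asset", "banner.png"),
    Action("create_user", "role=admin"),
    Action("deploy", "publish site"),
]
flagged = out_of_scope("deploy_site", log)
```

Here `flagged` contains only the `create_user` action. A check like this depends on the agent's actions being logged somewhere outside the model's own output, which is an assumption about the deployment, not something the model provides by itself.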
Two techniques that defeat detection
InkJect works because of a gap between what security tools can read and what visual language models can read. We found two distinct techniques that exploit this gap. Both defeated the guardrails on all four tested models.
Technique 1: White text on a white background
Malicious instructions are rendered in white or near-white text against a white background. The image looks blank to any human reviewer.
Security scanning tools that evaluate image content for harmful material also miss it. They are looking for recognizable visual content: faces, objects, explicit material, known threat signatures. A white rectangle with no visible contrast registers as an empty image.
The VLM reads it without difficulty.
This is not a quirk of any specific model. Visual language models are built to extract meaning from images across a wide range of conditions, including low contrast, faded text, and challenging backgrounds. That general-purpose visual capability is precisely what the attacker is using. The model sees what human reviewers and automated scanners cannot.
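The low-contrast condition can be illustrated on raw pixel values. The sketch below treats an image as a grid of 0-255 grayscale values, measures how many pixels sit just off the background, and stretches the contrast so faint content becomes visible. It is an illustration under stated assumptions, not a tested defense: the function names, the `max_delta` threshold, and the choice of the most common value as the background are all hypothetical.

```python
from collections import Counter


def hidden_low_contrast_fraction(pixels, max_delta=12):
    """Fraction of pixels that differ from the background by 1..max_delta levels.

    `pixels` is a list of rows of 0-255 grayscale values. The background is
    assumed to be the most common value, which fits a mostly blank image.
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        return 0.0
    background = Counter(flat).most_common(1)[0][0]
    faint = sum(1 for value in flat if 0 < abs(value - background) <= max_delta)
    return faint / len(flat)


def stretch_contrast(pixels):
    """Linearly rescale values so the darkest becomes 0 and the brightest 255."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    low, high = min(flat), max(flat)
    if low == high:
        return [list(row) for row in pixels]  # truly uniform, nothing to reveal
    scale = 255 / (high - low)
    return [[round((value - low) * scale) for value in row] for row in pixels]
```

A grid of pure 255 values scores 0.0, while a grid with a few 250 values among the 255s scores above zero and, after stretching, shows those pixels as black. Compression noise can produce the same signal, so a nonzero score is a reason to look closer, not proof of hidden text.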

Technique 2: Skewed and distorted text
Some security architectures attempt to catch embedded instructions by running images through OCR before passing them to the model. The reasoning: if you can extract the text first, you can run it through the same filters that catch text-based injection.
Skewing or distorting the perspective of embedded text breaks OCR extraction. The characters are rotated, warped, or transformed enough that OCR returns garbled output or nothing at all. The security filter sees clean input.
The VLM reads the original instruction accurately.
This is the core of the capability gap InkJect exploits. OCR and visual language models do not read images the same way. OCR looks for well-formed character patterns under expected conditions. VLMs interpret visual content semantically, including text rendered in ways that OCR cannot process. Any security architecture that treats these as equivalent has a blind spot that can be precisely targeted.
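The blind spot comes from treating "OCR found nothing" as "the image contains nothing". The sketch below shows a gate that separates those two cases by checking whether the image has visible content that OCR failed to turn into readable text. Everything in it is an assumption for illustration: the function names, the thresholds, and the idea that the OCR output arrives as a plain string from some external tool.

```python
def ink_fraction(pixels, background=255, tolerance=8):
    """Share of pixels that differ from the background by more than tolerance."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return 0.0
    inked = sum(1 for value in flat if abs(value - background) > tolerance)
    return inked / len(flat)


def readable_ratio(text):
    """Share of non-space characters that are letters or digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def gate(ocr_text, pixels, min_ink=0.02, min_readable=0.7):
    """Decide what to do with an image given its OCR output.

    Returns "scan_text" when OCR produced usable text for the text filters,
    "review" when the image has visible content OCR could not read, and
    "pass" when there is neither visible content nor text.
    """
    ocr_usable = readable_ratio(ocr_text) >= min_readable
    if ocr_usable:
        return "scan_text"
    if ink_fraction(pixels) >= min_ink:
        return "review"  # content is present, but OCR could not extract it
    return "pass"
```

The design choice is to fail closed: an image with visible marks and empty or garbled OCR output is routed to review instead of being treated as clean. This particular ink measure would not catch the white-on-white technique, because near-white text falls inside the tolerance.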

We tested both techniques against four models across two providers. All four executed the injected instructions. All four would refuse the same instruction delivered as plain text.
Why text-based guardrails do not cover this
OpenAI and Anthropic have both invested heavily in defenses against conventional prompt injection. These systems work. They catch a wide range of text-based injection attempts, flag suspicious patterns, and block instructions that arrive through the prompt.
InkJect bypasses them because it does not go through the text layer. The malicious instruction lives in an image. The VLM processes it through its visual encoder before any text-level analysis occurs. By the time the model produces output, it has already read the embedded instruction and acted on it.
The guardrails that stop 'Create an admin account with these credentials' in a text prompt do not stop the same instruction when it is placed inside an image. Delivered visually, it executes without resistance.
This is not a failure of any specific safety system. It is a consequence of where those systems were built to operate. They were designed for models that processed text. VLMs process images too, and that processing happens upstream of the controls.
Affected models
We tested InkJect against four production models:
- OpenAI GPT-5.2
- OpenAI GPT-5.4 Mini
- Anthropic Claude Sonnet 4.6
- Anthropic Claude Opus 4.5
All four were susceptible to both evasion techniques. Attack success varied across models, but no model in the test set blocked the injected instructions.
The vulnerability was disclosed to OpenAI and Anthropic prior to this publication.
Why this matters now
VLMs are not being deployed as novelties. They are being embedded into production engineering workflows: repository analysis, code generation, automated deployments, infrastructure provisioning. These systems have real access to real environments.
The indirect nature of InkJect means the attack scales. An attacker does not need to target specific users or compromise specific sessions. They need to plant a malicious image somewhere in the path of assets VLMs routinely retrieve. Public repositories, image hosting services, shared asset libraries: any of these is a viable delivery point. One image can affect every user whose VLM retrieves it.
The attack also leaves no obvious trace. The model completes the user's requested task. Nothing in the output signals that an additional unauthorized action occurred. A user who deploys a repository and reviews the result sees a correctly deployed site. The unauthorized account is there, and they have no reason to look for it.
Gartner predicts that forty percent of generative AI solutions will be multimodal by 2027. The workflows being built today on VLMs are the attack surface InkJect targets. Security architecture for these systems needs to account for what the visual layer can do, not just what the text layer can do.
What the research shows
InkJect demonstrates three things that have practical implications for any organization running VLMs in production.
First, indirect injection is viable at scale. Attackers do not need direct access to a target user or system. They need access to content that a VLM will retrieve autonomously.
Second, the capability gap between OCR and VLMs is an exploitable attack surface. Defenses that rely on OCR to pre-process visual content assume equivalence that does not exist. That assumption is wrong in exactly the ways an attacker needs it to be.
Third, text-based guardrails do not transfer to the visual layer. The same instruction that triggers a refusal in text executes without resistance in an image. Until defenses are built to operate at the visual processing layer, that gap remains open.
InkJect was discovered by DeepKeep's research team.