Anatomy of a prompt injection
Prompt injection is best understood as a chain, from the moment adversarial instructions enter a context window to the moment a model or an agent acts on them. Here is the chain, stage by stage, and where inline enforcement breaks it.
One channel for instructions and data
A language model receives instructions and data through the same channel: text in its context window. It has no type system separating content to analyze from commands to follow. The user's prompt, a system message, a retrieved document, a web page, a tool result, all of it arrives with the same standing. Prompt injection exploits that property by placing instructions where the model expects data.
Nothing here requires a vulnerability in the classic sense. The model is doing what it was built to do, following the strongest instruction in view. That is why injection resists being patched away, and why it sits at the top of the OWASP risk list for LLM applications.
Direct injection
The direct form is the simplest: the attacker is the one typing. A crafted prompt tries to override the system's standing instructions, "ignore previous instructions" is the folk version, in order to extract a system prompt, defeat a content policy, or pull data the assistant can reach but the user should not see. Direct injection is visible in the prompt itself, which makes it the form most amenable to inspection at entry.
Indirect injection
The more consequential form arrives with honest users. Instructions are planted in content the assistant will process later: a paragraph in a shared document, white text in an email signature, a comment in a code file, a field in a support ticket, markup on a web page. The employee asks for a summary; the document asks for something else. When the assistant reads the poisoned content into context, the attacker is inside the interaction without ever touching the prompt.
Indirect injection scales because enterprises wire models to their content on purpose. Retrieval pipelines, connected drives, browsing tools, and inbox assistants all widen the set of text that can carry instructions into a context window.
Tool-call hijacking
Injection becomes an incident when the model can act. Agents turn model output into tool calls: send an email, query a database, write a file, call an API. Hijacking means the injected instructions choose the next tool call instead of the user. An agent summarizing a poisoned page can be steered to forward a thread, exfiltrate what retrieval returned, or act with the credentials it holds. The blast radius of a hijacked agent is whatever that agent is authorized to do.
| Vector | Instructions arrive via | The break point |
|---|---|---|
| DIRECT | The prompt itself | Inspected at entry, before the prompt reaches the model's API |
| INDIRECT | Retrieved content: documents, pages, tickets, tool results | Assembled context judged as one interaction, before the model sees it |
| TOOL-CALL HIJACK | Model output steering an agent's next action | Agent-to-tool calls governed inline; calls that break intent are held or blocked |
Where the chain breaks
Every variant shares two chokepoints. Assembled context flows into a model, and, for agents, actions flow out through tool calls. Those are the two places a control can stand without guessing.
Inline enforcement stands at both. Before the prompt reaches the model's API, the runtime inspects the full assembled interaction, the user's text and the retrieved content it carries, and renders a verdict: allow, redact, hold for human review, or block. Detection is judged by meaning and intent, not pattern matching alone, which matters here because injected instructions rarely repeat a known string. What gives them away is behavior: a document that issues commands, a résumé that addresses the reader's tools, a ticket that asks for credentials.
On the way out, agent-to-tool calls get the same treatment. A call that does not follow from the user's intent, forwarding mail nobody asked to send, touching records outside the task, can be held or blocked before it fires.
The outcome difference is the point of the position. Enforcement after the model answers can document an incident; enforcement before the model changes it. A blocked injection never executes, and a held tool call never fires while a reviewer looks. Either way the decision is traced as sealed, metadata-only evidence, so attempted injections become searchable history rather than folklore. A held item stays visible to designated reviewers until they decide; the trail then keeps the decision, not the content.
Prompt injection is not going away; the property that makes models useful, following instructions in context, is the one it abuses. What an enterprise controls is where enforcement stands and what gets to execute. Stand before the model, judge the whole interaction, govern the tool calls, and keep the receipts.