403 response and logs an audit event — the malicious prompt never reaches the LLM provider.
How It Works
- Every request flowing through the proxy is scanned against 15+ prompt injection patterns.
- If a pattern matches, the proxy blocks the request with `403 Forbidden`.
- A `policy.violated` audit event is logged with `policy_type: prompt_injection`.
- The original prompt is stored in the audit trail for forensic review.
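The request flow above can be sketched as a small Python handler. This is a minimal illustration under stated assumptions, not MeshAI's implementation: `handle_request`, `AuditLog`, `ProxyResult`, the error body, and the event field names (`event_type`, `category`, `prompt`) are all hypothetical, and the scanner and the provider call are passed in as plain functions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProxyResult:
    status: int  # HTTP status returned to the agent
    body: dict   # provider output, or a (hypothetical) error payload


@dataclass
class AuditLog:
    """Hypothetical in-memory stand-in for the audit trail."""
    events: list = field(default_factory=list)

    def record(self, event: dict) -> None:
        self.events.append(event)


def handle_request(
    prompt: str,
    scan: Callable[[str], Optional[str]],  # returns a matched category or None
    forward: Callable[[str], dict],        # calls the LLM provider
    audit: AuditLog,
) -> ProxyResult:
    category = scan(prompt)
    if category is not None:
        # Blocked: record a policy.violated event, keeping the original
        # prompt for forensic review. forward() is never called.
        audit.record({
            "event_type": "policy.violated",
            "policy_type": "prompt_injection",
            "category": category,
            "prompt": prompt,
        })
        return ProxyResult(status=403, body={"error": "Forbidden"})
    # Clean request: pass it through to the provider unchanged.
    return ProxyResult(status=200, body=forward(prompt))
```

Passing the scanner and the forwarding call as arguments keeps the ordering explicit: the provider function is only reachable on the non-blocking branch.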
Patterns Detected
The scanner catches a wide range of known injection techniques:
| Category | Examples |
|---|---|
| Role override | “Ignore all previous instructions”, “You are now…” |
| System prompt extraction | “Repeat your system prompt”, “What are your instructions?” |
| Jailbreak attempts | “DAN mode”, “Developer mode enabled” |
| Delimiter attacks | `---END SYSTEM---`, `[INST]` injection |
| Encoding evasion | Base64-encoded instructions, Unicode tricks |
| Context manipulation | “Forget everything above”, “New conversation” |
| Nested injection | Instructions hidden inside data payloads |
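A toy version of category-based matching might look like the following. The category keys and regexes are hypothetical stand-ins for a few rows of the table, and the set is far smaller than the 15+ patterns the proxy uses. Encoding evasion is illustrated only for Base64, by decoding long Base64-looking runs and rescanning the plaintext; Unicode tricks, nested injection, and heuristic detection are not modeled.

```python
from __future__ import annotations

import base64
import binascii
import re

# Hypothetical, deliberately tiny pattern set (illustrative only).
PATTERNS: dict[str, list[re.Pattern]] = {
    "role_override": [
        re.compile(r"\bignore (all )?previous instructions\b", re.I),
        re.compile(r"\byou are now\b", re.I),
    ],
    "system_prompt_extraction": [
        re.compile(r"\brepeat your system prompt\b", re.I),
        re.compile(r"\bwhat are your instructions\b", re.I),
    ],
    "jailbreak": [
        re.compile(r"\bDAN mode\b", re.I),
        re.compile(r"\bdeveloper mode enabled\b", re.I),
    ],
    "delimiter_attack": [
        re.compile(r"-{3,}\s*END SYSTEM\s*-{3,}", re.I),
        re.compile(r"\[/?INST\]"),
    ],
    "context_manipulation": [
        re.compile(r"\bforget everything above\b", re.I),
        re.compile(r"\bnew conversation\b", re.I),
    ],
}

# Runs of 16+ Base64 alphabet characters, with optional padding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def match_category(text: str) -> str | None:
    """Return the first category whose patterns match the plain text."""
    for category, patterns in PATTERNS.items():
        if any(p.search(text) for p in patterns):
            return category
    return None


def scan(text: str) -> str | None:
    hit = match_category(text)
    if hit is not None:
        return hit
    # Encoding evasion: decode Base64-looking tokens and rescan the result.
    for token in B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 or not UTF-8 text; ignore
        if match_category(decoded) is not None:
            return "encoding_evasion"
    return None
```

Plain keyword regexes like these are cheap but prone to false positives (for example "you are now" appears in benign text), which is one reason to gate broader categories behind a sensitivity setting.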
Configure via Policy
Prompt injection detection is controlled through the standard policy system. Create a `prompt_injection` policy to enable it.
Sensitivity Levels
| Level | Behavior |
|---|---|
| `low` | Catches obvious injection patterns (role override, jailbreak keywords) |
| `medium` | Adds delimiter attacks, encoding evasion, and context manipulation |
| `high` | All patterns, including nested injection and heuristic detection |
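The exact policy schema is not documented in this section, so the sketch below models a `prompt_injection` policy as a hypothetical Python dataclass whose `sensitivity` field selects which categories are enforced. Category names are hypothetical labels for the pattern table rows, and the cumulative mapping follows the sensitivity table. One assumption is worth flagging: the table does not say where system prompt extraction falls, so the sketch only enables it at `high`, which covers "all patterns".

```python
from __future__ import annotations

from dataclasses import dataclass

# Cumulative category sets per sensitivity level (labels are hypothetical).
_LOW = frozenset({"role_override", "jailbreak"})
_MEDIUM = _LOW | {"delimiter_attack", "encoding_evasion", "context_manipulation"}
# Assumption: system prompt extraction is only enabled at "high" ("all patterns").
_HIGH = _MEDIUM | {"system_prompt_extraction", "nested_injection", "heuristic"}

LEVEL_CATEGORIES = {"low": _LOW, "medium": _MEDIUM, "high": _HIGH}


@dataclass(frozen=True)
class PromptInjectionPolicy:
    """Hypothetical in-memory model of a prompt_injection policy."""
    sensitivity: str  # "low", "medium", or "high"

    def enabled_categories(self) -> frozenset[str]:
        try:
            return LEVEL_CATEGORIES[self.sensitivity]
        except KeyError:
            raise ValueError(f"unknown sensitivity: {self.sensitivity!r}") from None

    def blocks(self, matched_category: str | None) -> bool:
        """True if a scanner match in this category should be blocked."""
        return (
            matched_category is not None
            and matched_category in self.enabled_categories()
        )
```

Building each level as a superset of the one below mirrors the table's "adds" wording, so raising sensitivity can only block more, never less.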
What the Agent Sees
When a prompt injection is detected, the agent receives a `403 Forbidden` response instead of a model completion.
Audit Trail
Every blocked injection attempt is logged:
| Event Type | Description |
|---|---|
| `policy.violated` | Prompt injection detected (includes matched pattern category and request metadata) |
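For forensic review, blocked attempts can be summarized by category. This sketch assumes audit events are available as dictionaries with hypothetical `event_type`, `policy_type`, and `category` keys; the real audit trail's export format and field names may differ.

```python
from collections import Counter
from typing import Iterable


def summarize_injections(events: Iterable[dict]) -> Counter:
    """Count blocked prompt-injection attempts per matched category."""
    return Counter(
        e.get("category", "unknown")
        for e in events
        # Keep only prompt-injection violations; other policies are ignored.
        if e.get("event_type") == "policy.violated"
        and e.get("policy_type") == "prompt_injection"
    )
```

Filtering on both the event type and `policy_type` matters because `policy.violated` is shared with other policy types.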
Enforcement Architecture
- Scan path: The proxy scans the content of all `messages` before forwarding the request to the LLM provider.
- Latency: Pattern matching adds less than 2 ms of overhead per request.
- Fail-open: If the scanner errors, the request proceeds and a warning is logged.
- No data exfiltration: Blocked prompts are stored only in the MeshAI audit trail and are never forwarded.
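The scan-path and fail-open rules can be sketched as a wrapper around whatever scanner function is in use. It assumes OpenAI-style chat `messages`, where `content` is either a string or a list of parts shaped like `{"type": "text", "text": ...}`; `collect_text`, `scan_fail_open`, and the logger name are hypothetical. Any scanner exception is logged as a warning and treated as "no match", so the request proceeds.

```python
from __future__ import annotations

import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("proxy.prompt_injection")  # hypothetical name


def collect_text(messages: list[dict]) -> str:
    """Join the text content of every message, whatever its role."""
    parts: list[str] = []
    for m in messages:
        content = m.get("content")
        if isinstance(content, str):
            parts.append(content)
        elif isinstance(content, list):
            # Multi-part content: keep only the text parts.
            parts.extend(
                p.get("text", "")
                for p in content
                if isinstance(p, dict) and p.get("type") == "text"
            )
    return "\n".join(parts)


def scan_fail_open(
    messages: list[dict], scan: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Return a matched category, or None if clean or if the scanner failed."""
    start = time.perf_counter()
    try:
        return scan(collect_text(messages))
    except Exception:
        # Fail-open: log and let the request through.
        logger.warning("prompt injection scanner failed; allowing request", exc_info=True)
        return None
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.debug("prompt injection scan took %.2f ms", elapsed_ms)
```

Catching a broad `Exception` is what makes this fail-open; a fail-closed variant would return a blocking result from the `except` branch instead, trading availability for stricter enforcement.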

