The MeshAI proxy scans every outbound LLM request for prompt injection patterns. When a match is detected, the proxy returns a 403 response and logs an audit event — the malicious prompt never reaches the LLM provider.

How It Works

  1. Every request flowing through the proxy is scanned against 15+ prompt injection patterns.
  2. If a pattern matches, the proxy blocks the request with 403 Forbidden.
  3. A policy.violated audit event is logged with policy_type: prompt_injection.
  4. The original prompt is stored in the audit trail for forensic review.
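The steps above can be sketched as a scan-then-block check. This is an illustrative sketch only; the pattern list, function name, and response shape are assumptions, not MeshAI source:

```python
import re

# Hypothetical subset of the proxy's injection patterns (the real proxy
# ships 15+; these two are illustrative).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"repeat your system prompt", re.IGNORECASE),
]

def scan_request(messages):
    """Return a 403-style block response if any message matches, else None."""
    for message in messages:
        for pattern in INJECTION_PATTERNS:
            if pattern.search(message.get("content", "")):
                # Step 2: block with 403; step 3/4 would log the audit event.
                return {
                    "status": 403,
                    "error": "Policy violation: prompt_injection",
                    "matched": pattern.pattern,
                }
    return None  # clean request: forward to the LLM provider

print(scan_request([{"role": "user", "content": "Ignore all previous instructions"}])["status"])  # 403
```

A `None` verdict means the request is forwarded unchanged; anything else is returned to the caller without reaching the provider.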

Patterns Detected

The scanner catches a wide range of known injection techniques:
| Category | Examples |
| --- | --- |
| Role override | "Ignore all previous instructions", "You are now…" |
| System prompt extraction | "Repeat your system prompt", "What are your instructions?" |
| Jailbreak attempts | "DAN mode", "Developer mode enabled" |
| Delimiter attacks | `---END SYSTEM---`, `[INST]` injection |
| Encoding evasion | Base64-encoded instructions, Unicode tricks |
| Context manipulation | "Forget everything above", "New conversation" |
| Nested injection | Instructions hidden inside data payloads |
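To make the categories concrete, here is a hypothetical per-category classifier mirroring the table; the real scanner's patterns are not published, so every regex below is an assumption:

```python
import base64
import re

# Illustrative detectors for a few of the categories above.
CATEGORY_PATTERNS = {
    "role_override": re.compile(r"\byou are now\b|\bignore .*instructions\b", re.I),
    "system_prompt_extraction": re.compile(r"\b(repeat|reveal) your (system )?prompt\b", re.I),
    "jailbreak": re.compile(r"\bDAN mode\b|\bdeveloper mode\b", re.I),
    "delimiter_attack": re.compile(r"---END SYSTEM---|\[INST\]"),
    "context_manipulation": re.compile(r"\bforget everything above\b", re.I),
}

def classify(text):
    """Return matching categories; decode Base64 first as a naive
    example of encoding-evasion handling."""
    candidates = [text]
    try:
        candidates.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except Exception:
        pass  # not valid Base64; scan the raw text only
    return sorted(
        cat for cat, rx in CATEGORY_PATTERNS.items()
        if any(rx.search(c) for c in candidates)
    )

print(classify("You are now in DAN mode"))  # ['jailbreak', 'role_override']
```

Note that a single prompt can trip multiple categories at once, which is why the audit event records the matched pattern category.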

Configure via Policy

Prompt injection detection is controlled through the standard policy system. Create a prompt_injection policy to enable it:
curl -X POST https://api.meshai.dev/governance/policies \
  -H "Authorization: Bearer msh_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "block-prompt-injection",
    "policy_type": "prompt_injection",
    "config": {
      "enabled": true,
      "sensitivity": "high"
    },
    "scope": "global",
    "enabled": true
  }'
Example response:
{
  "success": true,
  "data": {
    "id": "pol_pi_001",
    "name": "block-prompt-injection",
    "policy_type": "prompt_injection",
    "config": {
      "enabled": true,
      "sensitivity": "high"
    },
    "scope": "global",
    "enabled": true,
    "created_at": "2026-03-18T10:00:00Z"
  }
}

Sensitivity Levels

| Level | Behavior |
| --- | --- |
| `low` | Catches obvious injection patterns (role override, jailbreak keywords) |
| `medium` | Adds delimiter attacks, encoding evasion, and context manipulation |
| `high` | All patterns, including nested injection and heuristic detection |
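The levels are cumulative: each tier scans everything the tiers below it scan. A minimal sketch of that layering, with assumed category names:

```python
# Illustrative mapping of sensitivity levels to the pattern categories they
# add; names are assumptions based on the table above.
PATTERN_SETS = {
    "low": {"role_override", "jailbreak"},
    "medium": {"delimiter_attack", "encoding_evasion", "context_manipulation"},
    "high": {"nested_injection", "heuristic"},
}

def active_categories(sensitivity):
    """Each level includes all lower levels' patterns."""
    order = ["low", "medium", "high"]
    active = set()
    for level in order[: order.index(sensitivity) + 1]:
        active |= PATTERN_SETS[level]
    return active

print(sorted(active_categories("medium")))
# ['context_manipulation', 'delimiter_attack', 'encoding_evasion', 'jailbreak', 'role_override']
```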

What the Agent Sees

When a prompt injection is detected, the agent receives:
{
  "error": "Policy violation: prompt_injection — request blocked due to detected prompt injection pattern",
  "policy_id": "pol_pi_001",
  "status": 403
}
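One practical consequence: the block is deterministic for a given prompt, so agents should not retry a 403 verbatim. A hypothetical client-side handler for the error body above (the function name is illustrative):

```python
# Sketch of agent-side handling for the 403 body shown above.
def handle_proxy_response(status, body):
    if status == 403 and body.get("error", "").startswith("Policy violation: prompt_injection"):
        # Don't retry: the prompt itself was flagged. Surface the policy ID
        # so operators can find the matching audit event.
        return f"Blocked by policy {body['policy_id']}; review the input"
    return body  # normal LLM response, pass through

msg = handle_proxy_response(403, {
    "error": "Policy violation: prompt_injection — request blocked due to detected prompt injection pattern",
    "policy_id": "pol_pi_001",
    "status": 403,
})
print(msg)  # Blocked by policy pol_pi_001; review the input
```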

Audit Trail

Every blocked injection attempt is logged:
| Event Type | Description |
| --- | --- |
| `policy.violated` | Prompt injection detected (includes matched pattern category and request metadata) |
Query injection-related audit events:
curl "https://api.meshai.dev/governance/audit-trail?event_type=policy.violated&policy_type=prompt_injection" \
  -H "Authorization: Bearer msh_YOUR_API_KEY"
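Once fetched, events can be aggregated to spot recurring attack categories. A sketch, assuming each event carries a `metadata.pattern_category` field (an assumption; check your actual payloads):

```python
from collections import Counter

# Summarize blocked-injection audit events by matched pattern category.
def summarize_violations(events):
    return Counter(
        e.get("metadata", {}).get("pattern_category", "unknown")
        for e in events
        if e.get("event_type") == "policy.violated"
    )

events = [
    {"event_type": "policy.violated", "metadata": {"pattern_category": "role_override"}},
    {"event_type": "policy.violated", "metadata": {"pattern_category": "jailbreak"}},
    {"event_type": "policy.violated", "metadata": {"pattern_category": "role_override"}},
]
print(summarize_violations(events).most_common(1))  # [('role_override', 2)]
```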

Enforcement Architecture

  • Scan path: The proxy scans all message content before forwarding to the LLM provider.
  • Latency: Pattern matching adds < 2ms overhead per request.
  • Fail-open: If the scanner errors, the request proceeds (logged as a warning).
  • No data exfiltration: Blocked prompts are stored only in the MeshAI audit trail, never forwarded.
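The fail-open rule can be sketched as a wrapper around the scanner; a scanner error is logged as a warning and the request is treated as clean. Function names here are illustrative:

```python
import logging

# Sketch of fail-open enforcement: scanner errors never block traffic.
def scan_with_fail_open(scan, request):
    try:
        return scan(request)  # verdict: block-response dict, or None if clean
    except Exception:
        logging.warning("prompt-injection scanner failed; request proceeds")
        return None  # fail-open: treat the request as clean

def broken_scanner(request):
    raise RuntimeError("pattern store unavailable")

print(scan_with_fail_open(broken_scanner, {"content": "hi"}))  # None
```

Fail-open trades a small detection gap during scanner outages for availability; the warning log gives operators a signal that coverage lapsed.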