Prompt Injection Detection

The MeshAI proxy scans every outbound LLM request for prompt injection patterns. When a match is detected, the proxy returns a 403 response and logs an audit event — the malicious prompt never reaches the LLM provider.

How It Works

Every request flowing through the proxy is scanned against 15+ prompt injection patterns.
If a pattern matches, the proxy blocks the request with 403 Forbidden.
An policy.violated audit event is logged with policy_type: prompt_injection.
The original prompt is stored in the audit trail for forensic review.

Patterns Detected

The scanner catches a wide range of known injection techniques:

Category	Examples
Role override	”Ignore all previous instructions”, “You are now…”
System prompt extraction	”Repeat your system prompt”, “What are your instructions?”
Jailbreak attempts	”DAN mode”, “Developer mode enabled”
Delimiter attacks	`---END SYSTEM---`, `[INST]` injection
Encoding evasion	Base64-encoded instructions, Unicode tricks
Context manipulation	”Forget everything above”, “New conversation”
Nested injection	Instructions hidden inside data payloads

Configure via Policy

Prompt injection detection is controlled through the standard policy system. Create a prompt_injection policy to enable it:

curl -X POST https://api.meshai.dev/governance/policies \
  -H "Authorization: Bearer msh_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "block-prompt-injection",
    "policy_type": "prompt_injection",
    "config": {
      "enabled": true,
      "sensitivity": "high"
    },
    "scope": "global",
    "enabled": true
  }'

{
  "success": true,
  "data": {
    "id": "pol_pi_001",
    "name": "block-prompt-injection",
    "policy_type": "prompt_injection",
    "config": {
      "enabled": true,
      "sensitivity": "high"
    },
    "scope": "global",
    "enabled": true,
    "created_at": "2026-03-18T10:00:00Z"
  }
}

Sensitivity Levels

Level	Behavior
`low`	Catches obvious injection patterns (role override, jailbreak keywords)
`medium`	Adds delimiter attacks, encoding evasion, and context manipulation
`high`	All patterns including nested injection and heuristic detection

What the Agent Sees

When a prompt injection is detected, the agent receives:

{
  "error": "Policy violation: prompt_injection — request blocked due to detected prompt injection pattern",
  "policy_id": "pol_pi_001",
  "status": 403
}

Audit Trail

Every blocked injection attempt is logged:

Event Type	Description
`policy.violated`	Prompt injection detected (includes matched pattern category and request metadata)

Query injection-related audit events:

curl "https://api.meshai.dev/governance/audit-trail?event_type=policy.violated&policy_type=prompt_injection" \
  -H "Authorization: Bearer msh_YOUR_API_KEY"

Enforcement Architecture

Scan path: Proxy scans all messages content before forwarding to the LLM provider.
Latency: Pattern matching adds < 2ms overhead per request.
Fail-open: If the scanner errors, the request proceeds (logged as a warning).
No data exfiltration: Blocked prompts are stored only in MeshAI audit trail, never forwarded.

Getting Started

Proxy (Zero-Code)

Framework Guides

Governance

EU AI Act Compliance

Billing

How It Works

Patterns Detected

Configure via Policy

Sensitivity Levels

What the Agent Sees

Audit Trail

Enforcement Architecture

Getting Started

Proxy (Zero-Code)

Framework Guides

Governance

EU AI Act Compliance

Billing

Documentation Index

​How It Works

​Patterns Detected

​Configure via Policy

​Sensitivity Levels

​What the Agent Sees

​Audit Trail

​Enforcement Architecture

How It Works

Patterns Detected

Configure via Policy

Sensitivity Levels

What the Agent Sees

Audit Trail

Enforcement Architecture