Prompt Injection Detection Showcase

Demonstrates what llm-security's runtime hooks detect, from classic injection to v5.0's advanced evasion techniques. Each payload is fed to the actual hook, and the hook's response is checked against the expected action.

Quick Start

cd plugins/llm-security
node examples/prompt-injection-showcase/run-showcase.mjs

# Filter by category
node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion"

# Show hook output details
node examples/prompt-injection-showcase/run-showcase.mjs --verbose

What's Tested

61 payloads across 19 categories, verified against 3 runtime hooks:

Input Scanning (`pre-prompt-inject-scan`)

Category	Payloads	Severity	Action	Since
Direct Override	6	CRITICAL	Block	v2.0
Spoofed Headers	4	CRITICAL	Block	v2.0
Identity Hijack	4	CRITICAL	Block	v2.0
Encoding Evasion	3	CRITICAL	Block	v2.3
Unicode Tag Steganography	2	CRITICAL/HIGH	Block/Advisory	v5.0
Leetspeak Obfuscation	3	MEDIUM	Advisory	v5.0
Homoglyph Mixing	2	MEDIUM	Advisory	v5.0
Zero-Width Evasion	1	MEDIUM	Advisory	v5.0
Multi-Language Injection	3	MEDIUM	Advisory	v5.0
HTML/CSS Obfuscation	3	HIGH	Advisory	v2.3-4.0
Evasion Framing	3	HIGH	Advisory	v4.0

Output Scanning (`post-mcp-verify`)

Category	Payloads	Severity	Action	Since
Human-in-the-Loop Traps	4	HIGH	Advisory	v5.0
Natural Language Indirection	4	MEDIUM	Advisory	v5.0
Sub-Agent Spawning	2	MEDIUM	Advisory	v5.0
Hybrid: P2SQL	2	HIGH	Advisory	v5.0
Hybrid: Recursive Injection	2	HIGH	Advisory	v5.0
Hybrid: XSS in Agent Context	3	HIGH	Advisory	v5.0

Bash Command Scanning (`pre-bash-destructive`)

Category	Payloads	Severity	Action	Since
Bash Evasion	4	CRITICAL	Block	v5.0

False Positive Verification

Category	Payloads	Expected	Since
False Positive Check	6	No detection	v2.0

How It Works

Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses:

UserPromptSubmit hooks receive { session_id, message: { role, content } }
PostToolUse hooks receive { tool_name, tool_input, tool_output }
PreToolUse hooks receive { tool_name, tool_input }
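
As a rough illustration of the three input shapes listed above, the following sketch builds the JSON string that would be written to a hook's stdin. The helper name `buildHookInput`, its `fields` argument, and all sample values are hypothetical; only the top-level field names come from the list above, and the real runner may construct its payloads differently.

```javascript
// Sketch only: field names follow the three shapes listed above.
// buildHookInput and its `fields` argument are hypothetical helpers.
function buildHookInput(event, fields) {
  switch (event) {
    case "UserPromptSubmit":
      return JSON.stringify({
        session_id: fields.sessionId,
        message: { role: "user", content: fields.content },
      });
    case "PostToolUse":
      return JSON.stringify({
        tool_name: fields.toolName,
        tool_input: fields.toolInput,
        tool_output: fields.toolOutput,
      });
    case "PreToolUse":
      return JSON.stringify({
        tool_name: fields.toolName,
        tool_input: fields.toolInput,
      });
    default:
      throw new Error(`Unknown hook event: ${event}`);
  }
}

// Example: the stdin payload for a Bash command check (sample values are made up).
const stdinPayload = buildHookInput("PreToolUse", {
  toolName: "Bash",
  toolInput: { command: "ls -la" },
});
console.log(stdinPayload);
```

A string built this way could be handed to a hook process through the `input` option of Node's `child_process.spawnSync`, which writes it to the child's stdin.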

The showcase checks exit codes and stdout:

Exit 2 = blocked (CRITICAL patterns in input scanning)
Exit 0 + JSON stdout = advisory (HIGH/MEDIUM patterns)
Exit 0, no output = allowed (clean input)
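
The three outcomes above can be sketched as a small classifier over a finished hook process. This is a minimal sketch, not the showcase's actual code: `classifyHookResult` is a hypothetical name, the `{ status, stdout }` shape mirrors what Node's `child_process.spawnSync` returns, and the `"unexpected"` fallback for anything outside the three documented cases is an assumption.

```javascript
// Sketch only: maps an exit code and stdout to the three outcomes listed above.
// classifyHookResult is a hypothetical helper, not part of llm-security.
function classifyHookResult({ status, stdout }) {
  if (status === 2) return "blocked"; // exit 2 = blocked
  if (status === 0) {
    const text = (stdout ?? "").trim();
    if (text === "") return "allowed"; // exit 0, no output = clean input
    try {
      JSON.parse(text); // exit 0 + JSON stdout = advisory
      return "advisory";
    } catch {
      return "unexpected"; // assumption: non-JSON output is treated as an error
    }
  }
  return "unexpected"; // assumption: any other exit code is treated as an error
}

// The advisory body below is made up; the real hook's JSON fields are not shown here.
console.log(classifyHookResult({ status: 2, stdout: "" }));
console.log(classifyHookResult({ status: 0, stdout: '{"note":"example"}' }));
console.log(classifyHookResult({ status: 0, stdout: "" }));
```

Run as written, this prints `blocked`, `advisory`, and `allowed`, one per line.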

Research References

The v5.0 categories are based on recent security research:

Category	Research
Unicode Tag Steganography	DeepMind "AI Agent Traps" (2026), Category 1
Leetspeak/Homoglyphs/Zero-Width	DeepMind "AI Agent Traps", Preamble "Prompt Injection 2.0"
Bash Evasion	Preamble hybrid attacks (2026)
Human-in-the-Loop Traps	DeepMind "AI Agent Traps", Category 6
Sub-Agent Spawning	DeepMind "AI Agent Traps", Category 4
Natural Language Indirection	Preamble, DeepMind CaMeL
Hybrid: P2SQL	Preamble "Prompt Injection 2.0"
Hybrid: Recursive Injection	Preamble, Joint paper "The Attacker Moves Second"
Evasion Framing	DeepMind "AI Agent Traps", Oversight Evasion

Limitations

This showcase tests what the hooks can detect deterministically. It cannot demonstrate:

Novel natural language indirection without matching patterns
Adaptive attacks that mutate to evade fixed regex (use --adaptive mode in the attack simulator)
Multi-step attacks spread across hundreds of tool calls (tested by post-session-guard)
Behavioral drift detection (requires session-length sequences)
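
To make the fixed-regex limitation concrete, the sketch below runs one pattern against three phrasings of the same request. The pattern is hypothetical and written only for this illustration; it is not one of the plugin's actual rules, and the sample strings are made up.

```javascript
// Hypothetical fixed pattern, for illustration only; not one of the plugin's actual rules.
const fixedPattern = /ignore\s+(?:all\s+)?previous\s+instructions/i;

const variants = [
  "Please ignore all previous instructions.", // literal phrasing
  "Please IGNORE   previous instructions.", // case and spacing changes
  "Kindly set aside everything you were told earlier.", // paraphrase
];

// No `g` flag on the pattern, so test() carries no state between calls.
const results = variants.map((text) => ({ text, matched: fixedPattern.test(text) }));
for (const { text, matched } of results) {
  console.log(`${matched ? "match" : "miss "}  ${text}`);
}
```

The first two variants match because the pattern tolerates case and whitespace changes; the paraphrase carries the same intent but shares no tokens with the pattern, so it passes undetected.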
