# Prompt Injection Detection Showcase Demonstrates what llm-security's runtime hooks detect — from classic injection to v5.0's advanced evasion techniques. Each payload is fed to the actual hook and verified. ## Quick Start ```bash cd plugins/llm-security node examples/prompt-injection-showcase/run-showcase.mjs # Filter by category node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion" # Show hook output details node examples/prompt-injection-showcase/run-showcase.mjs --verbose ``` ## What's Tested 61 payloads across 19 categories, verified against 3 runtime hooks: ### Input Scanning (`pre-prompt-inject-scan`) | Category | Payloads | Severity | Action | Since | |----------|----------|----------|--------|-------| | Direct Override | 6 | CRITICAL | Block | v2.0 | | Spoofed Headers | 4 | CRITICAL | Block | v2.0 | | Identity Hijack | 4 | CRITICAL | Block | v2.0 | | Encoding Evasion | 3 | CRITICAL | Block | v2.3 | | Unicode Tag Steganography | 2 | CRITICAL/HIGH | Block/Advisory | v5.0 | | Leetspeak Obfuscation | 3 | MEDIUM | Advisory | v5.0 | | Homoglyph Mixing | 2 | MEDIUM | Advisory | v5.0 | | Zero-Width Evasion | 1 | MEDIUM | Advisory | v5.0 | | Multi-Language Injection | 3 | MEDIUM | Advisory | v5.0 | | HTML/CSS Obfuscation | 3 | HIGH | Advisory | v2.3-4.0 | | Evasion Framing | 3 | HIGH | Advisory | v4.0 | ### Output Scanning (`post-mcp-verify`) | Category | Payloads | Severity | Action | Since | |----------|----------|----------|--------|-------| | Human-in-the-Loop Traps | 4 | HIGH | Advisory | v5.0 | | Natural Language Indirection | 4 | MEDIUM | Advisory | v5.0 | | Sub-Agent Spawning | 2 | MEDIUM | Advisory | v5.0 | | Hybrid: P2SQL | 2 | HIGH | Advisory | v5.0 | | Hybrid: Recursive Injection | 2 | HIGH | Advisory | v5.0 | | Hybrid: XSS in Agent Context | 3 | HIGH | Advisory | v5.0 | ### Bash Command Scanning (`pre-bash-destructive`) | Category | Payloads | Severity | Action | Since | |----------|----------|----------|--------|-------| | Bash Evasion | 4 | CRITICAL | Block | v5.0 | ### False Positive Verification | Category | Payloads | Expected | Since | |----------|----------|----------|-------| | False Positive Check | 6 | No detection | v2.0 | ## How It Works Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses: - **UserPromptSubmit** hooks receive `{ session_id, message: { role, content } }` - **PostToolUse** hooks receive `{ tool_name, tool_input, tool_output }` - **PreToolUse** hooks receive `{ tool_name, tool_input }` The showcase checks exit codes and stdout: - Exit 2 = **blocked** (CRITICAL patterns in input scanning) - Exit 0 + JSON stdout = **advisory** (HIGH/MEDIUM patterns) - Exit 0, no output = **allowed** (clean input) ## Research References The v5.0 categories are based on recent security research: | Category | Research | |----------|----------| | Unicode Tag Steganography | DeepMind "AI Agent Traps" (2026), Category 1 | | Leetspeak/Homoglyphs/Zero-Width | DeepMind traps, Preamble "Prompt Injection 2.0" | | Bash Evasion | Preamble hybrid attacks (2026) | | Human-in-the-Loop Traps | DeepMind "AI Agent Traps" Category 6 | | Sub-Agent Spawning | DeepMind "AI Agent Traps" Category 4 | | Natural Language Indirection | Preamble, DeepMind CaMeL | | Hybrid: P2SQL | Preamble "Prompt Injection 2.0" | | Hybrid: Recursive Injection | Preamble, Joint paper "The Attacker Moves Second" | | Evasion Framing | DeepMind "AI Agent Traps", Oversight Evasion | ## Limitations This showcase tests what the hooks **can** detect deterministically. It cannot demonstrate: - Novel natural language indirection without matching patterns - Adaptive attacks that mutate to evade fixed regex (use `--adaptive` mode in the attack simulator) - Multi-step attacks spread across hundreds of tool calls (tested by post-session-guard) - Behavioral drift detection (requires session-length sequences)