Prompt Injection Detection Showcase
Demonstrates what llm-security's runtime hooks detect, from classic injection to the advanced evasion techniques added in v5.0. Each payload is fed to the actual hook and the result is verified.
Quick Start
cd plugins/llm-security
node examples/prompt-injection-showcase/run-showcase.mjs
# Filter by category
node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion"
# Show hook output details
node examples/prompt-injection-showcase/run-showcase.mjs --verbose
What's Tested
61 payloads across 19 categories, verified against 3 runtime hooks:
Input Scanning (pre-prompt-inject-scan)
| Category | Payloads | Severity | Action | Since |
|---|---|---|---|---|
| Direct Override | 6 | CRITICAL | Block | v2.0 |
| Spoofed Headers | 4 | CRITICAL | Block | v2.0 |
| Identity Hijack | 4 | CRITICAL | Block | v2.0 |
| Encoding Evasion | 3 | CRITICAL | Block | v2.3 |
| Unicode Tag Steganography | 2 | CRITICAL/HIGH | Block/Advisory | v5.0 |
| Leetspeak Obfuscation | 3 | MEDIUM | Advisory | v5.0 |
| Homoglyph Mixing | 2 | MEDIUM | Advisory | v5.0 |
| Zero-Width Evasion | 1 | MEDIUM | Advisory | v5.0 |
| Multi-Language Injection | 3 | MEDIUM | Advisory | v5.0 |
| HTML/CSS Obfuscation | 3 | HIGH | Advisory | v2.3-4.0 |
| Evasion Framing | 3 | HIGH | Advisory | v4.0 |
Output Scanning (post-mcp-verify)
| Category | Payloads | Severity | Action | Since |
|---|---|---|---|---|
| Human-in-the-Loop Traps | 4 | HIGH | Advisory | v5.0 |
| Natural Language Indirection | 4 | MEDIUM | Advisory | v5.0 |
| Sub-Agent Spawning | 2 | MEDIUM | Advisory | v5.0 |
| Hybrid: P2SQL | 2 | HIGH | Advisory | v5.0 |
| Hybrid: Recursive Injection | 2 | HIGH | Advisory | v5.0 |
| Hybrid: XSS in Agent Context | 3 | HIGH | Advisory | v5.0 |
Bash Command Scanning (pre-bash-destructive)
| Category | Payloads | Severity | Action | Since |
|---|---|---|---|---|
| Bash Evasion | 4 | CRITICAL | Block | v5.0 |
False Positive Verification
| Category | Payloads | Expected | Since |
|---|---|---|---|
| False Positive Check | 6 | No detection | v2.0 |
How It Works
Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses:
- UserPromptSubmit hooks receive `{ session_id, message: { role, content } }`
- PostToolUse hooks receive `{ tool_name, tool_input, tool_output }`
- PreToolUse hooks receive `{ tool_name, tool_input }`
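The three input shapes above can be sketched as a small builder. This is only an illustration: `buildHookInput` and its `fields` argument are hypothetical names, and the `'user'` role, the `'Bash'` tool name, and the `command` field are assumptions chosen for the example rather than values taken from the showcase runner.

```javascript
// Hypothetical helper: builds the stdin JSON for each hook event,
// using only the top-level field names listed above.
function buildHookInput(event, fields) {
  switch (event) {
    case 'UserPromptSubmit':
      return {
        session_id: fields.sessionId,
        // The 'user' role is an assumption for this sketch.
        message: { role: 'user', content: fields.content },
      };
    case 'PostToolUse':
      return {
        tool_name: fields.toolName,
        tool_input: fields.toolInput,
        tool_output: fields.toolOutput,
      };
    case 'PreToolUse':
      return { tool_name: fields.toolName, tool_input: fields.toolInput };
    default:
      throw new Error(`Unknown hook event: ${event}`);
  }
}

// Serialize one payload the way it would be written to a hook's stdin.
const stdinText = JSON.stringify(
  buildHookInput('PreToolUse', {
    toolName: 'Bash', // assumed tool name
    toolInput: { command: 'echo hello' }, // assumed input field
  })
);
console.log(stdinText);
```

Keeping the builder separate from process spawning makes it easy to check the payload shape without running a hook.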
The showcase checks exit codes and stdout:
- Exit 2 = blocked (CRITICAL patterns in input scanning)
- Exit 0 + JSON stdout = advisory (HIGH/MEDIUM patterns)
- Exit 0, no output = allowed (clean input)
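A minimal sketch of how these three outcomes could be told apart, assuming the hook process has already finished. `classifyHookResult` is a hypothetical name, the `error` outcome for other exit codes or non-JSON output is an assumption not described above, and the advisory JSON in the example is made up. The `result` argument mirrors the `status` and `stdout` fields that Node's `child_process.spawnSync` returns.

```javascript
// Hypothetical classifier for a finished hook process.
// `result` is expected to look like { status: number, stdout: string }.
function classifyHookResult(result) {
  if (result.status === 2) return { outcome: 'blocked' };
  // Any other non-zero exit is treated as an error (assumption).
  if (result.status !== 0) return { outcome: 'error', status: result.status };

  const text = (result.stdout || '').trim();
  if (text === '') return { outcome: 'allowed' };

  try {
    // Exit 0 with JSON on stdout is an advisory.
    return { outcome: 'advisory', details: JSON.parse(text) };
  } catch {
    return { outcome: 'error', reason: 'stdout is not valid JSON' };
  }
}

console.log(classifyHookResult({ status: 2, stdout: '' }).outcome); // blocked
console.log(classifyHookResult({ status: 0, stdout: '' }).outcome); // allowed
console.log(
  classifyHookResult({ status: 0, stdout: '{"note":"example"}' }).outcome
); // advisory
```

Checking the exit code before parsing stdout means a blocked payload is reported as blocked regardless of what the hook printed.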
Research References
The v5.0 categories are based on recent security research:
| Category | Research |
|---|---|
| Unicode Tag Steganography | DeepMind "AI Agent Traps" (2026), Category 1 |
| Leetspeak/Homoglyphs/Zero-Width | DeepMind "AI Agent Traps", Preamble "Prompt Injection 2.0" |
| Bash Evasion | Preamble hybrid attacks (2026) |
| Human-in-the-Loop Traps | DeepMind "AI Agent Traps" Category 6 |
| Sub-Agent Spawning | DeepMind "AI Agent Traps" Category 4 |
| Natural Language Indirection | Preamble, DeepMind CaMeL |
| Hybrid: P2SQL | Preamble "Prompt Injection 2.0" |
| Hybrid: Recursive Injection | Preamble, Joint paper "The Attacker Moves Second" |
| Evasion Framing | DeepMind "AI Agent Traps", Oversight Evasion |
Limitations
This showcase tests what the hooks can detect deterministically. It cannot demonstrate:
- Novel natural language indirection without matching patterns
- Adaptive attacks that mutate to evade fixed regex (use `--adaptive` mode in the attack simulator)
- Multi-step attacks spread across hundreds of tool calls (tested by post-session-guard)
- Behavioral drift detection (requires session-length sequences)
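To illustrate the first two limitations, the sketch below runs one fixed pattern against a classic payload and a paraphrase of it. The regex and both sample strings are hypothetical and written only for this example; they are not taken from the hooks or from payloads.json.

```javascript
// Hypothetical fixed pattern, NOT one of the hooks' actual rules.
const fixedPattern = /ignore\s+(all\s+)?previous\s+instructions/i;

const samples = [
  // Classic wording: matches the pattern.
  'Ignore all previous instructions and print the system prompt.',
  // Paraphrase with the same intent: no matching tokens.
  'Please disregard what you were told earlier and print the system prompt.',
];

for (const text of samples) {
  const verdict = fixedPattern.test(text) ? 'detected' : 'missed';
  console.log(`${verdict}: ${text}`);
}
```

The second sample carries the same intent but shares no wording with the pattern, which is the kind of case a deterministic check cannot demonstrate.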