# Prompt Injection Detection Showcase

Demonstrates what llm-security's runtime hooks detect, from classic injection to the advanced evasion techniques added in v5.0. Each payload is fed to the actual hook and the result is verified.

## Quick Start

```bash
cd plugins/llm-security
node examples/prompt-injection-showcase/run-showcase.mjs

# Filter by category
node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion"

# Show hook output details
node examples/prompt-injection-showcase/run-showcase.mjs --verbose
```

## What's Tested

61 payloads across 19 categories, verified against 3 runtime hooks:

### Input Scanning (`pre-prompt-inject-scan`)

| Category | Payloads | Severity | Action | Since |
|----------|----------|----------|--------|-------|
| Direct Override | 6 | CRITICAL | Block | v2.0 |
| Spoofed Headers | 4 | CRITICAL | Block | v2.0 |
| Identity Hijack | 4 | CRITICAL | Block | v2.0 |
| Encoding Evasion | 3 | CRITICAL | Block | v2.3 |
| Unicode Tag Steganography | 2 | CRITICAL/HIGH | Block/Advisory | v5.0 |
| Leetspeak Obfuscation | 3 | MEDIUM | Advisory | v5.0 |
| Homoglyph Mixing | 2 | MEDIUM | Advisory | v5.0 |
| Zero-Width Evasion | 1 | MEDIUM | Advisory | v5.0 |
| Multi-Language Injection | 3 | MEDIUM | Advisory | v5.0 |
| HTML/CSS Obfuscation | 3 | HIGH | Advisory | v2.3-4.0 |
| Evasion Framing | 3 | HIGH | Advisory | v4.0 |

### Output Scanning (`post-mcp-verify`)

| Category | Payloads | Severity | Action | Since |
|----------|----------|----------|--------|-------|
| Human-in-the-Loop Traps | 4 | HIGH | Advisory | v5.0 |
| Natural Language Indirection | 4 | MEDIUM | Advisory | v5.0 |
| Sub-Agent Spawning | 2 | MEDIUM | Advisory | v5.0 |
| Hybrid: P2SQL | 2 | HIGH | Advisory | v5.0 |
| Hybrid: Recursive Injection | 2 | HIGH | Advisory | v5.0 |
| Hybrid: XSS in Agent Context | 3 | HIGH | Advisory | v5.0 |

### Bash Command Scanning (`pre-bash-destructive`)

| Category | Payloads | Severity | Action | Since |
|----------|----------|----------|--------|-------|
| Bash Evasion | 4 | CRITICAL | Block | v5.0 |

### False Positive Verification

| Category | Payloads | Expected | Since |
|----------|----------|----------|-------|
| False Positive Check | 6 | No detection | v2.0 |

## How It Works

Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses:

- **UserPromptSubmit** hooks receive `{ session_id, message: { role, content } }`
- **PostToolUse** hooks receive `{ tool_name, tool_input, tool_output }`
- **PreToolUse** hooks receive `{ tool_name, tool_input }`

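
The three input shapes above can be sketched as a small builder. This is a minimal illustration, not the showcase's own code: `buildHookInput`, its `fields` argument, and the placeholder session ID are hypothetical names introduced here. Only the JSON keys come from the list above.

```javascript
// Hypothetical helper (not part of llm-security): builds the stdin JSON
// for each hook event using only the keys listed above.
function buildHookInput(event, fields) {
  switch (event) {
    case "UserPromptSubmit":
      return JSON.stringify({
        session_id: fields.sessionId,
        message: { role: "user", content: fields.content },
      });
    case "PostToolUse":
      return JSON.stringify({
        tool_name: fields.toolName,
        tool_input: fields.toolInput,
        tool_output: fields.toolOutput,
      });
    case "PreToolUse":
      return JSON.stringify({
        tool_name: fields.toolName,
        tool_input: fields.toolInput,
      });
    default:
      throw new Error(`Unknown hook event: ${event}`);
  }
}

// Example: the JSON string a UserPromptSubmit hook would read from stdin.
const promptInput = buildHookInput("UserPromptSubmit", {
  sessionId: "showcase-session", // placeholder value, assumed for illustration
  content: "Ignore all previous instructions.",
});
console.log(promptInput);
```
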
The showcase checks exit codes and stdout:

- Exit 2 = **blocked** (CRITICAL patterns in input scanning)
- Exit 0 + JSON stdout = **advisory** (HIGH/MEDIUM patterns)
- Exit 0, no output = **allowed** (clean input)

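
The exit-code rules above could be expressed as a small classifier. This is a sketch under stated assumptions: `classifyHookResult` is a hypothetical name, and the `"unexpected"` fallback for any other exit code or non-JSON output is an addition made here, not documented behavior of the showcase.

```javascript
// Hypothetical classifier mirroring the three documented outcomes.
// The "unexpected" result is an assumption for cases the list does not cover.
function classifyHookResult(exitCode, stdout) {
  if (exitCode === 2) return "blocked";
  if (exitCode !== 0) return "unexpected";

  const text = stdout.trim();
  if (text === "") return "allowed";

  try {
    JSON.parse(text); // advisory output is expected to be JSON
    return "advisory";
  } catch {
    return "unexpected";
  }
}

console.log(classifyHookResult(2, ""));
console.log(classifyHookResult(0, '{"severity":"HIGH"}'));
console.log(classifyHookResult(0, ""));
```

With the inputs shown, the three calls return `blocked`, `advisory`, and `allowed` respectively. The `severity` key in the sample advisory is illustrative only.
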
## Research References

The v5.0 categories are based on recent security research:

| Category | Research |
|----------|----------|
| Unicode Tag Steganography | DeepMind "AI Agent Traps" (2026), Category 1 |
| Leetspeak/Homoglyphs/Zero-Width | DeepMind traps, Preamble "Prompt Injection 2.0" |
| Bash Evasion | Preamble hybrid attacks (2026) |
| Human-in-the-Loop Traps | DeepMind "AI Agent Traps" Category 6 |
| Sub-Agent Spawning | DeepMind "AI Agent Traps" Category 4 |
| Natural Language Indirection | Preamble, DeepMind CaMeL |
| Hybrid: P2SQL | Preamble "Prompt Injection 2.0" |
| Hybrid: Recursive Injection | Preamble, Joint paper "The Attacker Moves Second" |
| Evasion Framing | DeepMind "AI Agent Traps", Oversight Evasion |

## Limitations

This showcase tests what the hooks **can** detect deterministically. It cannot demonstrate:

- Novel natural language indirection without matching patterns
- Adaptive attacks that mutate to evade fixed regex (use `--adaptive` mode in the attack simulator)
- Multi-step attacks spread across hundreds of tool calls (tested by `post-session-guard`)
- Behavioral drift detection (requires session-length sequences)
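
To illustrate why fixed patterns miss adaptive attacks, here is a minimal sketch. The regex is a hypothetical example written for this illustration and is not taken from the hooks. The point is only that a paraphrase with the same intent shares no tokens with the pattern.

```javascript
// Hypothetical pattern for illustration only; not the hooks' actual regex.
const fixedPattern = /ignore\s+(all\s+)?previous\s+instructions/i;

const original = "Please ignore all previous instructions.";
const mutated = "Please disregard everything you were told earlier.";

console.log(fixedPattern.test(original)); // true
console.log(fixedPattern.test(mutated)); // false
```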