ktg-plugin-marketplace/plugins/llm-security/examples/prompt-injection-showcase/README.md

97 lines
3.8 KiB
Markdown

# Prompt Injection Detection Showcase
Demonstrates what llm-security's runtime hooks detect — from classic injection to v5.0's advanced evasion techniques. Each payload is fed to the actual hook and verified.
## Quick Start
```bash
cd plugins/llm-security
node examples/prompt-injection-showcase/run-showcase.mjs
# Filter by category
node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion"
# Show hook output details
node examples/prompt-injection-showcase/run-showcase.mjs --verbose
```
## What's Tested
61 payloads across 19 categories, verified against 3 runtime hooks:
### Input Scanning (`pre-prompt-inject-scan`)
| Category | Payloads | Severity | Action | Since |
|----------|----------|----------|--------|-------|
| Direct Override | 6 | CRITICAL | Block | v2.0 |
| Spoofed Headers | 4 | CRITICAL | Block | v2.0 |
| Identity Hijack | 4 | CRITICAL | Block | v2.0 |
| Encoding Evasion | 3 | CRITICAL | Block | v2.3 |
| Unicode Tag Steganography | 2 | CRITICAL/HIGH | Block/Advisory | v5.0 |
| Leetspeak Obfuscation | 3 | MEDIUM | Advisory | v5.0 |
| Homoglyph Mixing | 2 | MEDIUM | Advisory | v5.0 |
| Zero-Width Evasion | 1 | MEDIUM | Advisory | v5.0 |
| Multi-Language Injection | 3 | MEDIUM | Advisory | v5.0 |
| HTML/CSS Obfuscation | 3 | HIGH | Advisory | v2.3-4.0 |
| Evasion Framing | 3 | HIGH | Advisory | v4.0 |
### Output Scanning (`post-mcp-verify`)
| Category | Payloads | Severity | Action | Since |
|----------|----------|----------|--------|-------|
| Human-in-the-Loop Traps | 4 | HIGH | Advisory | v5.0 |
| Natural Language Indirection | 4 | MEDIUM | Advisory | v5.0 |
| Sub-Agent Spawning | 2 | MEDIUM | Advisory | v5.0 |
| Hybrid: P2SQL | 2 | HIGH | Advisory | v5.0 |
| Hybrid: Recursive Injection | 2 | HIGH | Advisory | v5.0 |
| Hybrid: XSS in Agent Context | 3 | HIGH | Advisory | v5.0 |
### Bash Command Scanning (`pre-bash-destructive`)
| Category | Payloads | Severity | Action | Since |
|----------|----------|----------|--------|-------|
| Bash Evasion | 4 | CRITICAL | Block | v5.0 |
### False Positive Verification
| Category | Payloads | Expected | Since |
|----------|----------|----------|-------|
| False Positive Check | 6 | No detection | v2.0 |
## How It Works
Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses:
- **UserPromptSubmit** hooks receive `{ session_id, message: { role, content } }`
- **PostToolUse** hooks receive `{ tool_name, tool_input, tool_output }`
- **PreToolUse** hooks receive `{ tool_name, tool_input }`
The showcase checks exit codes and stdout:
- Exit 2 = **blocked** (CRITICAL patterns in input scanning)
- Exit 0 + JSON stdout = **advisory** (HIGH/MEDIUM patterns)
- Exit 0, no output = **allowed** (clean input)
## Research References
The v5.0 categories are based on recent security research:
| Category | Research |
|----------|----------|
| Unicode Tag Steganography | DeepMind "AI Agent Traps" (2026), Category 1 |
| Leetspeak/Homoglyphs/Zero-Width | DeepMind traps, Preamble "Prompt Injection 2.0" |
| Bash Evasion | Preamble hybrid attacks (2026) |
| Human-in-the-Loop Traps | DeepMind "AI Agent Traps" Category 6 |
| Sub-Agent Spawning | DeepMind "AI Agent Traps" Category 4 |
| Natural Language Indirection | Preamble, DeepMind CaMeL |
| Hybrid: P2SQL | Preamble "Prompt Injection 2.0" |
| Hybrid: Recursive Injection | Preamble, Joint paper "The Attacker Moves Second" |
| Evasion Framing | DeepMind "AI Agent Traps", Oversight Evasion |
## Limitations
This showcase tests what the hooks **can** detect deterministically. It cannot demonstrate:
- Novel natural language indirection without matching patterns
- Adaptive attacks that mutate to evade fixed regex (use `--adaptive` mode in the attack simulator)
- Multi-step attacks spread across hundreds of tool calls (tested by post-session-guard)
- Behavioral drift detection (requires session-length sequences)