feat: initial open marketplace with llm-security, config-audit, ultraplan-local
This commit is contained in:
commit
f93d6abdae
380 changed files with 65935 additions and 0 deletions
|
|
@ -0,0 +1,97 @@
|
|||
# Prompt Injection Detection Showcase
|
||||
|
||||
Demonstrates what llm-security's runtime hooks detect — from classic injection to v5.0's advanced evasion techniques. Each payload is fed to the actual hook and verified.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
cd plugins/llm-security
|
||||
node examples/prompt-injection-showcase/run-showcase.mjs
|
||||
|
||||
# Filter by category
|
||||
node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion"
|
||||
|
||||
# Show hook output details
|
||||
node examples/prompt-injection-showcase/run-showcase.mjs --verbose
|
||||
```
|
||||
|
||||
## What's Tested
|
||||
|
||||
61 payloads across 19 categories, verified against 3 runtime hooks:
|
||||
|
||||
### Input Scanning (`pre-prompt-inject-scan`)
|
||||
|
||||
| Category | Payloads | Severity | Action | Since |
|
||||
|----------|----------|----------|--------|-------|
|
||||
| Direct Override | 6 | CRITICAL | Block | v2.0 |
|
||||
| Spoofed Headers | 4 | CRITICAL | Block | v2.0 |
|
||||
| Identity Hijack | 4 | CRITICAL | Block | v2.0 |
|
||||
| Encoding Evasion | 3 | CRITICAL | Block | v2.3 |
|
||||
| Unicode Tag Steganography | 2 | CRITICAL/HIGH | Block/Advisory | v5.0 |
|
||||
| Leetspeak Obfuscation | 3 | MEDIUM | Advisory | v5.0 |
|
||||
| Homoglyph Mixing | 2 | MEDIUM | Advisory | v5.0 |
|
||||
| Zero-Width Evasion | 1 | MEDIUM | Advisory | v5.0 |
|
||||
| Multi-Language Injection | 3 | MEDIUM | Advisory | v5.0 |
|
||||
| HTML/CSS Obfuscation | 3 | HIGH | Advisory | v2.3-4.0 |
|
||||
| Evasion Framing | 3 | HIGH | Advisory | v4.0 |
|
||||
|
||||
### Output Scanning (`post-mcp-verify`)
|
||||
|
||||
| Category | Payloads | Severity | Action | Since |
|
||||
|----------|----------|----------|--------|-------|
|
||||
| Human-in-the-Loop Traps | 4 | HIGH | Advisory | v5.0 |
|
||||
| Natural Language Indirection | 4 | MEDIUM | Advisory | v5.0 |
|
||||
| Sub-Agent Spawning | 2 | MEDIUM | Advisory | v5.0 |
|
||||
| Hybrid: P2SQL | 2 | HIGH | Advisory | v5.0 |
|
||||
| Hybrid: Recursive Injection | 2 | HIGH | Advisory | v5.0 |
|
||||
| Hybrid: XSS in Agent Context | 3 | HIGH | Advisory | v5.0 |
|
||||
|
||||
### Bash Command Scanning (`pre-bash-destructive`)
|
||||
|
||||
| Category | Payloads | Severity | Action | Since |
|
||||
|----------|----------|----------|--------|-------|
|
||||
| Bash Evasion | 4 | CRITICAL | Block | v5.0 |
|
||||
|
||||
### False Positive Verification
|
||||
|
||||
| Category | Payloads | Expected | Since |
|
||||
|----------|----------|----------|-------|
|
||||
| False Positive Check | 6 | No detection | v2.0 |
|
||||
|
||||
## How It Works
|
||||
|
||||
Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses:
|
||||
|
||||
- **UserPromptSubmit** hooks receive `{ session_id, message: { role, content } }`
|
||||
- **PostToolUse** hooks receive `{ tool_name, tool_input, tool_output }`
|
||||
- **PreToolUse** hooks receive `{ tool_name, tool_input }`
|
||||
|
||||
The showcase checks exit codes and stdout:
|
||||
- Exit 2 = **blocked** (CRITICAL patterns in input scanning)
|
||||
- Exit 0 + JSON stdout = **advisory** (HIGH/MEDIUM patterns)
|
||||
- Exit 0, no output = **allowed** (clean input)
|
||||
|
||||
## Research References
|
||||
|
||||
The v5.0 categories are based on recent security research:
|
||||
|
||||
| Category | Research |
|
||||
|----------|----------|
|
||||
| Unicode Tag Steganography | DeepMind "AI Agent Traps" (2026), Category 1 |
|
||||
| Leetspeak/Homoglyphs/Zero-Width | DeepMind traps, Preamble "Prompt Injection 2.0" |
|
||||
| Bash Evasion | Preamble hybrid attacks (2026) |
|
||||
| Human-in-the-Loop Traps | DeepMind "AI Agent Traps" Category 6 |
|
||||
| Sub-Agent Spawning | DeepMind "AI Agent Traps" Category 4 |
|
||||
| Natural Language Indirection | Preamble, DeepMind CaMeL |
|
||||
| Hybrid: P2SQL | Preamble "Prompt Injection 2.0" |
|
||||
| Hybrid: Recursive Injection | Preamble, Joint paper "The Attacker Moves Second" |
|
||||
| Evasion Framing | DeepMind "AI Agent Traps", Oversight Evasion |
|
||||
|
||||
## Limitations
|
||||
|
||||
This showcase tests what the hooks **can** detect deterministically. It cannot demonstrate:
|
||||
|
||||
- Novel natural language indirection without matching patterns
|
||||
- Adaptive attacks that mutate to evade fixed regex (use `--adaptive` mode in the attack simulator)
|
||||
- Multi-step attacks spread across hundreds of tool calls (tested by post-session-guard)
|
||||
- Behavioral drift detection (requires session-length sequences)
|
||||
Loading…
Add table
Add a link
Reference in a new issue