feat: initial open marketplace with llm-security, config-audit, ultraplan-local

2026-04-06 18:47:49 +02:00 · 2026-04-06 18:47:49 +02:00 · f93d6abdae
commit f93d6abdae
380 changed files with 65935 additions and 0 deletions
--- a/plugins/llm-security/examples/prompt-injection-showcase/README.md
+++ b/plugins/llm-security/examples/prompt-injection-showcase/README.md
@@ -0,0 +1,97 @@
+# Prompt Injection Detection Showcase
+
+Demonstrates what llm-security's runtime hooks detect, from classic injection to the advanced evasion techniques added in v5.0. Each payload is fed to the actual hook, and the result is checked against the expected outcome.
+
+## Quick Start
+
+```bash
+cd plugins/llm-security
+node examples/prompt-injection-showcase/run-showcase.mjs
+
+# Filter by category
+node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion"
+
+# Show hook output details
+node examples/prompt-injection-showcase/run-showcase.mjs --verbose
+```
+
+## What's Tested
+
+61 payloads across 19 categories, verified against 3 runtime hooks:
+
+### Input Scanning (`pre-prompt-inject-scan`)
+
+| Category | Payloads | Severity | Action | Since |
+|----------|----------|----------|--------|-------|
+| Direct Override | 6 | CRITICAL | Block | v2.0 |
+| Spoofed Headers | 4 | CRITICAL | Block | v2.0 |
+| Identity Hijack | 4 | CRITICAL | Block | v2.0 |
+| Encoding Evasion | 3 | CRITICAL | Block | v2.3 |
+| Unicode Tag Steganography | 2 | CRITICAL/HIGH | Block/Advisory | v5.0 |
+| Leetspeak Obfuscation | 3 | MEDIUM | Advisory | v5.0 |
+| Homoglyph Mixing | 2 | MEDIUM | Advisory | v5.0 |
+| Zero-Width Evasion | 1 | MEDIUM | Advisory | v5.0 |
+| Multi-Language Injection | 3 | MEDIUM | Advisory | v5.0 |
+| HTML/CSS Obfuscation | 3 | HIGH | Advisory | v2.3-4.0 |
+| Evasion Framing | 3 | HIGH | Advisory | v4.0 |
+
+### Output Scanning (`post-mcp-verify`)
+
+| Category | Payloads | Severity | Action | Since |
+|----------|----------|----------|--------|-------|
+| Human-in-the-Loop Traps | 4 | HIGH | Advisory | v5.0 |
+| Natural Language Indirection | 4 | MEDIUM | Advisory | v5.0 |
+| Sub-Agent Spawning | 2 | MEDIUM | Advisory | v5.0 |
+| Hybrid: P2SQL | 2 | HIGH | Advisory | v5.0 |
+| Hybrid: Recursive Injection | 2 | HIGH | Advisory | v5.0 |
+| Hybrid: XSS in Agent Context | 3 | HIGH | Advisory | v5.0 |
+
+### Bash Command Scanning (`pre-bash-destructive`)
+
+| Category | Payloads | Severity | Action | Since |
+|----------|----------|----------|--------|-------|
+| Bash Evasion | 4 | CRITICAL | Block | v5.0 |
+
+### False Positive Verification
+
+| Category | Payloads | Expected | Since |
+|----------|----------|----------|-------|
+| False Positive Check | 6 | No detection | v2.0 |
+
+## How It Works
+
+Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses:
+
+- **UserPromptSubmit** hooks receive `{ session_id, message: { role, content } }`
+- **PostToolUse** hooks receive `{ tool_name, tool_input, tool_output }`
+- **PreToolUse** hooks receive `{ tool_name, tool_input }`
+
+The showcase checks exit codes and stdout:
+- Exit 2 = **blocked** (CRITICAL patterns in input scanning and Bash command scanning)
+- Exit 0 + JSON stdout = **advisory** (HIGH/MEDIUM patterns)
+- Exit 0, no output = **allowed** (clean input)
+
+## Research References
+
+The v5.0 categories are based on recent security research:
+
+| Category | Research |
+|----------|----------|
+| Unicode Tag Steganography | DeepMind "AI Agent Traps" (2026), Category 1 |
+| Leetspeak/Homoglyphs/Zero-Width | DeepMind "AI Agent Traps", Preamble "Prompt Injection 2.0" |
+| Bash Evasion | Preamble hybrid attacks (2026) |
+| Human-in-the-Loop Traps | DeepMind "AI Agent Traps" Category 6 |
+| Sub-Agent Spawning | DeepMind "AI Agent Traps" Category 4 |
+| Natural Language Indirection | Preamble, DeepMind CaMeL |
+| Hybrid: P2SQL | Preamble "Prompt Injection 2.0" |
+| Hybrid: Recursive Injection | Preamble, joint paper "The Attacker Moves Second" |
+| Evasion Framing | DeepMind "AI Agent Traps", Oversight Evasion |
+
+## Limitations
+
+This showcase tests what the hooks **can** detect deterministically. It cannot demonstrate:
+
+- Novel natural language indirection without matching patterns
+- Adaptive attacks that mutate to evade fixed regex (use `--adaptive` mode in the attack simulator)
+- Multi-step attacks spread across hundreds of tool calls (tested by `post-session-guard`)
+- Behavioral drift detection (requires session-length sequences)
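
The stdin shapes and exit-code convention listed under "How It Works" in the README above can be sketched in Node as follows. This is a minimal illustration, not the code in `run-showcase.mjs`: the function names `buildHookInput` and `classifyHookResult`, the `"showcase"` session id, the `WebFetch` tool name, the `command` field inside `tool_input`, and the verdict strings are all assumptions made for the example.

```javascript
// Sketch only: field names follow the README's description of the stdin
// protocol; function names, placeholder values, and verdict strings are
// hypothetical and not taken from the plugin's source.

// Build the JSON string a hook would read from stdin for a given event type.
function buildHookInput(event, payload) {
  switch (event) {
    case "UserPromptSubmit":
      return JSON.stringify({
        session_id: "showcase", // placeholder id (assumption)
        message: { role: "user", content: payload },
      });
    case "PostToolUse":
      return JSON.stringify({
        tool_name: "WebFetch", // hypothetical tool name
        tool_input: {},
        tool_output: payload,
      });
    case "PreToolUse":
      return JSON.stringify({
        tool_name: "Bash",
        tool_input: { command: payload }, // `command` key is an assumption
      });
    default:
      throw new Error(`Unknown hook event: ${event}`);
  }
}

// Map a hook's exit code and stdout to the three outcomes the README lists.
function classifyHookResult(exitCode, stdout) {
  if (exitCode === 2) return "blocked";
  if (exitCode !== 0) return "unexpected"; // any other exit code
  const text = (stdout ?? "").trim();
  if (text === "") return "allowed";
  try {
    JSON.parse(text); // advisory output is expected to be JSON
    return "advisory";
  } catch {
    return "unexpected"; // exit 0 with non-JSON output
  }
}

console.log(classifyHookResult(2, ""));                    // prints "blocked"
console.log(classifyHookResult(0, '{"severity":"HIGH"}')); // prints "advisory"
console.log(classifyHookResult(0, ""));                    // prints "allowed"
```

To exercise a real hook, the string returned by `buildHookInput` could be passed as the `input` option of `child_process.spawnSync` (with `encoding: "utf8"`), and the resulting `status` and `stdout` handed to `classifyHookResult`. The hook script path depends on the plugin layout and is not shown here.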