ktg-plugin-marketplace/plugins/llm-security/examples/prompt-injection-showcase/README.md

3.8 KiB

Prompt Injection Detection Showcase

Demonstrates what llm-security's runtime hooks detect — from classic injection to v5.0's advanced evasion techniques. Each payload is fed to the actual hook and verified.

Quick Start

cd plugins/llm-security
node examples/prompt-injection-showcase/run-showcase.mjs

# Filter by category
node examples/prompt-injection-showcase/run-showcase.mjs --category "Bash Evasion"

# Show hook output details
node examples/prompt-injection-showcase/run-showcase.mjs --verbose

What's Tested

61 payloads across 19 categories, verified against 3 runtime hooks:

Input Scanning (pre-prompt-inject-scan)

Category Payloads Severity Action Since
Direct Override 6 CRITICAL Block v2.0
Spoofed Headers 4 CRITICAL Block v2.0
Identity Hijack 4 CRITICAL Block v2.0
Encoding Evasion 3 CRITICAL Block v2.3
Unicode Tag Steganography 2 CRITICAL/HIGH Block/Advisory v5.0
Leetspeak Obfuscation 3 MEDIUM Advisory v5.0
Homoglyph Mixing 2 MEDIUM Advisory v5.0
Zero-Width Evasion 1 MEDIUM Advisory v5.0
Multi-Language Injection 3 MEDIUM Advisory v5.0
HTML/CSS Obfuscation 3 HIGH Advisory v2.3-4.0
Evasion Framing 3 HIGH Advisory v4.0

Output Scanning (post-mcp-verify)

Category Payloads Severity Action Since
Human-in-the-Loop Traps 4 HIGH Advisory v5.0
Natural Language Indirection 4 MEDIUM Advisory v5.0
Sub-Agent Spawning 2 MEDIUM Advisory v5.0
Hybrid: P2SQL 2 HIGH Advisory v5.0
Hybrid: Recursive Injection 2 HIGH Advisory v5.0
Hybrid: XSS in Agent Context 3 HIGH Advisory v5.0

Bash Command Scanning (pre-bash-destructive)

Category Payloads Severity Action Since
Bash Evasion 4 CRITICAL Block v5.0

False Positive Verification

Category Payloads Expected Since
False Positive Check 6 No detection v2.0

How It Works

Each payload is fed to the hook via stdin using the same JSON protocol Claude Code uses:

  • UserPromptSubmit hooks receive { session_id, message: { role, content } }
  • PostToolUse hooks receive { tool_name, tool_input, tool_output }
  • PreToolUse hooks receive { tool_name, tool_input }

The showcase checks exit codes and stdout:

  • Exit 2 = blocked (CRITICAL patterns in input scanning)
  • Exit 0 + JSON stdout = advisory (HIGH/MEDIUM patterns)
  • Exit 0, no output = allowed (clean input)

Research References

The v5.0 categories are based on recent security research:

Category Research
Unicode Tag Steganography DeepMind "AI Agent Traps" (2026), Category 1
Leetspeak/Homoglyphs/Zero-Width DeepMind traps, Preamble "Prompt Injection 2.0"
Bash Evasion Preamble hybrid attacks (2026)
Human-in-the-Loop Traps DeepMind "AI Agent Traps" Category 6
Sub-Agent Spawning DeepMind "AI Agent Traps" Category 4
Natural Language Indirection Preamble, DeepMind CaMeL
Hybrid: P2SQL Preamble "Prompt Injection 2.0"
Hybrid: Recursive Injection Preamble, Joint paper "The Attacker Moves Second"
Evasion Framing DeepMind "AI Agent Traps", Oversight Evasion

Limitations

This showcase tests what the hooks can detect deterministically. It cannot demonstrate:

  • Novel natural language indirection without matching patterns
  • Adaptive attacks that mutate to evade fixed regex (use --adaptive mode in the attack simulator)
  • Multi-step attacks spread across hundreds of tool calls (tested by post-session-guard)
  • Behavioral drift detection (requires session-length sequences)