feat(llm-security): add toxic-agent-demo example for TFA scanner [skip-docs]

Single-component lethal-trifecta walkthrough that drives scanners/toxic-flow-analyzer.mjs against a deliberately misconfigured fixture plugin. The fixture agent declares tools: [Bash, Read, WebFetch], which alone covers all three trifecta legs (input surface + data access + exfil sink). No hooks/hooks.json is shipped, so TFA's mitigation logic finds no active guards and emits a CRITICAL "Lethal trifecta:" finding without downgrade. Plugin marker is plugin.fixture.json (recognised by isPlugin()) rather than .claude-plugin/plugin.json — the latter is blocked by the plugin's own pre-write-pathguard hook, and plugin.fixture.json exists in isPlugin() specifically so example fixtures can self-mark without touching guarded paths. Three independent assertions (3/3 must pass): direct trifecta present and CRITICAL; finding mentions the exfil-helper component; description confirms "no hook guards detected" (proves the mitigation path stayed inactive). expected-findings.md documents the contract. OWASP / framework mapping: ASI01, ASI02, ASI05, LLM01, LLM02, LLM06. Docs updated: plugin README "Other runnable examples", plugin CLAUDE.md "Examples" tabellen, CHANGELOG [Unreleased] Added. [skip-docs] is appropriate because examples don't change what the plugin "synes å dekke utad" — marketplace root README is unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 15:15:04 +02:00 · 2026-05-05 15:15:04 +02:00 · 92fb0087fa
commit 92fb0087fa
parent 15607b182e
8 changed files with 422 additions and 0 deletions
--- a/plugins/llm-security/examples/toxic-agent-demo/README.md
+++ b/plugins/llm-security/examples/toxic-agent-demo/README.md
@ -0,0 +1,144 @@
+# Toxic-Flow Walkthrough — Single-Component Lethal Trifecta
+
+> **WARNING: This is a demonstration fixture, NOT a real attack.**
+> The fixture agent is deliberately misconfigured. It is never
+> loaded by Claude Code — the run script only feeds the directory
+> to the deterministic scanner.
+
+## What this demonstrates
+
+`scanners/toxic-flow-analyzer.mjs` (TFA scanner) detects **lethal
+trifecta** patterns at the *plugin component* level. Where every
+other scanner in this plugin looks at file content, TFA looks at
+*capability combinations*: which agents/commands/skills hold which
+tools, and which keywords or prior-scanner findings light up which
+of the three trifecta legs.
+
+The lethal trifecta (Willison / Invariant Labs):
+
+1. **Untrusted input surface** — the component is exposed to data
+   an attacker can control (Bash stdin, MCP output, `$ARGUMENTS`,
+   remote URLs, …).
+2. **Sensitive data access** — the component can read project
+   secrets (`Read`, `Glob`, `Grep`, `Bash`-via-`cat`, …).
+3. **Exfiltration sink** — the component can move data out of
+   the process boundary (`WebFetch`, `Bash`-via-`curl`, sub-agent
+   delegation, …).
+
+When all three meet in a single component **and** no hook guards
+are active, TFA emits a CRITICAL `Lethal trifecta:` finding. With
+guards present, severity downgrades to HIGH or MEDIUM.
+
+## Fixture layout
+
+```
+examples/toxic-agent-demo/
+  fixture/
+    plugin.fixture.json              # plugin marker (recognised by
+                                     # toxic-flow-analyzer.isPlugin())
+    agents/
+      exfil-helper.fixture.md        # tools: [Bash, Read, WebFetch]
+                                     #   - description names "untrusted user input" + "remote URL"
+                                     #   - body lists .env / ~/.aws / keychain / secret
+                                     #   - body references webhook / upload / curl --data
+  README.md                          # this file
+  run-toxic-flow.mjs                 # walkthrough runner
+  expected-findings.md               # testable contract
+```
+
+The plugin marker is `plugin.fixture.json` (not `.claude-plugin/plugin.json`)
+because the plugin's own `pre-write-pathguard.mjs` hook blocks all
+writes inside `.claude-plugin/` — `plugin.fixture.json` is a
+sentinel file `toxic-flow-analyzer.isPlugin()` recognises
+specifically so example fixtures can mark themselves as plugins
+without touching guarded paths.
+
+The fixture deliberately has no `hooks/hooks.json`, so TFA's
+mitigation logic finds neither an exfil guard
+(`pre-bash-destructive` / `post-mcp-verify` /
+`pre-install-supply-chain`) nor an input guard
+(`pre-prompt-inject-scan`) and keeps the finding at CRITICAL.
+
+## How to run
+
+```bash
+cd plugins/llm-security
+node examples/toxic-agent-demo/run-toxic-flow.mjs
+
+# Verbose — full per-finding listing with evidence string
+node examples/toxic-agent-demo/run-toxic-flow.mjs --verbose
+```
+
+Expected: `3 pass, 0 fail` with 1 CRITICAL `Lethal trifecta:
+exfil-helper (agent)` finding.
+
+## Scanner involved
+
+- **`scanners/toxic-flow-analyzer.mjs`** — invoked directly via
+  `import { scan }`. Takes `(targetPath, discovery, priorResults)`.
+  In this walkthrough `priorResults` is `{}` (no upstream scanners)
+  so the trifecta is detected from frontmatter + keywords alone.
+  In the orchestrated form (`scan-orchestrator.mjs`), TFA runs
+  LAST and consumes findings from all 9 prior scanners (UNI, ENT,
+  PRM, DEP, TNT, GIT, NET, MEM, SCR), which can promote
+  classifications via the enrichment pass in
+  `enrichFromPriorResults()`.
+
+## Why TFA is special
+
+Other scanners detect dangerous content. TFA detects dangerous
+*architecture* — combinations that no individual file would trip,
+but that together complete an exfiltration chain. A plugin can be
+clean by every per-file check and still ship a single agent that
+holds Bash + Read + WebFetch, in which case one prompt-injection
+chain on that agent reads `.env` and uploads it.
+
+This is a defense-in-depth complement to:
+
+| Layer | What it covers |
+|-------|----------------|
+| `permission-mapper` | Excessive-tool advisories per component |
+| `taint-tracer` | LLM01/LLM02 in code paths |
+| `pre-prompt-inject-scan` | Runtime injection in user prompts |
+| `post-session-guard` | Runtime trifecta across tool calls (Rule of Two) |
+| **`toxic-flow-analyzer`** | **Capability combinations across plugin surface** |
+
+`post-session-guard` is the runtime sibling of TFA — see
+`examples/lethal-trifecta-walkthrough/` for the runtime view of
+the same trifecta concept.
+
+## OWASP / framework mapping
+
+| Code | Framework | Why |
+|------|-----------|-----|
+| ASI01 | OWASP Agentic Top 10 | Memory / tool poisoning leading to action |
+| ASI02 | OWASP Agentic Top 10 | Tool misuse via excess capability |
+| ASI05 | OWASP Agentic Top 10 | Cascading hallucination / chained capability |
+| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection feeds the input leg |
+| LLM02 | OWASP LLM Top 10 (2025) | Sensitive information disclosure on data-leg activation |
+| LLM06 | OWASP LLM Top 10 (2025) | Excessive Agency — too many tools on one component |
+| MCP1 | OWASP MCP Top 10 | MCP-borne untrusted input strengthens leg 1 (not exercised in this fixture) |
+| MCP3 | OWASP MCP Top 10 | MCP-borne data-access likewise (not exercised) |
+
+## Limitations
+
+- The fixture exercises TFA in **isolation** (`priorResults = {}`).
+  The orchestrated `scan-orchestrator.mjs` flow runs TFA after
+  9 other scanners and may classify additional legs via the
+  enrichment pass — leading to more findings or higher severity
+  on real plugins than this minimal example shows.
+- TFA's keyword + tool sets are fixed. A novel exfil verb that
+  doesn't match the keyword list would not light up the leg-3
+  flag without a confirming prior-scanner finding.
+- TFA only runs on plugin-shaped targets (per `isPlugin()`).
+  Standalone scripts and non-plugin repos are skipped — TFA is
+  meant to audit the plugin attack surface, not arbitrary code.
+
+## See also
+
+- `scanners/toxic-flow-analyzer.mjs` — scanner source
+- `tests/lib/toxic-flow-analyzer.test.mjs` — unit-test contract
+- `examples/lethal-trifecta-walkthrough/` — runtime trifecta
+  (post-session-guard, Rule of Two, sliding window)
+- `knowledge/owasp-agentic-top10.md` — ASI01 / ASI02 / ASI05 background
+- `expected-findings.md` (in this folder) — the testable contract