feat(llm-security): add toxic-agent-demo example for TFA scanner [skip-docs]

Single-component lethal-trifecta walkthrough that drives
scanners/toxic-flow-analyzer.mjs against a deliberately misconfigured
fixture plugin.

The fixture agent declares tools: [Bash, Read, WebFetch], which alone
covers all three trifecta legs (input surface + data access + exfil
sink). No hooks/hooks.json is shipped, so TFA's mitigation logic finds
no active guards and emits a CRITICAL "Lethal trifecta:" finding
without downgrade.

Plugin marker is plugin.fixture.json (recognised by isPlugin()) rather
than .claude-plugin/plugin.json. The latter is blocked by the plugin's
own pre-write-pathguard hook, and plugin.fixture.json exists in
isPlugin() specifically so example fixtures can self-mark without
touching guarded paths.

Three independent assertions (3/3 must pass):
- direct trifecta present and CRITICAL
- finding mentions the exfil-helper component
- description confirms "no hook guards detected" (proves the
  mitigation path stayed inactive)

expected-findings.md documents the contract.

OWASP / framework mapping: ASI01, ASI02, ASI05, LLM01, LLM02, LLM06.

Docs updated: plugin README "Other runnable examples", plugin
CLAUDE.md "Examples" table, CHANGELOG [Unreleased] Added.

[skip-docs] is appropriate because examples don't change what the
plugin "appears to cover externally"; the marketplace root README is
unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 15607b182e
commit 92fb0087fa
8 changed files with 422 additions and 0 deletions
144
plugins/llm-security/examples/toxic-agent-demo/README.md
Normal file
@ -0,0 +1,144 @@
# Toxic-Flow Walkthrough: Single-Component Lethal Trifecta

> **WARNING: This is a demonstration fixture, NOT a real attack.**
> The fixture agent is deliberately misconfigured. It is never
> loaded by Claude Code; the run script only feeds the directory
> to the deterministic scanner.

## What this demonstrates

`scanners/toxic-flow-analyzer.mjs` (TFA scanner) detects **lethal
trifecta** patterns at the *plugin component* level. Where every
other scanner in this plugin looks at file content, TFA looks at
*capability combinations*: which agents/commands/skills hold which
tools, and which keywords or prior-scanner findings light up which
of the three trifecta legs.

The lethal trifecta (Willison / Invariant Labs):

1. **Untrusted input surface**: the component is exposed to data
   an attacker can control (Bash stdin, MCP output, `$ARGUMENTS`,
   remote URLs, …).
2. **Sensitive data access**: the component can read project
   secrets (`Read`, `Glob`, `Grep`, `Bash`-via-`cat`, …).
3. **Exfiltration sink**: the component can move data out of
   the process boundary (`WebFetch`, `Bash`-via-`curl`, sub-agent
   delegation, …).

When all three meet in a single component **and** no hook guards
are active, TFA emits a CRITICAL `Lethal trifecta:` finding. With
guards present, severity downgrades to HIGH or MEDIUM.

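The rule above can be illustrated with a minimal sketch. It is not the scanner's code: the tool-to-leg table, the function names, and in particular the mapping from number of active guards to HIGH versus MEDIUM are assumptions made for illustration (the text only states that guards downgrade to "HIGH or MEDIUM").

```javascript
// Hypothetical tool-to-leg table; the real scanner's sets may differ.
const LEG_TOOLS = {
  input: new Set(["Bash", "WebFetch"]),        // leg 1: untrusted input surface
  data: new Set(["Read", "Glob", "Grep", "Bash"]), // leg 2: sensitive data access
  exfil: new Set(["WebFetch", "Bash"]),        // leg 3: exfiltration sink
};

// Returns { input, data, exfil } booleans for a component's declared tools.
function classifyLegs(tools) {
  const legs = {};
  for (const [leg, toolSet] of Object.entries(LEG_TOOLS)) {
    legs[leg] = tools.some((tool) => toolSet.has(tool));
  }
  return legs;
}

// Returns null when the trifecta is incomplete, otherwise a severity label.
// Assumption: one active guard gives HIGH, two give MEDIUM.
function trifectaSeverity(tools, guards = { exfil: false, input: false }) {
  const legs = classifyLegs(tools);
  if (!(legs.input && legs.data && legs.exfil)) return null;
  const activeGuards = Number(guards.exfil) + Number(guards.input);
  return ["CRITICAL", "HIGH", "MEDIUM"][activeGuards];
}

console.log(trifectaSeverity(["Bash", "Read", "WebFetch"])); // CRITICAL
console.log(trifectaSeverity(["Read"]));                     // null
```

Under these assumed tables, the fixture's `[Bash, Read, WebFetch]` tool list completes all three legs on its own, which is why no keyword evidence is strictly needed to reach the finding in this sketch.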
## Fixture layout

```
examples/toxic-agent-demo/
  fixture/
    plugin.fixture.json          # plugin marker (recognised by
                                 # toxic-flow-analyzer.isPlugin())
    agents/
      exfil-helper.fixture.md    # tools: [Bash, Read, WebFetch]
                                 # - description names "untrusted user input" + "remote URL"
                                 # - body lists .env / ~/.aws / keychain / secret
                                 # - body references webhook / upload / curl --data
  README.md                      # this file
  run-toxic-flow.mjs             # walkthrough runner
  expected-findings.md           # testable contract
```

The plugin marker is `plugin.fixture.json` (not `.claude-plugin/plugin.json`)
because the plugin's own `pre-write-pathguard.mjs` hook blocks all
writes inside `.claude-plugin/`. `plugin.fixture.json` is a
sentinel file that `toxic-flow-analyzer.isPlugin()` recognises
specifically so example fixtures can mark themselves as plugins
without touching guarded paths.

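A marker check of this kind might look like the sketch below. It is not the real `isPlugin()`: the function name `hasPluginMarker`, the idea of passing a list of relative paths, and the exact marker list are assumptions for illustration.

```javascript
// Assumed marker list: the regular manifest plus the fixture sentinel.
const PLUGIN_MARKERS = [".claude-plugin/plugin.json", "plugin.fixture.json"];

// relativePaths: file paths relative to the scan target, using "/" separators.
// Returns true when at least one marker file is present.
function hasPluginMarker(relativePaths) {
  const present = new Set(relativePaths);
  return PLUGIN_MARKERS.some((marker) => present.has(marker));
}

const fixtureFiles = ["plugin.fixture.json", "agents/exfil-helper.fixture.md"];
console.log(hasPluginMarker(fixtureFiles));                       // true
console.log(hasPluginMarker(["agents/exfil-helper.fixture.md"])); // false
```

Accepting a second, unguarded marker keeps the fixture scannable without weakening the path guard on `.claude-plugin/`.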
The fixture deliberately has no `hooks/hooks.json`, so TFA's
mitigation logic finds neither an exfil guard
(`pre-bash-destructive` / `post-mcp-verify` /
`pre-install-supply-chain`) nor an input guard
(`pre-prompt-inject-scan`) and keeps the finding at CRITICAL.

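As a rough illustration of the guard lookup, the sketch below searches the text of a hooks file for the guard names listed above. The function name `detectGuards`, the substring-matching strategy, and the sample command string are assumptions; the real scanner may parse the hook configuration differently.

```javascript
// Guard names as listed in the paragraph above.
const EXFIL_GUARDS = ["pre-bash-destructive", "post-mcp-verify", "pre-install-supply-chain"];
const INPUT_GUARDS = ["pre-prompt-inject-scan"];

// hooksJsonText: contents of hooks/hooks.json, or null when the file is absent.
// Assumption: a guard counts as active if its name appears anywhere in the file.
function detectGuards(hooksJsonText) {
  if (hooksJsonText === null) return { exfil: false, input: false };
  const mentions = (name) => hooksJsonText.includes(name);
  return {
    exfil: EXFIL_GUARDS.some(mentions),
    input: INPUT_GUARDS.some(mentions),
  };
}

// The fixture ships no hooks file, so both guard classes are inactive.
console.log(detectGuards(null)); // { exfil: false, input: false }

// Hypothetical hooks file content that wires up only the input guard.
const sample = '{"command": "node hooks/scripts/pre-prompt-inject-scan.mjs"}';
console.log(detectGuards(sample)); // { exfil: false, input: true }
```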
## How to run

```bash
cd plugins/llm-security
node examples/toxic-agent-demo/run-toxic-flow.mjs

# Verbose: full per-finding listing with evidence string
node examples/toxic-agent-demo/run-toxic-flow.mjs --verbose
```

Expected: `3 pass, 0 fail` with 1 CRITICAL
`Lethal trifecta: exfil-helper (agent)` finding.

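The three pass/fail checks behind that summary line can be sketched as follows. The finding shape (`severity`, `title`, `description`), the function name `checkContract`, and the sample finding's description text are assumptions for illustration; `expected-findings.md` remains the actual contract.

```javascript
// Hypothetical finding shape: { severity, title, description }.
function checkContract(findings) {
  const trifecta = findings.filter((f) => f.title.startsWith("Lethal trifecta:"));
  const checks = [
    ["direct trifecta present and CRITICAL",
      trifecta.some((f) => f.severity === "CRITICAL")],
    ["finding mentions the exfil-helper component",
      trifecta.some((f) => f.title.includes("exfil-helper"))],
    ["description confirms no hook guards detected",
      trifecta.some((f) => f.description.includes("no hook guards detected"))],
  ];
  const pass = checks.filter(([, ok]) => ok).length;
  return { pass, fail: checks.length - pass, checks };
}

// Made-up sample finding, shaped like the expected result of the walkthrough.
const sampleFindings = [{
  severity: "CRITICAL",
  title: "Lethal trifecta: exfil-helper (agent)",
  description: "All three legs present; no hook guards detected.",
}];

const summary = checkContract(sampleFindings);
console.log(`${summary.pass} pass, ${summary.fail} fail`); // 3 pass, 0 fail
```

Keeping the three checks independent means a regression in the mitigation path (for example, a guard wrongly detected) fails one check without masking the other two.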
## Scanner involved

- **`scanners/toxic-flow-analyzer.mjs`**: invoked directly via
  `import { scan }`. Takes `(targetPath, discovery, priorResults)`.
  In this walkthrough `priorResults` is `{}` (no upstream scanners)
  so the trifecta is detected from frontmatter + keywords alone.
  In the orchestrated form (`scan-orchestrator.mjs`), TFA runs
  last and consumes findings from all 9 prior scanners (UNI, ENT,
  PRM, DEP, TNT, GIT, NET, MEM, SCR), which can promote
  classifications via the enrichment pass in
  `enrichFromPriorResults()`.

## Why TFA is special

Other scanners detect dangerous content. TFA detects dangerous
*architecture*: combinations that no individual file would trip,
but that together complete an exfiltration chain. A plugin can be
clean by every per-file check and still ship a single agent that
holds Bash + Read + WebFetch, in which case a single successful
prompt injection against that agent can read `.env` and upload it.

This is a defense-in-depth complement to:

| Layer | What it covers |
|-------|----------------|
| `permission-mapper` | Excessive-tool advisories per component |
| `taint-tracer` | LLM01/LLM02 in code paths |
| `pre-prompt-inject-scan` | Runtime injection in user prompts |
| `post-session-guard` | Runtime trifecta across tool calls (Rule of Two) |
| **`toxic-flow-analyzer`** | **Capability combinations across plugin surface** |

`post-session-guard` is the runtime sibling of TFA. See
`examples/lethal-trifecta-walkthrough/` for the runtime view of
the same trifecta concept.

## OWASP / framework mapping

| Code | Framework | Why |
|------|-----------|-----|
| ASI01 | OWASP Agentic Top 10 | Memory / tool poisoning leading to action |
| ASI02 | OWASP Agentic Top 10 | Tool misuse via excess capability |
| ASI05 | OWASP Agentic Top 10 | Cascading hallucination / chained capability |
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection feeds the input leg |
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive information disclosure on data-leg activation |
| LLM06 | OWASP LLM Top 10 (2025) | Excessive Agency: too many tools on one component |
| MCP1 | OWASP MCP Top 10 | MCP-borne untrusted input strengthens leg 1 (not exercised in this fixture) |
| MCP3 | OWASP MCP Top 10 | MCP-borne data-access likewise (not exercised) |

## Limitations

- The fixture exercises TFA in **isolation** (`priorResults = {}`).
  The orchestrated `scan-orchestrator.mjs` flow runs TFA after
  9 other scanners and may classify additional legs via the
  enrichment pass, which can lead to more findings or higher
  severity on real plugins than this minimal example shows.
- TFA's keyword + tool sets are fixed. A novel exfil verb that
  doesn't match the keyword list would not light up the leg-3
  flag without a confirming prior-scanner finding.
- TFA only runs on plugin-shaped targets (per `isPlugin()`).
  Standalone scripts and non-plugin repos are skipped, because
  TFA is meant to audit the plugin attack surface, not arbitrary
  code.

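The fixed-keyword limitation can be seen in a small sketch. The keyword list here is an assumption drawn from the phrases the fixture body uses (webhook / upload / curl --data); the scanner's real list is likely longer, and `mentionsExfil` is a hypothetical name.

```javascript
// Assumed keyword list, taken from the phrases the fixture body uses.
const EXFIL_KEYWORDS = ["webhook", "upload", "curl --data"];

// Returns true when the text contains any known exfil keyword.
function mentionsExfil(text) {
  const lower = text.toLowerCase();
  return EXFIL_KEYWORDS.some((keyword) => lower.includes(keyword));
}

console.log(mentionsExfil("POST the file to the webhook")); // true
console.log(mentionsExfil("beam the file to a pastebin"));  // false
```

The second call describes the same intent with wording outside the list, so a keyword-only pass would leave leg 3 unflagged unless the tool list or a prior-scanner finding covers it.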
## See also

- `scanners/toxic-flow-analyzer.mjs`: scanner source
- `tests/lib/toxic-flow-analyzer.test.mjs`: unit-test contract
- `examples/lethal-trifecta-walkthrough/`: runtime trifecta
  (post-session-guard, Rule of Two, sliding window)
- `knowledge/owasp-agentic-top10.md`: ASI01 / ASI02 / ASI05 background
- `expected-findings.md` (in this folder): the testable contract