
# Toxic-Flow Walkthrough — Single-Component Lethal Trifecta

> **WARNING: This is a demonstration fixture, NOT a real attack.**
> The fixture agent is deliberately misconfigured. It is never
> loaded by Claude Code — the run script only feeds the directory
> to the deterministic scanner.

## What this demonstrates

`scanners/toxic-flow-analyzer.mjs` (TFA scanner) detects **lethal
trifecta** patterns at the *plugin component* level. Where every
other scanner in this plugin looks at file content, TFA looks at
*capability combinations*: which agents/commands/skills hold which
tools, and which keywords or prior-scanner findings light up which
of the three trifecta legs.

The lethal trifecta (Willison / Invariant Labs):

1. **Untrusted input surface** — the component is exposed to data
   an attacker can control (Bash stdin, MCP output, `$ARGUMENTS`,
   remote URLs, …).
2. **Sensitive data access** — the component can read project
   secrets (`Read`, `Glob`, `Grep`, `Bash`-via-`cat`, …).
3. **Exfiltration sink** — the component can move data out of
   the process boundary (`WebFetch`, `Bash`-via-`curl`, sub-agent
   delegation, …).

When all three meet in a single component **and** no hook guards
are active, TFA emits a CRITICAL `Lethal trifecta:` finding. With
guards present, the severity is downgraded to HIGH or MEDIUM.
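
The three legs can be pictured as a per-component classification followed by a simple conjunction. The sketch below is a hedged illustration, not the scanner's code: the `LEG_RULES` table, the keyword hints, and the helper names `classifyLegs` and `isLethalTrifecta` are all assumptions made for this example.

```javascript
// ASSUMPTION: these tool lists and keyword hints are illustrative only;
// the scanner's actual tables are not reproduced here.
const LEG_RULES = {
  input: { tools: ['WebFetch'], bashHints: /untrusted|stdin|\$ARGUMENTS/i },
  data: { tools: ['Read', 'Glob', 'Grep'], bashHints: /\bcat\b|\.env|secret/i },
  exfil: { tools: ['WebFetch'], bashHints: /\bcurl\b|webhook|upload/i },
};

// A leg is active when the component holds a dedicated tool for it, or
// holds Bash and its text contains a keyword hint for that leg.
function classifyLegs(tools, text = '') {
  const legs = {};
  for (const [leg, rule] of Object.entries(LEG_RULES)) {
    const direct = rule.tools.some((tool) => tools.includes(tool));
    const viaBash = tools.includes('Bash') && rule.bashHints.test(text);
    legs[leg] = direct || viaBash;
  }
  return legs;
}

// The trifecta is complete only when all three legs are active at once.
function isLethalTrifecta(legs) {
  return legs.input && legs.data && legs.exfil;
}

const fixtureLegs = classifyLegs(['Bash', 'Read', 'WebFetch']);
console.log(isLethalTrifecta(fixtureLegs)); // true

const readOnlyLegs = classifyLegs(['Read', 'Grep']);
console.log(isLethalTrifecta(readOnlyLegs)); // false
```

In this sketch a component that only reads files activates the data leg alone, so no trifecta is reported for it.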

## Fixture layout

```
examples/toxic-agent-demo/
  fixture/
    plugin.fixture.json              # plugin marker (recognised by
                                     # toxic-flow-analyzer.isPlugin())
    agents/
      exfil-helper.fixture.md        # tools: [Bash, Read, WebFetch]
                                     #   - description names "untrusted user input" + "remote URL"
                                     #   - body lists .env / ~/.aws / keychain / secret
                                     #   - body references webhook / upload / curl --data
  README.md                          # this file
  run-toxic-flow.mjs                 # walkthrough runner
  expected-findings.md               # testable contract
```

The plugin marker is `plugin.fixture.json` (not `.claude-plugin/plugin.json`)
because the plugin's own `pre-write-pathguard.mjs` hook blocks all
writes inside `.claude-plugin/` — `plugin.fixture.json` is a
sentinel file `toxic-flow-analyzer.isPlugin()` recognises
specifically so example fixtures can mark themselves as plugins
without touching guarded paths.
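
A marker check of this kind can be sketched in a few lines. This is a hypothetical stand-in, not the scanner's `isPlugin()`: the name `looksLikePlugin` is invented here, and the real function may accept other markers or inspect the file contents.

```javascript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical stand-in for the scanner's isPlugin(): a directory counts
// as a plugin if either the regular marker or the fixture sentinel exists.
function looksLikePlugin(dir) {
  return (
    existsSync(join(dir, '.claude-plugin', 'plugin.json')) ||
    existsSync(join(dir, 'plugin.fixture.json'))
  );
}

// Demo in a throwaway directory: no marker first, then the sentinel file.
const demoDir = mkdtempSync(join(tmpdir(), 'tfa-demo-'));
const beforeMarker = looksLikePlugin(demoDir);
writeFileSync(join(demoDir, 'plugin.fixture.json'), '{}');
const afterMarker = looksLikePlugin(demoDir);
rmSync(demoDir, { recursive: true, force: true });

console.log({ beforeMarker, afterMarker });
```

The sentinel is written at the directory root, so nothing under `.claude-plugin/` is ever touched.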

The fixture deliberately has no `hooks/hooks.json`, so TFA's
mitigation logic finds neither an exfil guard
(`pre-bash-destructive` / `post-mcp-verify` /
`pre-install-supply-chain`) nor an input guard
(`pre-prompt-inject-scan`) and keeps the finding at CRITICAL.
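
The mitigation step can be sketched as a lookup of guard names in the hooks configuration followed by a severity choice. The guard names come from the paragraph above; everything else is an assumption for illustration, including the helper names, the shape of the hooks object, the script path, and in particular the rule that one guard kind yields HIGH and both yield MEDIUM.

```javascript
const EXFIL_GUARDS = [
  'pre-bash-destructive',
  'post-mcp-verify',
  'pre-install-supply-chain',
];
const INPUT_GUARDS = ['pre-prompt-inject-scan'];

// Search the serialized hooks config for known guard names. A missing
// hooks file is modelled as null and yields no guards.
function detectGuards(hooksConfig) {
  const text = hooksConfig ? JSON.stringify(hooksConfig) : '';
  return {
    exfil: EXFIL_GUARDS.some((name) => text.includes(name)),
    input: INPUT_GUARDS.some((name) => text.includes(name)),
  };
}

// ASSUMPTION: no guards -> CRITICAL, one guard kind -> HIGH, both -> MEDIUM.
// The README only states that guards downgrade to HIGH or MEDIUM.
function severityFor(guards) {
  const guardKinds = Number(guards.exfil) + Number(guards.input);
  return ['CRITICAL', 'HIGH', 'MEDIUM'][guardKinds];
}

// The fixture has no hooks/hooks.json at all.
console.log(severityFor(detectGuards(null))); // CRITICAL

// Hypothetical config with one exfil guard; the script path is made up.
const guardedConfig = {
  hooks: {
    PreToolUse: [
      {
        matcher: 'Bash',
        hooks: [{ type: 'command', command: 'node hooks/scripts/pre-bash-destructive.mjs' }],
      },
    ],
  },
};
console.log(severityFor(detectGuards(guardedConfig))); // HIGH
```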

## How to run

```bash
cd plugins/llm-security
node examples/toxic-agent-demo/run-toxic-flow.mjs

# Verbose — full per-finding listing with evidence string
node examples/toxic-agent-demo/run-toxic-flow.mjs --verbose
```

Expected: `3 pass, 0 fail` with 1 CRITICAL `Lethal trifecta:
exfil-helper (agent)` finding.

## Scanner involved

- **`scanners/toxic-flow-analyzer.mjs`** — invoked directly via
  `import { scan }`. Takes `(targetPath, discovery, priorResults)`.
  In this walkthrough `priorResults` is `{}` (no upstream scanners)
  so the trifecta is detected from frontmatter + keywords alone.
  In the orchestrated form (`scan-orchestrator.mjs`), TFA runs
  LAST and consumes findings from all 9 prior scanners (UNI, ENT,
  PRM, DEP, TNT, GIT, NET, MEM, SCR), which can promote
  classifications via the enrichment pass in
  `enrichFromPriorResults()`.
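
To make the idea of promotion concrete, here is a minimal sketch of how upstream findings could switch on a leg that frontmatter and keywords left inactive. It does not reproduce `enrichFromPriorResults()`: the function name `enrichLegsSketch`, the shape of `priorResults`, and the mapping from scanner codes to legs are all assumptions.

```javascript
// ASSUMPTION: which scanner code feeds which leg is purely illustrative.
const SCANNER_TO_LEG = { TNT: 'input', ENT: 'data', NET: 'exfil' };

// ASSUMPTION: priorResults is keyed by scanner code and each entry holds
// a findings array whose items carry a `file` path.
function enrichLegsSketch(legs, priorResults, componentFile) {
  const enriched = { ...legs };
  for (const [code, result] of Object.entries(priorResults)) {
    const leg = SCANNER_TO_LEG[code];
    if (!leg) continue; // scanner not mapped to any leg in this sketch
    const hit = (result.findings ?? []).some((f) => f.file === componentFile);
    if (hit) enriched[leg] = true; // promotion only, never demotion
  }
  return enriched;
}

const baseLegs = { input: true, data: true, exfil: false };

// Isolated run, as in this walkthrough: nothing changes.
const isolated = enrichLegsSketch(baseLegs, {}, 'agents/helper.md');
console.log(isolated.exfil); // false

// Orchestrated run: a network finding on the same file promotes leg 3.
const orchestrated = enrichLegsSketch(
  baseLegs,
  { NET: { findings: [{ file: 'agents/helper.md' }] } },
  'agents/helper.md',
);
console.log(orchestrated.exfil); // true
```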

## Why TFA is special

Other scanners detect dangerous content. TFA detects dangerous
*architecture*: capability combinations that no per-file check
would flag, but that together complete an exfiltration chain. A
plugin can pass every per-file check and still ship a single agent
that holds Bash + Read + WebFetch. One successful prompt injection
against that agent is then enough to read `.env` and upload it.

TFA is one layer in a defense-in-depth stack:

| Layer | What it covers |
|-------|----------------|
| `permission-mapper` | Excessive-tool advisories per component |
| `taint-tracer` | LLM01/LLM02 in code paths |
| `pre-prompt-inject-scan` | Runtime injection in user prompts |
| `post-session-guard` | Runtime trifecta across tool calls (Rule of Two) |
| **`toxic-flow-analyzer`** | **Capability combinations across plugin surface** |

`post-session-guard` is the runtime sibling of TFA — see
`examples/lethal-trifecta-walkthrough/` for the runtime view of
the same trifecta concept.

## OWASP / framework mapping

| Code | Framework | Why |
|------|-----------|-----|
| ASI01 | OWASP Agentic Top 10 | Memory / tool poisoning leading to action |
| ASI02 | OWASP Agentic Top 10 | Tool misuse via excess capability |
| ASI05 | OWASP Agentic Top 10 | Cascading hallucination / chained capability |
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection feeds the input leg |
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive information disclosure on data-leg activation |
| LLM06 | OWASP LLM Top 10 (2025) | Excessive Agency — too many tools on one component |
| MCP1 | OWASP MCP Top 10 | MCP-borne untrusted input strengthens leg 1 (not exercised in this fixture) |
| MCP3 | OWASP MCP Top 10 | MCP-borne data access strengthens leg 2 (not exercised in this fixture) |

## Limitations

- The fixture exercises TFA in **isolation** (`priorResults = {}`).
  The orchestrated `scan-orchestrator.mjs` flow runs TFA after
  9 other scanners and may classify additional legs via the
  enrichment pass — leading to more findings or higher severity
  on real plugins than this minimal example shows.
- TFA's keyword + tool sets are fixed. A novel exfil verb that
  doesn't match the keyword list would not light up the leg-3
  flag without a confirming prior-scanner finding.
- TFA only runs on plugin-shaped targets (per `isPlugin()`).
  Standalone scripts and non-plugin repos are skipped — TFA is
  meant to audit the plugin attack surface, not arbitrary code.
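
The fixed-keyword limitation can be shown with a short sketch. The keyword list below is drawn from the fixture description in this README and is only illustrative; the scanner's real list and matching logic are not reproduced, and `matchesExfilKeyword` is a hypothetical helper.

```javascript
// Illustrative keyword list; the scanner's real list may differ.
const EXFIL_KEYWORDS = ['webhook', 'upload', 'curl --data'];

// Case-insensitive substring match against the fixed list.
function matchesExfilKeyword(text) {
  const lower = text.toLowerCase();
  return EXFIL_KEYWORDS.some((keyword) => lower.includes(keyword));
}

// A known verb is caught.
console.log(matchesExfilKeyword('POST results to the team webhook')); // true

// A novel phrasing with the same intent slips past the list.
console.log(matchesExfilKeyword('beam the file to our collector')); // false
```

In the second case only a confirming prior-scanner finding could still activate leg 3.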

## See also

- `scanners/toxic-flow-analyzer.mjs` — scanner source
- `tests/lib/toxic-flow-analyzer.test.mjs` — unit-test contract
- `examples/lethal-trifecta-walkthrough/` — runtime trifecta
  (post-session-guard, Rule of Two, sliding window)
- `knowledge/owasp-agentic-top10.md` — ASI01 / ASI02 / ASI05 background
- `expected-findings.md` (in this folder) — the testable contract