Kjell Tore Guttormsen 92fb0087fa feat(llm-security): add toxic-agent-demo example for TFA scanner [skip-docs]

Single-component lethal-trifecta walkthrough that drives
scanners/toxic-flow-analyzer.mjs against a deliberately
misconfigured fixture plugin. The fixture agent declares
tools: [Bash, Read, WebFetch], which alone covers all three
trifecta legs (input surface + data access + exfil sink). No
hooks/hooks.json is shipped, so TFA's mitigation logic finds
no active guards and emits a CRITICAL "Lethal trifecta:"
finding without downgrade.

Plugin marker is plugin.fixture.json (recognised by isPlugin())
rather than .claude-plugin/plugin.json — the latter is blocked
by the plugin's own pre-write-pathguard hook, and
plugin.fixture.json exists in isPlugin() specifically so
example fixtures can self-mark without touching guarded paths.

Three independent assertions (3/3 must pass): direct trifecta
present and CRITICAL; finding mentions the exfil-helper
component; description confirms "no hook guards detected"
(proves the mitigation path stayed inactive). expected-findings.md
documents the contract.

OWASP / framework mapping: ASI01, ASI02, ASI05, LLM01, LLM02, LLM06.

Docs updated: plugin README "Other runnable examples", plugin
CLAUDE.md "Examples" table, CHANGELOG [Unreleased] Added.
[skip-docs] is appropriate because examples don't change what
the plugin "appears to cover outwardly"; the marketplace root
README is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-05 15:15:04 +02:00

6.2 KiB


Toxic-Flow Walkthrough — Single-Component Lethal Trifecta

WARNING: This is a demonstration fixture, NOT a real attack. The fixture agent is deliberately misconfigured. It is never loaded by Claude Code — the run script only feeds the directory to the deterministic scanner.

What this demonstrates

scanners/toxic-flow-analyzer.mjs (TFA scanner) detects lethal trifecta patterns at the plugin component level. Where every other scanner in this plugin looks at file content, TFA looks at capability combinations: which agents/commands/skills hold which tools, and which keywords or prior-scanner findings light up which of the three trifecta legs.

The lethal trifecta (Willison / Invariant Labs):

1. Untrusted input surface: the component is exposed to data an attacker can control (Bash stdin, MCP output, $ARGUMENTS, remote URLs, …).
2. Sensitive data access: the component can read project secrets (Read, Glob, Grep, Bash-via-cat, …).
3. Exfiltration sink: the component can move data out of the process boundary (WebFetch, Bash-via-curl, sub-agent delegation, …).

When all three meet in a single component and no hook guards are active, TFA emits a CRITICAL "Lethal trifecta:" finding. With guards present, the severity is downgraded to HIGH or MEDIUM.
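The rule above can be sketched roughly as follows. This is a minimal illustration, not the scanner's code: the tool-to-leg sets are taken from the examples in the list, placing WebFetch on the input leg (for remote URLs) is an assumption, and `LEG_TOOLS`, `classifyLegs` and `trifectaSeverity` are hypothetical names. The README does not say how the scanner chooses between HIGH and MEDIUM, so the sketch assumes one guard gives HIGH and two or more give MEDIUM.

```javascript
// Hypothetical tool-to-leg mapping based on the examples listed above.
// The real scanner's tool sets and keyword lists may differ.
const LEG_TOOLS = {
  input: new Set(["Bash", "WebFetch"]),             // attacker-controllable data can reach the component
  data: new Set(["Read", "Glob", "Grep", "Bash"]),  // can read project secrets
  exfil: new Set(["WebFetch", "Bash"]),             // can move data out of the process
};

// Return which of the three legs a component's tool list covers.
function classifyLegs(tools) {
  const legs = {};
  for (const [leg, toolSet] of Object.entries(LEG_TOOLS)) {
    legs[leg] = tools.some((tool) => toolSet.has(tool));
  }
  return legs;
}

// CRITICAL when all legs are present and no guard is active.
// Assumption: one active guard -> HIGH, two or more -> MEDIUM.
function trifectaSeverity(legs, activeGuardCount) {
  const complete = legs.input && legs.data && legs.exfil;
  if (!complete) return null; // no trifecta, no finding
  if (activeGuardCount === 0) return "CRITICAL";
  return activeGuardCount === 1 ? "HIGH" : "MEDIUM";
}

const fixtureLegs = classifyLegs(["Bash", "Read", "WebFetch"]);
console.log(fixtureLegs, trifectaSeverity(fixtureLegs, 0));
```

With the fixture's tool list all three legs are covered, so an unguarded run lands on CRITICAL, while a component holding only Read produces no trifecta finding.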

Fixture layout

examples/toxic-agent-demo/
  fixture/
    plugin.fixture.json              # plugin marker (recognised by
                                     # toxic-flow-analyzer.isPlugin())
    agents/
      exfil-helper.fixture.md        # tools: [Bash, Read, WebFetch]
                                     #   - description names "untrusted user input" + "remote URL"
                                     #   - body lists .env / ~/.aws / keychain / secret
                                     #   - body references webhook / upload / curl --data
  README.md                          # this file
  run-toxic-flow.mjs                 # walkthrough runner
  expected-findings.md               # testable contract

The plugin marker is plugin.fixture.json (not .claude-plugin/plugin.json) because the plugin's own pre-write-pathguard.mjs hook blocks all writes inside .claude-plugin/. plugin.fixture.json is a sentinel file that toxic-flow-analyzer.isPlugin() recognises specifically so that example fixtures can mark themselves as plugins without touching guarded paths.
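A marker check of this kind might look like the sketch below. It is not the real isPlugin(): `looksLikePlugin` and `PLUGIN_MARKERS` are hypothetical names, only the two marker paths named in this README are included, and the function takes a plain list of relative paths instead of reading the filesystem so the example stays self-contained.

```javascript
// Marker paths named in this README. The real isPlugin() may recognise others.
const PLUGIN_MARKERS = [".claude-plugin/plugin.json", "plugin.fixture.json"];

// relativePaths: file paths relative to the target directory, using "/" separators.
function looksLikePlugin(relativePaths) {
  const present = new Set(relativePaths);
  return PLUGIN_MARKERS.some((marker) => present.has(marker));
}

console.log(looksLikePlugin(["plugin.fixture.json", "agents/exfil-helper.fixture.md"])); // true
console.log(looksLikePlugin(["README.md", "run-toxic-flow.mjs"]));                       // false
```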

The fixture deliberately has no hooks/hooks.json, so TFA's mitigation logic finds neither an exfil guard (pre-bash-destructive / post-mcp-verify / pre-install-supply-chain) nor an input guard (pre-prompt-inject-scan) and keeps the finding at CRITICAL.
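The guard lookup described above can be approximated as below. The guard names come from this README; everything else is an assumption for illustration, including the `detectGuards` name, the rule that a guard counts as active when its script name appears anywhere in hooks/hooks.json, and the sample hook command path.

```javascript
// Guard script names as listed in this README.
const EXFIL_GUARDS = ["pre-bash-destructive", "post-mcp-verify", "pre-install-supply-chain"];
const INPUT_GUARDS = ["pre-prompt-inject-scan"];

// hooksJsonText: contents of hooks/hooks.json, or null when the file is absent.
// Assumption: a guard is "active" if its script name appears anywhere in the file.
function detectGuards(hooksJsonText) {
  const text = hooksJsonText ?? "";
  return {
    exfilGuard: EXFIL_GUARDS.some((name) => text.includes(name)),
    inputGuard: INPUT_GUARDS.some((name) => text.includes(name)),
  };
}

// The fixture ships no hooks.json, so neither guard is found.
console.log(detectGuards(null));

// Hypothetical hooks.json content with one exfil guard wired in.
const sampleHooks = JSON.stringify({ command: "node hooks/scripts/pre-bash-destructive.mjs" });
console.log(detectGuards(sampleHooks));
```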

How to run

cd plugins/llm-security
node examples/toxic-agent-demo/run-toxic-flow.mjs

# Verbose — full per-finding listing with evidence string
node examples/toxic-agent-demo/run-toxic-flow.mjs --verbose

Expected: 3 pass, 0 fail, with one CRITICAL finding titled "Lethal trifecta: exfil-helper (agent)".
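The three checks named in the commit message (trifecta present and CRITICAL, finding mentions exfil-helper, description says "no hook guards detected") could be expressed along these lines. The finding shape (`severity`, `title`, `description`), the sample description text and the `checkContract` helper are hypothetical; expected-findings.md remains the authoritative contract.

```javascript
// Hypothetical finding shape; the scanner's real field names may differ.
const sampleFindings = [
  {
    severity: "CRITICAL",
    title: "Lethal trifecta: exfil-helper (agent)",
    description: "All three legs present; no hook guards detected.",
  },
];

// Run the three independent checks and count passes and failures.
function checkContract(findings) {
  const trifecta = findings.filter((f) => f.title.startsWith("Lethal trifecta:"));
  const checks = [
    ["trifecta present and CRITICAL", trifecta.some((f) => f.severity === "CRITICAL")],
    ["finding names exfil-helper", trifecta.some((f) => f.title.includes("exfil-helper"))],
    ["mitigation path inactive", trifecta.some((f) => f.description.includes("no hook guards detected"))],
  ];
  const passed = checks.filter(([, ok]) => ok).length;
  return { passed, failed: checks.length - passed, checks };
}

const contractResult = checkContract(sampleFindings);
console.log(`${contractResult.passed} pass, ${contractResult.failed} fail`); // 3 pass, 0 fail
```

Keeping the checks independent means a regression in one (for example, a changed description string) shows up as a single failure instead of masking the other two.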

Scanner involved

scanners/toxic-flow-analyzer.mjs is invoked directly via import { scan }, which takes (targetPath, discovery, priorResults). In this walkthrough priorResults is {} (no upstream scanners), so the trifecta is detected from frontmatter and keywords alone. In the orchestrated form (scan-orchestrator.mjs), TFA runs last and consumes findings from all 9 prior scanners (UNI, ENT, PRM, DEP, TNT, GIT, NET, MEM, SCR), which can promote classifications via the enrichment pass in enrichFromPriorResults().
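To make the difference between the isolated and orchestrated runs concrete, here is a rough sketch of what "promoting" a leg from upstream findings could mean. It does not reproduce enrichFromPriorResults(): the `promoteLegs` name, the shape of `priorResults`, and the mapping from scanner prefix to leg are all assumptions made for illustration.

```javascript
// Assumption: which scanner prefix strengthens which leg. Illustrative only.
const PREFIX_TO_LEG = { TNT: "input", NET: "exfil" };

// legs: { input, data, exfil } booleans from tool/keyword classification.
// priorResults: hypothetical shape { [prefix]: { findings: [{ component }] } }.
function promoteLegs(componentName, legs, priorResults) {
  const promoted = { ...legs }; // never mutate the caller's object
  for (const [prefix, result] of Object.entries(priorResults)) {
    const leg = PREFIX_TO_LEG[prefix];
    if (!leg) continue; // prefix not mapped to a leg in this sketch
    const hit = (result.findings ?? []).some((f) => f.component === componentName);
    if (hit) promoted[leg] = true;
  }
  return promoted;
}

const baseLegs = { input: true, data: true, exfil: false };

// Isolated run, as in this walkthrough: nothing to promote.
console.log(promoteLegs("helper", baseLegs, {}));

// Orchestrated run: an upstream finding on the same component completes the trifecta.
console.log(promoteLegs("helper", baseLegs, { NET: { findings: [{ component: "helper" }] } }));
```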

Why TFA is special

Other scanners detect dangerous content. TFA detects dangerous architecture: combinations that no individual file would trip, but that together complete an exfiltration chain. A plugin can be clean by every per-file check and still ship a single agent that holds Bash + Read + WebFetch, in which case a single successful prompt injection against that agent can read .env and upload it.

This is a defense-in-depth complement to:

| Layer | What it covers |
| --- | --- |
| `permission-mapper` | Excessive-tool advisories per component |
| `taint-tracer` | LLM01/LLM02 in code paths |
| `pre-prompt-inject-scan` | Runtime injection in user prompts |
| `post-session-guard` | Runtime trifecta across tool calls (Rule of Two) |
| `toxic-flow-analyzer` | Capability combinations across plugin surface |

post-session-guard is the runtime sibling of TFA — see examples/lethal-trifecta-walkthrough/ for the runtime view of the same trifecta concept.

OWASP / framework mapping

| Code | Framework | Why |
| --- | --- | --- |
| ASI01 | OWASP Agentic Top 10 | Memory / tool poisoning leading to action |
| ASI02 | OWASP Agentic Top 10 | Tool misuse via excess capability |
| ASI05 | OWASP Agentic Top 10 | Cascading hallucination / chained capability |
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection feeds the input leg |
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive information disclosure on data-leg activation |
| LLM06 | OWASP LLM Top 10 (2025) | Excessive Agency: too many tools on one component |
| MCP1 | OWASP MCP Top 10 | MCP-borne untrusted input strengthens leg 1 (not exercised in this fixture) |
| MCP3 | OWASP MCP Top 10 | MCP-borne data-access likewise (not exercised) |

Limitations

The fixture exercises TFA in isolation (priorResults = {}). The orchestrated scan-orchestrator.mjs flow runs TFA after 9 other scanners and may classify additional legs via the enrichment pass — leading to more findings or higher severity on real plugins than this minimal example shows.
TFA's keyword + tool sets are fixed. A novel exfil verb that doesn't match the keyword list would not light up the leg-3 flag without a confirming prior-scanner finding.
TFA only runs on plugin-shaped targets (per isPlugin()). Standalone scripts and non-plugin repos are skipped — TFA is meant to audit the plugin attack surface, not arbitrary code.
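The fixed-keyword limitation can be seen in a small sketch. The three keywords are the ones the fixture body is described as referencing (webhook / upload / curl --data); the scanner's full list is not shown in this README, and `mentionsExfil` is a hypothetical helper.

```javascript
// Keywords taken from the fixture description; the scanner's full list is not shown here.
const EXFIL_KEYWORDS = ["webhook", "upload", "curl --data"];

// Case-insensitive substring match against the fixed keyword list.
function mentionsExfil(text) {
  const lower = text.toLowerCase();
  return EXFIL_KEYWORDS.some((keyword) => lower.includes(keyword));
}

console.log(mentionsExfil("Upload the report to the webhook"));  // true
console.log(mentionsExfil("Beam the report to the mothership")); // false
```

The second phrase describes the same behaviour with a verb outside the list, so a keyword-only pass would leave the exfiltration leg unset unless a prior-scanner finding confirms it.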
