Kjell Tore Guttormsen 92fb0087fa feat(llm-security): add toxic-agent-demo example for TFA scanner [skip-docs]

Single-component lethal-trifecta walkthrough that drives
scanners/toxic-flow-analyzer.mjs against a deliberately
misconfigured fixture plugin. The fixture agent declares
tools: [Bash, Read, WebFetch], which alone covers all three
trifecta legs (input surface + data access + exfil sink). No
hooks/hooks.json is shipped, so TFA's mitigation logic finds
no active guards and emits a CRITICAL "Lethal trifecta:"
finding without downgrade.

Plugin marker is plugin.fixture.json (recognised by isPlugin())
rather than .claude-plugin/plugin.json — the latter is blocked
by the plugin's own pre-write-pathguard hook, and
plugin.fixture.json exists in isPlugin() specifically so
example fixtures can self-mark without touching guarded paths.

Three independent assertions (3/3 must pass): direct trifecta
present and CRITICAL; finding mentions the exfil-helper
component; description confirms "no hook guards detected"
(proves the mitigation path stayed inactive). expected-findings.md
documents the contract.

OWASP / framework mapping: ASI01, ASI02, ASI05, LLM01, LLM02, LLM06.

Docs updated: plugin README "Other runnable examples", plugin
CLAUDE.md "Examples" table, CHANGELOG [Unreleased] Added.
[skip-docs] is appropriate because examples don't change what
the plugin appears to cover externally; the marketplace root
README is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-05 15:15:04 +02:00

2.9 KiB


Expected findings — toxic-agent-demo

This is the testable contract enforced by run-toxic-flow.mjs. It consists of three independent assertions; any drift means either a scanner regression or fixture rot.

Required assertions (3 / 3 must pass)

1. Direct trifecta — single component covers all 3 legs

The TFA scanner returns at least 1 finding whose title starts with Lethal trifecta:.
At least one of those findings has severity === 'critical'.
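The two conditions above could be checked roughly as follows. This is a minimal sketch, not the code in run-toxic-flow.mjs: the finding shape (`title`, `severity`, `description`), the helper name `checkDirectTrifecta`, and the sample data are assumptions made for illustration. Only the title prefix and the `'critical'` value come from this contract.

```javascript
// Hypothetical finding shape: { title, severity, description }.
// Only the 'Lethal trifecta:' prefix and the 'critical' value come from the contract.
function checkDirectTrifecta(findings) {
  const trifectas = findings.filter((f) => f.title.startsWith('Lethal trifecta:'));
  const hasCritical = trifectas.some((f) => f.severity === 'critical');
  return {
    count: trifectas.length,
    hasCritical,
    pass: trifectas.length >= 1 && hasCritical,
  };
}

// Illustrative sample, not real scanner output.
const sampleFindings = [
  {
    title: 'Lethal trifecta: exfil-helper',
    severity: 'critical',
    description: 'no hook guards detected',
  },
];

const directResult = checkDirectTrifecta(sampleFindings);
console.log(directResult.pass); // true
```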

The component covering all three legs is agents/exfil-helper.fixture.md. With tools: [Bash, Read, WebFetch], the tool-based classifier alone covers:

Leg 1 (input surface) — Bash is in INPUT_SURFACE_TOOLS.
Leg 2 (data access) — Read is in DATA_ACCESS_TOOLS; Bash also adds the "cat/find/grep capable" evidence string.
Leg 3 (exfil sink) — both Bash and WebFetch are in EXFIL_SINK_TOOLS.
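A tool-based classification along these lines can be sketched as below. The set names match those referenced above, but their contents here are an assumption: only the memberships stated in this contract are included, and the real sets in the scanner may hold more tools. The function names and the `legs` object are hypothetical.

```javascript
// Assumed set contents: only the memberships named in this contract.
const INPUT_SURFACE_TOOLS = new Set(['Bash']);
const DATA_ACCESS_TOOLS = new Set(['Read']);
const EXFIL_SINK_TOOLS = new Set(['Bash', 'WebFetch']);

// Collects evidence per trifecta leg for a component's declared tools.
function classifyTools(tools) {
  const legs = { input: [], data: [], exfil: [] };
  for (const tool of tools) {
    if (INPUT_SURFACE_TOOLS.has(tool)) legs.input.push(tool);
    if (DATA_ACCESS_TOOLS.has(tool)) legs.data.push(tool);
    if (EXFIL_SINK_TOOLS.has(tool)) legs.exfil.push(tool);
  }
  // Per the contract, Bash also contributes data-access evidence.
  if (tools.includes('Bash')) legs.data.push('Bash (cat/find/grep capable)');
  return legs;
}

// A component is a direct trifecta candidate when every leg has evidence.
function coversAllLegs(legs) {
  return Object.values(legs).every((evidence) => evidence.length > 0);
}

const fixtureLegs = classifyTools(['Bash', 'Read', 'WebFetch']);
console.log(coversAllLegs(fixtureLegs)); // true
```

Under these assumed sets, dropping Bash from the list would leave leg 1 without evidence, so the tool-based pass alone would no longer cover the trifecta.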

Keywords in description and body reinforce all three:

| Leg | Keyword(s) hit |
| --- | --- |
| 1 | `untrusted`, `user input`, `url`, `remote` |
| 2 | `secret`, `credential`, `.env`, `.aws`, `keychain` |
| 3 | `webhook`, `upload`, `curl`, `network`, `http`, `transfer`, `exfil` |
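To illustrate how keyword evidence of this kind might be collected, here is a small sketch. The keyword lists are copied from the table above; the matching strategy (case-insensitive substring search), the `keywordHits` helper, and the sample text are assumptions, and the scanner's actual matcher may work differently.

```javascript
// Keyword lists taken from the table above, keyed by leg number.
const LEG_KEYWORDS = {
  1: ['untrusted', 'user input', 'url', 'remote'],
  2: ['secret', 'credential', '.env', '.aws', 'keychain'],
  3: ['webhook', 'upload', 'curl', 'network', 'http', 'transfer', 'exfil'],
};

// Assumed strategy: case-insensitive substring match over description + body.
function keywordHits(text) {
  const haystack = text.toLowerCase();
  const hitsByLeg = {};
  for (const [leg, words] of Object.entries(LEG_KEYWORDS)) {
    hitsByLeg[leg] = words.filter((word) => haystack.includes(word));
  }
  return hitsByLeg;
}

// Illustrative text, not the real fixture body.
const sampleText = 'Fetches a remote URL, reads .env secrets, and uploads them to a webhook.';
const legHits = keywordHits(sampleText);
console.log(legHits);
```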

2. Finding mentions the exfil-helper component

The trifecta finding's title matches /exfil-helper/i. This guards against a regression where TFA emits a generic project-level fallback instead of the per-component finding.
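As a rough sketch of this check, the helper below tests a finding's title against the pattern. The pattern is the one from this contract; the helper name and both sample titles are hypothetical, since the exact wording of the per-component and fallback titles is not specified here.

```javascript
// Pattern from the contract. No 'g' flag, so RegExp.test() stays stateless.
const COMPONENT_PATTERN = /exfil-helper/i;

function mentionsComponent(finding, pattern = COMPONENT_PATTERN) {
  return pattern.test(finding.title);
}

// Hypothetical titles for illustration only.
const perComponent = { title: 'Lethal trifecta: agents/Exfil-Helper.fixture.md' };
const genericFallback = { title: 'Lethal trifecta: project-level capabilities' };

console.log(mentionsComponent(perComponent)); // true
console.log(mentionsComponent(genericFallback)); // false
```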

3. No hook guards detected

The trifecta finding's description matches /no hook guards detected/i. This proves the mitigation logic correctly identified the missing hooks/hooks.json and kept the severity at CRITICAL rather than downgrading to HIGH or MEDIUM.

If a real hooks/hooks.json is later added to the fixture, the description switches to "Mitigated by active hook guards (severity reduced)." and the severity drops to HIGH (one guard type) or MEDIUM (both guard types). This assertion would then fail, signalling that the mitigation path activated.
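The downgrade rule described here can be summarised in a short sketch. The mapping (no guards: CRITICAL, one guard type: HIGH, both: MEDIUM) and the two description strings come from this contract. The function name, the guard type labels `guardA` and `guardB`, and the lowercase `'high'` / `'medium'` values (by analogy with `'critical'`) are assumptions.

```javascript
// Maps the set of active guard types to a severity and description.
// Guard type names are placeholders; the contract only says there are two types.
function severityForGuards(activeGuardTypes) {
  const distinct = new Set(activeGuardTypes).size;
  if (distinct === 0) {
    return { severity: 'critical', description: 'no hook guards detected' };
  }
  return {
    severity: distinct === 1 ? 'high' : 'medium',
    description: 'Mitigated by active hook guards (severity reduced).',
  };
}

// Assertion 3 as a predicate over the resulting description.
function passesNoGuardAssertion(result) {
  return /no hook guards detected/i.test(result.description);
}

console.log(passesNoGuardAssertion(severityForGuards([]))); // true
console.log(passesNoGuardAssertion(severityForGuards(['guardA']))); // false
```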

Total finding shape

Total TFA findings:        1
  direct trifectas:        1
  cross-component:         0
  project-level fallback:  0
Files scanned (components): 1
Scanner status:            ok
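One way to detect drift from this shape is to compare a summary object field by field. The sketch below assumes a hypothetical summary structure; the field names are invented for illustration and the real runner may report these counts differently. Only the expected values are taken from the listing above.

```javascript
// Expected values from the listing above; field names are hypothetical.
const EXPECTED_SHAPE = {
  total: 1,
  direct: 1,
  crossComponent: 0,
  fallback: 0,
  filesScanned: 1,
  status: 'ok',
};

// Returns the names of fields whose values differ from the expected shape.
function shapeDrift(actual, expected = EXPECTED_SHAPE) {
  return Object.keys(expected).filter((key) => actual[key] !== expected[key]);
}

console.log(shapeDrift({ ...EXPECTED_SHAPE })); // []
console.log(shapeDrift({ ...EXPECTED_SHAPE, total: 2, direct: 2 })); // [ 'total', 'direct' ]
```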

If the scanner emits more than one direct trifecta, check whether a second component was added to the fixture and update this contract to match. Extra findings do not fail the assertions, but they should only appear after a deliberate fixture change.

Out of scope (intentionally)

Cross-component trifectas (would need a second agent/command splitting the legs) — see tests/lib/toxic-flow-analyzer.test.mjs for that case.
Mitigation downgrade to HIGH / MEDIUM — would need a real hooks/hooks.json referencing one of the guard scripts.
Prior-scanner enrichment via enrichFromPriorResults() — this walkthrough passes priorResults = {} to keep the demonstration reproducible and isolated. The full scan-orchestrator.mjs flow exercises that pass in production.
