feat(llm-security): add toxic-agent-demo example for TFA scanner [skip-docs]
Single-component lethal-trifecta walkthrough that drives scanners/toxic-flow-analyzer.mjs against a deliberately misconfigured fixture plugin.

The fixture agent declares tools: [Bash, Read, WebFetch], which alone covers all three trifecta legs (input surface + data access + exfil sink). No hooks/hooks.json is shipped, so TFA's mitigation logic finds no active guards and emits a CRITICAL "Lethal trifecta:" finding without downgrade.

The plugin marker is plugin.fixture.json (recognised by isPlugin()) rather than .claude-plugin/plugin.json. The latter is blocked by the plugin's own pre-write-pathguard hook, and plugin.fixture.json exists in isPlugin() specifically so example fixtures can self-mark without touching guarded paths.

Three independent assertions (3/3 must pass):
- direct trifecta present and CRITICAL
- finding mentions the exfil-helper component
- description confirms "no hook guards detected" (proves the mitigation path stayed inactive)

expected-findings.md documents the contract.

OWASP / framework mapping: ASI01, ASI02, ASI05, LLM01, LLM02, LLM06.

Docs updated: plugin README "Other runnable examples", plugin CLAUDE.md "Examples" table, CHANGELOG [Unreleased] Added.

[skip-docs] is appropriate because examples don't change what the plugin appears to cover externally; the marketplace root README is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 15607b182e
commit 92fb0087fa
8 changed files with 422 additions and 0 deletions
@@ -0,0 +1,78 @@
# Expected findings: toxic-agent-demo

This is the testable contract enforced by `run-toxic-flow.mjs`.
Three independent assertions. Any drift = scanner regression or
fixture rot.

## Required assertions (3 / 3 must pass)

### 1. Direct trifecta: single component covers all 3 legs

- The TFA scanner returns at least 1 finding whose `title`
  starts with `Lethal trifecta:`.
- At least one of those findings has `severity === 'critical'`.

The component covering all three legs is
`agents/exfil-helper.fixture.md`. With
`tools: [Bash, Read, WebFetch]`, the tool-based classifier alone
covers:

- **Leg 1** (input surface): `Bash` is in `INPUT_SURFACE_TOOLS`.
- **Leg 2** (data access): `Read` is in `DATA_ACCESS_TOOLS`;
  `Bash` also adds the "cat/find/grep capable" evidence string.
- **Leg 3** (exfil sink): both `Bash` and `WebFetch` are in
  `EXFIL_SINK_TOOLS`.
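
The tool-based classification above can be sketched roughly as follows. This is a minimal illustration, not the code in `scanners/toxic-flow-analyzer.mjs`: the set names come from this contract, but their contents are limited to the tools mentioned here, and `classifyTools` and `coversAllLegs` are hypothetical helper names.

```javascript
// Assumption: the real sets may hold more entries; only the tools named in
// this contract are listed here.
const INPUT_SURFACE_TOOLS = new Set(['Bash']);
const DATA_ACCESS_TOOLS = new Set(['Read']);
const EXFIL_SINK_TOOLS = new Set(['Bash', 'WebFetch']);

// Hypothetical helper: collect evidence strings per trifecta leg.
function classifyTools(tools) {
  const legs = { input: [], data: [], exfil: [] };
  for (const tool of tools) {
    if (INPUT_SURFACE_TOOLS.has(tool)) legs.input.push(tool);
    if (DATA_ACCESS_TOOLS.has(tool)) legs.data.push(tool);
    // Bash contributes data-access evidence through a separate evidence string.
    if (tool === 'Bash') legs.data.push('Bash (cat/find/grep capable)');
    if (EXFIL_SINK_TOOLS.has(tool)) legs.exfil.push(tool);
  }
  return legs;
}

// Hypothetical helper: a direct trifecta needs evidence on every leg.
function coversAllLegs(legs) {
  return Object.values(legs).every((evidence) => evidence.length > 0);
}

const legs = classifyTools(['Bash', 'Read', 'WebFetch']);
console.log(legs, coversAllLegs(legs));
```

Keeping evidence strings per leg, rather than a single boolean, makes it possible to report why a component was flagged.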

Keywords in description and body reinforce all three:

| Leg | Keyword(s) hit |
|-----|----------------|
| 1   | `untrusted`, `user input`, `url`, `remote` |
| 2   | `secret`, `credential`, `.env`, `.aws`, `keychain` |
| 3   | `webhook`, `upload`, `curl`, `network`, `http`, `transfer`, `exfil` |
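
A keyword pass like the one in the table could look like the sketch below. It assumes plain case-insensitive substring matching, which may differ from what the scanner actually does; `LEG_KEYWORDS` and `matchKeywords` are hypothetical names, and the sample text is made up for illustration.

```javascript
// Keyword lists copied from the table above; the matching strategy is an assumption.
const LEG_KEYWORDS = {
  1: ['untrusted', 'user input', 'url', 'remote'],
  2: ['secret', 'credential', '.env', '.aws', 'keychain'],
  3: ['webhook', 'upload', 'curl', 'network', 'http', 'transfer', 'exfil'],
};

// Hypothetical helper: return the keywords found in the text, grouped by leg.
function matchKeywords(text) {
  const haystack = text.toLowerCase();
  const hits = {};
  for (const [leg, keywords] of Object.entries(LEG_KEYWORDS)) {
    hits[leg] = keywords.filter((kw) => haystack.includes(kw));
  }
  return hits;
}

// Made-up sample description, not the fixture's actual text.
const sample =
  'Reads untrusted user input, collects .env secrets, then uploads them to a webhook.';
console.log(matchKeywords(sample));
```

One caveat of substring matching: any text containing `curl` also matches the leg 1 keyword `url`, so a stricter matcher would need word boundaries.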

### 2. Finding mentions the exfil-helper component

The trifecta finding's `title` matches `/exfil-helper/i`. This
guards against a regression where TFA emits a generic
project-level fallback instead of the per-component finding.

### 3. No hook guards detected

The trifecta finding's `description` matches
`/no hook guards detected/i`. This proves the mitigation logic
correctly identified the missing `hooks/hooks.json` and kept the
severity at CRITICAL rather than downgrading to HIGH or MEDIUM.
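
Taken together, the three assertions can be sketched as a single check over the scanner's findings. This is an illustration under assumptions, not the code in `run-toxic-flow.mjs`: the `title`, `severity`, and `description` fields come from this contract, while `checkContract` and the exact wording of the sample finding are hypothetical.

```javascript
// Hypothetical helper: evaluate the three contract assertions.
function checkContract(findings) {
  const trifectas = findings.filter((f) => f.title.startsWith('Lethal trifecta:'));
  return [
    {
      name: 'direct trifecta present and critical',
      pass: trifectas.some((f) => f.severity === 'critical'),
    },
    {
      name: 'finding mentions exfil-helper',
      pass: trifectas.some((f) => /exfil-helper/i.test(f.title)),
    },
    {
      name: 'no hook guards detected',
      pass: trifectas.some((f) => /no hook guards detected/i.test(f.description)),
    },
  ];
}

// Made-up finding; the real title and description wording may differ.
const sampleFindings = [
  {
    title: 'Lethal trifecta: agents/exfil-helper.fixture.md',
    severity: 'critical',
    description: 'No hook guards detected.',
  },
];

const results = checkContract(sampleFindings);
for (const r of results) console.log(`${r.pass ? 'PASS' : 'FAIL'} ${r.name}`);
```

Reporting each assertion separately makes it easier to tell a scanner regression (assertion 1 or 2 fails) from an activated mitigation path (assertion 3 fails).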

If a real `hooks/hooks.json` is later added to the fixture, the
description switches to
`Mitigated by active hook guards (severity reduced).` and the
severity drops to HIGH (one guard type) or MEDIUM (both guard
types). This assertion would then fail, signaling that the
mitigation path activated.
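
The downgrade rule described here can be sketched as a small function. It only mirrors the behaviour stated in this contract (no guards: CRITICAL, one guard type: HIGH, both guard types: MEDIUM); `applyMitigation` is a hypothetical name, and how the scanner counts guard types from `hooks/hooks.json` is not shown.

```javascript
// Hypothetical helper: map the number of active guard types to a severity
// and description, following the rule stated in this contract.
function applyMitigation(activeGuardTypes) {
  if (activeGuardTypes <= 0) {
    // Assumed wording; the contract only requires a match on
    // /no hook guards detected/i.
    return { severity: 'critical', description: 'No hook guards detected.' };
  }
  return {
    severity: activeGuardTypes === 1 ? 'high' : 'medium',
    description: 'Mitigated by active hook guards (severity reduced).',
  };
}

console.log(applyMitigation(0).severity);
console.log(applyMitigation(1).severity);
console.log(applyMitigation(2).severity);
```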

## Total finding shape

```
Total TFA findings: 1
  direct trifectas: 1
  cross-component: 0
  project-level fallback: 0
Files scanned (components): 1
Scanner status: ok
```

If the scanner emits more than one direct trifecta, either a
second component was added to the fixture or this contract needs
updating. Extra findings are not a failure, but they should only
result from a deliberate fixture change.

## Out of scope (intentionally)

- Cross-component trifectas (would need a second agent/command
  splitting the legs). See `tests/lib/toxic-flow-analyzer.test.mjs`
  for that case.
- Mitigation downgrade to HIGH / MEDIUM: would need a real
  `hooks/hooks.json` referencing one of the guard scripts.
- Prior-scanner enrichment via `enrichFromPriorResults()`: this
  walkthrough passes `priorResults = {}` to keep the demonstration
  reproducible and isolated. The full `scan-orchestrator.mjs` flow
  exercises that pass in production.