feat(llm-security): add pre-compact-poisoning example for PreCompact hook [skip-docs]
Runnable demonstration of hooks/scripts/pre-compact-scan.mjs (the only PreCompact hook in the plugin) detecting both a CRITICAL injection pattern and an AWS-shaped credential inside a synthetic JSONL transcript, exercised across all three values of LLM_SECURITY_PRECOMPACT_MODE plus a benign-transcript control case in block mode that proves the gate is not a brick wall. The transcript is generated at runtime in a per-invocation tempdir under os.tmpdir() and the directory is removed in a finally block, so the user's real ~/.claude/projects/.../transcripts/ are never touched. The AWS-shaped key uses the same 'AK' + 'IA' + ... fragmentation idiom as tests/e2e/attack-chain.test.mjs so this source contains no literal credentials and pre-edit-secrets does not block writes during development. Nine independent assertions (9/9 must pass): - block mode + poisoned: exit 2, decision=block JSON, reason text covers both injection and AWS labels (3 assertions) - warn mode + poisoned: exit 0, systemMessage JSON, no decision field (2 assertions) - off mode + poisoned: exit 0, no JSON on stdout (2 assertions) - block mode + benign: exit 0, no decision=block JSON (2 assertions) OWASP / framework mapping: LLM01, LLM02, ASI01, AT-1, AT-3. Docs updated: plugin README "Other runnable examples", plugin CLAUDE.md "Examples" tabellen, CHANGELOG [Unreleased] Added. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
92fb0087fa
commit
b6d912200e
6 changed files with 525 additions and 0 deletions
|
|
@ -0,0 +1,88 @@
|
|||
# Expected findings — pre-compact-poisoning
|
||||
|
||||
This is the testable contract enforced by
|
||||
`run-pre-compact-poisoning.mjs`. Nine independent assertions
|
||||
across four scenarios. Any drift = hook regression or fixture rot.
|
||||
|
||||
## Required assertions (9 / 9 must pass)
|
||||
|
||||
### Scenario A — block mode + poisoned transcript
|
||||
|
||||
The poisoned transcript embeds two distinct triggers:
|
||||
|
||||
- An "ignore all previous instructions" phrase inside a synthetic
|
||||
`tool_result` block (matches `CRITICAL_PATTERNS` in
|
||||
`injection-patterns.mjs`).
|
||||
- An AWS-shaped key built at runtime via string concatenation
|
||||
(matches `SECRET_PATTERNS` regex `/AKIA[0-9A-Z]{16}/`).
|
||||
|
||||
A.1 Hook exits with code `2`.
|
||||
A.2 Stdout is JSON `{ "decision": "block", "reason": "..." }`.
|
||||
A.3 The `reason` string mentions both:
|
||||
- an injection label (`/ignore previous|override/i`), AND
|
||||
- the AWS key label (`/AWS Access Key/i`).
|
||||
|
||||
If A.3 fails, either the injection-patterns regex set or the
|
||||
SECRET_PATTERNS table changed in a way that dropped one of these
|
||||
labels.
|
||||
|
||||
### Scenario B — warn mode + poisoned transcript
|
||||
|
||||
B.1 Hook exits with code `0` (advisory, not block).
|
||||
B.2 Stdout is JSON `{ "systemMessage": "..." }` with no
|
||||
`decision` field. The `systemMessage` summary is the same as
|
||||
the block-mode `reason` text.
|
||||
|
||||
### Scenario C — off mode + poisoned transcript
|
||||
|
||||
C.1 Hook exits with code `0`.
|
||||
C.2 Stdout is empty (no JSON). The `off` branch returns at the
|
||||
top of the script before reading the transcript at all,
|
||||
which is the documented "fully disabled" semantic.
|
||||
|
||||
### Scenario D — block mode + benign transcript
|
||||
|
||||
This is the brick-wall control: it proves the hook does not
|
||||
reflexively block all compactions.
|
||||
|
||||
D.1 Hook exits with code `0`.
|
||||
D.2 Stdout has no `decision: "block"` JSON. (Either no JSON or
|
||||
a non-block payload — the assertion only fails on a literal
|
||||
block decision, which would indicate a false positive.)
|
||||
|
||||
## Total finding shape (block mode)
|
||||
|
||||
```
|
||||
pre-compact-scan (auto): 3 finding(s) in transcript. Compaction
|
||||
may preserve poisoned content in condensed form. Top: override:
|
||||
ignore previous instructions, indirect: instruction addressed
|
||||
to AI/assistant, AWS Access Key ID.
|
||||
```
|
||||
|
||||
The "3 finding(s)" count covers:
|
||||
|
||||
1. CRITICAL — `override: ignore previous instructions`
|
||||
2. MEDIUM — `indirect: instruction addressed to AI/assistant`
|
||||
(the synthetic tool-result text frames the injection as a
|
||||
"Note to assistant", which trips the indirect-address pattern)
|
||||
3. SECRET — `AWS Access Key ID`
|
||||
|
||||
If `injection-patterns.mjs` adds new MEDIUM rules that match the
|
||||
fixture text, the count and `Top: ...` ordering may shift. The
|
||||
contract only asserts the *labels* in the reason string, not the
|
||||
finding count or order — that flexibility is intentional.
|
||||
|
||||
## Out of scope (intentionally)
|
||||
|
||||
- The other secret labels in `SECRET_PATTERNS`
|
||||
(GitHub / npm / PEM / bearer / generic). Demonstrating those
|
||||
would require either growing the fixture or building each at
|
||||
runtime; the AWS key alone is sufficient to prove the
|
||||
credential-finding path activates.
|
||||
- The 512 KB tail cap (`LLM_SECURITY_PRECOMPACT_MAX_BYTES`) — not
|
||||
exercised because the synthetic transcript is small.
|
||||
- The leetspeak / homoglyph / multi-language MEDIUM patterns —
|
||||
exercised by `examples/prompt-injection-showcase/`.
|
||||
- The `compaction_trigger` legacy field name (the hook reads
|
||||
both `trigger` and `compaction_trigger`) — only `trigger` is
|
||||
exercised here.
|
||||
Loading…
Add table
Add a link
Reference in a new issue