Kjell Tore Guttormsen b6d912200e feat(llm-security): add pre-compact-poisoning example for PreCompact hook [skip-docs]

Runnable demonstration of hooks/scripts/pre-compact-scan.mjs (the
only PreCompact hook in the plugin) detecting both a CRITICAL
injection pattern and an AWS-shaped credential inside a synthetic
JSONL transcript, exercised across all three values of
LLM_SECURITY_PRECOMPACT_MODE plus a benign-transcript control case
in block mode that proves the gate is not a brick wall.

The transcript is generated at runtime in a per-invocation tempdir
under os.tmpdir() and the directory is removed in a finally block,
so the user's real ~/.claude/projects/.../transcripts/ are never
touched. The AWS-shaped key uses the same 'AK' + 'IA' + ...
fragmentation idiom as tests/e2e/attack-chain.test.mjs so this
source contains no literal credentials and pre-edit-secrets does
not block writes during development.

Nine independent assertions (9/9 must pass):
- block mode + poisoned: exit 2, decision=block JSON, reason text
  covers both injection and AWS labels (3 assertions)
- warn mode + poisoned: exit 0, systemMessage JSON, no decision
  field (2 assertions)
- off mode + poisoned: exit 0, no JSON on stdout (2 assertions)
- block mode + benign: exit 0, no decision=block JSON (2 assertions)

OWASP / framework mapping: LLM01, LLM02, ASI01, AT-1, AT-3.

Docs updated: plugin README "Other runnable examples", plugin
CLAUDE.md "Examples" tabellen, CHANGELOG [Unreleased] Added.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-05 15:23:10 +02:00

3.4 KiB

Raw Blame History

Expected findings — pre-compact-poisoning

This is the testable contract enforced by run-pre-compact-poisoning.mjs. Nine independent assertions across four scenarios. Any drift = hook regression or fixture rot.

Required assertions (9 / 9 must pass)

Scenario A — block mode + poisoned transcript

The poisoned transcript embeds two distinct triggers:

An "ignore all previous instructions" phrase inside a synthetic tool_result block (matches CRITICAL_PATTERNS in injection-patterns.mjs).
An AWS-shaped key built at runtime via string concatenation (matches SECRET_PATTERNS regex /AKIA[0-9A-Z]{16}/).

A.1 Hook exits with code 2. A.2 Stdout is JSON { "decision": "block", "reason": "..." }. A.3 The reason string mentions both: - an injection label (/ignore previous|override/i), AND - the AWS key label (/AWS Access Key/i).

If A.3 fails, either the injection-patterns regex set or the SECRET_PATTERNS table changed in a way that dropped one of these labels.

Scenario B — warn mode + poisoned transcript

B.1 Hook exits with code 0 (advisory, not block). B.2 Stdout is JSON { "systemMessage": "..." } with no decision field. The systemMessage summary is the same as the block-mode reason text.

Scenario C — off mode + poisoned transcript

C.1 Hook exits with code 0. C.2 Stdout is empty (no JSON). The off branch returns at the top of the script before reading the transcript at all, which is the documented "fully disabled" semantic.

Scenario D — block mode + benign transcript

This is the brick-wall control: it proves the hook does not reflexively block all compactions.

D.1 Hook exits with code 0. D.2 Stdout has no decision: "block" JSON. (Either no JSON or a non-block payload — the assertion only fails on a literal block decision, which would indicate a false positive.)

Total finding shape (block mode)

pre-compact-scan (auto): 3 finding(s) in transcript. Compaction
may preserve poisoned content in condensed form. Top: override:
ignore previous instructions, indirect: instruction addressed
to AI/assistant, AWS Access Key ID.

The "3 finding(s)" count covers:

CRITICAL — override: ignore previous instructions
MEDIUM — indirect: instruction addressed to AI/assistant (the synthetic tool-result text frames the injection as a "Note to assistant", which trips the indirect-address pattern)
SECRET — AWS Access Key ID

If injection-patterns.mjs adds new MEDIUM rules that match the fixture text, the count and Top: ... ordering may shift. The contract only asserts the labels in the reason string, not the finding count or order — that flexibility is intentional.

Out of scope (intentionally)

The other secret labels in SECRET_PATTERNS (GitHub / npm / PEM / bearer / generic). Demonstrating those would require either growing the fixture or building each at runtime; the AWS key alone is sufficient to prove the credential-finding path activates.
The 512 KB tail cap (LLM_SECURITY_PRECOMPACT_MAX_BYTES) — not exercised because the synthetic transcript is small.
The leetspeak / homoglyph / multi-language MEDIUM patterns — exercised by examples/prompt-injection-showcase/.
The compaction_trigger legacy field name (the hook reads both trigger and compaction_trigger) — only trigger is exercised here.

3.4 KiB Raw Blame History