Runnable demonstration of hooks/scripts/pre-compact-scan.mjs (the only PreCompact hook in the plugin) detecting both a CRITICAL injection pattern and an AWS-shaped credential inside a synthetic JSONL transcript, exercised across all three values of LLM_SECURITY_PRECOMPACT_MODE plus a benign-transcript control case in block mode that proves the gate is not a brick wall. The transcript is generated at runtime in a per-invocation tempdir under os.tmpdir() and the directory is removed in a finally block, so the user's real ~/.claude/projects/.../transcripts/ are never touched. The AWS-shaped key uses the same 'AK' + 'IA' + ... fragmentation idiom as tests/e2e/attack-chain.test.mjs so this source contains no literal credentials and pre-edit-secrets does not block writes during development. Nine independent assertions (9/9 must pass): - block mode + poisoned: exit 2, decision=block JSON, reason text covers both injection and AWS labels (3 assertions) - warn mode + poisoned: exit 0, systemMessage JSON, no decision field (2 assertions) - off mode + poisoned: exit 0, no JSON on stdout (2 assertions) - block mode + benign: exit 0, no decision=block JSON (2 assertions) OWASP / framework mapping: LLM01, LLM02, ASI01, AT-1, AT-3. Docs updated: plugin README "Other runnable examples", plugin CLAUDE.md "Examples" tabellen, CHANGELOG [Unreleased] Added. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
159 lines
6.5 KiB
Markdown
159 lines
6.5 KiB
Markdown
# Pre-Compact Poisoning Walkthrough
|
|
|
|
> **WARNING: This is a demonstration fixture, NOT a real attack.**
|
|
> The transcript is generated at runtime in a per-invocation
|
|
> tempdir. The user's real `~/.claude/projects/.../transcripts/`
|
|
> are never touched, and this source file contains no literal
|
|
> credentials.
|
|
|
|
## What this demonstrates
|
|
|
|
`hooks/scripts/pre-compact-scan.mjs` is the only `PreCompact`
|
|
hook in the plugin. It runs **before** Claude Code compacts the
|
|
conversation context — auto-compaction at the context-window
|
|
limit, or the user pressing `/compact`. Its job is to flag
|
|
poisoned content before that content survives into a condensed
|
|
form where the surrounding injection context is no longer visible
|
|
to the model.
|
|
|
|
The hook reads at most the last 512 KB of the transcript JSONL
|
|
file and applies two pattern sets:
|
|
|
|
1. **Prompt-injection patterns** — `CRITICAL_PATTERNS` and
|
|
`MEDIUM_PATTERNS` from `scanners/lib/injection-patterns.mjs`
|
|
(the same set used by `pre-prompt-inject-scan` and
|
|
`post-mcp-verify`).
|
|
2. **Credential regexes** — a small `SECRET_PATTERNS` table for
|
|
AWS access keys, GitHub tokens, npm tokens, PEM private-key
|
|
block headers, generic credential assignments, and bearer
|
|
tokens.
|
|
|
|
Behaviour is controlled by `LLM_SECURITY_PRECOMPACT_MODE`:
|
|
|
|
| Mode | Finding present | Exit | Stdout |
|
|
|------|-----------------|------|--------|
|
|
| `off` | (any) | 0 | (empty — scan skipped entirely) |
|
|
| `warn` | yes | 0 | `{ "systemMessage": "..." }` |
|
|
| `warn` | no | 0 | (empty) |
|
|
| `block` | yes | 2 | `{ "decision": "block", "reason": "..." }` |
|
|
| `block` | no | 0 | (empty) |
|
|
|
|
Default is `warn`.
|
|
|
|
## Fixture layout
|
|
|
|
```
|
|
examples/pre-compact-poisoning/
|
|
README.md # this file
|
|
run-pre-compact-poisoning.mjs # builds transcripts in tempdir, drives the hook
|
|
expected-findings.md # testable contract
|
|
```
|
|
|
|
There is no on-disk fixture. The run script:
|
|
|
|
1. Creates a tempdir under `os.tmpdir()` via `mkdtempSync`.
|
|
2. Writes two synthetic JSONL transcripts to that tempdir:
|
|
- `poisoned-transcript.jsonl` — contains an "ignore previous
|
|
instructions" phrase inside a synthetic `tool_result` block,
|
|
plus an AWS access-key ID built at runtime via string
|
|
concatenation (matches `/AKIA[0-9A-Z]{16}/`).
|
|
- `benign-transcript.jsonl` — a plain Q&A about listing files.
|
|
3. Spawns `hooks/scripts/pre-compact-scan.mjs` with
|
|
`{ session_id, transcript_path, hook_event_name: "PreCompact",
|
|
trigger: "auto" }` on stdin.
|
|
4. Cleans up the tempdir in a `finally` block.
|
|
|
|
The AWS-shaped key is constructed via the same fragmentation
|
|
pattern used in `tests/e2e/attack-chain.test.mjs` (`'AK' + 'IA' +
|
|
'IOSFODNN7' + 'EXAMPLE'`) so this source contains no literal
|
|
credentials and `pre-edit-secrets.mjs` does not block it from
|
|
being written.
|
|
|
|
## How to run
|
|
|
|
```bash
|
|
cd plugins/llm-security
|
|
node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs
|
|
|
|
# Verbose — show full hook stdout/stderr per case
|
|
node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs --verbose
|
|
```
|
|
|
|
Expected: `9 pass, 0 fail` across four scenarios:
|
|
|
|
1. block + poisoned → exit 2, structured `decision=block` JSON,
|
|
reason text covers both an injection label and the AWS-key label.
|
|
2. warn + poisoned → exit 0, `systemMessage` JSON (no `decision`
|
|
field).
|
|
3. off + poisoned → exit 0, no JSON on stdout (scan skipped).
|
|
4. block + benign → exit 0, no `decision=block` JSON (proves the
|
|
gate is not a brick wall on benign content).
|
|
|
|
## Hook involved
|
|
|
|
- **`hooks/scripts/pre-compact-scan.mjs`** — invoked via
|
|
`child_process.spawnSync('node', [HOOK], { input: stdin })` to
|
|
match the harness contract exactly. The hook reads the
|
|
transcript via `readTailCapped(filePath, MAX_BYTES)`,
|
|
flattens JSONL message content via `extractTextFromTranscript`,
|
|
then runs the two pattern sets. No Claude Code agent runtime
|
|
is required.
|
|
|
|
The orchestrated `/security audit` flow does not run this hook
|
|
(it's a runtime defence, not a scan-time check). This walkthrough
|
|
exercises the runtime contract directly.
|
|
|
|
## Why pre-compact poisoning matters
|
|
|
|
Compaction collapses long conversations into a summary that the
|
|
model treats as authoritative context for the rest of the
|
|
session. If a malicious tool result earlier in the conversation
|
|
managed to sneak past `post-mcp-verify` (e.g., via a pattern not
|
|
yet in the regex set), compaction can preserve a *condensed* form
|
|
of the poison where the model can no longer see the surrounding
|
|
"this came from a sketchy source" context. Worse, condensed
|
|
summaries are smaller and so more likely to fit inside the
|
|
attacker's preferred attention window.
|
|
|
|
`pre-compact-scan` is a **second chance** to catch poison that
|
|
slipped past the runtime gates — a defence-in-depth pattern that
|
|
matches the joint-paper finding that no single-layer defence
|
|
holds against adaptive attacks.
|
|
|
|
## OWASP / framework mapping
|
|
|
|
| Code | Framework | Why |
|
|
|------|-----------|-----|
|
|
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection persisting through compaction |
|
|
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive information disclosure — credentials in transcript |
|
|
| ASI01 | OWASP Agentic Top 10 | Memory poisoning via condensed form |
|
|
| AT-1 | DeepMind Agent Traps | Hidden cognitive priors carried across context boundary |
|
|
| AT-3 | DeepMind Agent Traps | Tool-output indirection that survives summarisation |
|
|
|
|
## Limitations
|
|
|
|
- `MAX_BYTES` defaults to 512 000 bytes. Earlier-in-history
|
|
poison that does not appear in the last 512 KB of the
|
|
transcript is not scanned. The cap exists for the documented
|
|
<500 ms latency target on large transcripts. Tune via
|
|
`LLM_SECURITY_PRECOMPACT_MAX_BYTES`.
|
|
- The credential regex set is small by design (compaction is
|
|
performance-sensitive). The full secrets regex set lives in
|
|
`pre-edit-secrets.mjs`, which fires on a different event.
|
|
- The hook does not modify the transcript — it only blocks
|
|
compaction or emits an advisory. Poison that has already
|
|
shaped the conversation may still influence the model in the
|
|
current window.
|
|
|
|
## See also
|
|
|
|
- `hooks/scripts/pre-compact-scan.mjs` — hook source
|
|
- `tests/hooks/pre-compact-scan.test.mjs` — unit-test contract
|
|
- `tests/e2e/multi-session.test.mjs` — multi-session scenario
|
|
that exercises the same pre-compact path across simulated
|
|
session boundaries
|
|
- `scanners/lib/injection-patterns.mjs` — shared pattern set
|
|
- `examples/poisoned-claude-md/` — sibling demonstration of
|
|
*scan-time* memory poisoning (different surface, same family
|
|
of threat)
|
|
- `expected-findings.md` (in this folder) — the testable contract
|