ktg-plugin-marketplace/plugins/llm-security/examples/pre-compact-poisoning/README.md

# Pre-Compact Poisoning Walkthrough

> **WARNING: This is a demonstration fixture, NOT a real attack.**
> The transcript is generated at runtime in a per-invocation
> tempdir. The user's real `~/.claude/projects/.../transcripts/`
> are never touched, and this source file contains no literal
> credentials.

## What this demonstrates

`hooks/scripts/pre-compact-scan.mjs` is the only `PreCompact`
hook in the plugin. It runs **before** Claude Code compacts the
conversation context — auto-compaction at the context-window
limit, or the user pressing `/compact`. Its job is to flag
poisoned content before that content survives into a condensed
form where the surrounding injection context is no longer visible
to the model.

The hook reads at most the last 512 KB of the transcript JSONL
file and applies two pattern sets:

1. **Prompt-injection patterns** — `CRITICAL_PATTERNS` and
   `MEDIUM_PATTERNS` from `scanners/lib/injection-patterns.mjs`
   (the same set used by `pre-prompt-inject-scan` and
   `post-mcp-verify`).
2. **Credential regexes** — a small `SECRET_PATTERNS` table for
   AWS access keys, GitHub tokens, npm tokens, PEM private-key
   block headers, generic credential assignments, and bearer
   tokens.

Behaviour is controlled by `LLM_SECURITY_PRECOMPACT_MODE`:

| Mode | Finding present | Exit | Stdout |
|------|-----------------|------|--------|
| `off`   | (any) | 0 | (empty — scan skipped entirely) |
| `warn`  | yes   | 0 | `{ "systemMessage": "..." }` |
| `warn`  | no    | 0 | (empty) |
| `block` | yes   | 2 | `{ "decision": "block", "reason": "..." }` |
| `block` | no    | 0 | (empty) |

Default is `warn`.

## Fixture layout

```
examples/pre-compact-poisoning/
  README.md                       # this file
  run-pre-compact-poisoning.mjs   # builds transcripts in tempdir, drives the hook
  expected-findings.md            # testable contract
```

There is no on-disk fixture. The run script:

1. Creates a tempdir under `os.tmpdir()` via `mkdtempSync`.
2. Writes two synthetic JSONL transcripts to that tempdir:
   - `poisoned-transcript.jsonl` — contains an "ignore previous
     instructions" phrase inside a synthetic `tool_result` block,
     plus an AWS access-key ID built at runtime via string
     concatenation (matches `/AKIA[0-9A-Z]{16}/`).
   - `benign-transcript.jsonl` — a plain Q&A about listing files.
3. Spawns `hooks/scripts/pre-compact-scan.mjs` with
   `{ session_id, transcript_path, hook_event_name: "PreCompact",
     trigger: "auto" }` on stdin.
4. Cleans up the tempdir in a `finally` block.

The AWS-shaped key is constructed via the same fragmentation
pattern used in `tests/e2e/attack-chain.test.mjs` (`'AK' + 'IA' +
'IOSFODNN7' + 'EXAMPLE'`) so this source contains no literal
credentials and `pre-edit-secrets.mjs` does not block it from
being written.

## How to run

```bash
cd plugins/llm-security
node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs

# Verbose — show full hook stdout/stderr per case
node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs --verbose
```

Expected: `9 pass, 0 fail` across four scenarios:

1. block + poisoned → exit 2, structured `decision=block` JSON,
   reason text covers both an injection label and the AWS-key label.
2. warn + poisoned → exit 0, `systemMessage` JSON (no `decision`
   field).
3. off + poisoned → exit 0, no JSON on stdout (scan skipped).
4. block + benign → exit 0, no `decision=block` JSON (proves the
   gate is not a brick wall on benign content).

## Hook involved

- **`hooks/scripts/pre-compact-scan.mjs`** — invoked via
  `child_process.spawnSync('node', [HOOK], { input: stdin })` to
  match the harness contract exactly. The hook reads the
  transcript via `readTailCapped(filePath, MAX_BYTES)`,
  flattens JSONL message content via `extractTextFromTranscript`,
  then runs the two pattern sets. No Claude Code agent runtime
  is required.

The orchestrated `/security audit` flow does not run this hook
(it's a runtime defence, not a scan-time check). This walkthrough
exercises the runtime contract directly.

## Why pre-compact poisoning matters

Compaction collapses long conversations into a summary that the
model treats as authoritative context for the rest of the
session. If a malicious tool result earlier in the conversation
managed to sneak past `post-mcp-verify` (e.g., via a pattern not
yet in the regex set), compaction can preserve a *condensed* form
of the poison where the model can no longer see the surrounding
"this came from a sketchy source" context. Worse, condensed
summaries are smaller and so more likely to fit inside the
attacker's preferred attention window.

`pre-compact-scan` is a **second chance** to catch poison that
slipped past the runtime gates — a defence-in-depth pattern that
matches the joint-paper finding that no single-layer defence
holds against adaptive attacks.

## OWASP / framework mapping

| Code | Framework | Why |
|------|-----------|-----|
| LLM01 | OWASP LLM Top 10 (2025) | Prompt injection persisting through compaction |
| LLM02 | OWASP LLM Top 10 (2025) | Sensitive information disclosure — credentials in transcript |
| ASI01 | OWASP Agentic Top 10 | Memory poisoning via condensed form |
| AT-1  | DeepMind Agent Traps | Hidden cognitive priors carried across context boundary |
| AT-3  | DeepMind Agent Traps | Tool-output indirection that survives summarisation |

## Limitations

- `MAX_BYTES` defaults to 512 000 bytes. Earlier-in-history
  poison that does not appear in the last 512 KB of the
  transcript is not scanned. The cap exists for the documented
  <500 ms latency target on large transcripts. Tune via
  `LLM_SECURITY_PRECOMPACT_MAX_BYTES`.
- The credential regex set is small by design (compaction is
  performance-sensitive). The full secrets regex set lives in
  `pre-edit-secrets.mjs`, which fires on a different event.
- The hook does not modify the transcript — it only blocks
  compaction or emits an advisory. Poison that has already
  shaped the conversation may still influence the model in the
  current window.

## See also

- `hooks/scripts/pre-compact-scan.mjs` — hook source
- `tests/hooks/pre-compact-scan.test.mjs` — unit-test contract
- `tests/e2e/multi-session.test.mjs` — multi-session scenario
  that exercises the same pre-compact path across simulated
  session boundaries
- `scanners/lib/injection-patterns.mjs` — shared pattern set
- `examples/poisoned-claude-md/` — sibling demonstration of
  *scan-time* memory poisoning (different surface, same family
  of threat)
- `expected-findings.md` (in this folder) — the testable contract