test(llm-security): add e2e suite proving framework works as coordinated system

Three new files in tests/e2e/ (45 tests, 1777 -> 1822):

- attack-chain.test.mjs (17): full hook stack against attack payloads in
  sequence -- prompt injection at the gate; T1/T5/T8 bash evasions;
  pathguard on .env / .ssh; secrets hook on AWS-shaped keys and PEM
  headers; markdown link-title and HTML-comment poisoning in tool
  output; trifecta accumulation over a single session with dedup on
  the next benign call.

- multi-session.test.mjs (9): state persistence across simulated
  session boundaries. Uses the fact that a hook child's process.ppid
  equals the test runner's process.pid, so writing the session state
  file directly simulates "previous session" history. Covers slow-burn
  trifecta (legs spread >50 calls), MCP cumulative description drift
  via LLM_SECURITY_MCP_CACHE_FILE override, and pre-compact transcript
  poisoning in warn / block / clean / missing-file modes.

- scan-pipeline.test.mjs (19): scan-orchestrator + all 10 scanners +
  toxic-flow correlator against poisoned-project (BLOCK / 95 / Extreme)
  and grade-a-project (WARNING / 48 / High). Asserts envelope shape,
  verdict, risk_score, severity counts, OWASP coverage, scanner
  enumeration, and a narrative-coherence cross-check that the BLOCK
  scan strictly outranks the WARNING scan along every axis.

Test files build credential-shaped payloads at runtime via concatenation
so they contain no literal matches for the pre-edit-secrets regexes
(memory rule feedback_secrets_hook_test_fixtures.md).

Doc updates in same commit per marketplace policy:
- CLAUDE.md header: 1777+ -> 1822+ tests, mentions tests/e2e/
- README.md badge tests-1777 -> tests-1822, body text updated
- CHANGELOG.md: new [Unreleased] Added section describing scope

No version bump. No behavior changes outside tests/.
This commit is contained in:
Kjell Tore Guttormsen 2026-05-05 12:06:57 +02:00
commit f835777c1e
6 changed files with 974 additions and 3 deletions

View file

@ -1,6 +1,6 @@
# LLM Security Plugin (v7.3.1)
Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1777+ unit and integration tests; mutation-testing coverage not published.
Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1822+ unit, integration, and end-to-end tests (`tests/e2e/` covers the multi-hook attack chain, multi-session state simulation, and the full scan-orchestrator pipeline); mutation-testing coverage not published.
**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):