test(llm-security): add e2e suite proving framework works as coordinated system

Three new files in tests/e2e/ (45 tests, 1777 -> 1822):

- attack-chain.test.mjs (17): full hook stack against attack payloads in
  sequence -- prompt injection at the gate; T1/T5/T8 bash evasions;
  pathguard on .env / .ssh; secrets hook on AWS-shaped keys and PEM
  headers; markdown link-title and HTML-comment poisoning in tool output;
  trifecta accumulation over a single session with dedup on the next
  benign call.
- multi-session.test.mjs (9): state persistence across simulated session
  boundaries. Uses the fact that a hook child's process.ppid equals the
  test runner's process.pid, so writing the session state file directly
  simulates "previous session" history. Covers slow-burn trifecta (legs
  spread >50 calls), MCP cumulative description drift via
  LLM_SECURITY_MCP_CACHE_FILE override, and pre-compact transcript
  poisoning in warn / block / clean / missing-file modes.
- scan-pipeline.test.mjs (19): scan-orchestrator + all 10 scanners +
  toxic-flow correlator against poisoned-project (BLOCK / 95 / Extreme)
  and grade-a-project (WARNING / 48 / High). Asserts envelope shape,
  verdict, risk_score, severity counts, OWASP coverage, scanner
  enumeration, and a narrative-coherence cross-check that the BLOCK scan
  strictly outranks the WARNING scan along every axis.

Test files build credential-shaped payloads at runtime via concatenation
so they contain no literal matches for the pre-edit-secrets regexes
(memory rule feedback_secrets_hook_test_fixtures.md).

Doc updates in same commit per marketplace policy:

- CLAUDE.md header: 1777+ -> 1822+ tests, mentions tests/e2e/
- README.md badge tests-1777 -> tests-1822, body text updated
- CHANGELOG.md: new [Unreleased] Added section describing scope

No version bump. No behavior changes outside tests/.
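
The runtime-concatenation approach for credential-shaped fixtures described above could look roughly like the sketch below. Everything in it is hypothetical: `AWS_KEY_SHAPE` and `PEM_SHAPE` are simplified stand-ins for the pre-edit-secrets regexes (which are not shown in this commit), and `buildAwsShapedKey` / `buildPemHeader` are illustrative helper names, not functions from the test suite.

```javascript
// Assumed, simplified stand-ins for the secrets-hook patterns (not the plugin's real regexes).
const AWS_KEY_SHAPE = /AKIA[0-9A-Z]{16}/;
const PEM_SHAPE = /-----BEGIN [A-Z ]*PRIVATE KEY-----/;

// Hypothetical helper: assemble an AWS-shaped key from fragments so the
// complete shape never appears as one literal in the test source.
function buildAwsShapedKey() {
  const prefix = ['AK', 'IA'].join('');
  const body = 'EXAMPLE'.padEnd(16, '0'); // 16 chars of [0-9A-Z], obviously fake
  return prefix + body;
}

// Hypothetical helper: same idea for a PEM private-key header.
function buildPemHeader() {
  return ['-----BEGIN', 'RSA PRIVATE', 'KEY-----'].join(' ');
}

// The assembled payloads match the patterns at runtime...
console.log(AWS_KEY_SHAPE.test(buildAwsShapedKey())); // true
console.log(PEM_SHAPE.test(buildPemHeader()));        // true

// ...while the source text of the helpers does not.
console.log(AWS_KEY_SHAPE.test(buildAwsShapedKey.toString())); // false
console.log(PEM_SHAPE.test(buildPemHeader.toString()));        // false
```

The point of the split is that a hook scanning file contents on write sees only fragments, so the test file itself can be edited without tripping the hook, while the payload handed to the hook under test still has the full shape.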
2026-05-05 12:06:57 +02:00 · f835777c1e
commit f835777c1e
parent a7a334c8d1
6 changed files with 974 additions and 3 deletions
--- a/plugins/llm-security/README.md
+++ b/plugins/llm-security/README.md
@@ -13,7 +13,7 @@
 ![Scanners](https://img.shields.io/badge/scanners-23-cyan)
 ![Hooks](https://img.shields.io/badge/hooks-9-red)
 ![Knowledge](https://img.shields.io/badge/knowledge_docs-22-green)
-![Tests](https://img.shields.io/badge/tests-1777-success)
+![Tests](https://img.shields.io/badge/tests-1822-success)
 ![License](https://img.shields.io/badge/license-MIT-lightgrey)

 A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on [OWASP LLM Top 10 (2025)](https://genai.owasp.org/llm-top-10/), [OWASP Agentic AI Top 10 (ASI01-ASI10)](https://genai.owasp.org/agentic-ai/), OWASP Skills Top 10 (AST01-AST10), MCP Top 10, and the [AI Agent Traps](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438) taxonomy (Google DeepMind, 2025), grounded in published research from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, and Operant AI.
@@ -425,7 +425,7 @@ These gaps are surfaced advisorily through `/security threat-model` and `/securi

 This is a **solo open-source project in stabilization mode** as of 2026-05-01.
 The current feature set (5 frameworks, 23 scanners, 9 hooks, 6 agents,
-20 commands, 22 knowledge files, 1777+ tests) is the natural plateau for
+20 commands, 22 knowledge files, 1822+ tests including a dedicated end-to-end suite) is the natural plateau for
 what a deterministic + advisory plugin can defend against without crossing
 into commercial-grade territory. Going forward, work focuses on: