docs(llm-security): A3 honesty-sweep — 7 sitater nedtonet (critical-review §9)

Closes A3 of v7.1.0 critical-review patch. Each rewrite preserves the underlying
claim where it is accurate but removes hype/overreach language. Historical
CHANGELOG/README version-table rows are intentionally left as-is (they document
what was claimed at the time of release, not what is true today).

Changes (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md,
docs/security-hardening-guide.md):

- "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring
  (v2 model, BREAKING)". Removes hype framing; describes the actual mechanism.
- "Context-aware entropy scanner" → "Rule-based entropy scanner with
  file-extension skip, 8 line-level suppression rules, and configurable policy".
  No ML/context inference; just rules.
- "1487 tests" → "1511 unit and integration tests; mutation-testing coverage
  not published". Updated count after A1+A2 (+24) and added qualifier.
- "Fully Schrems II compatible" → "Schrems II compatible in default offline
  mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`)
  transmits package identifiers to a Google-operated API and is a separate
  compliance consideration." Acknowledges the OSV.dev opt-in caveat.
- "Rule of Two enforcement" → "Rule of Two detection (configurable; default
  warn; blocks on high-confidence trifectas in opt-in `block` mode; distributed
  trifectas detected but not blocked by default)". "Enforcement" implied
  block; default is warn.
- "Hardened ZIP extractor" → suffix " — no fuzz-testing results published
  to date". Caps and class-of-attacks rejected are accurate; absence of
  formal fuzz coverage now stated.
- "defense-in-depth" — preserved as framing, but quantified in
  security-hardening-guide §4: "three independent detection layers with
  documented bypass classes". Each layer named, each layer's known bypasses
  pointed to (critical-review §4 evasion arsenal).

Tests: 1511/1511 green (no behavioural change).
This commit is contained in:
Kjell Tore Guttormsen 2026-04-29 11:52:55 +02:00
commit a46308b1e9
4 changed files with 17 additions and 8 deletions

View file

@ -1,11 +1,11 @@
# LLM Security Plugin (v7.0.0)
Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1487 tests.
Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1511 unit and integration tests; mutation-testing coverage not published.
**v7.0.0 — Trustworthy scoring (BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 7095, high only → 4065, medium only → 1535, low only → 111. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15).
2. **Context-aware entropy scanner** — file-extension skip (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`) + 8 new line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).
See `docs/security-hardening-guide.md` §6 for the calibration story.
@ -21,7 +21,7 @@ See `docs/security-hardening-guide.md` §6 for the calibration story.
| `/security plugin-audit [path\|url]` | Plugin trust assessment (local or GitHub URL) |
| `/security mcp-audit [--live]` | MCP server config audit (add `--live` for runtime inspection) |
| `/security mcp-inspect` | Live MCP server inspection — connect via JSON-RPC 2.0, scan tool descriptions |
| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps — no fuzz-testing results published to date). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
| `/security posture` | Quick scorecard (13 categories) |
| `/security threat-model` | Interactive STRIDE/MAESTRO session |
| `/security diff [path]` | Compare scan against baseline — shows new/resolved/unchanged/moved |
@ -133,7 +133,7 @@ Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatche
Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
Standalone CLI makes zero network calls (except opt-in OSV.dev in supply-chain-recheck). Fully Schrems II compatible.
Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.
## Knowledge Files (20)
@ -179,7 +179,7 @@ No persistent state except `post-session-guard.mjs` which maintains a per-sessio
Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
- **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
- **Increased attack cost** — Rule of Two enforcement (configurable block/warn/off for lethal trifecta), bash normalization before gate matching
- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
- **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
- **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
- **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect

View file

@ -39,7 +39,7 @@ Arguments (pass through as provided by the user):
- `--fail-on <severity>` — exit 1 if findings at/above severity
URL mode notes:
- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20.
- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20. No fuzz-testing results published to date.
- Rejects: zip-slip paths, symlink entries, absolute paths, drive letters, encrypted entries, ZIP64.
- TLS verified, HTTPS only, 30s timeout. Cross-host redirects rejected.
- Temp directory always cleaned up (success, error, abort).

View file

@ -98,6 +98,15 @@ recorded in the audit trail.
techniques before the denylist gate runs. These are **defense-in-depth** layers
that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.
The plugin's "defense-in-depth" claim resolves to **three independent detection
layers with documented bypass classes**: (1) the Claude Code harness denylist
(out of plugin scope, evolves with platform); (2) `bash-normalize.mjs` T1-T6
collapse rules; (3) `pre-bash-destructive.mjs` post-normalization pattern match
+ `post-session-guard.mjs` runtime trifecta correlation. Each layer has known
bypasses (see Defense Philosophy in `CLAUDE.md` and `docs/critical-review-2026-04-20.md`
§4 for the evasion arsenal). Stacking layers raises attacker cost; it does not
provide formal worst-case guarantees.
| Layer | Technique | Example | Normalization |
|-------|-----------|---------|---------------|
| T1 | Empty quotes | `rm''-rf /` | strip `''` / `""` between tokens |

View file

@ -32,7 +32,7 @@ Attacker injects instructions via external content (files, web pages, tool outpu
| Prompt injection input scanning | Automated | `pre-prompt-inject-scan.mjs` detects CRITICAL/HIGH/MEDIUM injection patterns in user prompts | Hook file exists; MEDIUM advisory enabled |
| Unicode Tag steganography detection | Automated | `string-utils.mjs` decodes U+E0000-E007F tags; `injection-patterns.mjs` escalates to CRITICAL/HIGH | `decodeUnicodeTags()` in normalization pipeline |
| Bash evasion normalization | Automated | `bash-normalize.mjs` strips parameter expansion before pattern matching | `normalizeBashExpansion()` called by both bash hooks |
| Rule of Two enforcement | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil) | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode available |
| Rule of Two detection (block-mode opt-in) | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil); blocks only when `LLM_SECURITY_TRIFECTA_MODE=block` AND high-confidence trifecta is observed; default `warn` | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode opt-in |
| Long-horizon monitoring | Automated | `post-session-guard.mjs` 100-call window + behavioral drift detection | Long-horizon window active alongside 20-call window |
| HITL trap detection | Automated | `injection-patterns.mjs` HIGH patterns for approval urgency, summary suppression, scope minimization | HITL patterns present in HIGH_PATTERNS array |
| Hybrid attack detection | Automated | `injection-patterns.mjs` HYBRID_PATTERNS for P2SQL, recursive injection, XSS | Hybrid patterns checked in tool output scanning |