Pre-trekexecute snapshot of in-progress CLAUDE.md/SKILL.md edits and extracted docs/ files. Captured as one commit so /trekexecute claude-design can run against a clean working tree. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
47 lines
12 KiB
Markdown
47 lines
12 KiB
Markdown
# LLM Security — Version history
|
||
|
||
Per-release notes for v7.0.0 onward. Imported from `CLAUDE.md` via `@docs/version-history.md`.
|
||
|
||
## v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING)
|
||
|
||
Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
|
||
|
||
1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15). `info` findings are observability-only — counted in OWASP aggregates but contribute zero to risk_score, verdict, and riskBand (B3, v7.2.0 — was undocumented pre-7.2.0). See `severity.mjs` JSDoc for full contract.
|
||
2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
|
||
3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).
|
||
|
||
See `docs/security-hardening-guide.md` §6 for the calibration story.
|
||
|
||
## v7.1.1 — Scan-rapport narrative coherence (patch)
|
||
|
||
Three coordinated edits address the whiplash symptom that survived v7.0.0 (numbers fixed, narrative still walked findings back as "false positive" in prose):
|
||
(a) `agents/skill-scanner-agent.md` Step 2.5 mandates context-first severity assignment — every signal has exactly one disposition (suppressed OR reported), no per-finding walk-back; (b) `templates/unified-report.md` gains a `### Narrative Audit` block in Executive Summary surfacing `summary.narrative_audit.suppressed_findings.{count, by_category}` from the agent's trailing JSON; (c) both files updated from stale v1 risk-formula constants to the v2 model that has been authoritative in `severity.mjs` since v7.0.0. Counter is distinct from the existing top-level `output.suppressed` (`.llm-security-ignore` rule integer). Out-of-scope but flagged: `commands/scan.md:113-114` retains the v1 formula; resolution deferred to Batch B.
|
||
|
||
## v7.3.0 — MCP cumulative-drift baseline (Wave C of Batch C)
|
||
|
||
Closes E14 from `docs/critical-review-2026-04-20.md`. The `mcp-description-cache.mjs` schema gains a sticky `baseline` slot per tool plus a 10-event rolling `history` array (FIFO). Cumulative drift = `levenshtein(current, baseline) / max(|current|, |baseline|)`; when the ratio crosses `mcp.cumulative_drift_threshold` (default 0.25), `post-mcp-verify.mjs` emits a separate MEDIUM `mcp-cumulative-drift` advisory. The existing per-update >10% drift signal is unchanged — both fire independently. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught. Baseline survives the 7-day TTL purge so detection persists across the full window. New `/security mcp-baseline-reset` slash command (plus `scanners/mcp-baseline-reset.mjs` CLI: `--list`, `--target <tool>`, or no-args clear-all) lets the user acknowledge a legitimate MCP server upgrade — clearing the baseline causes the next call to seed a fresh one from the incoming description; description, firstSeen, lastSeen, and history are preserved for audit. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for end-to-end testing without polluting the user's real `~/.cache/llm-security/mcp-descriptions.json`.
|
||
|
||
## v7.3.0 — Env-var deprecation warnings (D3 of Batch C, Wave D)
|
||
|
||
Closes 8.7 from `.claude/projects/2026-04-29-batch-c-scope-finalize/plan.md`. `scanners/lib/policy-loader.mjs` exports a new helper `getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue)` — env still wins per Preferences (existing behaviour), but when both the env-var AND the `policy.json` key are explicitly set, the helper emits a single per-process stderr line: `[llm-security] Deprecation: env-var ${ENVVAR} will be removed in v8.0.0; policy.json key ${section}.${key} also set — env wins for now. Suppress with LLM_SECURITY_DEPRECATION_QUIET=1.` Module-scoped `Set` dedupes per env-var name across call-sites. Four overlapping vars are wired through the helper: `LLM_SECURITY_INJECTION_MODE` ↔ `injection.mode` (in `pre-prompt-inject-scan.mjs`), `LLM_SECURITY_TRIFECTA_MODE` ↔ `trifecta.mode` and `LLM_SECURITY_ESCALATION_WINDOW` ↔ `trifecta.escalation_window` (in `post-session-guard.mjs`), `LLM_SECURITY_AUDIT_LOG` ↔ `audit.log_path` (in `scanners/lib/audit-trail.mjs`). `DEFAULT_POLICY` gains `trifecta.escalation_window: 5` to close the gap noted in the plan revisions table (M10). Env-only vars without policy.json equivalents (`LLM_SECURITY_UPDATE_CHECK`, `LLM_SECURITY_PRECOMPACT_MODE`, `LLM_SECURITY_PRECOMPACT_MAX_BYTES`, `LLM_SECURITY_IDE_ROOTS`, `LLM_SECURITY_MCP_CACHE_FILE`) are unchanged — they emit no deprecation signal because there is nothing to deprecate yet.
|
||
|
||
## v7.5.0 — Playground (additive surface, no scanner/hook behavior changes)
|
||
|
||
Single-file SPA at `playground/llm-security-playground.html` (~10 200 lines) for onboarding, demo og workshop-bruk uten Claude Code-installasjon. Parser + renderer for alle 18 `produces_report=true`-kommandoer i `CATALOG`. State i IndexedDB primær (`llm-security-playground-v1`) med localStorage-fallback, sirkelfri Proxy + EventTarget store, microtask-batchet render. Theme-bootstrap med FOUC-prevention. 4 overflater: onboarding (5 grupper) → home (3 tracks) → catalog (20 kommandoer) ⇄ project (rapporter / oversikt / kontekst / eksport). Demo-state har tre prosjekter inline; `dft-komplett-demo` har alle 18 rapporter ferdig parsed for klikk-gjennom. Vendor-synket design-system under `playground/vendor/playground-design-system/` (sjekksum-låst via `MANIFEST.json`, redigeres aldri direkte). Test-fixtures under `playground/test-fixtures/` (én markdown-fil per kommando) er kontrakt-anker for parser-utvikling. Skjermdumper i `playground/screenshots/v7.5.0/`. Eksponerte vinduer-globaler for testing/automasjon: `__store`, `__navigate`, `__loadDemoState`, `__scheduleRender`, `__PARSERS`, `__RENDERERS`, `__CATALOG`, `__inferVerdict`, `__inferKeyStats`, `__renderPageShell`, `__handlePasteImport`. Inkluderer fix av `normalizeVerdictText` regex-rekkefølge: GO-WITH-CONDITIONS sjekkes før GO så betinget verdict ikke kollapser til ALLOW.
|
||
|
||
## v7.6.0 — Playground Tier 3-referanse-case (additive surface, no scanner/hook behavior changes)
|
||
|
||
Playgroundet er nå en visuelt og strukturelt fullført referanse-implementasjon for `shared/playground-design-system/` Tier 3-supplementet. 8 nye Tier 3-komponenter integrert i de 18 rapport-rendererne: `tfa-flow` + `tfa-leg` + `tfa-arrow` (lethal trifecta-kjede med `<button>`-elementer + ARIA-group/aria-label) i `renderScan` + `renderDeepScan`; `mat-ladder` + `mat-step` (5-trinns modenhets-stige med terskler 0/25/50/75/95% PASS) i `renderPosture`; `suppressed-group` (narrative-audit fra `summary.narrative_audit.suppressed_findings`) i `renderScan` + `renderDeepScan`; `codepoint-reveal` + `cp-tag`/`cp-zw`/`cp-bidi` (Unicode-steganografi side-ved-side reveal med U+200B-D|FEFF|2060|180E → `cp-zw`, U+202A-E|2066-9 → `cp-bidi`-detection) i `renderMcpInspect`; `top-risks` + `top-risk[data-severity]` (rangert top-funn-listing, semantisk `<ol>`, ekskluderer info-funn) i `renderScan`/`renderDeepScan`/`renderPluginAudit`/`renderPosture`/`renderAudit`; utvidet `recommendation-card[data-severity]` (severity-tinted advisory) på alle inline-bruk + nye per-bucket advisory-cards i `renderClean` + intro snapshot + diff-rows i `renderHarden` (action-mapping CREATE→positive / APPEND→medium / MERGE→low / SKIP→low); `risk-meter` (band-visualisering 0-100 med Low/Medium/High/Critical/Extreme bands) på 5 archetypes (scan, deep-scan, plugin-audit, audit, red-team); `card--severity-{level}` modifier på `findings__item`-cards. Wave 1 (Sesjon 2) la til `badge--scope-security` (identitets-chip), `verdict-pill-lg` med `__verdict`+`__sub` (erstatter custom verdict-pill på alle 18 rapport-typer), og DS Tier 3 `form-progress` + `fp-step` i onboarding-wizard. Wave 0 (Sesjon 1) slettet ~30 duplikat-CSS-deklarasjoner fra `<style>`-blokken (DS vinner cascade) og harmoniserte page-shell på alle 4 overflater. 5 nye DS-helpers: `renderToxicFlow`, `renderMatLadder`, `renderSuppressedGroup`, `renderCodepointReveal`, `renderTopRisks`. 2 nye normaliserings-helpers: `mapSeverityToCardLevel(input)` (severity + action-types til DS-konvensjoner) og `parseNarrativeAudit(md)`. 12 skjermdumper planlagt i `playground/screenshots/v7.6.0/`. A11Y-rapport oppdatert (`playground/A11Y-RAPPORT.md`) — WCAG 2.1 AA bekreftet, severity-soft fargepar verifisert, semantiske elementer (`<ol>`, `<button>`, `<section>`) erstatter generic `<div>`. Filendring totalt over 5 sesjoner: 10209 → 10677 linjer. Kjent begrensning: `parsed.findings` er tom for `deep-scan`/`audit` demo-fixturer (parser-begrensning, ikke fikset i v7.6.0 — sporet for v7.6.x patch).
|
||
|
||
## v7.6.1 — Playground visuell-patch (no scanner/hook behavior changes)
|
||
|
||
Seks bugs fanget av maintainer ved manuell verifisering i nettleser etter v7.6.0-release. Alle skyldtes mismatch mellom DS-klasser og hvordan playground-rendrere brukte dem (eller manglende DS-implementasjoner av klasser playground-rendrere antok eksisterte).
|
||
|
||
(1) `renderFindingsBlock` brukte `.findings` outer-class som DS har som 2-kolonners grid (`grid-template-columns: 360px 1fr`) for list+detail-panel-layout — playground brukte den uten detail-panel, headeren havnet i venstre 360px-kolonne, items i 1fr. Erstattet med `<section class="report-meta">` + `<h4>` + korrekt `findings__list > findings__group > findings__group-header + findings__items`-mønster.
|
||
(2) `.report-table` manglet helt i DS men brukes i 7+ rendrere (OWASP-kategorier, Supply chain, Scanner Risk Matrix, Plugin-meta, Permission-matrise, Live-meter, Siste runs, Godkjenninger, Mitigation roadmap) — lagt lokal CSS-implementasjon i playground-HTML `<style>`-blokk (border-collapse, zebra-hover, header-styling).
|
||
(3) `renderPreDeploy` traffic-lights brukte `.sm-card__grade` som er fast 28×28 px (designet for én A-F-bokstav) — kuttet "PASS" til "AS" og "PASS-WITH-NOTES" til "PASS-WITH-..." i alle traffic-light-cards. Erstattet med bredde-tilpasset status-pill via inline styling (severity-soft + on tokens).
|
||
(4) Threat-model matrix-bobler ikke klikkbare — `<span>` uten event-handler. Erstattet med `<button type="button" data-threat-id>` + `aria-label`. Click-handler scroller til tilsvarende rad i Trusler-tabellen og fremhever den i 1.6 sek.
|
||
(5) Radar-labels overlappet ved 6+ akser — alle brukte `text-anchor="middle"` med samme offset. Økt SVG-størrelse fra 280×280 til 380×380, radius fra 105 til 125, bytter `text-anchor` fra `middle` til `start`/`end` basert på horisontal-posisjon (`Math.cos(ang)` > 0.2 / < -0.2 / mellom).
|
||
(6) `recommendation-card__body` tekstoverflyt på lange single-line tekster (vilkår, owner-tags, dato) — lagt `overflow-wrap: anywhere; word-break: break-word` i lokal `<style>`-blokk.
|
||
|
||
4/4 fix-spesifikke smoke-tester passerer + 18/18 renderere produserer fortsatt komplett HTML mot `dft-komplett-demo` (regresjons-test). Filendring 10677 → 10753 linjer (+76 netto).
|