Every /security <cmd> that produces a report now prints a clickable file:// link to a self-contained HTML version. Delivered over five sessions; session 5 wires the 14 remaining skill files and ships v7.7.0 (version bump + docs).
Session history:
- Session 1 (0dc7ff4): playground catalog list view + builder pane with a copy button on all 18 reports
- Session 2 (86d6ecd): playground project-surface cleanup (stub screen + topbar split)
- Session 3 (fa5fb48): extracted 18 inline parsers + 18 inline renderers from the playground into the canonical ESM module scripts/lib/report-renderers.mjs (the playground keeps a bit-identical inline copy because ESM import does not work from file://)
- Session 4 (db80854): new zero-dep CLI scripts/render-report.mjs (stdin/file/stdout modes, kebab→camel commandId routing, ~140 KB self-contained HTML with 6 inlined DS stylesheets + local .report-table, absolute file:// paths for Ghostty cmd-click). 4 skills wired: scan, audit, posture, deep-scan.
- Session 5 (this one): 14 remaining skills wired: plugin-audit, mcp-audit, mcp-inspect, ide-scan, supply-check, dashboard, pre-deploy, diff, watch, registry, clean, harden, threat-model, red-team. Each skill file now has an HTML Report step that instructs Claude to write the markdown verbatim, run the CLI, and append a clickable file:// link to the response.
Release work:
- Version bump v7.6.1 → v7.7.0 in 6 plugin files + 2 root files (package.json, .claude-plugin/plugin.json, README badge, CLAUDE.md header + state section, docs/version-history.md, plugin Recent versions table, root README plugin entry, root CLAUDE.md plugin catalog)
- CHANGELOG [7.7.0] with full history from sessions 1-5
- docs/version-history.md v7.7.0 section
Verified:
- 18/18 commandIds in the CLI yield > 138 KB of self-contained HTML
- 1819/1820 tests green (the pre-compact-scan perf flake fired under load and passes in isolation at 1582 ms; pre-existing, deferred to v7.7.x)
- 18/18 skill files have an HTML Report step
- No source-file hits for 7.6.1 outside the historical changelog/version-history/README releases table
No scanner or hook behavior changes: purely additive surface.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
LLM Security — Version history
Per-release notes for v7.0.0 onward. Imported from CLAUDE.md via @docs/version-history.md.
v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING)
Three changes target the false-positive cascade on real codebases (hyperframes.com gave BLOCK / Extreme / 100, ~70% noise):
- Risk-score v2 formula (scanners/lib/severity.mjs): severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15). info findings are observability-only: counted in OWASP aggregates but contribute zero to risk_score, verdict, and riskBand (B3, v7.2.0; was undocumented pre-7.2.0). See the severity.mjs JSDoc for the full contract.
- Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy: extensions skipped (.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg filter_complex, User-Agent strings, SQL DDL, throw new Error(`...`), markdown image URLs). Configurable via the .llm-security/policy.json entropy section (thresholds, suppress_extensions, suppress_line_patterns, suppress_paths). The envelope calibration block reports skip counters + effective thresholds + policy source.
- DEP typosquat allowlist expansion: 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (knip, oxlint, tsx, nx, rimraf, uv, ruff, etc.).
See docs/security-hardening-guide.md §6 for the calibration story.
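As a rough illustration of the v2 shape described above, the sketch below picks the band of the most severe tier present and moves logarithmically from the band floor toward its ceiling as the finding count grows. The band edges and verdict cutoffs come from the text; the function names, the SATURATION constant, and the exact log curve are assumptions made for illustration only, and the authoritative contract remains the severity.mjs JSDoc.

```javascript
// Band edges per tier, as listed in the release notes (most severe first).
const BANDS = [
  ['critical', 70, 95],
  ['high', 40, 65],
  ['medium', 15, 35],
  ['low', 1, 11],
];

// Hypothetical: the finding count at which a band is considered "full".
const SATURATION = 50;

// counts: e.g. { critical: 0, high: 3, medium: 12, low: 4, info: 9 }
function riskScoreV2(counts) {
  for (const [severity, floor, ceiling] of BANDS) {
    const n = counts[severity] ?? 0;
    if (n > 0) {
      // Log-scaled position inside the band: 1 finding sits on the floor,
      // SATURATION or more findings sit on the ceiling.
      const fill = Math.min(1, Math.log(n) / Math.log(SATURATION));
      return Math.round(floor + (ceiling - floor) * fill);
    }
  }
  // Only info findings (or none at all): contributes zero to the score.
  return 0;
}

// Verdict cutoffs from the notes: BLOCK >= 65, WARNING >= 15.
function verdictFor(score) {
  if (score >= 65) return 'BLOCK';
  if (score >= 15) return 'WARNING';
  return 'ALLOW';
}
```

The point of the shape is that a lower tier can never outscore a higher one on volume alone: a scan with hundreds of medium findings stays inside 15–35, while a single critical finding already starts at 70.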
v7.1.1 — Scan-report narrative coherence (patch)
Three coordinated edits address the whiplash symptom that survived v7.0.0 (numbers fixed, narrative still walked findings back as "false positive" in prose):
(a) agents/skill-scanner-agent.md Step 2.5 mandates context-first severity assignment — every signal has exactly one disposition (suppressed OR reported), no per-finding walk-back; (b) templates/unified-report.md gains a ### Narrative Audit block in Executive Summary surfacing summary.narrative_audit.suppressed_findings.{count, by_category} from the agent's trailing JSON; (c) both files updated from stale v1 risk-formula constants to the v2 model that has been authoritative in severity.mjs since v7.0.0. Counter is distinct from the existing top-level output.suppressed (.llm-security-ignore rule integer). Out-of-scope but flagged: commands/scan.md:113-114 retains the v1 formula; resolution deferred to Batch B.
v7.3.0 — MCP cumulative-drift baseline (Wave C of Batch C)
Closes E14 from docs/critical-review-2026-04-20.md. The mcp-description-cache.mjs schema gains a sticky baseline slot per tool plus a 10-event rolling history array (FIFO). Cumulative drift = levenshtein(current, baseline) / max(|current|, |baseline|); when the ratio crosses mcp.cumulative_drift_threshold (default 0.25), post-mcp-verify.mjs emits a separate MEDIUM mcp-cumulative-drift advisory. The existing per-update >10% drift signal is unchanged — both fire independently. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught. Baseline survives the 7-day TTL purge so detection persists across the full window. New /security mcp-baseline-reset slash command (plus scanners/mcp-baseline-reset.mjs CLI: --list, --target <tool>, or no-args clear-all) lets the user acknowledge a legitimate MCP server upgrade — clearing the baseline causes the next call to seed a fresh one from the incoming description; description, firstSeen, lastSeen, and history are preserved for audit. LLM_SECURITY_MCP_CACHE_FILE env var overrides the cache path for end-to-end testing without polluting the user's real ~/.cache/llm-security/mcp-descriptions.json.
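The cumulative-drift ratio above can be sketched in a few lines. This is a minimal illustration of the stated formula, not the code in mcp-description-cache.mjs: the function names are hypothetical, and treating "crosses" as a strict greater-than comparison is an assumption.

```javascript
// Classic two-row Levenshtein edit distance (insert, delete, substitute).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// levenshtein(current, baseline) / max(|current|, |baseline|)
function cumulativeDrift(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// Default taken from the notes (mcp.cumulative_drift_threshold = 0.25).
const DEFAULT_CUMULATIVE_DRIFT_THRESHOLD = 0.25;

// Assumption: the advisory fires when the ratio is strictly above the threshold.
function exceedsCumulativeDrift(current, baseline, threshold = DEFAULT_CUMULATIVE_DRIFT_THRESHOLD) {
  return cumulativeDrift(current, baseline) > threshold;
}
```

Because the comparison is always against the sticky baseline rather than the previous description, a series of small edits that each stay under the per-update threshold still accumulates toward the 0.25 cutoff.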
v7.3.0 — Env-var deprecation warnings (D3 of Batch C, Wave D)
Closes 8.7 from .claude/projects/2026-04-29-batch-c-scope-finalize/plan.md. scanners/lib/policy-loader.mjs exports a new helper getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue) — env still wins per Preferences (existing behaviour), but when both the env-var AND the policy.json key are explicitly set, the helper emits a single per-process stderr line: [llm-security] Deprecation: env-var ${ENVVAR} will be removed in v8.0.0; policy.json key ${section}.${key} also set — env wins for now. Suppress with LLM_SECURITY_DEPRECATION_QUIET=1. Module-scoped Set dedupes per env-var name across call-sites. Four overlapping vars are wired through the helper: LLM_SECURITY_INJECTION_MODE ↔ injection.mode (in pre-prompt-inject-scan.mjs), LLM_SECURITY_TRIFECTA_MODE ↔ trifecta.mode and LLM_SECURITY_ESCALATION_WINDOW ↔ trifecta.escalation_window (in post-session-guard.mjs), LLM_SECURITY_AUDIT_LOG ↔ audit.log_path (in scanners/lib/audit-trail.mjs). DEFAULT_POLICY gains trifecta.escalation_window: 5 to close the gap noted in the plan revisions table (M10). Env-only vars without policy.json equivalents (LLM_SECURITY_UPDATE_CHECK, LLM_SECURITY_PRECOMPACT_MODE, LLM_SECURITY_PRECOMPACT_MAX_BYTES, LLM_SECURITY_IDE_ROOTS, LLM_SECURITY_MCP_CACHE_FILE) are unchanged — they emit no deprecation signal because there is nothing to deprecate yet.
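The precedence and dedupe behavior described above could look roughly like the following. This is a sketch under stated assumptions, not the policy-loader.mjs implementation: the real helper takes four arguments, whereas this hypothetical variant receives env, the explicitly configured policy object, and a warn callback through a fifth deps parameter so it can run without touching process.env or stderr.

```javascript
// Module-scoped Set: one warning per env-var name per process.
const warnedEnvVars = new Set();

// deps = { env, policy, warn } is a hypothetical injection point for this sketch.
// policy is assumed to hold only keys the user set explicitly in policy.json.
function policyValueWithEnvWarn(section, key, envVarName, defaultValue, deps) {
  const { env, policy, warn } = deps;
  const envValue = env[envVarName];
  const policyValue = policy[section]?.[key];
  const envSet = envValue !== undefined && envValue !== '';
  const policySet = policyValue !== undefined;

  const quiet = env.LLM_SECURITY_DEPRECATION_QUIET === '1';
  if (envSet && policySet && !quiet && !warnedEnvVars.has(envVarName)) {
    warnedEnvVars.add(envVarName);
    // \u2014 is the dash that appears in the documented message text.
    warn(
      `[llm-security] Deprecation: env-var ${envVarName} will be removed in v8.0.0; ` +
        `policy.json key ${section}.${key} also set \u2014 env wins for now.`
    );
  }

  // Env still wins; policy.json is next; the built-in default is last.
  if (envSet) return envValue;
  if (policySet) return policyValue;
  return defaultValue;
}
```

Keeping the Set at module scope is what makes the warning fire once per process even when several call sites resolve the same variable.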
v7.5.0 — Playground (additive surface, no scanner/hook behavior changes)
Single-file SPA at playground/llm-security-playground.html (~10 200 lines) for onboarding, demos, and workshop use without a Claude Code installation. Parser + renderer for all 18 produces_report=true commands in CATALOG. State lives primarily in IndexedDB (llm-security-playground-v1) with a localStorage fallback, a cycle-free Proxy + EventTarget store, and microtask-batched rendering. Theme bootstrap with FOUC prevention. 4 surfaces: onboarding (5 groups) → home (3 tracks) → catalog (20 commands) ⇄ project (reports / overview / context / export). Demo state ships three projects inline; dft-komplett-demo has all 18 reports pre-parsed for click-through. Vendor-synced design system under playground/vendor/playground-design-system/ (checksum-locked via MANIFEST.json, never edited directly). Test fixtures under playground/test-fixtures/ (one markdown file per command) are the contract anchor for parser development. Screenshots in playground/screenshots/v7.5.0/. Window globals exposed for testing/automation: __store, __navigate, __loadDemoState, __scheduleRender, __PARSERS, __RENDERERS, __CATALOG, __inferVerdict, __inferKeyStats, __renderPageShell, __handlePasteImport. Includes a fix to the normalizeVerdictText regex order: GO-WITH-CONDITIONS is checked before GO so a conditional verdict does not collapse to ALLOW.
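The regex-order fix is easy to see in isolation. The sketch below is not the playground's normalizeVerdictText; the function name, the rule table, and the choice of WARNING as the target for a conditional verdict are assumptions. Only the ordering requirement and the GO → ALLOW collapse come from the notes.

```javascript
// Ordered rules: the more specific pattern must come first, because
// /\bGO\b/ also matches inside "GO-WITH-CONDITIONS" (the hyphen is a word boundary).
const VERDICT_RULES = [
  [/\bGO-WITH-CONDITIONS\b/i, 'WARNING'], // hypothetical mapping for the conditional verdict
  [/\bGO\b/i, 'ALLOW'],
];

function normalizeVerdictSketch(text) {
  for (const [pattern, verdict] of VERDICT_RULES) {
    if (pattern.test(text)) return verdict;
  }
  return null; // unrecognised verdict text
}
```

With the two rules swapped, the GO rule would match first on "GO-WITH-CONDITIONS" and the conditional verdict would be reported as ALLOW, which is the collapse the fix avoids.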
v7.6.0 — Playground Tier 3 reference case (additive surface, no scanner/hook behavior changes)
The playground is now a visually and structurally complete reference implementation for the shared/playground-design-system/ Tier 3 supplement. 8 new Tier 3 components are integrated into the 18 report renderers: tfa-flow + tfa-leg + tfa-arrow (lethal-trifecta chain with <button> elements + ARIA group/aria-label) in renderScan + renderDeepScan; mat-ladder + mat-step (5-step maturity ladder with thresholds at 0/25/50/75/95% PASS) in renderPosture; suppressed-group (narrative audit from summary.narrative_audit.suppressed_findings) in renderScan + renderDeepScan; codepoint-reveal + cp-tag/cp-zw/cp-bidi (side-by-side Unicode steganography reveal with U+200B-D|FEFF|2060|180E → cp-zw, U+202A-E|2066-9 → cp-bidi detection) in renderMcpInspect; top-risks + top-risk[data-severity] (ranked top-findings listing, semantic <ol>, excludes info findings) in renderScan/renderDeepScan/renderPluginAudit/renderPosture/renderAudit; extended recommendation-card[data-severity] (severity-tinted advisory) on all inline uses + new per-bucket advisory cards in renderClean + intro snapshot + diff rows in renderHarden (action mapping CREATE→positive / APPEND→medium / MERGE→low / SKIP→low); risk-meter (band visualization 0-100 with Low/Medium/High/Critical/Extreme bands) on 5 archetypes (scan, deep-scan, plugin-audit, audit, red-team); card--severity-{level} modifier on findings__item cards. Wave 1 (Session 2) added badge--scope-security (identity chip), verdict-pill-lg with __verdict + __sub (replaces the custom verdict pill on all 18 report types), and DS Tier 3 form-progress + fp-step in the onboarding wizard. Wave 0 (Session 1) deleted ~30 duplicate CSS declarations from the <style> block (DS wins the cascade) and harmonized the page shell across all 4 surfaces. 5 new DS helpers: renderToxicFlow, renderMatLadder, renderSuppressedGroup, renderCodepointReveal, renderTopRisks. 2 new normalization helpers: mapSeverityToCardLevel(input) (severity + action types to DS conventions) and parseNarrativeAudit(md).
12 screenshots planned in playground/screenshots/v7.6.0/. A11Y report updated (playground/A11Y-RAPPORT.md): WCAG 2.1 AA confirmed, severity-soft color pairs verified, semantic elements (<ol>, <button>, <section>) replace generic <div>. Total file change over 5 sessions: 10209 → 10677 lines. Known limitation: parsed.findings is empty for the deep-scan/audit demo fixtures (parser limitation, not fixed in v7.6.0; tracked for a v7.6.x patch).
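The codepoint ranges listed for the codepoint-reveal component translate directly into two character classes. The following is a minimal sketch of that classification, assuming hypothetical function names and a simple hit-list output; it is not the renderCodepointReveal helper, and it leaves out cp-tag because the notes do not give its range.

```javascript
// Zero-width characters: U+200B-U+200D, U+FEFF, U+2060, U+180E.
const ZERO_WIDTH = /[\u200B-\u200D\uFEFF\u2060\u180E]/;
// Bidirectional controls: U+202A-U+202E and U+2066-U+2069.
const BIDI = /[\u202A-\u202E\u2066-\u2069]/;

// Returns the DS class name for a single character, or null if it is not flagged.
function classifyCodepoint(ch) {
  if (ZERO_WIDTH.test(ch)) return 'cp-zw';
  if (BIDI.test(ch)) return 'cp-bidi';
  return null;
}

// Lists every flagged character with its UTF-16 index and a U+XXXX label.
function revealCodepoints(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) {
    const cls = classifyCodepoint(ch);
    if (cls) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
      hits.push({ index, cls, label: `U+${hex}` });
    }
    index += ch.length;
  }
  return hits;
}
```

A renderer could then place the raw text next to a copy in which each hit is replaced by a visible chip carrying its label and class, which is the side-by-side reveal the component name suggests.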
v7.6.1 — Playground visual patch (no scanner/hook behavior changes)
Six bugs caught by the maintainer during manual in-browser verification after the v7.6.0 release. All stemmed from a mismatch between DS classes and how the playground renderers used them (or from missing DS implementations of classes the playground renderers assumed existed).
(1) renderFindingsBlock used the .findings outer class, which DS defines as a 2-column grid (grid-template-columns: 360px 1fr) for a list + detail-panel layout. The playground used it without a detail panel, so the header landed in the left 360px column and the items in the 1fr column. Replaced with <section class="report-meta"> + <h4> + the correct findings__list > findings__group > findings__group-header + findings__items pattern.
(2) .report-table was missing from DS entirely but is used in 7+ renderers (OWASP categories, Supply chain, Scanner Risk Matrix, Plugin meta, Permission matrix, Live meter, Latest runs, Approvals, Mitigation roadmap). Added a local CSS implementation in the playground HTML <style> block (border-collapse, zebra hover, header styling).
(3) renderPreDeploy traffic lights used .sm-card__grade, which is fixed at 28×28 px (designed for a single A-F letter). It truncated "PASS" to "AS" and "PASS-WITH-NOTES" to "PASS-WITH-..." in every traffic-light card. Replaced with a width-adaptive status pill via inline styling (severity-soft + on tokens).
(4) Threat-model matrix bubbles were not clickable: each was a <span> with no event handler. Replaced with <button type="button" data-threat-id> + aria-label. The click handler scrolls to the corresponding row in the Trusler (threats) table and highlights it for 1.6 s.
(5) Radar labels overlapped at 6+ axes because all of them used text-anchor="middle" with the same offset. Increased the SVG size from 280×280 to 380×380 and the radius from 105 to 125, and switched text-anchor from middle to start/end based on horizontal position (Math.cos(ang) > 0.2 / < -0.2 / in between).
(6) recommendation-card__body text overflowed on long single-line strings (conditions, owner tags, dates). Added overflow-wrap: anywhere; word-break: break-word in the local <style> block.
4/4 fix-specific smoke tests pass, and 18/18 renderers still produce complete HTML against dft-komplett-demo (regression test). File change: 10677 → 10753 lines (+76 net).
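The radar-label fix in item (5) comes down to a three-way split on the cosine of the axis angle. A minimal sketch, assuming hypothetical function names, a first axis pointing straight up, and a center at half of the 380 px canvas; only the 0.2 cutoffs, the 125 radius, and the 380×380 size are taken from the notes.

```javascript
// Labels on the right half start at the point, labels on the left half end at it,
// and labels near the vertical axis stay centred.
function textAnchorFor(angle) {
  const x = Math.cos(angle);
  if (x > 0.2) return 'start';
  if (x < -0.2) return 'end';
  return 'middle';
}

// Hypothetical layout helper: evenly spaced axes, first axis at 12 o'clock.
function radarLabels(axisCount, radius = 125, center = 190) {
  return Array.from({ length: axisCount }, (_, i) => {
    const angle = -Math.PI / 2 + (2 * Math.PI * i) / axisCount;
    return {
      x: center + radius * Math.cos(angle),
      y: center + radius * Math.sin(angle),
      anchor: textAnchorFor(angle),
    };
  });
}
```

With six axes this yields middle, start, start, middle, end, end going clockwise from the top, so no two neighbouring labels grow toward each other.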
v7.7.0 — HTML report for all 18 skill commands
All 18 /security commands that produce a report now get a clickable file:// link to a self-contained HTML version. Delivered over 5 sessions (UX work + extract + CLI + wiring). No scanner or hook behavior changes: purely additive.
Session 1 (0dc7ff4): Playground catalog list view + builder pane. The catalog surface gained a list view (grid toggle) + a builder pane with a copy button on all 18 reports, so the onboarding flows become broader and deeper without leaving playground mode.
Session 2 (86d6ecd): Playground project-surface cleanup. Stub-screen handling (report not fully parsed → a clear placeholder instead of an empty panel), topbar split (navigation steps vs. export actions), and general DS adjustments for the project surface.
Session 3 (fa5fb48): scripts/lib/report-renderers.mjs extract. The 18 inline parsers + 18 inline renderers in the playground HTML file were moved to a canonical ESM module (scripts/lib/report-renderers.mjs) with a clean import { PARSERS, RENDERERS } from './...' surface. The playground keeps a bit-identical inline copy because ESM import does not work from file:// without Chrome/Firefox flags. Canonical source + playground inline copy = two surfaces, same behavior.
Session 4 (db80854): scripts/render-report.mjs CLI + 4 skills wired. The new zero-dep Node CLI takes a commandId + --in/--out (stdin/file/stdout modes) and uses kebab→camel conversion so all 18 commandIds work automatically. Output is self-contained HTML (~140 KB): it inlines 6 DS stylesheets (tokens, base, components, tier2, tier3, tier3-supplement) + the local .report-table implementation. Fonts are not inlined (that would inflate the HTML 7x to ~1 MB); tokens.css has -apple-system, BlinkMacSystemFont, system-ui as fallbacks. Absolute file:// paths are printed to stdout for Ghostty cmd-click. Default output is reports/<command>-<YYYYMMDD-HHmmss>.html relative to CWD. 4 skills wired: scan, audit, posture, deep-scan.
Session 5: 14 remaining skills wired + release. plugin-audit, mcp-audit, mcp-inspect, ide-scan, supply-check, dashboard, pre-deploy, diff, watch, registry, clean, harden, threat-model, and red-team all now have a closing "HTML Report" step in their skill file that instructs Claude to (1) compute a temp md path, (2) Write the entire markdown report verbatim, (3) run the CLI, (4) append > **HTML-rapport:** [Åpne i nettleser](file:///abs/sti.html) to the response. v7.7.0 release (version bump across package.json, .claude-plugin/plugin.json, README badge + state, CLAUDE.md header + state section, marketplace root README).
The pre-existing pre-compact-scan perf flake (1000 ms threshold under load) remains; deferred to a v7.7.x patch.
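The kebab→camel routing and the default output name from Session 4 can be sketched as follows. The helper names and the renderers lookup shape are hypothetical, and using UTC for the timestamp is an assumption (the notes give only the YYYYMMDD-HHmmss pattern); this is not the code in scripts/render-report.mjs.

```javascript
// 'deep-scan' -> 'deepScan', 'mcp-inspect' -> 'mcpInspect', 'scan' -> 'scan'
function kebabToCamel(commandId) {
  return commandId.replace(/-([a-z0-9])/g, (_, ch) => ch.toUpperCase());
}

// Hypothetical lookup: any commandId whose camel-cased form is a key works automatically.
function lookupRenderer(renderers, commandId) {
  const key = kebabToCamel(commandId);
  if (!Object.hasOwn(renderers, key)) {
    throw new Error(`Unknown commandId: ${commandId}`);
  }
  return renderers[key];
}

// reports/<command>-<YYYYMMDD-HHmmss>.html, relative to the working directory.
// Assumption: the timestamp is formatted in UTC.
function defaultReportPath(commandId, now = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const date = `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}${pad(now.getUTCSeconds())}`;
  return `reports/${commandId}-${date}-${time}.html`;
}
```

Deriving the lookup key from the commandId means a newly wired skill needs no extra routing entry as long as its renderer is registered under the camel-cased name.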