Tre fixes commited etter v7.7.0-tagen (b732eee + 2a6f73f + 81b7beb) viste
versjons-inkonsistens: package.json + plugin.json + README badge + CLAUDE.md
header satt fortsatt på v7.7.0 mens commit-meldinger og inline-kommentarer
refererte v7.7.1 som om det var en release. Per feedback_version_sync.md
skal alle versjonsreferanser stemme — denne commiten lukker gapet.
Endringer:
- package.json: 7.7.0 → 7.7.1
- .claude-plugin/plugin.json: 7.7.0 → 7.7.1
- plugin README badge: version-7.7.0-blue → version-7.7.1-blue
- plugin README "Recent versions"-tabell: ny [7.7.1]-rad
- plugin CLAUDE.md header + v7.7.1-highlights state-seksjon
- docs/version-history.md: ny v7.7.1-seksjon
- playground HTML linje 6935: 'Plugin v7.7.0' → 'Plugin v7.7.1'
(samme template-litteral som v7.7.0-bumpen ikke fanget, nå synket)
- CHANGELOG.md: ny [7.7.1]-seksjon med full Changed/Fixed/Notes
- rot README llm-security-entry: v7.7.0 → v7.7.1 + ny v7.7.1-bullet
- rot CLAUDE.md plugin-katalog: v7.7.1-bump
Verifisert:
- 1820/1820 tester grønne (pre-compact-flake fyrte ikke)
- CLI rapporterer fornuftig feilmelding på tom input
- Ingen kildefil-treff på 7.7.0 utenfor CHANGELOG/version-history/REMEMBER/TODO/ROADMAP
Ingen ny atferd. Kun versjons-synking + dokumentasjon av tre fixes som var
deployert som ad-hoc-commits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add three sections to bring README depth closer to marketplace standard
(was 206 lines, smallest in marketplace; now 298):
- "Why this exists" — names the convergent-middle-ground problem in Claude
Design, cites Anthropic's frontend-aesthetics cookbook as primary source,
explains the five-layer prompt scaffold and the plugin's interactive role
- "Workflow example: from idea to prompt" — realistic 8-phase walkthrough
against the slides preset (Q1 results all-hands deck), including a ~30-line
sample prompt block showing what Phase 6 actually delivers
- "Recent versions" — v0.1.0 summary + CHANGELOG.md pointer + v1.0
readiness criteria pointer to REMEMBER.md
No feature changes; pure docs polish. verify.sh passes 41/0/1 (the 1 warn
is the expected SC1 dogfood-missing advisory until operator runs first
real Claude Design session).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Etter v7.7.1-strippen ble onboarding fjernet, men topbar viste fortsatt
demo-state-organisasjonsnavnet ('Direktoratet for digital tjeneste-
utvikling') hentet fra shared.organization. Det er ikke meningsfullt
lenger siden onboarding ikke kan endre verdien.
Erstattet med statisk 'llm-security' som nøytralt scope-anker. Crumb-
parameteren (f.eks. 'Katalog') beholdes som suffix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Operatør-tilbakemelding etter v7.7.0: hjem-overflaten ledet fortsatt med
prosjekter (Re-onboard / Nytt prosjekt / Command-katalog) — katalog var
tredje kort, sekundært bak prosjekt-tracks. Brukeren ba om å fjerne
onboarding + prosjekter og beholde katalog ('Vi legger til funksjonalitet
senere').
Minimum-strip (gammel kode bevart, kun routing + topbar endret):
- renderActive(): tvinger alltid activeSurface til 'catalog'.
Onboarding/home/project-render-funksjonene er bevart men ikke rutbare.
- Init-default endret fra 'home' til 'catalog' (også for migrerte states).
- Topbar: 'Hjem' og 'Re-onboard'-knappene fjernet. 'Katalog' beholdt
sammen med Eksporter/Importer/tema-toggle.
Konsekvens: playgrounden lander direkte i Command-katalog (20 kommandoer
med list-view + builder-pane + copy-knapp fra sesjon 1). Project-state +
onboarding-state forblir i IndexedDB men ingen UI-vei dit. Når funksjon-
alitet legges til igjen kan routeren utvides og topbar-knapper restaureres.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hjem-overflaten viste fortsatt 'Plugin v7.6.1' i meta-linja fordi
versjon-strengen er hardkodet inline i renderHome (line 6933), ikke
synkronisert med package.json. Fanget post-release.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hver /security <cmd> som produserer rapport printer nå en klikkbar
file://-lenke til en self-contained HTML-versjon. Levert over fem
sesjoner; sesjon 5 wirer de 14 resterende skill-filene + slipper
v7.7.0 (versjonsbump + docs).
Sesjon-historikk:
- Sesjon 1 (0dc7ff4) — playground katalog list-view + builder-pane med
copy-knapp på alle 18 rapporter
- Sesjon 2 (86d6ecd) — playground prosjekt-surface opprydding
(stub-screen + topbar-splitt)
- Sesjon 3 (fa5fb48) — extract 18 inline parsers + 18 inline renderers
fra playground til canonical ESM-modul scripts/lib/report-renderers.mjs
(playground beholder bit-identisk inline-kopi siden ESM import ikke
fungerer fra file://)
- Sesjon 4 (db80854) — ny zero-dep CLI scripts/render-report.mjs
(stdin/file/stdout-modus, kebab→camel commandId-routing, ~140 KB
self-contained HTML med 6 inlined DS-stylesheets + lokal .report-table,
absolutte file://-paths for Ghostty cmd-click). 4 skills wired:
scan, audit, posture, deep-scan.
- Sesjon 5 (denne) — 14 resterende skills wired: plugin-audit, mcp-audit,
mcp-inspect, ide-scan, supply-check, dashboard, pre-deploy, diff,
watch, registry, clean, harden, threat-model, red-team. Hver skill-fil
har nå en HTML Report-step som instruerer Claude å skrive markdown
verbatim, kjøre CLI, og appende klikkbar file://-lenke til respons.
Release-arbeid:
- Versjonsbump v7.6.1 → v7.7.0 i 6 plugin-filer + 2 rot-filer
(package.json, .claude-plugin/plugin.json, README badge, CLAUDE.md
header + state-seksjon, docs/version-history.md, plugin Recent versions-
tabell, rot README plugin-entry, rot CLAUDE.md plugin-katalog)
- CHANGELOG [7.7.0] med full historikk fra sesjon 1-5
- docs/version-history.md v7.7.0-seksjon
Verifisert:
- 18/18 commandIds i CLI gir > 138 KB self-contained HTML
- 1819/1820 tester grønne (pre-compact-scan-perf-flake fyrte under last,
passerer i isolasjon på 1582 ms — pre-eksisterende, defer til v7.7.x)
- 18/18 skill-filer har HTML Report-step
- Ingen kildefil-treff på 7.6.1 utenfor historiske changelog/version-
history/README releases-tabell
Ingen scanner- eller hook-atferdsendringer — purely additive surface.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- New scripts/render-report.mjs CLI: stdin/file/stdout modes, ESM import
from ./lib/report-renderers.mjs, kebab→camel renderer-name lookup so
any of the 18 PARSERS works
- Standalone HTML wrap: inlines 6 DS stylesheets (tokens, base, components,
tier2, tier3, tier3-supplement) + local .report-table CSS. Skips fonts.css
→ system-ui fallback via tokens.css (~137 KB self-contained vs ~1 MB
with woff2 bundled)
- 4 skill files wired: commands/{scan,audit,posture,deep-scan}.md — new
step instructs Claude to Write the markdown report to a temp file,
invoke the CLI, and print a markdown-formatted file:// link
- Absolute file:// paths in stdout for Ghostty cmd-click compatibility
- Default output: reports/<command>-<YYYYMMDD-HHmmss>.html relative to CWD
- Smoke-tested: stdin→stdout, file→file roundtrip, all 4 commands produce
valid HTML with DS-aligned page-shell (page__title, verdict-pill-lg,
risk-meter, key-stats, findings__item, recommendation-card)
- Tests 1820/1820 green (same baseline; pre-compact-scan perf-flake from
NEXT-SESSION-PROMPT did not fire on retry)
- Playground untouched (2 scripts, 0 parse failures), report-renderers.mjs
untouched (74 exports, 18 PARSERS, 18 RENDERERS)
Sesjon 4 av 5. v7.7.0 release + 9 remaining skill wirings = sesjon 5.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md and README.md previously named the forbidden tokens
literally when describing the validate-plugin.sh assertion (i) and
test-sc3-citations.sh negative grep. The recursive scans then flagged
the documentation itself as a leak. Rewords both descriptions to
describe the policy without using the banned literals.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-trekexecute snapshot of in-progress CLAUDE.md/SKILL.md edits and
extracted docs/ files. Captured as one commit so /trekexecute claude-design
can run against a clean working tree.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- renderCatalogSurface rewritten to list-view (1 rad per kommando),
filter-chips (Alle/Rapport/Verktoy + 6 kategori-chips) + sok
- Builder-pane (modal) med live-preview: pipeline-strengen oppdateres
mens skjema fylles ut. Kopier-knapp er primaer CTA med clipboard API +
textarea-fallback for file:// (allerede eksisterende).
- Smart prefill fra store.state.shared via 'from: shared' fields i
renderCommandForm. Pane-state skriver ikke tilbake til shared (scope
'cat', ingen project-save). Felles-felt markert med 'felles'-badge.
- Forstegangsbesok lander pa home (fjernet onboarding auto-redirect).
Re-onboard tilgjengelig via topbar.
Sesjon 1 av 5 i v7.7.0-lopet. CSS-additioner: catalog-filter-chips,
catalog-chip, catalog-list, catalog-row, builder-modal.
Tester: 1822/1822 gronne. Static JS-parse OK. Browser-walkthrough
gjenstar — verifiseres manuelt for v7.7.0 release. Docs oppdateres ved
v7.7.0-release i Sesjon 5 (samlet commit).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Standardize named-markdown-link guidance across all plugins so file://
references render as independently clickable links in terminals like
Ghostty (bare file:// URLs only make the first clickable).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v5.1.0 shipped with an unquoted brief_version: 2.1 in trekbrief-template.md.
parseScalar coerced it to Number 2.1, and the sequencing gate guarded on
typeof === 'string', silently bypassing BRIEF_V51_MISSING_SIGNALS.
Three-part atomic fix:
- brief-validator.mjs:87+149 now accepts both string and number forms via
String(fm.brief_version) coercion.
- trekbrief-template.md quotes the value so new briefs parse as String.
- doc-consistency.test.mjs pins the QUOTED form going forward.
Three regression tests added in brief-validator.test.mjs.
Operator runs Ghostty (also iTerm2, modern Terminal.app) — all support
cmd+click on file:// URLs. Producing commands (/trekbrief, /trekplan,
/trekreview) already emit both forms but the contract was implicit.
This commit makes it explicit:
1. CLAUDE.md gains an "Operator-UX guarantee" paragraph stating both
forms must always appear in the final report: (a) plain file://
URL with absolute path (for cmd+click), (b) copy-pasteable
`open file://` command (for terminals without cmd+click).
2. tests/lib/doc-consistency.test.mjs gains a pin asserting both
patterns appear in all three producing commands' final report
blocks. Drift catches at test time.
Non-functional change to the commands themselves — they already
emit both forms (verified at trekbrief.md L510/L519, trekplan.md
L798/L802, trekreview.md L299/L317).
Operator request 2026-05-13: "Noter ned i Voyage at jeg ALLTID får
en slik direkte file:// lenke."
Flip model: sonnet → model: opus across 20 agent files, 4 prose references
in commands (trekplan, trekresearch), trekendsession command frontmatter,
and CLAUDE.md tables. Aligns CLAUDE.md premium-profile row to actual
premium.yaml content (all-opus, which has been the case since v4.1.0 but
the doc was drift). Companion to VOYAGE_PROFILE=premium env-var (set in
~/.zshenv same day) — env-var governs orchestrator phase model; this
commit governs sub-agent models which are frontmatter-pinned and not
reachable by the profile resolver.
npm test: 516 pass, 0 fail, 2 skipped (unchanged from baseline).
Operator rationale: complete Opus coverage across all Voyage activity,
including the 20 sub-agents that the profile system does not control
(architecture-mapper, task-finder, plan-critic, scope-guardian,
brief-reviewer, code-correctness-reviewer, brief-conformance-reviewer,
review-coordinator, session-decomposer, plus the 6 researcher agents,
plus the 5 codebase-analysis agents).
Cost implication: sub-agent runs ~5x more expensive vs sonnet. Accepted.
Informational blockquote after the v3.0.0 note. Documents that voyage is
Tier 1 (per-task) of a three-tier architecture under the author's private
marketplace: Tier 2 app-creator (per-app), Tier 3 app-factory (per-portfolio).
Both are pre-implementation. Asymmetry-invariant preserved: voyage stays
unaware of Tier 2/3 — Handover 1 (brief format) is the only integration
point. Brief-schema changes therefore breaking for downstream consumers,
formalized in v5.4.
The operator pointed at ~/repos/claude-code-100x/claude-code-100x/build-site.js
as the annotation reference from the start. v4.2/v4.3 built a bespoke
playground instead. v5.0.0 deleted it. v5.0.1 pointed at /playground
document-critique (Claude-leads, wrong direction). v5.0.2 was operator-led
but too thin (line-click + freeform note, no intent). v5.0.3 finally
matches the reference.
scripts/annotate.mjs rewritten:
- Markdown rendered as proper article HTML (h1/p/li/ul/table/blockquote/pre)
instead of line-numbered raw lines.
- Pencil-toggle annotation mode in the topbar, default ON.
- Select text OR click any element → form popover at cursor.
- Three intent buttons: Fiks (red) / Endre (orange) / Spørsmål (blue).
- Comment textarea. Save (Cmd+Enter), Cancel (Esc).
- Section context auto-detected from nearest h1/h2.
- Sidebar panel: annotations grouped by section, intent badges,
snippet quotes, delete buttons, click-to-scroll with flash highlight.
- Copy Prompt: structured markdown export with intent labels.
- localStorage persistence keyed on absolute artifact path
(voyage-annotate:v2: prefix to avoid colliding with v5.0.2 state).
Tests: 12 (up from 10), all passing. npm test: 518 / 516 pass / 0 fail / 2 skipped.
Reference: ~/repos/claude-code-100x/claude-code-100x/build-site.js
lines 1431–2255 (annotation UI section).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v5.0.0 added a read-only HTML render. v5.0.1 deleted that and pointed at
/playground document-critique, which pre-generates Claude's suggestions
and asks the operator to approve/reject them. The operator asked for the
opposite — a surface where THEY drive every annotation. v5.0.2 lands it.
scripts/annotate.mjs (~430 lines, zero deps) takes any artifact .md and
writes a self-contained HTML next to it. The HTML renders the document
with line numbers, lets the operator click any line to add their own
note (inline textarea, save with Cmd+Enter or button), keeps a sidebar
of all notes (editable + deletable + persisted in localStorage per
artifact path), and exposes Copy Prompt to gather every note into one
structured prompt. Operator copies, pastes back, Claude revises the .md.
The three producing commands now run annotate.mjs at their last step and
print the file:// link with explicit "Click any line to add YOUR OWN note"
instructions. The v5.0.1 /playground document-critique line is gone.
npm test green: 516 tests, 514 pass, 0 fail, 2 skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v5.0.0 stop-gap had /trekbrief, /trekplan, and /trekreview each render
a read-only {artifact}.html (via scripts/render-artifact.mjs) AND print a
vague "run the /playground plugin" instruction. In practice the read-only
HTML was redundant with what /playground produces and the instruction
wasn't copy-paste-ready — the operator had to guess the right invocation.
v5.0.1 deletes scripts/render-artifact.mjs + its test + npm run render,
and makes each producing command end with a single boxed, literal,
copy-paste-ready line:
/playground build a document-critique playground for {artifact_path}
One paste from the operator launches the official playground skill's
document-critique template, which builds an interactive HTML — artifact
on the left, per-line Approve/Reject/Comment cards on the right, Copy
Prompt button at the bottom. Mark suggestions, click Copy Prompt, paste
back, Claude revises the .md. Doc-consistency test pins the literal
invocation so the prose cannot soften back into vagueness.
npm test green: 503 tests, 501 pass, 0 fail, 2 skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v4.2/v4.3 bespoke playground SPA (~388 KB), the /trekrevise command,
Handover 8 (annotation → revision), the supporting lib/ modules
(anchor-parser, annotation-digest, markdown-write, revision-guard), the
Playwright e2e suite, and the @playwright/test / @axe-core/playwright
devDeps are removed. A browser walkthrough found the playground borderline
unusable, and it duplicated the official /playground plugin's
document-critique / diff-review templates.
In their place: scripts/render-artifact.mjs — a small, zero-dependency
renderer that turns a brief/plan/review .md into a self-contained,
design-system-styled, zero-network .html (frontmatter folded into a
<details> block). /trekbrief, /trekplan, and /trekreview call it on their
last step and print the file:// link; to annotate, run /playground
(document-critique) on the .md and paste the generated prompt back.
Resolves the v4.3.1-deferred findings as moot (their target files are
deleted). npm test green: 509 tests, 507 pass, 0 fail, 2 skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumps .claude-plugin/plugin.json 4.2.0 -> 4.3.0 (package.json, package-lock.json,
and the README badge were already at 4.3.0). Updates the v4.3.0 CHANGELOG entry with
the verified test count (711 pass / 0 fail / 2 skipped, 713 total), a "Re-review
remediation (Sesjon 13-18)" note covering the 11-finding cycle Waves 1-3 closed, and
a "Known issues — deferred to v4.3.1" subsection listing the 3 new findings the Sesjon
18 re-review surfaced in the remediation code (87069b35 SECURITY_INJECTION defense-in-
depth, 4cc3bfc9 PLAN_EXECUTE_DRIFT, c6c64a58 MISSING_TEST). Updates root CLAUDE.md
(voyage v4.0.0 -> v4.3.0, seven-command + playground), root README + plugin README
(test count, Known-limitations note, fixes the stale "trekplan@" install snippet ->
"voyage@"), root marketplace.json (voyage description), and plugin CLAUDE.md (Playground
paragraph). A plan-critic-reviewed Wave-4 remediation plan for the 3 deferred findings
is ready (.claude/plans/, gitignored).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 30 of v4.3 plan — Wave 7 Group D:
- voyage-playground-a11y.spec.mjs (3 tests): light + dark theme axe-core
scans (compared against baseline JSON, fails only on NEW or GROWN
violations) + pixel-diff smoke for SC1 (light + dark, 1280x900,
maxDiffPixelRatio=0.02).
- voyage-playground-network.spec.mjs (1 test): SC7 authoritative gate via
page.on('request') instrument — verifies zero external (http/https/ws)
requests during page load.
- playwright.config.mjs: snapshotPathTemplate routes to tests/e2e/snapshots/
(matches Step 31 expected_paths).
Baseline policy: HTML is FROZEN in Sesjon 6 (Wave 7 = verification, not
fix). Existing critical/serious WCAG violations (aria-hidden-focus +
color-contrast) recorded in tests/e2e/snapshots/a11y-baseline.json as
delta-baseline. Actual a11y fix deferred to v4.4.
Verify: npm run test:e2e -> 4 passed (3 a11y + 1 network).
Step 27 of v4.3 plan — adds Playwright 1.59.1 + axe-core/playwright 4.10.2
as devDependencies, npm script test:e2e, and playwright.config.mjs with
file:// baseURL pointing at playground/. Chromium browser binary installed
to ~/Library/Caches/ms-playwright/.
Trace: SC2 (WCAG verifikasjon krever browser-driver), research/04
Recommendation security; brief Constraint linje 64 (zero new deps i
playground/) er OK fordi Playwright er devDependency på voyage-plugin-rot,
ikke i playground/.
Step 20 of v4.3 playground plan. Document-level keydown handler:
- J = next annotation (next sorted-by-line draft, wraps)
- K = prev annotation (wraps)
- ] = toggle sidebar visibility
- Escape = clear active anchor + sidebar list selection
Active annotation gets yellow-tint (Step 18 setActiveAnchor) and the
matching gutter-badge receives focus + scrollIntoView. Aria-live region
announces position + target: "Annotering 3 av 7: <target> — <snippet>".
Skips input/textarea/select/contenteditable so playground never steals
keystrokes from form fields. Modifier keys (Ctrl/Alt/Meta) pass through
to browser shortcuts. Wired in init() after dashboard nav.
Trace: SC2 (WCAG AA keyboard), SC6, research/04 Dim 2 + Insight 5 +
Recommendation keyboard-navigation.
Step 19 of v4.3 playground plan. Sidebar now default aria-hidden=true
(translateX collapses panel, leaves 40px FAB rail). FAB toggle has
data-action=toggle-sidebar for keyboard binding (] in Step 20).
New annotation-list section in sidebar panel:
- filter radiogroup: Alle (default), Åpne (unresolved), Resolved
- voyage-jumplist ordered list with numbered badges matching the
gutter-badge ordering (sorted by line ASC)
- aria-live jumplist count: "X av N" (filtered/total)
- click list-item -> setActiveAnchor + scrollIntoView + data-active
renderAnnotationList wires into mountRender so list refreshes on every
render. Filter state (voyageFilterState) persists across renders within
the session.
Trace: SC6, research/04 Dim 1 (hidden-by-default) + Insight 1 +
Recommendation sidebar/navigation.
Step 18 of v4.3 playground plan. Replaces v4.2 Gesture 2 pencil-icon
hover-reveal with numbered circular badges in the left gutter (one per
anchored paragraph; ordering matches sidebar jumplist). 2-3px accent stripe
extends right from the badge into the gutter. Yellow-tint highlight
(rgba 255, 235, 59, 0.25 — Google Docs pattern) applies to the anchored
paragraph when an annotation is active. Body text never reflowed or
recolored. Gesture 1 (text-select adder) and Gesture 3 (page-level note)
remain for new annotation creation.
CSS uses --color-scope-voyage token for badge background and stripe.
JS adds injectAnchorBadges() + setActiveAnchor() and rewires mountRender.
Trace: SC1 + SC6, research/04 Insight 3 + Recommendation marker-design.
Step 17 of v4.3 playground plan. Pure function relocateAnchorsToBlockBoundaries
(text, anchors) detects atomic markdown blocks (fenced code, tables, deeply
nested lists) and relocates anchor-comment insertion to the line BEFORE block
opening rather than inside the block. Pure markdown-text -> markdown-text
transform (no DOM, no markdown-it dependency).
Companion test tests/integration/annotation-block-boundary.test.mjs extracts
the function via balanced-brace scan and exercises it through Function() —
7 unit tests covering empty anchors, outside-block stays, fenced-code
relocation, table relocation, deeply-nested list relocation, mixed
inside/outside, and shape contract.
Trace: SC6, research/04 Dim 3 (Notion block-level fallback), plan-critic
major #6 (DOM-vs-no-DOM contradiction resolved via pure-function design).
Step 15 (v4.3 Sesjon 3 — Wave 3) — wires the dashboard fleet-tiles
to a drill-down view with breadcrumb update + back-to-dashboard
navigation + browser back/forward restoration via popstate.
renderArtifactDetail(artifactKey) renders the chosen artifact into
the #voyage-detail slot using renderPageShell + renderArtifact:
- brief / plan / review → markdown render
- progress → JSON pretty-print in <pre>
- research → list of all research-briefs
- missing entry → "Artifact mangler" placeholder
Click delegation on .fleet-tile[data-artifact] triggers detail render
+ pushDetailURL (?artifact=<key>); data-action=back-to-dashboard
returns to the dashboard view + pushDashboardURL. Topbar breadcrumb
gets a third segment for detail views.
URL-parameter deep-linking: at page-load, ?project=<basePath>
surfaces a guide-panel hint explaining that webkitdirectory requires
a user-gesture; the hint links to the same Last prosjektmappe button
that wireProjectLoader already exposes. popstate handler restores
the view-state on browser back/forward (no-op when no project loaded).
Test additions (4): renderArtifactDetail function, URLSearchParams
presence, data-action=back-to-dashboard attribute, popstate listener.
Step 14 (v4.3 Sesjon 3 — Wave 3) — adds renderDashboard pipeline that
turns a ProjectArtifacts struct (produced by loadProjectDirectory in
Step 13) into a fleet-grid of fleet-tiles, one per artifact-type
(brief / plan / review / research / progress).
Status vocabulary: complete, in-progress, blocked, missing, stale
Severity mapping: missing → critical, blocked → high, in-progress
+ stale → medium, complete → low. Severity drives DS color tokens
via [data-severity] attribute selectors.
When loadProjectDirectory completes, dashboard takes over the main
stage (paste-flow elements hidden); topbar updates with project
breadcrumb. Step 13's pipeline already calls renderDashboard via
graceful-fallback, so wiring is automatic.
Test additions (4): fleet-grid + fleet-tile presence, renderDashboard
function declaration, status vocabulary completeness.
New docs/three-tier-context.md (110 lines) documents Voyage's position as
Tier 1 in a three-tier ecosystem with upstream consumers (app-creator,
app-factory; both currently in private incubation). The brief identifies
upstream-consumed contracts (brief.md format, /trekplan CLI, handover
schemas), prescribes stability principles, and explicitly preserves
Voyage's runtime agnosticism — no imports, no detection, no special cases.
Awareness without coupling.
CLAUDE.md § Architecture: 2-line callout pointing to the brief, following
the existing "opt-in upstream architect plugin" precedent.
No Voyage behavior change. Documentation-only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v4.1.0 → v4.2.0. Two-file change per Step 14 manifest (package.json +
.claude-plugin/plugin.json). Description tagline expanded from
"brief, research, plan, execute, review, continue" to include "revise"
and "+ first marketplace playground".
Out-of-scope under Step 14 forbidden_paths (left at 4.1.0 intentionally):
- lib/exporters/otlp-format.mjs (VOYAGE_SCOPE_VERSION constant)
- hooks/scripts/otel-export.mjs (User-Agent header)
These constants are touched on the next bump where the constants directory
is in scope; keeping them stale for one release is acceptable since
otel/otlp telemetry is opt-in and the version field is informational.
Verification:
- node -e "import('./package.json',{with:{type:'json'}}).then(m=>console.log(m.default.version))" → 4.2.0
- jq -r .version .claude-plugin/plugin.json → 4.2.0
- npm test: 610 pass / 0 fail / 2 skipped (Docker)
- SC11 pipeline-self-eat gate: render-artifact.mjs renders own brief.md + plan.md to non-empty HTML
Pure module computing deterministic 16-char SHA-256 prefix for annotation set.
Canonicalization: sort by id, fixed field order (id|target_artifact|target_anchor|intent|comment|timestamp), \n-join, sha256, take first 16 hex.
Brief SC4 specifies sha256-prefix; research-05 said sha1 — brief wins per Hard Rule "Brief-driven".
6 tests pass: empty digest, order-independence, intent-sensitivity, format invariant, golden value, undefined-vs-empty equivalence.
3 new test files, 24 cases (8 per validator):
- baseline (no annotation fields) still valid
- revision: 0 / revision: 5 accepted
- source_annotations list-of-dict accepted
- annotation_digest string accepted
- revision_reason accepted
- all 4 fields together accepted
- unrecognized future field tolerated (forward-compat policy)
Pin against future strict-key refactors. No production code change — pure regression pin.
Found by simulert v4.1 smoke — doc/code-drift in v4.1 ship:
docs/observability.md claims "Cloud metadata endpoints (169.254.169.254)
are permanently blocked" but the validator allowed them when
VOYAGE_OTEL_ALLOW_PRIVATE=1. Cloud metadata services expose IAM
credentials and instance secrets — operator-trust extended to
RFC-1918 home-lab access does NOT extend here, because the
blast-radius (cloud-account compromise) is qualitatively different.
New HARD_BLOCKED_HOSTS set checked BEFORE the link-local opt-in path:
- 169.254.169.254 (AWS / GCP / Azure metadata)
- 100.100.100.200 (AliCloud metadata)
- metadata.google.internal
- metadata.azure.com
New error code ENDPOINT_HARD_BLOCKED. Existing test for
ENDPOINT_LINK_LOCAL_REJECTED on 169.254.169.254 updated to assert
the new code; 3 new tests verify the hard-block holds even with
VOYAGE_OTEL_ALLOW_PRIVATE=1, plus AliCloud + GCP-hostname coverage.
Tests: 487 → 490 pass + 2 skipped.
Step 21 of v4.1 — extend-in-place per Plan-critic Blocker 2 split:
commands-only assertions land here; CLAUDE.md / README.md pinning is
deferred to Step 22 (post-write).
Changes:
1. CLAUDE.md command coverage loop now spans all SIX pipeline commands
(added /trekcontinue — was 5 of 6 pre-v4.1 per HIGH risk-assessor).
2. New: every pipeline command-file (trekbrief/research/plan/execute/
review/continue.md) must document the --profile flag.
3. New: forbidden-alias check — no command-file may use the legacy
names model_per_phase / phase_to_model / profile_phase_models.
Canonical name is "phase_models" (locked in brief).
4. New: at least one command-file must mention "phase_models" by name
so the regression detects total removal of the canonical-name
reference.
Tests: 482 pass + 2 skipped (Docker not installed).
Step 20 of v4.1 — implements drift detection in plan-validator.mjs per
brief Assumptions block 7: "Mismatch (e.g. korrupt manuell endring)
emitterer MANIFEST_PROFILE_DRIFT-warning fra plan-validator i --strict-modus."
Logic (after validateAllManifests in validatePlanContent):
1. Strict-mode only — soft mode never emits drift warnings.
2. Plan frontmatter must declare 'profile: <name>' to establish baseline.
3. For each step manifest, if profile_used is set AND differs from plan
profile, emit warning (NOT error) with code MANIFEST_PROFILE_DRIFT
and location 'step N: profile_used = X, plan profile = Y'.
Forward-compat preserved: drift is a warning, plan remains valid:true.
Operators see the drift in --strict mode without parsing breaking.
New files:
tests/validators/plan-validator-profile-drift.test.mjs — 4 tests
tests/fixtures/plan-profile-drift.md — drift fixture
Tests verify:
1. drift detected in strict mode → MANIFEST_PROFILE_DRIFT in warnings
2. drift NOT detected in soft mode → strict gate honored
3. matching profile → no drift warning
4. no plan-level profile → drift detection silent (no baseline)
Tests: 479 pass + 2 skipped (Docker not installed).
Step 19 of v4.1 — extend-in-place per brief Preferences. Three new test
blocks asserting forward-compat:
1. Legacy fixtures (plan-run-A.md, plan-run-B.md) — without profile_used
in frontmatter — still parse cleanly after manifest-yaml.mjs added
OPTIONAL_STRING_KEYS.
2. New fixtures (profile-plan-run-{economy,premium}-*.md) — with
profile_used in frontmatter — parse cleanly with correct profile
value extracted.
3. Real v4.1 plan (.claude/projects/2026-05-08-voyage-v4.1-modellprofiler/plan.md)
validates strict, emits no PLAN_VERSION_MISMATCH warning.
Tests: 475 pass + 2 skipped (Docker not installed).
Step 17 of v4.1 — escalate-handler invoked. Live LLM-budget ($60-120 for
4 plan-runs á /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.
Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md
fallback-strategy): document economy-Plan as parked, use balanced as
low-threshold profile, defer empirical calibration to v4.2.
Files:
tests/synthetic/profile-plan-run-economy-1.md — 30 steps, parked-synthetic
tests/synthetic/profile-plan-run-economy-2.md — 30 steps, parked-synthetic
tests/synthetic/profile-plan-run-premium-1.md — 40 steps, parked-synthetic
tests/synthetic/profile-plan-run-premium-2.md — 40 steps, parked-synthetic
tests/synthetic/profile-jaccard-calibration.md — threshold 0.55 pinned per
research/02 conservative starting value
Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
1. Cross-tier smoke-test (Step 18) flips red on a real run
2. v4.2 LLM-budget approval
3. New profile tier added
Step 16 of v4.1 — first test in tests/integration/, establishes the
skip-on-missing-tool pattern voyage will reuse for environment-dependent
integration tests. Two tests:
1. compose config parses and contains expected services
2. compose config pins required image versions
Both skip cleanly when 'docker info' fails (no Docker installed). On a
machine with Docker, both tests run docker compose config and assert the
4 services + 3 version pins are present.
Tests: 468 pass + 2 skipped (Docker not installed in dev env).
Step 14 follow-up — VOYAGE_OTEL_ENDPOINT (not VOYAGE_OTLP_ENDPOINT) per
hooks/scripts/otel-export.mjs and lib/exporters/endpoint-validator.mjs.
Adds VOYAGE_OTEL_ALLOW_PRIVATE=1 for localhost since 127.0.0.1 is
loopback and rejected by default.
Step 13 of v4.1 — adds Stop hook entry pointing to
hooks/scripts/otel-export.mjs (added in Step 12 / commit c5fb745).
Mounts the orchestrator on Claude Code's Stop event so OTel/Prometheus
export runs at session-end when VOYAGE_EXPORT_MODE is set.
HIGH-risk-mitigering: tests/hooks/hooks-json-stop-wired.test.mjs
asserter at Stop-key finnes, refererer otel-export.mjs, bruker
\${CLAUDE_PLUGIN_ROOT}-substitusjon, og har type:command.
Tests: 464 → 468 (4 new). All green.
Step 4 av v4.1-execute (Wave 2, Session 2).
Tre innebygde modellprofiler matcher brief profile-assignment matrix:
- economy: alle 6 phase_models = sonnet, parallel 2-3, external_research=false,
iter-cap=1. ~$1-3 per pipeline-sesjon.
- balanced: brief/research/execute/continue=sonnet, plan=opus, review=opus,
parallel 4-6, external_research=false (operator-override deferred
til v4.2 per NEXT-SESSION-PROMPT scope-grenser), iter-cap=2.
~$5-15 per pipeline-sesjon.
- premium: alle 6 phase_models = opus, parallel 6-8, external_research=true,
iter-cap=3. ~$20-60 per pipeline-sesjon (default, samme som v4.0).
Bruker list-of-dicts for phase_models (parser-kompatibel mot
lib/util/frontmatter.mjs:79-105). Verifisert: alle 3 filer parses uten feil
og returnerer array med 6 entries (phase+model per entry).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 3 av v4.1-execute (Wave 1, Session 1).
Legg ny eksportert const OPTIONAL_STRING_KEYS = ['profile_used'] parallel
til eksisterende OPTIONAL_KEYS. Utvid parseManifest med ny dispatch-loop
etter OPTIONAL_BOOLEAN_KEYS. Returnerer MANIFEST_OPTIONAL_TYPE hvis
profile_used finnes men ikke er string.
Forskjell fra OPTIONAL_BOOLEAN_KEYS: absence == not-present (NOT defaulted
til false, unlike boolean). Downstream-konsumenter kan dermed skille mellom
unset og empty-string.
Tester (5 nye, baseline 372 → 377):
- OPTIONAL_STRING_KEYS export drift-pin
- profile_used: economy parses successfully (SC #10 forward-compat)
- profile_used: numeric rejected
- absence: field NOT in parsed (string-key semantics)
- profile_used + skip_commit_check + memory_write co-existence
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 2 av v4.1-execute (Wave 1, Session 1).
Legg --profile i valued-arrayen for alle 6 voyage-kommandoer (trekbrief,
trekresearch, trekplan, trekexecute, trekreview, trekcontinue). Mønster
identisk med eksisterende --project/--brief valued-handling. Ingen endring
til parseArgs-logikk — utvider kun schema.
Tester (11 nye, baseline 361 → 372):
- 6 happy-path-tests (én per kommando)
- ARG_MISSING_VALUE for --profile uten verdi
- --profile + --quick kombo
- --profile + --gates edge-case (--gates parses inline, ikke i FLAG_SCHEMA)
- --profile + --project kombo
- trekcontinue --profile (validerer at tomt valued[] nå er utvidet)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Slash-command-parseren matcher !`...` selv inne i ```bash markdown-fences,
som gjorde at Phase 8 NEXT-SESSION-PROMPT-template eksekverte ved skill-load
med literale {project_dir}/{next_session_brief_path}/{next_session_label}/
{status}-strenger som argv. Det ga ENOENT på .session-state.local.json.tmp
og blokkerte hele /trekexecute skill-loadet.
Fjern !`...`-wrapperen og merk blokken eksplisitt som runtime-template.
Pattern matcher nå konvensjonen brukt andre steder i samme fil
(linje 202-208) der ```bash brukes for orkestrator-instruksjon uten
auto-eksekvering.
Wave 0 av v4.1-execute — pre-requisite for å låse opp /trekexecute
skill-invokasjon mot .claude/projects/2026-05-08-voyage-v4.1-modellprofiler/
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- renderCost: FIX — KEY_STATS_CONFIG['cost-distribution'] og inferVerdict('cost-distribution') viste "[object Object]" / returnerte alltid 'go' fordi parser-output har p50/p90 = {monthly, yearly}-objekter, ikke tall. Begge ekstraherer nå .monthly med fallback for flate fixtures.
- renderLicense: PASS — ingen kode-endring. Capability-matrix-status korrekt utledet (met/partial/missing) via parseCapabilityMatrix. Visuell QA gjenstår i sesjon 6.
- renderCompare: FIX — firstWord-heuristikk feilet når begge subjekter delte førsteord (f.eks. "Azure AI Foundry" vs "Azure ML + AKS" ga begge fw='azure', kollapset vinn-attribusjon). Erstattet med distinctive-token-matching: full-subject-substring først, deretter ord som er unike for ett subjekt. Diff-cell coloring oppdatert til samme matchSubject()-helper.
- renderUtredning: MINOR — droppet misvisende role="tab"/role="tablist" siden vi rendrer anchor-jump-TOC (alle paneler synlige), ikke ekte tab-toggle. Beholdt aria-current="true" for visuell aktiv-markør (DS-CSS hekter på den). Ekte tab-toggle defer til v1.15.0.
validate-plugin.sh: 219 PASS uendret
run-e2e.sh --playground: 272 PASS uendret
test-playground-migrations.sh: 7 PASS uendret
Refs V1.14.0-AUDIT.local.md sub-batch E (sesjon 5b).
- renderMigrate: <section class="phase-detail"> per fase erstattet med
<div class="expansion">-list (DS-supplement). Default-collapsed, klikkbar
header (Fase N: navn + duration), body = milepaeler + suksesskriterier.
Behold cycle-ribbon + mat-ladder + phases-summary-tabell + risks-tabell.
- renderPoc: speil renderMigrate. Traffic-light flyttet inn i expansion-body
(ul.traffic-list per fase med status fra fasens stepState).
- renderSummary: KEY_STATS_CONFIG['verdict'] patchet — parseTable returnerer
rader med header-baserte nokler (Metric/Verdi/Mal) ikke canonical
{label,value,unit}. Ny logikk bruker metrics_headers + heuristikk-match for
label/value/unit-kolonner, med fallback til canonical felt.
Backward-kompatibelt.
- renderAdr: verifisert PASS — ingen endring (.adr-meta + critique-cards
rendrer pent uten ekstra arbeid).
- ACTIONS['phase-expand']: ny handler registrert som alias for
requirement-expand (samme toggle-monster, eget action-navn for senere
divergens).
- Lokal CSS: hele .phase-detail-blokken (~10 linjer) slettet. Defensive-
kommentar oppsummert til 5-linjers historie-notat.
- Style-blokk effektive linjer: 147 (var 178 etter sesjon 4).
Smoke-tester:
- validate-plugin.sh: 219 PASS
- run-e2e.sh --playground: 272 PASS (202 statisk + 70 parser)
- test-playground-migrations.sh: 7 PASS
Refs V1.14.0-AUDIT.local.md sub-batch D.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bugfixes (B-DS-1, B-DS-2, B-DS-3 fra V1.14.0-AUDIT):
- .kanban-card__name (tier3-supplement): word-break: break-all → break-word
+ overflow-wrap: anywhere. Knekket midt i ord ("Tekn isk dokumen tasjon").
- .expansion__title-main, .expansion__title-sub (tier3-supplement): legg
til display: block. Begge er <span> som flyter inline by default —
resultat: "dokumentertKilde: Art. 9" på samme linje.
- .matrix__bubble (components.css): legg til cursor: pointer, hover-scale
og focus-visible. Antas rendret som <button> i konsumenter — gir
visuell + keyboard-fokus-feedback.
Re-syncet til plugins/ms-ai-architect/playground/vendor/ via
sync-design-system.mjs. Slettet 3 lokal-overrides i playground HTML
(matrix-bubble, expansion-title, kanban-card-name). Style-blokk:
191 → 182 linjer.
Smoke-tester: validate-plugin 219 PASS, e2e --playground 272 PASS,
statisk struktur 202 PASS.
Andre plugins (llm-security, voyage, okr, config-audit) påvirkes IKKE
— beholder gammel vendored DS inntil de selv re-syncer.
Sesjon 2 av 6 i v1.14.0 root-cause-multi-sesjons-løp.
ms-ai-architect plugin-versjon ikke bumpet (sesjon 6 ship-er v1.14.0).
[skip-docs]: docs oppdateres i sesjon 6 ved v1.14.0 plugin-ship.
Refs V1.14.0-AUDIT.local.md sub-batch 1 + 4.
10 visuelle bugs identifisert av maintainer i nettleser etter v1.13.0
shipped. Patch-pakke som adresserer mismatch mellom playground-rendrere
og DS-konvensjoner som v1.13.0 ikke fanget opp.
- B7: classify "Forpliktelser" indent — lokal .report-meta CSS-reset
(DL grid max-content+1fr, h4 uppercase+bold, ul padding-left space-5)
for konsistent venstre-justering uavhengig av nestelse.
- B8a: requirement-expand handler missing — renderRequirements markup
hadde data-action="requirement-expand" på hver expansion__head, men
ingen ACTIONS-handler var registrert. R-01..R-09-radene i AI Act-krav
var derfor ikke klikkbare. Fix: register ACTIONS['requirement-expand'].
- B8b: expansion title-main + title-sub kjørte sammen — DS' spans var
inline. Lokal display:block så de stables vertikalt.
- B10: kanban-card tegnknekking — DS' word-break:break-all knekker midt
i ord. Lokal override med break-word.
- B11: DPIA matrix-bobler ikke responderer — v1.13.0 click-handler
matchet kun mot første-kolonne i Trusler-tabellen. DPIA-fixturer har
full-tekst label i matrix_cells men T-001-id i threats-tabellen, så
ingen match. Utvid til (Pass 1) exact first-cell + (Pass 2) substring-
match mot enhver celle med 40-tegn-prefiks-toleranse.
- B12, B13, B15: defensive layout for top-risks/suppressed-panel/
phase-detail/aiact-timeline — eksplisitt display:block; clear:both;
width:100% mot grid-leak fra small-multiples/kanban-board/mat-ladder.
- B14: Migrate "skal vel være tabell" — phases-summary-tabell over
phase-detail-seksjonene (Fase, Varighet, Milepæler-count, Suksesskriterier-
count, Status). Samme tabell speilet i renderPoc for konsistens.
Verifisering:
- 23/23 smoke-test PASS (B7-B15 + 5 v1.13.0-regresjoner)
- 271/271 playground E2E PASS
- 219 plugin-validering PASS
- 42 KB-update PASS
Versjon: v1.13.0 -> v1.13.1 (plugin.json, README badge, README
version-history, CHANGELOG, ROADMAP, TODO, plugin CLAUDE.md
playground-header, root README plugin-list, root CLAUDE.md plugin-list).
Berører kun lokal CSS i <style>-blokk, ACTIONS-handler-registrering,
click-handler-utvidelse, og to renderer-funksjoner. Ingen modifisering
av playground/vendor/. Vendored DS' .kanban-card__name { word-break:
break-all } står — overstyres lokalt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fix-pakke som speiler llm-security v7.6.1 (commit f9b555a). Samme klasse
visuelle bugs identifisert via parallell DS-analyse av playground-rendrere.
- B1: renderFindingsBlock + renderRequirements bytter <div class="findings">
outer (DS grid 360px+1fr klemte indre struktur til 360px-kolonne, lot
1fr-detail-panel-kolonnen stå tom) til <section class="report-meta">.
BEM-strukturen findings__list > findings__group > findings__items uendret.
- B2: lokal .report-table CSS for 6+ rapporter (Trusler, Kostnadsoversikt,
TCO, Risiko-tabell, Key Metrics) som manglet styling — DS implementerer
ikke klassen. Speilet lokal styling fra llm-security v7.6.1.
- B3: ROS-matrise-bobler bytter <span> til <button type="button"
data-threat-id="..." aria-label="..."> med document-level click-handler
som scroller smooth til tilsvarende rad i Trusler-tabellen og
highlighter raden i 1.6 sek. Lokal CSS for cursor:pointer, hover
scale(1.15), :focus-visible outline.
- B4: renderRadarSvg bumpet 300x300 til 380x380, R fra 100 til 125,
label-offset fra R+25 til R+28, dynamisk text-anchor basert på
horisontal-posisjon for å unngå at bottom-labels overlapper hverandre
ved 6+ akser (typisk for ROS-rapport med 7 risiko-dimensjoner).
- B5: lokal .recommendation-card__body { overflow-wrap: anywhere;
word-break: break-word } for å forhindre at lange single-line tekster
(URLer, owner-tags, dato) skubber innhold ut av viewport i grid-cellen.
tests/test-playground-v3.sh: DS-klasse-assertion oppdatert fra .findings
til .findings__list (BEM-list er fortsatt i bruk; outer grid-container
bevisst fjernet i B1).
Verifisering:
- 22/22 smoke-test PASS (B1-B5 grep-asserts)
- 271/271 playground E2E PASS (201 statisk-struktur + 70 parser-fixtures)
- 219 plugin-validering PASS
- 42 KB-update test PASS
Versjon: v1.12.0 -> v1.13.0 (plugin.json, README badge, README
version-history, CHANGELOG, ROADMAP, TODO, plugin CLAUDE.md
playground-header, root README plugin-list, root CLAUDE.md plugin-list).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md OBLIGATORISK-regel: enhver feature-endring som pusher til
Forgejo MÅ oppdatere alle tre doc-nivåer i SAMME commit eller umiddelbart
etter. v7.6.1-fix-commit (f9b555a) bumpet kun versjons-badgen — denne
oppfølgings-commit-en lukker doc-gapet.
- plugins/llm-security/README.md: ny [7.6.1] history-tabell-rad
- plugins/llm-security/CLAUDE.md: header bumpet v7.6.0 → v7.6.1 +
ny v7.6.1-blurb (alle 6 fix-detaljer)
- README.md (rot): llm-security versjons-rad bumpet v7.6.0 → v7.6.1 +
v7.6.1 history-bullet over v7.6.0-bullet
Ingen kodeendringer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Seks bugs fanget av maintainer ved manuell verifisering i nettleser etter
v7.6.0-release. Alle skyldes mismatch mellom DS-klasser og hvordan
playground-rendrere brukte dem, eller manglende DS-implementasjoner av
klasser playground-rendrere antok eksisterte.
Fixes:
- renderFindingsBlock brukte .findings outer-class som DS har som
2-kolonners grid (360px list + 1fr detail-panel) — headeren havnet
i venstre kolonne, items i høyre, brutt layout i alle 18 rapporter
med findings. Erstattet med .report-meta + h4 + findings__list >
findings__group + findings__group-header + findings__items
(korrekt DS-mønster, kun list-delen).
- .report-table manglet helt i DS men brukes i 7+ rendrere (OWASP,
Supply chain, Scanner Risk Matrix, Plugin-meta, Permission-matrise,
Live-meter, Siste runs, Godkjenninger, Mitigation roadmap). Lagt
lokal CSS-implementasjon i playground-HTML style-blokk: border-
collapse, zebra-hover, header-styling. Komplementerer DS-tokens
uten å modifisere vendor.
- renderPreDeploy traffic-lights brukte .sm-card__grade som er fast
28x28 px (én A-F-bokstav) — kuttet PASS til AS og PASS-WITH-NOTES
til PASS-WITH-... i alle traffic-light-cards. Erstattet med
bredde-tilpasset status-pill via inline styling (severity-soft +
on tokens).
- Threat-model matrix-bobler ikke klikkbare. Erstattet span med
button type=button data-threat-id + aria-label. Click-handler
scroller til tilsvarende rad i Trusler-tabellen og fremhever
den i 1.6 sek.
- Radar-labels overlappet ved 6+ akser fordi alle brukte
text-anchor=middle. Økt SVG-størrelse 280 → 380, radius 105 → 125.
Bytter text-anchor fra middle til start/end basert på horisontal-
posisjon.
- recommendation-card__body tekstoverflyt på lange single-line tekster
(vilkår, owner-tags, dato). Lagt overflow-wrap: anywhere;
word-break: break-word i lokal style-blokk.
Verifisering:
- 4/4 fix-spesifikke smoke-tester passerer
- 18/18 renderere produserer fortsatt komplett HTML mot
dft-komplett-demo (regresjons-test)
- Filendring playground.html 10677 → 10753 linjer (+76 netto)
Versjonsbump v7.6.0 → v7.6.1 (patch — bugfix-only, ingen scanner- eller
hook-atferdsendringer):
- plugins/llm-security/.claude-plugin/plugin.json
- plugins/llm-security/package.json
- plugins/llm-security/README.md (badge)
- plugins/llm-security/CHANGELOG.md ([7.6.1] entry)
- plugins/llm-security/playground/llm-security-playground.html (footer)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fase 3: badge--scope-security som identitets-chip på alle prosjekt- og
rapport-cards (signal "denne er llm-security"). Plassert i topbar
(app-header__brand), fleet-tile-meta, command-subcard card__head,
catalog-card card__head, og onboarding form-progress autosave-blokk.
verdict-pill-lg (DS Tier 2 + Tier 3 supplement) erstatter custom
verdict-pill — nå med __verdict + valgfri __sub-struktur. renderPageShell
aksepterer opts.verdictSub som videresendes til renderVerdictPill.
Fase 4: Onboarding wizard bruker DS Tier 3 form-progress + fp-step med
data-state="done|in-progress|pending" og __num/__name — erstatter
playground-ens lokale form-progress__step-implementasjon. Steps wrappet
i form-progress__steps-container per DS-mønster. Aside har nå
form-progress__autosave-blokk med scope-badge og fullført-counter.
CSS-blokken som tidligere overstyrte DS for .verdict-pill og
.form-progress__heading/__step/__step-marker/--done er fjernet —
DS Tier 3 supplement vinner cascade-en.
Verifisering: verdict-pill-lg=20 (>=12), badge--scope-security=5 (>=5),
fp-step=12 (>=5), .verdict-pill\b i style-blokk=0, form-progress__step
strict singular=0 (3 naive treff er DS-canonical __steps-plural).
14 window-globaler intakt. JS parse OK, demo-state JSON OK,
HTML-balansert (3/3 script, 1/1 style).
Sesjon 2 av 5 i v7.6.0-pipeline. Foundation (sesjon 1) ga 9ef0c48.
Neste: Tier 3 spesialkomponenter del 1 (fase 5a-d) i sesjon 3.
Docs (plugin README/CLAUDE/rot-README/CHANGELOG) oppdateres i Sesjon 5
per master-plan; derav [skip-docs] her.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Slett ~50 duplikat-CSS-deklarasjoner fra playground-ens <style>-blokk
som overstyrte DS Tier 3 supplement uten gevinst (.app-shell, .tab-list,
.fleet-tile*, .form-progress, .eyebrow, .page__*, .key-stat*, .field-*,
.expansion (ekskl. body), .stack-*, .card*, .tracks*, .checkbox-row).
JS-fix: 4 modifier-strenger oppdatert fra forkortede ('crit', 'med')
til DS-konsistente fulle navn ('critical', 'medium') i renderKeyStatsGrid-data.
Konsekvens: DS vinner cascade-en, eliminerer subtile visuelle drift
mellom playground og referanse-scenarioer.
Page-shell harmonisering: alle 4 overflater (onboarding, home, catalog,
project) bruker nå DS page__header-klyngen via renderPageShell. Onboarding
konvertert fra custom <header class="onboarding-header"> til samme mønster.
renderPageShell utvidet med opts.meta (page__meta) og opts.hero
(page__header--hero modifier). Hero-mønster på home med
clamp(36px, 5vw, 56px) og letter-spacing -0.025em.
Behold til Sesjon 2: .verdict-pill (erstattes av verdict-pill-lg fase 3),
.form-progress__step* (erstattes av fp-step fase 4), .multi-select
(bevisst input-box-look), .expansion__body (markup-mismatch m/ DS-anim).
Forberedelse til v7.6.0 — Tier 3 referanse-case.
Mirror av ms-ai-architect playground-arkitektur, tilpasset llm-security:
- 4 overflater (onboarding/home/catalog/project) med surface-router
- IndexedDB persistens (llm-security-playground-v1) + localStorage fallback
- Theme-bootstrap med FOUC-prevention og localStorage-persist
- 20 kommandoer i CATALOG (5 kategorier: discover/posture/findings-ops/
hardening/adversarial/mcp-ops) med full input_fields + report_archetype
- 5-gruppers onboarding (organisasjon/scope/profil/plattform/compliance)
med form-progress sidebar
- Home: 3 tracks + fleet-grid prosjektliste + tom-state med demo-data
- Katalog: ekspanderbare grupper med live-søk og forhåndsvisning
- Prosjekt-stub: 4 screen-tabs + 6 kategori-tabs + per-kommando
skjema/paste-import/rapport-soner
- Demo-state: Direktoratet for digital tjenesteutvikling med 2 prosjekter
- Eksport/import (JSON envelope), action-handlers (35), modal-portal
PARSERS + RENDERERS er tomme routing-objekter — fylles i Fase 2 (10 høy-prio
kommandoer) og Fase 3 (resterende 10). Paste-import viser «parser ikke
implementert»-guide-panel for kommandoer uten parser, og lagrer rå markdown
i state for fremtidig parsing.
Vendor: 27 filer synket fra shared/playground-design-system/
(MANIFEST.json sjekksum-låst, source_commit 487f7ae).
Verifisert: node --check OK (2737 linjer, 113733 char inline JS),
HTML-tag-balanse OK. Manuell smoke-test gjenstår.
Docs (plugin README, CLAUDE.md, rot-README) bumpes ved Fase 3-fullføring
sammen med plugin.json v7.5.0. Derfor [skip-docs] her.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The ultra-cc-architect plugin was removed from the marketplace; voyage's
architecture-discovery contract still pointed at it by name. Replaced
verbatim references with plugin-agnostic phrasing ("upstream architect
producer") in code comments and user-facing warning messages.
CHANGELOG entries and config-audit v5.0.0 snapshots intentionally
preserved as historical records.
- Add v7.4.0 line covering 9 runnable examples and 3 new e2e test suites
- Update test count 1768 → 1822 in stat footer
- Add "9 runnable examples" to stat footer
Bumps from v7.3.1 to v7.4.0. Purely additive surface — no scanner
or hook behavior changes, no breaking changes.
Headline content (already merged on main since v7.3.1):
- examples/ utvidelse — seven runnable demonstration walkthroughs
shipped over three sessions (sesjon 1 pre-existing
prompt-injection-showcase + lethal-trifecta-walkthrough,
mcp-rug-pull, supply-chain-attack, poisoned-claude-md,
bash-evasion-gallery, toxic-agent-demo, pre-compact-poisoning).
Each is self-contained: README + fixture + run-script +
expected-findings testable contract. State-isolation pattern
(PID-suffixed JSONL or env-overrides like
LLM_SECURITY_MCP_CACHE_FILE) keeps the user's real cache and
/tmp state untouched.
- tests/e2e/ — three new suites totalling 45 tests:
attack-chain.test.mjs (17), multi-session.test.mjs (9),
scan-pipeline.test.mjs (19). Test count 1777 to 1822. These
exercise the framework as a coordinated system rather than as
isolated unit-tests.
Version sync (8 files):
- package.json
- .claude-plugin/plugin.json
- CLAUDE.md (header)
- README.md (badge + Recent versions tabellen new row)
- CHANGELOG.md (Unreleased to [7.4.0] - 2026-05-05 with summary)
- scanners/dashboard-aggregator.mjs VERSION constant
- scanners/ide-extension-scanner.mjs VERSION constant
- scanners/posture-scanner.mjs VERSION constant
Stabilization-stance unchanged. v8.0.0 remains the planned
deprecation-cleanup release. v7.x continues as the stable line.
Tests: 1822/1822 grønne lokalt etter bump.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Runnable demonstration of hooks/scripts/pre-compact-scan.mjs (the
only PreCompact hook in the plugin) detecting both a CRITICAL
injection pattern and an AWS-shaped credential inside a synthetic
JSONL transcript, exercised across all three values of
LLM_SECURITY_PRECOMPACT_MODE plus a benign-transcript control case
in block mode that proves the gate is not a brick wall.
The transcript is generated at runtime in a per-invocation tempdir
under os.tmpdir() and the directory is removed in a finally block,
so the user's real ~/.claude/projects/.../transcripts/ are never
touched. The AWS-shaped key uses the same 'AK' + 'IA' + ...
fragmentation idiom as tests/e2e/attack-chain.test.mjs so this
source contains no literal credentials and pre-edit-secrets does
not block writes during development.
Nine independent assertions (9/9 must pass):
- block mode + poisoned: exit 2, decision=block JSON, reason text
covers both injection and AWS labels (3 assertions)
- warn mode + poisoned: exit 0, systemMessage JSON, no decision
field (2 assertions)
- off mode + poisoned: exit 0, no JSON on stdout (2 assertions)
- block mode + benign: exit 0, no decision=block JSON (2 assertions)
OWASP / framework mapping: LLM01, LLM02, ASI01, AT-1, AT-3.
Docs updated: plugin README "Other runnable examples", plugin
CLAUDE.md "Examples" tabellen, CHANGELOG [Unreleased] Added.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Single-component lethal-trifecta walkthrough that drives
scanners/toxic-flow-analyzer.mjs against a deliberately
misconfigured fixture plugin. The fixture agent declares
tools: [Bash, Read, WebFetch], which alone covers all three
trifecta legs (input surface + data access + exfil sink). No
hooks/hooks.json is shipped, so TFA's mitigation logic finds
no active guards and emits a CRITICAL "Lethal trifecta:"
finding without downgrade.
Plugin marker is plugin.fixture.json (recognised by isPlugin())
rather than .claude-plugin/plugin.json — the latter is blocked
by the plugin's own pre-write-pathguard hook, and
plugin.fixture.json exists in isPlugin() specifically so
example fixtures can self-mark without touching guarded paths.
Three independent assertions (3/3 must pass): direct trifecta
present and CRITICAL; finding mentions the exfil-helper
component; description confirms "no hook guards detected"
(proves the mitigation path stayed inactive). expected-findings.md
documents the contract.
OWASP / framework mapping: ASI01, ASI02, ASI05, LLM01, LLM02, LLM06.
Docs updated: plugin README "Other runnable examples", plugin
CLAUDE.md "Examples" tabellen, CHANGELOG [Unreleased] Added.
[skip-docs] is appropriate because examples don't change what
the plugin "synes å dekke utad" — marketplace root README is
unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three new self-contained, runnable threat demonstrations under
examples/, continuing the batch started in 583a78c. Each example
has README.md + run-*.mjs + expected-findings.md and uses
state-isolation discipline so the user's real cache/state files
are never polluted.
- examples/supply-chain-attack/ — two-layer demonstration:
pre-install-supply-chain (PreToolUse) blocks compromised
event-stream version 3.3.6 and emits a scope-hop advisory for
the @evilcorp scope; dep-auditor (DEP scanner, offline) flags
5 typosquat dependencies plus a curl-piped install-script
vector in the fixture package.json. Maps to LLM03/LLM05/ASI04.
- examples/poisoned-claude-md/ — all 6 memory-poisoning detectors
fire on a deliberately poisoned CLAUDE.md plus a fixture
agent file under .claude/agents (E15/v7.2.0 surface):
detectInjection, detectShellCommands, detectSuspiciousUrls,
detectCredentialPaths, detectPermissionExpansion,
detectEncodedPayloads. No agent runtime needed — scanner
imported directly. Maps to LLM01/LLM06/ASI04.
- examples/bash-evasion-gallery/ — one disguised variant per
T1 through T9 evasion technique fed through pre-bash-destructive,
verified BLOCK after bash-normalize strips the evasion. T8
base64-pipe-shell uses its own BLOCK_RULE. The canonical
destructive form uses a path token rather than the bare slash
(regex word-boundary requires it). Source-string fragmentation
pattern reused from the e2e attack-chain test. Maps to
LLM06/ASI01/LLM01.
Plugin README "Other runnable examples" section + plugin
CLAUDE.md "Examples" table + CHANGELOG Unreleased/Added
all updated. Marketplace root README unchanged
([skip-docs] for marketplace-level gate — plugin's outward
coverage is unchanged, only demonstrations were added).
Companion to 8df5d5c (which only carried the doc updates — the example
directories themselves were left out of staging by mistake). This
commit adds the actual example mappes:
- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs,
expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs,
expected-findings.md}
Plus plugin CLAUDE.md "Examples (runnable demonstrations)" section
with a 4-row table covering malicious-skill-demo, prompt-injection-
showcase, lethal-trifecta-walkthrough, and mcp-rug-pull plus the
state-isolation discipline notes.
Marketplace root README unchanged since plugin's outward coverage
is unchanged ([skip-docs] covers the marketplace-level gate).
Two new self-contained, runnable threat demonstrations under examples/:
- lethal-trifecta-walkthrough/ — feeds 5 hook calls (WebFetch, Read .env,
Bash curl POST + suppression follow-ups) into post-session-guard and
verifies the Rule-of-Two advisory fires exactly on leg 3. State
isolated via run-script PID so /tmp/llm-security-session-*.jsonl is
not polluted. Treffer post-session-guard, ASI01/ASI02, LLM01/LLM02.
- mcp-rug-pull/ — mutates an MCP tool description across 8 stages.
Each per-update <10% Levenshtein, cumulative reaches 32.2% by stage
7 — proves the v7.3.0 (E14) mcp-cumulative-drift MEDIUM advisory
catches slow-burn rug-pulls that the per-update detection would
miss. Uses LLM_SECURITY_MCP_CACHE_FILE to isolate cache. Treffer
post-mcp-verify, mcp-description-cache.mjs, OWASP MCP05/LLM03/ASI04.
Each example: README.md + run-*.mjs + expected-findings.md.
Plugin README "Other runnable examples" section + CHANGELOG
[Unreleased] Added bullets + plugin CLAUDE.md "Examples" section
all updated in this commit. Marketplace root README unchanged
since plugin's outward coverage is unchanged ([skip-docs]
covers the marketplace-level gate).
Synthetic ROS-analyse output for "Acme Kunde-chatbot" (Acme Kommune)
following the same pattern as security-assessment, cost-estimation,
ai-act and summary fixtures. Satisfies all 29 assertions in
tests/test-ros-output.sh:
- 8 phases (Fase 1-8) plus Ledelsessammendrag
- 12 trusler i T-XXX-NN format (MAESTRO + OWASP-mapping)
- 9 risikoer i R-N format
- 10 tiltak i M-N format
- 7 ROS-dimensjoner med X/5-scoring
- 5x5 risikomatrise + restrisiko-tabell
- NS 5814 + ISO 31000 metodikk-referanser
- AI Act, GDPR, OWASP regulatoriske referanser
- MAESTRO + supply-chain referanser (Vedlegg O coverage)
Tar bort den siste pre-eksisterende run-e2e-feilen
(`bash tests/run-e2e.sh` exits 0).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three new files in tests/e2e/ (45 tests, 1777 -> 1822):
- attack-chain.test.mjs (17): full hook stack against attack payloads in
sequence -- prompt injection at the gate; T1/T5/T8 bash evasions;
pathguard on .env / .ssh; secrets hook on AWS-shaped keys and PEM
headers; markdown link-title and HTML-comment poisoning in tool
output; trifecta accumulation over a single session with dedup on
the next benign call.
- multi-session.test.mjs (9): state persistence across simulated
session boundaries. Uses the fact that a hook child's process.ppid
equals the test runner's process.pid, so writing the session state
file directly simulates "previous session" history. Covers slow-burn
trifecta (legs spread >50 calls), MCP cumulative description drift
via LLM_SECURITY_MCP_CACHE_FILE override, and pre-compact transcript
poisoning in warn / block / clean / missing-file modes.
- scan-pipeline.test.mjs (19): scan-orchestrator + all 10 scanners +
toxic-flow correlator against poisoned-project (BLOCK / 95 / Extreme)
and grade-a-project (WARNING / 48 / High). Asserts envelope shape,
verdict, risk_score, severity counts, OWASP coverage, scanner
enumeration, and a narrative-coherence cross-check that the BLOCK
scan strictly outranks the WARNING scan along every axis.
Test files build credential-shaped payloads at runtime via concatenation
so they contain no literal matches for the pre-edit-secrets regexes
(memory rule feedback_secrets_hook_test_fixtures.md).
Doc updates in same commit per marketplace policy:
- CLAUDE.md header: 1777+ -> 1822+ tests, mentions tests/e2e/
- README.md badge tests-1777 -> tests-1822, body text updated
- CHANGELOG.md: new [Unreleased] Added section describing scope
No version bump. No behavior changes outside tests/.
ToS-vurdering konkluderte med at autonom cron-kjøring er unødvendig kompleks
for en solo-fork-and-own-plugin. Apply-fasen krever LLM-resonnering uansett,
så manuell trigger fra en aktiv Claude Code-sesjon er enklere og holder
pluginen klart innenfor Anthropic Consumer Terms paragraf 3 (automated access
only via API key or where explicitly permitted — Claude Code CLI er
eksemptert som offisielt verktøy).
Lagt til:
- commands/kb-update.md — ny /architect:kb-update slash-kommando som driver
poll, endringsrapport, microsoft_docs_fetch-update og commit fra sesjonen.
Argumenter: --skip-discover, --priorities, --dry-run, --single-commit
- Catalog-entry i playground HTML for kb-update (categori: tool, 4 input-felt)
Slettet (Wave 3-5 reversert, ~1500 linjer + 7 testmoduler):
- scripts/install-kb-cron.mjs (cross-OS scheduler-installer)
- scripts/kb-update/weekly-kb-cron.mjs (cron-orkestrator med pre-flight, lock,
backup, claude -p subprocess, post-run verify, rollback)
- scripts/kb-update/templates/ (4 scheduler-templates: launchd plist, systemd
service+timer, Windows ps1 + README)
- scripts/kb-update/lib/auth-mode.mjs (cron-spesifikk auth validation)
- scripts/kb-update/lib/lock-file.mjs (PID+mtime stale-detection)
- scripts/kb-update/lib/cost-estimat.mjs (pre-flight budget-cap)
- 7 testmoduler under tests/kb-update/ for slettet kode
- tests/test-kb-update.sh (Bash-3.2-shim, erstattet av direkte node --test)
Beholdt (utility-laget fortsatt brukbart):
- run-weekly-update.mjs, report-changes.mjs, build-registry.mjs,
discover-new-urls.mjs (KB change-detection-pipelinen)
- lib/atomic-write, lib/backup, lib/cross-platform-paths, lib/log-rotate
- 4 testmoduler (42/42 tester PASS)
Endret:
- hooks/scripts/session-start-context.mjs: fjern kb-update-status.json-overvaaking
- tests/run-e2e.sh --kb-update kaller node --test direkte i stedet for shim
- README.md, CLAUDE.md: KB-vedlikehold-seksjon rewriter for manuell modell
- plugin.json: 1.11.0 -> 1.12.0
- Rot README + CLAUDE.md: ms-ai-architect-versjon bumpet
Schedulering er bevisst utenfor scope og overlatt til brukeren — eventuelle
forks som vil ha periodisk varsling kan sette opp egen cron / launchd /
GitHub Actions som kjører rapport-fasen og varsler om aa kjore
/architect:kb-update i CC-sesjon.
Verifisering:
- bash tests/validate-plugin.sh: 219 PASS, 0 FAIL
- bash tests/run-e2e.sh --kb-update: 42/42 inner + suite PASS
- bash tests/run-e2e.sh --playground: 271/271 PASS (statisk + parsers)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 12 — adds --kb-update flag to tests/run-e2e.sh and a Bash 3.2-compatible
shim test-kb-update.sh that runs `node --test tests/kb-update/*.test.mjs`
(shell-glob form; Node 25 rejects directory-form arguments). Shim translates
node --test exit code + parsed pass/fail counts into the e2e-helpers.sh
suite counters (init_suite/print_summary).
Verification:
- Playground baseline 271 PASS unchanged before/after edit
- bash tests/run-e2e.sh --kb-update: exits 0, 110/110 inner tests pass
- bash tests/run-e2e.sh --all: kb-update suite included
- Pre-existing ROS-fixture absence (tests/fixtures/ros-analysis/) is
unrelated to this change and remains for separate handling
Wave 5 of 7 in v1.12.0 auto-KB-update plan.
Plan: .claude/projects/2026-05-04-kb-update-fork-and-own/plan.md (Step 12)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 11 of v1.12.0 plan (.claude/projects/2026-05-04-kb-update-fork-and-own/plan.md).
scripts/install-kb-cron.mjs lives at the scripts/ root (not inside
scripts/kb-update/) because it is a plugin-wide install tool, not part of
the KB-update pipeline itself. Reads the appropriate template from
scripts/kb-update/templates/, fills {{NODE_BIN}}, {{PLUGIN_ROOT}},
{{LOG_FILE}}, {{SCHEDULE_HOUR/MINUTE/DAY_OF_WEEK}} placeholders, writes
to the platform-specific scheduler dir, and registers the job:
macOS - launchctl bootstrap gui/<uid> <plist> (load -w fallback)
Linux - systemctl --user daemon-reload && enable --now <timer>
Windows - powershell -ExecutionPolicy Bypass -File <ps1> (beta)
Flags: --print-only, --target macos|linux|windows, --uninstall, --purge,
--node-bin, --claude-bin, --schedule "M H * * D" (default: Wed 04:23).
UID resolution for launchctl is guarded by process.getuid() POSIX-only
(undefined on Windows). MCP server presence in ~/.claude.json is
warning-only per brief Spørsmål 7. WSL detected via /proc/version.
Cross-OS rendering supported via --print-only --target <other>; install
on a non-host target rejects with explicit error.
11 subprocess + filesystem-snapshot tests in
tests/kb-update/test-install-cron.test.mjs verify --print-only produces
filled templates with no unsubstituted {{...}} placeholders, --print-only
writes nothing under HOME, --uninstall is idempotent on an empty HOME,
--schedule substitutes correctly, and invalid flags reject with non-zero
exit. Tests never invoke launchctl/systemctl/Register-ScheduledTask
against real schedulers.
Tests: 110/110 pass (was 99 before this step).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
D5 — final session of post-v3.4.1 stabilisering. Repo prepared for the
upcoming voyage-rebrand (v4.0.0 hard cut: ultraplan-local → voyage,
/ultra*-local → /trek*).
Tracked changes:
- README.md: cut #9 jargon — '### Self-verifying plan chain' →
'### Manifest-verified steps' with body rewritten to drop the
'objective completion predicate' jargon.
- package.json: removed 'simulate' script that pointed to
tests/simulator/run-pipeline.mjs (file never existed; D3 was
dropped before that work shipped).
- .claude-plugin/marketplace.json: ultraplan-local description
updated from 'Four-command pipeline' to the current six-command
shape with Handover 6 + multi-session resumption (matches
plugin.json).
- docs/_archive-ultra-suite-brief_2.md: deleted (tracked planning-doc
unrelated to ultraplan-local; 117 lines, no inbound references).
Untracked cleanup (not in commit, gitignored):
- 4 stale plugin-root .local.md (NEXT-SESSION-PROMPT.archived,
PLAN-v2.1-phase3, V3.0-MULTI-SESSION-PLAN, etc.)
- 3 docs/ planning .local.md (ultracontinue-brief, ultracontinue-design-notes,
ultraexecute-v2-observations)
- examples/01-add-verbose-flag/perf-baseline.local.md
- .claude/plans/ultraplan-2026-04-17-logger.md
- 9 closed sub-projects under .claude/projects/ (skill-factory,
ultracontinue, ultrareview-local, ultra-pipeline-speedup,
examples-02-real-cli, post-v3.4.0-roadmap, spor-c-q3-cache,
v3.3.1-ultracontinue-fixes)
Cuts #7 (template-duplisering) + #10 (Two kinds of briefs) reviewed
and judged not needed: README has 38 code-fences vs CLAUDE.md 2 (no
overlap), and 'Two kinds of briefs' is already a direct task-vs-
research-brief explanation, not jargon.
D3 + D4 droppet 2026-05-05 — voyage-rebrand renames all ultra*
references; new test infrastructure built against the old names
would need to be renamed in the same pass. Memory pin:
feedback_cleanup_vs_new_code.md.
Tests: 361 / 0 (unchanged — no test changes).
Stabilisering close-out: complete. Repo is ready for voyage-rebrand.
Foundation lib for v1.12.0 cron rewrite — closes brief deliverable
"log-rotate" that was missing from the original plan (Phase 9 scope
revision). Standard logrotate idiom, zero dependencies.
- rotateLog(logPath, opts) returns {rotated, dropped, kept}
- Defaults: maxSizeBytes 10 MB, maxGenerations 5 (1 active + 4 rotated)
- No-op when log missing or under threshold
- Over-size: drop oldest, shift .N..1 down by one, move active → .1
- maxGenerations=1 keeps only the active slot (no rotated copies)
- Pure stdlib fs.renameSync chain with silent try/catch on missing gens
8/8 tests pass: missing/under-size/over-size paths, chained 6 rotations
capped at maxGenerations, oldest dropped, two-step content shift.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Foundation lib for v1.12.0 cron rewrite. Detects which Claude auth mode is
in scope and rejects modes that are architecturally incompatible with cron.
Resolution order:
- ANTHROPIC_API_KEY env-var → 'api-key'
- CLAUDE_CODE_OAUTH_TOKEN env-var → 'long-oauth'
- ~/.claude.json onboarded + runner exit 0 → 'subscription-browser-only'
- otherwise → 'unauthenticated'
Subscription browser-OAuth tokens expire ~15h and cannot survive cron — the
detector flags them explicitly so validateAuthForCron throws EAUTHCRON with
a remediation message pointing to `claude setup-token` or ANTHROPIC_API_KEY.
Both runner (subprocess invoker) and claudeJsonPath (~/.claude.json) are
dependency-injected. Tests stub them — no real subprocess spawn, no home-
directory reads.
15/15 tests pass: precedence, env-var detection, onboarded subscription,
non-onboarded fallback, validateAuthForCron throw paths.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Foundation lib for v1.12.0 cron rewrite skill-tree backup/restore.
Zero dependencies. Uses fs.cpSync (recursive + preserveTimestamps) without
dereference (Node 22.17.x regression) and without filter (Windows symlink-
type bug).
- backupDir(srcDir, backupRoot, opts) → {backupPath, retentionDays, restore()}
- Backup-id format YYYY-MM-DDTHH-MM-SS (filesystem-safe; no colons)
- .backup-meta.json sentinel written as first action inside backupPath
- restore() writes .rollback-in-progress at backupRoot BEFORE rmSync+cpSync
so a crashed restore leaves the sentinel for the next run to detect
- detectStaleRollback(backupRoot) — boolean predicate over sentinel
- cleanupOldBackups(backupRoot, retentionDays) — 3-step age resolution:
meta.created_at → dir mtime → skip-with-warning (never delete a dir
whose age cannot be established)
12/12 tests pass: timestamp format, content round-trip, sentinel lifecycle,
retention, mtime fallback, unparseable-meta skip, missing-root no-op.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Foundation lib for v1.12.0 cron rewrite. Atomic exclusive create via
fs.writeFileSync('wx'); on EEXIST resolves staleness with OR semantics:
stale if PID is dead OR mtime exceeds threshold. Either alone breaks the
lock — handles SIGKILL orphans (mtime), PID-reuse races (mtime), and
crashed-then-replaced runs (PID).
- acquireLock(lockPath, opts) → {lockPath, release()}
- staleThresholdMs default 1h; refreshIntervalMs opt-in for long runs
- registerCleanup default true (exit/SIGINT/SIGTERM/SIGHUP/uncaughtException)
- isPidAlive uses kill(pid, 0) with EPERM-as-alive nuance
12/12 tests pass: PID liveness, fixture concurrency, idempotent release,
stale variants (dead+old, live+old, fresh+live), staleThresholdMs honored.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
D2 of post-v3.4.1 stabilisering. Removes 14 plugin-name references from
agents/, commands/, and docs/ tracked files (CLAUDE.md/README.md/SECURITY.md
were ryddet in v3.4.1 commit 6bca3fb).
The external architect plugin was moved out of the public marketplace
2026-05-04 due to ToS concerns around future skill sources. References in
prose are now stale or misleading for public users. The architecture/overview.md
filesystem slot remains available for any compatible producer — discovery
is plugin-agnostic via lib/validators/architecture-discovery.mjs (drift-WARN,
never drift-FAIL).
Files:
- agents/planning-orchestrator.md (1 ref generalized)
- commands/ultraplan-local.md (2 refs generalized; missed by prompt inventory)
- docs/HANDOVER-CONTRACTS.md (4 refs generalized; Handover 3 + stability summary)
- docs/architect-bridge-test.md (deleted; was a public-only bridge checklist)
- docs/subagent-delegation-audit.md (5 refs/rows removed; intervention #5 dropped, recommendation adjusted)
CHANGELOG.md retains historical references (20 occurrences) intentionally.
Verification:
- grep tracked non-CHANGELOG md: 0 references remaining
- npm test: 361/361 pass (baseline preserved)
D1 of post-v3.4.1 stabilisering. Path C (cache-warm sentinel + identical-tool
parallel) is closed 2026-05-05 per Q3 experiment NEGATIVE result:
median cache_creation_input_tokens = 163,903 across 3 fork-children at
186K parent context (CC v2.1.128, Sonnet 4.6).
Master-plan thresholds: <= 1,500 POSITIVE / >= 3,500 NEGATIVE — NEGATIVE
solidly. CLAUDE_CODE_FORK_SUBAGENT does not preserve cache prefix across
identical-tool children at our context size.
Path C migration is deferred indefinitely. Reassessment is appropriate
when CC v2.2.xxx ships fork-cache-relevant features. Harness
(scripts/q3-cache-prefix-experiment.mjs) and analyser
(lib/stats/cache-analyzer.mjs) remain available for re-run.
Brief: .claude/projects/2026-05-04-spor-c-q3-cache-prefix-experiment/brief.md
Result: q3-experiment-results.local.md (gitignored)
Pure auth-mode-aware cost estimator for v1.12.0 cron pre-flight.
Heuristic: critical+high files only (medium/low excluded per brief);
3000 input + 1500 output tokens per file at Sonnet pricing
($3/M in, $15/M out).
Auth-mode behavior:
- api-key: numeric usd, kvote_warn off (subject to dollar-cap)
- long-oauth, subscription-browser-only:
usd null, kvote_warn on (quota, no dollar billing)
- unauthenticated/missing: best-effort api-key estimate
11/11 tests pass; covers both billing modes plus token-math
invariance across auth-mode (auth only affects dollar-field, not tokens).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Foundation lib for status-fil + lock-fil writes in v1.12.0 cron rewrite.
Pattern: writeFileSync to <path>.tmp.<pid>.<random> then renameSync to
target. Defends against half-written files; readers either see the
previous version or the new one, never a partial.
- atomicWriteSync(path, content) — string or Buffer
- atomicWriteJson(path, obj) — 2-space indent, trailing newline
- Windows EEXIST/EPERM defensive fallback (unlink target + rename)
- Best-effort tmp cleanup on writeFileSync failure
- crypto.randomInt(0, 2**32) two-arg form (unambiguous across Node)
9/9 tests pass including 50-way concurrent-write fuzzer (async-aware
withTmp helper).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First foundation lib for v1.12.0 auto-KB-update. Resolves per-OS paths:
- macOS: ~/Library/{Caches,Logs,Application Support}/<app>/
- Linux: XDG_CACHE_HOME / XDG_STATE_HOME with ~/.cache, ~/.local/state fallbacks
- Windows: %LOCALAPPDATA%\<app>\{Cache,Logs,State}
Plus getBackupDir(pluginRoot) → <pluginRoot>/.kb-backup (gitignored).
All four functions auto-mkdir target. Dependency-injection via opts
({platform, homedir, env}) makes the lib pure-testable; 13/13 tests
pass under tmpdir isolation without touching real ~/ paths.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-step for v1.12.0 auto-KB-update for fork-and-own. The cron-rewrite
in Step 9 will create plugin-root/.kb-backup/<ISO-ts>/skills/ during each
run; gitignoring it here ensures backups never enter git history. The
.rollback-in-progress sentinel is created by lib/backup.mjs#restore() and
must also be ignored.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements Spor C of post-v3.4.0 roadmap. Zero-dep harness measures
CLAUDE_CODE_FORK_SUBAGENT cache-prefix preservation across 3 fork-children
with identical --allowedTools at 150-250K parent context.
Harness uses --append-system-prompt-file (avoids stdin buffer cap at
>200K bytes) + --exclude-dynamic-system-prompt-sections (prevents
per-child cache-prefix divergence from cwd/env/git-status).
Companion analyser summarizes accumulated ultraexecute-stats.jsonl:
percentile wall_time (p50/p90/max), total events, ISO time range.
Output: JSON via --json <path> CLI shim.
Result file is gitignored (*.local.md). Master-plan thresholds
(<= 1.5K positive / >= 3.5K negative) gate the v3.5.0 Path C decision.
Brief: .claude/projects/2026-05-04-spor-c-q3-cache-prefix-experiment/brief.md
Master-plan: .claude/projects/2026-05-04-post-v3.4.0-roadmap/master-plan.md
Brings docs to parity with other plugin READMEs (graceful-handoff,
ai-psychosis pattern):
README.md
- Header block: tagline, solo-maintained disclaimer, AI-generated note
- 6 shields.io badges (version, platform, output-style, commands, hooks, license)
- "The problem" framing: why a shared tone is needed across plugins
- Eight-directive table with what/how each rule changes Claude output
- Before/after example showing default vs human-friendly on the same task
- Architecture ASCII diagram of style merge into system prompt
- Quick start: marketplace install, settings.json enable, /config activation, verify steps
- "What this plugin does NOT do" section pointing users to ms-ai-architect /
ai-psychosis / linkedin-thought-leadership for adjacent concerns
- Cross-plugin use, compatibility matrix, versioning policy
GOVERNANCE.md (new)
- Standard marketplace fork-and-own governance, adapted with
human-friendly-style-specific notes (likely fork variants are tone
variants; trivial fork target since it's one Markdown file)
- Issues-yes, PRs-no policy with reasoning
- Version stability guarantees for the style file itself
CHANGELOG.md
- v1.0.0 entry expanded to reflect the docs polish + GOVERNANCE addition
- All within the same unreleased v1.0.0 (still 1 commit ahead of origin)
[skip-docs]: doc-trippel covered in initial commit (e769140); this is
plugin-internal docs polish only.
New plugin shipping a single Claude Code output style for consistent,
plain-language tone across all marketplace plugins. Auto-discovered from
the plugin's output-styles/ directory per Anthropic's documented plugin
contract.
Style instructs Claude to explain what and why (not how), hide noise
(paths, raw commands, JSON, stack traces) by default, match the user's
language, and stay honest about uncertainty. Keeps Claude Code's default
coding instructions intact via keep-coding-instructions: true.
- plugins/human-friendly-style/output-styles/human-friendly.md (style)
- plugins/human-friendly-style/.claude-plugin/plugin.json (manifest)
- plugins/human-friendly-style/{README,CLAUDE,CHANGELOG,LICENSE}
- .claude-plugin/marketplace.json: registered as 9th plugin
- README.md (root): added section between OKR and Shared infrastructure
[skip-docs]: doc-trippel covered (plugin README, plugin CLAUDE, root
README). Root CLAUDE.md update deferred to avoid conflict with concurrent
ultraplan-local + ms-ai-architect work touching the same Repo-struktur
block.
Pipeline-walk-through fylt inn etter B3 pipeline-run mot examples/02-real-cli.
Erstatter 'TBD' og '(Placeholder)' med faktisk research-skip + plan-summary
+ execute-summary (4 commits c4cf49f → da68c2f) + 10/10 SC PASS-tabell.
Spor B er ferdig. Neste handling: operatør-bekreftelse + WAIT_FOR_TELEMETRY
før Spor C kan starte. Se plugins/ultraplan-local/NEXT-SESSION-PROMPT.local.md
(stop-prompt, IKKE C1).
Step 4 (final) of plan.md (Spor B B3 pipeline run). Adds 4 new tests
in a contiguous block at the end of tests/tally.test.mjs, mirroring
the existing spawnSync style. All 4 test names contain --regex or -r.
Coverage map:
- SC #1 (long form, exit 0): test 1
- SC #2 (-r short form): test 2
- SC #4 (invalid exits 2 with /^tally: invalid regex/): test 3
- SC #5 (--json includes flags.regex): test 4
Total: 14 tests, all green, 3.16s wall-clock (under 5s cap).
[skip-docs]
Step 3 of plan.md (Spor B B3 pipeline run). Adds one line under
Options: in the HELP template literal so --help users can discover
the new flag. Satisfies SC #8.
[skip-docs]
Step 2 of plan.md (Spor B B3 pipeline run). Wires the --regex/-r flag
into main(): when set, compileRegex(pattern) is used and the count is
text.match(re).length. Invalid regex exits 2 via the existing fail()
helper. JSON output now includes flags.regex so consumers can tell the
mode apart. Baseline tests remain green; -i/--ignore-case has no effect
when --regex is set (out of brief scope).
Verify covered: SC #1 (any position), SC #2 (-r short form), SC #3
(regex semantics differ), SC #4 (invalid exits 2), SC #5 (JSON regex),
SC #6 (byte-identical baseline).
[skip-docs]
Step 1 of plan.md (Spor B B3 pipeline run). Adds the new --regex / -r
flag to parseArgs and a compileRegex(pattern) helper. The flag is
parsed but main() does not yet branch on it (wired in step 2). All
10 baseline tests remain green.
[skip-docs]
Adds the runnable counterpart to examples/01-add-verbose-flag (which is
artifacts-only). The fixture is the measurement target for Spor B's
end-to-end pipeline run (B3) and Spor C's cache-prefix experiment.
Baseline:
- tally.mjs (80 lines, hand-rolled argv parser, zero deps)
- 3 flags: --json, -i/--ignore-case, --lines + --help
- Exit codes: 0 success, 1 file error, 2 invalid argv
- 10 node:test cases, all green (~2.2s wall-clock)
- Deterministic fixtures: sample.txt (foo×7, Foo×1, regex fo+×9) +
poem.txt (--lines vs total distinction)
- REGENERATED.md skeleton (B3 fills the pipeline walk-through)
Brief preconditions verified:
- grep -c 'foo' sample.txt = 4 (>= 1)
- regex /fo+/g count = 9 (> grep count)
- Brief assumptions for B3 SC #1, #3 hold
This is the first runnable example in plugins/ultraplan-local/examples/.
Next: B3 runs /ultraresearch-local + /ultraplan-local + /ultraexecute-local
against the brief to add --regex/-r, then verifies all 10 Success Criteria.
Documents the 4-track procedure for updating plugin playground HTML
when plugins are extended or upgraded:
- Track A: Plugin HTML change (parsers, renderers, surfaces)
- Track B: Shared design-system change (with vendor sync)
- Track C: Visual verification (screenshots + manual QA)
- Track D: Release (version bump + 3-doc rule)
Lives at marketplace root because the procedure crosses the
plugin/shared boundary. Marketplace-root CLAUDE.md gets a one-line
pointer under Konvensjoner so Claude finds it automatically in
future sessions.
Includes architecture diagram, common pitfalls (replace_all scope,
sync-without-testing, screenshot folder version mismatch, background
orchestrator degradation), and guidance on when to hoist inline CSS
to the shared DS vs keep it plugin-local.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 13 of v3.4.1 plan.
- plugin.json description: Five-command -> Six-command (drift fix); also
drops the trailing ultra-cc-architect sentence (SC-6 collateral).
Mentions multi-session resumption as part of the Six-command pipeline.
- plugin.json + package.json version: 3.4.0 -> 3.4.1.
361 tests still green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 12 of v3.4.1 plan. Surgical line-by-line generalization of references
to the ultra-cc-architect plugin (no longer publicly distributed):
- CLAUDE.md: 8 hits → "opt-in upstream architect plugin (not bundled)"
- README.md: 9 hits including bare slug at line 646 (removed entirely);
rephrased to "no longer publicly distributed" with the architecture/
filesystem slot still supported by /ultraplan-local
- SECURITY.md: 1 hit → generalized "Opt-in upstream architect step"
CHANGELOG.md historical references preserved per brief; appended a
2026-05-04 note at top of [3.0.0] block stating the plugin is no longer
publicly distributed but the architecture/overview.md slot remains
supported for any compatible producer.
The architecture/overview.md filesystem contract (Handover 3, EXTERNAL,
drift-WARN) is unchanged — anyone implementing a compatible producer
can plug in.
361 tests still green (no regressions). doc-consistency pins for
/ultracontinue-local and Handover 7 § Lifecycle still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 11 of v3.4.1 plan. Adds the lifecycle subsection to Handover 7
documenting:
- Producer/consumer arbeidsdeling (executor + helper write; ultracontinue
reads; pre-compact-flush refreshes only)
- Stale-file principle: status==='completed' state files SHOULD be
removed via /ultracontinue-local --cleanup --confirm (operator-invoked,
no auto-cleanup, no force flag)
- Frontmatter contract for NEXT-SESSION-PROMPT.local.md: producers MUST
write produced_by + produced_at (ISO-8601); files without frontmatter
are tolerated (warning, not error) for backwards compatibility
- Idempotency: --cleanup --confirm is safe to re-run; partial state
reported but never auto-recovered
Adds 3 doc-consistency pins:
- next-session-prompt-validator CLI shim
- Handover 7 § Lifecycle subsection present
- Handover 7 § Lifecycle names --cleanup + produced_by contract
358 -> 361 tests, all green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 10 of v3.4.1 plan.
commands/ultracontinue-local.md:
- New Phase 0.5 between Phase 0 and Phase 1 — terminal cleanup mode
triggered by parsed flags['--cleanup'] === true. Requires explicit
positional[0] (no "clean all"), no template placeholders in the Bash
invocation. Passes through to cleanupProject via inline ESM. Cleanup
never falls through to Phase 1/2/3/4.
- Phase 0 usage block updated to document --cleanup and --cleanup
--confirm forms alongside the legacy <project-dir> form.
tests/commands/ultracontinue.test.mjs:
- Test (Bug 4 prose) — Phase 0.5 header present, references
cleanupProject and flags['--cleanup'], appears between Phase 0 and
Phase 1 in document order, usage mentions --cleanup --confirm.
- Test (f-1) dry-run on completed project lists candidates without
deleting; both files still on disk.
- Test (f-2 + f-3) confirm-mode deletes both files; subsequent
invocation on the already-cleaned dir signals CLEANUP_NO_STATE_FILE
(deterministic terminal state, idempotent for operators).
Tests 355 -> 358 (+3).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 9 of v3.4.1 plan.
lib/util/cleanup.mjs (new):
- cleanupProject(projectDir, {dryRun, confirm}) reads
.session-state.local.json via validateSessionState; refuses unless the
parsed status is strictly equal to 'completed' (per risk-assessor
Critical 2 — no soft-match on similar statuses).
- Default dryRun: true; refuses dryRun: false without explicit
confirm: true (CLEANUP_REQUIRES_CONFIRM).
- Removes .session-state.local.json + NEXT-SESSION-PROMPT.local.md
candidates; ENOENT counts as "already absent" so the function is
idempotent.
- No CLI shim — invoked from /ultracontinue --cleanup via inline ESM
(Step 10 wires this in).
tests/lib/cleanup.test.mjs (new):
- 7 cases: dry-run lists candidates without deleting; confirm-mode
deletes both files; idempotent re-run signals CLEANUP_NO_STATE_FILE
after fully cleaned; refuses on status: in_progress
(CLEANUP_NOT_COMPLETED); refuses dryRun: false without confirm
(CLEANUP_REQUIRES_CONFIRM); defaults to dry-run; missing state file
returns CLEANUP_NO_STATE_FILE.
Internal scaffolding consumed by Step 10 (Phase 0.5 wire-up). User-facing
docs land with Step 14.
Tests 348 -> 355 (+7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 8 of v3.4.1 plan.
commands/ultracontinue-local.md:
- New Phase 1.5 between Phase 1 and Phase 2 — runs the
next-session-prompt-validator in --consistency mode when both candidates
exist (plugin-root + project-dir). Refuses on producer mismatch with
fresh candidates, downgrades stale candidate to a warning, downgrades
>24h wall-clock drift to a soft warning.
- Anti-substitution rule applies — paths emitted as concrete tokens, not
template placeholders.
lib/validators/next-session-prompt-validator.mjs:
- Sharpen NEXT_SESSION_PROMPT_PRODUCER_MISMATCH error message to include
the literal "produced_by" field name so consumers (and operators) can
trace the disagreement back to the YAML key.
tests/commands/ultracontinue.test.mjs:
- Test (Bug 3 prose) — Phase 1.5 header present, references validator,
appears between Phase 1 and Phase 2 in document order.
- Test (Bug 3 e) — tmp project dir with state file + two prompt files
with mismatched producers, both fresh relative to state.updated_at;
CLI consistency mode exits non-zero, JSON stdout surfaces
NEXT_SESSION_PROMPT_PRODUCER_MISMATCH with both paths and the
"produced_by" token in the message.
Tests 346 -> 348 (+2).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 7 of v3.4.1 plan.
ultraplan-end-session-local Phase 3:
- Replace require()-of-ESM-module shim with node --input-type=module + import.
- Convert Phase 1 project enumeration to ESM as well so the file is uniformly
ESM (grep -c 'require(' commands/ultraplan-end-session-local.md → 0).
- Combined ESM block writes both .session-state.local.json (atomicWriteJson)
and sibling NEXT-SESSION-PROMPT.local.md (writeFileSync) so producers
succeed or fail together.
- Sibling markdown gets frontmatter: produced_by, produced_at, project.
ultraexecute-local Phases 8 / 2.55 / 4:
- Each phase that writes .session-state.local.json now also writes a sibling
NEXT-SESSION-PROMPT.local.md with frontmatter (produced_by:
ultraexecute-local, produced_at: ISO-8601, status). Phase 8 includes the
full ESM block; 2.55 / 4 reference the combined pattern.
- This is the producer side of the Bug 3 contract; consumer-side wire-up
(Phase 1.5 consistency check in /ultracontinue) lands in Step 8.
Tests: 346 green (no new tests this step — coverage comes via Step 8
integration test).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 6 of v3.4.1 plan. Adds the validator quartet
(Content/Object/Consistency/CLI) for NEXT-SESSION-PROMPT.local.md
frontmatter (produced_by, produced_at). State-anchored staleness check
is the primary refusal; 24h wall-clock drift downgraded to soft warning
to avoid false positives on weekend pauses.
Internal scaffolding consumed by Step 8 (Phase 1.5 wire-up). User-facing
docs land with Step 14 (CHANGELOG + README + version bump).
Tests 335 -> 346 (+11): 9 unit + 2 CLI shim cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 5 of v3.4.1 hot-fix plan. Phase 2 of
commands/ultracontinue-local.md is rewritten to remove every curly-
brace template placeholder. The {state-file-path} substitution failure
caused the path-guard hook to crash on unresolved templates.
New Phase 2 structure:
2.a — Read the file with the Read tool (no Bash). Deterministic and
not subject to shell-substitution errors.
2.b — Schema-validate via the existing CLI shim, with the resolved
absolute path emitted as a literal string token by the model
at the time of the Bash call. Anti-substitution invariant:
STOP if about to emit any unresolved placeholder.
2.c — Interpret validator result (preserved verbatim from the
previous Phase 2 — three-way branch on valid + status).
Verification: grep -c "{state-file-path}" returns 0; full Phase 2
section contains no {lowercase-template} curly-brace placeholders.
Suite 322 -> 335 passing (+13: 7 from Step 1, 4 from Step 2, 2 from
Step 4).
Step 4 of v3.4.1 hot-fix plan. Two new tests in
tests/commands/ultracontinue.test.mjs:
(d-1) ALLOW-resolved-path — runHook + pre-bash-executor sanity check
that a concrete validator invocation (no template placeholders)
is not blocked by the marketplace bash-guard.
(d-2) NO-PLACEHOLDER — Pattern D structure assertion that Phase 2
contains neither {state-file-path} nor any other
{lowercase-template} curly-brace placeholder, and that the
Phase 2 prose explicitly documents the deterministic Read
tool flow.
(d-1) passes today (the planned bash form is allowed). (d-2) fails
intentionally on current Phase 2 — Step 5 fix turns it green.
Step 3 of v3.4.1 hot-fix plan. Three fixes in
commands/ultracontinue-local.md:
- Phase 0: replace "$ARGUMENTS contains --help or -h" with parsed-arg
dispatch via parseArgs(...,'ultracontinue'). Usage block fires only
when flags['--help'] === true OR positional[0] === '-h'. Empty,
whitespace, and project-dir args fall through to Phase 1
(auto-discovery), which is the operator-default invocation.
- Phase 1.a: NEW — reject .md positional arg with SC-2 diagnostic
("expected <project-dir>" + "did you mean to paste"). Operators
pasting a NEXT-SESSION-PROMPT.local.md path see a clear error
instead of a confusing fallthrough.
- Phase 1.b: auto-discovery node -e now emits {path, updated_at}
JSON per candidate; Phase 1 sorts numerically via
Date.parse(updated_at) DESC instead of lexicographic compare.
Newest in_progress wins, including across year-boundary timestamps.
All 4 Step 2 regression tests now green; full suite 322 → 333 passing.
Step 2 of v3.4.1 hot-fix plan. Establishes tests/commands/
directory and adds Pattern D structure tests against
commands/ultracontinue-local.md prose:
(a) Phase 1 must document Date.parse(updated_at) numeric sort
(b) Phase 0 must NOT use substring "contains --help" dispatch;
must reference parsed flags or positional[0]
(c) Phase 1 must reference auto-discovery as empty-args fallback
(d) Phase 1 must emit SC-2 diagnostic strings for .md positional arg
Tests (a), (b), (d) fail intentionally on current prose; Step 3
fix turns them green. Test (c) passes already (current Phase 1
prose says "non-empty" which matches the regex assertion).
Step 1 of v3.4.1 hot-fix plan (project 2026-05-04-v3.3.1-ultracontinue-fixes).
Adds ultracontinue entry to FLAG_SCHEMA covering boolean flags --help,
--cleanup, --confirm, --dry-run with no valued flags. The -h short form
is intentionally not aliased: it appears as positional[0] === '-h' and
the command prose dispatches usage on either condition.
7 new tests in tests/lib/arg-parser.test.mjs verify empty args, --help,
-h positional, --cleanup, --cleanup --confirm, project-dir positional,
and .md positional (parser-level accept; command-level reject).
Adds one-click demo and committed screenshots so forkers see what the plugin
produces without running anything. Plugin contract unchanged.
- Inline <script id="demo-state-v1"> block (37 KB) built by
scripts/build-demo-state.mjs from playground/test-fixtures/*.md
- "Last inn demo-data" button on onboarding (replaces all state with demo)
- raw_markdown persistence on project.reports[id] with equal-value guard
- rehydratePasteImports() auto-fills textareas + re-renders visualizations
on project surface mount
- tests/screenshot/ standalone Playwright runner (own package.json)
- 24 committed screenshots in playground/screenshots/v1.10.0/
(12 surfaces x 2 themes, deviceScaleFactor 2 retina, fullPage)
Tests: 215 + 201 + 70 + 7 = 493 PASS, no regressions.
Docs updated per OBLIGATORISK three-level rule (plugin README, plugin CLAUDE,
marketplace root README, CHANGELOG).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pins existing BLOCK rules in the two pre-* executor hooks so a future
silent weakening of BLOCK_RULES surfaces as test failures instead of
slipping through code review.
50 new tests covering both hooks plus allow-list pins (lib/, tests/,
docs/, ls, git, npm) and fail-open on malformed input. Reuses
tests/helpers/hook-helper.mjs child-process spawner.
[skip-docs]
Replaces explicit *.local.md + *.local.json rules with single *.local.*
glob covering the new examples/01-add-verbose-flag/perf-measure.local.sh
and perf-baseline.local.md operator-driven measurement harness for SC1
wall-time gate.
The harness files themselves stay outside git (operator-run only). The
.gitignore generalization is the only tracked change.
[skip-docs]
Adds 6 files in tests/synthetic/ exercising the determinism pipeline at the
SC7 brief floor (Jaccard >= 0.833). Plan fixture pair: 40 step titles each
with 38 shared (Jaccard 0.905). Review fixture pair: 30 finding-IDs each
with 28 shared (Jaccard 0.875). Reuses lib/parsers/jaccard.mjs +
lib/parsers/finding-id.mjs.
The new pair coexists with tests/lib/review-determinism.test.mjs which
holds the older SC4 (0.70) floor against tests/fixtures/ultrareview/.
The lower floor protects pipeline regressions; the higher floor anchors
the speedup brief's determinism aspiration.
[skip-docs]
New hooks/scripts/post-compact-flush.mjs (PostCompact event, CC v2.1.105+):
auto-discovers <cwd>/.claude/projects/*/.session-state.local.json (most
recently modified), validates it via session-state-validator, emits
additionalContext via stdout so the post-compact assistant turn has
Handover 7 resume context loaded immediately.
Read-only — never writes. Always exits 0; never blocks compaction. Uses
only node:fs sync APIs available since Node 12 (no glob dependency).
Companion to the existing pre-compact-flush.mjs:
- PreCompact: refresh progress.json + .session-state.local.json
- PostCompact: re-inject .session-state.local.json into context
Wired in hooks/hooks.json under a new PostCompact matcher block.
Both files staged via /tmp/claude-* and copied into hooks/* via Bash to
respect the llm-security plugin path-guard (which blocks direct Write to
hooks/scripts/*.mjs and hooks*.json).
Test: tests/hooks/post-compact-flush.test.mjs (4 tests) covers no-state,
malformed-state, valid-state, and multi-project mtime selection.
Wire the main-merge-gate lifecycle event into commands/ultraexecute-local.md
Phase 8. Three event variants emitted via lib/stats/event-emit.mjs (S8):
- main-merge-gate fired at the gate boundary
- main-merge-approved fired on operator confirm
- main-merge-declined fired on operator decline (run recorded as partial)
The gate ALWAYS pauses regardless of gates_mode — it is the one always-on
boundary that --gates does not toggle. On decline, --resume re-enters at
the gate, and the wave session branches survive on the remote thanks to
Hard Rule 19's push-before-cleanup. Recovery surface is documented inline.
Pin in tests/lib/main-merge-gate.test.mjs locks the always-on prose, the
event names, and the recovery-surface contract.
Single autonomy-control surface (--gates) added to ultrabrief, ultraresearch,
ultraplan, and ultraexecute. When present, sets gates_mode = true and
re-enables approval pauses at every phase boundary + every wave for
high-stakes runs. When absent (default in auto), the chain runs continuously
to the main-merge gate (which always pauses regardless of --gates — that
boundary is the one always-on safety stop).
ultrabrief: pause after auto-mode confirmation; emit brief-approved event
ultraresearch: pause after each topic completes
ultraplan: pause after Phases 5, 7, 9
ultraexecute: pause after each wave's worktrees finish, before merge-back,
AND before the main-merge gate (MAIN_MERGE_GATE)
All four commands invoke the autonomy-gate state machine via the CLI shim
node lib/util/autonomy-gate.mjs (built in S8). Test pin in
tests/lib/gates-flag-coverage.test.mjs locks the contract.
Also wires the brief-approved stats emission into ultrabrief Phase 5 auto
path (was the SC4 wiring requirement from plan-v2 Step 11).
Bring the launch template (used by /ultraplan-local --decompose) into
contract-parity with the Phase 2.6 wave executor hardenings shipped in the
previous commit:
- GIT_OPTIONAL_LOCKS=0 exported once at the top
- MAX_TURNS / MAX_BUDGET_USD env-overridable (default 50 / 5)
- Absolute SHARED_CONTEXT_FILE built from brief + architecture
- SAFETY_PREAMBLE prepended to every per-session prompt (GH #36071 +
GH #52272 clarifications)
- Per-child --max-turns + --max-budget-usd + --append-system-prompt-file
- push-before-cleanup before merge AND in the cleanup_worktrees trap
- Three new template rules (16, 17, 18, 19) document the contract for
session-decomposer
Pin in tests/lib/doc-consistency.test.mjs locks all required substrings
against future regressions.
Strengthen single-message reinforcement for plan-critic + scope-guardian
parallel dispatch in commands/ultraplan-local.md Phase 9 and mirror in
agents/planning-orchestrator.md Phase 6. Reviewers now write structured JSON
to /tmp/{plan-critic,scope-guardian}-out.json which is merged via the
lib/review/plan-review-dedup.mjs CLI shim from S8.
The merged set lets us revise the plan once for duplicate findings instead
of twice. Source: research/05 R1 + R2.
Pin in tests/lib/doc-consistency.test.mjs locks both files against
single-message + dedup-helper regressions.
Inline STEP_HEADING_REGEX, FORBIDDEN_HEADING_REGEX, the canonical step+manifest
example, and the post-write plan-validator self-check directly into Phase 8 of
commands/ultraplan-local.md. This eliminates the dependency on Opus 4.7
implicitly loading agents/planning-orchestrator.md — the format contract now
travels with the command file itself.
Source: research/04 D5 + plan-v2 Step 7. Pin in tests/lib/doc-consistency.test.mjs
locks the substrings so future edits cannot silently regress the seal.
ros = REFERENCE STANDARD (mot Plugin Playground-run2/scenarios/ros-lier-kommune.html)
- parseMatrixRisk utvidet med _consumer-detection (ros når ## Top-risikoer eller ## Anbefaling), topRisks[] (max 5, fallback til threats sortert på severity-rank med alfabetisk tie-breaker), og recommendation (første avsnitt under ## Anbefaling)
- R15 regression: hasTopRisks/hasAnbefal-detection er ikke-invasiv; Dpia-fixturer har ingen av disse seksjonene → _consumer=null, topRisks=[], recommendation='' (alle felt forblir uendret for dpia-rendereren)
- renderRos wrapper renderPageShell med eyebrow=ROS; behold matrix 5x5 + radar 7-akser + threats; legg til top-risks-list, residual-pair og recommendation-card
- ros.md fixture utvidet med ## Top-risikoer (5 trusler), Restrisiko: 4x3 to 2x2, og ## Anbefaling
- RESTRISIKO key-stat utledes når residualPair finnes (samme monster som Dpia og Security)
Sesjon 6 (v1.10.0) gjor en samlet README/CLAUDE/CHANGELOG-oppdatering for hele v1.10.0-loypet.
[skip-docs]
Pin the contract from plan-v2 Steps 1-3: every agents/*.md must declare
model: (opus|sonnet|haiku) AND (tools: or disallowedTools:). Orchestrators
(planning/research/review) must be opus and include the Agent tool;
non-orchestrators must not include Agent (no recursive swarming).
23 agents in scope; 5 pinning tests.
[skip-docs]
- parseConformityChecklist utvidet med buckets {passed, conditional, failed} via bucketOf-mapping
- Status-mapping stotter bade engelsk (met/partial/missing) og norsk (bestatt/betinget/avvist) for backward-compat
- renderConformity: erstatter findings-listen med E1 kanban-board (3 kolonner: Bestatt, Med betingelser, Ikke bestatt)
- aiact-timeline beholdt for deadlines (under kanban som sekundaer report-meta-blokk)
- Wrapped med renderPageShell (eyebrow SAMSVAR)
- Fixture conformity.md oppdatert til norske status-markorer for tydeligere bucket-mapping (5 bestatt, 3 betinget, 4 avvist)
- renderFria wrapped med renderPageShell (eyebrow FRIA, lede ref AI Act Art. 27)
- Erstatter rights-matrix med D4 critique-cards per rettighet (severity fra impact-score)
- Ny fria-case i inferVerdict: max impact >=4 block, >=3 warning, ellers go-with-conditions
- DS_CLASSES test oppdatert: rights-matrix -> critique-card (Step 10 endrer body for FRIA)
- renderRequirements wrapped med renderPageShell (eyebrow KRAV, verdict via requirements-list)
- scenario-card-grid: gruppert pa source_article, status fra dominant (met/partial/missing)
- expansion-card per krav (E7): severity-dot + title + chev, body med dl
- data-action requirement-expand wired for klikk-toggle (handler kommer i Sesjon 6)
- renderAiActPyramid wrapped med renderPageShell (eyebrow KLASSIFISERING, verdict via aiact-archetype, keyStats via inferKeyStats)
- 4 details/summary-blokker under pyramide for klikk-pa-tier kort beskrivelse (active tier open by default)
- Inline CSS for pyramide-desc + scenario-card-grid + residual-pair + read-more-block (klargjor renderers A.2-A.6)
Moved to a separate marketplace. Drops the plugin directory, the
manifest entry, and the README/CLAUDE.md sections describing it.
ultraplan-local references to the optional architecture/overview.md
contract are kept (filesystem-level discovery, drift-WARN), but
marketplace-name pointers in ultraplan-local docs may follow.
Marketplace-root .gitignore already covers plugins/*/.claude/, but
plugin-local coverage is load-bearing for fork-and-own (forks of just
the plugin won't carry the marketplace .gitignore).
Removes:
- All 6 PNG screenshots (playground/screenshots/) and the capture script
(scripts/screenshots/capture-playground.py).
- "Screenshots" section from plugin README.
- "Screenshot-suite" section from plugin CLAUDE.md.
- Screenshots bullet from marketplace root README's ms-ai-architect listing.
Scrubs the 17 synthetic fixtures + CHANGELOG/CLAUDE/README of identifying
references: organization names, government-agency names, agency-specific
terminology, sector-specific use cases. Replaced with generic placeholder
data ("Acme AS" / "Demosystem") that exercises the same parser archetypes.
Plugin's domain-target wording (Datatilsynet, offentlig sektor, offentlig
myndighet, rettshåndhevelse, NS 5814, Utredningsinstruksen, EU AI Act
Annex III categories) is intact — those describe the plugin's intended
audience, not any specific entity.
This is a cleanup commit. Earlier git history still contains the prior
references; force-push or rebase is required if scrubbing the history is
desired. That decision is out of scope here — please run it separately
if needed.
Verified post-scrub:
- bash tests/validate-plugin.sh -> 215/215 PASS
- bash tests/run-e2e.sh --playground -> 240/240 PASS (170 + 70)
Step 13 (Wave 5). Adds light/dark theme toggle to v3 playground.
- Inline <script> in <head> reads ms-ai-architect-theme from localStorage and
sets <html data-theme="..."> BEFORE stylesheets parse (avoids FOUC).
- New .theme-toggle button in topbar (vendored design-system class).
- ACTIONS['toggle-theme'] flips data-theme, persists to localStorage, and
syncs all [data-theme-label] elements + aria-label in-place (no re-render).
- Default behavior (no localStorage value or unsupported value) keeps existing
data-theme="dark" hard-coded on <html>.
14 tolerant parsers per kanonisk archetype-routing-tabell (Step 11) + 3 helpers (parseTable, parseSections, extractField). Each parser returns {ok:true, data} or {ok:false, errors:[{section, reason}]} — never throws on bad input. PARSERS routing-objekt eksponert via window.__PARSERS.
Verified against all 17 fixtures: every parser produces expected shape. Empty input returns structured error per Verify-asserts.
Synthetic markdown fixtures for the 17 report-producing commands per the canonical archetype-routing-tabell. Each fixture uses the consistent ANPR-trafikkanalyse system from brief example to produce parser-input that exercises every archetype path (aiact, requirements-list, text-document, fria, conformity-checklist, matrix-risk 5x5, matrix-risk-6x5, findings, cost-distribution, capability, phased-plan, markdown, verdict, comparison).
Real /architect:<command> capture deferred to incremental work; synthetic fixtures suffice as parser test input for Steps 11-12.
Step 9 of v3 plan. Replaces renderCatalogStub with full
renderCatalogSurface — search-input + 5 .expansion-grupper (en per
CATALOG.categories) + per-command-card with "Åpne skjema"-button. Klikk
åpner modal med renderCommandForm (samme generic renderer som
prosjekt-detalj fra Step 8).
Søk: input-event oppdaterer modul-lokal catalogSearchQuery og kaller
refreshCatalogResults() som re-rendrer kun groups-containeren — bevarer
fokus + cursor i søkefeltet (full re-render ville flyttet caret).
Filtrerer på id+label+description+argument_hint. Når query er aktiv
forces alle expansions med treff åpne; ellers er 'regulatory' åpen som
default (mest brukt entry-point).
Verktøy-commands får .catalog-card__pill="Verktøy" + .catalog-tool-notice
("Verktøy — ingen rapport-import"). Modalen viser samme advarsel via
.guide-panel--info-banner. Rapport-produserende får "Rapport"-pill.
Verifisert via vm-sandbox med activeSurface='catalog':
- data-command-card === 24 (Step 9 verify-assert ✓)
- 5 expansion-grupper (data-catalog-group)
- 24 open-catalog-form-knapper
- 17 Rapport-pills + 7 Verktøy-notices (matcher CATALOG.commands.filter
produces_report)
- refreshCatalogResults() med query='classify' kjører feilfritt
Step 8 of v3 plan. renderCommandForm(commandId, opts) reads
CATALOG[id].input_fields and emits a form with all 6 supported field types
(text/textarea/select/multiSelect/boolean/number). Shared fields
auto-prefill from state.shared via field.shared_path dot-lookup; local
fields prefill from project.reports[id].input when opts.projectId is set.
window.__buildCommand(commandId, formData) builds /architect:<id>
key="value" key="value" ... — shared fields merged first (CATALOG order),
formData overrides and may include keys outside the catalog (passthrough).
Empty/null/empty-array values omitted. Multi-values comma-joined inside
quotes; quotes/backslashes escaped.
Copy-button writes via navigator.clipboard.writeText with graceful
fallback to inline preview when clipboard is blocked (file:// in some
browsers). Preview-button shows the same string without copying.
Replaces the form-zone-placeholder in renderCommandSubCard. All 24
command-cards in project-detail now render real forms (verified:
data-command-card === 24, data-command-form === 24, copy-command
buttons === 24, field-from-tag === 39, paste-import === 17,
report-slot === 17, buildCommand('classify',{riskLevel:'høy'}) →
'/architect:classify organisation_name="Vegvesen" sector="Statlig"
riskLevel="høy"').
Step 6/17 av Playground v3-leveransen (Session 2, Wave 2).
Hjem-skjerm med 3-track entry-pattern (.tracks__card--guided/explore/expert):
- Onboard / Re-onboard
- Nytt prosjekt
- Command-katalog
Prosjekt-liste under tracks: .fleet-grid med .fleet-tile per prosjekt
(navn + scenario-chip + meter med rapport-fremdrift). Tom-state vises
som .guide-panel--info med 'Opprett første prosjekt'-knapp.
Topbar (renderTopbar) med brand + nav + eksport/import-knapper synlig
på home/catalog/project. Onboarding holdes uten topbar for full-fokus
første-flyt. import-input change-handler ruter via window.__importState
fra Step 3 og kjører scheduleRender etter import.
Verifisert via vm sandbox:
- 21 tracks__card-treff (3 cards med modifier-klasser)
- guided/explore/expert-modifiers alle til stede
- empty-state guide-panel--info når projects=[]
- fleet-grid suppressed når projects=[]
Stub-actions for new-project (Step 7 erstatter med modal-åpning).
README/CLAUDE.md-update deferred til Step 17 (Session 5).
Step 5/17 av Playground v3-leveransen (Session 2, Wave 2).
5 grouped sections (organization/technology/security/architecture/business)
rendered with Tier 3 .form-progress sidebar and .expansion components per
group. Validation via .error-summary with click-to-focus links.
ONBOARDING_SCHEMA mirrors agents/onboarding-agent.md Phase 1-5 (18 fields
total). commitOnboarding() writes to state.shared.<group>.<field> via
Proxy → throttled IDB/localStorage write. Re-onboard is just navigate
back to onboarding — pre-fills from state automatically.
Verified via vm sandbox: bootstrap auto-routes to onboarding when no
org.name, commitOnboarding produces >=5 keys in shared.organization,
validation catches required-empty (2) and accepts filled (0).
Surface routing: showSurface() toggles [hidden] across data-surface
sections. scheduleRender batches via queueMicrotask. Action router
dispatches data-action attributes to ACTIONS map. README/CLAUDE.md-update
deferred til Step 17 (Session 5).
Step 3/17 av Playground v3-leveransen.
Eksport:
- buildEnvelope(): { appId, schemaVersion, exportedAt, shared, projects,
activeProjectId, activeSurface, preferences } — JSON.parse(JSON.stringify(...))
for å strippe Proxy-wrappere
- exportState(): Blob + URL.createObjectURL + programmatisk <a download>-klikk
+ revokeObjectURL etter 0ms timeout. File System Access API krever HTTPS
(secure context) og er ikke tilgjengelig på file:// — derfor Blob-pattern.
- Filnavn-format: ms-ai-architect-playground-<ISO-stamp>.json
Import:
- importState(File): file.text() -> JSON.parse -> envelope-validering (appId
+ schemaVersion required) -> migrateState() -> persistence.save() -> in-place
state-update (Proxy-binding må bevares — kan ikke bytte raw-referansen)
-> manuell 'change'-event-dispatch så subscribers re-rendrer
- file.text() er Promise<string> som fungerer på file:// uten secure context
MIGRATIONS-pipeline:
- Eager: alle migrasjoner kjøres sekvensielt fra fil-versjon til SCHEMA_VERSION
ved import (ikke lazy ved access)
- Nøkkel-format: 'N->M' (fortløpende). Aldri hopp over et steg.
- Kaster eksplisitt feil ved manglende migrasjons-funksjon eller ved
funksjon som ikke setter schemaVersion korrekt — silent corruption
unngås (brief Risk High).
Eksponerte globals: __buildEnvelope, __exportState, __importState, __MIGRATIONS.
Verify-assert: JSON.parse(JSON.stringify(window.__buildEnvelope())).schemaVersion === 1
Plan: .claude/projects/2026-05-03-playground-v3-architecture/plan.md (Step 3)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 2/17 av Playground v3-leveransen.
State-skjelett:
- StateBus extends EventTarget (sharedBus + projectBus)
- Dyp Proxy med set/deleteProperty-traps som batcher dispatchEvent via
queueMicrotask (N synkrone mutasjoner -> én change-event per tick)
- Path tracking: subscribers får detail.paths for å filtrere relevante grener
- INITIAL_STATE med shared.{organization,technology,security,architecture,
business} + projects[] + activeProjectId/Surface + preferences.theme
Persistens:
- IDB primær: én DB ('ms-ai-architect-playground-v1') med 3 stores
(shared, projects, meta). Promise-wrapper rundt indexedDB.open.
- Synkrone migrasjoner i onupgradeneeded med oldVersion-guards (callback-stil
cursor — async cursor-iterasjon er forbudt per w3c/IndexedDB#282)
- db.onversionchange = () => db.close() defensivt på alle koblinger
- localStorage-fallback ved IDB-feil (Safari private mode, kvote): rå JSON
i STATE_KEY, warn ved >4.5 MB nær 5 MiB cap
- Throttled writer: debounce 300 ms etter siste mutasjon
Bootstrap:
- Auto-kjørt på slutten av <body> (DOM allerede parsed)
- window.__store + window.__persistence eksponert for Verify-asserts
Verify-asserts (i nettleser-konsoll på file://-åpnet HTML):
typeof window.__store !== 'undefined' && window.__store.state.schemaVersion === 1
Plan: .claude/projects/2026-05-03-playground-v3-architecture/plan.md (Step 2)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 1/17 av Playground v3-leveransen (Session 1, Wave 1).
- Single-file HTML med klassisk inline-script (file://-kompatibel per WHATWG
html#8121: external type=module-scripts feiler på file:// i Chrome+Firefox)
- 7 vendored CSS-link-tags i korrekt rekkefølge: fonts, tokens, base, components,
components-tier2, components-tier3, components-tier3-supplement
- 4 placeholder-overflater (#surface-onboarding, #surface-home, #surface-catalog,
#surface-project) — fylles ut i Steps 5-7
- IIFE med STATE_KEY ('ms-ai-architect-state-v1') og SCHEMA_VERSION (1) konstanter
- Eksponerer __STATE_KEY og __SCHEMA_VERSION på window for Verify-asserts
- v2-fila beholdes parallelt frem til Step 17 (sletting)
Plan: .claude/projects/2026-05-03-playground-v3-architecture/plan.md
Brief: .claude/projects/2026-05-03-playground-v3-architecture/brief.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Establish a single governance document at marketplace root and copy
it into each of the 9 plugins so every plugin folder remains 100%
self-contained. Replace the inconsistent provocative blurb across
all READMEs with a uniform fork-and-own paragraph that links to
the local GOVERNANCE.md.
[skip-docs]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Renames playground/azure-ai-playground.html to
playground/ms-ai-architect-playground.html (history preserved via git mv).
Old name was too narrow — plugin covers the full Microsoft AI stack
(Foundry, Copilot Studio, M365 Copilot, Power Platform, Agent Framework).
Replaces the inline <style> block with seven <link> tags pointing at the
vendored design-system under playground/vendor/playground-design-system/:
fonts.css, tokens.css, base.css, components.css, components-tier2.css,
components-tier3.css, components-tier3-supplement.css.
A small inline shim maps legacy playground tokens (--bg, --surface,
--accent, --gradient1) onto design-system tokens (--color-bg,
--color-surface, --color-primary-500, etc.), keeping all existing
playground-specific class CSS (.hero, .wizard-card, .scenario-card,
.item-card, ...) working without rewrites. <html data-theme="dark">
preserves v2's dark visual identity; light-mode toggle is deferred.
DOM, JS logic, scenario data, and command pipelines are unchanged.
Also includes .gitleaks.toml at repo root (path allowlist for vendored
MANIFEST.json files — SHA-256 file hashes are not secrets) which was
missed in the previous commit due to global git ignore.
Docs updated:
- README.md (root): notes the vendoring sync script + ms-ai-architect
Playground subsection
- plugins/ms-ai-architect/README.md: new Playground section with sync
workflow and standalone guarantee
- plugins/ms-ai-architect/CLAUDE.md: Playground section updated with
vendored design-system details + new filename
Initial sync of shared/playground-design-system/ into
plugins/ms-ai-architect/playground/vendor/playground-design-system/
via scripts/sync-design-system.mjs.
Source commit: f1fecf39b8
Files: 25 (7 CSS + 11 fonts/licenses + 3 schemas + README + MANIFEST)
Vendored copy keeps the plugin standalone — playground will load CSS
from ./vendor/ regardless of where the plugin is installed.
Also adds .gitleaks.toml at repo root with a path allowlist for
vendored MANIFEST.json files (SHA-256 file hashes are not secrets).
Docs updated together with the playground HTML refactor that actually
consumes the vendored CSS (next commit). This commit is internal-only.
Vendors shared/playground-design-system/ into a plugin's
playground/vendor/playground-design-system/ tree so each plugin stays
standalone (no marketplace-rot dependency at runtime).
Features:
- Generates MANIFEST.json with SHA-256 per file, source commit hash, sync date
- Drift detection: refuses overwrite if vendored file changed since last sync
- --force flag to override drift
- Injects "DO NOT EDIT" header into copied CSS files
- Pure Node.js, zero npm deps (uses fs.cp from Node 16.7+)
Usage: node scripts/sync-design-system.mjs <plugin-name> [--force]
Two changes in one commit because they were prepared together and the
component demos depend on the new self-hosted fonts.css.
Tier 3 wave 2 — 12 new components
---------------------------------
Adds components-tier3-supplement.css (886 lines) and 12 isolated demo
HTML pages under shared/playground-examples/components/:
toxic-flow chain, fleet-overview, kanban Keep/Review/Remove,
maturity-ladder, classify-and-transform, cycle-ribbon,
persistent-antipattern, suppressed-signals, ExpansionCard, ReadMore,
FormProgress, Aspirational-vs-Committed.
Reuses existing tokens — no new CSS custom properties. Honors the
Phase 1 feedback rules: no large pink areas for body text, severity-red
distinct from failure-red, dark mode via existing [data-theme="dark"].
Provenance: components-tier3-supplement.css and the 12 demo bodies were
authored by claude.ai/design (separate Anthropic instance) on 2026-05-03.
This commit only integrates them — path rewrites, font swap, generic
name substitution in fleet-overview demo data, README updates.
base.css from the export was deliberately NOT taken in because it
reverted the inline-message contrast fix from v0.1.
Self-hosted fonts (Inter, JetBrains Mono, Source Serif 4)
---------------------------------------------------------
Replaces all fonts.googleapis.com / fonts.gstatic.com requests with
.woff2 files bundled at shared/playground-design-system/fonts/.
Why:
- No data leaked to Google about end-user IPs and User-Agents.
- GDPR-safe for Norwegian public-sector deployments.
- Works offline / behind air-gapped firewalls.
- Forkers downloading the marketplace get a complete bundle.
All three families are SIL Open Font License 1.1 — license texts
included alongside the woff2 files. Source Serif 4 woff2 generated
locally from the upstream OTF release using
fonttools ttLib.woff2 compress; Inter and JetBrains Mono are
unmodified upstream webfont releases.
Total bundle: 9 woff2 files, ~940 KB. New fonts.css declares all
@font-face rules with font-display: swap. All 6 example HTMLs and 12
new component demos load it via a single relative path.
Verified
--------
- Privacy grep returns empty across plugins/ and shared/
- Google Fonts grep returns empty across shared/*.html
- Smoke test via python -m http.server: HTML + 7 stylesheets +
Inter-Regular.woff2 all return 200
Doc updates
-----------
- shared/playground-design-system/README.md: file tree updated,
Quick start snippet shows fonts.css link, "Self-hosted fonts"
section added
- shared/playground-design-system/fonts/LICENSES.md: combined attribution
- README.md (root): Tier 3 wave 1+2 component list, Privacy-first bullet
- CLAUDE.md (root): tree entry expanded for new components + fonts
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 2 bulk replace produced a few factually wrong attributions where
real publicly known sector documents/datasets/personas were incorrectly
re-attributed to the fictional generic entity. Genericize those
references instead.
- ros-sector-checklists.md: V440 håndbok citation -> "sektorvise
faglige håndbøker"; tilsynsmyndighet list -> generic phrasing
- master-data-management-ai.md: NVDB row -> generic "sektor-/fagregistre"
- ai-center-of-excellence-setup.md: NVDB integration line -> generic
"sektorvise nasjonale registre"
- multimodal-prompt-engineering.md: system_message persona -> generic
"fagingeniør i norsk offentleg sektor"
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace named real-world entity with fictional generic Norwegian
public-sector entity ("Direktoratet for digital tjenesteutvikling",
DDT) across the design system reference scenarios and root docs.
Repository is a private personal project; references to a real
organization were unintended and unrelated to the project.
- Rename: security-vegvesen.html -> security-direktorat.html
- Persona: replaced with fictional Kari Nordmann
- Domain refs / acronym / rule-IDs: SVV* -> DDT*
- Internal system names (Autosys etc.): replaced with fictional names
Phase 2 (plugin-internal references) follows in next commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Authored in Claude Code following the design-DNA established by claude.ai/design
in v0.1 (tokens + Tier 1 + Tier 2). Visual coherence verified against existing
components via tier3-preview.html showcase.
shared/playground-design-system/components-tier3.css (~480 lines):
- pair-before-after: ROS/DPIA/AI Act inherent->residual primitive with delta
pill (improved/worsened); responsive collapse to vertical on narrow viewports
- aiact-timeline: 4 EU AI Act milestones (2025-02-02 .. 2027-08-02) with
per-system countdown chips (urgent/soon/distant), today-marker, and per-
milestone passed/active/upcoming states
- tracks: Guide/Explore/Expert 3-track entry pattern carried from Playground v2,
top-bar color coding per track
- rights-matrix: FRIA 12 EU Charter rights x 5 impact levels (Art. 27 EU AI Act)
- capability-matrix: license x kapabilitet with explicit icons per status
(available/cost/conditional/missing) - never color-only
- agent-grid + agent-card: parallel-worker status with state pills, progress
bars, metric chips, pulsing dot for running, distinct failure-red token
- error-summary: Aksel/GOV.UK pattern, white bg + red border + dark body text
+ red heading (NOT large pink fill — fixes contrast bug)
- guide-panel: Aksel friendly inline guidance, info/success/warn variants
Also fixes shared/playground-design-system/base.css inline-message--error which
had the same contrast bug as ErrorSummary v1: white text on light-pink soft-fill
was unreadable. Now uses surface bg + critical border + primary text + critical
strong/heading color. Same dark-mode treatment.
shared/playground-examples/tier3-preview.html (~470 lines): live demo for all
8 components with realistic Norwegian mock-data (Lier kommune ROS T-001
threat, AI Act timeline 2026-05-02 today-marker, FRIA EU Charter rights, M365
capability-matrix, 4-worker utredning grid). Used to validate visual coherence
before committing.
Updates shared/playground-design-system/README.md with Tier 3 component table
and provenance note distinguishing v0.1 (claude.ai/design) from this addition
(Claude Code).
Remaining for v0.2: 12 plugin-specific Tier 3 components (sankey/toxic-flow,
fleet-overview, kanban Keep/Review/Remove, maturity-ladder, classify-and-
transform, cycle-ribbon, persistent-antipattern badge, suppressed-signals,
ExpansionCard, ReadMore, FormProgress, Aspirational vs Committed visual). To
be generated by claude.ai/design in a supplement session before plugin
Playground work begins.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds shared infrastructure section to root README pointing to the new design
system at shared/playground-design-system/, with summary of tokens, Tier 1+2
components, JSON schemas, and reference scenarios. Updates root CLAUDE.md
repo-struktur block to include shared/ at marketplace level alongside plugins/.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tiny helper command for ad-hoc multi-session flows that don't run through
/ultraexecute-local. Writes .session-state.local.json so /ultracontinue
can resume in a fresh chat. Required args (next-brief-path, next-label) —
no inline prompt, headless-safe. Validates via session-state-validator
and prints the same 3-line narration that /ultracontinue Phase 3 uses
(SC-8 cross-project consistency).
Step 9 of /ultracontinue v3.3.0. README/CLAUDE updates land in Step 11.
Extends the PreCompact hook with a sibling block that refreshes
.session-state.local.json's updated_at when status is in_progress or
partial. Per-project: runs after the existing progress.json mutation,
inside the same loop iteration.
Design:
- Only refreshes existing state files; creation is the writer's job
(ultraexecute Phase 8 / 2.55 / 4 + future helper command).
- Monotonic guard: only updated_at is touched. project, status,
next_session_brief_path, next_session_label remain owned by the writer.
- Skips status in {completed, failed, stopped} — the latter two are
operator-action-required and silently bumping updated_at would mask
alert state.
- Always exit 0; never blocks compaction.
[skip-docs] rationale: README + CLAUDE.md updates land in Step 11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three insertions in commands/ultraexecute-local.md so every session-end
path produces or refreshes .session-state.local.json (Handover 7):
- Phase 2.55 (Check 1, line ~376): write status=stopped on dirty-tree
pre-flight stop before parallel session-spawn
- Phase 4 (line ~773): write status=stopped when entry condition fails
- Phase 8 (line ~1151): canonical convergence — every completed/failed/
stopped/partial run refreshes the state file using atomicWriteJson +
validator verification
Phase 2.3 (validate exit) and Phase 5 (dry-run) intentionally skip the
write — neither path is resumable. Validator errors warn but never block
the run; progress.json remains authoritative.
[skip-docs] rationale: README + CLAUDE.md updates land in Step 11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reads .claude/projects/<project>/.session-state.local.json (Handover 7),
narrates a 3-line summary, and immediately begins executing the next
session — no interactive confirmation, headless-safe.
Phases:
- 0: --help (self-documenting per brief NFR)
- 1: resolve project dir (auto-discover via node -e enumeration)
- 2: validate via session-state-validator
- 3: narrate (project / next_session_label / brief path)
- 4: read brief and begin
- 5: stats
[skip-docs] rationale: README + CLAUDE.md updates land in Step 11 (Session
2b) per plan structure. Step 8 (docs:) updates HANDOVER-CONTRACTS.md and
the doc-consistency test pin in the same session.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plain-language UX humanizer release. Default output of all 18 commands
now leads with prose; technical IDs surface at end-of-line as references
rather than headlines. Scanner internals are unchanged; humanization is
a pure output-time transform applied at the rendering layer.
Highlights:
- New scanner-lib modules: humanizer.mjs, humanizer-data.mjs (TRANSLATIONS
for 13 scanner prefixes)
- New --raw flag threaded through every CLI for byte-stable v5.0.0
verbatim output (--json unchanged from v5.0.0, also byte-stable)
- 5 user-impact categories, 5 action-language phrases, 3 relevance contexts
- Self-audit terminal output also humanized; --json path unchanged
- 21 command and agent templates updated for humanized rendering with
--raw passthrough
- 635 → 792 tests (+157) including SC-3 forbidden-words lint, SC-4
scenario read-test, SC-5/6/7 backwards-compat snapshots
Migration:
- Existing --json automation: zero changes required (envelope is
byte-stable with v5.0.0; humanizer fields are bypassed)
- stderr-scraping tooling: review default mode (now uses prose); pass
--raw for v5.0.0 verbatim
- No scanner-internal changes (IDs, severity ladders, scoring weights,
area scorecards all unchanged)
Verification:
- 792/792 tests pass
- self-audit configGrade A (97), pluginGrade A (100), readmeCheck passed
- README badge: tests-635+ → tests-792+
- Plugin README: add "What's New in v5.1.0" section with humanizer overview,
before/after example, plain-language vocabulary table, --raw flag docs.
Bump version badge 5.0.0 → 5.1.0. Add Version History row.
- Plugin CLAUDE.md: add humanizer.mjs + humanizer-data.mjs to Scanner Lib
table. Add "Plain-Language Output (v5.1.0)" section documenting output
modes, vocabularies, and Wave 5 lessons. Bump test count 635 → 792 across
52 test files.
- Marketplace root README: bump config-audit entry 5.0.0 → 5.1.0, update
one-line description to mention plain-language UX, add bullet for the
v5.1.0 humanizer, bump test count 635+ → 792+.
Test-normalizer hardening (consequence of growing CLAUDE.md):
walkClaudeMdCascade walks upward from the marketplace-medium fixture into
this plugin's own CLAUDE.md, so any docs edit ripples into
`scanners[*].activeConfig.claudeMdEstimatedTokens`. The v5.0.0 byte-stability
contract is about scanner internals being unchanged, not ancestor input
content being frozen. Normalizers in json-backcompat, raw-backcompat,
posture-humanizer, scan-orchestrator-humanizer, and snapshot-default-output
now strip claudeMdEstimatedTokens to <ANCESTOR_DERIVED>. The
default-output snapshot for scan-orchestrator was re-seeded via
UPDATE_SNAPSHOT=1 (intent: Wave 6 docs additions; humanizer prose
unchanged).
Verify:
- grep -E "5\.1\.0|v5\.1\.0" README.md CLAUDE.md ../../README.md | wc -l = 12
- node --test 'tests/**/*.test.mjs' = 792/792 pass
- self-audit configGrade A (97), pluginGrade A (100), readmeCheck.passed true
- Bump README test-count badge: 635 → 792 (matches filesystem after Wave 0–5)
- Wire formatSelfAudit() through humanizeEnvelope + humanizeFindings so the
terminal-output path renders humanized finding titles. The --json path is
unchanged — only the prose terminal render is humanized.
- readmeCheck.passed now returns true; configGrade A (97), pluginGrade A (100)
Three changes in one commit:
1. NEW lib/util/atomic-write.mjs — exports atomicWriteJson(path, obj),
the canonical tmp+rename pattern. Reused by pre-compact-flush.mjs and
(in subsequent steps) by the new session-state writer.
2. NEW tests/lib/atomic-write.test.mjs — 4 unit tests covering
round-trip, no-orphan-tmp, overwrite-atomic, pretty-print formatting.
3. REFACTOR hooks/scripts/pre-compact-flush.mjs — replace the inline
atomicWrite() with the imported atomicWriteJson(). Also fixes a
pre-existing syntax error (leading whitespace + stray --resume token
outside the comment block) that silently broke the hook from v3.1.0
onward — PreCompact runtime is fail-open and swallowed the error.
File reformatted with standard zero-indent JS.
163 → 167 tests, 0 fail.
Step 2 of /ultracontinue v3.3.0 (project 2026-05-01-ultracontinue).
Brief assumed *.local.* covered .session-state.local.json — only *.local.md
existed. Adding *.local.json before any state file can be created.
Step 1 of /ultracontinue v3.3.0 (project 2026-05-01-ultracontinue).
Wave 5 Step 16 — final wave step. Threads humanizer-aware rendering
rules through the three agent prompts that produce user-facing output,
and adds a shape test that locks the structure.
- agents/analyzer-agent.md: documents the humanizer envelope shape
(userImpactCategory, userActionLanguage, relevanceContext) in the
Input section; new "Humanizer-aware rendering rules" subsection
instructs the agent to: render humanized title/description/
recommendation verbatim, group findings by userImpactCategory, lead
each line with userActionLanguage, surface relevanceContext when
not affects-everyone, and skip jargon-translation subroutines.
--raw fallback documented (v5.0.0 verbatim severity prefiks).
- agents/planner-agent.md: documents the same vocabulary; instructs
the planner to consume humanized fields from the analysis report,
preserve titles verbatim, and order actions by both dependencies
AND userActionLanguage urgency. Translation duties explicitly
removed from the plan.
- agents/feature-gap-agent.md: replaces the inline t1/t2/t3/t4
tier-to-prose section ladder with userActionLanguage-driven
groupings ("Fix soon" → High Impact, "Fix when convenient" →
Worth Considering, "Optional cleanup"/"FYI" → Explore When Ready);
instructs skipping findings whose relevanceContext is
test-fixture-no-impact; --raw fallback documented.
tests/agents/agent-prompt-shape.test.mjs (new, +6 tests, 786 → 792):
- structural: humanized field reference + frontmatter preserved
- per-agent anchors: analyzer groups by userImpactCategory; planner
orders by userActionLanguage; feature-gap references
test-fixture-no-impact
- global: no "explain what {jargon} means" / "translate jargon" /
"jargon-translation duty" prose anywhere
Self-audit: Grade A unchanged (config 97/100, plugin 100/100).
Wave 5 Step 15. Threads --raw plumbing through all seven action
command templates and adds a shape test covering structural plumbing
plus help.md's plain-language vocabulary.
- commands/fix.md: --raw flag parsed; fix-plan rendering groups by
userActionLanguage; humanized title/description/recommendation are
rendered verbatim from the cross-referenced scan envelope.
- commands/rollback.md: terminology pass — "manifest" → "list of
changes" in user-facing copy; the file name manifest.yaml is kept
as the machine contract; --raw threaded through.
- commands/plan.md: --raw forwarded to the planner-agent's prompt;
agent now instructed to group actions by userImpactCategory and
lead with userActionLanguage; bash block added for flag parsing.
- commands/implement.md: --raw forwarded to the implementer-agent's
prompt; progress-log lines now reference the humanized titles
already present in the action plan.
- commands/cleanup.md: --raw accepted as no-op (cleanup is
file-management only, no findings prose); bash block added.
- commands/help.md: full plain-language pass — "PreToolUse" and
"frontmatter" jargon removed from user-facing copy; new
vocabulary table surfaces the humanized userImpactCategory and
userActionLanguage labels ("Configuration mistake", "Conflict",
"Wasted tokens", "Missed opportunity", "Dead config" / "Fix this
now", "Fix soon", "Fix when convenient", "Optional cleanup",
"FYI"); --raw documented as global pass-through flag.
- commands/interview.md: --raw accepted as no-op; "unused hooks"
question phrased as "unused automation that runs at specific
events" in user-facing copy.
tests/commands/action-commands-shape.test.mjs (new, +6 tests, 780 → 786):
- structural: bash block + Read tool + --raw/$ARGUMENTS plumbing
across all 7 files
- help.md vocabulary: ≥3 userImpactCategory labels and ≥3
userActionLanguage phrases present
- help.md jargon: no bare "PreToolUse" or "frontmatter" in copy
Wave 5 Step 14. Threads the humanizer vocabulary through the remaining
six audit/analysis command templates and adds a shape test that locks
the structure plus a pair of anchor must-contains.
- commands/drift.md: --raw pass-through; new/resolved/changed-finding
rendering instructions reference userActionLanguage and
relevanceContext rather than raw severity.
- commands/plugin-health.md: --raw pass-through; finding rendering
groups by userImpactCategory and leads with userActionLanguage.
- commands/config-audit.md (router): replaces the 25-line A/B/C/D/F
prose ladder with a humanized stderr-scorecard reference + three
userActionLanguage-grouped "What you can do next" branches; --raw
threaded through both scan-orchestrator and posture invocations.
- commands/discover.md: --raw pass-through; finding-summary rendering
groups by userImpactCategory.
- commands/analyze.md: --raw pass-through; analyzer-agent prompt now
instructs grouping by userImpactCategory and leading with
userActionLanguage; humanized title/description/recommendation
strings rendered verbatim, no paraphrasing.
- commands/status.md: phase-label humanization table — current_phase
machine field name preserved, user-facing labels translated
("Looking at your config files", "Working out what to recommend",
"Asking what you'd like to focus on", "Putting together your action
plan", "Making the changes", "Double-checking everything worked");
--raw preserves verbatim machine field values.
tests/commands/group-b-shape.test.mjs (new, +8 tests, 772 → 780):
- structural: bash block + Read tool + --raw/$ARGUMENTS plumbing
across all 6 files
- findings-renderers: humanized field reference + no grade-prose
- anchor must-contains per plan: config-audit.md ⊇
userImpactCategory|userActionLanguage; drift.md ⊇ --raw|humanized
- status.md: current_phase preserved + ≥3 humanized phase labels
Wave 5 Step 13. Threads the humanizer vocabulary through five audit/
analysis command templates and adds a shape test that locks the
structure in place.
- commands/posture.md, tokens.md, feature-gap.md (findings-renderers):
reference userImpactCategory/userActionLanguage/relevanceContext;
remove hardcoded A/B/C/D/F-to-prose tables (humanizer owns the
grade-context vocabulary now via the stderr scorecard headline).
- commands/manifest.md, whats-active.md (inventory CLIs): add --raw
pass-through for CLI-surface consistency. --raw is a no-op in these
CLIs, but the flag is threaded through so users get uniform behaviour.
- All five files: --raw flag parsed from $ARGUMENTS and passed verbatim
to the underlying scanner CLI when present.
tests/commands/group-a-shape.test.mjs (new, +5 tests, 767 → 772):
- structural: every file has a bash invocation block, Read tool
reference, and --raw/$ARGUMENTS plumbing
- findings-renderers only: at least one humanized field referenced;
no hardcoded "[grade] grade is..." prose tables
Step 12 of v5.1.0 humanizer Wave 4. Adds tests/snapshot-default-output
.test.mjs and seeds three snapshots in tests/snapshots/default-output/
that capture humanized default-mode output for representative CLIs.
Coverage:
- scan-orchestrator: stdout JSON envelope (humanized findings); time
fields normalized.
- token-hotspots-cli: stdout JSON envelope (humanized payload.findings);
duration_ms normalized.
- posture: stderr humanized scorecard; (Xms) durations normalized.
Snapshot envelope is uniform on disk: { kind: 'json', payload: ... }
for JSON streams and { kind: 'text', payload: '...' } for stderr text.
This keeps the snapshot files self-describing and easy to read.
Re-seeding requires UPDATE_SNAPSHOT=1 — drift fails the test by design,
so any humanizer prose change is intentional and re-approved.
Tests: 764 to 767 (+3 SC-5 cases). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 11 of v5.1.0 humanizer Wave 4. Adds tests/raw-backcompat.test.mjs
mirroring the SC-6 contract for the --raw flag — the explicit "v5.0.0
verbatim" escape hatch.
- 4 fixture-deterministic CLIs (scan-orchestrator, posture,
token-hotspots-cli, fix-cli) get strict byte-equal against
tests/snapshots/v5.0.0/<cli>.json with time fields normalized.
- drift-cli is checked under the same contract guarded by
ensureDriftBaseline.
- 3 environment-aware CLIs (plugin-health, manifest, whats-active) are
checked for mode-equivalence (--raw equals --json).
- Posture additionally asserts its --raw stderr scorecard reproduces
tests/snapshots/v5.0.0-stderr/posture.txt verbatim, modulo (Xms)
duration markers normalized to (0ms).
- Cross-cutting suite asserts --raw findings carry no humanizer fields
on any CLI.
Tests: 751 to 764 (+13 SC-7 cases). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 10 of v5.1.0 humanizer Wave 4. Adds tests/json-backcompat.test.mjs
asserting that --json output of every CLI remains backwards-compatible
with the v5.0.0 contract.
Coverage strategy mirrors Wave 3 cli-humanizer test discovery:
- 4 fixture-deterministic CLIs (scan-orchestrator, posture,
token-hotspots-cli, fix-cli) get strict byte-equal byte-equal --json
vs frozen tests/snapshots/v5.0.0/ snapshot, with time-varying fields
(timestamp, target path, duration_ms, generatedAt, durationMs)
normalized.
- drift-cli is checked with the same byte-equal contract guarded by an
ensureDriftBaseline precondition; the test silently skips when the
baseline cannot be created.
- 3 environment-aware CLIs (plugin-health-scanner, manifest,
whats-active) read live config-cascade state, so frozen snapshots
drift as the marketplace evolves. They are verified by mode-
equivalence (--json equals --raw) instead — the same approach
established in Wave 3 cli-humanizer.test.mjs.
A cross-cutting suite asserts --json output of the 4 deterministic
CLIs never carries humanizer fields (userImpactCategory,
userActionLanguage, relevanceContext) on any finding, walking both
top-level findings arrays and scanners[].findings paths.
Tests: 739 to 751 (+12 SC-6 cases). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 9 of v5.1.0 humanizer Wave 4. Adds tests/scenario-read-test.mjs
runner, tests/scenario-read-test.test.mjs wrapper, and 5 scenario
fixtures in tests/scenarios/ that feed deterministic raw findings
through humanizeFinding and assert the humanized
title/description/recommendation match brief-owner-approved regex
patterns encoding the ground-truth what/why/whatNext answers.
Corpus selection (per brief criteria):
- 01-tok-cascade.json - TOK/CPS category (token efficiency)
- 02-cps-volatile.json - TOK/CPS category (cache prefix stability)
- 03-cnf-conflict.json - CNF category (conflicts)
- 04-gap-no-claude-md.json - GAP category (feature gap)
- 05-set-invalid-json.json - SET category, AND its v5.0.0 title +
description carry tier1 'invalid' (the brief criterion 'one finding
whose v5.0.0 description uses a forbidden word').
Runner mechanics:
- Loads scenarios matching ^\\d{2}-[a-z0-9-]+\\.json$ in sorted order.
- Calls humanizeFinding(scannerInput) and matches each humanized field
against its declared pattern (case-insensitive regex).
- Verifies humanizer-added structural fields (userImpactCategory,
userActionLanguage, relevanceContext) are non-empty strings.
- Per session decision (1a) acceptance is deterministic regex matching
without a runtime human approval gate.
Wrapper adds 3 tests: scenario-match (binds runner to node --test),
category-coverage (TOK/CPS, CNF, GAP, SET all present), and
tier1-presence (at least one v5.0.0 title or description contains a
tier1 forbidden word).
Tests: 736 to 739 (+3 SC-4 tests). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 8 of v5.1.0 humanizer Wave 4. Adds tests/lint-default-output.mjs
runner and tests/scanners/lint-default-output.test.mjs wrapper that
exercise SC-3 against the 6 prose CLIs (scan-orchestrator, posture,
token-hotspots-cli, plugin-health-scanner, drift-cli, fix-cli) running
in default (humanized) mode against tests/fixtures/marketplace-medium.
Lint scope is stderr only — JSON envelope keys ("scanner", "severity")
are structural, not prose. Humanized prose fields embedded inside JSON
are already covered by tests/lib/humanizer-data.test.mjs tier1/tier3
checks. Code references inside backticks pass the lint
(stripBacktickSpans) so technical identifiers can appear when wrapped.
Default-mode prose fixes to land lint at zero violations:
- scan-orchestrator: top banner switches to "Config-Audit v2.2.0" and
per-scanner progress wraps "[XXX] Label" in backticks. --raw and
--json paths preserve the v5.0.0 verbatim banner via new
opts.humanizedProgress flag on runAllScanners.
- plugin-health-scanner: top banner switches to "Plugin Health v2.1.0"
in default mode; --raw/--json keep "Plugin Health Scanner v2.1.0".
- scoring.mjs generateHealthScorecard humanized branch: area names
(CLAUDE.md, Hooks, MCP, Settings, Rules, Imports, Conflicts, Token
Efficiency, Plugin Hygiene) are wrapped in backticks; dot-padding
compensates so column alignment matches v5.0.0 layout.
- posture / drift-cli / fix-cli: thread humanizedProgress flag through
their runAllScanners calls so default mode emits humanized progress
and --raw/--json preserve the v5.0.0 stderr snapshot.
Test infrastructure only — user-facing docs land in Wave 5/6 once
commands and agents consume the humanized payload.
Tests: 735 to 736 (+1 SC-3 wrapper). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds --raw flag to all 6 remaining CLIs and wires humanization into the
default rendering path. --json and --raw both bypass humanization for
v5.0.0 byte-equal output; default mode humanizes findings/diff/prose.
token-hotspots-cli: humanizes payload.findings before stdout JSON write.
plugin-health-scanner: humanizes finding titles in stderr brief summary;
--json/--raw write byte-identical v5.0.0-shape result to stdout.
drift-cli: humanizes diff.{newFindings,resolvedFindings,unchangedFindings,
movedFindings} before formatDiffReport; --raw applies to save and list
modes too. Baselines remain raw v5.0.0 on disk.
fix-cli: humanizes manual-finding titles in stderr fix-plan prose; both
--json and --raw produce identical machine-readable JSON to stdout.
manifest, whats-active: --raw is a no-op (no findings, inventory only)
but parsed for CLI surface consistency.
Decision on missing --output-file flag for drift-cli/fix-cli/plugin-health:
deferred. SC-6/SC-7 tests in Wave 4 will use stdout-redirect (the simpler
Alt B path) since these CLIs already write JSON to stdout in machine modes.
Test cli-humanizer.test.mjs covers all 6 CLIs. Three CLIs that read
environment state (plugin-health, manifest, whats-active) verify
mode-equivalence (--json == --raw) instead of frozen-snapshot byte-equal,
because their output reflects current marketplace state which drifts as
plugins are added since the Wave 0 capture.
Wave 3 / Step 7 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
generateHealthScorecard signature: 2-arg → 3-arg (areaScores, opportunityCount,
options = {}). options.humanized=true renders friendlier title, grade-context
line per overall grade, and rephrased opportunity line. options.humanized=false
(or 2-arg call) preserves v5.0.0 verbatim output for backwards-compat.
topActions also gets an optional options.humanized that swaps recommendations
through humanizeFinding lookup.
posture.mjs main():
--json → write JSON to stdout, suppress stderr scorecard
--raw → write JSON to stdout (byte-identical to --json), write v5.0.0
verbatim scorecard to stderr
default → humanized scorecard to stderr, no stdout
posture.test.mjs scorecard-prose assertions re-anchored to --raw mode (the
explicit v5.0.0 path) — Wave 0 audit only covered finding-title strings;
scorecard prose surfaces here for the first time.
Wave 3 / Step 6 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds --json and --raw flags to scan-orchestrator CLI main(). Default mode
runs humanizeEnvelope(env) before serialization; --json and --raw bypass
the humanizer for v5.0.0 byte-equal output (SC-6 / SC-7 paths).
Save-baseline path always writes the raw v5.0.0-shape envelope so future
humanizer-data updates do not trigger false-positive drift findings.
runAllScanners() unchanged — it remains the v5.0.0-shape source of truth
for in-process callers (posture, scoring, drift, etc.).
Wave 3 / Step 5 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Synthetic plan.md fixture with source_findings: block-style YAML list of 3
40-char hex IDs in frontmatter, plus minimal plan structure (Title +
Implementation Plan + 1 Step + Manifest). 3 tests verify:
1. plan-validator accepts a plan with source_findings (additive optional field)
2. frontmatter parser extracts source_findings as array of strings
3. each ID matches the 40-char lowercase hex format from finding-id.mjs
Closes the SC3(b) gap flagged by adversarial review (scope-guardian Gap 2).
LLM-level behavior (planner emitting source_findings) remains non-testable
without live invocation; this covers the structural contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave 2 / Step 4 of v5.1.0 plain-language UX humanizer rollout. Re-anchors
34 title-string assertions across 4 test files so they survive Wave 3's
title/description/recommendation rewriting at the CLI layer.
Anchoring strategy per scanner:
- GAP findings: scanner + category + recommendation substring (humanizer
preserves stable identifiers like CLAUDE.md, .mcp.json, hook in rec).
Hardcoded CA-GAP-NNN IDs for positive checks.
- HKV findings: scanner + evidence regex (evidence preserved verbatim).
- SET findings: scanner + evidence regex (evidence preserved verbatim).
- PLH findings: scanner + hardcoded CA-PLH-NNN IDs (no evidence on most
PLH findings, so ID is the only stable anchor for specific cases;
negative checks use scanner + title-substring spanning raw + humanized).
Per docs/v5.1.0-test-audit.md classification: only (b) WILL BREAK
assertions modified. (a) shape-only assertions (error-message formatting,
pure existence checks) untouched. tests/lib/output.test.mjs and
tests/lib/diff-engine.test.mjs and tests/scanners/fix-engine.test.mjs
unchanged (synthetic test inputs, not scanner output).
Test count unchanged: 689/689 pass. IDs harvested via deterministic
runtime dump per fixture (resetCounter + scan).
3 integration tests using the run-A/run-B fixtures:
- Jaccard(A, B) ≥ 0.70 (SC4 brief threshold)
- IDs match 40-char hex shape (lib/parsers/finding-id.mjs format)
- no duplicate IDs within a single run
Tests the Jaccard PIPELINE; real-LLM determinism deferred to v1.1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
review-run-A.md (5 findings) and review-run-B.md (6 findings, A ⊂ B) form a
known-Jaccard fixture pair: |A ∩ B| = 5, |A ∪ B| = 6, Jaccard = 5/6 = 0.833,
above the SC4 threshold of 0.70. IDs are real 40-char SHA1s computed via
lib/parsers/finding-id.mjs from realistic (file, line, rule_key) triplets.
Both fixtures pass review-validator --strict (frontmatter + body sections +
findings shape). Real-LLM determinism measurement deferred to v1.1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Modify "all four pipeline commands" → "all five" (adds /ultrareview-local).
Add 3 new pins: Handover 6 section in HANDOVER-CONTRACTS.md,
review-validator CLI shim, rule-catalogue 12-key size invariant.
11/11 doc-consistency tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes the iteration loop: review.md → plan via source_findings audit trail.
Adds versioning row, validator-map entry, full Handover 6 section, and
stability summary row mirroring the shape of Handovers 1-5.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave 1 / Step 3 of v5.1.0 plain-language UX humanizer.
scanners/lib/humanizer.mjs exports three pure functions:
- humanizeFinding(f) -> new finding object with translated
title/description/recommendation + three new fields
(userImpactCategory, userActionLanguage, relevanceContext).
- humanizeFindings(findings) -> mapped array.
- humanizeEnvelope(env) -> walks env.scanners[].findings.
Plus computeRelevanceContext(filePath) as a named export for
unit testing.
Field semantics:
- userImpactCategory: from scanner prefix per research/02 line 124
(Configuration mistake / Conflict / Wasted tokens / Dead config /
Missed opportunity / Other).
- userActionLanguage: from severity per research/02 line 134
(Fix this now / Fix soon / Fix when convenient / Optional cleanup
/ FYI).
- relevanceContext: deterministic file-path heuristic — looks for
/tests/fixtures/ or /test/fixtures/ substring (test-fixture-no-impact),
*.local.* basename (affects-this-machine-only), defaults to
affects-everyone. No subprocess, no network.
Lookup order per scanner: static[title] -> patterns regex match ->
_default -> fall through to original strings (when scanner prefix
absent).
Original id, scanner, severity, file, line, evidence, category,
autoFixable, and optional details are preserved exactly. Pure —
verified by deepEqual of input before/after.
Test (32 cases): purity, field preservation across all paths,
known/unknown scanner handling, all 5 severities, all 6 categories,
relevance heuristic for 4 path types, envelope walking, ANSI-free
guarantee. All pass.
Regression: 689/689 tests (657 + 32 new = 54 new across Wave 1).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
Wave 1 / Step 2 of v5.1.0 plain-language UX humanizer.
scanners/lib/humanizer-data.mjs exports TRANSLATIONS keyed by
scanner prefix (CML, SET, HKV, RUL, MCP, IMP, CNF, GAP, TOK, CPS,
DIS, COL, PLH). Each scanner has:
- static: exact-title -> {title, description, recommendation}
- patterns: array of {regex, translation} for template-literal titles
- _default: graceful fallback for unknown findings
Architectural change vs. plan: keys translations by exact scanner
title (not finding ID). Reason: finding IDs are sequence-based
(global counter in lib/output.mjs:34), not stable per finding-type
— two runs can produce different IDs for the same logical issue.
Title strings ARE stable (defined as string literals or template
patterns in the scanner source).
Translations follow research/03 SR-1..SR-17:
- active voice, second person, present tense
- sentences <= 25 words
- tier1 absolute prohibitions and tier3 domain jargon are kept out
of prose
- tier1/tier3 terms are permitted inside `backtick spans` (code
references like filenames and field names) — established
technical-doc convention
Test (12 cases): all 13 scanners covered; every static and pattern
entry has the 3 required fields; tier1 and tier3 forbidden-word
checks pass (with backtick-span exclusion); reference-stable
imports. All pass.
Regression: 657/657 tests (645 + 12 new).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
Wave 1 / Step 1 of v5.1.0 plain-language UX humanizer.
tests/lint-forbidden-words.json defines the SC-3 forbidden-words
vocabulary used by the lint runner (Wave 4 / Step 8) and the
humanizer-data translation guard (Wave 1 / Step 2).
- Tier 1: 19 absolute prohibitions (failure if matched in default
output) — sourced from Microsoft Writing Style Guide, Federal
Plain Language, GOV.UK, Google Developer Style, Apple HIG.
- Tier 2: 24 strong-avoidance terms (warning if matched) — same
sources plus Mailchimp.
- Tier 3: 12 domain-specific jargon terms (failure if matched in
default output, allowed in --raw and --json paths) — sourced
from research/03 jargon table.
Counts diverge from plan.md (18/21/11) — JSON tracks the brief's
verbatim lists at research/03 lines 200-202 plus tier3 hook entry
from the brief's table. Plan revision noted in audit-doc.
Test: 10 cases verifying parse, count, schema completeness, spot
checks per tier, no cross-tier duplicates. All pass.
Regression: 645/645 tests (635 + 10 new).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
Wave 0 / Step 0 of the v5.1.0 plain-language UX humanizer plan.
Captures v5.0.0 baseline output for all 8 CLIs at
tests/snapshots/v5.0.0/ — these snapshots are immutable references
for SC-6 (--json byte-equal) and SC-7 (--raw byte-equal) tests in
later waves.
- 5 CLIs captured via --output-file: scan-orchestrator, posture,
token-hotspots-cli, manifest, whats-active
- 3 CLIs captured via stdout redirect (no --output-file support):
drift-cli (after baseline seed), fix-cli, plugin-health-scanner
- Posture stderr scorecard captured separately for SC-7 stderr-mode
comparison
docs/v5.1.0-test-audit.md classifies all 42 .title references in
7 known test files: 34 will break under humanization (literal
string equality / substring), 8 are safe (test fixtures or error
formatting). This document is the change list for Step 4.
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
These were committed in b37b938 by mistake — KTG's convention is that
planning docs in plugins/ultraplan-local/docs/ are local working files
and never pushed to the public marketplace.
- git rm --cached on both files (kept on disk, just untracked)
- .gitignore extended with explicit entries for the two filenames
Existing tracked docs in plugins/ultraplan-local/docs/ predate this rule
and are left alone (separate decision).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two sibling files in plugins/ultraplan-local/docs/ that together
specify a new /ultracontinue command for zero-friction multi-session
resumption — drafted from design dialogue at the end of the config-audit
v5.0.0 release session (5 sessions, ~10 manual NEXT-SESSION-PROMPT
context-handovers — friction this work removes).
ultracontinue-brief.md (159 lines):
- Follows the /ultrabrief-local template (frontmatter brief_version: 2.0)
so /ultraplan-local can consume it directly
- Defines per-project state-file convention .claude/projects/<project>/
.session-state.local.json as the contract; /ultracontinue is read-only,
multiple writers may update
- 10 falsifiable success criteria including cross-project consistency,
no-new-deps, validator + helper command, docs sweep across plugin
README + CLAUDE.md + marketplace root README
- 3 research topics: ultraexecute end-of-session integration depth,
graceful-handoff alignment (no hard dep), Claude Code slash-command
conventions for read+execute commands
- Explicit non-goals: not replacing /ultraexecute-local --resume, not
replacing graceful-handoff, not auto-orchestrating N sessions
- Open questions and assumptions flagged for plan-critic / scope-guardian
ultracontinue-design-notes.md (117 lines):
- Captures the dialogue rationale that shaped the brief, so the
implementing session has full context without needing to read this
conversation's transcript
- Origin (config-audit v5 release pain point), key design insight
("state-fil ER kontrakten, ikke verktøyet"), 6 design decisions with
alternatives considered, anti-patterns from KTG auto-memory to respect,
recommended reading order, expected scope (1-2 execution sessions)
No code changes. Brief is ready for /ultraplan-local --brief
plugins/ultraplan-local/docs/ultracontinue-brief.md (light path) or
/ultraresearch-local for full research path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v5.0.0 SHIPPED 2026-05-01. Tag config-audit/v5.0.0 pushed to Forgejo.
SC-6b release-gate PASS at -0.85% delta (CLAUDE.md actual 589 vs
estimated 594, well within ±5% gate).
Per-step:
- Step 28: README/CLAUDE.md straggler-sweep + self-audit counter alignment
- Step 29: version bump 4.0.0 → 5.0.0 + consolidated CHANGELOG
- Step 30: full audit + live SC-6b gate + tag (incl. one in-step bug fix
for hotspot.path exposure, required to make calibration measurable)
635 tests still green throughout. No blockers carried forward.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v5.0.0-rc.1 N5 implementation looked up hotspot.path in
calibrateAgainstApi() but token-hotspots.mjs only emitted hotspot.source —
calibration silently produced 0 actual_tokens because every iteration hit
the `if (!hotspot?.path) continue` guard.
Fix: file-backed hotspots now expose `path: h.absPath` in the JSON output.
MCP-server hotspots intentionally leave path unset — their tokens are
runtime tool-schema (formula-based: 500 + toolCount × 200), not file
content readable by count_tokens.
SC-6b release-gate verified against tests/fixtures/marketplace-large:
- Actual (count_tokens, claude-opus-4-7): 589 tokens for CLAUDE.md
- Estimated (4-bytes/token byte heuristic): 594 tokens
- Delta: -5 tokens / -0.85% — well within ±5% gate. PASS.
CHANGELOG: documented the fix + SC-6b result inline under [5.0.0].
All 635 tests still green. No estimateTokens tuning required for v5.0.0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stop hook fallback antok 200K-vindu. På Opus 4.7 (faktisk 1M) kunne
auto-handoff fyre 5–7x for tidlig — estimert 70% når reell bruk var
~14%. Erstatter enkel fallback med 4-stegs resolution-kjede:
1. payload.context_window.used_percentage (autoritativ)
2. payload.context_window.context_window_size + transcript-estimat
3. MODEL_WINDOWS[payload.model.id] + estimat
4. FALLBACK_WINDOW=1_000_000 + estimat (2026-default)
additionalContext-meldinger inkluderer nå [kilde: <source>] for innsyn.
Brief som kilde-artefakt i docs/brief-context-window-detection.md.
6 nye tester (57 totalt). Ingen regresjoner.
beta.1 wrap entry covering N1-N4 + N6 (Steps 18-22b). Includes
explicit Known breaking changes section on CA-TOK-* glob suppression
matching CA-TOK-005, and notes plugin-vs-built-in collision is
deferred to v5.0.1.
Tests: 586 → 625 (+39).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New COL scanner detects skill-name collisions across plugins and
between user-level skills (~/.claude/skills/) and plugin-bundled
skills. Skill identity is the directory basename — matches how
enumerateSkills resolves names.
Detection rules (per docs/v5-namespace-research.md, confidence: medium):
- Plugin-vs-plugin same skill name → severity low (CA-COL-001)
- User-vs-plugin same skill name → severity medium (CA-COL-001)
- Plugin-vs-built-in collisions: out of scope for v5.0.0 (insufficient
verification — recorded for v5.0.1 follow-up).
Findings carry details.namespaces array with {source, name, path} for
every conflicting source — supports per-collision reporting downstream.
output.mjs: finding() helper now passes through optional `details`
field (scanner-specific structured payload).
scoring.mjs: COL → "Plugin Hygiene" (new area, 10 total). Posture test
updated from 9 → 10 area scores.
.gitignore: docs/v5-namespace-research.md is local-only (Step 22a
research output, gitignored per plan).
Fixture collision-plugins/fake-home/ has user skill `review` colliding
with plugin-a + plugin-b's `review` (medium severity), plus plugin-c's
unique `summarize` (no collision).
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag.
Tests: 617 → 625 (+8).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New DIS scanner detects tools that appear in BOTH permissions.deny
and permissions.allow within the same settings.json file. The deny
list wins, so allow entries are dead config but still load on every
turn and confuse intent.
Tool identity = bare name (everything before "("). `Bash(npm:*)` and
`Bash` are treated as the same tool, so a deny on `Bash` flags any
`Bash(...)` allow entry.
Severity: low. Wired into scan-orchestrator + scoring (area: Settings).
Fixture denied-tools-in-schema has Bash in both arrays; healthy-project
serves as the negative case.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag.
Tests: 611 → 617 (+6).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New CPS scanner walks CLAUDE.md cascade and flags volatile content
between lines 31 and 150 — the cache-prefix window beyond TOK Pattern
A's top-30 territory. Volatile content anywhere in the cached prefix
forces a fresh cache write from that line down on every turn.
Volatile-pattern set extends TOK Pattern A with:
- shell-exec lines (! prefix) — common in CLAUDE.md to inject git/date
- ${VAR} substitutions — vary per-shell, defeat cache reuse
Severity: medium per finding. Skips lines 1-30 to avoid duplicating
Pattern A's range; CPS' value is in the 31-150 zone.
Wired into scan-orchestrator + scoring SCANNER_AREA_MAP. CPS shares
the "Token Efficiency" area with TOK; scoreByArea now deduplicates by
area name and combines counts across scanners contributing to the
same area, so the 9-area scorecard contract holds.
Fixtures volatile-mid-section/{volatile-line-60, volatile-line-200}
verify both positive (line 60) and out-of-window (line 200) cases.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag.
Tests: 604 → 611 (+7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New scanners/manifest.mjs CLI + commands/manifest.md slash command.
Reads activeConfig and produces a flat, ranked list of every token
source (CLAUDE.md cascade entries, plugins, skills, MCP servers, hooks)
sorted DESC by estimated_tokens.
CLAUDE.md per-file tokens are derived by distributing
claudeMd.estimatedTokens across the cascade proportional to bytes.
Tests cover both real-config (plugin root) and fixture (rich-repo with
patched HOME containing 2 plugins + 3 skills + .mcp.json) paths, plus
error handling (nonexistent path → exit 3, --output-file).
Builds on readActiveConfig from M1 (v5 alpha.2).
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag on
feat commits without doc changes.
Tests: 593 → 604 (+11).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds detectMcpToolBudget detection block in TOK scanner. Tiered severity
per project-local .mcp.json server based on toolCount:
- < 20: no finding
- 20-49: low
- 50-99: medium
- 100+: high
- null (manifest unparseable): low + "tool count unknown" message
Scoped to source==='.mcp.json' to keep findings actionable for the
audited path; plugin/user-level MCP servers are surfaced by the
manifest scanner (Step 19 / N2).
5 fixtures (mcp-budget/{14,25,60,120,unknown}-tools) use inline `tools`
arrays in .mcp.json — no node_modules needed for these tests.
Tests assert title+severity (not exact ID) since TOK IDs are sequential
per scan, not semantic per pattern.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag on
feat commits without doc changes.
Tests: 586 → 593 (+7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Filesystem counts are the source of truth; README badges parsed via
line-anchored substring (badge/<kind>-<N>-...). Emits readmeCheck object
with counts/badges/mismatches.
CLI: node scanners/self-audit.mjs --check-readme [--json]
API: runSelfAudit({ checkReadme: true }) → result.readmeCheck
Helper: checkReadmeBadges(pluginDir) for per-fixture testing
New fixture: readme-desynced/ (commands/foo + bar, README claims 1).
Note: alpha phase does NOT require result.readmeCheck.passed === true.
Self-test of real plugin currently fails (scanners 10 vs 9, tests 31 vs 543);
will be reconciled in Session 5 Step 28 (README sync).
582 → 586 tests, all green.
The mcp-tool-heavy fixture relies on node_modules/mcp-heavy/package.json
being committed so the v5 M1 tool-count detection test runs deterministically.
Add an unignore rule for tests/fixtures/**/node_modules/.
- New Pattern F in TOK: low-severity finding when SKILL.md description > 500 chars
- Scoped to discovery.files (project-local) — activeConfig.skills walk would
pull in user/plugin skills out of project scope
- New fixtures: skill-bloated (594-char desc) + skill-tight (46-char baseline)
574 → 576 tests, all green.
- New Pattern E in TOK: emits medium finding when activeConfig.claudeMd.estimatedTokens > 10_000
- Uses cascade tokens, file count, and calibration note as evidence
- New fixtures: large-cascade (37k bytes / 14475 cascade tokens) + small-cascade (5k baseline)
572 → 574 tests, all green.
- Pattern A (cache-breaking volatile top): medium → high
- Pattern B (redundant permissions): low → medium
- Pattern C (deep @import chain): medium → low
- Add calibration_note evidence on every TOK finding
- Table-driven severity tests (identify by title, IDs are sequential)
563 → 569 tests, all green. Doc sweep deferred to Session 5 (Step 28).
Per-step result table for Steps 1-9 + 8b with commit SHAs and notable
deviations (Step 6 baseline switch to sonnet-era, Step 8 surprise on
sonnet-era discovery scope, PathGuard hook false positive on test
fixtures). 543 → 563 tests, all green, no blockers carried forward.
Summarizes F1-F5 scope: TOK ↔ readActiveConfig integration, 'mcp' kind
in estimateTokens (15 → ≥500), severity-weighted scoreByArea, dead-code
removal in TOK hotspots, Pattern D / CA-TOK-004 removal.
Includes migration notes for downstream consumers (CA-TOK-* globs still
suppress 001-003; scoringVersion field added for v4→v5 detection).
Pattern D was the v4 sonnet-era signature: 'config is structurally
clean but uses no Opus-4.7-specific features'. Two problems:
- It triggered on any minimal config that happened to lack skills/MCP
- The advice was generic and not actionable
The hotspots ranking and per-pattern findings (A/B/C) cover the same
ground with concrete, file-anchored signal. Dropping the noise.
BREAKING (intentional): scanners no longer emit the sonnet-era info
finding. Suppression entries and downstream tooling that reference
the v4 finding ID should be updated. Doc sweep follows in Step 8b.
Tests: sonnet-era fixture now asserts zero findings.
The buildHotspots padding loop and unused 'take' variable were dead
code from the v3 hotspots-min contract. Replaced with a clean
ranked.slice(0, HOTSPOTS_MAX). Tiny fixtures may now return fewer
than 3 hotspots, which is the honest answer; the contract now only
asserts <= 10.
Tests: +2 cases — every hotspot.source is unique (no padding); length
never exceeds HOTSPOTS_MAX.
Removes the v4 'void readActiveConfig' placeholder and wires the
active-config snapshot into the TOK scanner.
Per-turn behavior changes:
- Each enabled MCP server becomes its own hotspot entry (richer than
the parent .mcp.json file alone)
- total_estimated_tokens now includes MCP server cost
- result.activeConfig exposes a small summary
(claudeMdEstimatedTokens, mcpServerCount, pluginCount, skillCount)
Failures of readActiveConfig are non-fatal — the scanner falls back
to the discovery-only path used in v4.
Tests: +3 cases on the new tok-active-config fixture
(.mcp.json with 2 servers, CLAUDE.md, plugin skeleton).
Two MCP enumeration paths in readActiveMcpServers now pass kind='mcp'
to estimateTokens with optional toolCount derived from def.tools array
(populated when callers cache MCP discovery — Step 14 wires that up).
Hook callers keep kind='item' (no schema overhead).
Visible effect: every active MCP server jumps from estimatedTokens=15
to >= 500 (or higher when toolCount is known). The whats-active output
and TOK hotspots now reflect actual MCP cost.
Tests: assert mcpServers[].estimatedTokens >= 500 in fixture.
The plugin lives in ktg-plugin-marketplace and is distributed via the
Claude Code marketplace mechanism. There is no standalone
open/claude-code-llm-security repo; references to it were aspirational
and never realized.
- package.json: homepage now deep-links to plugins/llm-security/ in the
marketplace; repository.url uses the marketplace repo with directory
field (npm convention for monorepo plugins); bugs.url routes to
marketplace issue tracker.
- CLAUDE.md: "Public Repository" section replaced with "Distribution"
section documenting the marketplace install path.
- CONTRIBUTING.md: issue tracker URL points at marketplace issues with
[llm-security] prefix convention.
- CHANGELOG.md: v7.3.1 entry rewritten to reflect actual change
(URLs corrected to marketplace, not "fixed from one wrong URL to
another wrong URL").
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace count-based pass-rate with severity-weighted penalty:
- penalty = sum(count[s] * WEIGHTS[s])
- maxBudget = max(10, findingCount * 4)
- passRate = max(0, 100 - penalty / maxBudget * 100)
A few lows no longer crater an area's grade; a single high or critical
consumes a large fraction of budget. Mirrors the operator intuition that
severity, not count, is the signal.
BREAKING (intentional): scoring semantics differ from v4 for non-clean
configs. Add scoringVersion: 'v5' to the returned struct so consumers
can detect the version. baseline-all-a remains all-A (no critical/high
on that fixture).
Tests: +6 cases for severity weighting; existing "many findings" test
updated to use highs (where v5 still drops the grade as expected).
Promote WEIGHTS const to named export with Object.freeze for downstream
use in scoring.mjs (severity-weighted scoreByArea, F3).
Tests: +2 cases asserting WEIGHTS shape.
No behavior changes. Sets the public stance, tightens documentation, and
removes coherence drift so anyone forking or downloading the plugin gets
a consistent starting point.
Added:
- CONTRIBUTING.md — public fork-and-own guide. Why PRs are not accepted,
how to fork well, what is welcome via issues.
- README "Project scope" section — out-of-scope table naming what is
fork-and-own territory (web dashboard, fleet policy, runtime firewall,
IDE LSP, compliance pack, ticketing, multi-tenancy, ML detectors,
marketplace UI, SSO/SCIM/RBAC) with commercial alternatives.
- package.json: bugs.url, CONTRIBUTING/SECURITY/CHANGELOG in files
whitelist for npm publishing.
Changed:
- SECURITY.md rewritten. Supported-versions table from stale 5.1.x to
current reality (7.3.x active, 7.0-7.2 best-effort, <7.0 EOL).
Best-effort solo response timeline. Scope expanded to bin/.
- Scanner VERSION constants synced to plugin version. Was 6.0.0 in
dashboard-aggregator and posture-scanner.
- package.json repository.url corrected from fromaitochitta/ to open/.
- README "Feedback & contributing" links to CONTRIBUTING.md.
Fixed:
- pre-compact-scan size-cap timing test ceiling raised 500ms -> 1000ms.
Was a flake on Intel Mac and CI under load. Design target unchanged
(<500ms, documented in CLAUDE.md).
Notes:
- First patch on the stabilization line (post-2026-05-01).
- Wave E attack-simulator scenarios deferred indefinitely; coverage
remains at 72.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 7 of v2.0 plan. Registers SessionStart, Stop, and statusLine hooks.
Note: statusLine top-level placement in hooks.json is an open assumption
(brief Assumption 1) — verified to be valid JSON syntax; live smoke-test
required to confirm Claude Code loads it from this location vs requiring
settings.json placement.
Step 6 of v2.0 plan. SessionStart hook fires on source: resume or
source: compact, walks up to 3 levels searching for
NEXT-SESSION-*.local.md, injects content via additionalContext, and
archives the file (rename to *.archived.local.md) to prevent stale-load
in later sessions. 9 tests cover sources, multi-level search,
topic-slug variants, archive filtering, malformed payload.
Step 4 of v2.0 plan. statusLine hook reads context_window.used_percentage
from stdin payload and prints display-only hint at 60% / 70%. NEVER runs
git (research/03 — statusLine scripts can be cancelled mid-flight, unsafe
for side effects). 9 tests cover thresholds, null payload, malformed JSON.
Includes hook-helper.mjs copied from llm-security as test infrastructure.
Step 1 of v2.0 plan. Hard cut from commands/ to skills/ per Anthropic
recommendation for new plugins. Frontmatter sets disable-model-invocation:
true and pins model: claude-sonnet-4-6. Docs (README, CLAUDE.md, root
README) deferred to Step 9 per plan.
Batch C release. Closes 12 implementation tasks (E3, E8-E14, 8.4, 8.6,
8.7, 8.10) across four execution waves: A (bash + decoder), B (supply
chain + workflow scanner), C (MCP cumulative drift), D (code quality).
Wave E (9 new attack-simulator scenarios for the new defenses) deferred
to v7.3.1 — defenses are unit-tested per wave; the deferred work adds
attack-simulator regression coverage on top, not the primary safety net.
Tests: 1665+ → 1777 (Wave A-D cumulative, +112).
Version sync targets touched:
- package.json
- .claude-plugin/plugin.json
- CLAUDE.md (header)
- README.md (badge + new release-history row)
- scanners/ide-extension-scanner.mjs (VERSION constant)
- ../../README.md (marketplace root plugin entry)
- CHANGELOG.md (new [7.3.0] section per Keep a Changelog, all 12 task
IDs covered individually under Added/Changed/Documentation/Tests/Notes)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- package.json med node:test runner og scripts (test, simulate), zero deps
- settings.json: fjern vestigial exploration- og agentTeam-blokker (verifisert leset av ingen kode via grep)
- docs/: commit subagent-delegation-audit.md og ultraexecute-v2-observations-from-config-audit-v4.md (begge real arkitektur-notater)
- docs/: arkiver ultra-suite-brief_2.md som _archive- (var paste fra annet plugin-arbeid, irrelevant her)
- tests/helpers/hook-helper.mjs kopiert fra llm-security m/ provenance-kommentar
Forberedelse for Spor 1 (lib/-moduler), Spor 2 (HANDOVER-CONTRACTS + PreCompact-hook), Spor 3 (bug-fixes + CC-features).
Plan: ~/.claude/plans/det-neste-vi-gj-r-eventual-adleman.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extract `/ultra-cc-architect-local` and `/ultra-skill-author-local` plus all 7
supporting agents, the `cc-architect-catalog` skill (13 files), the
`ngram-overlap.mjs` IP-hygiene script, and the skill-factory test fixtures
from `ultraplan-local` v2.4.0 into a new `ultra-cc-architect` plugin v0.1.0.
Why: ultraplan-local had drifted into containing two distinct domains — a
universal planning pipeline (brief → research → plan → execute) and a
Claude-Code-specific architecture phase. Keeping them together forced users
to inherit an unfinished CC-feature catalog (~11 seeds) when they only
wanted the planning pipeline, and locked the catalog and the pipeline into
the same release cadence. The architect was already optional and decoupled
at the code level — only one filesystem touchpoint remained
(auto-discovery of `architecture/overview.md`), which already handles
absence gracefully.
Plugin manifests:
- ultraplan-local: 2.4.0 → 3.0.0 (description + keywords updated)
- ultra-cc-architect: new at 0.1.0 (pre-release; catalog is thin, Fase 2/3
of skill-factory unbuilt, decision-layer empty, fallback list still
needed)
What stays in ultraplan-local: brief/research/plan/execute commands, all
19 planning agents, security hooks, plan auto-discovery of
`architecture/overview.md` (filesystem-level contract, not code-level).
What moved (28 files via git mv, R100 — full history preserved):
- 2 commands, 8 agents, 1 skill catalog (13 files), 2 scripts, 8 fixtures
Documentation updates: plugin CLAUDE.md and README.md for both plugins,
root README.md (added ultra-cc-architect section, updated ultraplan-local
section), root CLAUDE.md (added ultra-cc-architect to repo-struktur),
marketplace.json (registered ultra-cc-architect), ultraplan-local
CHANGELOG.md (v3.0.0 entry with migration guidance).
Test verification: ngram-overlap.test.mjs passes 23/23 from new location.
Memory updated: feedback_no_architect_until_v3.md now points at the new
plugin and reframes the threshold around catalog maturity rather than an
ultraplan-local milestone.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave C step C3: closes E14 with the user-facing reset command.
After a legitimate MCP server upgrade the sticky baseline (added in C1)
becomes a stale "what the tool used to say" anchor and every subsequent
post-mcp-verify advisory will re-flag the change. /security mcp-baseline-reset
lets the user acknowledge the upgrade so the next call seeds a fresh
baseline.
New files:
- scanners/mcp-baseline-reset.mjs — small CLI wrapper around clearBaseline /
listBaselines. Modes: --list (read-only), --target <name>, no-args (all).
Outputs JSON summary on stdout. Exit 0 always (idempotent).
- commands/mcp-baseline-reset.md — dispatcher following mcp-inspect.md
shape. Frontmatter: name=security:mcp-baseline-reset, sonnet model,
Read/Bash/AskUserQuestion tools. 4-step body (list -> confirm scope
-> execute -> confirm result).
- tests/scanners/mcp-baseline-reset.test.mjs — 10 CLI tests across
--list, --target, clear-all, idempotency, history preservation, and
bare-positional sugar.
Updated:
- commands/security.md — new row in commands table after mcp-inspect.
- CLAUDE.md — new commands-table row + new v7.3.0 narrative section
describing the baseline schema, cumulative-drift detection, reset
semantics, and the LLM_SECURITY_MCP_CACHE_FILE override.
- Plugin README.md — new MCP-baseline-reset row in commands table,
scanner count 12 standalone -> 13 standalone, new "MCP Description
Drift (E14, v7.3.0)" subsection explaining the sticky baseline,
cumulative threshold, reset semantics, and env-var override.
- Root marketplace README.md — scanner count 22 -> 23 (10 orchestrated +
13 standalone), command count 19 -> 20, test count 1511 -> 1768.
Wave C complete: 1738 -> 1768 tests (+30 across C1/C2/C3). Per plan,
Wave C does NOT bump the plugin version — that lands at the wave-bundle
release. The advisory text in post-mcp-verify already references the
new command path so the user has a ready remediation step.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave C step C2: surface the cumulative-drift signal from
checkDescriptionDrift() (added in C1) as a separate MEDIUM advisory
with finding category mcp-cumulative-drift. Independent of the existing
per-update drift advisory — a slow-burn rug-pull that keeps each update
below the 10% per-update threshold but cumulatively drifts >=25% from
the sticky baseline now triggers the new advisory without ever crossing
the per-update bar.
The advisory references /security mcp-baseline-reset (added in C3) so
the user knows how to acknowledge a legitimate MCP server upgrade.
CLAUDE.md updates:
- post-mcp-verify hooks-table row mentions per-update + cumulative drift
- mcp-description-cache lib bullet documents baseline schema, history,
cumulative threshold policy key, and LLM_SECURITY_MCP_CACHE_FILE
override.
Tests: 2 new hook tests using LLM_SECURITY_MCP_CACHE_FILE for cache
isolation. Existing 68 still pass; total 70.
Plugin README and root marketplace README updates land in C3 alongside
the new /security mcp-baseline-reset slash command (combined Wave-C
doc update per plan §"Wave C — Touch" list).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave C step C1: extend the MCP description cache schema with a sticky
baseline slot per tool and a rolling history array (last 10 drift events).
Cumulative drift = levenshtein(current, baseline) / max(|current|, |baseline|);
emits a separate signal when ratio >= mcp.cumulative_drift_threshold
(default 0.25). Per-update drift logic and threshold unchanged.
- loadCache(): TTL purge now skips entries with a baseline, preserving
cumulative-drift detection across the 7-day window. v7.2.0 entries
(no history field) are migrated on read by seeding baseline from the
current description and adding an empty history array. Entries with
history but no baseline (post-clearBaseline) are NOT re-seeded.
- checkDescriptionDrift(): when an entry exists with history but no
baseline (i.e. baseline was cleared), the next call re-seeds baseline
from the incoming description so the legitimate next version becomes
the new baseline.
- clearBaseline(toolName?): removes baseline for one tool or all tools.
Preserves description / firstSeen / lastSeen / history.
- listBaselines(): read-only listing for the upcoming reset CLI.
- LLM_SECURITY_MCP_CACHE_FILE env var override for end-to-end testing.
- New policy key mcp.cumulative_drift_threshold (default 0.25).
Tests: 23 new unit tests; existing 10 still pass.
Docs deferred: CLAUDE.md update lands in C3 alongside the new
/security mcp-baseline-reset command. C2 adds the hooks-table footer
note. Combined wave docs match plan §"Wave C — Touch" list.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes E11. Three new pieces, plus integration:
1. Re-interpolation detector (Appsmith GHSL-2024-277 stealth pattern).
The scanner now collects env: bindings (key -> source-expression
text) by walking parsed events whose parentChain includes 'env',
then for each `${{ env.<KEY> }}` inside run:, re-injects MEDIUM
if the binding source matches the 23-field blacklist. This
catches the pattern where developers apply env-indirection but
then re-interpolate the env var in run:, which cancels the
mitigation (template substitution happens before shell parsing).
2. Auth-bypass category (Synacktiv 2023 Dependabot spoofing).
Detects `if: ${{ github.actor == 'dependabot[bot]' }}` and
variants. MEDIUM, owasp: 'LLM06' (Excessive Agency). Distinct
from injection — same expression syntax, different threat class.
Recommendation steers users to `github.event.pull_request.user.login`.
3. severity.mjs OWASP map registration. WFL prefix added to all
four maps:
- OWASP_MAP['WFL'] = ['LLM02', 'LLM06']
- OWASP_AGENTIC_MAP['WFL'] = ['ASI04']
- OWASP_SKILLS_MAP['WFL'] = []
- OWASP_MCP_MAP['WFL'] = []
Empty arrays for skills/MCP are explicit, not omitted — keeps
`Object.keys(OWASP_MAP)` symmetric across maps.
4. scan-orchestrator.mjs registration. workflowScan added between
supply-chain and toxic-flow (toxic-flow correlates after primaries).
Verified via integration: orchestrator emits 9 WFL findings on
tests/fixtures/workflows/.
Bug fix: extractTriggers in workflow-yaml-state.mjs was collecting
sub-properties (`branches:`, `types:`) as triggers. Now tracks the
first nested indent level and ignores anything deeper.
Tests:
- 6 new cases in tests/scanners/workflow-scanner.test.mjs:
re-interp TP, no-double-count, auth-bypass TP, auth-bypass FP
(startsWith head_ref is not auth-bypass), OWASP map shape,
orchestrator import + SCANNERS array entry.
- 2 new fixtures: tp-reinterpolation.yml, auth-bypass-dependabot.yml.
- Existing 14 scanner tests + 15 state-machine tests unchanged.
Test count: 1732 -> 1738 (+6). Wave B total: +53 over baseline 1685.
Pre-compact-scan flake unchanged (passes in isolation).
Adds a scope-hopping detector to the npm install gate. When a user
installs `@<scope>/<unscoped>`, the hook now emits a MEDIUM warning
on stderr (exit 0, never blocks) if:
- `<unscoped>` matches a popular npm package (POPULAR_NPM, ~80
names from knowledge/top-packages.json), AND
- `<scope>` is not on NPM_OFFICIAL_SCOPES (built-in 22 entries) or
on policy.json `supply_chain.allowed_scopes`.
Why: an attacker publishing `@evilcorp/lodash` cannot squat the bare
`lodash` name, but they can register an unrelated scope and rely on
typo or copy-paste to trick installs. NPM_OFFICIAL_SCOPES anchors the
known-good scopes (@types, @reduxjs, @nestjs, …) so legitimate
installs stay silent.
Implementation:
- `scanners/lib/supply-chain-data.mjs`: exports POPULAR_NPM,
NPM_OFFICIAL_SCOPES, and `checkScopeHop(name, extraAllowedScopes)` —
pure function, no policy/network dependency, fully unit-testable.
- `knowledge/typosquat-allowlist.json`: mirrors NPM_OFFICIAL_SCOPES as
`npm_official_scopes`. A doc-consistency assertion ensures the two
lists never drift.
- `hooks/scripts/pre-install-supply-chain.mjs`: imports checkScopeHop,
reads `supply_chain.allowed_scopes` from policy, and pushes a
warning before existing compromised/audit checks.
Tests:
- 9 new cases in tests/hooks/pre-install-supply-chain.test.mjs:
TP @evilcorp/lodash, TP @attacker/express, allowlist @types,
allowlist @reduxjs, allowlist @modelcontextprotocol, FP unscoped
name not in top-100, bare unscoped name, policy override, defensive
non-string input, NPM_OFFICIAL_SCOPES <-> typosquat-allowlist.json
consistency.
Adds scanGitAttributes(repoDir) — pure function that parses
.gitattributes after a sandboxed clone and returns the
{filter,diff,merge} driver entries that would run on checkout. The
clone CLI prints each entry as a "MEDIUM" stderr advisory followed by
a recommendation to verify the smudge/clean command before moving the
clone outside the sandbox.
Why: filter drivers execute arbitrary shell during checkout (smudge
runs on read, clean on write). Even with the existing sandboxed clone,
downstream consumers that re-checkout files outside the sandbox can be
exploited. Surfacing the directive list lets the caller decide whether
to proceed.
Out-of-scope: in-line content of the smudge command is not analysed —
the advisory is for human review, not automatic blocking.
Tests:
- tests/lib/git-clone-gitattributes.test.mjs (8 cases): LFS-style,
custom driver, missing/empty/comment-only files, line-number
tracking, inline-comment stripping, unreadable path graceful return.
Adds rot13 to the variantSet built in scanForInjection(), so
imperative phrases hidden as rot13 inside code comments still hit
the existing CRITICAL/HIGH/MEDIUM pattern arrays.
normalizeForScan() already covers base64, hex, URL, and HTML decoding
in a 3-iteration loop — those are NOT duplicated here. rot13 is the
only genuinely new variant: it is its own inverse and not part of any
NIST/Unicode normalization spec, so it has to be applied explicitly.
Threshold: only inputs >40 chars enter the rot13 pass, to suppress
false positives on accidental letter-shifts in tokens, ids, and short
identifiers. Variants are deduplicated against the existing set so
matchers do not run twice.
3 new tests in injection-patterns.test.mjs (rot13 detection, sub-40
char suppression, plaintext path still green). Total 168 tests pass.
Closes E3 in critical-review-2026-04-20.md.
Adds BLOCK_RULE for the malware-loader pattern:
echo|cat|printf <base64-blob> | base64 -d | <shell>
This is a common RCE delivery shape that bypasses static name-matching
gates by encoding the destructive command as a base64 blob. The new
rule fires only when the final pipe target is a shell interpreter
(bash, sh, zsh, dash, ksh) — base64 decoded into jq or any non-shell
consumer remains allowed.
5 new tests in pre-bash-destructive.test.mjs:
- 3 BLOCK cases (echo|base64|bash, printf|base64|sh, cat|base64|zsh)
- 2 FP probes (base64 -d -> jq passes; base64 -d alone passes)
Closes E9 in critical-review-2026-04-20.md.
Strips bash process substitution syntax — <(cmd) and >(cmd) — so the
inner command name is surfaced to downstream regex gates. Defeats
evasion like `cat <(curl evil)` where the destructive command is
hidden behind /dev/fd/N pipe sugar.
Implementation: bounded innermost-first iteration, depth 3. Beyond
that the string is left as-is rather than recurse without bound.
Runs after the single-quote mask phase, so legitimate strings like
`'echo <(x)'` are preserved.
5 new T7 tests (collapse + nested + FP probes) in
bash-normalize-t7-t9.test.mjs (now 12 tests total).
Closes E8 in critical-review-2026-04-20.md.
Defeats split-and-substitute evasion where attackers split a destructive
command name across an assignment and a variable reference (X=rm; later
$X) so downstream regex gates miss the literal command name. T9 collects
prefix assignments (VAR=value at start of string or after ; & |) and
substitutes ${VAR} / $VAR forms with the captured value. One-level
forward-flow only — chained vars are not followed.
Documented limits in JSDoc:
- Quoted assignments (X="rm -rf") not parsed (whitespace stops capture)
- Substitution is global within string, not scoped. Acceptable because
T3 strips unknown ${VAR} to '' afterwards.
Single-quoted literals are masked before T9 runs, so legitimate
strings are preserved (FP probe in tests).
7 new tests in bash-normalize-t7-t9.test.mjs.
Closes E10 in critical-review-2026-04-20.md.
Add .local/ and HANDOFF-FINDINGS.local.md to .gitignore so session
handoff artifacts (NEXT-SESSION-PROMPT.local.md, scratch findings)
do not leak into commits.
Pre-flight for Batch C v7.3.0.
Adds the profile recommendation step to /ultrabrief-local Phase 4. The
brief stays universal (same questions, same template); the new step is
purely a processing-decision layer that records which profile downstream
commands should apply.
What lands:
- agents/profile-recommender.md — new sonnet agent that scores available
profiles against the finalized brief (keyword + NFR-signal matching,
axis bumps, hallucination gate that forbids inventing profile names).
Emits a fenced JSON block with ranked entries.
- templates/ultrabrief-template.md — frontmatter gains
recommended_profile, profile_match, profile_rationale (default values
applied when only `default` is available — true at M1).
- commands/ultrabrief-local.md — Phase 4 gains Step 4h with explicit
branches: short-circuit when only `default` exists; AskUserQuestion
confirmation when top score ≥ 0.7; explicit fallback message when below
threshold; manual selection sub-question on user override. Persists the
three frontmatter fields to brief.md after user confirmation. JSON
parser failure falls back to `default` with `profile_match: fallback`
rather than blocking — silent fallback is the worst outcome, but a
*visible* fallback is acceptable.
- scripts/profile-loader.mjs — adds selectRecommendation(ranked, opts) +
RECOMMENDATION_THRESHOLD=0.7 export. Single source of truth for the
threshold logic so the command spec and the helper agree.
- scripts/profile-loader.test.mjs — 10 new tests for selectRecommendation
(default-only, empty/malformed input, above/below threshold, custom
threshold, max-by-score, missing fields). Total now 36/36.
- README.md / CLAUDE.md / marketplace landing — docs reflect M0 + M1
shipped, M2 + M3 still pending.
In practice nothing changes for users at M1 because only `default` is
available — Step 4h takes the short-circuit path and writes
`profile_match: default-only`. M2 ships the additional profiles that
make the recommender meaningful.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Introduces a profile-loader infrastructure for runtime-instantiable
ultraplan variants (depth × domain × goal axes). M0 ships only the
`default` profile, which mirrors the current hardcoded Phase 5/9 agent
set — so existing flows are unaffected.
What lands:
- profiles/default.yaml — schema v1, lists current 8 exploration agents
+ 2 review agents, captures today's adversarial regime
- scripts/profile-loader.mjs — null-deps Node loader with limited-subset
YAML parser, listProfiles(), loadProfile(), validateProfile() that
cross-checks every referenced agent exists in agents/
- scripts/profile-loader.test.mjs — 26 node:test cases (parser, validation,
loader, integration with built-in default.yaml)
- commands/ultraplan-local.md — Phase 1 gains a "Resolve the profile"
step (--profile flag → brief.recommended_profile → default fallback)
and prints profile + source in the mode report. Phase 5/9 unchanged.
- README.md, CLAUDE.md, marketplace README — documentation of the M0
foundation, the universal-brief design principle, and the M1/M2/M3
milestones to come.
M1 (next) wires profile recommendation into ultrabrief Phase 4. M2
ships the additional built-in profiles (quick, bugfix, feature, refactor,
security-deep, research-heavy) and replaces the hardcoded Phase 5 agent
table with profile-driven selection. M3 adds user-extensible profiles.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v7.0.0 entropy-scanner rule 18 suppressed every line whose pattern
matched  — regardless of the URL host or what the URL
carried. A markdown image URL pointing at a non-CDN host (or carrying a
secret-shaped token in its query string) would therefore mask a real
high-entropy credential.
Refactor:
* MARKDOWN_IMAGE now captures the full URL (was a host-only prefix
matcher), so rule 18 can inspect host and query.
* MARKDOWN_IMAGE_CDN_HOSTS allowlist constant covers cdn./images./
media./assets./static./*.cdn./*.amazonaws.com/{s3,cloudfront}/
*.cloudflare./*.fastly./*.akamaized./raw.githubusercontent.com/
*.imgix.net/*.cloudinary.com/.
* MARKDOWN_IMAGE_QUERY_SECRET catches secret-shaped query keys
(token, key, secret, password, api_key, access_token, auth) plus
well-known provider prefixes (AKIA, Bearer, sk_live_, ghp_, ghs_,
ghu_, gho_, ghr_, npm_).
* Rule 18 now suppresses iff (host matches CDN allowlist) AND
(query has no secret-shaped token). Anything else falls through
to entropy classification.
+4 tests in tests/scanners/entropy-context.test.mjs (29 → 33).
Existing rule 18 fixture (cdn.example.com, no secret query) still
suppresses, so no regression on the legitimate path.
Refs: Batch B Wave 5 / Step 13 / v7.2.0
critical-review-2026-04-20.md §E18
The v7.0.0 entropy-scanner ran rules 11-13 (GLSL/CSS-in-JS/inline-markup
line-proximity suppressions) for every line regardless of file type. A
polyglot `.ts` file with an embedded fragment-shader template literal
could therefore mask a real high-entropy credential when the credential
literal happened to share a line with a GLSL keyword. Critical-review
B5 documented the false-negative class.
Refactor:
* New `classifyFileContext(absPath, lines)` returns
`'shader-dominant' | 'markup-dominant' | 'code-dominant' | 'mixed'`,
keyed off file extension with a content-density fallback for
code-extension files (≥50% of sampled non-blank lines matching
GLSL/inline-markup → downgrade to `mixed`).
* `isFalsePositive(str, line, absPath, context)` gates rules 11-13
on `context !== 'code-dominant'`. Rules 1-10 and 14-19 still run
unconditionally, so URL/path/test-fixture/ffmpeg/UA/SQL/error-
template suppression behaves identically.
* `scanFileContent` computes `fileContext` once per file and threads
it through every per-string suppression check.
Conservative defaults to keep the regression surface minimal:
* Files with `<5` sampled non-blank lines fall back to `mixed`
(preserves the existing rule-11/12/13 behaviour for the single-
line .js fixtures used by entropy-context.test.mjs).
* Unknown extensions fall back to `mixed`.
* Code-extension files densely populated with shader/markup
content fall back to `mixed`.
Net effect: a `.ts` file with an embedded GLSL block but mostly TS
code on the surrounding lines now surfaces credentials that the
v7.0.0 line-proximity heuristic suppressed. Pure shader/markup
files are unaffected (extension skip / mixed default).
New fixture: tests/fixtures/entropy/polyglot-ts-with-glsl.ts (with
runtime placeholder so it does not commit a high-entropy literal).
+3 tests in tests/scanners/entropy-context.test.mjs (26 → 29).
Existing entropy.test.mjs and entropy-context.test.mjs all remain
green. Full suite 1658 → 1661.
Refs: Batch B Wave 5 / Step 12 / v7.2.0
critical-review-2026-04-20.md §B5
The existing CRITICAL pattern in injection-patterns.mjs only fires when
a comment body contains AGENT/AI/HIDDEN markers. Adversaries can drop
the marker and still hide instructions inside <!-- ... --> for any
agent that reads page source. This generalizes the comment scan: every
comment body is HTML-entity-decoded and run through the full
injection rule set. The existing keyword-restricted pattern still
fires (defense-in-depth).
Emits at the strongest tier with category html-comment-injection.
+3 tests (65 → 68).
Refs: Batch B Wave 4 / Step 11 / v7.2.0
SVG containers carry text that is invisible in the rendered image but
fully parsed by an agent reading the source. <desc>, <title>,
<metadata>, and <foreignObject> are all valid surfaces for adversarial
injection.
Adds a per-element extractor inside the existing HTML-tag gate, gated
on /<svg[\s>]/i so it only fires for actual SVG content. Inner text is
HTML-entity-decoded then run through scanForInjection. Emits at the
strongest tier with category svg-element-injection.
+3 tests (62 → 65).
Refs: Batch B Wave 4 / Step 10 / v7.2.0
Adversarial payloads in markdown link title attributes (rendered as
tooltips, parsed by agents) bypassed the existing HTML-content checks
which gated on `<tag>` presence. Pattern: [text](url "title").
Adds linkTitleRegex extraction to the HTML-content block, runs each
captured title through scanForInjection, emits at the strongest tier
encountered with category markdown-link-title-injection.
+3 tests (62 → 62 in post-mcp-verify.test.mjs file, was 59).
Refs: Batch B Wave 4 / Step 9 / v7.2.0
Two follow-up fixes after E16 + E17 landed:
1. foldHomoglyphs ASCII fast-path
- scanForInjection calls foldHomoglyphs on every scan (raw + normalized).
- Pre-fix: NFKC normalization runs unconditionally, even on pure
ASCII inputs where it's a no-op.
- Result: benchmark.test.mjs timed out at 120s on the full suite.
- Fix: charCodeAt sweep for >=128, short-circuit return s when
all ASCII. NFKC and HOMOGLYPH_MAP iteration only run when
non-ASCII chars are present (the actual attack case).
- Verified: benchmark.test.mjs passes within timeout.
2. Attack-scenario UNI-003 expectation
- Pre-E16: "Homoglyph Cyrillic-Latin mixing" payload triggered only
a MEDIUM "obfuscation present" advisory (exit 0, stdout match
"MEDIUM").
- Post-E16: the same payload is folded to Latin BEFORE pattern
matching, so it now matches CRITICAL "ignore previous instructions"
and blocks (exit 2).
- This is the intended v7.2.0 behavior — not a regression. Updated
expectation: exit_code 2, stdout_match "block". Renamed scenario
to "now blocked via E16 fold, v7.2.0".
Suite: pre-compact-scan flake remains (perf-budget under load,
passes isolated). All other tests green.
Critical-review §4 E17 finding: pre-v7.2.0 the delegation-after-input
advisory fired only within a 5-call window. Attackers who deliberately
waited 6+ calls before delegating bypassed detection. Window was also
hardcoded — operators couldn't tune it for their environment.
Two coordinated changes:
1. LLM_SECURITY_ESCALATION_WINDOW env var (primary window override)
- parseInt(env) || getPolicyValue('trifecta', 'escalation_window', 5)
- Mirrors the established pattern from
LLM_SECURITY_TRIFECTA_MODE et al.
- Setting env=3 narrows; env=8 expands.
2. Secondary 20-call MEDIUM advisory (slow-burn variant)
- DELEGATION_ESCALATION_WINDOW_MEDIUM = 20 (hardcoded — same value
for all operators; tunable in a future patch if needed)
- checkEscalationAfterInput now returns `tier: 'primary'|'secondary'|null`
- formatEscalationWarning emits a different message for secondary —
mentions "slow-burn", references env-var, distinct from the
primary "DeepMind Category 4" framing
Hook reads max(WINDOW_SIZE, secondary+5) entries to cover the wider
window. Existing duplicate-suppression (`escalation_warning` state
entry) covers both tiers. Audit-trail event captures `tier` field.
Tests: +5 cases in tests/hooks/post-session-guard.test.mjs:
- secondary window catches 9-call distance (slow-burn)
- secondary boundary at exactly 20 calls
- primary regression guard (1-call distance)
- env=3 narrows primary (4-call distance becomes secondary)
- env=8 expands primary (7-call distance stays primary)
Updated existing test "does NOT trigger when input_source is >5 calls
ago" — now requires >20 calls (secondary window catches 6-20).
Suite: 1644 → 1672 (+28 from new tests + extended scope). All green.
CLAUDE.md hooks table updated to document both windows and the env var.
Critical-review §4 E16 finding: pre-v7.2.0 homoglyph normalization fired
ONLY for the MEDIUM-advisory "obfuscation present" signal. Pattern
matchers in scanForInjection compared against raw + decoded variants
only — they did NOT compare against a fold-normalized variant. As a
result, "ignоre previous instructions" (Cyrillic о, U+043E) bypassed
the CRITICAL "ignore previous" pattern.
Two coordinated edits:
scanners/lib/string-utils.mjs
- Adds HOMOGLYPH_MAP (frozen) — surgical Cyrillic/Greek → Latin map.
~25 entries focused on injection-vocabulary letters
(a, e, o, c, p, x, y, i, j, s, l, A, E, O, C, P, X, Y, T).
- Adds foldHomoglyphs(s) — pipeline: NFKC → apply HOMOGLYPH_MAP.
NFKC handles Mathematical Alphanumeric (U+1D400 block), fullwidth
Latin (U+FF21 block), ligatures, width variants.
Excluded by design from HOMOGLYPH_MAP:
- Latin Extended (æ, ø, å, é, è, ñ, ü, ö, ä, ç, ß, þ, ð) — legitimate
Norwegian/German/French/Spanish letters. Map them and we false-positive
on every non-English source file.
- Greek letters not visually overlapping (β, γ, δ, ...)
- Cyrillic letters not visually overlapping (б, г, д, ж, ...)
scanners/lib/injection-patterns.mjs
- scanForInjection now builds a 4-variant set: raw, normalized,
folded(raw), folded(normalized). Set deduplication skips redundant
identical variants. Existing dedup-by-label (seenLabels Set) prevents
double-counts when the same pattern matches in multiple variants.
- foldHomoglyphs added to the imports.
Tests: +27 cases in tests/lib/string-utils-homoglyph.test.mjs:
- 6 Cyrillic → Latin (lowercase, uppercase, multiple substitutions,
Palochka U+04CF)
- 3 Greek → Latin
- 2 NFKC normalization (Math Bold, Fullwidth)
- 8 preserves-non-confusable (Norwegian æøå, German umlauts, French
accents, Spanish ñ, emoji, CJK, Arabic/Hebrew)
- 3 edge cases (empty, null/undefined, idempotency)
- 5 scanForInjection integration (Cyrillic ignore, Cyrillic Assistant,
Norwegian non-trigger, benign "ignore" comment, mixed Cyrillic+Greek)
Test-development found: U+1D5DC is "I" not "A" (test pin caught my
codepoint mistake — fixed during dev).
Suite: 1617 → 1644 (+27). All green.
Critical-review §4 E1 finding: pre-v7.2.0 the Unicode-stego detector
(`containsUnicodeTags`) covered only U+E0001-E007F (Tag block). Private
Use Areas — also invisible in most terminals and surviving normalization
— were not detected. Attackers could encode payloads in PUA codepoints
that pass through `scanForInjection` undetected.
Coverage extended to:
- U+E0001-E007F Unicode Tag block (existing — DeepMind kat. 1)
- U+F0000-FFFFD Supplementary PUA-A (NEW — E1)
- U+100000-10FFFD Supplementary PUA-B (NEW — E1)
Detection-only for PUA: PUA characters have NO standard ASCII mapping,
so `decodeUnicodeTags` leaves them unchanged. Detection alone is
sufficient — `scanForInjection` emits HIGH on any presence, regardless
of decoded content.
Function name `containsUnicodeTags` preserved for back-compat. All
existing call sites (injection-patterns.mjs:259, etc.) work unchanged.
Semantically the function is now "containsHiddenUnicode".
Tests: +21 cases in tests/lib/string-utils-hidden-unicode.test.mjs:
- 5 Tag-block regression guards
- 4 PUA-A range cases (start, just-inside, end, buried-in-ASCII)
- 3 PUA-B range cases
- 5 boundary cases (gap U+E0080-EFFFF, U+10FFFE noncharacter, emoji,
CJK, Latin Extended — all must be FALSE)
- 4 decodeUnicodeTags passthrough cases (PUA-A unchanged, PUA-B
unchanged, Tag block still decodes, mixed Tag+PUA)
Suite: 1596 → 1617 (+21). All green.
Critical-review §4 E15 finding: agent files in .claude/agents/ are loaded
as Claude Code subagent system prompts and are a direct memory-poisoning
surface. Pre-v7.2.0 the scanner covered CLAUDE.md, .claude/rules/*.md,
memory/*.md, REMEMBER.md, .local.md, and .claude-plugin/plugin.json —
but not .claude/agents/*.md.
Single-line addition to MEMORY_FILE_PATTERNS:
/(?:^|\/)\.claude\/agents\/[^/]+\.md$/
The existing scan loop, scanForInjection integration, and severity-
mapping logic all apply unchanged. STRICT_FILES_PATTERN intentionally
NOT extended — agents may legitimately quote shell commands as examples
(consistent with CLAUDE.md treatment).
Tests: +3 cases in tests/scanners/memory-poisoning.test.mjs:
- "scans .claude/agents/*.md" (smoke test — at least one finding from
the new fixture)
- "agent file injection pattern detected"
- "agent file credential path detected"
New fixture: tests/fixtures/memory-scan/poisoned-project/.claude/agents/
poisoned-agent.md — agent with injection, credential ref, permission
expansion, and exfil URL. Triggers all 4 detection categories.
Suite: 1591 → 1594 (+3). All green.
Critical-review §2 B7 finding: pure Levenshtein <=2 misses the most common
modern typosquat pattern — popular-name + token-injection suffix. Examples:
lodash → lodash-utils (edit distance 6, not flagged pre-B7)
react → react-helper (edit distance 7, not flagged pre-B7)
express → express-wrapper (edit distance 8, not flagged pre-B7)
Three coordinated edits:
scanners/lib/string-utils.mjs
- Adds tokenize(name): string[] splits on -/_, lowercases
- Adds tokenOverlap(a, b): number intersection.size / min(|a|,|b|)
- Adds TYPOSQUAT_SUSPICIOUS_TOKENS frozen list of common typosquat
suffixes. Excludes language-extension tokens (js, jsx, ts, tsx) — the
v7.0.0 allowlist contains `tsx` as a legit package and including the
same token in the suspicious set creates a contradiction. Caught by
the new allowlist-intersection-guard test. Also excludes 'pro'
(legitimate edition marker).
scanners/dep-auditor.mjs + scanners/supply-chain-recheck.mjs
- New checkTyposquatTokenOverlap() helper — fires AFTER Levenshtein 1/2
branches, only when:
1. popular package's tokens ⊆ declared name's tokens (strict superset)
2. declared name has at least one suspicious suffix
3. popular package is in topCutoff window
All three conditions required — conservative by design. Allowlist
precedence preserved (existing 22 npm + 13 PyPI entries always pass).
MEDIUM severity, NOT block. New finding title prefix:
"Possible typosquatting via token-overlap".
Tests: +21 cases across two new files
- tests/lib/string-utils-tokens.test.mjs (15) — tokenize, tokenOverlap,
TYPOSQUAT_SUSPICIOUS_TOKENS frozen contract, allowlist-intersection
guard (caught the tsx conflict on first run)
- tests/scanners/dep-token-overlap.test.mjs (7) — integration via
in-memory tmpdir fixtures: lodash-utils flagged, react-helper flagged,
express-wrapper flagged, lodash exact NOT flagged, allowlist tools
(knip/tsx/nx/rimraf) NOT flagged, react-router-dom (no suspicious
suffix) NOT flagged, react itself (equal token set, not superset)
NOT flagged.
Existing dep.test.mjs and supply-chain-recheck.test.mjs unchanged —
all green (149 → 149 regression guard).
Suite: 1570 → 1591 (+21). All green.
Critical-review §2 B6 finding: extractAssignedVariable handled
`const X = ...` and `X = ...` but missed every modern JS/TS
destructuring pattern. Sinks downstream of destructured/spread vars
produced false negatives at the propagation step.
Patterns now recognized:
- `const { x } = source` object destructuring
- `const { x, y } = source` multi-key
- `const { secret: alias } = source` renamed (key NOT bound)
- `const { x, ...spread } = source` object rest
- `const { a, b: { c } } = source` nested object (key NOT bound)
- `const [a, b] = source` array destructuring
- `const [first, ...rest] = source` array rest
- `const [a, [b, c]] = source` nested array
- `const { user: { id }, ...rest }` mixed nested
Implementation: regex-based two-pass walker. Pass 1 detects whether
the LHS is a destructuring pattern (`{...}` or `[...]`). If yes, the
new `extractDestructuredNames` helper walks the pattern body via a
balanced-bracket depth counter, recurses into nested patterns, and
distinguishes keys (`key:`) from bindings. If no, the plain-decl
branch matches `\b(?:const|let|var)\s+(\w+)`.
Plain-assignment branch (`X = ...` without keyword) and Python-style
patterns are unchanged.
The function is now exported for direct unit testing — same pattern
as `_resetCacheForTest` in policy-loader. The internal walker
(`extractDestructuredNames`) remains module-private.
Tests: +19 cases in tests/scanners/taint-destructuring.test.mjs:
- 5 pre-B6 patterns (regression guard: plain decl, plain assign,
no-match on equality)
- 12 destructuring patterns covering object/array/rest/nested
- 2 non-destructuring regressions (return literal, arrow param)
Existing taint-tracer.test.mjs and taint.test.mjs unchanged — both
green (14 → 14, fixture-based integration tests not affected).
Suite: 1551 → 1570 (+19). All green.
Critical-review §2 B3 finding: `riskScore({info: N}) = 0` silently masks
info-volume findings. The behavior was correct (info is scoring-inert by
design) but undocumented. Operators reading a report with N info findings
had no way to know they contribute zero to verdict/band.
Three coordinated edits:
- scanners/lib/severity.mjs JSDoc — explicit "Info severity" subsection
spelling out: scoring-inert, surfaced in owaspCategorize aggregates,
treat as observability telemetry not verdict input. @param updated to
mark info as accepted but ignored.
- CLAUDE.md v7.0.0 risk-score-v2 line — one-sentence anchor pointing to
severity.mjs JSDoc.
- tests/lib/severity.test.mjs — anchor test alongside the existing
4-critical=93 anchor: asserts riskScore({info: 50}) === 0,
riskScore({info: 1000}) === 0, verdict({info: 100}) === 'ALLOW',
riskBand(riskScore({info: 500})) === 'Low'.
Decision: skip the optional `infoScore()` helper from the brief. No
current consumer would use it; doc-only fix keeps API surface minimal.
Revisit if a consumer emerges.
Tests: 1522 → 1523 (+1 anchor block, 4 assertions). All green.
Documents the v7.1.1 narrative-coherence patch in CLAUDE.md (mini-block
appended after the v7.0.0 paragraph) and CHANGELOG.md (new [7.1.1]
section per Keep a Changelog convention, placed above [7.1.0]).
Plan: .claude/plans/ultraplan-2026-04-29-report-coherence.md
Brief: .claude/ultraplan-spec-2026-04-29-report-coherence.md
Verification gates passed:
- npm test: 1522/1522 (was 1511; +11 from new narrative test)
- node --test tests/lib/severity.test.mjs: 86/86 (co-monotonicity sweep
at lines 252-303 unchanged and green)
- node --test tests/scanners/skill-scanner-narrative.test.mjs: 11/11
- Orchestrator against fixture: WARNING / 48 / 1 HIGH (HITL trap caught
correctly, no whiplash)
- SARIF inline check via toSARIF import: sarif-version 2.1.0, runs: 1
- Zero remaining v1 cutoffs in agent + template
Out of scope but flagged for Batch B (deferred to v7.2.0):
- commands/scan.md:113-114 retains v1 risk formula
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
11 assertions across 4 describe groups against tests/fixtures/skill-scan/
hyperframes-like/. Tests the deterministic input layer that feeds
skill-scanner-agent — does NOT invoke the LLM (no precedent in 1511 tests).
Coverage:
- content-extractor (5 it): exit 0 on animation markup; exactly 1 HIGH
HITL trap; >= 2 process.env credential refs; has_injection=true (any
injection signal flips it); has_critical_injection=false (no CRITICAL
in fixture).
- entropy scanner (2 it): calibration block present; <= 1 finding (rest
suppressed via line-context rules).
- co-monotonicity (2 it): {high:1} → WARNING/High; {high:1, info:1} →
WARNING (info scoring-inert). Inline guard mirrors the sweep at
tests/lib/severity.test.mjs:252-303 so this file fails fast if the
invariant drifts.
- agent prompt contract (2 it): static asserts that
agents/skill-scanner-agent.md contains 'Step 2.5: Context-First
Severity Assignment', 'summary.narrative_audit.suppressed_findings',
'score>=65', AND zero remaining 'score >= 61' references; same v2-
cutoff + narrative-audit contract on templates/unified-report.md.
Part of v7.1.1 narrative-coherence patch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Synthetic skill content mimicking the noise profile of frontend
animation projects (HTML5 canvas, framework env-vars, inline SVG data
URIs, CSS keyframes) plus exactly one genuine HITL trap signal.
Used by tests/scanners/skill-scanner-narrative.test.mjs (added in
v7.1.1) to exercise:
- content-extractor: HIGH HITL trap signal + framework env-var
references (process.env.REACT_APP_*, VITE_PUBLIC_*)
- entropy scanner: inline SVG data URI suppressed via line-context rules
The .llm-security-ignore file uses the SCANNER:glob format
(scanners/scan-orchestrator.mjs:34-40) — ENT:**/*.md suppresses any
entropy-scanner findings when the fixture is run through scan-orchestrator
in the Step 6 smoke test.
Part of v7.1.1 narrative-coherence patch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Five coordinated edits to address scan-rapport whiplash at the agent
prompt level:
- Step 2.5 (NEW): Context-First Severity Assignment. Every signal has
exactly one disposition — suppressed (counted only) or reported (full
finding). The split happens BEFORE severity is assigned. Forbids
'false positive', 'legitimate framework', 'no action required' in
finding-body text; reserves them for the Suppressed Signals section.
- Verdict Logic: replaces stale v1 sum-and-cap formula (BLOCK >=61) with
v2 reference (severity-dominated, BLOCK >=65) matching severity.mjs
since v7.0.0. Documents that severity counts MUST exclude suppressed
signals; introduces verdict_rationale field for descriptive context
when suppressed >= 5 AND reported <= 1 high.
- Output Format: adds Suppressed Signals as required section #4 with
category-level bullet format. Documents the trailing JSON shape
including summary.narrative_audit.suppressed_findings.{count,
by_category} and verdict_rationale fields.
- Comment block before Category 2 suppression rules clarifies that
'false positive' as taxonomy language is OK; only finding-body
description fields are forbidden from using the phrase.
- Step 0 (Norwegian generaliseringsgrense) preserved unchanged.
Part of v7.1.1 narrative-coherence patch (plan: .claude/plans/ultraplan-2026-04-29-report-coherence.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates the HTML-comment risk-formula reference at lines 55-66 from the
stale v1 sum-and-cap formula to the v2 severity-dominated tiers that
have been authoritative in scanners/lib/severity.mjs since v7.0.0. Adds
a Narrative Audit block inside the Executive Summary section surfacing
summary.narrative_audit.suppressed_findings.{count,by_category} from
the agent's trailing JSON. The block is transparency only — it does
NOT affect risk_score, riskBand, or verdict.
Part of v7.1.1 narrative-coherence patch (plan: .claude/plans/ultraplan-2026-04-29-report-coherence.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v7.1.0 release commit (621db14) bumped the version badge and added a
CHANGELOG entry, but missed the README Version History table. Adding
the row now so the public-facing version history at
git.fromaitochitta.com/open/ktg-plugin-marketplace reflects v7.1.0.
Row covers: B1 + B2 + B4 fixes, A3 honesty-sweep (7 phrases), B8 CaMeL
nedton, test count 1487 → 1511, "why" framing tied to critical-review
§F CISO perspective.
Closes A3 of v7.1.0 critical-review patch. Each rewrite preserves the underlying
claim where it is accurate but removes hype/overreach language. Historical
CHANGELOG/README version-table rows are intentionally left as-is (they document
what was claimed at the time of release, not what is true today).
Changes (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md,
docs/security-hardening-guide.md):
- "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring
(v2 model, BREAKING)". Removes hype framing; describes the actual mechanism.
- "Context-aware entropy scanner" → "Rule-based entropy scanner with
file-extension skip, 8 line-level suppression rules, and configurable policy".
No ML/context inference; just rules.
- "1487 tests" → "1511 unit and integration tests; mutation-testing coverage
not published". Updated count after A1+A2 (+24) and added qualifier.
- "Fully Schrems II compatible" → "Schrems II compatible in default offline
mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`)
transmits package identifiers to a Google-operated API and is a separate
compliance consideration." Acknowledges the OSV.dev opt-in caveat.
- "Rule of Two enforcement" → "Rule of Two detection (configurable; default
warn; blocks on high-confidence trifectas in opt-in `block` mode; distributed
trifectas detected but not blocked by default)". "Enforcement" implied
block; default is warn.
- "Hardened ZIP extractor" → suffix " — no fuzz-testing results published
to date". Caps and class-of-attacks rejected are accurate; absence of
formal fuzz coverage now stated.
- "defense-in-depth" — preserved as framing, but quantified in
security-hardening-guide §4: "three independent detection layers with
documented bypass classes". Each layer named, each layer's known bypasses
pointed to (critical-review §4 evasion arsenal).
Tests: 1511/1511 green (no behavioural change).
Closes A2 of v7.1.0 critical-review patch (docs/critical-review-2026-04-20.md):
- B4 (severity JSDoc): 4 critical = 93, not 90. Fixed in scanners/lib/severity.mjs:23
and CHANGELOG.md v7.0.0 tier description. The actual computation has always been
93 (70 + log2(5)*10 = 93.22 → round); only the docs were wrong.
- §5.4 co-monotonicity: new sweep test in tests/lib/severity.test.mjs over 15
representative count vectors. Asserts that (verdict, riskBand) agree under the
v7.0.0 contract for every case — catches future drift between riskScore tiers,
verdict cutoffs, and riskBand cutoffs. Includes a B4 anchor test (riskScore
{critical: 4} === 93) so doc/code drift fails loudly.
- B8 (CaMeL claims toned down): post-session-guard.mjs:646 comment block and
CLAUDE.md:184 Defense Philosophy bullet now describe the implementation
honestly — opportunistic byte-matching of truncated output fingerprints
(first 200 bytes, SHA-256/16-hex), not semantic data-flow tracking.
Trivially bypassed by mutation, summarisation, or re-encoding. Inspired by
CaMeL (DeepMind 2025), but not a CaMeL capability-tracking implementation.
Tests: 1495 → 1511 (+16: 15 sweep cases + 1 B4 anchor). All green.
Previously, `LLM_SECURITY_TRIFECTA_MODE=block` only exited 2 when the
detected trifecta was MCP-concentrated (all three legs via the same MCP
server) or involved sensitive-path + exfil. Distributed trifectas —
three legs originating from different tools, with a non-sensitive data
path and a non-sensitive exfiltration sink — were detected and warned
but not blocked. This mismatched the documented semantics of block mode
and gave operators a false sense of enforcement.
Change: remove the `(mcpInfo.concentrated || sensitiveExfil)` AND-gate
in the `TRIFECTA_MODE === 'block'` branch so any detected trifecta
blocks in block mode. Audit event `severity` still differentiates
critical (concentrated / sensitive-exfil) from high (distributed); the
blocked stderr message now explicitly names "Distributed trifecta:
three legs from different sources" when the confidence sub-signals
are absent.
Addresses critical review 2026-04-20 §2 B2 (HIGH) and §9 row 1
("enforces the Rule of Two").
Tests: 1 added (distributed trifecta in block mode now exits 2).
All 1495 tests pass.
The previous ENV regex `/[\\/]\.env\.[a-z]+$/` only matched a single
lowercase segment after `.env`. Multi-segment and mixed-case variants
such as `.env.production.local.backup`, `.env.stage-1.local`, and
`.env.CI.secret` slipped past the hook. Replaced with
`/[\\/]\.env(\.[A-Za-z0-9._-]+)*$/` which matches `.env` plus any
number of dot-separated alphanumeric/dot/hyphen/underscore segments.
`.envrc` (direnv config, no dot separator) is still allowed.
Addresses critical review 2026-04-20 §2 B1 (HIGH).
Tests: 7 added (6 new multi-segment BLOCK cases + 1 .envrc ALLOW).
All 1494 tests pass.
feature-gap-agent and /posture command both reference quality area count.
Update both to reflect Token Efficiency as the 8th area.
Tests: 543 passing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reflect 9 scanners, 17 commands, 543+ tests, new TOK scanner, and
/config-audit tokens command.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New plugin that produces a complete session handoff in under 60s:
NEXT-SESSION artifact, commit+push, and copy-paste prompt for next
session. Built for context-constrained models like Opus 4.7 where
sessions fill fast.
- Single declarative command, no hooks/agents/skills
- Detects handoff type: multi-session / plugin-work / single-task
- Default filename NEXT-SESSION-PROMPT.local.md; slug-override
- Flags: --no-commit, --dry-run
- Auto-generated Conventional Commits message from git diff --stat
- Respects pre-commit hooks (secrets, pathguard) — never bypasses
Also: add *.local.md to root .gitignore (existing NEXT-SESSION files
were untracked but not ignored) and list plugin in marketplace
README + CLAUDE.md per docs-convention.
Document the Opus 4.7 era upgrade: TOK scanner, /config-audit tokens,
Token Efficiency 8th area, scanner/verifier agent migration to sonnet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add 2026-04 deltas (v2.1.83-v2.1.111) verified against
research/03-claude-code-changes-config-surfaces.md (2026-04-19):
- Opus 4.7 + token-efficiency surfaces (env vars, attribution.commit/pr)
- Sandbox isolation (sandbox.* keys)
- Managed-only enterprise lockdown flags
- disableSkillShellExecution (v2.1.91)
- forceRemoteSettingsRefresh (v2.1.92)
No new hook events in this range — noted in hook-events-reference.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
E2E verification against content-heavy repo (`content-claude-code`) revealed
413 entropy findings (8 HIGH / 405 MEDIUM) from markdown image CDN URLs in
JSON content indexes — e.g., ``.
These are legitimate content-repo artifacts, not credentials. The 40-char
hash segment in the CDN URL trips Shannon entropy (H=5.29 over 300 chars),
and rule 13 (inline <svg>) doesn't match since there's no literal `<svg>`
tag — the `.svg` is just a URL path suffix.
Added rule 18 `MARKDOWN_IMAGE = /!\[[^\]]*\]\(\s*https?:\/\//` — matches
`` / ``. Line-level (not string-level) so URL
is not over-specific.
E2E impact on `content-claude-code`:
- Before: BLOCK / 65 / 8H 437M 0L
- After: WARNING / 56 / 3H 427M 0L
Hyperframes unchanged: BLOCK / 80 / 1C 4H 92M — real CRITICAL SQL-injection
and HIGH findings still detected.
Tests: 2 new (positive + negative fixture) bringing entropy-context to 26,
total suite 1485 → 1487.
Docs updated to "rules 11-18" and "8 new line-suppression rules".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Final commit in the trustworthy-scoring series. Bundles verdict cutoff
alignment, the last suite of tests, and all documentation touch-points
that quote version numbers or describe v7.0.0 behaviour.
Verdict/band co-monotonicity
- `scanners/lib/severity.mjs` — verdict cutoffs moved from 61/21 to 65/15
so `BLOCK >= 65`, `WARNING >= 15` locks onto the v2 riskBand() boundaries.
Prevents "BLOCK / Medium band" contradictions under the v2 formula.
Scanner hardening (bug fixes from v7.0.0 testing)
- `scanners/entropy-scanner.mjs` — `policy_source` now uses
`existsSync('.llm-security/policy.json')` instead of value-based check.
Old heuristic always reported 'policy.json' because DEFAULT_POLICY now
carries an `entropy.thresholds` section.
- `scanners/lib/file-discovery.mjs` — `.sass` and GPU shader extensions
(`.glsl, .frag, .vert, .shader, .wgsl`) added to TEXT_EXTENSIONS. Without
this, shader files were invisible to file-discovery, so they were never
counted as skipped by the entropy-scanner extension filter.
Tests
- `tests/scanners/entropy-context.test.mjs` (new, 24 tests) — A. File-ext
skip (4), B. Line-level rules 11-17 (8), C. Policy overrides (3).
Fixtures generate 80-char base64 payloads at runtime via
`crypto.randomBytes` to dodge the plugin's own pre-edit credential hook
on the test source.
- `tests/lib/severity.test.mjs` — rewritten with v2 scoring table (70
tests total, was 52).
- `tests/lib/output.test.mjs:243` — "1 critical = score 80" under v2
(was 25 under v1).
- Full suite: 1485/1485 green (was 1461).
Docs
- `CHANGELOG.md` — v7.0.0 entry with BREAKING CHANGES section.
- `README.md` (plugin + marketplace root) — version badge, history table,
plugin-card version string, test count.
- `CLAUDE.md` — header version, "v7.0.0 — Trustworthy scoring" summary
paragraph at the top.
- `docs/security-hardening-guide.md` — new section 6 "Calibration & false
positives" documenting v2 formula, context-aware entropy scanner,
typosquat allowlist, and §6.4 tuning workflow. Existing "Recommended
baseline" section renumbered to §7.
Version bump
- `6.6.0 -> 7.0.0` across package.json, .claude-plugin/plugin.json,
scanners/ide-extension-scanner.mjs VERSION const, README badge,
CLAUDE.md header, marketplace root README card.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Makes suppression stats visible in the deep-scan report so users can
audit why the scanner produced the counts it did. Before: synthesizer
would acknowledge "true risk is High, not Extreme" in prose while
verdict stayed BLOCK/Extreme — inconsistent. After Commit 1 the
orchestrator verdict is coherent on its own; synthesizer's job shrinks
to transparency.
- Adds 'Scan Calibration' section instruction consuming
scanner.calibration.* fields (entropy files_skipped_by_extension,
policy_source, thresholds).
- Heuristic: omit the section if < 5% of files skipped (no signal).
Flag the section if > 80% skipped (policy may be too aggressive).
- Explicit 'Don't override verdict' directive in DON'T DO list.
Discrepancy goes in calibration, not in a rewritten dashboard.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hyperframes scan flagged knip vs knex, oxlint vs eslint, tsx vs nx,
rimraf vs trim as HIGH typosquats. All four are legitimate top-1000 npm
packages; short names just happen to be within Levenshtein ≤2 of other
top packages. These shouldn't generate HIGH severity on a clean install.
Added to npm allowlist: knip, oxlint, tsx, nx, rimraf, glob, tar, zod,
ky, ow, esm, ip, qs, url, prettier, vitest, vite, rollup, swc, turbo,
bun, deno. Added to pypi allowlist: uv, ruff, rich, typer, anyio.
Dep-auditor normalization (lowercase + [_.-] → -) already applied at
load time. dep.test.mjs: 11/11 still green — lodsah→lodash detection
preserved.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds entropy section to DEFAULT_POLICY and wires it into entropy-scanner.
Users can now tune false-positive tradeoffs without forking the scanner.
Policy shape (.llm-security/policy.json):
entropy:
thresholds.{critical,high,medium}.{entropy,minLen} — numeric overrides
suppress_extensions[] — additive ext skip
suppress_line_patterns[] — additional regex
suppress_paths[] — relPath substrings
Wiring: entropy-scanner calls loadPolicy(targetPath) at scan entry (not
orchestrator-passed — avoids signature churn across 10 scanners). Module-
level state is reset per scan invocation. Scanner envelope now includes
calibration.{policy_source, thresholds, files_skipped_by_*} for
synthesizer transparency (Commit 5).
Malformed user regex silently skipped. Missing policy.json → built-in
defaults (backwards-compatible).
entropy.test.mjs: 9/9 still green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Observed 70% false-positive rate on renderer/shader codebases (hyperframes):
GLSL, CSS-in-JS, inline HTML/SVG, ffmpeg filter-strings, hardcoded
User-Agent strings all matched base64-like entropy thresholds. This
commit adds two suppression layers before classification.
Layer A — file-extension skip: .glsl/.frag/.vert/.shader/.wgsl (shaders),
.css/.scss/.sass/.less (stylesheets), .svg (markup), .min.js/.min.css
(minified bundles). Tracked via new calibration.files_skipped_by_extension
field on scanner envelope for synthesizer stats.
Layer B — seven new line-level suppression rules in isFalsePositive()
(rules 11-17): GLSL/WGSL keywords, CSS-in-JS (styled/emotion/@keyframes),
inline HTML/SVG markup, ffmpeg filter-graph syntax, browser User-Agent,
SQL DDL/DML, error-message templates with embedded HTML.
Existing entropy.test.mjs: 9/9 still green — known bad base64 payload in
telemetry.mjs fixture still detected. Policy-driven thresholds wired in
Commit 3.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace sum-and-cap formula (every non-trivial scan → 100/Extreme) with
severity-dominated, log-scaled-within-tier model. Discriminates actual
risk: 1 critical = 80, 2 critical = 86, 17 high = 65. Hyperframes-class
rendering codebases no longer collapse to Extreme just from shader noise.
Changes:
- scanners/lib/severity.mjs: new riskScore() v2; keep riskScoreV1() for
reference; riskBand() cutoffs aligned (14/39/64/84).
- scanners/posture-scanner.mjs: delete inline duplicate formula, import
riskScore/riskBand/verdict from severity.mjs. Single source of truth.
Breaking: aggregate.risk_score semantics change. Batched with entropy
suppression (Commit 2+) under v7.0.0 bump in Commit 6. Do not release
individually — JSON consumers depend on scoring band stability.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sync plugin.json, plugin README badge, and marketplace root README
plugin-table to 2.4.0. Closes the v2.4.0 rollout.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add v2.4.0 CHANGELOG entry documenting the background-mode removal
rationale (harness does not expose Agent tool to sub-agents per
github.com/anthropics/claude-code/issues/19077). Update plugin CLAUDE.md
architecture sections to drop background-transition phases and redefine
the three orchestrator agents as inline reference. Update plugin README
mode tables for /ultraresearch-local, /ultra-cc-architect-local,
/ultraplan-local — --fg is now a no-op alias. Update marketplace root
README with a v2.4.0 paragraph above the v2.3 changelog summary.
Closes the docs portion of the v2.4.0 rollout. Version-sync follows in
the next commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add prominent warning block that the Agent tool is not exposed to
sub-agents (foreground or background). Rewrite Shape A (orchestrator
handoff) from "valid pattern" to "confirmed anti-pattern, removed in
v2.4.0". Correct Composition section: background + subagents is
NOT supported — nested spawn from a sub-agent silently fails.
Source: github.com/anthropics/claude-code/issues/19077
Confirmed empirically 2026-04-19.
BREAKING CHANGE: Catalog consumers that cited Shape A should migrate
to orchestrating from the main command context instead.
Redefine research-orchestrator, planning-orchestrator, and
architect-orchestrator from "background executor" to "inline
reference documentation". The agent files remain as the canonical
workflow descriptions, but the /ultra* commands now execute the
phases directly in the main command context instead of spawning
these agents as sub-agents.
The /ultra* command markdowns are now the de-facto orchestrators.
Splitting work into a separate sub-agent was incompatible with the
harness's treatment of the Agent tool (not exposed to sub-agents).
BREAKING CHANGE: These agents are no longer invoked. Any external
integration that spawned them directly should now invoke the
corresponding /ultra* command instead.
Remove background-transition phases from /ultraresearch-local,
/ultraplan-local, /ultra-cc-architect-local, /ultrabrief-local.
All four commands now run their full pipelines inline in main
context; --fg is retained as a no-op alias for backwards
compatibility.
The Claude Code harness does not expose the Agent tool to
sub-agents, so orchestrators launched with run_in_background:true
cannot spawn their documented swarms (docs-researcher,
community-researcher, architecture-mapper, plan-critic, etc.) and
silently degrade to single-context reasoning. Foreground execution
keeps the swarms intact.
Source: github.com/anthropics/claude-code/issues/19077
Confirmed empirically 2026-04-19.
BREAKING CHANGE: Default execution is foreground — the session
blocks until the brief/plan is ready. Use `claude -p` in a
separate terminal for long-running headless work.
Transparency: all code in this marketplace is produced by Claude Code
through dialog-driven development. Root README gets a full disclosure
section; each plugin README gets a one-line disclosure linking back to
the marketplace section.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
skill-drafter now reads {catalog_root}/<slug>.md before writing its
draft and prepends a warning block to its confirmation output when
an existing skill would be overwritten during manual `mv` promotion.
The draft is still written to .drafts/<slug>.md — the check is a
hint, not a block.
Closes v2.3.0 dogfood finding (post_dogfood_findings[0]): the
drafter produced .drafts/hooks-pattern.md when an approved
hooks-pattern.md seed already existed, giving no signal that `mv`
during promotion would silently overwrite the seed. v2.3.1
introduced the qualified-slug mechanism to resolve such collisions;
v2.3.2 surfaces them at the right moment — before promotion.
Changes:
- agents/skill-drafter.md — new Step 2 between slug computation and
source reading. Reads {catalog_root}/<slug>.md, inspects
review_status, derives a kebab-case qualifier from the concept
handle (or source basename fallback). Subsequent steps renumbered
3→7. Output format gains Collision: field and optional warning
block. New Hard Rule.
- tests/fixtures/skill-drafter/slug-collision-expected.md — reference
fixture documenting expected confirmation shape across four
scenarios (no collision, approved collision, soft pending
collision, collision with no good qualifier). Skill-drafter is
prompt-driven; fixture anchors shape for human verification and
downstream parsers.
- CHANGELOG [2.3.2], plugin.json 2.3.1→2.3.2, README badge, plugin
CLAUDE.md slug-convention Collision-hint bullet, marketplace root
README summary, marketplace root CLAUDE.md plugin table.
Non-breaking. No frontmatter/drafts-layout/tool-scope/regex changes.
Existing pipelines see one extra field and an optional warning —
both purely additive.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolves v2.3.0 dogfood collision: skill-factory produced a
specialized hooks-pattern.md draft that would have overwritten the
generic seed. Qualified slugs let one feature host multiple named
patterns at different abstraction levels.
Slug convention: <cc_feature>[-<qualifier>]-<layer>.md. Unqualified =
canonical baseline. Qualified = sub-pattern (e.g., hooks-observability-
pattern.md) that does not displace the baseline.
Changes:
- SKILL.md: slug convention section, coverage-table qualified column,
matcher logic for N patterns per feature, modification rules cover
qualified-vs-canonical choice and collision handling.
- feature-matcher: catalog map is cc_feature -> {layer -> [skills]};
selection rules (baseline by default, qualified when justified,
multi-skill when non-overlapping); supporting_skill accepts list.
- gap-identifier: adds pattern_count[cc_feature] to coverage audit.
- architecture-critic: supporting-skill verification — every cited
skill name must exist in the catalog (blocker severity).
- First qualified skill: hooks-observability-pattern.md (promoted from
.drafts/, source ai-psychosis/README.md, ngram-overlap 0.01).
- Version bump 2.3.0 -> 2.3.1 across plugin.json, badges, table, root
CLAUDE.md, CHANGELOG.
Non-breaking: existing unqualified slugs keep working, no cc_feature
taxonomy changes, hallucination gate unchanged.
Opus orchestrator for /ultra-skill-author-local. Sequential 5-phase pipeline:
validate source → concept-extractor → skill-drafter → ip-hygiene-checker
→ completion summary.
No retry logic, no parallelism, no automation (per brief Non-Goals). Each
phase consumes the previous phase's output. --quick mode skips IP-hygiene
with a BIG WARNING for testing the drafting pipeline in isolation.
Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 9)
Sonnet worker for /ultra-skill-author-local. Consumes concept-extractor JSON
plus source file, writes a 150-600 word draft to .drafts/<slug>.md with the
9-field frontmatter contract (review_status=pending, ngram_overlap_score=null).
Imperative voice, progressive disclosure, no verbatim copy from source.
Aborts with too-technical-to-paraphrase if the subject can't be rephrased.
Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 7)
Sonnet worker for /ultra-skill-author-local. Reads ONE local source file
and emits structured concept JSON with cc_feature/layer/concept fields.
Enforces two gates:
- Hallucination gate: cc_feature MUST be one of 8 canonical values
- Gap-class gate: class C (decision) and D (outside CC) → out_of_scope
Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 6)
Symmetric with the existing VS Code branch — the env var was only wired
into getVSCodeExtensionRoots(), so the plan's master verification
(`LLM_SECURITY_IDE_ROOTS=... --intellij-only`) reported 0 discovered
plugins. Adding the same fallback to discoverJetBrainsExtensions makes
both families honor the CLI override and closes the gap.
Thresholds <=10 (fixture) and <=20 (plugin root) have been too tight since
before this plan started — baseline on 1634197 already produced 37 and 27
findings. git-forensics findings accumulate with repo history, so fixed
caps are brittle. Raised to <=100 to tolerate organic growth while still
catching runaway/pathological output.
Step 1/17 of ultraplan-2026-04-17-jetbrains-ide-scan.
- Populate top-jetbrains-plugins.json with 56 canonical xmlIds (bundled +
popular third-party): com.intellij.java, org.jetbrains.kotlin,
com.jetbrains.python.community, org.rust.lang, com.github.copilot,
mobi.hsz.idea.gitignore, the legitimate-typo 'Lombook Plugin', etc.
- Add loadJetBrainsBlocklist() export mirroring loadVSCodeBlocklist shape.
Blocklist is empty by design — no public confirmed-malicious JetBrains
Marketplace plugins as of 2026-04-17.
- Add tests/scanners/ide-extension-data.test.mjs (9 tests, all pass).
- Fix cache bug in loadTopJetBrains: map normalizeId on cache-hit path too
(was previously unnormalized on second call).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace hardcoded Q1-Q8 in /ultrabrief-local with a section-driven
completeness loop (Phase 3) and a draft/review/revise loop with
brief-reviewer as stop-gate (Phase 4). Quality drives the interview,
not a question counter.
brief-reviewer now emits a machine-readable JSON block with per-dimension
scores (1-5) and detail arrays alongside the existing prose report;
planning-orchestrator continues to consume the prose verdict unchanged.
Phase 4 gate: all dimensions >= 4 AND research_plan = 5. On fail, a
targeted follow-up is generated from the weakest dimension's detail
field and the draft is re-reviewed. Max 3 review iterations bound cost;
exhaustion writes brief.md with brief_quality: partial and an explicit
Brief Quality section. Force-stop surfaces per-dimension findings before
the user chooses continue or partial.
Not breaking. /ultrabrief-local [--quick] <task> interface unchanged.
--quick now means compact start with escalation, not a max-N cap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Opus 4.7 reads agent instructions more literally than 4.6. The v1.7
planning-orchestrator described the Step+Manifest schema via prose +
procedural rules, which 4.6 inferred correctly but 4.7 sometimes
rendered as narrative "Fase N" prose — producing plans ultraexecute
Phase 2 rejected. First observed 2026-04-17 during llm-security v6.2.0
planning.
v1.8.0 closes the gap:
- planning-orchestrator Phase 5 embeds a literal copyable Step+Manifest
example (JWT middleware) replacing "read the template" prose
- Explicit forbidden-format clause: ## Fase N, ### Phase N, ### Stage N,
and any non-"### Step N:" heading are denied
- Phase 5.5 schema self-check: grep-verify canonical Step count matches
Manifest count and narrative heading count is zero, before handing to
plan-critic
- ultraexecute-local --validate mode: schema-only check that parses
steps + manifests, reports READY/FAIL with actionable error hints,
no security scan, no execution. Fast sanity check between
/ultraplan-local and full execution.
Static verification: 17/17 PASS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
VSIX fetch + extract for URL targets now runs in a sub-process wrapped by
sandbox-exec (macOS) or bwrap (Linux), reusing the same primitives proven
by the v5.1 git-clone sandbox. Defense-in-depth — even if our own
zip-extract.mjs ever has a bypass, the kernel refuses any write outside
the per-scan temp directory.
New files:
- scanners/lib/vsix-fetch-worker.mjs — sub-process worker. Argv: --url
--tmpdir; emits one JSON line on stdout (ok/sha256/size/source/extRoot
or ok:false/error/code). Silent on stderr. Exit 0/1.
- scanners/lib/vsix-sandbox.mjs — wrapper. Exports buildSandboxProfile,
buildBwrapArgs, buildSandboxedWorker, runVsixWorker. 35s timeout, 1 MB
stdout cap.
Changes:
- scanners/ide-extension-scanner.mjs: fetchAndExtractVsixUrl is now
sandbox-aware (useSandbox option, default true). In-process logic
preserved as fallback. New meta.source.sandbox field:
'sandbox-exec' | 'bwrap' | 'none' | 'in-process'.
- scan(target, { useSandbox }) defaults to true; tests pass false because
globalThis.fetch mocks do not cross process boundaries.
- Windows fallback: in-process with meta.warnings advisory.
Tests:
- 8 new tests in tests/scanners/vsix-sandbox.test.mjs (per-platform
profile generation, worker arg construction, live worker exit
behavior on invalid URLs — no network).
- Existing URL tests updated to opt out of sandbox (useSandbox: false).
- 1344 → 1352 tests, all green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-installation verification of VS Code extensions via URL — fetch a remote
VSIX, extract it in a hardened sandbox, and run the existing IDE scanner
pipeline against it. No npm dependencies.
Sources:
- VS Code Marketplace (publisher.gallery.vsassets.io direct download)
- OpenVSX (open-vsx.org official API)
- Direct .vsix HTTPS URLs
Defenses:
- HTTPS-only, TLS verified, manual redirect with per-source host whitelist
- 30s total timeout via AbortController
- 50MB compressed cap, 500MB uncompressed, 100x expansion ratio
- Zero-dep ZIP extractor: zip-slip, absolute paths, drive letters, NUL bytes,
symlinks (Unix mode 0xA000), depth limits, ZIP64 rejected, encrypted rejected
- SHA-256 streamed during fetch, surfaced in meta.source
- Temp dir cleanup in all paths (try/finally)
Files:
- scanners/lib/vsix-fetch.mjs (HTTPS fetcher, host whitelist, streaming SHA-256)
- scanners/lib/zip-extract.mjs (zero-dep parser with hardening caps)
- knowledge/marketplace-api-notes.md (endpoint reference)
- 3 test files (48 tests added: vsix-fetch, zip-extract, ide-extension-url)
Tests: 1296 → 1344 (all green).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Masks non-empty '...' content before T5/T2-T4 run so literal strings such
as `echo '${IFS}'` are not rewritten. Empty '' pairs are stripped first
so c''u''rl -> curl evasion keeps resolving. ANSI-C $'...' is decoded
before masking.
Caught by the false-positive probe added in Step 3 of ultraplan-v6.2.0.
New read-only command that shows everything Claude Code actually loads for a
given repo — plugins, skills, MCP servers, hooks, CLAUDE.md cascade — with
source attribution (user/project/plugin) and rough token estimates. Helps
identify candidates for disabling without guessing.
Added:
- scanners/lib/active-config-reader.mjs — pure async helper: readActiveConfig,
detectGitRoot, walkClaudeMdCascade, readClaudeJsonProjectSlice (longest-prefix
matching for .claude.json projects), enumeratePlugins, enumerateSkills,
readActiveHooks, readActiveMcpServers, estimateTokens (markdown 4 c/tok,
json 3.5 c/tok, frontmatter cap 150 tokens, item flat 15)
- scanners/whats-active.mjs — thin CLI shim: --json, --output-file, --verbose,
--suggest-disables
- commands/whats-active.md — renders tables via Read tool; honors UX rules
- tests/lib/active-config-reader.test.mjs — 36 tests, all green (integration
fixture built in tmpdir with fake HOME, .claude.json prefix matching,
plugin discovery, hook/MCP merge from all scopes)
Verified:
- Performance budget: <2s wall-clock (smoke test: 102ms on real repo)
- Token estimates within ±20% of hand-computed values
- Read-only: no writeFile/mkdir/unlink in production code
- Self-audit: Plugin Health scanner reports 0 findings (Grade A)
- Full test suite: 522 tests, 512 pass (10 pre-existing conflict-detector
failures on main — unrelated to this change, reproducible on clean HEAD)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wave 1 of a 6-session parallel build revealed three failure modes:
(1) hallucinated completion (status=completed after 2/5 steps, last
tool call was an arbitrary file review), (2) fail-late bash (3/6
sessions had push blocked inside sub-agent sandbox after all work
was done), (3) no objective verification (plans were prose).
v1.7 closes all three by making the plan an executable contract.
Per-step YAML manifest (expected_paths, commit_message_pattern,
bash_syntax_check, forbidden_paths, must_contain) is the objective
completion predicate. Plan-critic dimension 10 (Manifest quality)
is a hard gate. Session decomposer propagates manifests verbatim
and emits an obligatory Step 0 pre-flight (git push --dry-run,
exit 77 sentinel) in every session spec.
ultraexecute-local gets Phase 7.5 (independent manifest audit from
git log + filesystem, ignoring agent bookkeeping) and Phase 7.6
(bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17
forbids marking a step passed without manifest verification. Hard
Rule 18 forbids ending on an arbitrary tool call before reporting.
Division of labor is made explicit:
- /ultraresearch-local gathers context (no build decisions)
- /ultraplan-local produces an executable contract (manifests,
plan-critic gate)
- /ultraexecute-local executes disciplined (does NOT compensate
for weak plans — escalates)
Code complete. Docs partial (Arbeidsdeling table + manifest section
added to plugin + marketplace READMEs). Verification tests
(10-sequence) pending — see REMEMBER.md.
Backward compat: v1.6 plans without plan_version marker get
legacy mode with synthesized manifests and legacy_plan: true in
progress file. Plan-critic emits advisory, not block.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Version bump v1.1.0 → v1.2.0 across all docs (CLAUDE.md, README.md,
root README.md, plugin.json, CHANGELOG.md). Documents new scripts
(state-updater, clipboard-helper, ical-generator), reduced interactive
steps, auto-clipboard, progressive onboarding, and MCP carousel pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds pruneContentHistory call to session-start.mjs after week rollover.
Uses dynamic import() for state-updater.mjs. Non-critical: silently
skipped on failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pure functions for post tracking (streak, week rollover, first_post_date),
content history pruning, and follower count updates. 19 tests green.
Follows week-rollover.mjs pattern (pure functions) + queue-manager.mjs
pattern (I/O wrapper with atomic writes).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Step 5b in batch.md generates a .ics calendar file from the queue,
giving users 15-minute reminders in their calendar app before each
scheduled post. Supports macOS Calendar, Google Calendar, Outlook.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pure-function iCal generator with CRLF endings, line folding at 75 octets,
VALARM reminders, VTIMEZONE, and special character escaping. 16 tests green.
Standalone CLI mode: node ical-generator.mjs --from-queue --output path.ics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Text overlay VERIFIED: Nano Banana Pro renders readable text on dark
gradient backgrounds at 3:4 aspect ratio (closest to LinkedIn 4:5).
Header and subtitle both legible. Mermaid Chart available but untested.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
npx llm-security requires npm publishing which hasn't happened.
Updated 3 references to use node bin/llm-security.mjs which works today.
CI templates and docs intentionally kept as-is (designed for future npm).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
quick.md:
- Replace 8-option post type menu with context inference
- Auto-select CTA based on post type
- Remove proactive alternative version offering
- AskUserQuestion: 0 in main flow (was 1 implicit menu)
react.md:
- Auto-select strongest angle instead of 3-option AskUserQuestion
- Replace 6-option refinement AskUserQuestion with plain text options
- Same treatment for comparison path (Step 4b)
- AskUserQuestion: 1 in main flow (multi-URL only, was 3)
pipeline.md:
- Skip ideation ask if topic provided with command invocation
- Auto-propose planned topic without AskUserQuestion
- Inline angle selection (auto-select strongest)
- AskUserQuestion: 2 in main flow (ideation fallback + scheduling, was 3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip topic ask if user provided one with command invocation
- Auto-select strongest angle instead of 3-option AskUserQuestion menu
- Infer format from content type instead of interactive selection
- Present ONE draft without proactive alternative versions
- Replace refinement AskUserQuestion with plain text options
Net result: 0 AskUserQuestion calls in main flow (was 2).
User provides topic → gets ONE draft → clipboard-ready.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Badges, intro, commands, scanner table, Mermaid diagram, directory tree,
and knowledge base section all had counts frozen at v3-v4 era. Updated
to match actual filesystem: 21 scanners (10+11), 18 commands, 16 knowledge
files, 16 posture categories, 1264 tests. Added missing bin/, ci/, docs/
directories and all standalone scanners to directory tree.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds Bash to allowed-tools and clipboard-helper.mjs auto-copy to:
post.md, quick.md, react.md, pipeline.md, first-post.md,
video.md, multiplatform.md, carousel.md.
Each command auto-copies the final post/caption text to clipboard
after presenting the draft. Replaces manual copy-paste instructions
in first-post.md (=== COPY FROM HERE ===) with auto-copy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TDD: 13 tests written first, then implementation.
Exports copyToClipboard(text) and clipboardAvailable() — never throws.
Supports darwin (pbcopy), win32 (clip), linux (xclip/xsel fallback).
Dual standalone/import mode following personalization-score.mjs pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add threshold-based exit codes (--fail-on <severity>) and compact
output mode (--compact) to scan-orchestrator and CLI. Pipeline
templates for GitHub Actions, Azure DevOps, GitLab CI with SARIF
upload. CI/CD guide with Schrems II/NSM compliance documentation.
npm publish preparation (files whitelist, .npmignore). Policy ci
section for distributable CI defaults. Version 6.1.0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
process.exit() terminates before pipe buffers drain, truncating output
at 64KB when piped through another Node.js process on macOS. Affects
scan-orchestrator (SARIF output) and supply-chain-recheck-cli.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New policy-loader.mjs reads .llm-security/policy.json with deep-merge against
defaults that exactly match existing hardcoded values. Integrated into all 7 hooks:
- pre-prompt-inject-scan: injection.mode (env var still takes precedence)
- post-session-guard: trifecta.mode, window_size, long_horizon_window
- pre-edit-secrets: secrets.additional_patterns
- pre-bash-destructive: destructive.additional_blocked
- pre-write-pathguard: pathguard.additional_protected
- pre-install-supply-chain: supply_chain.additional_blocked_packages
- post-mcp-verify: mcp.volume_threshold_bytes, mcp.trusted_servers
Backward compatible: no policy file = identical behavior to v5.1.0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New audit-trail.mjs writes structured events to LLM_SECURITY_AUDIT_LOG path.
Integrated into post-session-guard at 6 warning emission points: trifecta,
escalation-after-input, data flow, volume threshold, slow-burn, behavioral drift.
No-op when env var not set — zero overhead for existing users.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New sarif-formatter.mjs converts scan envelope to OASIS SARIF 2.1.0 standard.
Maps severity to SARIF levels, findings to results with locations and rules.
scan-orchestrator accepts --format sarif|json (default: json).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extends posture scanner from 13 to 16 categories with three governance/compliance
checks. New categories are advisory (not in CRITICAL_CATEGORIES) — existing Grade A
projects remain Grade A. VERSION bumped to 6.0.0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove file limit (was 10, now processes all critical+high+medium)
- Increase max-turns to 200 and timeout to 60min
- Add medium priority to update filter
- Update README KB note to reflect automated weekly updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Additional factual updates from batch 3 research:
- responsible-ai-training-awareness.md: module renamed
"Azure AI Studio" → "Microsoft Foundry" (3 occurrences)
- transparency-documentation-standards.md: ISO/IEC 42001 scope expanded
to include Copilot Studio, Microsoft Foundry, Security Copilot,
GitHub Copilot, Dragon Copilot
- ai-act-compliance-guide.md: same ISO 42001 scope expansion
- human-in-the-loop-oversight.md: AI approval stages in Copilot Studio
(GPT-o3 as AI approver, new Human in the loop connector)
- continuous-improvement-feedback-loops.md: MLflow 3 Feedback vs
Expectation assessment types, Genie Code trace analysis
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Not a distributable plugin — Copilot CLI has no plugin mechanism.
Was an internal one-off port for a colleague, not a marketplace item.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full port of llm-security plugin for internal use on Windows with GitHub
Copilot CLI. Protocol translation layer (copilot-hook-runner.mjs)
normalizes Copilot camelCase I/O to Claude Code snake_case format — all
original hook scripts run unmodified.
- 8 hooks with protocol translation (stdin/stdout/exit code)
- 18 SKILL.md skills (Agent Skills Open Standard)
- 6 .agent.md agent definitions
- 20 scanners + 14 scanner lib modules (unchanged)
- 14 knowledge files (unchanged)
- 39 test files including copilot-port-verify.mjs (17 tests)
- Windows-ready: node:path, os.tmpdir(), process.execPath, no bash
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrapper script that polls Microsoft Learn sitemaps and spawns a local
Claude session to update stale reference files. Designed for crontab,
zero cloud dependencies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sitemap-based KB change detection system: weekly polling of Microsoft
Learn sitemaps, prioritized change reports, new page discovery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a zero-dependency Node.js pipeline that polls Microsoft Learn sitemaps
weekly to detect when source documentation changes. Replaces the broken
mtime-based staleness check (all files had identical mtime after release).
Components:
- build-registry.mjs: extracts 1342 URLs from 387 reference files
- poll-sitemaps.mjs: streams ~18 child sitemaps, matches against registry
- report-changes.mjs: prioritized change report (critical/high/medium/low)
- discover-new-urls.mjs: finds relevant new MS Learn pages not yet covered
- run-weekly-update.mjs: orchestrator with --force/--discover/--dry-run
Integration:
- session-start hook reads change-report.json instead of broken mtime check
- hook triggers background poll if >7 days since last check
- generate-skills --update reads change report for targeted MCP updates
Current stats: 69% match rate (924/1342 URLs tracked via sitemaps).
~31% unmatched due to Microsoft URL restructuring (ai-foundry/openai paths).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shift focus from tildelingsbrev-specific to general strategy transformation.
More motivating, explains what and why for leaders/advisors, not developers.
Updated marketplace root README section to match.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every feature push must update all three doc levels: plugin README,
plugin CLAUDE.md, and root marketplace README. Added as OBLIGATORISK
convention to prevent docs from falling out of sync.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Security hardening section to ultraplan-local README covering all 4
defense layers. Update architecture tree to include hooks directory.
Update root marketplace README with security summary and hook count.
Update CLAUDE.md architecture section with Phase 2.4 and --allowedTools.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Four-layer security model for ultraexecute-local and headless sessions:
Layer 1 — Plugin hooks: pre-bash-executor.mjs (13 BLOCK + 8 WARN rules
with bash evasion normalization) and pre-write-executor.mjs (8 path guard
rules blocking .git/hooks, .claude/settings, shell configs, .env, SSH/AWS).
Layer 2 — Prompt-level security rules: denylist in ultraexecute-local.md
Sub-step D and session-spec-template.md Security Constraints section.
These are the only rules that work in headless child sessions.
Layer 3 — Pre-execution plan validation: new Phase 2.4 scans all Verify
and Checkpoint commands against denylist before execution begins.
Layer 4 — Replace --dangerously-skip-permissions with scoped
--allowedTools "Read,Write,Edit,Bash,Glob,Grep" --permission-mode
bypassPermissions in ultraexecute-local.md, headless-launch-template.md,
and session-decomposer.md. Blocks Agent, MCP, WebSearch in child sessions.
Also adds Hard Rules 14-16: verify command security check, no writing
outside repository root, no writing to security-sensitive paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Lead with strategy → OKR, not tildelingsbrev → OKR
- Tildelingsbrev repositioned as early analysis tool for leaders
- Broaden tracking: mention Azure Boards and Jira alongside Linear
- Fix limitation: Linear is built-in, others work via MCP
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major restructure based on user feedback:
- Lead with onboarding as the key differentiator
- Add "Bring What You Have" section with two real examples:
tildelingsbrev parsing and existing OKR quality review
- Show iterative development process, not one-shot usage
- Stronger value proposition: OKR expert that learns your context
- Remove simplistic Q&A example that made plugin look like a form-filler
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace dry documentation-style README with problem-first structure:
lead with public sector pain points, show realistic interaction example,
add "when not to use" section, move badges to bottom.
132 lines (down from 280). Hook within 10 seconds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No plugins should promise future features. ROADMAP.md and BACKLOG.md
are internal planning documents, not public-facing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 6 plugin READMEs now use identical installation section:
marketplace-first approach with /plugin browsing, then direct
settings.json as alternative. Replaces inconsistent mix of
git clone, plugin add, and JSON-only instructions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace dry feature-list description with problem-framed narrative
that names what generic OKR gets wrong about public sector, shows the
full governance chain, and uses precision-as-personality style matching
the other marketplace descriptions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
REMEMBER.md, TODO.md, and ROADMAP.md are local session state files
that should not be tracked in git. Untrack the two previously
committed ROADMAP.md files (linkedin, ultraplan) and add root
.gitignore to prevent future tracking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Plugin README, marketplace README, and CONTRIBUTING.md were committed
with pre-v1.6.0 content. Syncs all documentation with the actual v1.6.0
release: adds /ultraresearch-local section, updates agent count (19),
command count (3), pipeline diagram, examples, and architecture tree.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add /ultraresearch-local for structured research combining local codebase
analysis with external knowledge via parallel agent swarms. Produces research
briefs with triangulation, confidence ratings, and source quality assessment.
New command: /ultraresearch-local with modes --quick, --local, --external, --fg.
New agents: research-orchestrator (opus), docs-researcher, community-researcher,
security-researcher, contrarian-researcher, gemini-bridge (all sonnet).
New template: research-brief-template.md.
Integration: --research flag in /ultraplan-local accepts pre-built research
briefs (up to 3), enriches the interview and exploration phases. Planning
orchestrator cross-references brief findings during synthesis.
Design principle: Context Engineering — right information to right agent at
right time. Research briefs are structured artifacts in the pipeline:
ultraresearch → brief → ultraplan --research → plan → ultraexecute.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Version badge 1.4.0 → 1.5.0
- Rewrite parallel execution section to document worktree isolation,
pre-flight checks, sequential merge, and automatic cleanup
- Update plugin.json version reference in directory tree
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers onboarding, content quality, analytics, Claude Code platform
integration, architecture improvements, and growth features.
Includes dependency tracking and deprioritized items with rationale.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 2.6 previously launched parallel claude -p sessions in the same
working directory, causing git race conditions and repository corruption.
Changes:
- Add Phase 2.55 (pre-flight safety checks): clean tree, plan file
tracking, scope fence overlap validation, stale worktree cleanup
- Rewrite Phase 2.6 with git worktree isolation: each parallel session
gets its own worktree and branch, merged back sequentially
- Add merge conflict detection and abort (no silent data loss)
- Add unconditional worktree cleanup (even on failure)
- Add hard rules 11-13 (worktree mandatory, cleanup, sequential merge)
- Session-scoped progress file naming for --session mode
- Update headless launch template with worktree support and cleanup trap
- Bump version to 1.5.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Session-start hook: welcome message with getting-started steps on first run
- Session-start hook: prominent personalization score section when score is 0
- Router: condensed 4-option menu for users who haven't posted yet
- Post/quick commands: non-blocking readiness check for unpersonalized state
- Post-creation hook: inline 5x5x5 engagement ritual explanation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Build LinkedIn thought leadership with algorithmic understanding,
strategic consistency, and AI-assisted content creation. Updated for
the January 2026 360Brew algorithm change.
16 agents, 25 commands, 6 skills, 9 hooks, 24 reference docs.
Personal data sanitized: voice samples generalized to template,
high-engagement posts cleared, region-specific references replaced
with placeholders.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allow name field to match either 'command' or 'plugin:command' format.
The architect: prefix is the correct convention for namespaced commands.
Also make auto_discover optional (not required in marketplace format).
Result: 215 PASS, 0 FAIL.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds complete version history (1.0.0-1.6.0) sourced from README version
history table. Adds 1.7.0 entry documenting the open-source release changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The marketplace README only described Layers 1-2. Added interaction
reports (Layer 3) and contemplative references (Layer 4) with opt-in
notes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commands/dpia.md: fix gdpr-compliance-ai-systems.md path
from: references/norwegian-public-sector-governance/
to: references/responsible-ai/ (where the file actually lives)
hooks/scripts/pre-edit-secrets.mjs: remove orphaned script that was
never registered in hooks.json. Secrets scanning handled by llm-security.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bump version to 1.7.0 (open-source release). Add author full name,
license, repository URL, and keywords to plugin.json.
Modernize .gitignore: remove dead orchestrator/ entries, add .claude/,
node_modules/, *.pdf, *.log, secrets.*.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Initial addition of ms-ai-architect plugin to the open-source marketplace.
Private content excluded: orchestrator/ (Linear tooling), docs/utredning/
(client investigation), generated test reports and PDF export script.
skill-gen tooling moved from orchestrator/ to scripts/skill-gen/.
Security scan: WARNING (risk 20/100) — no secrets, no injection found.
False positive fixed: added gitleaks:allow to Python variable reference
in output-validation-grounding-verification.md line 109.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add detailed platform matrix with links to sandbox-exec, bubblewrap,
Windows Sandbox, Docker Desktop, WSL2, and AppContainer documentation.
CVE reference for .gitattributes attack vector. Git config flag table
with per-flag mitigation descriptions. Windows guidance with concrete
options and recommendations. Note on why Node.js --permission is not
applicable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 17:12:54 +02:00
1728 changed files with 397087 additions and 4430 deletions
"description":"Multi-agent workflow for analyzing, reporting, and optimizing Claude Code configuration across your entire machine"
"description":"Multi-agent workflow for analyzing, reporting, and optimizing Claude Code configuration across your entire machine"
},
},
{
{
"name":"ultraplan-local",
"name":"voyage",
"source":"./plugins/ultraplan-local",
"source":"./plugins/voyage",
"description":"Deep implementation planning with interview, specialized agent swarms, external research, adversarial review, session decomposition, and headless execution support"
"description":"Voyage — brief, research, plan, execute, review, continue. Contract-driven Claude Code pipeline with specialized agent swarms, external research triangulation, adversarial review, post-hoc independent review with Handover 6 feedback loop, multi-session resumption, session decomposition, and headless execution. /trekbrief, /trekplan, and /trekreview each end by building a self-contained operator-annotation HTML (scripts/annotate.mjs, modelled on claude-code-100x): pencil-toggle annotation mode, select text or click any element, pick intent (Fiks/Endre/Spørsmål), comment, Copy Prompt, paste back, Claude revises the .md."
},
{
"name":"linkedin-thought-leadership",
"source":"./plugins/linkedin-thought-leadership",
"description":"Build LinkedIn thought leadership with algorithmic understanding, strategic consistency, and authentic engagement. Updated for the January 2026 360Brew algorithm change."
},
{
"name":"graceful-handoff",
"source":"./plugins/graceful-handoff",
"description":"Produce session-handoff artifacts, commit and push pending work, and print a copy-paste prompt for the next session. Designed for context-constrained models like Opus 4.7."
},
{
"name":"ai-psychosis",
"source":"./plugins/ai-psychosis",
"description":"Meta-awareness tools for healthy AI interaction patterns. Detects reinforcement loops, scope escalation, narrative crystallization, and other compulsive patterns."
},
{
"name":"ms-ai-architect",
"source":"./plugins/ms-ai-architect",
"description":"Microsoft AI Solution Architect — structured architecture guidance for the full Microsoft AI stack."
},
{
"name":"okr",
"source":"./plugins/okr",
"description":"Expert OKR guidance for Norwegian public sector. Write, review, cascade, track and govern OKR based on Google/Doerr methodology adapted for 4-month tertial cycles."
},
{
"name":"human-friendly-style",
"source":"./plugins/human-friendly-style",
"description":"Shared Claude Code output style for the ktg-plugin-marketplace. Plain-language tone — explains what and why, hides paths/JSON/stack traces by default, matches the user's language."
},
{
"name":"claude-design",
"source":"./plugins/claude-design",
"description":"End-to-end facilitator for prompting Claude Design (claude.ai/design) — idea to copy-paste-ready prompt with iteration coaching, citing Anthropic primary sources."
llm-security/ v7.7.2 — Security scanning, auditing, threat modeling. HTML report output for all 18 skill commands (render-report CLI + canonical ESM module mirrored bit-identical into the playground). v7.7.2 translated the remaining Norwegian surface text in the playground UI, the canonical renderer, the agent prompts, and the README/CLAUDE.md state sections to English. v7.7.1 stripped the playground to the catalog as the only routable surface.
ms-ai-architect/ v1.15.0 — Microsoft AI architecture (Cosmo Skyberg persona) + manual KB-refresh slash command + v3 project-view (sidebar med 17 artifacts + main + import-modal overlay, v2-surface fjernet i v1.15.0)
okr/ v1.0.0 — OKR guidance for Norwegian public sector
voyage/ v5.0.3 — Brief, research, plan, execute, review, continue. Contract-driven Claude Code pipeline (six-command universal pipeline + multi-session resumption + --gates autonomy chain). /trekbrief, /trekplan, and /trekreview each end by running scripts/annotate.mjs against the just-written .md and printing the file:// link to a self-contained operator-annotation HTML modelled on claude-code-100x/build-site.js: pencil-toggle annotation mode, select text or click any element, choose intent (Fiks/Endre/Spørsmål), comment, sidebar groups by section with delete + Copy Prompt, localStorage persistence per artifact path. v5.0.0 removed the v4.2/v4.3 bespoke playground + /trekrevise + Handover 8; v5.0.1 pointed at /playground document-critique (wrong direction); v5.0.2 was operator-led but too thin; v5.0.3 matches the reference the operator pointed at from day one.
Hvert plugin er selvstendig med egen CLAUDE.md, README, hooks, agents og commands. `shared/` inneholder marketplace-nivå infrastruktur som flere plugins bygger på.
- **Git:** Forgejo (`git.fromaitochitta.com/open/ktg-plugin-marketplace`). Aldri GitHub.
- **Hooks:** Alltid Node.js (.mjs), aldri bash. Cross-platform.
- **Avhengigheter:** Null npm dependencies i hooks/scannere. `node:test` for tester.
- **Bidrag:** Issues velkommen som signaler. PRs ikke akseptert. Fork-and-own er anbefalt adopsjonsmodell — se `GOVERNANCE.md`.
- **Lisens:** MIT, alle plugins
- **Docs ved endring (OBLIGATORISK):** Enhver feature-endring som pusher til Forgejo MÅ oppdatere alle tre doc-nivåer i SAMME commit eller umiddelbart etter:
1. Plugin `README.md` — detaljert dokumentasjon av endringen
- **Playground-oppdatering:** Ved endring av plugin playground HTML eller delt design-system, følg prosedyren i `shared/PLAYGROUND-MAINTENANCE.md` (4 spor: HTML-endring, DS-endring, screenshots, release).
Disse trackes IKKE i git. Oppdater ved sesjonsslutt.
## Arbeidsflyt
1. `cd` til riktig plugin-mappe
2. Les pluginets CLAUDE.md for kontekst
3. Les REMEMBER.md og TODO.md for sesjonsstatus
4. Jobb innenfor scope
5. Oppdater REMEMBER.md ved avslutning
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
How this marketplace is maintained, what you can expect from upstream, and how it's meant to be used.
## TL;DR
- Solo-maintained, AI-assisted development, MIT licensed.
- **Fork-and-own is the default model.** Upstream is a starting point, not a vendor.
- Issues welcome as signals. Pull requests are not accepted — see [Why no PRs](#pull-requests--no).
- No SLA. Best-effort bug fixes and security advisories. Breaking changes happen and are noted in each plugin's CHANGELOG.
---
## Can I trust this?
Be honest with yourself about what you're adopting:
- **One maintainer.** If I get hit by a bus, the bus wins. The repos stay up under MIT, but no one owes you a fix.
- **AI-generated code with human review.** Every plugin is built through dialog-driven development with Claude Code. I read, test, and judge the output before it ships, but I'm not auditing every line the way a security firm would. Treat it accordingly.
- **No commercial interests.** I'm not selling a SaaS, not steering you toward a paid tier, not collecting telemetry. The plugins run locally in your Claude Code installation.
- **MIT licensed.** Fork it, modify it, ship it under your own name.
If you work somewhere that needs vendor accountability, support contracts, or signed assurances — **this isn't that.** Use it as a reference implementation, fork it into your own organization, and own the result.
---
## How this is meant to be used
### Fork-and-own
The intended workflow:
1. **Fork** the marketplace (or a single plugin) into your own organization or namespace.
2. **Tailor** it to your context — terminology, integrations, cycle lengths, regulatory framing, whatever doesn't fit out of the box.
3. **Maintain it yourself.** Treat your fork as the canonical version for your team.
4. **Watch upstream selectively.** Cherry-pick changes that help, ignore changes that don't. There's no obligation to stay in sync.
This isn't a workaround for not accepting PRs. It's the actual recommended adoption pattern, especially for plugins like `okr` and `ms-ai-architect` where every Norwegian public sector organization will need its own tildelingsbrev mappings, terminology, and integrations. A central "one true plugin" would be wrong for everyone.
### What to change first when you fork
Each plugin differs, but the common edits are:
- **Identity** — rename the plugin, replace authorship, update README.
- **External integrations** — issue trackers, knowledge bases, dashboards, observability backends. The plugins ship as starting points, not pre-wired. Every organization must configure its own integrations.
- **Norwegian-specific framing** — relevant for `okr` and `ms-ai-architect`. Other plugins are jurisdiction-neutral. Rewrite for your jurisdiction if you're outside Norway.
- **Reference docs** — the knowledge base in each plugin reflects my reading. Replace with your organization's authoritative sources.
- **Hooks and policies** — security thresholds, blocked commands, and audit gates are tuned to my taste. Tune them to yours.
### Staying current with upstream
If you want to pull in upstream changes later:
- **Cherry-pick, don't merge.** Each plugin moves independently and breaking changes land without ceremony.
- **Read the CHANGELOG first.** Every plugin has one.
- **Keep your customizations in clearly-named files.** The harder upstream is to merge cleanly, the more painful staying current becomes. A `local/` directory or `*.local.md` convention helps.
---
## What upstream provides
| | What I do | What I don't |
|---|---|---|
| **Bug fixes** | Best-effort when I notice or get a clear report | No SLA, no triage commitment |
| **Security issues** | Investigate within reasonable time, document in CHANGELOG | No CVE process, no embargo coordination |
| **New features** | When they fit my own usage | Not on request |
| **Norwegian public sector context** | Kept current as long as the project lives | If I lose interest or change jobs, the framing freezes |
| **Breaking changes** | Documented in CHANGELOG | They happen — version pin if you need stability |
| **Compatibility** | Tracked against current Claude Code releases | No long-term support branches |
If any of this is a dealbreaker — fork now, version-pin, and stop reading upstream.
---
## How to contribute
### Issues — yes, please
Issues are the most valuable thing you can send me:
- **Bug reports** with reproduction steps. Even a screenshot helps.
- **Use-case feedback.** "I tried to use this in my organization and X didn't fit" is genuinely useful, even if I can't fix it for you.
- **Pointers to better sources.** If you know a DFØ veileder, an NSM guideline, or an academic paper that contradicts what's in a knowledge base, tell me.
- **Security findings.** See each plugin's `SECURITY.md` for disclosure preference where one exists; otherwise email rather than open a public issue.
### Pull requests — no
This is deliberate, not laziness:
- **Solo review is a bottleneck.** Honest PR review takes me longer than rewriting from scratch. The math doesn't work.
- **Forks are where the value is.** The fork-and-own model means upstream consolidation isn't the point. Your organization's adaptations belong in your fork, not mine.
- **AI-generated code complicates provenance.** Every line here is produced through dialog with Claude Code, with me as the judge. Mixing in PRs from contributors with different processes and licensing assumptions creates a mess I'd rather not untangle.
If you've built something useful on top of a fork, **publish it under your own name and link back.** I'll happily list notable forks here once they exist.
### Notable forks
*(To be populated as forks emerge. If you've forked one of these plugins for production use, open an issue and I'll add a link.)*
---
## Relationship between plugins
These plugins are **independent**. Install one without the others, fork one without the others. They share conventions (slash command naming, hook patterns, AI-generated disclosure) but no runtime dependencies.
The marketplace is a **catalog**, not a suite. Don't fork the whole repo unless you actually want to maintain everything.
---
## Versioning and stability
- **Semantic versioning per plugin.** Each plugin has its own `CHANGELOG.md` and version number.
- **Breaking changes happen.** I bump the major version when they do, but I don't run an LTS branch.
- **Pin your version.** If stability matters more than features, install a specific version and stay there until you choose to upgrade.
---
## Public sector adoption notes
For Norwegian etater specifically:
- **DPIA-relevant data flows are documented in the relevant plugin README where applicable.** Read them before installation.
- **No data leaves your machine** beyond what Claude Code itself sends to Anthropic. The plugins themselves do not call external services unless you configure an integration.
- **Drøftingsplikt and ledelsesansvar** are not replaced by these tools. The `okr` plugin coaches; it does not decide. The `ms-ai-architect` plugin advises; it does not approve.
- **Choose your Claude deployment carefully.** claude.ai vs. API direct vs. Bedrock in EU region have different data residency profiles. The plugins don't choose for you.
---
## License
MIT for all plugins in this marketplace. See each plugin's `LICENSE` file.
Open-source Claude Code plugins for AI-assisted development, security, and planning.
Open-source Claude Code plugins for AI-assisted development, security, and planning.
Built for my own Claude Code workflow and shared openly for anyone who finds them useful. Solo project — bug reports and feature requests are welcome, pull requests are not accepted.
Built for my own Claude Code workflow and shared openly for anyone who finds them useful. Solo-maintained, AI-assisted, fork-and-own. Issues are welcome as signals; pull requests are not accepted. See [GOVERNANCE.md](GOVERNANCE.md) for what upstream provides and how this is meant to be used.
---
## AI-generated code disclosure
## Plugins
All code in this marketplace is generated by Claude Code through a dialog-based process. I direct, review, test, and validate; Claude writes. Every commit reflects this — treat the plugins as AI-authored, human-curated.
Security scanning, auditing, and threat modeling for agentic AI projects.
Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025). Three layers of protection:
- **Automated enforcement** — 8 hooks that block dangerous operations in real time (prompt injection, secrets in code, destructive commands, supply chain guardrails)
Configuration intelligence for Claude Code — health checks, feature discovery, and auto-fix.
Claude Code reads instructions from 7+ file types across multiple scopes. This plugin tells you what's wrong, what's missing, and what's silently conflicting:
- **Health** — 7 deterministic scanners verify correctness across every configuration file (broken imports, deprecated settings, conflicting rules, permission contradictions)
- **Opportunities** — context-aware recommendations for Claude Code features you're not using
- **Action** — auto-fix with mandatory backups, syntax validation, rollback support, and human-in-the-loop workflow
Meta-awareness tools that counteract sycophancy, reinforcement loops, and compulsive AI interaction patterns.
AI assistants are structurally optimized to be agreeable. This creates reinforcement loops where productive collaboration is often a mirror showing you what you want to see. Research documents psychotic episodes triggered by sustained AI interaction in individuals with no prior psychiatric history.
Security scanning, auditing, and threat modeling for agentic AI projects.
Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025). Three layers of protection:
- **Automated enforcement** — 9 hooks that block dangerous operations in real time (prompt injection, secrets in code, destructive commands, supply chain guardrails, transcript scanning before context compaction)
- **Deterministic scanning** — 23 Node.js scanners (10 orchestrated + 13 standalone) for byte-level analysis: Shannon entropy, Unicode codepoints, typosquatting detection, taint flow, DNS resolution, git forensics, AI-BOM, attack simulation, IDE extension prescan (VS Code + JetBrains — URL fetch from Marketplace / OpenVSX / direct VSIX / JetBrains Marketplace, hardened ZIP extractor for zip-slip / symlinks / bombs, plus OS sandbox via `sandbox-exec` / `bwrap` so the kernel enforces FS confinement), MCP cumulative-drift baseline reset (E14 — sticky baseline catches slow-burn rug-pulls). Bash-normalize T1-T6 for obfuscation-resistant denylists
- **Advisory analysis** — 20 commands that scan, audit, and model threats with structured reports, letter grades, and actionable remediation
- **Enterprise governance** — Compliance mapping (EU AI Act, NIST AI RMF, ISO 42001), SARIF 2.1.0 output, structured audit trail, policy-as-code, standalone CLI
- **v7.7.2 language consistency pass (2026-05-19)** — Norwegian had crept into surface text across v7.5-v7.7. Per the `~/.claude/CLAUDE.md` convention (English for code and documentation, Norwegian for dialog only), this release translates the HTML Report-step in all 18 skill commands, the canonical CLI renderer `scripts/lib/report-renderers.mjs`, the playground UI strings, the skill-scanner and mcp-scanner agent prompts, the marketplace + plugin README/CLAUDE.md state sections, and six table cells in `docs/scanner-reference.md`. Demo-state fixture content for the `dft-komplett-demo` project (intentional Norwegian persona) and regex alternations that match Norwegian-language report markdown (`/^high\|^høy/`, `/resolution\|løsning/`) were preserved. No scanner, hook, or behavior changes — purely surface text
- **v7.7.1 playground UX strip (2026-05-18)** — Operator feedback immediately after v7.7.0: the catalog became the only routable surface in the playground (the onboarding/home/project render functions remain in source but are not routable). Topbar simplified to a `Catalog` button + state/theme actions. Breadcrumb org-name replaced with a neutral `llm-security`. The onboarding concept (per-command context injection) is documented as a v7.8.0 candidate in ROADMAP. No scanner or hook behavior changes
- **v7.7.0 HTML report for all 18 skill commands (2026-05-18)** — Every `/security <cmd>` that produces a report now prints a clickable `file://` link to a self-contained HTML version. Delivered across five sessions: (1) playground catalog list-view + builder-pane with a copy button; (2) playground project-surface cleanup (stub-screen + topbar split); (3) the 18 inline parsers + renderers moved to a canonical ESM module `scripts/lib/report-renderers.mjs` (the playground keeps a bit-identical inline copy since ESM `import` does not work from `file://`); (4) new zero-dep CLI `scripts/render-report.mjs` — stdin/file/stdout mode, kebab→camel commandId routing, inlines 6 DS stylesheets, ~140 KB self-contained HTML with system-font fallback, absolute `file://` paths for Ghostty cmd-click; (5) all 18 skills wired (4 in session 4 + 14 in session 5). No scanner or hook behavior changes — purely additive
- **v7.6.1 playground visual patch (2026-05-06)** — Six bugs caught by the maintainer during manual browser verification after the v7.6.0 release. All were mismatches between DS classes and how playground renderers used them (or missing DS implementations the renderers assumed existed): `renderFindingsBlock` used the `.findings` outer class (the DS 2-column list+detail grid) → replaced with `<section class="report-meta">` + the correct `findings__list` pattern; `.report-table` was missing entirely from the DS but used in 7+ renderers → local CSS implementation; `renderPreDeploy` traffic-lights used the fixed 28×28 px `.sm-card__grade` for "PASS"/"PASS-WITH-NOTES"/"FAIL" → width-adapting status pill; threat-model matrix bubbles were not clickable → `<button>` with `data-threat-id` + click handler that scrolls to the Threats table; radar labels overlapped → SVG 280→380, R 105→125, dynamic `text-anchor`; `recommendation-card__body` text overflow → `overflow-wrap: anywhere`. 4/4 fix-specific + 18/18 regression tests passing. No scanner or hook behavior changes
- **v7.6.0 playground Tier 3 reference case (2026-05-06)** — The playground was raised to a visually and structurally complete reference for the `shared/playground-design-system/` Tier 3 supplement. 8 new DS components integrated into the 18 report renderers: `tfa-flow` + `tfa-leg` + `tfa-arrow` (lethal trifecta chain with `<button>` elements + ARIA), `mat-ladder` + `mat-step` (5-step maturity ladder), `suppressed-group` (narrative audit), `codepoint-reveal` + `cp-tag/cp-zw/cp-bidi` (Unicode steganography), `top-risks` + `top-risk[data-severity]` (ranked top-findings listing), extended `recommendation-card[data-severity]` on `clean`/`harden`/`audit`/`posture`/`pre-deploy`/`plugin-audit`, `risk-meter` (0-100 band visualization across 5 archetypes), `card--severity-{level}` modifier on findings cards. Wave 1 (Session 2): `badge--scope-security` (identity chip), `verdict-pill-lg` (DS Tier 3 pill across all 18 report types), `form-progress` + `fp-step` (onboarding wizard). Removed ~30 duplicate CSS declarations (DS wins the cascade). 5 new DS helpers + `mapSeverityToCardLevel` + `parseNarrativeAudit`. A11Y report updated. File size 10209 → 10677 lines across 5 sessions. No scanner or hook behavior changes — purely additive surface
- **v7.5.0 playground (2026-05-05)** — Single-file SPA at `plugins/llm-security/playground/llm-security-playground.html` (~10 200 lines) for onboarding, demos and workshop use without a Claude Code installation. Parsers + renderers for all 18 produces_report commands, 18 markdown test fixtures as contract anchors, a complete demo project with all 18 reports parsed in advance, vendor-synced design-system, 9 Playwright-generated screenshots. 11 new `window` globals exposed for testing/automation (`__store`, `__navigate`, `__loadDemoState`, `__PARSERS`, `__RENDERERS` …). Bug-fix: `normalizeVerdictText` handles GO-WITH-CONDITIONS without collapsing to ALLOW. No scanner or hook behavior changes — purely additive surface
- **v7.4.0 examples + e2e suite (2026-05-05)** — 9 runnable demonstration walkthroughs under `examples/` (lethal-trifecta, mcp-rug-pull, supply-chain-attack, poisoned-claude-md, bash-evasion-gallery, prompt-injection-showcase, malicious-skill-demo, toxic-agent-demo, pre-compact-poisoning) plus three new test suites under `tests/e2e/` (attack-chain, multi-session, scan-pipeline) that prove the framework works as a coordinated system. +45 tests (1777 → 1822), no scanner or hook behavior changes — purely additive surface
- **v8.0.0 env-var deprecation runway (D3, v7.3.0)** — Hook configuration has historically been split between process env-vars and the team-distributable `.llm-security/policy.json` file. Until v7.3.0 the two surfaces could disagree silently. The new `getPolicyValueWithEnvWarn()` helper in `scanners/lib/policy-loader.mjs` now emits a one-time-per-process stderr line whenever both surfaces are explicitly set:
- Affected pairs: `LLM_SECURITY_INJECTION_MODE`↔`injection.mode`, `LLM_SECURITY_TRIFECTA_MODE`↔`trifecta.mode`, `LLM_SECURITY_ESCALATION_WINDOW`↔`trifecta.escalation_window` (new key in `DEFAULT_POLICY`), `LLM_SECURITY_AUDIT_LOG`↔`audit.log_path`
- Env still wins through the v7.x window — no behaviour change today, only a runway signal
- Suppress headless-log noise with `LLM_SECURITY_DEPRECATION_QUIET=1`
- Teams should converge on `policy.json` for distributable configuration before v8.0.0 removes the env-var path
- **Opus 4.7 aligned** — Agent instructions rewritten for literal instruction-following (system card §6.3.1.1), defense-in-depth posture per §5.2.1, production hardening guide
Configuration intelligence for Claude Code — health checks, feature discovery, auto-fix, active-config inventory, reality-based Opus-4.7 token analysis, and plain-language UX that leads with prose ("Fix soon: The same automation is set up more than once") instead of technical IDs.
Claude Code reads instructions from 7+ file types across multiple scopes. This plugin tells you what's wrong, what's missing, what's silently conflicting, what's actually loaded, and where you're burning tokens unnecessarily:
- **Opportunities** — context-aware recommendations for Claude Code features you're not using
- **Action** — auto-fix with mandatory backups, syntax validation, rollback support, and human-in-the-loop workflow
- **What's active** — read-only inventory of plugins, skills, MCP servers, hooks, and CLAUDE.md cascade for a repo, with token estimates
- **Token hotspots** — `/config-audit tokens` ranks files by estimated waste across 6 Opus-4.7 patterns (cache-breaking volatile content, redundant tool permissions, deep import chains, oversized cascades, bloated SKILL.md descriptions, MCP tool-schema budget). Optional `--accurate-tokens` calibrates against Anthropic's `count_tokens` API.
- **System-prompt manifest** — `/config-audit manifest` ranks every token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) by estimated tokens
- **Plain-language UX (v5.1.0)** — default output of all 18 commands leads with prose; findings group by user-impact category (Configuration mistake, Conflict, Wasted tokens, Missed opportunity, Dead config) and urgency phrase (Fix this now → FYI). Pass `--raw` for v5.0.0 verbatim output; `--json` is unchanged and byte-stable.
Deep requirements gathering, research, implementation planning, self-verifying execution, independent post-hoc review, and zero-friction multi-session resumption — with specialized agent swarms, adversarial review, and failure recovery. Six-command (brief, research, plan, execute, review, continue) universal pipeline + adaptive-depth per-phase effort dialog. `/trekbrief`, `/trekplan`, and `/trekreview` render their artifact to a self-contained HTML view and print the `file://` link.
v5.1.1 is a 13-step remediation patch closing 11 of 12 findings from the v5.1.0 review (the SC8 dogfood gate is operator-manual, scheduled for after-execute). Load-bearing bug fixes: YAML-number bypass in `brief-validator` so the gate fires for both quoted and unquoted `brief_version` (#8 + #11). Wiring: a new `lib/profiles/phase-signal-resolver.mjs` helper is invoked from `/trekplan`/`/trekresearch`/`/trekreview`/`/trekexecute` Phase 1, the resolved JSON is captured as `phase_signal_result`, and the `brief-validator --soft` gate is required uniformly across all 4 downstream commands (#9 + #12). Test refactor: runtime SC1 walk for trekbrief + per-tier resolver-output + missing-signals falsification per downstream command + dedicated profile-resolver non-interference test (#1#2#3#4#6#7#10). Documentation: Decision B high-effort behavior locked per command (gemini-bridge pass for `/trekplan`, full swarm + always-on `contrarian-researcher` for `/trekresearch`, skip Pass 3 + coordinator normalization for `/trekreview`, `gates_mode: closed` for `/trekexecute`) + brief Non-Goal/SC1 amendments + REMEMBER dogfood scaffolding. v5.1.1 is additive — no breaking changes against v5.1.0. See `plugins/voyage/CHANGELOG.md` § v5.1.1.
v5.1.0 adds Phase 3.5 to `/trekbrief`: 4 tier-coupled `AskUserQuestion` calls commit an effort level (`low | standard | high`) and an optional `model` (`sonnet | opus`) per downstream phase (`research`, `plan`, `execute`, `review`). The choices land in `brief.md` as `phase_signals:` (or `phase_signals_partial: true` on force-stop). `brief_version: 2.1` activates a validator-side sequencing gate (`BRIEF_V51_MISSING_SIGNALS`) so downstream commands halt with a friendly hint when signals are missing. Composition rule per downstream command: brief signal wins per-phase, profile fills gaps. `effort == low` activates each command's existing `--quick`-equivalent code-path (`/trekexecute` low-effort = `--gates open` + sequential-only). Additive — no breaking changes; pre-2.1 briefs still validate. See `plugins/voyage/CHANGELOG.md` § v5.1.0.
v5.0.3 lands the annotation UX modelled on `~/repos/claude-code-100x/claude-code-100x/build-site.js`: pencil-toggle annotation mode, **select text or click any element to anchor**, choose intent (**Fiks** / **Endre** / **Spørsmål**), write a comment, save. The sidebar groups annotations by section with intent badges; Copy Prompt assembles them into a structured markdown the operator pastes back into Claude. State persists in `localStorage` per artifact path. v5.0.2 was operator-led but too thin (line-click + freeform note, no intent categories). v5.0.1 had pointed at `/playground document-critique` (Claude-leads — wrong direction). v5.0.0 (breaking, kept) removed the v4.2/v4.3 bespoke playground SPA, `/trekrevise`, Handover 8, the supporting `lib/` modules, the Playwright e2e suite, and the `@playwright/test` / `@axe-core/playwright` devDeps. v5.0.3's `scripts/annotate.mjs` is one self-contained zero-dependency Node script. **The operator drives every annotation** — Claude never pre-generates suggestions in this flow. See `plugins/voyage/CHANGELOG.md` § v5.0.0 → § v5.0.3.
v4.0.0 (breaking) renamed the plugin from `ultraplan-local` to **Voyage** and all commands from `/ultra*-local` to `/trek*` to remove name collision with Anthropic's `/ultraplan` and `/ultrareview` features. See `plugins/voyage/TRADEMARKS.md` and `plugins/voyage/CHANGELOG.md`.
Six commands, one pipeline with clear division of labor:
- **`/trekbrief`** — Capture intent. Dynamic, quality-gated interview: a section-driven completeness loop (Phase 3) followed by a `brief-reviewer` stop-gate (Phase 4, max 3 review iterations). Required sections must reach an initial-signal gate AND pass review across completeness, consistency, testability, scope clarity, and research-plan validity before `brief.md` is written. Identifies research topics with copy-paste-ready `/trekresearch` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive.
- **`/trekresearch`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions.
- **`/trekplan`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Requires a task brief as input (`--brief` or `--project`). Auto-discovers `architecture/overview.md` when produced upstream and cross-references its `cc_features_proposed` against exploration findings.
- **`/trekexecute`** — Execute the contract disciplined. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity-check between planning and execution.
- **`/trekreview`** — Close the iteration loop. Independent post-hoc reviewer reads `brief.md` from scratch and evaluates the diff produced by execute. Two parallel reviewers (brief-conformance + code-correctness) plus a Judge Agent (review-coordinator) for dedup and reasonableness filtering. Severity-tagged findings (Critical/High/Medium/Low/Info) with stable 40-char hex IDs feed back into planning via Handover 6 (`/trekplan --brief review.md` → remediation plan with `source_findings:` audit trail).
- **`/trekcontinue`** — Zero-friction multi-session resumption. In a fresh chat, type `/trekcontinue` — reads `.session-state.local.json` (Handover 7), prints a 3-line summary, and immediately begins executing the next session. Any session-end mechanism may write the state file (`/trekexecute` Phase 8/2.55/4 do so automatically; `/trekendsession` helper writes it for informal flows). Forward-compat schema (unknown top-level keys ignored) so future producers can extend additively.
`/trekbrief`, `/trekplan`, and `/trekreview` each end by running `scripts/annotate.mjs` against the just-written `.md`, printing the `file://<abs path>` link to the resulting self-contained operator-annotation HTML. The operator opens it, clicks any line to add their own note, watches a sidebar of every note (editable, deletable, persisted in browser `localStorage`), clicks "Copy Prompt" to get one structured prompt with every note, pastes back into Claude — Claude revises the `.md` from the notes. The operator drives every annotation.
All artifacts land in one project directory: `.claude/projects/{YYYY-MM-DD}-{slug}/` contains `brief.md`, `research/NN-*.md`, `plan.md`, `sessions/`, `progress.json`, `review.md`, and `.session-state.local.json` (gitignored). `--project <dir>` works across `/trekresearch`, `/trekplan`, `/trekexecute`, `/trekreview`, and (optionally) `/trekcontinue`.
v3.4.0 (non-breaking) adds the **autonomy chain from brief approval to main-merge** plus parallel-wave hardenings. New `lib/util/autonomy-gate.mjs` state machine (`idle → approved → executing → merge-pending → main-merged`), `lib/review/plan-review-dedup.mjs` for Phase 9 inline dedup, `lib/stats/event-emit.mjs` for autonomy-gate transitions and main-merge gate, and `--gates {open|closed|adaptive}` flag on all four pipeline commands. `commands/trekplan.md` Phase 8 seals Opus-4.7 plan/list-emission schema-drift via `plan-validator --strict`. `commands/trekexecute.md` Phase 2.6 wave-executor adds 11 hardenings for plugin-in-monorepo + gitignored-state topology (GIT_OPTIONAL_LOCKS, --max-turns, --max-budget-usd, scoped --allowedTools, push-before-cleanup ordering). New `hooks/scripts/post-compact-flush.mjs` PostCompact hook re-injects session-state after compaction. SC7 synthetic determinism floor (Jaccard ≥ 0.833) for plan + review fixtures. Hook baseline regression pins. Architecture decision: Path B (sequential `--no-ff` parallel waves with manifest-driven failure recovery) ships; Path C (cache-first hybrid) deferred to v3.5.0 contingent on cache-telemetry harvest.
v3.3.0 (non-breaking) adds `/trekcontinue` as the sixth command and the contracted **Handover 7 (.session-state.local.json)** for zero-friction multi-session resumption. New `lib/validators/session-state-validator.mjs` (schema v1, forward-compat — unknown top-level keys ignored), `lib/util/atomic-write.mjs` extracted from `pre-compact-flush.mjs` for tmp+rename writes, and `/trekendsession` helper for informal multi-session flows. `/trekexecute` Phase 8 / 2.55 / 4 now write the state file alongside `progress.json`. `pre-compact-flush.mjs` also refreshes the state file before context compaction (monotonic; never advances to non-resumable status). 22 new tests (163 → 185 green).
v3.2.0 (non-breaking) adds `/trekreview` as the fifth command and the contracted **Handover 6 (review → plan)** feedback loop. New artifact type `type: trekreview` validated by `lib/validators/review-validator.mjs`, stable 40-char SHA1 finding-IDs from `lib/parsers/finding-id.mjs`, Jaccard similarity for determinism testing (`lib/parsers/jaccard.mjs`), and a 12-key version-pinned rule catalogue (`lib/review/rule-catalogue.mjs`). Four new agents (review-orchestrator, brief-conformance-reviewer, code-correctness-reviewer, review-coordinator) implementing the Judge-Agent dedup pattern. `/trekplan` now consumes `--brief review.md` (BLOCKER + MAJOR findings become plan goals) and writes `source_findings: [<id>, ...]` audit trail. `brief-validator` accepts both `type: trekbrief` and `type: trekreview`.
v3.0.0 extracts the Claude-Code-specific architecture phase to a separate plugin. The planning pipeline now stays technology-agnostic; CC-feature matching becomes opt-in. The plan command still auto-discovers `architecture/overview.md` if produced upstream — the contract is filesystem-level, not code-level. Non-breaking for users of brief/research/plan/execute. See `plugins/voyage/CHANGELOG.md` for migration steps.
v2.4.0 (breaking, default behavior) removes background mode. The commands now run foreground in the main context because the harness does not expose the Agent tool to sub-agents — background orchestrators silently degraded the swarm to inline reasoning without external research tools. The `--fg` flag is preserved as a no-op alias for backward compatibility. Source: github.com/anthropics/claude-code/issues/19077.
v2.1 (non-breaking) replaced the hardcoded Q1–Q8 interview with a dynamic, quality-gated loop; `brief-reviewer` emits machine-readable per-dimension JSON scores so `/trekbrief` can use it as an internal stop-gate. v2.0 (breaking) extracted the interview from planning: briefs are reviewable artifacts that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) validate independently. `/trekplan` requires `--brief` or `--project`. See `plugins/voyage/MIGRATION.md`.
v1.7 self-verifying chain (preserved): a step may not be marked `completed` unless its manifest verifies. v1.8 Opus 4.7 literalism fixes (preserved): literal Step+Manifest template, forbidden narrative headers, schema self-check.
v3.1.0 (in progress) adds a `lib/`-tree of zero-dep validators (`brief-validator`, `research-validator`, `plan-validator`, `progress-validator`, `architecture-discovery`) wired into the four commands as CLI shims, plus 109 `node:test` cases and a doc-consistency invariant test. The Phase 5.5 schema self-check now runs as `node lib/validators/plan-validator.mjs --strict` instead of three `grep -cE` calls — same checks, single source of truth, machine-readable error codes. Architecture discovery treats the upstream `architecture/overview.md` contract as drift-WARN, never drift-FAIL. Forking the plugin? `npm test` is the readiness gate.
v3.1.0 also adds: `docs/HANDOVER-CONTRACTS.md` as the single source of truth for the 5 pipeline handovers (extended to 6 in v3.2.0, then to 7 in v3.3.0); PreCompact-hook (`pre-compact-flush.mjs`, CC v2.1.105+) that fixes the documented progress.json drift bug — `--resume` now works after long conversations; UserPromptSubmit-hook that sets session titles `voyage:<command>:<slug>` for headless multiplexing (CC v2.1.94+); PostToolUse-hook that captures Bash `duration_ms` per call (CC v2.1.97+); semantic plan-critic rubric that catches paraphrased deferred decisions ("implement as needed", "wire it up") instead of just exact-string blacklist; `examples/01-add-verbose-flag/` showing a calibrated end-to-end pipeline run; `SECURITY.md` boilerplate; `docs/architect-bridge-test.md` smoke checklist.
Defense-in-depth security: plugin hooks block destructive commands and sensitive path writes, prompt-level denylist works in headless sessions, pre-execution plan scan catches dangerous commands before they run, scoped `--allowedTools` replaces `--dangerously-skip-permissions` in parallel sessions. Recommended hardening: `disableSkillShellExecution: true` for fork-ers handling untrusted plans (CC v2.1.91+).
Meta-awareness tools that counteract sycophancy, reinforcement loops, and compulsive AI interaction patterns.
AI assistants are structurally optimized to be agreeable. This creates reinforcement loops where productive collaboration is often a mirror showing you what you want to see. Research documents psychotic episodes triggered by sustained AI interaction in individuals with no prior psychiatric history.
- **Layer 4 — Contemplative references** — optional references to contemplative approaches when interaction flags are elevated. Opt-in
Research-informed thresholds. Alerts are progressive and never blocking. Privacy-first: prompt text is never logged. Layers 3 and 4 are off by default.
Auto-trigger session handoff at context threshold. Manual `/graceful-handoff` always works as backup. Built for Opus 4.7.
When you hit 60-70% context and have to start a new session, three things usually get rushed or forgotten: summarizing state, committing finished work, and writing a continuation prompt. v2.0 removed all three from the user's hands; v2.1 makes context detection model-aware so auto-trigger fires at the right moment on Opus 4.7's 1M window.
- **Model-aware context detection (v2.1)** — 4-step fallback chain (`used_percentage` → `payload-size` → `model-map` → 1M default), so Opus 4.7 no longer fires 5–7× too early
- **statusLine hint** — display-only warning at 60% and urgent reminder at 70% (never runs git, safe per research)
- **SessionStart auto-load** — on `--resume` / `compact`, handoff content is injected into the new session via `additionalContext`; no manual `cat` needed
- **Skill-architecture** — `disable-model-invocation: true` so Claude can't autonomously invoke the side-effect-bearing flow; user triggers manually or hooks call the pipeline directly
- **Deterministic JSON pipeline** — `scripts/handoff-pipeline.mjs` returns structured JSON; tests run without LLM involvement
- **Explicit staging** — pipeline stages ONLY the artifact (never `git add -A`, regression-tested)
- **No subagents, no web** — under 60s budget; pinned to Sonnet 4.6 to free Opus for the next session
### [MS AI Architect — Azure AI and Microsoft Foundry](plugins/ms-ai-architect/) `v1.15.0``🇳🇴 Norwegian`
Microsoft AI solution architecture guidance for Norwegian public sector and enterprise.
Meet Cosmo Skyberg — a structured architect persona who understands the problem before recommending technology. Every recommendation is grounded in 387 reference documents and verified against live Microsoft Learn documentation via MCP:
- **Structured advisory** — 7-phase methodology from business need to architecture recommendation and optional diagram
- **Regulatory assessments** — ROS analysis (NS 5814), DPIA/PVK, security scoring (6×5), EU AI Act classification, cost estimation in NOK (P10/P50/P90)
- **Norwegian public sector** — Digdir architecture principles, Utredningsinstruksen, NSM, Schrems II data residency, EU AI Act compliance workflow
- **Manual KB-refresh** — `/architect:kb-update` slash command drives sitemap-based change detection + new-URL discovery + per-file `microsoft_docs_fetch`-update + commit, run from an active Claude Code session. Scheduling is intentionally out of scope and left to the user (cron / launchd / GitHub Actions etc. as desired)
**One-click demo (v1.15.0, 2026-05-16):** "Last inn demo-data"-knappen på onboarding bootstrapper en ferdig "Acme Kommune" med demo-prosjektet "Acme: Kunde-chatbot" og alle 17 rapport-typer pre-importert. v2→v3 migrasjon auto-parser `raw_markdown` til `project.artifacts[cid]` så project-view viser aggregert verdict (BLOKKERT), key stats (17/17 artifacts), top-risks-liste, og navigerbart artifact-sidebar i én navigasjon. 24 retina-screenshots committed under `playground/screenshots/v1.15.0/` (12 surfaces × 2 tema), så forkere ser pluginen uten å kjøre noe. Standalone Playwright-runner under `tests/screenshot/` (egen `package.json`).
**Playground (v3, v1.15.0 — project-view integration, 2026-05-16):** Multi-surface decision-builder + report viewer. The single-file HTML app lives at `playground/ms-ai-architect-playground.html` (~3870+ lines). v1.15.0 erstatter v2 project-surface (screen-tabs + category-tabs + per-command paste-cards) med v3 `renderProjectView` (sidebar med 17 artifacts gruppert i 4 kategorier + main-area med per-artifact view eller overview + import-modal som DS-overlay). V2-surface helt fjernet (`renderProjectSurface`, `renderCommandSubCard`, `rehydratePasteImports`, 5 v2-CSS-klasser). 2 fingerprint-gap lukket (requirements + license headers). `migrateDataVersion` utvidet med `parserFor` slik at demo-state og persisted localStorage auto-parses. Ship-QA: `components-tier4-project-view.css` lagt til i `<link>`-kjeden (var ikke loaded → modal-overlay og two-column layout virket ikke). 386 E2E PASS, 0 FAIL, 2 WARN.
- **4 surfaces:** Onboarding (4 strukturerte / 14 fritekst, prefill alle command-skjemaer) → Home (project list + 3 entry tracks) → Catalog (24 commands grouped in 5 expansion categories with search) → **Project v3** (sidebar med 17 artifacts + søk + main-area med per-artifact view eller aggregate overview + import-modal overlay)
- **Persistence:** IndexedDB primary + localStorage fallback, schema-versioned (`STATE_KEY = 'ms-ai-architect-state-v1'`) with eager migrations pipeline. v1.10.0 adds idempotent `dataVersion v1→v2` migration that backfills `verdict` + `keyStats` on existing reports.
- **Light/dark theme toggle** with Aksel-aligned tokens in both modes (full WCAG AA contrast). Persisted in `localStorage('ms-ai-architect-theme')`, FOUC-safe via `<head>`-bootstrap script.
- **Vendored design-system** at `playground/vendor/`, kept in sync via `scripts/sync-design-system.mjs ms-ai-architect`. Standalone — opens from `file://` without server or marketplace dependency.
### [LinkedIn Thought Leadership](plugins/linkedin-thought-leadership/) `v1.2.0`
Build authentic LinkedIn authority through algorithmic understanding, strategic consistency, and AI-assisted content creation.
Updated for the January 2026 360Brew algorithm change, which validates your creator profile before distributing content. v1.2.0 reduces friction: auto-clipboard on all content commands, max 2 interactive steps per post, deterministic state management, MCP image carousel pipeline, progressive onboarding, and iCal calendar integration for batch scheduling.
- **Guided onboarding** — `/linkedin:onboarding` walks new users through profile → setup → first post in one flow
- **360Brew profile optimization** — audit your profile against LinkedIn's creator validation criteria
### [OKR for Public Sector](plugins/okr/) `v1.3.0``🇳🇴 Norwegian`
Turn strategy into measurable goals. An AI coach that learns your organization, tracks progress across cycles, and guides you from first OKR to organizational mastery.
Most OKR tools explain methodology. This plugin *knows your organization*. After a one-time onboarding conversation, it remembers your maturity level, strategic goals, current OKR, and cultural challenges. Every interaction builds on that knowledge — so you spend time on strategy, not re-explaining context.
- **Strategy to OKR** — transform goals from virksomhetsplan, tildelingsbrev, or any strategic document into well-structured OKR with guided writing, quality checks, and alignment scoring
- **Gap analysis** — `/okr:gap` compares your strategic documents against current OKR and shows what's covered, what's missing, and what to do about it
- **Cross-cycle learning** — `/okr:analyse` tracks score trends, recurring antipatterns, and alignment progress across cycles with visual charts
- **Proactive coaching** — automatically tells you where you are in the cycle and what to focus on — progress checks mid-cycle, retrospective prep near the end
- **19 antipattern detection** — catches sandbagging, activity-disguised-as-KR, set-and-forget, and 16 more named failure modes before they take root
- **Built for norsk offentlig sektor** — 4-month tertials, DFO terminology, tillitsvalgt involvement, Riksrevisjon-ready documentation, governance chain from Stortingsmelding to team OKR
Shared Claude Code [output style](https://code.claude.com/docs/en/output-styles) used across this marketplace. Gives every plugin a consistent, plain-language tone — so users don't have to switch mental gears when moving between plugins.
- **Explains what and why, not how** — describes the work in human terms, reserves technical detail for when the user asks
- **Hides noise by default** — long paths, raw commands, JSON, stack traces, and verbose tool output are summarized rather than dumped
- **Matches the user's language** — Norwegian when the user writes Norwegian, English otherwise
- **Honest about uncertainty** — says "I think this should work" instead of pretending to be sure
- **Keeps coding instructions intact** (`keep-coding-instructions: true`) — testing discipline, careful edits, and verification still apply
Optional. Every other plugin in the marketplace works without it; this just makes the conversation feel more like dialog and less like a console dump.
Activate with `/config` → **Output style** → **Human-Friendly**.
End-to-end facilitator for prompting Claude Design (`claude.ai/design` — Anthropic's Labs research preview launched 2026-04-17, Opus 4.7 pinned). Walks the operator from raw idea through intent-preset selection, audience and destination clarification, DESIGN.md anchor, five-layer prompt drafting, copy-paste delivery, iteration coaching, and ship-readiness handoff. Output is the prompt; the artifact gets built in Claude Design.
The plugin is the **pre-design and during-design** complement to Anthropic's official `knowledge-work-plugins/design` (`https://claude.com/plugins/design`). This plugin covers idea → prompt → iterate; Anthropic's plugin covers critique → accessibility → handoff. Zero command overlap by design — `tests/validate-plugin.sh` assertion (h) enforces the forbidden-command-name list mechanically.
- **Five foundation references + eight per-preset references** with evidence-grade labels (`Anthropic-documented + community-validated`, `Community-only`, `Experimental` for `frontier-design`)
- **Authoritative-claims discipline** — every reference file carries ≥1 Anthropic-domain URL citation (`anthropic.com`, `claude.com`, `support.claude.com`, `platform.claude.com`, `github.com/anthropics`); `.coverage.md` is the canonical registry
- **`.coverage.md`** at plugin root enumerates the 8 intent presets with evidence-grade labels and the 13 authoritative-claims files; SC2 and SC3 read from it directly
Shared design system for plugin Playgrounds — visual self-service UIs that complement terminal slash-commands. Aksel/Digdir-aligned aesthetics, WCAG 2.1 AA compliance, light + dark themes, A4 print stylesheets with B/W severity patterns.
Targets five plugins: `ms-ai-architect`, `okr`, `llm-security`, `voyage`, `config-audit`. Built for Norwegian public sector decision-makers (kommunaldirektører, sikkerhetsoffiserer, OKR-koordinatorer) plus developer power-users — one visual family, two information densities.
- **Tokens** — Inter/JetBrains Mono/Source Serif 4 (all self-hosted, OFL 1.1), body 17px, Digdir blue `#0062BA`, deuteranopia-safe severity ramp, distinct severity-red vs failure-red, plugin-scope colors, semantic CSS custom properties
- **JSON schemas** — `finding.schema.json`, `okr-set.schema.json`, `ros-threat.schema.json` for cross-plugin data interchange
- **Privacy-first** — all fonts self-hosted as woff2 in `fonts/`, zero external CDN requests, GDPR-safe for offentlig sektor, works offline / behind air-gapped firewalls
- **Reference scenarios** — Lier kommune ROS-rapport (ms-ai-architect), Bærum kommune T2 OKR live-writer, Direktoratet for digital tjenesteutvikling ToxicSkills findings review (85 funn, BLOCK)
- **Vendoring sync** — `scripts/sync-design-system.mjs <plugin>` copies the design-system into `plugins/<name>/playground/vendor/` so each plugin stays standalone. SHA-256 MANIFEST detects local drift; `--force` to override. First adopter: `ms-ai-architect` (2026-05-03).
"description":"Meta-awareness tools for healthy AI interaction patterns. Detects reinforcement loops, scope escalation, narrative crystallization, and other compulsive patterns.",
"description":"Meta-awareness tools for healthy AI interaction patterns. Detects reinforcement loops, scope escalation, narrative crystallization, and other compulsive patterns.",
How this marketplace is maintained, what you can expect from upstream, and how it's meant to be used.
## TL;DR
- Solo-maintained, AI-assisted development, MIT licensed.
- **Fork-and-own is the default model.** Upstream is a starting point, not a vendor.
- Issues welcome as signals. Pull requests are not accepted — see [Why no PRs](#pull-requests--no).
- No SLA. Best-effort bug fixes and security advisories. Breaking changes happen and are noted in each plugin's CHANGELOG.
---
## Can I trust this?
Be honest with yourself about what you're adopting:
- **One maintainer.** If I get hit by a bus, the bus wins. The repos stay up under MIT, but no one owes you a fix.
- **AI-generated code with human review.** Every plugin is built through dialog-driven development with Claude Code. I read, test, and judge the output before it ships, but I'm not auditing every line the way a security firm would. Treat it accordingly.
- **No commercial interests.** I'm not selling a SaaS, not steering you toward a paid tier, not collecting telemetry. The plugins run locally in your Claude Code installation.
- **MIT licensed.** Fork it, modify it, ship it under your own name.
If you work somewhere that needs vendor accountability, support contracts, or signed assurances — **this isn't that.** Use it as a reference implementation, fork it into your own organization, and own the result.
---
## How this is meant to be used
### Fork-and-own
The intended workflow:
1. **Fork** the marketplace (or a single plugin) into your own organization or namespace.
2. **Tailor** it to your context — terminology, integrations, cycle lengths, regulatory framing, whatever doesn't fit out of the box.
3. **Maintain it yourself.** Treat your fork as the canonical version for your team.
4. **Watch upstream selectively.** Cherry-pick changes that help, ignore changes that don't. There's no obligation to stay in sync.
This isn't a workaround for not accepting PRs. It's the actual recommended adoption pattern, especially for plugins like `okr` and `ms-ai-architect` where every Norwegian public sector organization will need its own tildelingsbrev mappings, terminology, and integrations. A central "one true plugin" would be wrong for everyone.
### What to change first when you fork
Each plugin differs, but the common edits are:
- **Identity** — rename the plugin, replace authorship, update README.
- **External integrations** — issue trackers, knowledge bases, dashboards, observability backends. The plugins ship as starting points, not pre-wired. Every organization must configure its own integrations.
- **Norwegian-specific framing** — relevant for `okr` and `ms-ai-architect`. Other plugins are jurisdiction-neutral. Rewrite for your jurisdiction if you're outside Norway.
- **Reference docs** — the knowledge base in each plugin reflects my reading. Replace with your organization's authoritative sources.
- **Hooks and policies** — security thresholds, blocked commands, and audit gates are tuned to my taste. Tune them to yours.
### Staying current with upstream
If you want to pull in upstream changes later:
- **Cherry-pick, don't merge.** Each plugin moves independently and breaking changes land without ceremony.
- **Read the CHANGELOG first.** Every plugin has one.
- **Keep your customizations in clearly-named files.** The harder upstream is to merge cleanly, the more painful staying current becomes. A `local/` directory or `*.local.md` convention helps.
---
## What upstream provides
| | What I do | What I don't |
|---|---|---|
| **Bug fixes** | Best-effort when I notice or get a clear report | No SLA, no triage commitment |
| **Security issues** | Investigate within reasonable time, document in CHANGELOG | No CVE process, no embargo coordination |
| **New features** | When they fit my own usage | Not on request |
| **Norwegian public sector context** | Kept current as long as the project lives | If I lose interest or change jobs, the framing freezes |
| **Breaking changes** | Documented in CHANGELOG | They happen — version pin if you need stability |
| **Compatibility** | Tracked against current Claude Code releases | No long-term support branches |
If any of this is a dealbreaker — fork now, version-pin, and stop reading upstream.
---
## How to contribute
### Issues — yes, please
Issues are the most valuable thing you can send me:
- **Bug reports** with reproduction steps. Even a screenshot helps.
- **Use-case feedback.** "I tried to use this in my organization and X didn't fit" is genuinely useful, even if I can't fix it for you.
- **Pointers to better sources.** If you know a DFØ veileder, an NSM guideline, or an academic paper that contradicts what's in a knowledge base, tell me.
- **Security findings.** See each plugin's `SECURITY.md` for disclosure preference where one exists; otherwise email rather than open a public issue.
### Pull requests — no
This is deliberate, not laziness:
- **Solo review is a bottleneck.** Honest PR review takes me longer than rewriting from scratch. The math doesn't work.
- **Forks are where the value is.** The fork-and-own model means upstream consolidation isn't the point. Your organization's adaptations belong in your fork, not mine.
- **AI-generated code complicates provenance.** Every line here is produced through dialog with Claude Code, with me as the judge. Mixing in PRs from contributors with different processes and licensing assumptions creates a mess I'd rather not untangle.
If you've built something useful on top of a fork, **publish it under your own name and link back.** I'll happily list notable forks here once they exist.
### Notable forks
*(To be populated as forks emerge. If you've forked one of these plugins for production use, open an issue and I'll add a link.)*
---
## Relationship between plugins
These plugins are **independent**. Install one without the others, fork one without the others. They share conventions (slash command naming, hook patterns, AI-generated disclosure) but no runtime dependencies.
The marketplace is a **catalog**, not a suite. Don't fork the whole repo unless you actually want to maintain everything.
---
## Versioning and stability
- **Semantic versioning per plugin.** Each plugin has its own `CHANGELOG.md` and version number.
- **Breaking changes happen.** I bump the major version when they do, but I don't run an LTS branch.
- **Pin your version.** If stability matters more than features, install a specific version and stay there until you choose to upgrade.
---
## Public sector adoption notes
For Norwegian etater specifically:
- **DPIA-relevant data flows are documented in the relevant plugin README where applicable.** Read them before installation.
- **No data leaves your machine** beyond what Claude Code itself sends to Anthropic. The plugins themselves do not call external services unless you configure an integration.
- **Drøftingsplikt and ledelsesansvar** are not replaced by these tools. The `okr` plugin coaches; it does not decide. The `ms-ai-architect` plugin advises; it does not approve.
- **Choose your Claude deployment carefully.** claude.ai vs. API direct vs. Bedrock in EU region have different data residency profiles. The plugins don't choose for you.
---
## License
MIT for all plugins in this marketplace. See each plugin's `LICENSE` file.
*Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.*
> **Solo-maintained, fork-and-own.** This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See [GOVERNANCE.md](GOVERNANCE.md) for the full model and what upstream provides.
*AI-generated: all code produced by Claude Code through dialog-driven development. [Full disclosure →](../../README.md#ai-generated-code-disclosure)*
A Claude Code plugin that counteracts sycophancy, reinforcement loops, and
A Claude Code plugin that counteracts sycophancy, reinforcement loops, and
compulsive interaction patterns through behavioral modification and
compulsive interaction patterns through behavioral modification and
@ -116,6 +118,169 @@ commented on, and omitted entirely when conditions are not met.
**Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md`
**Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md`
and restart Claude Code. Layer 4 is opt-in (off by default).
and restart Claude Code. Layer 4 is opt-in (off by default).
## What's new in v1.2.0
v1.2.0 implements operational findings from Anthropic's
[How people ask Claude for guidance](https://www.anthropic.com/research/claude-personal-guidance)
Appendix (April 2026). Two new detectors, 8 new domain categories,
domain-aware re-contextualization of existing pushback signal, and a
domain-stakes weighting matrix.
### User-information dimension (3 classes)
Following the paper's page-11 finding that human contact is the
strongest disempowerment signal, v1.2 classifies each prompt:
warnings.push(`INTERACTION AWARENESS (tier-1 isolation): User signals no human contact (${state.turn_count} turns) in a high-stakes domain (${stateDomains.filter(d=>HIGH_STAKES_DOMAINS.includes(d)).join(', ')}). Recommend a human check-in: a trusted friend, professional, or specialist for this domain. Stay supportive but do not be a substitute for that contact.`);
warnings.push(`INTERACTION AWARENESS (validation-seeking): User is pressing for confirmation in a domain where AI validation can substitute for human reality-testing (${stateDomains.filter(d=>HIGH_SYCOPHANCY_DOMAINS.includes(d)).join(', ')}). Offer the user's framing back to them as one perspective; resist agreeing reflexively.`);
warnings.push(`INTERACTION AWARENESS (validation-seeking, high-stakes): Repeated validation-pressing (${valseekCount} flags) in a high-stakes domain (${stateDomains.filter(d=>HIGH_STAKES_DOMAINS.includes(d)).join(', ')}). Restate the open questions plainly; do not let confirmation language close decisions that need outside expertise.`);
warnings.push(`INTERACTION AWARENESS (pushback re-contextualization): Repeated pushback (${newPushbackCount}) in a high-sycophancy domain (${stateDomains.filter(d=>HIGH_SYCOPHANCY_DOMAINS.includes(d)).join(', ')}) often signals pressing for validation, not factual disagreement. Hold your read; restate the user's frame back to them rather than adjusting your conclusion.`);
}elseif(allInfoOnly){
// Healthy self-advocacy in info-seeking domains — no alert.
}else{
warnings.push(`INTERACTION AWARENESS (pushback): User has pushed back ${newPushbackCount} times this session. Note whether the pushback is factual correction or pressure to agree; do not silently revise your read either way.`);
msg+=` INTERACTION AWARENESS (tier-2 cross-session isolation): ${recent.length} consecutive sessions show no human contact in high-stakes domains. This is a sustained pattern. Recommend a human check-in (trusted person, professional, or domain specialist) before proceeding here.`;
it('matches "are you sure?"',()=>assert.ok(matchesAny(pbReactivePatterns,'are you sure?')));
it('does not match "tell me what to do" (no pushback)',()=>assert.equal(matchesAny(pbReactivePatterns,'tell me what to do'),false));
it("matches \"i'm not convinced\"",()=>assert.ok(matchesAny(pbReactivePatterns,"i'm not convinced this works")));
it('does not match "i am convinced" (no negation)',()=>assert.equal(matchesAny(pbReactivePatterns,'i am convinced this works'),false));
it('matches "that doesn\'t seem right"',()=>assert.ok(matchesAny(pbReactivePatterns,"that doesn't seem right to me")));
it('does not match "that seems right" (positive sense)',()=>assert.equal(matchesAny(pbReactivePatterns,'that seems right to me'),false));
it('matches "that\'s not what I meant"',()=>assert.ok(matchesAny(pbReactivePatterns,"that's not what I meant by that")));
it('does not match "I meant exactly that"',()=>assert.equal(matchesAny(pbReactivePatterns,'I meant exactly that'),false));
it('matches "let me add context"',()=>assert.ok(matchesAny(pbReactivePatterns,'let me add context — the issue is X')));
it('does not match "I added context to the function"',()=>assert.equal(matchesAny(pbReactivePatterns,'I added context to the function'),false));
it('matches "actually, my situation is different"',()=>assert.ok(matchesAny(pbReactivePatterns,'actually, my situation is different')));
it('does not match "actually that approach works"',()=>assert.equal(matchesAny(pbReactivePatterns,'actually that approach works'),false));
it("matches \"I think you're wrong\"",()=>assert.ok(matchesAny(pbReactivePatterns,"I think you're wrong about this")));
it("does not match \"I think we're wrong\" (different pronoun)",()=>assert.equal(matchesAny(pbReactivePatterns,"I think we're wrong here"),false));
it("matches \"I don't agree\"",()=>assert.ok(matchesAny(pbReactivePatterns,"I don't agree with that conclusion")));
it('does not match "I agree with you"',()=>assert.equal(matchesAny(pbReactivePatterns,'I agree with you fully'),false));
it('matches "are you absolutely sure"',()=>assert.ok(matchesAny(pbReactivePatterns,'are you absolutely sure about that')));
it('does not match "we are sure of the answer" (no questioning frame)',()=>assert.equal(matchesAny(pbReactivePatterns,'we are sure of the answer'),false));
});
describe('pushback preemptive patterns',()=>{
it('matches "steelman"',()=>assert.ok(matchesAny(pbPreemptivePatterns,'please steelman this argument')));
it('does not match "steel manufacturing" (no whole-word match)',()=>assert.equal(matchesAny(pbPreemptivePatterns,'the steel manufacturing report'),false));
it("matches \"play devil's advocate\"",()=>assert.ok(matchesAny(pbPreemptivePatterns,"can you play devil's advocate here")));
it('does not match "play music" (different verb object)',()=>assert.equal(matchesAny(pbPreemptivePatterns,'play music while coding'),false));
it('matches "argue against this"',()=>assert.ok(matchesAny(pbPreemptivePatterns,'argue against this proposal')));
it('does not match "they argue with each other"',()=>assert.equal(matchesAny(pbPreemptivePatterns,'they argue with each other'),false));
});
describe('domain relationship patterns',()=>{
it('matches "my partner won\'t listen"',()=>assert.ok(matchesAny(domainRelationshipPatterns,"my partner won't listen")));
it('matches "in our relationship"',()=>assert.ok(matchesAny(domainRelationshipPatterns,'in our relationship things changed')));
it('matches "considering divorce"',()=>assert.ok(matchesAny(domainRelationshipPatterns,'considering divorce after years')));
it('matches "romantically involved"',()=>assert.ok(matchesAny(domainRelationshipPatterns,'we are romantically involved')));
it('does not match "function relationship between input and output" (technical false-positive)',()=>assert.equal(matchesAny(domainRelationshipPatterns,'function relationship between input and output'),false));
it('does not match "database relationship mapping" (technical false-positive)',()=>assert.equal(matchesAny(domainRelationshipPatterns,'database relationship mapping'),false));
it('does not match "the data is updating" (no dating word boundary)',()=>assert.equal(matchesAny(domainRelationshipPatterns,'the data is updating in real time'),false));
it('does not match "romantic comedy film" (no involved/interested suffix)',()=>assert.equal(matchesAny(domainRelationshipPatterns,'watching a romantic comedy film'),false));
"description":"End-to-end facilitator for prompting Claude Design (claude.ai/design) — idea to copy-paste-ready prompt with iteration coaching, citing Anthropic primary sources.",
This file is the canonical input for SC2 verification (`tests/test-sc2-artifact-coverage.sh`) and the SC3 Authoritative-claims registry (`tests/test-sc3-citations.sh`). Both tests read this file directly — keep it in sync with the references tree.
Anthropic's launch enumeration names eight intent presets; this plugin ships one reference file per preset with explicit evidence-grade labelling. The evidence-grade levels are:
- **Anthropic-documented + community-validated** — Anthropic publishes verbatim prompt patterns and community practitioners have independently validated them
- **Community-only** — Anthropic names the preset but publishes no per-preset prompt patterns; the patterns come from community practitioners with attribution
- **Experimental** — neither Anthropic nor community practitioners have published verifiable prompt patterns; the preset is engaged speculatively
The evidence-grade labels are load-bearing for SC2 and SC3. Per-preset reference files restate the grade inline on line 4.
The preset names in column 1 (`designs`, `prototypes`, `slides`, `one-pagers`, `wireframes-mockups`, `pitch-decks`, `marketing-collateral`, `frontier-design`) are the canonical names used by `tests/test-sc2-artifact-coverage.sh`. The test extracts column 1 via awk and runs grep against the plugin's content tree to verify each preset has at least one supporting file.
---
## Authoritative-claims files
The following files contain authoritative claims (Anthropic-published material, primary sources, or community-converged patterns with attribution). Each must carry at least one Anthropic-domain URL citation. `tests/test-sc3-citations.sh` reads this bullet list, parses paths via awk on `^- `, then runs the citation grep against each file.
The bullet-list format is load-bearing — `tests/test-sc3-citations.sh` parses lines starting with `- ` (dash + space). Do not switch to a table or numbered list without updating the test.
---
## Re-research triggers
This manifest refreshes when any of these events occurs:
- **Anthropic publishes per-preset guidance for a Community-only preset** (one-pagers, wireframes-mockups, pitch-decks, marketing-collateral) — upgrade the affected row's evidence grade and add the new Anthropic anchor URL
- **Anthropic publishes per-preset guidance for the Experimental preset** (frontier-design) — upgrade to Community-only or Anthropic-documented depending on coverage depth
- **A new intent preset is added to Anthropic's launch-post enumeration** (`https://anthropic.com/news/claude-design-anthropic-labs`) — add a new row, write a new preset reference file
- **An existing intent preset is removed from the enumeration** — remove the row, deprecate the reference file in `CHANGELOG.md`
- **A first verified frontier-design practitioner artifact ships publicly** with prompt + output + reproduction steps — upgrade the frontier-design row from Experimental to Community-only, update `presets/frontier-design.md`
- **Anthropic support article URL slugs change while keeping numeric IDs stable** — re-pin URLs in column 4 (Anthropic anchor URL); the numeric IDs in `support.claude.com/en/articles/<numeric-id>-<slug>` are the stable anchor
- **Labs → GA URL rename for `claude.ai/design`** — re-pin the launch-post URL once the `-anthropic-labs` slug is dropped (note: the launch URL `https://anthropic.com/news/claude-design-anthropic-labs` may or may not 301-redirect after the rename)
When any trigger fires, run `bash plugins/claude-design/verify.sh --strict` after the manifest update to confirm SC2 and SC3 still pass.
---
## Related sources (for context, not for SC checks)
Anthropic primary sources that ground this manifest but are not themselves authoritative-claims files (because they are external URLs, not plugin files):
- `https://claude.com/blog/improving-frontend-design-through-skills` — default-avoidance blog post
- `https://claude.com/plugins/design` — Anthropic's official knowledge-work-plugins/design plugin (downstream tool)
- `https://github.com/anthropics/knowledge-work-plugins` — source for Anthropic's downstream plugin
Anthropic URL canonicalisation: every `support.claude.com` reference uses the `https://support.claude.com/en/articles/<numeric-id>-<slug>` form. Numeric IDs are stable across slug rewrites; slug-only URLs are not.
- `verify.sh` top-level roll-up with `--strict` (SC1 missing-block becomes FAIL) and `--quick` (skip skill-triggers test) flags.
- `LICENSE` (MIT) and `GOVERNANCE.md` (marketplace fork-and-own blurb).
- Marketplace registration in root `.claude-plugin/marketplace.json`.
### Documentation
- Plugin `README.md` rewritten from scaffold placeholder to full v0.1 surface description with `Scope and complementarity` section (placed before installation per brief), `What this plugin is NOT` (Non-Goals), eight-phase facilitation flow table, skill surface table, reference content map, per-preset coverage table, verification section, AI-generated disclosure, fork-and-own MIT licensing.
- Plugin `CLAUDE.md` translated to English (operator override of marketplace's Norwegian-dialogue default per v0.1 brief constraint); added `Scope fence` section explicitly forbidding command-name collisions with Anthropic's `knowledge-work-plugins/design` (`/critique`, `/accessibility`, `/ux-copy`, `/research-synthesis`, `/design-system`, `/handoff`); `Authoring rules` section codifies English-everywhere, no operator-private context, evidence-grade label discipline, URL canonicalisation on `support.claude.com/en/articles/<numeric-id>-<slug>`; `Communication patterns` block preserved verbatim.
- Root marketplace `README.md` updated with `### [Claude Design](plugins/claude-design/) \`v0.1.0\`` entry under the `## Plugins` section, positioned after the Human-Friendly Style entry per existing convention. Entry documents the complementary lifecycle coverage vs `knowledge-work-plugins/design`.
### Notes
- **Scope:** claude-design facilitates the pre-design and during-design lifecycle for `claude.ai/design` (Anthropic Labs research preview, Opus 4.7 pinned, eight intent presets). For post-design — critique, accessibility audit, UX copy review, design-system audit, engineering handoff — install Anthropic's official plugin via `claude plugins add knowledge-work-plugins/design`. Zero command overlap, complementary by design.
- **No browser automation.** This plugin produces prompts; the artifact gets built inside `claude.ai/design`. The operator copies and pastes manually.
- **No artifact code generation.** This plugin is a prompt-builder, not an artifact generator. Claude Design is the generator.
- **No artifact storage or versioning.** Claude Design has no version-tree primitive and this plugin does not invent one. The verbal save-pattern documented in `references/03-iteration-and-session.md` is the closest substitute.
- **English everywhere in shipped content.** Operator override of the marketplace's default Norwegian-dialogue convention. `tests/validate-plugin.sh` assertion (j) emits WARN on Norwegian diacritics in shipped content; review case-by-case.
- **Evidence-grade discipline.** Every per-preset reference file carries an inline `Evidence grade:` label on line 4 with one of three values: `Anthropic-documented + community-validated`, `Community-only`, `Experimental`. `.coverage.md` is the canonical registry.
- **Re-research triggers** documented in `.coverage.md` — fire on Anthropic publishing per-preset guidance for Community-only presets, on new intent presets added to the launch enumeration, on the first verified `frontier-design` practitioner artifact shipping publicly, on Labs → GA URL rename for `claude.ai/design`, on Anthropic's `knowledge-work-plugins/design` adding or removing slash-commands.
## [0.1.0-pre] — 2026-05-15
### Added
- Initial scaffold (README, CLAUDE.md, ROADMAP, TODO, plugin.json placeholder).
This plugin is an expert on **Claude Design** (`claude.ai/design`) — Anthropic's Labs research preview for generating interactive design artifacts from a prompt. It walks the operator through the full lifecycle: idea → intent-preset selection → audience and destination → DESIGN.md anchor → five-layer prompt drafting → copy-paste delivery → iteration coaching → ship-readiness handoff. It does not generate artifact code itself and it does not drive the browser; it produces the prompt that the operator pastes into Claude Design.
## Status
`v0.1.0`. Surface:
- One skill: `claude-design-facilitator` (auto-fire + explicit `/claude-design-facilitator` slash command)
- Five foundation references under `skills/claude-design-facilitator/references/`
- Eight per-preset references under `skills/claude-design-facilitator/references/presets/`
- Five test scripts under `tests/` plus a `verify.sh` roll-up
- A `.coverage.md` preset manifest at the plugin root (canonical input for SC2 and the SC3 Authoritative-claims registry)
No commands, no agents, no hooks, no MCP servers at v0.1. The single skill is the entire user-facing surface.
## Marketplace context
This plugin lives inside `ktg-plugin-marketplace`. No separate git repository, no separate Forgejo remote. All commits go to the marketplace repository at `https://git.fromaitochitta.com/open/ktg-plugin-marketplace`.
Marketplace conventions inherited from the root `CLAUDE.md`:
- Conventional Commits — `type(scope): description`; scope is `claude-design`
- Hooks in Node.js (`.mjs`), never bash (this plugin ships no hooks at v0.1)
- Zero npm dependencies in hooks and scripts
- Docs-triple updated in the same commit on every feature change: plugin `README.md` + plugin `CLAUDE.md` + root `README.md`
## Architecture (v0.1)
- **`skills/claude-design-facilitator/SKILL.md`** is the auto-fire entry point AND the explicit `/claude-design-facilitator` invocation surface. The skill body documents the eight-phase facilitation flow.
- **`skills/claude-design-facilitator/.triggers.txt`** lists the natural-language phrases the skill auto-fires on. `tests/test-skill-triggers.sh` validates every phrase appears in the SKILL.md description.
- **`skills/claude-design-facilitator/references/`** is the knowledge base. Five foundation references (00–04) plus eight per-preset references under `references/presets/`. Every authoritative claim cites an Anthropic primary source inline.
- **`.coverage.md`** at the plugin root is the SC2 manifest (preset enumeration with evidence-grade labels) and the SC3 Authoritative-claims registry (bullet list of files that must carry Anthropic-domain citations).
- **`tests/`** + **`verify.sh`** enforce the brief Success Criteria: SC1 dogfood-log format, SC2 per-preset coverage, SC3 citation discipline, plus skill description quality and plugin structural integrity.
The skill body never offers to generate artifact code, automate the browser, or store artifact history (per [Non-Goals in README](README.md)). It produces prompts.
## Scope fence
This plugin covers **pre-design and during-design** for `claude.ai/design`: idea → prompt → preview → iterate → ship-readiness.
**Post-design** — critique, accessibility audit, UX copy review, research synthesis, design-system audit, engineering handoff — is out of scope and lives in Anthropic's official `knowledge-work-plugins/design` (`https://claude.com/plugins/design`). This plugin must never duplicate the commands `/critique`, `/accessibility`, `/ux-copy`, `/research-synthesis`, `/design-system`, `/handoff` — with or without a `claude-design:` namespace prefix. `tests/validate-plugin.sh` assertion (h) enforces this scope fence mechanically.
The lifecycle-stage coverage map and the operational handoff between the two plugins are documented in `skills/claude-design-facilitator/references/04-handoff-and-scope.md`.
## Authoring rules
Every contribution to this plugin must respect these rules:
- **Language: English everywhere.** Plugin file content — `README.md`, `CLAUDE.md` (this file), `CHANGELOG.md`, `SKILL.md`, all `references/*.md`, all `tests/*.sh` output messages, every code comment — is English. This is the operator override of the marketplace's default Norwegian-dialogue policy; documented in the v0.1 brief. The `tests/validate-plugin.sh` assertion (j) emits a WARN on Norwegian diacritics in shipped content; review case-by-case (citation slugs occasionally legitimately carry diacritics, but the default is zero hits).
- **No operator-private context in shipped content.** No personal-name or organization-affiliation tokens, no copy-paste from local session-state and handoff files. `tests/validate-plugin.sh` assertion (i) enforces this with a recursive grep on the specific patterns it bans; the grep excludes the local files themselves.
- **Evidence-grade label discipline.** Every per-preset reference file carries an inline `Evidence grade:` label on line 4. The three grades are `Anthropic-documented + community-validated`, `Community-only`, and `Experimental`. `.coverage.md` is the canonical registry. SC2 and SC3 read from `.coverage.md` directly — keep it in sync.
- **URL canonicalisation.** All `support.claude.com` references use the form `https://support.claude.com/en/articles/<numeric-id>-<slug>`. Numeric IDs are stable across slug rewrites; slug-only URLs are not. `https://anthropic.com/news/...` and `https://claude.com/blog/...` follow whatever slug Anthropic publishes.
- **No NIH of Anthropic surfaces.** The plugin recommends Anthropic's `knowledge-work-plugins/design` as the downstream tool; it does not duplicate that plugin's functionality.
## Workflow
The Voyage pipeline produces v0.1 and every subsequent feature change:
1. **Brief** closes scope and scope boundaries
2. **Research** gathers external sources — Anthropic primary material (news posts, support articles, blog posts, open-source skills, tutorials, plugins), plus community practitioners with attribution
3. **Plan** specifies file-by-file what gets built
4. **Execute** delivers the code and content
5. **Review** is the release gate (`/trekreview`)
Voyage policy: Opus across all sub-agents and orchestrator phases (per `feedback_voyage_opus_always`).
For incremental content updates that do not warrant a full Voyage iteration (e.g., refreshing a single per-preset reference when Anthropic publishes new guidance), the docs-triple rule still applies: plugin `README.md` + plugin `CLAUDE.md` (this file) + root `README.md` updated in the same commit as the content change.
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
How this marketplace is maintained, what you can expect from upstream, and how it's meant to be used.
## TL;DR
- Solo-maintained, AI-assisted development, MIT licensed.
- **Fork-and-own is the default model.** Upstream is a starting point, not a vendor.
- Issues welcome as signals. Pull requests are not accepted — see [Why no PRs](#pull-requests--no).
- No SLA. Best-effort bug fixes and security advisories. Breaking changes happen and are noted in each plugin's CHANGELOG.
---
## Can I trust this?
Be honest with yourself about what you're adopting:
- **One maintainer.** If I get hit by a bus, the bus wins. The repos stay up under MIT, but no one owes you a fix.
- **AI-generated code with human review.** Every plugin is built through dialog-driven development with Claude Code. I read, test, and judge the output before it ships, but I'm not auditing every line the way a security firm would. Treat it accordingly.
- **No commercial interests.** I'm not selling a SaaS, not steering you toward a paid tier, not collecting telemetry. The plugins run locally in your Claude Code installation.
- **MIT licensed.** Fork it, modify it, ship it under your own name.
If you work somewhere that needs vendor accountability, support contracts, or signed assurances — **this isn't that.** Use it as a reference implementation, fork it into your own organization, and own the result.
---
## How this is meant to be used
### Fork-and-own
The intended workflow:
1. **Fork** the marketplace (or a single plugin) into your own organization or namespace.
2. **Tailor** it to your context — terminology, integrations, whatever doesn't fit out of the box.
3. **Maintain it yourself.** Treat your fork as the canonical version for your team.
4. **Watch upstream selectively.** Cherry-pick changes that help, ignore changes that don't. There's no obligation to stay in sync.
For `claude-design` specifically, the most likely fork is a content adaptation — different intent-preset coverage (e.g., dropping `frontier-design` if your team never uses it), an organization-specific DESIGN.md template, a different evidence-grade discipline, or per-preset prompt patterns tuned to your team's design system. The plugin is a content surface plus a single skill. Forking it is straightforward.
### What to change first when you fork
- **Identity** — rename the plugin, replace authorship, update README.
- **Reference content** — the `references/` tree reflects what Anthropic published and the community converged on as of 2026-05-17. Adjust to your team's house style and design system.
- **Frontmatter** — `name` and `description` show up in `/config`. Pick names that won't collide with other forks installed on the same machine.
### Staying current with upstream
If you want to pull in upstream changes later:
- **Cherry-pick, don't merge.** Each plugin moves independently.
- **Read the CHANGELOG first.**
- **Keep your customizations distinct.** A renamed skill (`my-org-design-facilitator`) merges more cleanly than edits to `claude-design-facilitator`.
---
## What upstream provides
| | What I do | What I don't |
|---|---|---|
| **Bug fixes** | Best-effort when I notice or get a clear report | No SLA, no triage commitment |
| **Security issues** | Investigate within reasonable time, document in CHANGELOG | No CVE process, no embargo coordination |
| **New features** | When they fit my own usage | Not on request |
| **Breaking changes** | Documented in CHANGELOG | They happen — version pin if you need stability |
| **Compatibility** | Tracked against current Claude Code releases | No long-term support branches |
If any of this is a dealbreaker — fork now, version-pin, and stop reading upstream.
---
## How to contribute
### Issues — yes, please
Issues are the most valuable thing you can send me:
- **Bug reports** with reproduction steps. Even a screenshot helps.
- **Use-case feedback.** "I tried to use this in my organization and X didn't fit" is genuinely useful, even if I can't fix it for you.
- **Content suggestions.** If a reference file in `claude-design` produces guidance that doesn't match what you observe in `claude.ai/design` today, tell me what you saw. Concrete examples beat abstract complaints.
### Pull requests — no
This is deliberate, not laziness:
- **Solo review is a bottleneck.** Honest PR review takes me longer than rewriting from scratch. The math doesn't work.
- **Forks are where the value is.** The fork-and-own model means upstream consolidation isn't the point. Your organization's adaptations belong in your fork, not mine.
- **AI-generated code complicates provenance.** Every line here is produced through dialog with Claude Code, with me as the judge. Mixing in PRs from contributors with different processes and licensing assumptions creates a mess I'd rather not untangle.
If you've built something useful on top of a fork, **publish it under your own name and link back.** I'll happily list notable forks here once they exist.
### Notable forks
*(To be populated as forks emerge. If you've forked this plugin for production use, open an issue and I'll add a link.)*
---
## Relationship between plugins
These plugins are **independent**. Install one without the others, fork one without the others. They share conventions (slash command naming, hook patterns, AI-generated disclosure, and the shared `human-friendly-style` output style) but no runtime dependencies.
`claude-design` is a content surface with a single skill — it works without any other plugin installed. It recommends Anthropic's official `knowledge-work-plugins/design` as the downstream tool for post-design critique, accessibility audit, and engineering handoff, but does not depend on it being present.
The marketplace is a **catalog**, not a suite. Don't fork the whole repo unless you actually want to maintain everything.
---
## Versioning and stability
- **Semantic versioning per plugin.** Each plugin has its own `CHANGELOG.md` and version number.
- **Breaking changes happen.** I bump the major version when they do, but I don't run an LTS branch.
- **Pin your version.** If stability matters more than features, install a specific version and stay there until you choose to upgrade.
For `claude-design` specifically: changes to skill trigger behavior or per-preset reference content schema are minor or major bumps. Pure documentation or per-preset content refresh from Anthropic source updates are patch. The skill surface itself is meant to be stable across patch releases.
---
## License
MIT for all plugins in this marketplace. See [LICENSE](LICENSE) in this plugin and each other plugin's `LICENSE` file.
> End-to-end facilitator for prompting Claude Design (`claude.ai/design`). Walks the operator from raw idea through intent-preset selection, audience and destination clarification, DESIGN.md anchor, five-layer prompt drafting, copy-paste delivery, iteration coaching, and ship-readiness handoff. Cites Anthropic primary sources inline. Recommends Anthropic's official `knowledge-work-plugins/design` as the downstream post-design tool.
> **Solo-maintained, fork-and-own.** This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See [GOVERNANCE.md](GOVERNANCE.md) for the full model and what upstream provides.
*AI-generated: all content produced by Claude Code through dialog-driven development. [Full disclosure →](../../README.md#ai-generated-code-disclosure)*
A Claude Code plugin that ships one skill (`claude-design-facilitator`) plus a reference tree for prompting Anthropic's `claude.ai/design` workspace. The skill auto-fires on natural-language triggers, walks the operator through an eight-phase facilitation flow, and produces a copy-paste-ready prompt grounded in Anthropic's verbatim Goal / Layout / Content / Audience framework and the four published per-preset prompt patterns. Output is the prompt — the artifact gets built in Claude Design.
---
## Why this exists
Claude Design has a strong gravitational pull toward convergent middle-ground output. A one-line prompt like *"make me a slide deck for Q1 results"* reliably produces what Anthropic's own cookbook for [prompting frontend aesthetics](https://platform.claude.com/cookbook/coding-prompting-for-frontend-aesthetics) names as the failure mode: Inter or Roboto typography, white-to-purple gradients, evenly-spaced cards, cramped layouts that read as AI-generated. The convergence is not random — it is what the model defaults to when prompts are underspecified.
The fix is in the prompt itself, not in the artifact. Anthropic publishes a five-layer prompt scaffold across three primary sources — the Goal / Layout / Content / Audience framework in the [Claude Design launch post](https://anthropic.com/news/claude-design-anthropic-labs) and [Get started article](https://support.claude.com/en/articles/14604416-get-started-with-claude-design), the DESIGN.md anchor in the [design system article](https://support.claude.com/en/articles/14604397-set-up-your-design-system-in-claude-design), and the AI-slop avoid-list plus cultural-reference anchoring in the [aesthetics cookbook](https://platform.claude.com/cookbook/coding-prompting-for-frontend-aesthetics). Assembling a prompt that actually uses all five layers, with the right per-preset pattern, in the right order, takes deliberate scaffolding most operators do not do unprompted.
This plugin does the scaffolding interactively. The `claude-design-facilitator` skill walks the operator through eight phases, surfaces the questions that produce a workable Goal / Layout / Content / Audience answer, anchors on DESIGN.md if one exists or extracts one if not, composes the five layers in the right order, and outputs a copy-paste prompt the operator pastes into `claude.ai/design`. The artifact gets built in Claude Design; this plugin produces the prompt.
The output is honest about what it is. Every authoritative claim cites an Anthropic primary source inline. Community patterns are labelled and attributed. The `frontier-design` preset is flagged Experimental rather than dressed up as canonical. The plugin recommends Anthropic's official [`knowledge-work-plugins/design`](https://claude.com/plugins/design) for everything that happens after the artifact is generated — there is zero command overlap by design.
---
## Scope and complementarity
This plugin covers the **pre-design and during-design lifecycle** for `claude.ai/design`: idea → intent-preset selection → prompt engineering → copy-paste delivery → iteration coaching → ship-readiness check.
For **post-design** work — critique, accessibility audit, UX copy review, research synthesis, design-system audit, engineering handoff guidance — install Anthropic's official plugin:
```
claude plugins add knowledge-work-plugins/design
```
Anthropic's plugin operates on existing artifacts (Figma URLs, screenshots, copy snippets) and ships six slash-commands: `/critique`, `/accessibility`, `/ux-copy`, `/research-synthesis`, `/design-system`, `/handoff`. There is zero command overlap with this plugin and complementary lifecycle coverage — the two plugins are designed to be installed together. See [skills/claude-design-facilitator/references/04-handoff-and-scope.md](skills/claude-design-facilitator/references/04-handoff-and-scope.md) for the full coverage map.
---
## What this plugin is NOT
By design, this plugin does not:
- **Drive the browser.** No automation of `claude.ai/design` itself; you copy and paste the prompts the skill produces.
- **Generate the artifact code.** Claude Design is the artifact generator. This plugin produces prompts that go into Claude Design.
- **Store artifact history or version artifacts.** Claude Design has no version-tree primitive and this plugin does not invent one.
- **Cover adjacent Anthropic surfaces.** Classic Artifacts at `claude.ai`, Live Artifacts in Claude Cowork, custom visuals embedded in a chat reply are out of scope — see [skills/claude-design-facilitator/references/00-what-claude-design-is-and-isnt.md](skills/claude-design-facilitator/references/00-what-claude-design-is-and-isnt.md) for the disambiguation reference.
- **Duplicate Anthropic's `knowledge-work-plugins/design` plugin.** No `/critique`, no `/accessibility`, no `/ux-copy`, no `/research-synthesis`, no `/design-system`, no `/handoff`. The post-design lane belongs to Anthropic's plugin.
`tests/validate-plugin.sh` enforces the forbidden-command-name list mechanically.
---
## Installation
Add the marketplace once, then install the plugin:
```bash
claude plugins marketplace add ktg-plugin-marketplace https://git.fromaitochitta.com/open/ktg-plugin-marketplace
The skill auto-discovers; no further configuration needed.
---
## What you can do with it
The skill `claude-design-facilitator` walks the operator through eight phases. The phases are scoping + grounding (1–4), drafting + delivery (5–6), and iteration + ship-readiness (7–8).
| Phase | What happens |
|-------|--------------|
| **1. Disambiguate the surface** | Confirm `claude.ai/design` is the intended surface, not classic Artifacts, Live Artifacts, custom chat visuals, or `knowledge-work-plugins/design`. Read [references/00](skills/claude-design-facilitator/references/00-what-claude-design-is-and-isnt.md) when signals are mixed. |
| **2. Name the intent preset** | Pick one of eight Claude Design presets: `designs`, `prototypes`, `slides`, `one-pagers`, `wireframes-mockups`, `pitch-decks`, `marketing-collateral`, `frontier-design`. The per-preset reference file shapes the prompt pattern. Evidence-grade labels are surfaced. |
| **3. Audience and destination** | Capture audience (internal team / external stakeholder / investor / customer) and destination (PDF / PPTX / HTML / Canva / Code-handoff / share-link). Flag PPTX-export traps for `pitch-decks` early. |
| **4. Anchor on DESIGN.md** | Read [references/02](skills/claude-design-facilitator/references/02-design-md.md). If the operator has no DESIGN.md, point at the copy-paste brand-to-DESIGN.md extractor prompt. |
| **5. Draft the prompt** | Compose layers 1–5 from [references/01](skills/claude-design-facilitator/references/01-prompt-fundamentals.md): Anthropic's verbatim Goal / Layout / Content / Audience framework + start-simple-layer-complexity + concrete-alternative-spec + propose-options-before-building + AI-slop negative constraints + four design dimensions + four grading criteria + the per-preset pattern. |
| **6. Deliver** | Output a single copy-paste-ready fenced markdown code block. Add a one-line caption and three to five expected follow-up turns. |
| **7. Iteration coaching** | Read [references/03](skills/claude-design-facilitator/references/03-iteration-and-session.md). Coach which surface to use next — Tweak panel (zero-token, surgical), inline comments (component-scoped), or chat (full regen). Session-break heuristics + recovery prompt library when iteration gets stuck. |
| **8. Ship-readiness** | Run the export validation checklist. If shipping to engineering, confirm the Design → Code handoff bundle is complete. Recommend installing `knowledge-work-plugins/design` for downstream critique / accessibility / handoff. |
The skill auto-fires on natural-language triggers like *"I want to build a dashboard in Claude Design"*, *"help me prompt claude.ai/design"*, *"iterate on my Claude Design artifact"*. The full trigger list is in [skills/claude-design-facilitator/.triggers.txt](skills/claude-design-facilitator/.triggers.txt) and `tests/test-skill-triggers.sh` validates each phrase appears in the skill description.
Explicit invocation works too: the skill registers as the slash command `/claude-design-facilitator` for when the operator wants to start a clean facilitation session.
---
## Workflow example: from idea to prompt
A realistic session against the `slides` preset — Q1 results deck for an internal engineering all-hands.
**Operator:** *"I want to build a Q1 results slide deck for the engineering team in Claude Design."*
The skill auto-fires (the phrase matches `.triggers.txt`). It walks the eight phases:
**Phase 1 — Disambiguate the surface.** The skill confirms `claude.ai/design` is the intended surface, not classic Artifacts or Live Artifacts. The operator confirms.
**Phase 2 — Name the intent preset.** Slide deck → `slides` preset. The skill notes this is one of three Anthropic-documented presets (evidence-grade label surfaced from `.coverage.md`). It opens [`references/presets/slides.md`](skills/claude-design-facilitator/references/presets/slides.md) and surfaces the five canonical Anthropic patterns (Q1 results, executive roadmap, customer-prep briefing, partner proposal, all-hands announcement). The operator picks pattern 1.
**Phase 3 — Audience and destination.** Internal engineering team; deck stays in HTML preview during the meeting, optional PPTX export to share with adjacent leads afterward. The skill flags the PPTX-export trap from [`references/presets/slides.md`](skills/claude-design-facilitator/references/presets/slides.md) section (e): fonts substitute, master slides drop, charts may flatten to images. If a brand-compliant PPTX template exists, upload it to Claude Design as a project asset before prompting — Claude reads the slide master, layouts, fonts, and colour scheme and respects them ([PowerPoint-mode article](https://support.claude.com/en/articles/13521390-use-claude-for-powerpoint)).
**Phase 4 — Anchor on DESIGN.md.** The operator has no DESIGN.md yet. The skill points at the brand-to-DESIGN.md extractor prompt in [`references/02-design-md.md`](skills/claude-design-facilitator/references/02-design-md.md): paste a brand-guidelines URL or PDF into Claude.ai, get back a DESIGN.md filling the nine canonical sections (typography, colour, spacing, layout primitives, motion, voice, imagery, density, accessibility). The operator runs the extractor against the company's brand site, gets a DESIGN.md, drops it into the Claude Design project assets.
**Phase 5 — Draft the prompt.** The skill composes the five layers from [`references/01-prompt-fundamentals.md`](skills/claude-design-facilitator/references/01-prompt-fundamentals.md): Goal / Layout / Content / Audience (Layer 1) → start-simple-layer-complexity (Layer 2) → concrete-alternative-spec (Layer 3) → AI-slop negative constraints (Layer 4) → per-preset pattern from `presets/slides.md` (Layer 5). The skill produces a single copy-paste fenced block. The operator inspects it, optionally edits the Goal sentence, then proceeds.
**Phase 6 — Deliver.** The skill outputs the prompt block. The structure of a realistic delivery for this scenario looks like:
```
**Goal:** Communicate Q1 engineering results to the all-hands —
where we started, what changed, what we shipped, what we learned,
what's next for Q2. The deck should land as confident but honest:
real numbers, named risks, no victory-lap framing. Audience is
~80 engineers across three teams. Density should be high enough
to skim later, low enough to follow live.
**Layout:** 10–12 slides, slide-by-slide:
1. Title + Q1 in one sentence
2. The three things we shipped (one per row, screenshot + metric)
3. The two things that slipped (named honestly, with the why)
4. Hiring update (count + retention)
5. Reliability (incident count, MTTR trend)
6. Customer signal (NPS + 2 verbatim quotes)
7. Engineering health (PR throughput, review latency)
8. The big bet for Q2 (one slide, named)
9. Risks for Q2 (3 bullets, ranked)
10. Asks from the all-hands (1–3 specific asks)
11. Q&A placeholder
**Content:** Use the metrics in DESIGN.md's `tone` section — direct,
specific, no marketing voice. Numbers are placeholder; I'll edit
before the meeting.
**Audience:** Internal engineering all-hands, 80 people, ICs through
EM/Director level. They want to know: did we ship what we said, what
broke, what's next, can I help.
**Avoid:** Inter or Roboto, white-to-purple gradients, evenly-spaced
(pattern 1), with the Layout above overriding the tutorial's slide count.
```
That block is what gets pasted into `claude.ai/design`. The skill also surfaces three to five expected follow-up turns (e.g., *"the headline slide is too marketing, push it more technical"*, *"slide 5 reliability — show the MTTR trend as a sparkline, not a bar chart"*) so the operator knows what iteration looks like before starting.
**Phase 7 — Iteration coaching.** Once Claude Design produces the first version, the skill points the operator at the Tweak → Comment → Chat cascade in [`references/03-iteration-and-session.md`](skills/claude-design-facilitator/references/03-iteration-and-session.md): Tweak panel for surgical zero-token edits (spacing, font size, colour), inline comments for component-scoped changes (rewrite slide 5), full chat regeneration as a last resort. Plus the session-break heuristic (after 4 substantive screens, start a fresh session with a verbal save-pattern carrying state forward) and the recovery prompt library when iteration gets stuck.
**Phase 8 — Ship-readiness.** Before the all-hands, the skill runs the export validation checklist for the chosen destination (HTML preview → keep in Claude Design; PPTX → check fonts and master, charts may flatten). If the deck is being handed off to engineering for any reason, it recommends installing [`knowledge-work-plugins/design`](https://claude.com/plugins/design) for `/critique`, `/accessibility`, and `/handoff` — the post-design lane.
The full output of the session is a single fenced markdown block (Phase 6) plus a short follow-up-turns list and an iteration-coaching pointer. That is the entire user-facing deliverable.
---
## Skill surface
| Skill | Triggers | Output |
|-------|----------|--------|
| `claude-design-facilitator` | 12 natural-language phrases (full list in `.triggers.txt`); also explicit `/claude-design-facilitator` slash command | A copy-paste-ready Claude Design prompt block composed from the five-layer stack and the per-preset pattern, with follow-up-turn expectations |
No commands, no agents, no hooks, no MCP servers at v0.1. The single skill is the entire user-facing surface.
---
## Reference content map
The plugin ships 13 reference files in `skills/claude-design-facilitator/references/`:
**Foundation references (5):**
- [`00-what-claude-design-is-and-isnt.md`](skills/claude-design-facilitator/references/00-what-claude-design-is-and-isnt.md) — Surface disambiguation against Artifacts, Live Artifacts, custom chat visuals, and Anthropic's `knowledge-work-plugins/design`.
- [`01-prompt-fundamentals.md`](skills/claude-design-facilitator/references/01-prompt-fundamentals.md) — The five-layer prompt stack: GLCA framework + start-simple-layer-complexity + concrete-alternative-spec + propose-options + AI-slop negative constraints + four design dimensions + four grading criteria. Anchored on four Anthropic primary sources.
- [`presets/frontier-design.md`](skills/claude-design-facilitator/references/presets/frontier-design.md) — Experimental — no validated practitioner pattern as of 2026-05-16
---
## Per-preset coverage
The canonical coverage manifest is [`.coverage.md`](.coverage.md). Below mirrors that file.
When Anthropic publishes per-preset guidance for a Community-only or Experimental preset, [`.coverage.md`](.coverage.md) and the affected preset file refresh — re-research triggers are documented inline.
---
## Verification
```bash
bash plugins/claude-design/verify.sh
```
Runs five test scripts under `tests/` in dependency order:
| `test-skill-triggers.sh` | SKILL.md description >=400 chars; every phrase in `.triggers.txt` appears in SKILL.md |
| `test-sc2-artifact-coverage.sh` | Each preset in `.coverage.md` has >=1 file hit in plugin content |
| `test-sc3-citations.sh` | No unsourced-attribution placeholders (citation-stub markers, verification-flag markers, vague second-hand phrasing); each Authoritative-claims file has >=1 Anthropic-domain URL. The script enforces the exact patterns it bans — see the script source for the regex. |
| `test-sc1-dogfood-log.sh` | Format-check the operator dogfood log in `REMEMBER.md` (gitignored) — 5 fields well-formed |
Flags:
- `--strict` — pass-through to `test-sc1-dogfood-log.sh`. Without `--strict`, missing dogfood block is advisory. With `--strict`, it is the release gate.
- `--quick` — skip `test-skill-triggers.sh` for fast incremental runs.
Exit codes: `0` = all pass; non-zero = at least one sub-test failed.
---
## Compatibility
| Requirement | Version |
|-------------|---------|
| Claude Code | Recent versions with plugin support |
| Network | None for the skill itself; the artifact-generation lives in `claude.ai/design` |
| Dependencies | None — no npm packages, no Python, no external tools. Bash 3.2 compatible for test scripts. |
---
## Re-research triggers
The reference tree carries Anthropic citations that may decay. Re-research is triggered by:
- Anthropic publishing per-preset guidance for a `Community-only` or `Experimental` preset
- Anthropic announcing material changes to the Goal / Layout / Content / Audience framework, the AI-slop avoid-list, or the design grading criteria
- Anthropic adding or removing an intent preset from the launch enumeration
- A first verified `frontier-design` practitioner artifact shipping publicly
- Anthropic's `knowledge-work-plugins/design` plugin adding or removing slash-commands (scope-fence implications)
- Labs → GA URL rename for `claude.ai/design`
When a trigger fires, run `bash verify.sh --strict` after the update to confirm SC2 and SC3 still pass.
---
## Recent versions
**v0.1.0 — 2026-05-17.** Initial public release. Single skill (`claude-design-facilitator`) with eight-phase facilitation flow, 12 natural-language trigger phrases, 13 reference files (5 foundation + 8 per-preset with evidence-grade labels), `.coverage.md` preset manifest plus Authoritative-claims registry, five verification scripts under `tests/` enforcing structural integrity / scope fence / skill description quality / per-preset coverage / Anthropic-domain citation discipline / operator dogfood log format, top-level `verify.sh` roll-up with `--strict` and `--quick` flags, MIT license, GOVERNANCE.md fork-and-own model.
Full release history: [`CHANGELOG.md`](CHANGELOG.md). The plugin follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html). The path from v0.1 to v1.0 is dogfood-driven — see the plugin's `REMEMBER.md` for the v1.0 readiness criteria (multi-preset breadth, auto-fire validation in real natural-language requests, two consecutive dogfood sessions with zero critical patches).
---
## License
[MIT](LICENSE). Fork it, modify it, ship your own version under your own name.
End-to-end facilitator for prompting Claude Design (claude.ai/design — Anthropic Labs research preview launched 2026-04-17, Opus 4.7 pinned). Walks the operator from raw idea through intent-preset selection, audience and destination clarification, DESIGN.md anchor, prompt drafting using Anthropic's verbatim Goal / Layout / Content / Audience framework plus the five-layer prompt stack, copy-paste delivery, iteration coaching across the Tweak / Comment / Chat cascade, and ship-readiness handoff to Anthropic's official knowledge-work-plugins/design plugin for critique, accessibility audit, and engineering handoff. Cites Anthropic primary sources inline; refuses to generate the artifact code itself or drive the browser. Use for any work that ends with a Claude Design artifact.
Triggers on:
- "I want to build a dashboard in Claude Design"
- "help me prompt claude.ai/design"
- "make a slide deck in claude.ai/design"
- "iterate on my Claude Design artifact"
- "what should I prompt Claude Design with"
- "build a one-pager in Claude Design"
- "design a prototype in claude.ai/design"
- "refine my Claude Design output"
- "create a pitch deck in Claude Design"
- "use Claude Design"
- "draft a Claude Design prompt"
- "make wireframes in claude.ai/design"
---
# claude-design-facilitator
You are a facilitator for prompting Claude Design (`claude.ai/design`). You walk the operator from raw idea to a copy-paste-ready prompt, through iteration, to ship-readiness. You do **not** generate artifact code yourself and you do **not** drive the browser. Claude Design is where the artifact gets built; you exist to make the operator's interaction with that surface land on the first try.
You follow the phases below in order. Phases 1 through 4 are scoping and grounding; do not draft a prompt before they are done. If the operator pushes for a prompt straight away, briefly explain that a five-second alignment pass produces a one-shot prompt instead of a four-round iteration spiral, then ask the Phase 2 intent question.
All output is English. All authoritative claims about Claude Design behaviour cite Anthropic primary sources — `anthropic.com/news`, `support.claude.com`, `claude.com/blog`, `claude.com/resources/tutorials`, `claude.com/plugins`, `platform.claude.com`, `github.com/anthropics`. Community patterns are labelled as such with the source link. The reference files under `references/` carry the canonical content; this file is the flow.
---
## Phase 1 — Disambiguate the surface
Confirm the operator wants `claude.ai/design` specifically, not one of the four surfaces it is most commonly confused with: classic Artifacts at `claude.ai`, Live Artifacts in Claude Cowork, custom visuals embedded in a chat reply, or Anthropic's `knowledge-work-plugins/design` plugin (which audits already-built artifacts and does not generate them).
If the operator is clear, move on. If signals are mixed — they mention "Artifacts" or "Cowork", they describe a feature that does not exist in Claude Design (no `/rewind`, no version history, no branching), or they expect round-trip handoff back from Claude Code — read `references/00-what-claude-design-is-and-isnt.md` and walk through the relevant anti-conflation block.
---
## Phase 2 — Name the intent preset
Claude Design exposes eight intent presets. The operator picks one before drafting begins, because the prompt pattern differs per preset and the per-preset reference files are the place that pattern lives.
The eight presets, in the order they appear in Anthropic's launch enumeration (`anthropic.com/news/claude-design-anthropic-labs`, 2026-04-17):
- **wireframes-mockups** — low-fi or high-fi layout structure, pre-visual-design
- **pitch-decks** — investor or external pitch decks (note: PPTX export trap — see preset file)
- **marketing-collateral** — landing pages, social variants, visual assets
- **frontier-design** — Anthropic's "code-powered prototypes with voice, video, shaders, 3D" preset (labelled experimental in this plugin — no validated practitioner pattern as of 2026-05-16)
If the operator is uncertain which preset fits, read `.coverage.md` and the matching one-line summaries; offer the two or three that match the situation. The evidence-grade label on each preset reference file is load-bearing — surface it: `Anthropic-documented + community-validated` (designs / prototypes / slides), `Community-only` (one-pagers / wireframes-mockups / pitch-decks / marketing-collateral), or `Experimental` (frontier-design).
---
## Phase 3 — Audience and destination
Establish the audience and the destination *before* drafting the prompt. This is `@claudedesign` Anthropic-affiliated guidance: the destination format constrains the prompt because Claude Design's export options have asymmetric fidelity.
Ask:
- **Audience:** who reads or uses this artifact? Internal team, external stakeholder, investor, customer prospect, partner, user-testing participant?
- **Destination:** where does it end up? PDF (lossless for static layouts, lossy for interactive elements), PPTX (Claude reads slide master / layouts / fonts / color scheme, but text flattens to images on complex compositions — see `references/03-iteration-and-session.md` PPTX trap section), HTML standalone, Canva import, Claude Code handoff for engineering build, or share-link?
If the destination is PPTX and the preset is `pitch-decks`, flag the export trap explicitly (`moda.app/blog/claude-design-for-pitch-decks` documents the case where PPTX flattens richly-styled text to images). If the destination is Claude Code handoff, set expectation that the bundle Claude Design produces is one-way (no return path to Claude Design — see `references/04-handoff-and-scope.md`).
---
## Phase 4 — Anchor on DESIGN.md
A DESIGN.md file is the operator's leverage against Claude Design's defaults. It anchors design-system identity (colors, typography, motion, layout, do's-and-don'ts) so the model does not fall back to its convergent middle-ground aesthetic.
Read `references/02-design-md.md`. The reference file documents the community-converged 9-section canonical structure and a copy-paste extractor prompt that converts a brand URL or screenshot into a DESIGN.md.
If the operator already has a DESIGN.md, confirm it is uploaded to the Claude Design project and that the agent prompt guide section names it. If they do not have one, point at the extractor prompt — it is the highest-leverage single piece of content in this plugin.
**Evidence grade context for the operator:** Anthropic publishes the concept of DESIGN.md (`support.claude.com/en/articles/14604397-set-up-your-design-system-in-claude-design`) but not the 9-section structure. The 9-section template comes from community practitioners (`github.com/rohitg00/awesome-claude-design`, `github.com/VoltAgent/awesome-claude-design`).
---
## Phase 5 — Draft the prompt using the five-layer stack
Now draft. Open `references/01-prompt-fundamentals.md` and `references/presets/<preset>.md` for the named preset. Compose the prompt from these layers, in order:
1. **Layer 1 — Goal / Layout / Content / Audience (GLCA)** — Anthropic's verbatim framework. Source: `support.claude.com/en/articles/14604416-get-started-with-claude-design`. Every prompt to Claude Design starts here.
2. **Layer 1.5 — Start simple, layer in complexity** — Anthropic's verbatim incremental-prompting advice (same source). Do not ship a 600-word first prompt; ship a 120-word first prompt and add detail in turn two.
3. **Layer 2a — Concrete-alternative-spec house-style control** — Anthropic's verbatim guidance from `platform.claude.com/docs/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices`. Includes the AEFRM example with explicit hex palette and motion timing.
4. **Layer 2b — Propose-options-before-building** — Anthropic's verbatim prompt template asking Claude Design to surface four distinct visual directions before committing.
5. **Layer 3 — Negative constraints (the AI-slop avoid-list)** — verbatim banned items from `claude.com/blog/improving-frontend-design-through-skills` and `github.com/anthropics/skills/skills/frontend-design/SKILL.md`. Inter, Roboto, Arial, Space Grotesk; purple gradients on white; solid-color backgrounds; cookie-cutter framing; convergent middle-ground palettes; scattered micro-interactions.
6. **Layer 4 — Four design dimensions** — verbatim typography / color / motion / backgrounds guidance from `frontend-design/SKILL.md`.
7. **Layer 5 — Four design grading criteria** — Anthropic's verbatim quality criteria from `anthropic.com/engineering/harness-design-long-running-apps` (design quality, originality, craft, functionality) plus the emphasis-weighting recommendation.
On top of the five layers, layer the per-preset pattern from `references/presets/<preset>.md`. For `designs`, `prototypes`, and `slides`, this is Anthropic-published prompt material. For the other four `Community-only` presets, it is community-converged pattern with attribution. For `frontier-design`, it is honest-experimental and labelled as such.
Resist the urge to over-spec. Anthropic's own guidance is start simple, layer in complexity. Draft the layer-1+layer-2a+layer-3 composition first. Save layers 4 and 5 for the refinement turn.
---
## Phase 6 — Deliver the prompt
Output a single copy-paste-ready fenced markdown code block containing the composed prompt. No preamble, no commentary inside the block. Add a one-line caption above the block: which preset, which audience, which destination.
After the block, list three to five expected follow-up turns the operator should anticipate (e.g., "if it lands too generic, add layers 4 + 5 in turn two", "if PPTX is destination, validate the rendered text-as-text count in turn three"). This sets the iteration expectation honestly — Claude Design quality is non-monotonic across turns (`anthropic.com/engineering/harness-design-long-running-apps`).
---
## Phase 7 — Iteration coaching
When the operator returns with feedback after a Claude Design generation, you do not regenerate the prompt. You coach which surface to use next. Read `references/03-iteration-and-session.md`.
The three-surface cascade, in order of token cost:
- **Tweak panel** — controls and sliders Claude pre-derives at artifact generation time. Zero token cost. Surgical. Use for: section reordering, variant swap, density slider, spacing scale, color temperature, typography scale, padding / radius / shadow. The Anthropic-published guidance is verbatim in `references/03`.
- **Inline comments** — component-scoped edits via the comment surface. Surgical when the edit is in-component. Has an Anthropic-acknowledged vanish bug — if a comment disappears, paste the comment text into chat. Fails for new structural containers.
- **Chat** — full regeneration. Use for any structural change (add a new section), aesthetic pivot, multi-component change, or anything Claude did not pre-derive a Tweak control for. Costs one full chat turn.
Operator mantra (the synthesis from `research/04`): *anything Claude pre-derives at generation time is surgical thereafter; new controls cost one chat turn for setup.*
Session-management heuristics from `references/03-iteration-and-session.md`:
- 4-screen inflection — quality drops noticeably after the fourth screen of context in a session.
- Opus 4.7 context — quality degrades at the 40–50% context mark.
- Pro budget burns in roughly 25–30 minutes of active design; Max in roughly 60–90.
If the operator hits a stuck state, point at the recovery prompt library in `references/03-iteration-and-session.md` — the `break-default-aesthetic.md` adapted prompt, "fix the system not the prompt" pattern, edit-previous-message workaround, the 3-failed-comment escalation rule, and the model downshift escalator (Opus 4.7 → Opus 4.6 / Sonnet 4.6).
---
## Phase 8 — Ship-readiness
Before the operator declares an artifact done, run a short ship-readiness check against `references/04-handoff-and-scope.md`:
- Has the destination format been validated against the rendered output? (PPTX text-as-text count, PDF interactive-element check, HTML standalone export at target viewport.)
- If handing off to engineering: is the export bundle complete? Anthropic's handoff bundle includes a machine-readable component spec, design tokens, layout hierarchy, referenced assets, standalone HTML + inline CSS + JS, per-state screenshots, PM-annotated notes, and a stack / framework README (`anthropic.com/news/claude-design-anthropic-labs` + `support.claude.com/en/articles/14604416`).
- Is the operator aware that the Design → Code direction is one-way? Once handed off, the path back to Claude Design is lossy (screenshot → new Claude Design session).
**Downstream tool recommendation.** Once the operator has an artifact in hand and wants critique, accessibility audit, UX copy review, design-system audit, or engineering handoff guidance, recommend Anthropic's official plugin:
```
claude plugins add knowledge-work-plugins/design
```
That plugin operates on existing artifacts (Figma URLs, screenshots, copy snippets) and ships six commands: `/critique`, `/accessibility`, `/ux-copy`, `/research-synthesis`, `/design-system`, `/handoff`. It is the lifecycle complement to this one — see `references/04-handoff-and-scope.md` for the full coverage table. This plugin (claude-design) covers idea through delivered prompt through iteration coaching; `knowledge-work-plugins/design` covers everything after. There is no command overlap and no functional redundancy.
---
## What this skill never does
- It does not generate the artifact code itself. Claude Design is the artifact generator. This skill produces prompts that go into Claude Design.
- It does not automate the browser, paste prompts on the operator's behalf, or read the Claude Design canvas. The operator copies and pastes manually.
- It does not store artifact history, version artifacts, or branch between iterations. Claude Design has no version tree and this skill does not invent one.
- It does not duplicate the post-design lane covered by `knowledge-work-plugins/design`. No `/critique`, no `/accessibility`, no `/ux-copy`, no `/research-synthesis`, no `/design-system`, no `/handoff` commands.
- `references/presets/frontier-design.md` — Experimental, no validated practitioner pattern
- `.coverage.md` — preset enumeration with evidence-grade labels (the source of truth for SC2 verification)
---
## Explicit invocation
The skill name registers as the explicit slash command `/claude-design-facilitator`. Operators can either trigger by natural language (the description above is the auto-fire surface) or invoke explicitly when they want to start a facilitation session from a clean state.
This file disambiguates `claude.ai/design` from the four surfaces it is most commonly conflated with. The cost of getting this wrong is wasted iteration: applying a prompt pattern that fits Live Artifacts to Claude Design (or vice versa) produces output that misses the operator's intent, and the failure looks like a prompt problem instead of a surface problem.
Read this file when the operator's signals are mixed — they reference "Artifacts" loosely, they expect a feature that does not exist in Claude Design (like `/rewind` or a version tree), they think Claude Design audits artifacts rather than generates them, or they expect round-trip handoff back from Claude Code.
---
## 1. What Claude Design is
Claude Design is an Anthropic Labs research preview that launched on 2026-04-17 (`https://anthropic.com/news/claude-design-anthropic-labs`). It is a dedicated workspace at `claude.ai/design` for generating interactive design artifacts from a prompt.
Five properties define the surface:
- **Labs research preview, not GA.** The product can change without notice. Anthropic surfaces it under the Labs banner specifically to signal that the contract is non-stable. The URL still carries the `-anthropic-labs` slug today; a Labs → GA rename is a known re-research trigger captured in `.coverage.md`.
- **Opus 4.7 pinned.** All generations run on Opus 4.7. Operators cannot select a model from the Claude Design UI. The session inherits Anthropic's model choice for this surface (`https://anthropic.com/news/claude-design-anthropic-labs`).
- **Single HTML/React substrate.** Underneath every output is one rendering engine — HTML, React components, inline CSS — regardless of which intent preset the operator picks. The intent preset shapes prompting and export, not the underlying tech.
- **Eight intent presets exposed in the UI.** Anthropic's launch post enumerates: designs, prototypes, slides, one-pagers, wireframes-mockups, pitch-decks, marketing-collateral, frontier-design. The enumeration is the source of truth for SC2 coverage in this plugin (`https://anthropic.com/news/claude-design-anthropic-labs`).
- **Multiple export paths.** PDF (lossless for static layouts, lossy for interactive elements), PPTX (slide master / layouts / fonts honored, but text can flatten to images on complex compositions), HTML standalone, Canva import, share-link, and Claude Code handoff (machine-readable component spec + design tokens + layout hierarchy + assets + standalone HTML/CSS/JS + per-state screenshots + PM notes + framework README — verbatim per `https://support.claude.com/en/articles/14604416-get-started-with-claude-design`).
Three Anthropic-published support articles ground the surface: the get-started article (`https://support.claude.com/en/articles/14604416-get-started-with-claude-design`), the design-system setup article (`https://support.claude.com/en/articles/14604397-set-up-your-design-system-in-claude-design`), and the PowerPoint-mode conventions article (`https://support.claude.com/en/articles/13521390-use-claude-for-powerpoint`).
---
## 2. What Claude Design is NOT
### Not classic Artifacts at `claude.ai`
Classic Artifacts live in any Claude.ai chat. They appear in a side panel when Claude generates code, markdown, SVG, mermaid diagrams, or other inline outputs. Artifacts carry no intent presets, no Tweak panel, no export-to-PPTX, no Claude Code handoff bundle, no DESIGN.md anchor concept. They are a chat affordance.
Confusion happens because both surfaces produce HTML/React output and Anthropic's documentation has used "Artifacts" loosely in launch contexts. If the operator says "Artifacts" but describes intent presets or destination formats (`https://anthropic.com/news/claude-design-anthropic-labs`), they mean Claude Design. If they describe a side panel inside a chat (`support.claude.com` discusses Artifacts in the chat-product context), they mean classic Artifacts.
### Not Live Artifacts in Claude Cowork
Live Artifacts is a different Labs surface — collaborative real-time editing of artifacts inside Claude Cowork sessions. It runs in a different workspace, has different affordances (multi-cursor presence, version stream), and is a separate product line at `claude.ai/code` family (see Anthropic's Cowork-related communications). Claude Design has none of those collaborative primitives. The operator working alone in `claude.ai/design` is the canonical flow.
### Not custom visuals embedded in a chat reply
Sometimes Claude generates an inline HTML/SVG visual as part of a chat answer (a chart, a diagram, an illustration). That is a one-off chat artifact, not a Claude Design session. The prompt patterns are different (chat conversational tone vs design intent presets), the export options are different (chat artifact has Save / Copy, Claude Design has the full export matrix), and there is no Tweak panel on the chat-inline visuals.
### Not Anthropic's `knowledge-work-plugins/design` plugin
This is the most consequential conflation. Anthropic ships an official plugin at `https://claude.com/plugins/design` (`https://github.com/anthropics/knowledge-work-plugins`) with six slash-commands: `/critique`, `/accessibility`, `/ux-copy`, `/research-synthesis`, `/design-system`, `/handoff`. That plugin operates on **existing** artifacts (Figma URLs, screenshots, copy snippets). It does not generate artifacts.
The lifecycle split is clean:
- This plugin (`claude-design`) covers **pre-design and during-design** — idea → intent-preset selection → prompt drafting → copy-paste delivery → iteration coaching → ship-readiness.
There is zero command overlap (this plugin ships no commands named `/critique`, `/accessibility`, `/ux-copy`, `/research-synthesis`, `/design-system`, or `/handoff` — `tests/validate-plugin.sh` assertion (h) enforces this mechanically). Workflow recommendation: use this plugin to land the artifact in `claude.ai/design`; once the artifact exists, install Anthropic's official plugin via `claude plugins add knowledge-work-plugins/design` for downstream review and handoff.
### Not third-party clones like `jiji262/claude-design-skill`
Several third-party repos use names like `claude-design-skill`. They are independent community efforts targeting general design workflows in Claude Code, not the `claude.ai/design` surface specifically. They predate the Anthropic Labs launch in some cases. This plugin is *Claude Design facilitation* — it targets the Anthropic surface explicitly, citing Anthropic's primary sources. Verify the operator's mental model accordingly.
---
## 3. Why the distinction matters operationally
Three operational consequences flow from getting the surface identification right.
### Prompt patterns differ
Claude Design's prompt patterns are documented in `https://anthropic.com/news/claude-design-anthropic-labs`, `https://support.claude.com/en/articles/14604416-get-started-with-claude-design`, and the two per-preset tutorials (`https://claude.com/resources/tutorials/using-claude-design-for-prototypes-and-ux`, `https://claude.com/resources/tutorials/using-claude-design-for-presentations-and-slide-decks`). These prompts assume the Claude Design substrate, intent presets, and the Tweak / Comment / Chat iteration cascade. Applying them to classic Artifacts, Cowork, or an inline chat visual produces noise.
Classic Artifacts prompts (the kind used inside any `claude.ai` chat) are conversational and lean on chat affordances. Claude Design prompts use the verbatim Goal / Layout / Content / Audience framework and lean on intent presets. The frameworks do not interchange cleanly.
### Limits differ
Claude Design has its own quota economics — the operator's Max / Pro plan budget burns down at a different rate than classic chat (research/04 documents observed Pro burn of roughly 25-30 minutes of active design work, Max roughly 60-90; these are community-observed, not Anthropic-published, and may shift). Opus 4.7 quality degrades at 40-50% context (`https://anthropic.com/engineering/harness-design-long-running-apps`). A 4-screen session inflection is documented community-wide.
None of these limits apply identically to classic Artifacts or to the official `knowledge-work-plugins/design` plugin. Diagnosing a quota / quality issue requires knowing which surface is in play.
### Scope differs
The official `knowledge-work-plugins/design` plugin is the right tool for post-design critique. Trying to make this plugin (`claude-design`) emit a critique would duplicate Anthropic's command surface and add nothing. The reverse — using `knowledge-work-plugins/design` to generate the artifact — does not work because that plugin operates on artifacts that already exist.
If the operator is uncertain whether their question is pre-design or post-design, ask: *does the artifact exist yet?* If no — this plugin. If yes — Anthropic's plugin.
- The operator references a third-party repo named `claude-design-*` → ask what surface they target; likely not the Anthropic Labs preview.
If signals remain mixed after this read-through, ask one clarifying question rather than guess: *"Are you working in the dedicated Claude Design workspace at `claude.ai/design`, or somewhere else?"*
This file documents the universal prompt framework an operator applies across every Claude Design intent preset. The five layers compose into one prompt block. Layers 1 to 3 are load-bearing for every preset; layers 4 and 5 are the refinement turn.
Every authoritative claim cites an Anthropic primary source. Where community practice extends an Anthropic concept, the extension is labelled and attributed.
Anthropic's verbatim framework for every Claude Design prompt. The framework is published in the get-started support article `https://support.claude.com/en/articles/14604416-get-started-with-claude-design`. Anthropic's framing: a good Claude Design prompt names the **Goal**, the **Layout**, the **Content**, and the **Audience**, in that order, before any aesthetic specification.
The four canonical questions:
- **Goal** — what is the artifact for? "An admin dashboard for monitoring API latency", "an onboarding flow for first-time users", "a landing page that converts free trial signups". One sentence.
- **Layout** — what is the page structure? Header / hero / metrics row / table / footer; or: hero / three-feature-grid / pricing table / CTA. Name the regions.
- **Content** — what fills the regions? Real data placeholders if you have them, named labels if not. Avoid generic "lorem ipsum" — the model defaults to convergent middle-ground content if you do not constrain it.
- **Audience** — who reads or uses this artifact? Internal team, external stakeholder, B2B procurement, B2C consumer, investor. Audience determines tone, density, and aesthetic.
Anthropic publishes three verbatim canonical examples in the same support article:
```
Goal: An analytics dashboard for our customer success team
Layout: Top metrics row (4 KPIs), main chart panel, recent activity table
Content: Product name placeholder "ProductX", real headline benefit copy,
three feature blurbs (icon + headline + line)
Audience: B2B technical buyers evaluating dev tools
```
The GLCA framework is sufficient on its own for a first prompt at intent preset `designs`. For other presets, GLCA composes with the per-preset pattern in `references/presets/<preset>.md`.
The same Anthropic get-started article publishes verbatim incremental-prompting advice: do not ship a 600-word first prompt. Ship a 120-word first prompt that names GLCA, see what Claude Design produces, then add complexity in turn two and turn three.
Anthropic frames this as the dominant failure mode for first-time Claude Design operators: over-specifying the first prompt produces an output that is dense but generic. The remedy is staged — let Claude Design make its default choices, then react to what it produces with targeted constraints.
This frames how the rest of the stack composes. In turn one, ship layers 1 + 2a (or 2b) + 3. In turn two, add layer 4. In turn three, add layer 5 emphasis-weighting.
## Layer 2a — Concrete-alternative-spec house-style control
The first of two Anthropic-documented house-style controls. Verbatim guidance from `https://platform.claude.com/docs/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices`: name a concrete aesthetic family with explicit visual primitives rather than gesturing at a style.
The Anthropic-published exemplar — the AEFRM (Anthropic Engineering Field Reference Material) example — is verbatim usable:
Typography: square angular sans-serif (Söhne, Inter Variable as fallback);
no rounded glyphs; weight 500 for body, 700 for headers
Corner radius: 4px throughout — no fully rounded buttons, no pill shapes
Motion: transition: all 160ms ease-out on hover; no springy easing
Density: dense (table rows 32px tall; padding 8px on cards)
Surface: flat — no shadows, no glassmorphism
```
The control works because Claude Design reads this as a concrete brief and constrains its aesthetic decision space accordingly. Without an explicit concrete-alternative-spec, the model defaults to its convergent middle-ground aesthetic (rounded corners, generous spacing, friendly typography, gentle shadows — Anthropic's documented "AI-slop" default).
The hex palette, corner radius, and motion timing are all required — naming "industrial-utilitarian" alone is gesturing, not specifying.
The second Anthropic-documented house-style control, also from `https://platform.claude.com/docs/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices`. When the operator does not know exactly which aesthetic to brief, Anthropic publishes a verbatim prompt template asking Claude Design to propose four distinct visual directions before committing to one.
The verbatim prompt:
```
Before building the dashboard, propose 4 distinct visual directions.
For each, give:
- bg hex
- accent hex
- typeface (named, not gestured)
- one-line rationale tying the direction to the audience and goal
Wait for me to pick a direction before generating the artifact.
```
This forks the conversation: turn one returns four named directions, the operator picks one, turn two generates against the chosen direction. The cost is one extra round; the upside is the operator avoids the dead-end of generating against an aesthetic that does not fit and only finding out after generation.
Use layer 2b when layer 2a is not feasible (the operator does not yet know the aesthetic). Use layer 2a when the aesthetic is known.
Anthropic publishes a verbatim banned-items list in `https://claude.com/blog/improving-frontend-design-through-skills` and reinforces it in the open-source frontend-design skill at `https://github.com/anthropics/skills/skills/frontend-design/SKILL.md`. The list names the convergent middle-ground patterns that Claude Design defaults to when underspecified.
Anthropic's verbatim AI-slop fingerprints to avoid:
- **Typography slop:** Inter, Roboto, Arial as default body font. Space Grotesk is flagged as overused. Default to a concrete-named typeface in the brief, not a generic sans-serif.
- **Color slop:** purple gradients on white backgrounds; solid-color hero backgrounds; convergent middle-ground palettes (the muted blue-and-grey "professional" default).
- **Motion slop:** scattered micro-interactions; bouncy spring easing on hover; pulse animations on idle elements.
- **Complexity-to-vision mismatch:** ornate components on simple layouts; flat components on otherwise rich layouts.
Operator-actionable copy-paste anti-prompt block (composes with layers 1 and 2):
```
Negative constraints — do not produce any of:
- Inter, Roboto, Arial, or Space Grotesk as the primary typeface
- Purple gradients on white backgrounds
- Solid-color hero backgrounds
- Three-column feature grids with icon + headline + line
- Centered-hero-with-single-CTA layout default
- Bouncy spring easing on hover transitions
- Pulse / breathing animations on idle elements
- Glassmorphism, neumorphism, or generic "modern SaaS" defaults
If you find yourself defaulting to any of these, stop and ask me to
clarify the aesthetic before continuing.
```
Sources: `https://claude.com/blog/improving-frontend-design-through-skills` and `https://github.com/anthropics/skills/skills/frontend-design/SKILL.md`.
---
## Layer 4 — Four design dimensions to optimize
Anthropic's verbatim per-dimension guidance from `https://github.com/anthropics/skills/skills/frontend-design/SKILL.md`. Four dimensions to brief explicitly when refining beyond the first turn:
- **Typography** — name typeface, modular scale (e.g., 1.250 minor third or 1.333 perfect fourth), weight palette, line-height palette, letter-spacing for headings. Anthropic's frontend-design SKILL.md publishes specific modular scales and weight palettes verbatim.
- **Color** — beyond palette hex, specify semantic roles (background, surface, accent, muted, error, success). Define interaction states explicitly (hover, active, disabled, focus). Anthropic's guidance: avoid relying on opacity for state changes; use explicit color tokens.
- **Motion** — name easing curves (ease-out, cubic-bezier values), name durations (120ms / 160ms / 240ms tiers), name what gets animated and what does not. Anthropic's guidance: motion should clarify hierarchy and confirm interaction; avoid decorative motion.
- **Backgrounds** — flat surface vs depth, when to layer surfaces, when shadows or borders define edges. Anthropic's guidance: backgrounds carry meaning; the bare-default-white background is rarely the right choice.
In turn two of an iteration, add layer-4 dimension specs to the brief. In turn three, refine the dimension that drifted most from intent.
Anthropic publishes verbatim grading criteria for design quality in `https://anthropic.com/engineering/harness-design-long-running-apps`. Four criteria, used as emphasis weights:
- **Design quality** — does the artifact look intentional, not defaulted? Is the aesthetic coherent across regions?
- **Originality** — does the artifact avoid the convergent middle-ground? Does it surprise without being weird?
- **Craft** — does the artifact feel detailed and considered at every level — typography, spacing, alignment, hierarchy, color?
- **Functionality** — does the artifact work for its goal and audience? Would it survive a usability test or a stakeholder review?
Anthropic's emphasis-weighting recommendation: in the prompt, weight which criterion matters most for *this* artifact. A dashboard for internal use weights functionality and craft highest. A pitch deck for an external investor weights design quality and originality highest. A wireframe for early exploration weights functionality highest with craft and originality deprioritized.
Operator-actionable layer-5 block:
```
Grading criteria for this artifact, in priority order:
Optimize against this ordering. If the artifact has to trade off,
trade off the lowest-weighted criterion first.
```
The non-monotonic-improvement caveat applies — Anthropic notes that quality across iterations is not strictly increasing. If turn three is worse than turn two on a critical criterion, the recovery move is documented in `references/03-iteration-and-session.md` ("pivot to an entirely different aesthetic if the approach wasn't working").
Operator reacts to turn-1 output by adding typography modular scale, semantic color roles, motion easing, and surface-depth rules.
### Turn 3 — add layer 5 weighting
Operator specifies that craft and functionality are the two highest-weighted criteria for this dashboard; design quality is third; originality lowest.
### Turn 4+ — Tweak panel takes over
Most subsequent refinements happen in the Tweak panel (per-artifact Claude-generated controls; zero-token surgical edits) — see `references/03-iteration-and-session.md` for the surface cascade.
---
## Source map
The five layers anchor on four Anthropic primary sources plus one open-source skill:
Re-research trigger: any of the four URLs returning 404 or shifting content materially; Anthropic publishing a sixth layer or revising any of the five. Captured-on date 2026-05-16 — the layer-1 framework has been stable since the launch announcement.
**Evidence grade:** Community-converged — Anthropic publishes the *concept* of a design-system document anchored to a Claude Design project (`https://support.claude.com/en/articles/14604397-set-up-your-design-system-in-claude-design`), but does not publish the 9-section canonical structure. The 9-section template comes from community practitioners (`https://github.com/rohitg00/awesome-claude-design`, `https://github.com/VoltAgent/awesome-claude-design`). Use accordingly: the concept is Anthropic-authoritative; the structure is community-converged.
---
## 1. Why DESIGN.md
A DESIGN.md file uploaded to a Claude Design project anchors design-system identity for every artifact generated in that project. Without an anchor, Claude Design defaults to its convergent middle-ground aesthetic (rounded corners, generous spacing, friendly typography, gentle shadows — the AI-slop pattern documented at `https://claude.com/blog/improving-frontend-design-through-skills`). With an anchor, the model reads the file at generation time and constrains its aesthetic, component, and motion decisions to match.
Anthropic publishes the concept of design-system anchors in `https://support.claude.com/en/articles/14604397-set-up-your-design-system-in-claude-design`. The article describes asset uploads, brand kits, and the principle that artifacts in a Claude Design project inherit the project's design language. What Anthropic does *not* publish is a recommended structure for the design-language file itself.
The community converged on a 9-section structure — documented across multiple awesome-claude-design repos, Substack posts, and practitioner blogs — that maps cleanly onto how Claude Design reads design context. The sections below are that converged structure.
---
## 2. The 9-section canonical structure
Each section names a decision Claude Design will otherwise default. The order is the order Claude Design appears to read most reliably (heaviest design-decision sections first).
### Section 1 — Visual Theme & Atmosphere
A one-paragraph description of the aesthetic family. Use named visual references the model can anchor to: "industrial-utilitarian like a Bloomberg terminal", "warm-editorial like The New York Times opinion section", "minimal-monochrome like Linear's UI".
flat surfaces. Reference: a modern data-tool UI (Linear, Datadog,
Bloomberg) — dense, intentional, no flourish. The product should look
like it was built for engineers by engineers.
```
### Section 2 — Color Palette & Roles
CSS-variable form with explicit hex values and semantic roles. Avoid relying on opacity for state changes — name each state explicitly.
Worked example:
```markdown
# Color Palette & Roles
:root {
--color-bg: #E9ECEC;
--color-surface: #C9D2D4;
--color-muted: #8C9A9E;
--color-fg: #44545B;
--color-ink: #11171B;
--color-accent: #4A6FA5;
--color-accent-hover: #3D5C8A;
--color-accent-active: #2F4A70;
--color-error: #B23A48;
--color-warning: #C89B3F;
--color-success: #4F7A4F;
}
Semantic roles:
- bg — page background
- surface — card / panel background
- muted — secondary text, borders
- fg — primary text
- ink — emphasis / heading text
```
### Section 3 — Typography Rules
Named typeface, modular scale, weight palette, line-height palette. Modular scales the community converged on are 1.250 (minor third) for dense interfaces and 1.333 (perfect fourth) for marketing pages.
Worked example:
```markdown
# Typography Rules
Primary typeface: Söhne (concrete-named, not "modern sans-serif").
Fallback: Inter Variable.
Display typeface: same as primary (no separate display face).
Per-component rules for the components Claude Design generates. Cover buttons, inputs, cards, tables, navigation. Specify radius, padding, border treatment, hover/active/disabled state explicitly.
Worked example:
```markdown
# Component Stylings
Buttons:
- radius: 4px (no pill shapes)
- padding: 8px 16px
- primary: bg accent, fg surface
- secondary: border 1px muted, bg transparent
- hover: bg accent-hover
- active: bg accent-active
- disabled: opacity 0.4, no pointer events
Inputs:
- radius: 4px
- padding: 8px 12px
- border: 1px solid muted
- focus: border accent + 2px outset ring at accent + 20% alpha
Cards:
- radius: 4px
- padding: 16px
- bg surface
- no shadow — borders define edges if needed
Tables:
- row height: 32px (dense)
- cell padding: 8px
- alternating row: bg + 4% darken
- hover row: bg + 8% darken
```
### Section 5 — Layout Principles
Grid system, spacing scale, breakpoint widths. Name the grid columns and the gap value.
Worked example:
```markdown
# Layout Principles
Grid: 12-column on screens >= 1024px, 8-column on screens 768-1023px,
4-column on screens < 768px.
Gap: 16px (--space-md).
Spacing scale:
--space-xs: 4px
--space-sm: 8px
--space-md: 16px
--space-lg: 24px
--space-xl: 32px
--space-2xl: 48px
Page max-width: 1440px centered.
Container padding: 24px on screens >= 768px, 16px below.
```
### Section 6 — Depth & Elevation
Surface depth rules. Most designs benefit from a clear flat-vs-layered decision rather than a mixed palette.
Worked example:
```markdown
# Depth & Elevation
Flat. No box-shadows by default. Borders define component edges.
Z-stack:
z-0: page surface
z-10: navigation
z-20: dropdown / popover
z-30: modal backdrop
z-40: modal content
z-50: toast / notification
Modal: bg surface, 1px border ink, no shadow.
Popover: bg surface, 1px border muted, no shadow.
```
### Section 7 — Do's and Don'ts
Explicit constraint list. This is where layer-3 (AI-slop avoid-list) gets project-specific.
Worked example:
```markdown
# Do's and Don'ts
Do:
- Use accent color sparingly (CTA + active state only)
- Use ink for headings, fg for body, muted for secondary
- Use 4px corner radius consistently
- Use 160ms ease-out for all hover transitions
Don't:
- Use purple gradients on white backgrounds
- Use solid-color hero backgrounds
- Use Inter / Roboto / Arial / Space Grotesk as primary typeface
- Use bouncy spring easing
- Use pulse / breathing animations on idle elements
- Use glassmorphism or neumorphism
- Use shadows except where the depth-and-elevation section explicitly permits
```
### Section 8 — Responsive Behavior
Breakpoint behavior. Anthropic does not publish responsive rules; this is the section where a project encodes its responsive philosophy.
Worked example:
```markdown
# Responsive Behavior
Mobile-first reasoning, but built desktop-first (target audience is
desktop). Below 768px:
- Navigation collapses to single icon-button row
- Tables become card stacks (one card per row)
- 12-column grid becomes 4-column
- Container padding drops from 24px to 16px
- Font sizes scale down by 0.9x
Touch targets: minimum 44px height (regardless of viewport).
```
### Section 9 — Agent Prompt Guide
A short block telling Claude Design how to use this DESIGN.md. This section is the bridge between the file and the prompt — it names the file by section and reminds the model that the constraints are load-bearing.
Worked example:
```markdown
# Agent Prompt Guide
When generating an artifact in this project, read every section of this
DESIGN.md before producing output. Treat color palette, typography rules,
component stylings, and layout principles as constraints — not
suggestions. If a generation would violate a constraint, stop and ask
which constraint to relax.
Cite specific section names when justifying design decisions in
explanatory text (e.g., "I chose 4px corners per Section 4 Component
Stylings").
```
---
## 3. Brand-to-DESIGN.md extractor prompt
A copy-paste-ready prompt the operator pastes into `claude.ai` (the chat product) or Claude.com to convert a brand URL, screenshot, or marketing asset into a DESIGN.md. The pattern comes from `https://github.com/rohitg00/awesome-claude-design/blob/main/prompts/brand-to-design-md.md` (adapted with attribution to the awesome-claude-design community template).
```
You will produce a DESIGN.md file by analyzing the brand reference
materials I provide. Output structure: 9 sections, in this order,
with the exact heading names:
1. Visual Theme & Atmosphere
2. Color Palette & Roles
3. Typography Rules
4. Component Stylings
5. Layout Principles
6. Depth & Elevation
7. Do's and Don'ts
8. Responsive Behavior
9. Agent Prompt Guide
Rules:
- Use CSS-variable form (--color-name: #HEX) for color palette
- Use modular scale form (--text-xs / --text-sm / ...) for typography
- Use named typefaces ONLY — if you cannot identify the typeface from
the brand materials with high confidence, write "unknown — operator
to fill in" rather than guessing. Do not hallucinate a typeface name.
- For each component (button, input, card, table, navigation), name
radius, padding, border treatment, and hover / active / disabled states
- Cite the source brand material at the top (e.g., "Extracted from
brand kit at <URL> on 2026-05-17")
Brand reference materials:
[paste URL, screenshot, or brand kit description here]
```
The anti-hallucination clause ("if you cannot identify... write unknown") is load-bearing — the failure mode without it is plausible-but-wrong typography names that the operator does not catch until they brief Claude Design and the output drifts.
Source for this extractor: community pattern at `https://github.com/rohitg00/awesome-claude-design/blob/main/prompts/brand-to-design-md.md`, adapted.
---
## 4. DESIGN.md failure modes
Four failure modes the operator should know before adopting DESIGN.md as a workflow primitive.
### Vision-token cost penalty
If the DESIGN.md is uploaded as an image (screenshot of a brand page) rather than as text, Claude Design pays a vision-token cost on every generation. Practitioner walkthroughs (Xinran Ma's documented experience in `research/03`) report meaningful quota burn from image-based DESIGN.md anchors. Use text-form DESIGN.md whenever possible.
### Per-user quota not pooled
Anthropic does not pool Claude Design quota across team members. A team using a shared DESIGN.md will not share quota burn — each member pays separately. Plan project workflow accordingly (research/04 documents community-observed Pro burn of roughly 25-30 minutes; Max roughly 60-90).
### Long-session degradation
DESIGN.md adherence appears to degrade in the back third of a long session. Opus 4.7 quality drops at the 40-50% context mark (`https://anthropic.com/engineering/harness-design-long-running-apps`); DESIGN.md is part of context, so its enforcement weakens accordingly. Session-break heuristics in `references/03-iteration-and-session.md` apply.
### Post-export drift
When an artifact is exported (PPTX, PDF, Code-handoff), the DESIGN.md does not travel with it. Downstream editing tools — PowerPoint, Adobe, Code IDEs — apply their own defaults. Validate the rendered output against DESIGN.md after export.
---
## 5. DESIGN.md sanity-check pattern
A community-converged test for DESIGN.md adherence: generate three artifacts in the same Claude Design project using deliberately different intent presets (designs, slides, one-pagers) and confirm that all three respect the DESIGN.md color palette, typography, and component-styling rules. If one preset drifts more than the others, the DESIGN.md needs sharpening on the section the drifting preset emphasized.
The community attribution for this pattern comes from a `theadpharm` Substack walkthrough (cited in `research/03`). It is a smoke test, not a guarantee — but it catches the common case where DESIGN.md is too general to constrain the model.
- `https://github.com/rohitg00/awesome-claude-design` — community awesome-list, brand-to-DESIGN.md extractor source
- `https://github.com/VoltAgent/awesome-claude-design` — community awesome-list, alternate 9-section structure references
Re-research trigger: Anthropic publishing an official DESIGN.md structure; either community awesome-list reaching consensus on a different section ordering; new failure mode surfaced in practitioner posts.
This file documents three things in order of operational urgency: which iteration surface to use when, when to break a session, and how to recover when iteration stops landing. The cost asymmetry between Tweak / Comment / Chat is the single largest leverage point in a Claude Design workflow.
---
## 1. The three-surface cascade
Claude Design exposes three edit surfaces with asymmetric token costs and asymmetric scope:
| Surface | Token cost | Scope | When to use |
|---------|-----------|-------|-------------|
| Tweak panel | Zero | Surgical, per-control | Anything Claude pre-derived at generation time |
| Inline comment | Zero on success | Component-scoped, in-component | Targeted in-component text or visual change |
| Chat | One full turn | Whole artifact | Structural change, aesthetic pivot, new section |
### Tweak panel
The Tweak panel is the per-artifact set of controls and sliders Claude pre-derives during generation. Each artifact comes with its own Tweak surface — section reordering, variant swap, density slider, spacing scale, color temperature, typography scale, padding / radius / shadow. The controls are surgical and zero-token: applying them does not consume a chat turn or budget time. They are also lossy-free — the artifact does not regenerate; the controls operate on the existing render.
Tweak panel coverage is per-artifact. If Claude did not pre-derive a control for a dimension, that dimension is not Tweak-editable. The first move is always to check the Tweak panel: most dimensions an operator wants to refine after the first generation are already there.
Operator-actionable mantra (the synthesis from `research/04`):
> Anything Claude pre-derives at generation time is surgical thereafter; new controls cost one chat turn for setup.
### Inline comments
The comment surface lets the operator click anywhere on the rendered artifact and attach a directive — "make this section narrower", "use a darker shade for this header", "remove this icon". Comments are surgical when the change is in-component (text edit, color tweak, sizing within an existing container) and they cost zero tokens on success.
Two failure modes the operator should know:
1. **Vanish bug** — comments sometimes disappear after submission with no edit applied. Anthropic has acknowledged this (community-cited; the workaround is to paste the comment text directly into chat as a follow-up turn).
2. **Structural-container failure** — comments cannot add a new structural container (a new section, a new column, a new modal). The model interprets the directive but produces no change, or makes an irrelevant change. For new containers, escalate to chat.
### Chat
Chat is the full-regeneration surface. Any structural change, aesthetic pivot, multi-component change, or new section requires a chat turn. The artifact regenerates against the new prompt; previous Tweak panel and comment state may not survive intact.
Chat costs one full turn — count it against the session budget. Use the layer-1-through-5 framework (`references/01-prompt-fundamentals.md`) for the chat prompt rather than free-form natural language.
### Per-operation surgical-vs-regen catalogue
A practical lookup for which surface fits which operation (synthesized from `research/04`):
| Operation | Surface | Notes |
|-----------|---------|-------|
| Section reordering | Tweak | Pre-derived if Claude includes a section-order control |
| Variant swap (component variant A → B) | Tweak | If Claude generated multiple variants |
| Density slider (compact / cozy / comfortable) | Tweak | Common Tweak control |
| Spacing scale (--space-* token shift) | Tweak | Common Tweak control |
| Color temperature (warmer / cooler) | Tweak | If Claude derives this dimension |
| Typography scale (modular scale shift) | Tweak | If Claude derives this dimension |
| Padding / radius / shadow per component | Tweak | Common Tweak controls |
| Text edit in existing component | Comment | Surgical, in-component |
| Color tweak in existing component | Comment | Surgical, in-component |
| Add a new section | Chat | Structural — Tweak / Comment cannot do this |
| Aesthetic pivot (industrial → editorial) | Chat | Full regen — name the new aesthetic |
| Multi-component change (revise hero + CTA + footer together) | Chat | Full regen — too broad for Comment |
| New interaction state (hover / disabled / active) | Chat | Structural — requires regeneration |
Anthropic's verbatim framing in `https://support.claude.com/en/articles/14604416-get-started-with-claude-design`: the Claude Design canvas distinguishes between *surgical edits* (per-element changes that do not regenerate the artifact) and *structural edits* (new components, new layouts, aesthetic pivots that require regeneration). The Tweak panel and inline comments are surgical surfaces; chat is the structural surface.
The operator's job is to identify which kind of edit a given change is *before* picking the surface. A surgical change attempted via chat regenerates the whole artifact and burns a turn; a structural change attempted via comment fails silently and wastes time. Misclassification is the dominant inefficiency in a long Claude Design session.
Anthropic publishes a verbatim refine-vs-pivot guideline in `https://anthropic.com/engineering/harness-design-long-running-apps`:
> Pivot to an entirely different aesthetic if the approach wasn't working — iteration within a bad direction compounds the failure.
The companion warning, also verbatim from the same source: design quality is **non-monotonic** across iterations. Turn 4 can be worse than turn 3 on a critical criterion. The framing matters because the operator's intuition pushes toward continued refinement; the discipline is to recognize a stuck state and pivot.
Operational signal that a pivot is needed (community-converged from `research/04`):
- Three consecutive comments have failed to land
- The aesthetic is drifting back to the AI-slop default on each regeneration
- The operator finds themselves explaining what they *don't* want more than what they *do* want
- The artifact is converging on a different audience than the brief
Pivot move: rewrite the layer-2 aesthetic-family specification entirely, then ship a fresh chat turn against the new family. Do not try to incrementally edit out of a stuck state.
Four heuristics — one Anthropic-published, three community-converged — govern when to break a session.
### 4-screen inflection (community-converged)
Practitioners across multiple posts in `research/04` document a quality inflection around the fourth screen of context in a Claude Design session. Before screen four, edits land cleanly; after, comments start vanishing, aesthetic defaults creep back, and Tweak controls feel less precise. The exact mechanism is unclear (context-window pressure on Opus 4.7 + cumulative DESIGN.md re-reads + cumulative artifact history), but the pattern is consistent.
Practitioner mantra: **at screen four, save what you have and start a new session.**
### Opus 4.7 context degradation (Anthropic-published)
`https://anthropic.com/engineering/harness-design-long-running-apps` publishes the verbatim observation: Opus 4.7 quality degrades noticeably at the 40-50% context-window mark. Claude Design sessions accumulate context faster than chat sessions (each generation includes the artifact in context for subsequent turns); the 40-50% mark arrives sooner.
### Quota burn (community-observed)
Practitioner walkthroughs cited in `research/04` report quota burn rates as of 2026-04-28 per MindStudio's documented walkthrough — these are community observations, not Anthropic-published limits and may shift:
- **Pro plan:** ~25-30 minutes of active design before quota becomes the binding constraint
- **Max plan:** ~60-90 minutes of active design before quota becomes the binding constraint
These numbers assume continuous active design (chat turns, regenerations, image-form DESIGN.md anchors). Tweak panel and comment surface usage does not burn quota.
Captured-on date: 2026-04-28 per `research/04`. Not an Anthropic-published limit.
### Session-break triggers (community-converged)
Three signals that a session has reached its productive end:
- Reorder / density Tweak controls stop landing (the model is not respecting the surgical surface)
- Chat re-introduces previously-removed defaults (the model is losing the negative constraints)
- The operator finds themselves repeating the same constraint in three consecutive turns
When two of three trigger together, break the session.
---
## 5. Context-reset prompt
When the operator needs to break a session but does not want to lose what worked, the verbatim community pattern from MindStudio (2026-04-28, cited in `research/04`):
```
Before we continue, summarize the design system and component decisions
we've made in this session as a structured markdown document I can use
- decisions we made and then reversed (so I don't reintroduce them)
- anything we tried that did not work
```
Paste the produced markdown into a new Claude Design session as the opening context, alongside the original DESIGN.md. The new session starts with the cumulative decisions but a fresh context window.
Captured-on date: 2026-04-28.
---
## 6. Recovery prompt library
Five recovery moves, listed in escalating cost order.
### 6.1 — Break the default aesthetic
The highest-leverage single content asset in this plugin. Adapted with attribution from `https://github.com/rohitg00/awesome-claude-design/blob/main/prompts/break-default-aesthetic.md`. Use when the artifact has drifted toward AI-slop defaults despite negative constraints in the brief.
```
The current direction has converged on a generic default. I want a
distinct visual direction. Constraints:
1. Pick ONE aesthetic family and commit to it. Name a concrete reference
(an existing product, an editorial source, a design movement). No
"modern SaaS", no "clean", no "minimal" as the named family — those
are defaults, not directions.
2. Do not produce any of:
- Inter, Roboto, Arial, Space Grotesk as primary typeface
- Purple gradients on white backgrounds
- Three-column feature grids with icon + headline + line
3. Before generating: list four candidate directions matching the goal
and audience. For each:
- Aesthetic family (with concrete reference)
- Color palette in hex
- Typeface (named)
- One-line rationale tying it to goal + audience
Wait for me to pick one. Do NOT default to "the most common modern
approach."
4. The aesthetic should surprise without being weird. If you're tempted
to write "professional" or "balanced" or "approachable", stop.
Those words signal default-mode reasoning.
```
### 6.2 — Fix the system, not the prompt
Community pattern: when iteration is stuck, the prompt is rarely the problem. The DESIGN.md is. Reopen the DESIGN.md, audit the section the artifact is drifting on (typography, color, components), tighten that section, re-upload, then re-generate.
The instinct is to add more constraints to the chat prompt. The discipline is to fix the upstream anchor.
### 6.3 — Edit previous message rather than send a new one
Community-documented workaround for context-bloat: when the previous prompt almost worked but missed one detail, edit the previous message rather than send a new turn. Claude Design re-generates from the edited message without adding to context. This is in `research/04` as a low-cost recovery move for the case where a single-word change would have fixed the output.
### 6.4 — 3-failed-comment escalation rule
If three consecutive inline comments fail to land, stop commenting and escalate to chat. The comment surface is signaling that the model is not in a state to respect surgical edits — either the artifact has drifted too far from the brief, the context window is pressured, or the change is actually structural and was misclassified.
Escalation move: paste the failed comment text directly into a chat turn, prefaced with "the inline comment surface is not landing on this; please apply this change via regeneration".
### 6.5 — Model downshift escalator
When Opus 4.7 generations are non-monotonic in quality and Tweak / Comment / chat moves all stop landing, the recovery move is to start a fresh session at a different model. The downshift sequence community-converged on (per `research/04`):
- Opus 4.7 → Opus 4.6 (same family, less context-pressure-sensitive)
- Opus 4.6 → Sonnet 4.6 (faster, less context-sensitive, sometimes better at constraint-following on tight briefs)
Claude Design pins to Opus 4.7. The downshift happens by moving the work to a different Anthropic surface (Claude.com chat with a model picker) for the constraint-tightening turn, then bringing the result back to Claude Design as a new session anchor.
### 6.6 — Verbal save-pattern
When the operator wants to preserve what works but try a different direction without losing the current state, the community pattern is to **verbally save** in chat:
```
Save what we have. The current direction is good but I want to explore
a completely different aesthetic for comparison. Acknowledge this save,
then start fresh on a new direction without referencing the saved state.
We may come back to it.
```
The "save" is verbal — Claude Design has no version-tree primitive — but it signals to the model that the previous direction is preserved in the operator's mental model and the next turn is exploratory.
---
## 7. What Claude Design lacks
Four primitives that exist in adjacent Anthropic surfaces but not in Claude Design today. The plugin must never promise these:
- **No `/rewind`** — Anthropic Code has a `/rewind` primitive that reverts to a prior conversational state. Claude Design does not.
- **No version history** — there is no Tweak-history, no Comment-history, no chat-thread-fork primitive. The verbal save-pattern (Section 6.6) is the closest substitute.
- **No two-way handoff** — once an artifact is exported to Claude Code, there is no path back into Claude Design. Re-import requires a screenshot → new Claude Design session (lossy). See `references/04-handoff-and-scope.md`.
- **No branching** — Claude Design cannot fork a session into parallel directions and compare. The verbal save-pattern is the only branching primitive.
When the operator asks for any of these, name the constraint and offer the closest substitute (verbal save-pattern, multi-session-with-context-reset-prompt, manual screenshot archive).
- `https://github.com/rohitg00/awesome-claude-design` — community recovery prompts, break-default-aesthetic source
- `https://claude.com/blog/improving-frontend-design-through-skills` — AI-slop avoid-list applied during recovery
Re-research trigger: Anthropic publishing version-history or branching primitives; community 4-screen inflection no longer reproducing; quota mechanics shifting (Pro / Max minute counts have a 2026-04-28 captured-on date and are community-observed, not Anthropic-published).
This file documents two things: how Claude Design hands artifacts off to downstream tools (Claude Code, PowerPoint, PDF, Canva), and how this plugin (`claude-design`) fits next to Anthropic's official `knowledge-work-plugins/design`. The scope fence is load-bearing — getting it wrong duplicates Anthropic's command surface and adds nothing.
---
## 1. Handoff bundle contents
When the operator chooses Claude Code handoff as the destination, Claude Design produces a bundle containing — verbatim per `https://anthropic.com/news/claude-design-anthropic-labs` and `https://support.claude.com/en/articles/14604416-get-started-with-claude-design`:
- **Machine-readable component spec** — a JSON-shaped description of the components in the artifact, with names, props, and variants
- **Design tokens** — colors, typography, spacing, radii in token form (CSS variables or JSON tokens)
- **Layout hierarchy** — the page / screen structure as a tree
- **Referenced assets** — images, icons, fonts referenced in the artifact, bundled
- **Standalone HTML + inline CSS + JS** — a self-contained render that runs without Claude Design
- **Per-state screenshots** — visual snapshots of each interaction state (default, hover, active, disabled, focused)
- **PM-annotated notes** — annotations Claude Design surfaces about design decisions, edge cases, and trade-offs
- **Stack / framework README** — a guide to which framework conventions the artifact assumes (e.g., React + Tailwind, or vanilla HTML)
The bundle is generated once on export. It does not regenerate when the operator iterates the artifact further inside Claude Design — the operator must re-export to pick up changes.
Sources: `https://anthropic.com/news/claude-design-anthropic-labs` and `https://support.claude.com/en/articles/14604416-get-started-with-claude-design`.
---
## 2. Direction is one-way
The Design → Code handoff direction is one-way. Once the bundle is exported and the operator starts iterating in Claude Code (or any code editor), there is no return path to Claude Design. The component spec, design tokens, and standalone HTML continue to live in the code repository; Claude Design has no concept of "re-ingest from code".
If the operator wants to visit the visual surface again after engineering iteration, the only path is:
1. Screenshot the current Claude Code render
2. Open a new Claude Design session
3. Paste the screenshot as the starting visual reference
4. Brief Claude Design from scratch using layer-1-through-5 framework
This is lossy: the design tokens, component spec, and PM notes from the original bundle do not travel into the new Claude Design session. The new session inherits only what the screenshot communicates.
Operational consequences:
- **Finalize visual decisions inside Claude Design before exporting.** The Tweak panel and inline comments are free; chat turns inside Claude Design are budget-priced; engineering iteration in code is budget-free but the visual round-trip is one-way. Order accordingly.
- **Export once, intentionally.** Bundling everything in a single export (per Section 6 below) costs one chat turn; bundling screen-by-screen costs N turns and consumes budget faster.
- **Plan for asymmetric revisit.** When the engineering implementation diverges from the design intent and the operator wants a designer review, schedule that revisit as a fresh Claude Design session, not as an extension of the original session.
Practitioner consensus on this point is documented at `https://claudefa.st/blog/guide/mechanics/claude-design-handoff` (community source). Anthropic frames the same one-way property implicitly in the get-started article — the handoff is described as an export, not a connection.
---
## 3. Workflow recommendation
The recommended flow for any Claude Design artifact destined for engineering implementation:
1. **Iterate visually in Claude Design until the artifact is shippable.** Use Tweak panel and inline comments first; chat turns for structural and aesthetic changes.
2. **Validate the destination format before exporting.** If destination is PPTX, verify the text-as-text count (see Section 6 token cost trap). If destination is HTML standalone, render it in the target browser at the target viewport. If destination is PDF, check the interactive-element handling.
3. **Export the full bundle once.** Bundle all screens in one export, not per-screen. The token cost trap (Section 6) compounds with per-screen exports.
4. **Iterate engineering inside Claude Code or the code editor.** Use Claude Code's `/edit` and chat surfaces. Pull the design tokens from the bundle into the repository's styling layer.
6. **If a visual revisit becomes necessary later, accept the one-way cost.** Open a new Claude Design session against a screenshot; do not try to re-extend the original session.
This is the flow per Section 4's scope-fence reasoning: this plugin covers the upstream lifecycle; Anthropic's covers downstream.
---
## 4. Scope fence vs Anthropic's `knowledge-work-plugins/design`
Anthropic ships an official Claude Code plugin at `https://claude.com/plugins/design` (source: `https://github.com/anthropics/knowledge-work-plugins`). It is skill-driven, Apache 2.0 licensed (MIT-equivalent), and ships six slash-commands operating on **existing** artifacts (Figma URLs, screenshots, copy snippets).
| Research synthesis | — | ✓ `/research-synthesis` |
| Design-system audit | — | ✓ `/design-system` |
| Engineering handoff | — | ✓ `/handoff` |
There is no functional overlap. This plugin produces prompts that go into Claude Design; Anthropic's plugin operates on artifacts that already exist. The split is clean by design — both plugins document the other as the lifecycle complement.
**Forbidden command-name list.** This plugin must NOT ship slash-commands with any of these names (with or without a `claude-design:` namespace prefix):
- `/critique`
- `/accessibility`
- `/ux-copy`
- `/research-synthesis`
- `/design-system`
- `/handoff`
`tests/validate-plugin.sh` assertion (h) enforces this mechanically. The rationale is collision-avoidance — if both plugins are installed and both ship `/critique`, command resolution becomes ambiguous and one or the other silently fails. The cleaner solution is: this plugin does not own those commands.
---
## 5. Recommended downstream tool
When the operator finishes the Claude Design lifecycle (artifact exists, exported, ready for review), surface the downstream tool installation as the next step:
```
claude plugins add knowledge-work-plugins/design
```
In a new Claude Code session with that plugin installed:
- Run `/critique <path-or-URL>` to get a design critique
- Run `/accessibility <path-or-URL>` for a WCAG audit
- Run `/ux-copy <path-or-URL>` for copy review
- Run `/research-synthesis` if the operator has user-research notes to synthesize
- Run `/design-system <path-or-URL>` for design-system consistency check
- Run `/handoff <path-or-URL>` for engineering-handoff guidance
Sources: `https://claude.com/plugins/design` and `https://github.com/anthropics/knowledge-work-plugins`.
The plugin is Apache 2.0, free, and maintained by Anthropic. There is no commercial trade-off; it is the canonical downstream tool.
---
## 6. Token cost trap — bundle all screens in one export
Practitioner-documented failure mode: exporting screen-by-screen instead of bundling all screens in one export. The community-cited reference is `token-budget-claude-design.md` (cited in `research/04` as one of the highest-leverage cost-management items).
The mechanism: each export turn passes the current artifact state through Opus 4.7 to produce the bundle. For an N-screen artifact, N separate exports run N separate bundle generations and burn N chat turns. Bundling all N screens in a single export runs one bundle generation against the cumulative state and burns one chat turn.
The bundling prompt pattern:
```
Generate the full export bundle covering all N screens of this artifact:
Include for each screen: HTML standalone, design tokens, component spec,
per-state screenshots, PM notes. Bundle as a single download.
```
Multi-screen artifacts (prototypes, slide decks, multi-page landing pages) benefit most from this discipline. Single-screen artifacts (a single dashboard, a single one-pager) are not affected because there is only one bundle to generate.
If the operator has already paid the per-screen cost and noticed mid-flight, the recovery is to abandon partial exports and run one final bundling export against the cumulative artifact state.
The `designs` intent preset is Claude Design's generic generation mode. It covers dashboards, components, layouts, and design explorations that do not fit into one of the more specialised presets (prototypes, slides, one-pagers, etc.). It is the preset operators reach for when the goal is "produce a high-quality visual artifact" rather than a destination-shaped artifact.
This file documents the `designs` preset across six dimensions: what it is, when to use it, Anthropic's published prompt patterns, community uplift, critical caveats, and one end-to-end worked prompt.
---
## (a) What this preset is
Anthropic's launch post (`https://anthropic.com/news/claude-design-anthropic-labs`) describes `designs` as the default-mode preset — the substrate every other preset effectively inherits from, with destination shaping layered on top. Output is HTML + React + inline CSS, viewable in the Claude Design canvas, exportable to PDF / HTML standalone / Code-handoff.
Two Anthropic primary sources ground this preset:
- The Anthropic-engineering blog `https://anthropic.com/engineering/harness-design-long-running-apps` publishes the four design grading criteria (design quality, originality, craft, functionality) that the `designs` preset is optimised against.
- The frontend-design open-source skill at `https://github.com/anthropics/skills/skills/frontend-design/SKILL.md` documents Anthropic's verbatim Design-Thinking Framework — **Purpose**, **Tone**, **Constraints**, **Differentiation** — and the verbatim AI-slop avoid-list.
The frontend-design skill is the closest thing Anthropic publishes to a `designs`-preset system prompt. Read it whenever the operator wants to understand what Claude Design is internally optimising for.
---
## (b) When to use it
Pick `designs` when the goal is generic, exploratory, or composite. The decision matrix:
| Interactive product flow for usability testing | prototypes |
| Presentation for stakeholders | slides |
| Single-page memo or leave-behind | one-pagers |
| Low-fi structural layout for early review | wireframes-mockups |
| Investor / external pitch | pitch-decks |
| Landing page, social variant, marketing asset | marketing-collateral |
| Code-powered prototype with voice / video / shaders / 3D | frontier-design (experimental — see preset file) |
If the operator is uncertain between `designs` and `prototypes`, the distinguishing question is: **is this for usability testing?** Yes → prototypes. No → designs.
If uncertain between `designs` and `marketing-collateral`, the distinguishing question is: **is this destined for a marketing surface (landing page, social, ad)?** Yes → marketing-collateral. No → designs.
---
## (c) Anthropic-published prompt patterns
### The Design-Thinking Framework (verbatim from frontend-design/SKILL.md)
Anthropic's `https://github.com/anthropics/skills/skills/frontend-design/SKILL.md` publishes the verbatim four-part framework Claude Design uses when reasoning about a design:
- **Purpose** — what is the artifact for? Match every aesthetic decision to the purpose.
- **Tone** — what emotional register fits the audience and the purpose? Energetic, calm, authoritative, playful, terse?
- **Constraints** — what cannot be changed? Brand colors, typeface restrictions, layout rules, accessibility minimums.
- **Differentiation** — what makes this artifact distinct from the convergent middle-ground default? Name the differentiation explicitly.
Use this framework as a pre-brief check before composing a layer-1-through-5 prompt (see `../01-prompt-fundamentals.md`). If any of the four parts is fuzzy, sharpen it before drafting.
### Verbatim AI-slop avoid-list
Anthropic's frontend-design skill + the blog post `https://claude.com/blog/improving-frontend-design-through-skills` publish the verbatim banned-items list used in layer 3 of the prompt stack. See `../01-prompt-fundamentals.md` Section "Layer 3" for the full list. The `designs` preset inherits this list — it is not optional.
### Anthropic's verbatim canonical examples
The Anthropic get-started article `https://support.claude.com/en/articles/14604416-get-started-with-claude-design` publishes three verbatim canonical examples (dashboard, mobile onboarding, landing page) demonstrating the Goal / Layout / Content / Audience framework. Read them as the reference shape for a first prompt against `designs`. Reproduced in full in `../01-prompt-fundamentals.md` Section "Layer 1".
---
## (d) Community uplift
Three community-converged patterns extend Anthropic's published material for the `designs` preset.
### Real-data injection over lorem ipsum
Victor Dibia's documented pattern (`research/03`): substitute realistic placeholder content rather than lorem ipsum. The model defaults to convergent middle-ground content when content is unspecified; named placeholders ("Today's MRR: $48,200", "Last 24h error rate: 0.12%") anchor the model to real-shaped output.
For dashboards specifically: use realistic metric values, realistic timestamps, realistic user names. The visual difference between a chart with `$3,200` / `$4,500` / `$2,800` and a chart with `$XXX` / `$YYY` / `$ZZZ` is large — Claude Design will infer typography spacing and component sizing from the named values.
### Explicit modular scale and weight palette
Community pattern (research/03): name the typographic modular scale and weight palette in the brief rather than letting the model default. The `1.250` (minor third) scale fits dense informational artifacts; the `1.333` (perfect fourth) scale fits marketing pages. Weight palettes converge on `500 body / 600 emphasized / 700 headings`.
### Specify the negative aesthetic family
Beyond layer-3 negative constraints (which name specific banned items), community practice (research/03) is to name an entire negative aesthetic family — "not modern SaaS", "not playful illustrated", "not corporate professional" — to push the model out of its default neighbourhood. The model interprets aesthetic-family naming as a strong signal even in the negative.
---
## (e) Critical caveats
Three caveats specific to the `designs` preset.
### Default-aesthetic drift on iteration
The `designs` preset is most susceptible to default-aesthetic drift because it has no destination-shaped constraint pulling it toward a specific genre. Watch for drift back to AI-slop defaults across iterations — the `references/03-iteration-and-session.md` "break-default-aesthetic" recovery prompt is targeted at exactly this drift.
### Non-monotonic improvement across iterations
`https://anthropic.com/engineering/harness-design-long-running-apps` documents that quality across iterations is not strictly increasing. Turn 4 can be worse than turn 3 on design quality, originality, or craft. The recovery move (pivot, not refine) is in `../03-iteration-and-session.md`.
### Component spec coherence
For dashboards and component libraries specifically, the export bundle's machine-readable component spec is load-bearing for engineering handoff. Ensure the artifact has coherent component definitions (named, with consistent variants) before exporting — otherwise the component spec will be partial and the engineering implementation will diverge.
---
## (f) One end-to-end worked prompt — layers 1 + 2a + 3 composed
Goal: an admin dashboard for an analytics product, audience is data engineers.
```
Goal: An admin dashboard for monitoring data-pipeline freshness across
120 tables, sorted by last-successful-load timestamp
Layout: Header with environment switcher + global time-window selector;
top metrics row (4 KPIs: tables behind SLA, tables current,
tables stale, tables errored); main panel with stacked area
chart showing freshness over the last 24 hours; sortable table
Evidence grade: Experimental — no validated practitioner pattern as of 2026-05-16. Frontier design is currently marketing language for "elaborate variants of the other presets," not a distinguishable generation mode practitioners can reliably invoke today.
This file documents what Anthropic publishes about the `frontier-design` preset, what practitioners have shipped (nothing verified), what adjacent material exists, and the single experimental pattern the plugin offers — clearly labelled as unverified speculation. The honest position the plugin takes is in Section (e).
---
## (a) What Anthropic says
Anthropic's launch post `https://anthropic.com/news/claude-design-anthropic-labs` describes `frontier-design` in a single sentence (verbatim):
> code-powered prototypes with voice, video, shaders, 3D and built-in AI
This is the entirety of Anthropic's per-preset documentation as of the 2026-05-16 captured-on date. There is no dedicated tutorial, no support article, no canonical prompt set, no Anthropic-published example artifact.
The launch sentence implies the preset targets:
- Code-powered prototypes (not static designs) — implying interactive elements at minimum
- 3D (WebGL 3D scenes, possibly Three.js or similar)
- Built-in AI (LLM-driven interactions inside the artifact)
Anthropic's framing suggests `frontier-design` is the preset for showpiece artifacts demonstrating Claude Design's outer-edge capabilities — not a workhorse preset like `designs`, `prototypes`, or `slides`.
---
## (b) What practitioners have shipped
Verifiable practitioner outputs as of 2026-05-16: **NONE that we could verify.**
The most explicit acknowledgment of this gap comes from `https://llmx.tech/blog/claude-design-hands-on-review-2026` (cited in `research/03`):
> ...no frontier design assessment provided. The hands-on review covers designs, prototypes, slides, and one-pagers. Frontier design is named in the preset list but received zero hands-on evaluation, because no practitioner artifact has been published demonstrating what the preset produces in practice.
Across the community sources surveyed in `research/03` (Substack walkthroughs, awesome-claude-design lists, Twitter / X threads, MindStudio walkthroughs, sagnikbhattacharya, victordibia, theadpharm, claudefa.st, etc.), no practitioner has published a verifiable frontier-design artifact with prompt, output, and reproduction steps. The preset is named, occasionally referenced, but not demonstrated.
This may change. The preset is new (April 2026 launch); practitioner adoption lags. The `.coverage.md` re-research trigger explicitly flags "first verified frontier-design practitioner artifact ships publicly" as a refresh trigger for this file.
---
## (c) Adjacent material
While no `frontier-design`-specific Anthropic or practitioner material exists, two adjacent sources cover the underlying capabilities Anthropic names.
### Motion and spatial composition — `frontend-design/SKILL.md`
`https://github.com/anthropics/skills/skills/frontend-design/SKILL.md` publishes Anthropic's guidance on motion (easing curves, timing tiers, what to animate and what not to) and spatial composition (typography hierarchy, surface depth, layered backgrounds). These are the building blocks the `frontier-design` preset would extend with voice / video / shaders / 3D, but the building blocks themselves are general-purpose.
### Animated and 3D websites — MindStudio walkthrough
MindStudio's 2026-04-28 walkthrough (cited in `research/03`) covers prompting Claude for animated and 3D websites — but the walkthrough is set in adjacent Anthropic surfaces (Claude.com chat with an HTML artifact), not in `claude.ai/design` with the `frontier-design` preset specifically. The walkthrough is useful for the prompt-engineering pattern (naming GLSL fragment shader constraints, polygon-count budgets, voice-prompt structuring) but is not a frontier-design-preset artifact.
---
## (d) Single experimental pattern (unverified speculation)
One experimental pattern, clearly labelled as unverified, that an operator could try if they want to engage with the preset despite the gap.
The pattern comes from Google's Gemini deep-research output (cited in `research/03`) and carries low confidence. It is a constraint-language pattern for shader and physics elements, adapted from broader frontend-design practice:
```
[layers 1 through 5 of the standard prompt stack from
../01-prompt-fundamentals.md]
Frontier capabilities to engage:
Shaders:
- One custom GLSL fragment shader applied to the hero region
- Shader pattern: [name the visual character — e.g.,
"subtle gradient flow with imperceptible noise" or
"iridescent surface reacting to cursor position"]
- Frame budget: 60fps target on Apple M1 / equivalent
- No fullscreen shader-bombs (battery / heat / accessibility)
3D:
- One 3D element in the hero region, scene-bounded (no fullscreen)
- Polygon count budget: <50,000 triangles
- Lighting: 2-3 light sources max
- Camera: fixed or single-axis orbit; no free-camera
Voice:
- [if voice UI relevant] one voice-driven interaction, with
visible text-transcript fallback
- Speech-recognition language and accent assumptions named
explicitly
Video:
- [if video element relevant] embedded video with explicit
- [if applicable] one LLM-driven interaction in the artifact
- Explicit fallback for when the LLM call fails
Test in target browsers (Chrome, Safari, Firefox) at the target device
class (M1 / M2 desktop, mid-range mobile). Expect aesthetic drift
across runs; non-monotonic improvement applies amplified for frontier
capabilities.
```
Confidence rating on this pattern: **low**. It is a reasoned extrapolation from frontend-design principles, not a tested frontier-design prompt. If you try it, document what works and what does not — there is a community gap to fill.
---
## (e) The plugin's honest position
The plugin's stance on `frontier-design`:
If you want to attempt frontier design, treat it as a high-fidelity prototype (`prototypes` preset) with extra constraint language for shaders, polygons, voice, video, and built-in AI. Expect aesthetic drift on first generations. Verify that your output works in target browsers before committing chat-turn budget to refinement. Expect that the model's prior on what "frontier design" means may differ from yours — over-specify everything that matters.
Do not assume `frontier-design` produces a categorically different artifact from `prototypes` + extra capability constraints. The launch sentence is suggestive; the practitioner evidence is absent. The preset is marketing language for elaborate prototype variants until proven otherwise.
When the operator names `frontier-design` specifically:
1. Read this file with them
2. Confirm they have understood the practitioner-evidence gap
3. Offer the experimental pattern in Section (d) as a starting point, clearly labelled as unverified
4. Treat the resulting artifact as exploratory — surface what worked and what did not, contribute back to the community gap
5. Plan for amplified non-monotonic-improvement (`../03-iteration-and-session.md`) — frontier capabilities compound the standard non-monotonic risk
---
## (f) Re-research trigger
This file refreshes when any of the following happens:
- Anthropic publishes a dedicated tutorial, support article, or canonical prompt set for `frontier-design`
- A verified practitioner artifact appears publicly with prompt + output + reproduction steps
- The launch-post one-sentence description changes materially
- A community pattern reaches enough adoption to be cited (not speculation) — the `awesome-claude-design` lists and adjacent practitioner blogs are the primary surfaces to watch
When any of these triggers, update Section (b) to reflect verified material, replace Section (d) with the verified pattern, and re-grade the evidence label from "Experimental" to "Community-only" or "Anthropic-documented + community-validated" as appropriate.
---
## Sources
- `https://anthropic.com/news/claude-design-anthropic-labs` — Anthropic's verbatim one-sentence description (the entirety of Anthropic-published material on this preset)
- `https://llmx.tech/blog/claude-design-hands-on-review-2026` — community practitioner explicitly noting the frontier-design evaluation gap
- `https://github.com/anthropics/skills/skills/frontend-design/SKILL.md` — adjacent material on motion and spatial composition
- `https://claude.com/blog/improving-frontend-design-through-skills` — AI-slop avoid-list (composed for frontier prompts)
Re-research trigger: see Section (f). The preset is the most volatile in this plugin's coverage; expect this file to refresh first when Anthropic ships material or practitioners publish artifacts.
Evidence grade: Community-only — Anthropic publishes no per-preset prompt patterns for this preset as of 2026-05-16.
Anthropic names `marketing-collateral` in the launch enumeration at `https://anthropic.com/news/claude-design-anthropic-labs` but publishes no dedicated tutorial. The patterns below come from community practitioners; treat them as field-tested but not Anthropic-authoritative. Anthropic's frontend-design open-source skill at `https://github.com/anthropics/skills/skills/frontend-design/SKILL.md` is the closest adjacent Anthropic source — it covers landing-page and marketing-site design philosophy without per-preset prompts.
---
## (a) What this preset is
Anthropic launch post one-sentence description: `marketing-collateral` covers landing pages, social variants, banner ads, email creative, and other visual assets in the marketing surface area. Output is typically HTML for landing pages, image-shaped for social and ads.
Distinguishing properties:
- Conversion-oriented — the artifact has a measurable goal (signups, clicks, opens)
- Multi-format — a single campaign typically needs landing page + social variants + email + ad creative
- Brand-anchored — marketing collateral lives or dies on brand fidelity; a DESIGN.md is essentially mandatory
- Variant-heavy — A/B testing assumes multiple variants of the same creative
---
## (b) Why Anthropic published no per-preset guidance
The launch enumeration treats marketing-collateral as a destination shape rather than a distinct generation mode. The frontend-design open-source skill (`https://github.com/anthropics/skills/skills/frontend-design/SKILL.md`) is the closest thing Anthropic publishes — it covers the design-philosophy layer (Purpose / Tone / Constraints / Differentiation) but not marketing-specific prompt patterns.
Community practitioners have built patterns around landing-page composition, social-variant fan-out, and competitor-screenshot extraction (Section c).
---
## (c) Community patterns
### chatprd.ai landing-page workflow
Community pattern from `https://chatprd.ai` (cited in `research/03`): a four-stage landing-page production flow optimised for Claude Design:
1. **Brief stage** — define the audience, the value prop, the one CTA, the proof points. Output: text document, not in Claude Design yet.
2. **Outline stage** — translate the brief into a section-by-section landing-page outline. Hero, problem, solution, features (3-grid or 4-grid), proof (logos / quotes / numbers), pricing or single-CTA, FAQ, footer. Output: text outline.
3. **Visual stage** — brief Claude Design from the outline using layers 1-5. First turn produces the landing page; iteration tightens.
4. **Variant stage** — once the master landing page works, generate variants for A/B testing (different hero, different proof-point ordering, different CTA framing) using the variant-fan-out pattern below.
The four-stage workflow separates copy decisions from visual decisions, which lets the operator iterate each independently. The community-documented failure mode is briefing visual + copy together in one prompt — the model conflates the two and produces a generic landing page.
### Sagnik Bhattacharya variant-fan-out for social
Community pattern from `https://sagnikbhattacharya.com/blog/claude-design` (cited in `research/03`): for social-format collateral (Instagram square, LinkedIn rectangle, Twitter / X aspect), generate N variants in parallel rather than sequentially. The brief pattern:
```
Generate 6 variants of the [campaign] creative, sized for [format
spec]. Across the 6:
- Vary the headline framing (problem-led, solution-led,
proof-led)
- Vary the visual hierarchy (text-dominant, image-dominant,
balanced)
- Vary the color emphasis (accent-dominant, monochrome,
high-contrast)
- Keep the value prop, audience, and CTA identical across all 6
Output as 6 distinct artifacts I can A/B test.
```
The pattern produces a campaign-set in one chat turn rather than six iterations.
Community pattern (cited in `research/03`): when the operator has a competitor's marketing page that visually achieves what they want, screenshot it and brief Claude Design with the screenshot as a visual reference, paired with an explicit "do not copy; extract the visual-language principles" instruction:
```
The attached screenshot shows [competitor]'s landing page. Do NOT
copy the structure, the copy, or the layout. DO extract the
visual-language principles:
- typography character (named family + scale + weights)
- color temperature and palette structure
- visual density (how much whitespace, how many elements per fold)
- motion language (if visible from the screenshot or apparent from
the brand)
- overall aesthetic family (named with concrete reference)
Apply those principles to our landing page, which has a fundamentally
different structure, copy, and CTA flow. Output our landing page
respecting the extracted visual language but original in structure.
```
The pattern is high-leverage when the operator has a clear visual reference but cannot articulate it in DESIGN.md form. The risk: too-literal copying produces a derivative-feeling artifact. Brief the "extract, do not copy" constraint explicitly.
### Slop-fingerprints warning amplified
Marketing collateral is the surface where AI-slop fingerprints are most punishing. The teal gradient + serif headline + blinking status dot + container-on-container + glassmorphism pattern is recognisable across many AI-generated landing pages. Audiences pattern-match on it and discount the artifact. Layer 3 negative constraints apply with extra weight:
```
Negative constraints — do not produce any of:
- Teal-to-blue or teal-to-green gradients
- Serif headline on sans-serif body (unless explicitly briefed for
editorial direction)
- Blinking / pulsing status indicators ("Live", "New", "Updated")
Marketing collateral without a tight DESIGN.md anchor produces generic output. The brand DESIGN.md is essentially mandatory — see `../02-design-md.md` for the extractor pattern when the operator does not already have one. Validate brand fidelity at every iteration: typeface, color palette, voice (tone of copy), visual density. Brand drift on marketing collateral is more visible to the audience than brand drift on internal artifacts.
### A/B testing requires more than aesthetic variation
The variant-fan-out pattern produces aesthetic variations. For meaningful A/B testing, the variants should test specific hypotheses (does problem-led headline outperform solution-led? does image-dominant outperform text-dominant?) rather than test generic aesthetic variation. Brief the hypotheses explicitly.
### Export-to-image for social formats
Social-format collateral typically exports as PNG or JPG (Claude Design produces HTML; the operator screenshots at the target dimensions). The export is lossy for hover states, interactive elements, and motion. Brief the static state explicitly when the destination is image:
```
The destination for this creative is a static PNG/JPG export. Generate
the static state only. No hover states, no interaction logic, no motion.
```
---
## (e) One worked prompt — layers 1 + 3 composed, four-stage landing-page flow
Goal: a landing page for a developer-tools SaaS product, audience is senior engineers evaluating dev tools.
```
Goal: A landing page for "ObserveAPI", a developer-tools SaaS product
for API observability. The goal: convert senior-engineer
visitors to free-trial signups.
Layout: Hero (above-fold), problem (one paragraph + 3 pain points
as labelled rows), solution (one paragraph + product
screenshot), features (3-grid), proof (3 customer logos +
one quote + one named metric), pricing (single tier + free
- `https://claude.com/blog/improving-frontend-design-through-skills` — AI-slop avoid-list (amplified for marketing collateral)
- `https://chatprd.ai` — community four-stage landing-page workflow
- `https://sagnikbhattacharya.com/blog/claude-design` — community variant-fan-out pattern for social formats
- `https://anthropic.com/engineering/harness-design-long-running-apps` — design grading criteria (composed for marketing collateral)
Re-research trigger: Anthropic publishing a marketing-collateral tutorial; community four-stage workflow drifting; new slop-fingerprint patterns emerging in the AI-generated landing-page corpus; competitor-screenshot extraction patterns evolving.
Evidence grade: Community-only — Anthropic publishes no per-preset prompt patterns for this preset as of 2026-05-16.
Anthropic names `one-pagers` in the launch enumeration at `https://anthropic.com/news/claude-design-anthropic-labs` but does not publish a dedicated tutorial, support article, or canonical prompt set for it. The patterns below come from community practitioners — Substack walkthroughs, blog posts, newsletter pieces — with full attribution. Treat them as field-tested but not Anthropic-authoritative.
---
## (a) What this preset is
Anthropic launch post one-sentence description: a one-pager is a single-screen artifact for memos, summaries, leave-behinds, executive briefs, or single-page deliverables. The destination is typically PDF or print.
Distinguishing properties:
- Single page — no multi-screen navigation, no scrolling sections that imply continuation
- High information density compared to a slide
- Self-contained narrative — reader does not need surrounding context
- Often delivered as a leave-behind after a meeting or as a one-shot brief
---
## (b) Why Anthropic published no per-preset guidance
The launch enumeration treats one-pagers as a destination shape rather than a generation mode requiring distinct prompt patterns. The Goal / Layout / Content / Audience framework (`../01-prompt-fundamentals.md` Layer 1) and the five-layer stack apply directly — Layout becomes single-page structure, Content becomes the dense information payload, Audience tightens the tone.
Community practitioners have converged on patterns that constrain the one-pager preset more tightly than the generic stack does (Sections c and d).
---
## (c) Community patterns
### Word-count cap per block
Community pattern from `https://sagnikbhattacharya.com/blog/claude-design` (cited in `research/03`): cap the word count per layout block in the brief. The mechanism — Claude Design defaults to verbose prose when block content is unspecified, and one-pagers fail when any block runs long. The convergent caps from community practice:
- Title block: 8 words max
- Subtitle: 15 words max
- Body paragraph: 60 words max
- Bullet item: 12 words max
- Callout box: 25 words max
Brief the caps explicitly in the prompt — the model otherwise produces blocks 2-3x longer than the operator wants.
### Above-the-fold density limit
Community pattern from `https://newsletter.victordibia.com` (cited in `research/03`): cap the number of distinct elements visible in the top half of the one-pager. Convergent limits:
- Maximum 5 distinct visual elements above the fold (counting title, subtitle, one body block, one visual, one callout = 5)
- Maximum 3 colour roles visible above the fold (typically: ink for title, fg for body, accent for one emphasis)
- Maximum 2 typographic weights above the fold (typically 700 title, 500 body)
The brief encodes these as explicit constraints:
```
Above the fold (top half of the page), no more than 5 distinct visual
elements, no more than 3 color roles, no more than 2 typographic
weights. Density below the fold can scale up.
```
### Real-data injection over lorem ipsum
Same pattern as `designs.md` and `prototypes.md`: use realistic placeholder content. For one-pagers specifically, this matters more — a one-pager is typically read once and discarded; if the content reads as placeholder, it loses the reader.
---
## (d) Critical caveats
### Density-versus-readability trade-off
One-pagers are constrained by physical reading mechanics — the operator can pack a lot into one page, but each element added reduces the reader's attention to every other element. The brief should weight density-vs-readability explicitly:
```
Optimise for the reader's ability to extract the takeaway in 30 seconds
of scanning. If a block requires more than 30 seconds of focused reading
to extract the takeaway, it does not belong on this one-pager.
```
### Export to PDF preserves layout; export to other formats may not
PDF is the canonical one-pager export. HTML standalone works. PPTX is awkward for one-pagers (PPTX assumes deck format, not single-page format). Code-handoff is rare for one-pagers but works.
### Anthropic AI-slop avoid-list still applies
Layer 3 negative constraints (`../01-prompt-fundamentals.md`) apply with full force on one-pagers — the dense information context does not exempt the artifact from the avoid-list. Inter, Roboto, Arial, purple gradients on white, generic-modern defaults all degrade the one-pager.
---
## (e) One worked prompt — layers 1 + 3 composed (Layer 2a is preset-optional)
Goal: an executive one-pager summarizing a project's Q1 status, audience is VP of Engineering.
```
Goal: A single-page executive summary of the platform team's Q1 2026
delivery, reliability, and Q2 themes. Designed to be scanned in
30 seconds and absorbed in 3 minutes.
Layout: Single-page A4 portrait. Top quarter: title + headline takeaway
+ 3 KPI numbers in a row. Middle half: 3 short body paragraphs
Re-research trigger: Anthropic publishing a dedicated tutorial for one-pagers; community word-count caps drifting; new one-pager-specific community pattern emerging.
**Evidence grade:** Community-only — Anthropic publishes no per-preset prompt patterns for this preset as of 2026-05-16.
Evidence grade: Community-only — Anthropic publishes no per-preset prompt patterns for this preset as of 2026-05-16.
Anthropic names `pitch-decks` in the launch enumeration at `https://anthropic.com/news/claude-design-anthropic-labs` but publishes no dedicated tutorial. **Critical caveat upfront:** community practitioner `https://moda.app/blog/claude-design-for-pitch-decks` documents an explicit recommendation against using Claude Design for external/investor pitch decks when PPTX is the required delivery format — see Section (d).
---
## (a) What this preset is
Anthropic launch post one-sentence description: `pitch-decks` covers investor pitches, external partner proposals, and any high-stakes presentation format that needs to look polished. The preset distinguishes itself from the more general `slides` preset by audience — external rather than internal — and by typical destination — PPTX or PDF rather than HTML.
The distinguishing question vs `slides`: **is the audience an external investor or external partner where the deck represents the company's positioning?** Yes → `pitch-decks`. Internal audience → `slides`.
---
## (b) Why Anthropic published no per-preset guidance
Anthropic likely treats pitch-decks as a high-stakes specialisation of `slides` rather than a fundamentally distinct generation mode. The `slides` tutorial at `https://claude.com/resources/tutorials/using-claude-design-for-presentations-and-slide-decks` covers the prompt patterns; the `pitch-decks` preset inherits those patterns and adds the audience-stakes layer.
Community practitioners have converged on patterns specific to pitch-deck production (Section c). The most important community contribution, however, is the PPTX-export caveat (Section d) — the failure mode is severe enough that the default recommendation diverges from `slides`.
---
## (c) Community patterns
### Sagnik Bhattacharya's 10-slide template
Community pattern from `https://sagnikbhattacharya.com/blog/claude-design` (cited in `research/03`): the convergent 10-slide pitch-deck template for B2B SaaS pitches:
1. Title — company name, one-line positioning, tagline
2. Problem — who has it, what it costs them, how acute
3. Solution — what we built, one-sentence value prop
4. Market — TAM / SAM / SOM (sized realistically)
5. Product — 2-3 screenshots or visual demos
6. Business model — how we charge, ACV ranges, GTM motion
7. Traction — revenue, growth rate, named customers
8. Team — founders + key hires, why this team for this problem
9. Competition — competitive map (4-quadrant or named comparisons)
10. Ask — funding round size, use of funds, timeline
The template is community-converged, not Anthropic-published. It composes with Anthropic's `slides` tutorial patterns from `https://claude.com/resources/tutorials/using-claude-design-for-presentations-and-slide-decks`.
### MindStudio per-slide micro-prompts
Community pattern from MindStudio (cited in `research/03`): rather than briefing the full pitch deck in one prompt, micro-prompt each slide individually. The pattern produces tighter per-slide narrative because each slide gets dedicated attention.
The micro-prompt pattern (per slide):
```
Generate slide N of the pitch deck. This slide does one job:
[the slide's job — e.g., "convince the audience that the problem is
acute by quantifying customer cost"].
Visual elements: [specific to this slide — e.g., "one large number
showing annual cost, two supporting smaller numbers, one explanatory
sentence"].
Constraints from DESIGN.md: [reference the project's DESIGN.md].
Do not include filler — every element on this slide must support the
one job.
```
Walk through slides 1-10 sequentially.
### Outline-first scaffolding (composed from `slides.md`)
The outline-first pattern from `slides.md` Section (d) applies: brief the deck as an outline first (turn 1), then expand to full slides (turn 2-N).
Community-documented at `https://moda.app/blog/claude-design-for-pitch-decks` (cited in `research/03`) and `https://claudedesign.substack.com`: when an HTML-rendered pitch deck is exported to PPTX, richly-styled text can flatten to images. PowerPoint loses the editability — the text becomes a rasterised picture.
For internal slide decks (audience tolerates some export friction), the operator can mitigate by keeping typography simple. For external pitch decks (audience expects polish, may want to add their own annotations or edit the deck), this failure mode is severe enough that the community recommendation is:
> Do not use Claude Design for external/investor pitch decks where PPTX export is the required delivery format. Use HTML standalone or PDF if Claude Design is required; otherwise produce the deck in PowerPoint or Keynote directly.
This plugin surfaces the recommendation but does not refuse to operate. The operator may have a reason to proceed (HTML acceptable, PDF acceptable, the PPTX text-as-text survival is verified to be acceptable for their specific styling). When proceeding, validate PPTX export early — generate slide 1 fully, export to PPTX, verify text-as-text survival, then commit to the deck.
### Audience-stakes asymmetry
A pitch deck for a $50M Series C carries different stakes than a pitch deck for a $500K seed extension. The operator's tolerance for export imperfection scales with the dollar amount on the line. Default conservatively — when in doubt about whether the export will survive, treat the deck as high-stakes.
### Slop-fingerprints warning amplified
Layer 3 negative constraints apply with extra weight on pitch decks. Investors see many decks; AI-slop fingerprints (purple gradients, generic three-column structure, Inter typography, centered-hero defaults, glassmorphism, neumorphism) signal that the deck is templated and the team did not invest care. The brief should over-specify the negative constraints.
---
## (e) One worked prompt — layers 1 + 3 composed, slide-by-slide micro-prompt pattern
Goal: a 10-slide investor pitch deck for a B2B SaaS observability product, audience is Series A investors.
```
PRECONDITION: Before generating any slide, render slide 1 fully and
export to PPTX. Verify text-as-text survival. If text flattens to
images, switch destination to HTML standalone or PDF and notify me
before continuing.
Goal: A 10-slide Series A pitch deck for a B2B SaaS observability
product
Layout (outline — per Sagnik Bhattacharya's 10-slide template at
The `prototypes` intent preset generates interactive product flows for usability testing, design review, and stakeholder demos. Output is multi-screen HTML with working state transitions, clickable navigation, and per-state visual treatments. The preset is documented by Anthropic in a dedicated tutorial.
---
## (a) What this preset is
Anthropic frames `prototypes` (launch post `https://anthropic.com/news/claude-design-anthropic-labs`) as the preset for product flows that need to behave like a real product, not just look like one. The distinguishing property vs `designs`: interaction state. Prototypes have hover states, active states, click-through transitions, multi-screen navigation, and per-state visual treatments.
The dedicated tutorial is `https://claude.com/resources/tutorials/using-claude-design-for-prototypes-and-ux`. It is the load-bearing source for this preset.
Output is HTML + React + inline CSS + JS (the JS makes the interactions work). Exportable to Code-handoff (engineering takes the working interaction logic forward) or HTML standalone (runs in a browser without Claude Design).
---
## (b) When to use it
Pick `prototypes` when the goal is interactive validation. The decision matrix:
| Internal tool demo for stakeholder review | **prototypes** |
| A-B comparison of two design directions in working form | **prototypes** |
| Onboarding flow walkthrough for new hire training | **prototypes** |
| Static design exploration (no interaction needed) | designs |
| Slide deck for a meeting | slides |
If the operator describes the artifact in terms of "user clicks here, then sees this", they want `prototypes`. If they describe it in terms of "screen with these regions", they may want `designs`.
---
## (c) Anthropic-published prompt patterns
The Anthropic tutorial `https://claude.com/resources/tutorials/using-claude-design-for-prototypes-and-ux` publishes nine canonical prompt patterns across four families:
### Family 1 — Feature prototyping (4 verbatim canonical prompts)
For new feature flows where the operator needs to test interaction logic. The patterns cover: a sign-in / sign-up multi-step flow; a checkout / payment flow with form validation states; a settings / preferences flow with toggles and selects; a search / filter flow with result-state transitions. Each prompt names entry point, success path, error states, and edge cases.
Refer to the tutorial for the verbatim prompt text — Anthropic publishes the exact wording, and this plugin cites it by URL rather than copying it (per the brief's source-quality rule and the Apache-2.0/MIT compatibility note in `../04-handoff-and-scope.md`).
### Family 2 — Design review and A-B comparison (2 verbatim canonical prompts)
For prototyping when the goal is to compare two design directions side-by-side or in turn. The verbatim Anthropic comparison-prompt pattern:
```
Show me three different layouts for [feature]. For each:
- Visual direction (named, with concrete reference)
- Interaction model (where the user starts, where they end up)
- One-line rationale tying it to the audience and goal
Once I pick one, generate the full interactive flow.
```
Use this for A-B-C exploration before committing to a direction. The pattern composes with layer 2b (propose-options-before-building) from `../01-prompt-fundamentals.md`.
### Family 3 — User-flow scaffolding (1 verbatim canonical prompt)
For mapping a multi-screen user journey. The pattern names the entry context, the screens in sequence, the decisions at each screen, and the success path vs the error paths. The output is a clickable multi-screen prototype with the navigation logic baked in.
### Family 4 — Internal tools (2 verbatim canonical prompts)
For internal-tooling prototypes — admin panels, content-moderation queues, customer-support consoles. The pattern emphasises dense interfaces, keyboard-driven navigation, and minimal aesthetic flourish. The patterns differ from external-product prototypes in tone and density.
The tutorial also publishes three transversal recommendations Anthropic asks operators to apply across all four families:
- **Component-naming clarity** — name components in the brief so the generated artifact's component spec is engineering-handoff-ready (research/03 D2). Generic names like "Button1" produce generic component specs.
- **Decision documentation** — ask Claude Design to document its design decisions inline (the PM-annotated notes feature) so the engineering handoff carries rationale, not just visuals.
- **Edge-case flagging** — explicitly request that Claude Design flag interaction edge cases (empty state, loading state, error state, offline state, permission-denied state). The model defaults to happy-path-only without this directive.
---
## (d) Community uplift
Three community-converged patterns extend Anthropic's published material for `prototypes`.
### Request every state upfront
Community pattern (`research/03`): explicitly request every interaction state in the first prompt rather than discovering missing states across iterations. The verbatim community phrasing:
```
For every interactive element in this prototype, generate:
- default state
- hover state
- active / pressed state
- focused state (keyboard navigation)
- disabled state
- loading state (if the element triggers async work)
- error state (if the element can fail)
Render every state visibly somewhere in the prototype — either inline
or in a dedicated state-catalogue page.
```
This catches the failure mode where the operator does not notice a missing state until a usability test surfaces it.
### Real-data injection over lorem ipsum
Same pattern as the `designs` preset, more important here: prototypes used in usability testing fail when the content is obviously fake. Test participants react to lorem ipsum and stop engaging with the flow. Use realistic content even when the prototype is throwaway.
### Explicit motion timing and easing
MindStudio community walkthrough (cited in `research/03`): name the easing curve and the duration explicitly for prototypes that include any motion. Default motion is the largest source of "feels like AI" in a prototype. The community-converged baselines for product prototypes:
- Hover transitions: `transition: all 160ms ease-out`
A prototype with 10 components × 7 states each = 70 distinct visual treatments in one artifact. Each treatment consumes context. Claude Design quality drops faster on prototypes than on `designs` for the same number of screens. Session-break heuristics (`../03-iteration-and-session.md`) apply with extra weight.
### Test in target browsers before stakeholder review
The standalone HTML export runs the prototype's JavaScript locally. Edge-case JavaScript (touch handlers, IntersectionObserver, ResizeObserver) does not always work the same across browsers. Test in Chrome + Safari + Firefox before sharing with stakeholders. If you target mobile usability testing, test on actual mobile devices, not just a browser DevTools mobile-emulation viewport.
### Multi-screen exports — bundle in one export
The token-cost trap in `../04-handoff-and-scope.md` Section 6 applies most strongly to multi-screen prototypes. Bundle all screens in one export turn; do not export screen-by-screen.
---
## (f) One end-to-end worked prompt — layers 1 + 2a + 3 composed
Goal: a multi-step onboarding flow for a new SaaS analytics product, audience is small-business operators.
```
Goal: An interactive 5-step onboarding flow for new users of a SaaS
product. The flow: welcome → data-source connection → metric
The `slides` intent preset generates presentation decks — internal stakeholder updates, executive roadmaps, customer briefings, partner proposals, all-hands meetings. Output is HTML deck with per-slide layouts, optionally exportable to PPTX (with caveats — see Section e).
---
## (a) What this preset is
Anthropic launch post (`https://anthropic.com/news/claude-design-anthropic-labs`) names `slides` as a destination-shaped preset: the artifact assumes the slide-deck format, not the dashboard / one-pager / prototype format. Output renders in the Claude Design canvas as a slide-by-slide thumbnail strip plus the active slide in full view.
Two Anthropic primary sources ground this preset:
- The dedicated tutorial `https://claude.com/resources/tutorials/using-claude-design-for-presentations-and-slide-decks` publishes five verbatim canonical prompts (Section c) and the slide-deck composition framework.
- The PowerPoint-mode article `https://support.claude.com/en/articles/13521390-use-claude-for-powerpoint` publishes the PPTX export conventions and the template-respecting guidance: Claude reads the slide master, layouts, fonts, and color scheme of an uploaded PPTX template and produces output that respects them.
The two sources compose: the tutorial covers the prompt patterns, the PowerPoint-mode article covers the export discipline.
---
## (b) When to use it
Pick `slides` when the destination is a presentation surface. The decision matrix:
| Operator goal | Preset |
|---------------|--------|
| Internal team update / project review | **slides** |
| Customer-prep briefing for a sales call | **slides** |
| Executive roadmap or quarterly business review (Q1/Q2/Q3/Q4 results) | **slides** |
| All-hands or company-wide announcement | **slides** |
| External investor pitch deck | **pitch-decks** (separate preset — see preset file for the PPTX trap) |
| Single-page memo / one-pager | one-pagers |
The distinguishing question vs `pitch-decks`: **is the audience internal or external?** Internal → `slides`. External investor → `pitch-decks` (with explicit caveat about PPTX export, see `pitch-decks.md`).
---
## (c) Anthropic-published prompt patterns
The Anthropic tutorial `https://claude.com/resources/tutorials/using-claude-design-for-presentations-and-slide-decks` publishes five verbatim canonical prompts:
### Pattern 1 — Q1 results deck
For quarterly business review decks (Q1 / Q2 / Q3 / Q4 results). Pattern names the metrics in priority order, the audience seniority level, the narrative arc (where we started, what changed, where we are, what's next), and the supporting visualisations (charts, tables, callout numbers).
### Pattern 2 — Executive roadmap
For multi-quarter roadmap decks. Pattern names the roadmap horizon (quarters or half-years), the workstreams (3-7 named tracks), the major milestones per workstream, the dependencies between workstreams, and the assumptions / risks per quarter.
### Pattern 3 — Customer-prep briefing
For sales-call preparation decks. Pattern names the customer (company + named contacts), the meeting goal, the customer's known priorities, the value-prop alignment, the proof points (case studies, metrics), and the asks / next steps.
### Pattern 4 — Partner proposal
For co-development or partnership proposals. Pattern names the proposed scope, the resourcing model, the timeline, the success metrics, the IP / licensing model, and the open questions.
### Pattern 5 — All-hands announcement
For company-wide updates. Pattern names the announcement (one sentence), the why-now context, the impact on employees, the timeline, and the resources / Q&A links.
Refer to the tutorial URL for the verbatim prompt text. This plugin cites by URL rather than reproducing Anthropic's exact wording (per the brief's source-quality rule and the Apache-2.0/MIT compatibility note in `../04-handoff-and-scope.md`).
### Template-respecting guidance (verbatim from PowerPoint-mode article)
`https://support.claude.com/en/articles/13521390-use-claude-for-powerpoint` publishes the verbatim guidance about uploading an existing PPTX template:
> Claude reads the slide master, layouts, fonts, and color scheme of an uploaded PowerPoint template and produces output that respects them.
Practical implication: if the operator has an existing brand-compliant PPTX template, upload it as a Claude Design project asset before prompting. The generated deck will respect the template's typography, color palette, and layout conventions. Without an uploaded template, the model defaults to its convergent middle-ground deck aesthetic.
---
## (d) Community uplift
Three community-converged patterns extend Anthropic's material for `slides`.
### Outline-first narrative scaffolding
Community pattern (`research/03`): brief the deck as an outline first (turn 1: produce the slide-by-slide outline as a markdown bullet list), then expand to full slides (turn 2: generate the deck from the approved outline). This forks the conversation but produces tighter narrative arcs than briefing the full deck in one turn.
The outline-first pattern composes with Anthropic's GLCA framework (`../01-prompt-fundamentals.md` Layer 1) — Goal becomes the deck's takeaway, Layout becomes the outline, Content becomes the per-slide bullets, Audience becomes the seniority + context match.
### Single-job-per-slide constraint
Community pattern (research/03): each slide should communicate exactly one idea. Slides with two or more ideas leak comprehension. Brief the constraint explicitly:
```
Each slide does one job. If a slide is trying to communicate two ideas,
split it into two slides. The takeaway from each slide should be
nameable in one sentence.
```
### Audience translation matrix
Community pattern (research/03): brief the audience translation explicitly when the deck spans seniority levels. For example, a roadmap deck shared with both engineering leads and executive sponsors needs to translate technical decisions into business outcomes for the exec audience without losing fidelity for the engineering audience. The pattern:
```
For each slide, write two versions of the takeaway:
- The technical-detail version (for the engineering audience)
- The business-outcome version (for the executive audience)
Use the business-outcome version on the slide and the technical-detail
version in the speaker notes.
```
---
## (e) Critical caveats
Three caveats specific to `slides`.
### HTML → PPTX export is lossy for richly-styled text
Community-documented at `https://moda.app/blog/claude-design-for-pitch-decks` (cited in research/03) and `https://claudedesign.substack.com`: when the operator exports an HTML-rendered slide deck to PPTX, richly-styled text (custom typography, mixed weights, inline color variations) can flatten to images. PowerPoint loses the editability — the text becomes a rasterised picture.
The mitigation:
- If the destination is final PPTX delivery, validate the rendered PPTX text-as-text count before assuming editability survived
- If text-as-text is critical (legal review, copy-edit-after-the-fact), keep the typographic styling simple in the brief — single typeface, two weights, no inline color variation, no inline highlighting
- For the `pitch-decks` preset specifically, this caveat is severe enough that the recommendation is "do not use Claude Design for external pitch decks where PPTX is required" — see `pitch-decks.md`
### Don't trust the canvas as ground truth if destination is PPTX
The Claude Design canvas renders the deck in HTML. The PPTX export converts to PowerPoint format. Some aesthetic decisions that look correct in the canvas do not survive the export:
- Custom backgrounds with gradients can rasterize
- Inline icons positioned via CSS can shift
- Multi-column slide layouts can collapse
- Speaker-notes-equivalent annotations may not round-trip
Validate by opening the exported PPTX in PowerPoint before stakeholder delivery, especially for high-stakes decks.
### Quota burn on long decks
Multi-slide decks (10+ slides) compound context faster than single-page artifacts. The 4-screen inflection in `../03-iteration-and-session.md` applies — long decks reach the inflection within 2-3 chat turns. Plan to break the deck into outline → first 5 slides → next 5 slides if the deck is large, using the context-reset prompt between sessions.
---
## (f) One end-to-end worked prompt — layers 1 + 2a + 3 composed
Goal: a 12-slide internal-team Q1 results deck, audience is engineering leadership + cross-functional partners.
```
Goal: A 12-slide Q1 2026 results deck covering platform-team delivery,
reliability metrics, headcount and hiring, top 3 themes for Q2,
and one slide on a major incident retrospective
Layout (outline):
1. Title — "Platform team Q1 2026 results"
2. TL;DR — three bullet takeaways
3. Delivery — features shipped, % of roadmap completed
4. Reliability — uptime, MTTR, incident count
5. Latency — p95/p99 trend
6. Hiring — headcount delta, key hires, open roles
7. Top theme 1 for Q2 — name + one-sentence framing
8. Top theme 2 for Q2 — name + one-sentence framing
9. Top theme 3 for Q2 — name + one-sentence framing
10. Incident retrospective — what happened, what we learned
11. Asks — explicit asks of leadership and partners
12. Q&A / Discussion prompt
Content: Real numbers throughout — actual % completion, real MTTR
minutes, real headcount, real names for hires (or named
placeholders); each slide's takeaway nameable in one sentence
Audience: VP of Engineering, peer Eng-leadership, partner-team PMs,
Evidence grade: Community-only — Anthropic publishes no per-preset prompt patterns for this preset as of 2026-05-16.
Anthropic names `wireframes-mockups` in the launch enumeration at `https://anthropic.com/news/claude-design-anthropic-labs` but publishes no dedicated tutorial, support article, or canonical prompt set for it. The patterns below come from community practitioners; treat them as field-tested but not Anthropic-authoritative.
---
## (a) What this preset is
Anthropic launch post one-sentence description: `wireframes-mockups` covers the spectrum from low-fidelity layout sketches (boxes, labels, structure-only) to high-fidelity mockups (visual design applied, but not interactive). The output is structural — the goal is to communicate layout, not aesthetic and not interaction.
Distinguishing properties:
- Static (not interactive) — wireframes and mockups do not have working state transitions; for interaction logic, use `prototypes`
- Low-fi or high-fi — the preset spans both ends; the operator picks via prompt
- Pre-visual-design — wireframes are often deliverables before the visual designer commits to a direction
- Iteration-cheap — wireframes are intended to be iterated quickly, so the prompt patterns lean on N-variations-first generation
---
## (b) Why Anthropic published no per-preset guidance
Wireframes occupy a niche between `designs` (visual exploration) and `prototypes` (interaction validation). Anthropic appears to treat the preset as a destination shape rather than a distinct generation mode. The Goal / Layout / Content / Audience framework (`../01-prompt-fundamentals.md` Layer 1) applies — Layout is the dominant concern, Content becomes structural labels, Audience determines fidelity level.
Community practitioners have converged on the patterns below (Section c).
---
## (c) Community patterns
### N-variations-first
Community pattern from `https://designwithai.substack.com` (cited in `research/03`): wireframes are most useful when generated in N variations and compared. Brief the model to produce N distinct layout variations in the first turn, then pick one to refine.
Convergent N values from community practice: 3 or 4 variations is the sweet spot. More than 4 dilutes the operator's attention; fewer than 3 does not surface meaningful alternatives.
The brief pattern:
```
Generate 4 distinct wireframe variations for [feature/page]. For each:
- One sentence describing the structural direction
- The wireframe itself (boxes, labels, no visual design)
After I pick one, refine that variation into a mockup with visual
design applied.
```
This composes with Layer 2b (propose-options-before-building) from `../01-prompt-fundamentals.md`.
### Wireframe vs High-Fidelity sub-preset selection
Community pattern from `https://computingforgeeks.com` (cited in `research/03`): the preset spans low-fi to high-fi, but the model behaves differently across the spectrum. Brief the fidelity level explicitly:
```
Fidelity: low-fi
- boxes with labels, no typography weights other than 500
- one color (greyscale) — bg, surface, muted, fg
- no images, no icons — represent visual elements as labelled boxes
- 8pt grid visible if helpful
OR
Fidelity: high-fi
- actual typography, full color palette, real icons, real images
- production-ready visual treatment
- no interaction logic (this is wireframes preset, not prototypes)
```
Pick one explicitly. The default-mode failure is the model producing a mid-fidelity output that satisfies neither the low-fi structural goal nor the high-fi visual-validation goal.
### The Aakashg / Nielsen "low-fi-is-deprecated" debate (flagged as unsettled)
A community debate documented across Aakash Gupta's and Jakob Nielsen's posts in 2024-2025 (cited in `research/03`) argues that AI-generated high-fi mockups have made low-fi wireframes operationally obsolete — the marginal cost of generating a high-fi mockup is now low enough that there is no reason to start with low-fi. The counter-argument: low-fi wireframes still serve a communication function (forcing reviewers to focus on structure, not aesthetic) that high-fi mockups undermine.
This plugin treats the debate as **unsettled**. The brief should pick the fidelity level deliberately, with the choice tied to the audience and the review purpose, not to a default assumption that one fidelity dominates. Flag the debate when the operator's choice seems unconsidered.
---
## (d) Critical caveats
### Aesthetic drift if starting in high-fi
When starting in high-fidelity mode, the model imports its convergent middle-ground aesthetic defaults more aggressively (because the visual decisions are within scope). Layer-3 negative constraints (`../01-prompt-fundamentals.md`) apply with extra weight on high-fi mockups.
### Iteration economy — wireframes burn turns
Each variation requested in the N-variations-first pattern costs a fraction of a turn (the model generates all N in one chat round). But subsequent refinement of the chosen variation often requires multiple turns (typography decisions, color palette, component-level styling). Budget accordingly — a wireframe-to-mockup flow can consume 4-6 turns for a single page.
### Wireframe ≠ prototype
If the operator describes user interactions ("the user clicks here, then sees this"), they want `prototypes`, not `wireframes-mockups`. Wireframes capture structure; prototypes capture behaviour. Misclassification leads to wasted turns regenerating an artifact in the wrong preset.
---
## (e) One worked prompt — layers 1 + 3 composed, N-variations-first pattern
Goal: 4 wireframe variations for a customer-onboarding page, audience is product team for review.
```
Goal: 4 distinct wireframe variations for the first page of a customer
onboarding flow. The page introduces the product, captures
essential information, and routes the customer to one of three
Re-research trigger: Anthropic publishing a dedicated wireframes-mockups tutorial; the Aakashg/Nielsen low-fi-is-deprecated debate reaching practitioner consensus; new sub-fidelity tier surfacing in community practice.
@ -5,6 +5,284 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [5.1.0] - 2026-05-01
### Summary
Plain-language UX humanizer release. Default output of all 18 commands now leads with prose; technical IDs surface at end-of-line as references rather than headlines. Non-expert users — the bulk of the OSS audience — now read findings like "Fix soon: The same automation is set up more than once" instead of "[high] CA-CNF-001: Hook duplicate event registration". Scanner internals are unchanged; humanization is a pure output-time transform applied at the rendering layer. The `--raw` flag preserves v5.0.0 verbatim output for tooling that scrapes stderr; `--json` is unchanged from v5.0.0 and remains byte-stable for programmatic consumption.
- **`scanners/lib/humanizer.mjs`** — pure-function output translator: `humanizeFinding`, `humanizeFindings`, `humanizeEnvelope`, `computeRelevanceContext`. Never mutates inputs. Adds three additive fields per finding (`userImpactCategory`, `userActionLanguage`, `relevanceContext`) and replaces title/description/recommendation when a translation is available; falls through to originals otherwise.
- **`scanners/lib/humanizer-data.mjs`** — TRANSLATIONS table for 13 scanner prefixes (CML, SET, HKV, RUL, MCP, IMP, CNF, COL, TOK, CPS, DIS, GAP, PLH). Three-step lookup per finding: exact title → regex pattern → `_default` → fall through to scanner original.
- **User-impact categories** (5 labels): Configuration mistake, Conflict, Wasted tokens, Missed opportunity, Dead config. Mapped from scanner prefix.
- **Action-language phrases** (5 labels): Fix this now, Fix soon, Fix when convenient, Optional cleanup, FYI. Mapped from severity.
- **Relevance context** (3 values): `test-fixture-no-impact`, `affects-this-machine-only`, `affects-everyone`. Computed from finding's file path — basenames matching `*.local.*` and paths containing `/tests/fixtures/` are recognized.
- **Self-audit terminal humanization** — `formatSelfAudit()` routes through `humanizeEnvelope`. JSON path (`--json`) is unchanged; humanization applies only to the prose terminal render.
- **`tests/snapshots/v5.0.0/`** + **`tests/snapshots/v5.0.0-stderr/`** — frozen byte-equal references for SC-6 (--json) and SC-7 (--raw) backwards-compatibility tests across 8 CLIs.
- **`tests/snapshots/default-output/`** — humanized-prose snapshots for SC-5 default-output stability.
### Changed
- **Default output of all 18 commands** now uses plain-language descriptions. Findings group by user-impact category; titles lead with prose; technical IDs (`CA-CML-001`, `CA-TOK-005`, …) surface at end-of-line as references.
- **All 21 command and agent templates** updated to render humanized output by default and pass `--raw` through when the user requests v5.0.0 verbatim mode.
- **CLI flag inventory** — every CLI now accepts `--raw` (new) in addition to `--json` (existing, unchanged). `--output-file <path>` still writes raw v5.0.0-shape JSON regardless of mode (humanizer-bypassed, posture-specific).
### Migration
- **No action required for existing automation** that consumes `--json` — the JSON envelope shape is byte-stable with v5.0.0 and humanizer fields are bypassed in `--json` and `--raw` paths.
- **Tooling that scrapes stderr** from default mode (e.g., `posture.mjs`'s scorecard) needs review — default stderr now uses prose vocabulary. Pass `--raw` for byte-stable v5.0.0 verbatim stderr.
- **No scanner-internal changes.** Finding IDs, severity ladders, scoring weights, and area scorecards are unchanged. Upgrades are presentation-layer only.
### Test count
- 635 → 792 tests across 52 test files (+157 humanizer-tester through Waves 0–5).
- New top-level tests: `json-backcompat.test.mjs`, `raw-backcompat.test.mjs`, `scenario-read-test.test.mjs`, `snapshot-default-output.test.mjs`.
- New lib tests: `humanizer.test.mjs`, `humanizer-data.test.mjs`, `scoring-humanizer.test.mjs`.
- New scanner tests: `posture-humanizer.test.mjs`, `scan-orchestrator-humanizer.test.mjs`, `cli-humanizer.test.mjs`.
### Out of scope (deferred to v5.1.1+)
- **Posture `--output-file` humanization** — `posture.mjs` does not call `humanizeEnvelope`, so files written via `--output-file` are raw v5.0.0-shape JSON. Future revision: drop `--output-file` from command templates or add a `--humanized-json` flag.
- **Knowledge cross-references** (Step 17 of plan) — not delivered per user decision (2a).
- **Scoring scorecard JSON headline emission** — currently rendered prose-side only; command templates that want to skip stderr parsing would benefit.
- **CPS — Cache-Prefix Stability** (`CA-CPS-NNN`): volatile content in lines 31–150 of CLAUDE.md cascade, beyond TOK Pattern A's top-30 window. Volatile-pattern set extends Pattern A with shell-exec lines (`!` prefix) and `${VAR}` substitutions.
- **DIS — Disabled-In-Schema** (`CA-DIS-NNN`): tools listed in BOTH `permissions.deny` AND `permissions.allow`. Tool identity uses bare name (`Bash(npm:*)` and `Bash` are the same tool). Severity low.
- **Pattern E — Oversized cascade:** medium when `activeConfig.claudeMd.estimatedTokens > 10_000`.
- **Pattern F — Bloated SKILL.md description:** low when frontmatter `description > 500 chars` (loads every turn). Scoped to `discovery.files`.
- **`/config-audit manifest`** + `scanners/manifest.mjs` CLI — single ranked table of every system-prompt token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. CLAUDE.md per-file tokens distributed proportional to bytes.
- **`--accurate-tokens` flag** on `token-hotspots-cli.mjs` (N5): when `ANTHROPIC_API_KEY` is set, calls Anthropic's `count_tokens` for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When absent: `calibration = { skipped: 'no-api-key' }` plus stderr warning.
- **`scanners/lib/tokenizer-api.mjs`** — `count_tokens` wrapper. 5s AbortController timeout. Exponential backoff on 429 (3 retries: 1s/2s/4s). API key masked to `${key.slice(0,8)}...` in every error; HTTP body never included in errors (it may echo the key on auth failures). `maskKey()` exported.
- **`--with-telemetry-recipe` flag** on the same CLI (M7): emits `telemetry_recipe_path` field pointing to `knowledge/cache-telemetry-recipe.md`.
- **`knowledge/cache-telemetry-recipe.md`** (M7): manual `jq` recipe summing `cache_read_input_tokens` + `cache_creation_input_tokens` per turn from session transcripts. Hit-rate interpretation table.
- **`'mcp'` kind on `estimateTokens`** (F2): active MCP servers estimate ≥ 500 tokens (base + schema overhead) instead of v4's flat 15. Optional `{toolCount}` raises to `500 + toolCount × 200`.
- **F7 — TOK pattern severities recalibrated** for tokens-per-turn impact: Pattern A `medium → high`, Pattern B `low → medium`, Pattern C `medium → low`. Each finding carries a `calibration_note` evidence field documenting the heuristic basis.
- **`scoreByArea` deduplicates by area name** (N3 prep): TOK + CPS share "Token Efficiency"; SET + DIS share "Settings". Combined row with merged finding counts.
- **M8 — knowledge rensing:** replaced "Keep CLAUDE.md under 200 lines" in `knowledge/configuration-best-practices.md` with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Footnote explains the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references `knowledge/opus-4.7-patterns.md`.
- **`commands/tokens.md` next-steps:** documents `--with-telemetry-recipe` as the cache-verification path.
- **`.gitignore`** — unignore rules for `tests/fixtures/**/node_modules/` so the `mcp-tool-heavy` fixture stays under version control.
### Removed
- **F4 — TOK hotspot padding loop and `take` dead-code.** Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
- **F5 — Pattern D / `CA-TOK-004` (sonnet-era signature).** Catalogue entry removed from `knowledge/opus-4.7-patterns.md` and `commands/tokens.md`. Suppression entries for `CA-TOK-004` are now no-ops.
### Breaking changes
- **F2 — MCP token estimates jump from flat 15 to ≥ 500.** Token Efficiency grades for projects with MCP servers may shift. `whats-active` totals report higher numbers. Documented in `commands/posture.md` next-steps.
- **F3 — `scoreByArea` is severity-weighted.** Posture JSON consumers reading `areas[*].score` will see different values for non-clean configs. Use `result.scoringVersion === 'v5'` to detect the change. Drift comparisons across v4↔v5 baselines may show artificial deltas — re-baseline after upgrade.
- **F5 — Pattern D / `CA-TOK-004` no longer emitted.** Existing exact `CA-TOK-004` suppression entries are harmless but obsolete.
- **N1 suppression backward-compat — `CA-TOK-*` glob now also matches `CA-TOK-005`.** To preserve prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
```
CA-TOK-001
CA-TOK-002
CA-TOK-003
```
A one-time runtime warning for this case is a v5.0.1 candidate.
- **Posture areas count: 9 → 10** (Plugin Hygiene from COL). Consumers hard-coding 9 must update.
### Migration notes
- `CA-TOK-*` glob suppressions: explicit-ID list recommended if CA-TOK-005 should not be suppressed.
- `CA-TOK-004` exact-ID suppression entries: safe to remove.
- Drift baselines created against v4 should be re-saved post-upgrade to avoid artificial F3 weighting deltas.
- Posture JSON consumers must update any hardcoded `areas.length === 8` or `=== 9` assertions to `>= 10`.
- New test files: `tests/scanners/manifest.test.mjs`, `tests/scanners/cache-prefix.test.mjs`, `tests/scanners/disabled-in-schema.test.mjs`, `tests/scanners/collision.test.mjs`, `tests/scanners/accurate-tokens.test.mjs`.
### Notes
- **`mock.method` against ESM module exports does not work** (Node 18+ ESM read-only export bindings). v5 tests use `globalThis.fetch` mocking for `--accurate-tokens` instead — equivalent coverage at the actual external-dependency boundary.
- **Plugin-vs-built-in collision detection is intentionally not implemented.** Step 22a research spike (`docs/v5-namespace-research.md`, gitignored) could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in. Treated as info-only; v5.0.1 candidate.
- **README/CLAUDE.md badge reconciliation** done in Step 28 (this release). `self-audit --check-readme` PASSES against the filesystem. Test count counter switched from file-count to test-case count via subprocess `node --test` parse.
- **`hotspot.path` exposed on file-backed hotspots** (Step 30 fix). The rc.1 `--accurate-tokens` implementation looked up `hotspot.path` but the scanner only emitted `source`. File-backed hotspots now carry `path` (absolute path); MCP-server hotspots leave it unset (they are virtual entries representing runtime tool-schema cost, not file content).
### SC-6b release-gate result (verified 2026-05-01)
- **PASS — 0.85% under-estimation against real `count_tokens` API.**
- Fixture: `tests/fixtures/marketplace-large/`. Top-3 hotspots = 1 file-backed (`CLAUDE.md`) + 2 MCP virtuals. MCP entries skipped per design (no readable content; their tokens are formula-based at 500 + toolCount × 200).
- `CLAUDE.md` actual: 589 tokens (Anthropic `count_tokens`, `claude-opus-4-7`). Estimated: 594 tokens (byte heuristic at 4 bytes/token via `estimateTokens`). Delta: **−5 tokens, −0.85%** — well within the ±5% gate.
- No tuning of `estimateTokens` heuristic required for v5.0.0.
## [5.0.0-rc.1] - 2026-05-01
### Summary
Release candidate for v5.0.0 — knowledge rensing and tokenizer calibration. Three deliverables: M8 (Sonnet-era → Opus 4.7 best-practices rewrite), M7 (cache-telemetry recipe in `knowledge/` plus an opt-in CLI flag), and N5 (`--accurate-tokens` API calibration via Anthropic's `count_tokens` endpoint).
### Added
- **N5 — `--accurate-tokens` flag** on `scanners/token-hotspots-cli.mjs`. When `ANTHROPIC_API_KEY` is set, the CLI calls Anthropic's `count_tokens` endpoint for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When the key is absent, `calibration = { skipped: 'no-api-key' }` and a stderr warning is emitted. Designed for the manual SC-6b release-gate verification, not routine use.
- **`scanners/lib/tokenizer-api.mjs`** — wrapper around `count_tokens` with a 5-second AbortController timeout, exponential-backoff retry on HTTP 429 (max 3 retries: 1s, 2s, 4s), and required headers (`x-api-key`, `anthropic-version: 2023-06-01`, `content-type`). API key is masked to `${key.slice(0,8)}...` in every error message and every thrown error; non-429 HTTP errors throw status code only — response body is never included (it may echo the key on auth failures). `maskKey()` is exported for callers that need safe logging.
- **M7 — `knowledge/cache-telemetry-recipe.md`** (new). Manual `jq` recipe for verifying prompt-cache hit rate from Claude Code session transcripts (`~/.claude/projects/<slug>/*.jsonl`). Sums `cache_read_input_tokens` and `cache_creation_input_tokens` per turn and reports a hit-rate ratio. Recipe-form (not bundled scanner) keeps the project's "no transcript-parsing as core feature" non-goal intact while giving users a runtime escape hatch.
- **M7 — `--with-telemetry-recipe` flag** on the same CLI. When passed, emits `telemetry_recipe_path` in the JSON output pointing to the recipe file. Without the flag, output is unchanged. Committed as a default deliverable, opt-in at invocation time.
### Changed
- **M8 — knowledge-base rensing:** replaced the "Keep CLAUDE.md under 200 lines" rule in `knowledge/configuration-best-practices.md` with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Added a footnote that the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references `knowledge/opus-4.7-patterns.md`.
- **`commands/tokens.md` next-steps:** documents `--with-telemetry-recipe` as the cache-verification path after a structural fix.
- New file: `tests/scanners/accurate-tokens.test.mjs`. No new fixtures (re-uses `marketplace-large`).
### Notes
- **SC-6b release gate is NOT closed by these commits.** Step 26's tests use mocked `globalThis.fetch` to verify the integration contract; ±5% accuracy against real `count_tokens` requires a live API key and must be verified manually before tagging v5.0.0 in Session 5.
- The plan's specified `mock.method(tokenizerApi, 'callCountTokensApi', ...)` pattern collides with ESM read-only export bindings in Node 18+. Tests mock at the `globalThis.fetch` boundary instead — equivalent coverage, no module-export rebinding required.
- README/CLAUDE.md badge counts and `plugin.json` version still target v4.0.0; Step 28+29 will sync those during the release wrap.
- `[skip-docs]` tag on the N5 feat commit; M7 and M8 are `docs(...)` commits and don't need it.
## [5.0.0-beta.1] - 2026-05-01
### Summary
First v5.0.0 beta — new scanners. Five new finding sources land: MCP tool-schema budget (CA-TOK-005), system-prompt manifest CLI/command (`/config-audit manifest`), cache-prefix stability (CPS), disabled-tools-still-in-schema (DIS), and cross-plugin/user-vs-plugin skill collision (COL/CA-COL-001). Plugin Hygiene becomes a 10th area-scorecard column.
### Added
- **N1 — `CA-TOK-005` MCP tool-schema budget:** per-server tiered finding inside the TOK scanner. Thresholds — `< 20` no finding, `20–49` low, `50–99` medium, `100+` high; `null` (manifest unparseable) low + "tool count unknown" message. Scoped to project-local `.mcp.json` to keep `/config-audit <path>` actionable. Recommendation links to the Step 25 cache-telemetry recipe.
- **N2 — `/config-audit manifest`:** new slash command + `scanners/manifest.mjs` CLI. Renders a single ranked table of every token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. Reuses `readActiveConfig`; CLAUDE.md per-file tokens are distributed proportional to bytes.
- **N3 — CPS scanner (`CA-CPS-NNN`):** Cache-Prefix Stability Analyzer. Walks the CLAUDE.md cascade and flags volatile content between lines 31 and 150 — beyond TOK Pattern A's top-30 territory. Volatile-pattern set extends Pattern A with shell-exec lines (`!` prefix) and `${VAR}` substitutions. Severity medium per finding. Skips lines 1–30 (Pattern A's range).
- **N4 — DIS scanner (`CA-DIS-NNN`):** Disabled-In-Schema Detector. Detects tools that appear in BOTH `permissions.deny` and `permissions.allow` within the same `settings.json`. The deny list wins, so allow entries are dead config but still load every turn. Tool identity is the bare name (everything before `(`); `Bash(npm:*)` and `Bash` are treated as the same tool. Severity low.
- **N6 — COL scanner (`CA-COL-001`):** Cross-Plugin Skill Collision detector. Plugin-vs-plugin same skill name → low. User-vs-plugin same skill name → medium. Findings carry `details.namespaces` array with `{source, name, path}` for every conflicting source.
- **`details` field on findings:** `output.mjs:finding()` helper now passes through optional `details` for scanner-specific structured payloads (used by COL).
- **"Plugin Hygiene" area** (10th in scorecard): COL contributes here. Posture JSON now reports 10 areas instead of 9.
### Changed
- **`scoreByArea` deduplicates by area name:** when multiple scanners share an area (TOK + CPS → "Token Efficiency", SET + DIS → "Settings"), they produce one combined row with merged finding counts. Existing 9-area contract preserved for non-Plugin-Hygiene areas.
### Known breaking changes
- **Suppression backward-compat — `CA-TOK-*` glob now also matches `CA-TOK-005`.** Existing `.config-audit-ignore` entries that suppress TOK findings via the `CA-TOK-*` glob will silently include CA-TOK-005 (MCP budget). To preserve the prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
```
CA-TOK-001
CA-TOK-002
CA-TOK-003
```
A one-time runtime warning for this case is out of scope for v5.0.0 — it is a candidate for v5.0.1.
- **Plugin-vs-built-in collision is intentionally not implemented.** The Step 22a research spike could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in (`/help`, `/clear`, `/init`, `/review`, `/config`, `/cost`, `/security-review`). Treated as info-only in this release; a follow-up v5.0.1 ticket may add an opt-in check.
- `[skip-docs]` tag used on every feat commit — README/CLAUDE.md badge counts (scanner count, command count, test count) and the architecture sections are intentionally fenced off until Session 5 (Step 28). This keeps the v5 plan's session boundaries clean even when the Forgejo `pre-commit-docs-gate` hook would otherwise block these commits.
## [5.0.0-alpha.2] - 2026-05-01
### Summary
Second v5.0.0 alpha — structural gaps + README self-audit. TOK pattern severities recalibrated for tokens/turn impact (F7), three new findings cover settings/skills/cascade structure (M2, M4, M6), MCP tool-count detection wired (M1), HKV gains a verbose-output check (M5), and self-audit grows a `--check-readme` flag (F6).
### Added
- **F7 — TOK severity recalibration:** Pattern A (cache-breaking volatile top) `medium → high`, Pattern B (redundant permissions) `low → medium`, Pattern C (deep imports) `medium → low`. Each finding now carries a `calibration_note` evidence field documenting the heuristic basis.
- **M6 — `additionalDirectories` settings key:** added to `KNOWN_KEYS` so it no longer trips "unknown settings key". New low-severity finding when `additionalDirectories.length > 2`.
- **M4 — TOK Pattern E:** medium-severity finding when `activeConfig.claudeMd.estimatedTokens > 10_000` — flags cascades that bleed budget every turn.
- **M2 — TOK Pattern F:** low-severity finding for project-local `SKILL.md` whose frontmatter `description` exceeds 500 characters (description loads on every turn even when the body does not). Scoped to `discovery.files`; user/plugin skills out of project scope are not flagged.
- **M1 — MCP tool-count detection:**`readActiveMcpServers` now resolves tool count via cache → `node_modules/<pkg>/package.json` → `{toolCount: null, toolCountUnknown: true}` fallback. Tool count drives `estimateTokens` per server.
- **M5 — HKV verbose hook output:** new low-severity finding when a referenced hook script contains > 50 `console.log` / `process.stdout.write` lines (static heuristic, no execution).
- `result.readmeCheck.passed === true` is **not** required during alpha/beta phases. The real plugin's own check is currently red (`scanners` 10 vs README 9, `tests` 31 vs README 543) — reconciliation deferred to Session 5 Step 28 (README sync).
- `[skip-docs]` tag used on every commit — README/CLAUDE.md badge counts and architecture text are intentionally fenced off until Session 5.
## [5.0.0-alpha.1] - 2026-05-01
### Summary
First v5.0.0 alpha — token-economy round, F1-F5. The TOK scanner now consumes `readActiveConfig` (per-MCP-server hotspots, claudeMd cascade tokens), severity weighting replaces flat finding counts in `scoreByArea`, and MCP servers no longer estimate at a flat 15 tokens. Pattern D (CA-TOK-004 sonnet-era signature) removed — too noisy, not actionable.
### Added
- **`'mcp'` kind for `estimateTokens`** (F2): an active MCP server now estimates ≥ 500 tokens (base protocol + schema overhead) instead of the v4 flat 15. Optional `{toolCount}` raises the estimate to `500 + toolCount * 200` once Step 14 wires tool-count detection.
- **TOK ↔ readActiveConfig integration** (F1): the TOK scanner emits one hotspot per active MCP server, sums their tokens into `total_estimated_tokens`, and exposes `result.activeConfig` (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount).
- **`scoringVersion: 'v5'`** field on `scoreByArea` output for cross-version drift detection.
- **`WEIGHTS`** named export from `scanners/lib/severity.mjs` (`Object.freeze`).
### Changed
- **BREAKING (intentional, F3):**`scoreByArea` is now severity-weighted. Penalty = `Σ count[s] * WEIGHTS[s]`; `passRate = max(0, 100 - penalty / max(10, findingCount * 4) * 100)`. Lows no longer crater an area's grade; a single high or critical consumes a large fraction of budget. `baseline-all-a` fixture remains all-A (no critical/high on that fixture).
- **BREAKING (intentional, F2):** MCP server token estimates jump from a flat 15 to ≥ 500. `whats-active` totals and TOK hotspots will report higher numbers for any project with active MCP servers.
- **BREAKING (intentional, F5):** Pattern D / `CA-TOK-004` (sonnet-era signature) is no longer emitted. Suppression entries for `CA-TOK-004` are now no-ops; downstream tools that filter on the ID should drop it. The catalogue entry was removed from `knowledge/opus-4.7-patterns.md` and `commands/tokens.md`.
- **Hotspots contract (F4):** the v4 padding loop and `take` dead-code are gone. Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
### Migration notes
- `CA-TOK-*` glob suppression entries continue to suppress 001-003. Existing exact `CA-TOK-004` entries are harmless but obsolete — remove them at convenience.
- Posture/JSON consumers reading `areas[*].score` will see different values for non-clean configs. Use `result.scoringVersion === 'v5'` to detect.
- New fixture `tests/fixtures/tok-active-config/` — minimal repo with `.mcp.json` (2 servers), `CLAUDE.md`, plugin skeleton.
## [4.0.0] - 2026-04-19
### Summary
Opus 4.7 era upgrade. New TOK scanner detects token-efficiency anti-patterns (cache-breaking volatile content, redundant tool permissions, deep import chains, sonnet-era minimal setups). Token Efficiency joins the quality scorecard as the 8th area. Scanner-agent and verifier-agent migrate from haiku → sonnet per global no-haiku policy.
### Added
- **`token-hotspots.mjs`** scanner (CA-TOK-001..004) — 4 patterns aligned with Opus 4.7 token-cost dynamics:
- CA-TOK-001 cache-breaking volatile content (timestamps/UUIDs in top 30 lines of CLAUDE.md)
- CA-TOK-002 redundant tool permissions (duplicate or subset overlaps)
- CA-TOK-003 deep @import chains (>2 hops on the load path)
- CA-TOK-004 sonnet-era minimal setup (no skills/MCP/hooks/managed/plugins)
- **Token Efficiency** as the 8th quality area in the posture scorecard (now 9 scanners total: CML/SET/HKV/RUL/MCP/IMP/CNF/GAP/TOK).
- `id` field on every area in the scorecard payload (`token_efficiency`, `instruction_clarity`, etc.) for stable downstream lookup.
- 13 new TOK scanner tests + 3 CLI tests + posture grade-stability test for `token_efficiency`.
- Knowledge refresh: `knowledge/opus-4.7-patterns.md`, plus 2026-04 deltas (v2.1.83–v2.1.111) added to `feature-evolution.md`, `claude-code-capabilities.md`, and `hook-events-reference.md` from `research/03-claude-code-changes-config-surfaces.md`.
### Changed
- **BREAKING (additive surface):** Quality areas count 7 → 8. Posture JSON consumers that hard-coded 7 areas must update.
- **BREAKING (model migration):**`scanner-agent` and `verifier-agent` migrated `haiku` → `sonnet`. Latency and cost trade-offs accepted; deterministic scanner CLIs preferred over agent invocations.
New read-only command `/config-audit whats-active` — shows exactly what Claude Code loads for a given repo, with token estimates.
### Added
- **`/config-audit whats-active [path]`** — inventory of active plugins, skills, MCP servers, hooks, and CLAUDE.md cascade for a repo, with source attribution (user/project/plugin) and rough token estimates. Read-only, <2s.
How this marketplace is maintained, what you can expect from upstream, and how it's meant to be used.
## TL;DR
- Solo-maintained, AI-assisted development, MIT licensed.
- **Fork-and-own is the default model.** Upstream is a starting point, not a vendor.
- Issues welcome as signals. Pull requests are not accepted — see [Why no PRs](#pull-requests--no).
- No SLA. Best-effort bug fixes and security advisories. Breaking changes happen and are noted in each plugin's CHANGELOG.
---
## Can I trust this?
Be honest with yourself about what you're adopting:
- **One maintainer.** If I get hit by a bus, the bus wins. The repos stay up under MIT, but no one owes you a fix.
- **AI-generated code with human review.** Every plugin is built through dialog-driven development with Claude Code. I read, test, and judge the output before it ships, but I'm not auditing every line the way a security firm would. Treat it accordingly.
- **No commercial interests.** I'm not selling a SaaS, not steering you toward a paid tier, not collecting telemetry. The plugins run locally in your Claude Code installation.
- **MIT licensed.** Fork it, modify it, ship it under your own name.
If you work somewhere that needs vendor accountability, support contracts, or signed assurances — **this isn't that.** Use it as a reference implementation, fork it into your own organization, and own the result.
---
## How this is meant to be used
### Fork-and-own
The intended workflow:
1. **Fork** the marketplace (or a single plugin) into your own organization or namespace.
2. **Tailor** it to your context — terminology, integrations, cycle lengths, regulatory framing, whatever doesn't fit out of the box.
3. **Maintain it yourself.** Treat your fork as the canonical version for your team.
4. **Watch upstream selectively.** Cherry-pick changes that help, ignore changes that don't. There's no obligation to stay in sync.
This isn't a workaround for not accepting PRs. It's the actual recommended adoption pattern, especially for plugins like `okr` and `ms-ai-architect` where every Norwegian public sector organization will need its own tildelingsbrev mappings, terminology, and integrations. A central "one true plugin" would be wrong for everyone.
### What to change first when you fork
Each plugin differs, but the common edits are:
- **Identity** — rename the plugin, replace authorship, update README.
- **External integrations** — issue trackers, knowledge bases, dashboards, observability backends. The plugins ship as starting points, not pre-wired. Every organization must configure its own integrations.
- **Norwegian-specific framing** — relevant for `okr` and `ms-ai-architect`. Other plugins are jurisdiction-neutral. Rewrite for your jurisdiction if you're outside Norway.
- **Reference docs** — the knowledge base in each plugin reflects my reading. Replace with your organization's authoritative sources.
- **Hooks and policies** — security thresholds, blocked commands, and audit gates are tuned to my taste. Tune them to yours.
### Staying current with upstream
If you want to pull in upstream changes later:
- **Cherry-pick, don't merge.** Each plugin moves independently and breaking changes land without ceremony.
- **Read the CHANGELOG first.** Every plugin has one.
- **Keep your customizations in clearly-named files.** The harder upstream is to merge cleanly, the more painful staying current becomes. A `local/` directory or `*.local.md` convention helps.
---
## What upstream provides
| | What I do | What I don't |
|---|---|---|
| **Bug fixes** | Best-effort when I notice or get a clear report | No SLA, no triage commitment |
| **Security issues** | Investigate within reasonable time, document in CHANGELOG | No CVE process, no embargo coordination |
| **New features** | When they fit my own usage | Not on request |
| **Norwegian public sector context** | Kept current as long as the project lives | If I lose interest or change jobs, the framing freezes |
| **Breaking changes** | Documented in CHANGELOG | They happen — version pin if you need stability |
| **Compatibility** | Tracked against current Claude Code releases | No long-term support branches |
If any of this is a dealbreaker — fork now, version-pin, and stop reading upstream.
---
## How to contribute
### Issues — yes, please
Issues are the most valuable thing you can send me:
- **Bug reports** with reproduction steps. Even a screenshot helps.
- **Use-case feedback.** "I tried to use this in my organization and X didn't fit" is genuinely useful, even if I can't fix it for you.
- **Pointers to better sources.** If you know a DFØ veileder, an NSM guideline, or an academic paper that contradicts what's in a knowledge base, tell me.
- **Security findings.** See each plugin's `SECURITY.md` for disclosure preference where one exists; otherwise email rather than open a public issue.
### Pull requests — no
This is deliberate, not laziness:
- **Solo review is a bottleneck.** Honest PR review takes me longer than rewriting from scratch. The math doesn't work.
- **Forks are where the value is.** The fork-and-own model means upstream consolidation isn't the point. Your organization's adaptations belong in your fork, not mine.
- **AI-generated code complicates provenance.** Every line here is produced through dialog with Claude Code, with me as the judge. Mixing in PRs from contributors with different processes and licensing assumptions creates a mess I'd rather not untangle.
If you've built something useful on top of a fork, **publish it under your own name and link back.** I'll happily list notable forks here once they exist.
### Notable forks
*(To be populated as forks emerge. If you've forked one of these plugins for production use, open an issue and I'll add a link.)*
---
## Relationship between plugins
These plugins are **independent**. Install one without the others, fork one without the others. They share conventions (slash command naming, hook patterns, AI-generated disclosure) but no runtime dependencies.
The marketplace is a **catalog**, not a suite. Don't fork the whole repo unless you actually want to maintain everything.
---
## Versioning and stability
- **Semantic versioning per plugin.** Each plugin has its own `CHANGELOG.md` and version number.
- **Breaking changes happen.** I bump the major version when they do, but I don't run an LTS branch.
- **Pin your version.** If stability matters more than features, install a specific version and stay there until you choose to upgrade.
---
## Public sector adoption notes
For Norwegian etater specifically:
- **DPIA-relevant data flows are documented in the relevant plugin README where applicable.** Read them before installation.
- **No data leaves your machine** beyond what Claude Code itself sends to Anthropic. The plugins themselves do not call external services unless you configure an integration.
- **Drøftingsplikt and ledelsesansvar** are not replaced by these tools. The `okr` plugin coaches; it does not decide. The `ms-ai-architect` plugin advises; it does not approve.
- **Choose your Claude deployment carefully.** claude.ai vs. API direct vs. Bedrock in EU region have different data residency profiles. The plugins don't choose for you.
---
## License
MIT for all plugins in this marketplace. See each plugin's `LICENSE` file.
> Know if your configuration is correct. Find what could improve it. Fix it automatically.
> Know if your configuration is correct. Find what could improve it. Fix it automatically.
*Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.*
> **Solo-maintained, fork-and-own.** This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See [GOVERNANCE.md](GOVERNANCE.md) for the full model and what upstream provides.
A Claude Code plugin that checks configuration health, suggests context-aware improvements, and auto-fixes issues — `CLAUDE.md`, `settings.json`, hooks, rules, MCP servers, `@imports`, and plugins. 7 quality scanners for correctness, context-aware feature recommendations, auto-fix with backup/rollback. Zero external dependencies.
A Claude Code plugin that checks configuration health, suggests context-aware improvements, and auto-fixes issues — `CLAUDE.md`, `settings.json`, hooks, rules, MCP servers, `@imports`, and plugins. 12 deterministic scanners across 10 quality areas, context-aware feature recommendations, auto-fix with backup/rollback, an Opus-4.7-aware Token Hotspots scanner with optional API-calibrated `--accurate-tokens` mode, plus cache-prefix stability, dead-tool, and cross-plugin collision detection. Zero external dependencies.
- [What This Plugin Does Not Cover](#what-this-plugin-does-not-cover)
- [What This Plugin Does Not Cover](#what-this-plugin-does-not-cover)
- [Version History](#version-history)
- [Version History](#version-history)
@ -37,13 +45,66 @@ A Claude Code plugin that checks configuration health, suggests context-aware im
---
---
## What's New in v5.1.0
**Plain-language UX humanizer** — every command's default output now leads with prose. Findings are grouped by what they mean for the user (Configuration mistake, Conflict, Wasted tokens, Missed opportunity, Dead config) and led with an urgency phrase (Fix this now, Fix soon, Fix when convenient, Optional cleanup, FYI). Technical IDs (`CA-CML-001`, `CA-TOK-005`, …) still appear, but at end-of-line where they belong as references rather than headlines.
| Fix when convenient | Real issue but not urgent |
| Optional cleanup | Tidy-up that improves polish but isn't required |
| FYI | Informational; no action expected |
| Configuration mistake | A configuration file has an error or omission |
| Conflict | Two configuration sources disagree |
| Wasted tokens | Configuration is loading content that costs tokens without payback |
| Missed opportunity | A Claude Code feature you aren't using that could help your project |
| Dead config | Configuration that has no effect (e.g., a permission that's also denied) |
### Backwards compatibility — the `--raw` flag
Every CLI accepts `--raw` for byte-stable v5.0.0 verbatim output (technical IDs, raw severity, no prose translation). `--json` is unchanged from v5.0.0 — already byte-stable for programmatic consumption. Use `--raw` only if you've built tooling against v5.0.0 stderr scrapes; for new automation, prefer `--json`.
- All scanner internals (12 scanners + standalone PLH) emit the same finding IDs and structural data — humanization happens at output-formatting time only
- `--json` envelope shape is byte-stable with v5.0.0 (humanizer fields are additive on findings only in default mode; the `--json` path bypasses humanization entirely)
Claude Code reads instructions from at least 7 different file types across multiple scopes: `CLAUDE.md`, `settings.json`, `.claude/rules/`, `hooks.json`, `.mcp.json`, `.claudeignore`, and `settings.local.json`. Each can exist at project level, user level, or both. Plugins add more. The system is powerful — but nobody tells you what you're using wrong, what you're missing, or what's silently conflicting.
Claude Code reads instructions from at least 7 different file types across multiple scopes: `CLAUDE.md`, `settings.json`, `.claude/rules/`, `hooks.json`, `.mcp.json`, `.claudeignore`, and `settings.local.json`. Each can exist at project level, user level, or both. Plugins add more. The system is powerful — but nobody tells you what you're using wrong, what you're missing, or what's silently conflicting.
This plugin provides three layers of configuration intelligence:
This plugin provides three layers of configuration intelligence:
- **Health** — 7 deterministic scanners verify correctness across every configuration file, catching broken imports, deprecated settings, conflicting rules, format errors, and permission contradictions
- **Health** — 12 deterministic scanners verify correctness across every configuration file, catching broken imports, deprecated settings, conflicting rules, format errors, permission contradictions, Opus-4.7-era token waste, cache-prefix instability, dead tool grants, and cross-plugin skill collisions
- **Opportunities** — context-aware recommendations for Claude Code features that could benefit your specific project, backed by Anthropic's official guidance
- **Opportunities** — context-aware recommendations for Claude Code features that could benefit your specific project, backed by Anthropic's official guidance
- **Action** — auto-fix with mandatory backups, syntax validation, rollback support, and a human-in-the-loop workflow for anything non-trivial
- **Action** — auto-fix with mandatory backups, syntax validation, rollback support, and a human-in-the-loop workflow for anything non-trivial
@ -128,18 +189,18 @@ Also **Grade A** — with only 3 opportunities remaining. This project has CLAUD
### Installation
### Installation
Clone from the public repository:
Add the marketplace and browse plugins with `/plugin`:
| `/config-audit whats-active` | Read-only inventory of plugins, skills, MCP, hooks, CLAUDE.md active for a repo (with token estimates) |
| `/config-audit discover` | Run discovery phase only |
| `/config-audit discover` | Run discovery phase only |
| `/config-audit analyze` | Run analysis phase only |
| `/config-audit analyze` | Run analysis phase only |
| `/config-audit interview` | Set preferences for action plan _(optional)_ |
| `/config-audit interview` | Set preferences for action plan _(optional)_ |
@ -263,13 +327,13 @@ Your team configuration changes over time. Track it:
### Scope
### Scope
By default, `/config-audit` auto-detects scope from your git context. Override with: `/config-audit current`, `/config-audit repo`, `/config-audit home`, `/config-audit full`.
By default, `/config-audit` auto-detects scope from your git context. Override with: `/config-audit current`, `/config-audit repo`, `/config-audit home`, `/config-audit full`. Use `--delta` for incremental scanning (only new/changed findings).
---
---
## Deterministic Scanners
## Deterministic Scanners
8 Node.js scanners that perform structural analysis an LLM cannot reliably do: schema validation, circular reference detection, import resolution, conflict detection across scopes. Zero external dependencies.
12 Node.js scanners that perform structural analysis an LLM cannot reliably do: schema validation, circular reference detection, import resolution, conflict detection across scopes, Opus-4.7-aware token-cost analysis, cache-prefix stability, dead-tool detection, and cross-plugin skill collisions. Plus a standalone plugin-health scanner. Zero external dependencies.
**Why deterministic?** LLMs are powerful at understanding intent and context. But they cannot reliably validate JSON schemas, detect circular `@import` chains, or catch that your global `settings.json` contradicts your project-level one. These scanners fill that gap — fast, repeatable, and zero false positives on structural issues.
**Why deterministic?** LLMs are powerful at understanding intent and context. But they cannot reliably validate JSON schemas, detect circular `@import` chains, or catch that your global `settings.json` contradicts your project-level one. These scanners fill that gap — fast, repeatable, and zero false positives on structural issues.
@ -283,6 +347,10 @@ By default, `/config-audit` auto-detects scope from your git context. Override w
| `cache-prefix-scanner.mjs` | CPS | Volatile content in lines 31–150 of the CLAUDE.md cascade — beyond the cache-prefix window but still re-loaded every turn |
| `disabled-in-schema-scanner.mjs` | DIS | Tools listed in BOTH `permissions.deny` and `permissions.allow` — deny wins, allow entries are dead config |
| `collision-scanner.mjs` | COL | Cross-plugin skill name collisions; user-vs-plugin overlaps |
### CLI Tools
### CLI Tools
@ -290,11 +358,14 @@ All tools work standalone — no Claude Code session needed:
Skills activate automatically when your question matches their trigger patterns.
---
## Suppressions
## Suppressions
### Finding ID Format
Every finding has a unique ID: `CA-{SCANNER}-{NNN}` — where `{SCANNER}` is the scanner prefix (see table above) and `{NNN}` is a sequential number. Examples: `CA-CML-001`, `CA-SET-003`, `CA-HKV-002`, `CA-RUL-005`.
### Suppression
Some findings are expected — maybe you intentionally have a large CLAUDE.md, or a feature gap doesn't apply to your workflow. Create a `.config-audit-ignore` file to suppress them:
Some findings are expected — maybe you intentionally have a large CLAUDE.md, or a feature gap doesn't apply to your workflow. Create a `.config-audit-ignore` file to suppress them:
The plugin runs all 8 scanners on itself via `self-audit.mjs`. Current result: **Grade A, score 98, 0 real findings.** Test fixtures and example files are automatically excluded from scoring — a security plugin that ships deliberately broken examples shouldn't fail its own audit.
The plugin runs all 12 scanners + the standalone plugin-health scanner on itself via `self-audit.mjs`. Test fixtures and example files are automatically excluded from scoring — a configuration plugin that ships deliberately broken examples shouldn't fail its own audit. Use `--check-readme` to verify badge counts are in sync with the filesystem.
```bash
```bash
node scanners/self-audit.mjs
node scanners/self-audit.mjs
@ -395,6 +482,75 @@ node scanners/self-audit.mjs
---
---
## Scanner Library (`scanners/lib/`)
Shared modules used by all scanners — useful if you're reading the source or extending the plugin:
- **Plugin CLAUDE.md in node_modules** — these should be excluded via scope to avoid false positives
---
## Data Storage & Safety Guarantees
## Data Storage & Safety Guarantees
### Where Data Lives
### Where Data Lives
@ -442,6 +598,10 @@ This plugin is cautious by design — configuration files are important, and a b
| Version | Date | Highlights |
| Version | Date | Highlights |
|---------|------|-----------|
|---------|------|-----------|
| **5.1.0** | 2026-05-01 | Plain-language UX humanizer. Default output of all 18 commands now leads with prose; findings grouped by user-impact category (Configuration mistake, Conflict, Wasted tokens, Missed opportunity, Dead config) and led by urgency phrase (Fix this now → FYI). New `--raw` flag preserves v5.0.0 verbatim output for tooling that scrapes stderr; `--json` is unchanged and byte-stable. New scanner-lib modules: `humanizer.mjs`, `humanizer-data.mjs` with TRANSLATIONS for 13 scanner prefixes. Self-audit terminal output also humanized. 792 tests (+157 humanizer-tester) |
| **5.0.0** | 2026-05-01 | Reality-based token-optimization. 3 new scanners (CPS cache-prefix, DIS dead tools, COL plugin collisions) → 12 deterministic scanners. New `/config-audit manifest` and `--accurate-tokens` API calibration. Severity-weighted scoring (`scoringVersion: 'v5'`). MCP token estimates 15 → 500+. Plugin Hygiene as 10th quality area. Knowledge: cache-stability replaces 200-line rule, cache-telemetry recipe. **Breaking:** F2 token magnitude jump, F3 severity weighting, F5 Pattern D removed, N1 `CA-TOK-*` glob now matches CA-TOK-005. 635 tests |
| **4.0.0** | 2026-04-19 | Opus 4.7 era: new TOK scanner (cache-breaking volatile content, redundant tool permissions, deep import chains, sonnet-era setups), `/config-audit tokens` command, Token Efficiency 8th quality area, scanner-agent + verifier-agent migrated haiku → sonnet. 543 tests |
| **3.1.0** | 2026-04-14 | New `/config-audit whats-active` — read-only inventory of active plugins, skills, MCP, hooks, CLAUDE.md for a repo, with token estimates. 522 tests |
@ -27,12 +27,23 @@ Analyze all discovered configuration files to:
You will receive:
You will receive:
1. Session ID with findings in `~/.claude/config-audit/sessions/{session-id}/findings/`
1. Session ID with findings in `~/.claude/config-audit/sessions/{session-id}/findings/`
2. Scope configuration from `~/.claude/config-audit/sessions/{session-id}/scope.yaml`
2. Scope configuration from `~/.claude/config-audit/sessions/{session-id}/scope.yaml`
3. Scanner JSON envelope (if available) from scan-orchestrator.mjs
3. Scanner JSON envelope (if available) from scan-orchestrator.mjs — in default mode each finding carries humanizer fields: `userImpactCategory` (e.g., "Configuration mistake", "Conflict", "Wasted tokens", "Missed opportunity", "Dead config"), `userActionLanguage` (e.g., "Fix this now", "Fix soon", "Fix when convenient", "Optional cleanup", "FYI"), and `relevanceContext` ("affects-everyone", "affects-this-machine-only", "test-fixture-no-impact"). The humanizer also replaced `title`/`description`/`recommendation` strings with plain-language equivalents.
4. Knowledge base at `{CLAUDE_PLUGIN_ROOT}/knowledge/` for best practices and anti-patterns
4. Mode flag — when `$RAW_FLAG` is `--raw`, the envelope is v5.0.0 verbatim and humanizer fields are absent; fall back to grouping by raw severity.
5. Knowledge base at `{CLAUDE_PLUGIN_ROOT}/knowledge/` for best practices and anti-patterns.
## Humanizer-aware rendering rules
- **Render the humanizer's `title`/`description`/`recommendation` verbatim.** Do not paraphrase. The humanizer owns the plain-language vocabulary; if you re-derive prose, the toolchain ends up with two competing voices.
- **Group findings by `userImpactCategory`.** This replaces severity-bucket grouping in the report. The categories are pre-translated — do not invent your own bucket names.
- **Lead each finding line with `userActionLanguage`.** This replaces raw severity prefiks ("critical", "high", "medium") in the report. Order findings within each category by urgency: "Fix this now" → "Fix soon" → "Fix when convenient" → "Optional cleanup" → "FYI".
- **Surface `relevanceContext` when it isn't `affects-everyone`.** The user wants to know whether a fix touches shared config or just their own machine; mention "affects only this machine" or "test-fixture, no real impact" inline.
- **Do not include "explain what X means" subroutines.** Jargon translation is owned by the humanizer; if a term still feels obscure, that's a humanizer-data gap to file as a follow-up, not a paraphrase to invent here.
In `--raw` mode, fall back to v5.0.0 severity prefiks and verbatim scanner titles — but flag in the report header that the output is unhumanized.
## Task
## Task
1. **Load all findings**: Read all `*.yaml` files from findings directory
1. **Load all findings**: Use the Read tool on all `*.yaml` files from findings directory
1.5. **Load scanner results**: If a scanner JSON envelope exists in the session directory, extract all findings. Cross-reference against `knowledge/anti-patterns.md` to add remediation context. Note any CA-{prefix}-NNN finding IDs in the report.
1.5. **Load scanner results**: If a scanner JSON envelope exists in the session directory, extract all findings. Cross-reference against `knowledge/anti-patterns.md` to add remediation context. Note any CA-{prefix}-NNN finding IDs in the report.
2. **Build hierarchy map**: Order files by level (managed -> global -> project), visualize inheritance
2. **Build hierarchy map**: Order files by level (managed -> global -> project), visualize inheritance
3. **Detect conflicts**: Compare settings across hierarchy levels, note which level wins
3. **Detect conflicts**: Compare settings across hierarchy levels, note which level wins
@ -40,7 +51,7 @@ You will receive:
5. **Identify optimizations**: Rules to globalize, missing configs, orphaned files
5. **Identify optimizations**: Rules to globalize, missing configs, orphaned files
6. **Security scan**: Aggregate secret warnings, check for insecure patterns
6. **Security scan**: Aggregate secret warnings, check for insecure patterns
7. **CLAUDE.md quality assessment**: Score each file against rubric, assign letter grades
7. **CLAUDE.md quality assessment**: Score each file against rubric, assign letter grades
- `overallGrade` — health grade (quality areas only)
- `overallGrade` — health grade (quality areas only)
- `opportunityCount` — number of unused features detected
- `opportunityCount` — number of unused features detected
- `scannerEnvelope` — full scanner results including GAP findings
- `scannerEnvelope` — full scanner results. In default mode each GAP finding carries humanizer fields: `userImpactCategory` ("Missed opportunity"), `userActionLanguage` ("Fix soon", "Fix when convenient", "Optional cleanup", "FYI"), and `relevanceContext`. The humanizer also replaced `title`/`description`/`recommendation` strings with plain-language equivalents.
You also receive project context: language, file count, existing configuration.
You also receive project context: language, file count, existing configuration.
## Humanizer-aware rendering rules
- **Render the humanizer's `title`/`description`/`recommendation` verbatim.** Do not paraphrase. The humanizer owns the plain-language vocabulary.
- **Drive prioritization with `userActionLanguage`, not raw category tiers.** "Fix soon" → "Fix when convenient" → "Optional cleanup" → "FYI" replaces the t1/t2/t3/t4 tier ladder for output ordering.
- **Skip findings with `relevanceContext === "test-fixture-no-impact"`** unless the user explicitly asked to include fixtures.
- **Do not include "explain what X means" subroutines.** The category labels ("Missed opportunity") are pre-translated.
## Knowledge Files
## Knowledge Files
Read **at most 3** of these files from the plugin's `knowledge/` directory:
Read **at most 3** of these files from the plugin's `knowledge/` directory:
@ -36,6 +43,8 @@ Write `feature-gap-report.md` to the session directory. Max 200 lines.
### Report Structure
### Report Structure
Group findings by `userActionLanguage` rather than by raw category tier. Render the humanizer's `title` and `recommendation` verbatim — the humanizer has already produced plain-language equivalents.
```markdown
```markdown
# Feature Opportunities
# Feature Opportunities
@ -47,38 +56,34 @@ Write `feature-gap-report.md` to the session directory. Max 200 lines.
## High Impact
## High Impact
These address correctness or security — consider them seriously.
[Findings where userActionLanguage is "Fix soon" — these address correctness or security; consider them seriously.]
→ **[feature name]**
→ **[humanized title verbatim]**
Why: [evidence-backed reason, cite Anthropic docs or proven issues]
Why: [humanized description verbatim, plus "relevant because your project has X" context]
How: [2-3 concrete steps]
How: [humanized recommendation verbatim, broken into 2-3 concrete steps from gap-closure-templates.md]
[Repeat for each T1 finding]
## Worth Considering
## Worth Considering
These improve workflow efficiency for projects like yours.
[Findings where userActionLanguage is "Fix when convenient" — these improve workflow efficiency for projects like yours.]
→ **[feature name]**
→ **[humanized title verbatim]**
Why: [reason, with "relevant because your project has X"]
Why: [humanized description verbatim, plus relevance context]
How: [2-3 concrete steps]
How: [humanized recommendation verbatim, broken into 2-3 concrete steps]
[Repeat for each T2 finding]
## Explore When Ready
## Explore When Ready
Nice-to-have features. Skip these if your current setup works well.
[Findings where userActionLanguage is "Optional cleanup" or "FYI" — nice-to-have, skip if current setup works well.]
→ **[feature name]**
→ **[humanized title verbatim]**
Why: [brief reason]
Why: [humanized description verbatim, brief]
[Repeat for T3/T4 findings, keep brief]
## When You Might Skip These
## When You Might Skip These
[Honest qualification: which recommendations are genuinely optional and why. A minimal setup can be the right choice.]
[Honest qualification: which recommendations are genuinely optional and why. A minimal setup can be the right choice. Mention any findings whose `relevanceContext` is `affects-this-machine-only` so the user knows the change won't propagate to teammates.]
```
```
In `--raw` mode (humanizer fields absent), fall back to grouping by raw category tier (t1/t2/t3/t4) and render scanner-emitted titles verbatim — flag in the report header that output is unhumanized.
## Guidelines
## Guidelines
- Frame everything as opportunities, never as failures or gaps
- Frame everything as opportunities, never as failures or gaps
4. Mode flag — `$RAW_FLAG`. When empty (default), the analysis report uses humanized vocabulary: each finding has been grouped by `userImpactCategory` and led with `userActionLanguage`. When `--raw`, the report is v5.0.0 verbatim severity prefiks.
## Humanizer-aware planning rules
- **Consume humanized fields from the analysis report.** The analyzer-agent has already grouped findings by `userImpactCategory` ("Configuration mistake", "Conflict", "Wasted tokens", "Missed opportunity", "Dead config") and led each line with `userActionLanguage` ("Fix this now", "Fix soon", "Fix when convenient", "Optional cleanup", "FYI"). Carry that vocabulary forward into the action plan — do not re-derive severity-to-prose mappings.
- **Render finding titles and recommendations verbatim** as they appear in the analysis report. The humanizer owns the plain-language vocabulary; rephrasing introduces drift between report and plan.
- **Order actions by `userActionLanguage` urgency**, not by raw severity. "Fix this now" + "Fix soon" precede "Fix when convenient" precede "Optional cleanup" precede "FYI".
- **Surface `relevanceContext`** when an action only affects the user's machine or only touches test fixtures — these warrant different escalation paths.
- **Do not perform translation duties in the action plan.** No "what this means in plain English" sections. The humanizer handles that upstream; if a finding's prose still reads like jargon, that's a data gap to flag, not a translation to invent.
In `--raw` mode, the analysis report is v5.0.0 verbatim — fall back to severity-based prioritization and surface raw scanner titles. Flag in the plan header that the plan was generated from unhumanized analysis.
## Task
## Task
1. **Load inputs**: Read analysis and interview (if exists)
1. **Load inputs**: Use the Read tool on the analysis report and interview (if exists)
2. **Generate actions**: Create action items for each finding
2. **Generate actions**: Create action items for each finding, preserving humanized titles
3. **Assess risk**: Evaluate risk level per action
3. **Assess risk**: Evaluate risk level per action
4. **Order by dependencies**: Ensure correct execution order
4. **Order by dependencies AND `userActionLanguage`**: dependency-correct AND urgency-correct
5. **Create rollback plans**: Define how to undo each action
5. **Create rollback plans**: Define how to undo each action
6. **Write action plan**: Output comprehensive plan
6. **Write action plan**: Output comprehensive plan grouped by `userImpactCategory`
description: Scan a directory tree for Claude Code configuration files (CLAUDE.md, settings.json, .mcp.json, rules). First step in the config-audit workflow.
description: Scan a directory tree for Claude Code configuration files (CLAUDE.md, settings.json, .mcp.json, rules). First step in the config-audit workflow.
model: haiku
model: sonnet
color: cyan
color: cyan
tools: ["Read", "Glob", "Grep", "Write"]
tools: ["Read", "Glob", "Grep", "Write"]
---
---
@ -255,3 +255,7 @@ Flag as potential secrets:
- Use Glob for pattern matching (fast)
- Use Glob for pattern matching (fast)
- Read files sequentially to avoid overwhelming filesystem
- Read files sequentially to avoid overwhelming filesystem
- Maximum depth: Follow scope configuration (default unlimited)
- Maximum depth: Follow scope configuration (default unlimited)
## Model policy
v4.0 migrated from haiku to Sonnet 4.6 per global no-haiku policy. Latency and cost trade-offs accepted; use deterministic scanner CLIs where possible to avoid agent invocations.
description: Verify that configuration changes were applied correctly. Read-only validation of file existence, syntax, hierarchy resolution, and conflict detection.
description: Verify that configuration changes were applied correctly. Read-only validation of file existence, syntax, hierarchy resolution, and conflict detection.
model: haiku
model: sonnet
color: purple
color: purple
tools: ["Read", "Glob", "Grep"]
tools: ["Read", "Glob", "Grep"]
---
---
@ -246,3 +246,7 @@ This agent:
- Never modifies any files
- Never modifies any files
- Reports findings without taking action
- Reports findings without taking action
- Safe to run multiple times
- Safe to run multiple times
## Model policy
v4.0 migrated from haiku to Sonnet 4.6 per global no-haiku policy. Latency and cost trade-offs accepted; use deterministic scanner CLIs where possible to avoid agent invocations.
- Findings must exist in `~/.claude/config-audit/sessions/{session-id}/findings/`
- Findings must exist in `~/.claude/config-audit/sessions/{session-id}/findings/`
## Arguments
- `$ARGUMENTS` may contain `--raw` to forward to the analyzer agent's instructions; in `--raw` mode the agent renders v5.0.0 verbatim severity prefiks instead of humanized `userActionLanguage` urgency phrasing.
## Implementation
## Implementation
### Step 1: Verify session state
### Step 1: Verify session state
Read `~/.claude/config-audit/sessions/{session-id}/state.yaml` and verify discovery phase completed. If not, tell the user: "Discovery hasn't been run yet. Start with `/config-audit discover` or just run `/config-audit` for a full audit."
Read `~/.claude/config-audit/sessions/{session-id}/state.yaml` using the Read tool and verify discovery phase completed. If not, tell the user: "Discovery hasn't been run yet. Start with `/config-audit discover` or just run `/config-audit` for a full audit."
### Step 2: Tell the user what's happening
### Step 2: Tell the user what's happening
@ -33,18 +37,29 @@ This includes hierarchy mapping, conflict detection, and prioritized recommendat
Tell the user: **"Generating analysis (this takes about 30 seconds)..."**
Tell the user: **"Generating analysis (this takes about 30 seconds)..."**
```bash
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
@ -14,6 +14,8 @@ Analyze, report on, and optimize your Claude Code configuration.
If a subcommand is provided, route to it:
If a subcommand is provided, route to it:
- `posture` → `/config-audit:posture`
- `posture` → `/config-audit:posture`
- `tokens` → `/config-audit:tokens`
- `manifest` → `/config-audit:manifest`
- `feature-gap` → `/config-audit:feature-gap`
- `feature-gap` → `/config-audit:feature-gap`
- `fix` → `/config-audit:fix`
- `fix` → `/config-audit:fix`
- `rollback` → `/config-audit:rollback`
- `rollback` → `/config-audit:rollback`
@ -25,6 +27,7 @@ If a subcommand is provided, route to it:
- `interview` → `/config-audit:interview`
- `interview` → `/config-audit:interview`
- `drift` → `/config-audit:drift`
- `drift` → `/config-audit:drift`
- `plugin-health` → `/config-audit:plugin-health`
- `plugin-health` → `/config-audit:plugin-health`
- `whats-active` → `/config-audit:whats-active`
- `status` → `/config-audit:status`
- `status` → `/config-audit:status`
- `cleanup` → `/config-audit:cleanup`
- `cleanup` → `/config-audit:cleanup`
@ -77,12 +80,14 @@ This is a silent infrastructure step — do NOT show output to the user.
### Step 3: Run scanners and posture assessment
### Step 3: Run scanners and posture assessment
Tell the user: **"Running 8 configuration scanners..."**
Tell the user: **"Running 12 configuration scanners..."**
Run both scanners and posture in a single Bash command:
Run both scanners and posture in a single Bash command. Default mode runs the humanizer, so each finding in `scan-results.json` carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` alongside the v5.0.0 fields. If the user passed `--raw`, thread it through to both CLIs to get v5.0.0 verbatim output.
Use `--full-machine` for `full` scope, `--global` for `home` scope. For `repo` and `current`, pass the resolved path directly.
Use `--full-machine` for `full` scope, `--global` for `home` scope. For `repo` and `current`, pass the resolved path directly.
@ -131,19 +136,14 @@ Write to: `~/.claude/config-audit/sessions/{session-id}/state.yaml`
### Step 6: Display results
### Step 6: Display results
Present results using this template. Replace all placeholders with actual values. **Adapt the summary sentence based on grade.**
Present results using this template. The humanizer has already replaced jargon-heavy `title`/`description`/`recommendation` strings on every finding with plain-language equivalents — render them verbatim. Lead urgency phrasing with `userActionLanguage` ("Fix this now", "Fix soon", "Fix when convenient", "Optional cleanup", "FYI") and group "What you can do next" suggestions by that field. Do not re-derive an A/B/C/D/F-to-prose ladder here; the humanized stderr scorecard headline already supplies the grade context, and `userActionLanguage` supplies finding-level urgency.
```markdown
```markdown
### Results
### Results
**Health: {overallGrade}** | {qualityAreaCount} areas scanned
**Health: {overallGrade}** | {qualityAreaCount} areas scanned
{grade-based summary — pick ONE:}
{Use the headline line from the humanized stderr scorecard — it carries grade-context prose already. Avoid hardcoding a separate per-grade prose ladder.}
- Grade A: "Excellent — your configuration is correct and well-maintained."
- Grade B: "Strong — your configuration is solid with minor improvements available."
- Grade C: "Decent — your configuration works but has some issues worth addressing."
- Grade D: "Needs work — several configuration issues could affect your Claude Code experience."
- Grade F: "Significant issues found — addressing these will meaningfully improve your workflow."
{For the status column, use plain language like: "Well structured", "2 minor issues", "Missing trust levels", "No issues", etc.}
{For the status column, use the humanized title from the most-severe finding in that area, or a one-phrase plain-language summary. Findings carry userImpactCategory which already groups by impact bucket — use that vocabulary, not raw scanner names.}
{If opportunityCount > 0:}
{If opportunityCount > 0:}
{opportunityCount} feature opportunities available — run `/config-audit feature-gap` for context-aware recommendations.
{opportunityCount} feature opportunities available — run `/config-audit feature-gap` for context-aware recommendations.
### What you can do next
### What you can do next
{Include only relevant options based on findings. Explain each one:}
Group suggestions by `userActionLanguage` from the humanized findings:
{If fixable_count > 0:}
{If any finding has userActionLanguage "Fix this now" or "Fix soon":}
- **`/config-audit fix`** — Automatically fix {fixable_count} issues. Creates a backup first so you can roll back with one command.
- **`/config-audit fix`** — auto-fix what's possible (backup created first, one-command rollback). The remaining items go into a prioritized plan.
- **`/config-audit plan`** — produce a prioritized action plan for the items that need manual attention.
{If real findings > fixable_count:}
{If most findings are "Fix when convenient" or "Optional cleanup":}
- **`/config-audit plan`** — Get a prioritized action plan for the {remaining} issues that need manual attention.
- **`/config-audit feature-gap`** — see which features could enhance your setup; pick what you want and implement on the spot.
- **`/config-audit fix`** — auto-fix anything deterministic; the rest is genuinely optional.
{If grade is C or better:}
{If only "FYI" findings:}
- **`/config-audit feature-gap`** — See which features could help your project, and implement the ones you want on the spot.
- **`/config-audit feature-gap`** — explore opportunities; nothing is urgent.
{If grade is D or F:}
- **`/config-audit fix`** should be your first step — it handles the most impactful issues automatically.
Session saved to: `~/.claude/config-audit/sessions/{session-id}/`
Session saved to: `~/.claude/config-audit/sessions/{session-id}/`
Run the scan orchestrator silently to discover and scan files:
Run the scan orchestrator silently to discover and scan files. Default mode emits humanized JSON — each finding in `scan-results.json` carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` alongside the v5.0.0 fields. Pass `--raw` through if the user requested it (produces v5.0.0 verbatim envelope; humanizer fields absent).
Check exit code: 0/1/2 → normal. 3 → "Discovery encountered an error. Try a narrower scope."
Check exit code: 0/1/2 → normal. 3 → "Discovery encountered an error. Try a narrower scope."
@ -81,7 +83,7 @@ Write `scope.yaml` and `state.yaml` to session directory. Update state with `cur
### Step 7: Present summary
### Step 7: Present summary
Read the scan results file to count files and findings:
Read the scan results file using the Read tool. When you surface initial findings, group them by `userImpactCategory` and lead each line with `userActionLanguage` rather than raw severity prefiks — the humanizer already mapped severity to plain-language phrasing ("Fix this now", "Fix soon", "Fix when convenient", "Optional cleanup", "FYI") so the rest of the toolchain sees consistent wording.
**Full scan:**
**Full scan:**
```markdown
```markdown
@ -98,7 +100,7 @@ Read the scan results file to count files and findings:
| Hooks | {n} |
| Hooks | {n} |
| Other | {n} |
| Other | {n} |
Initial scan found {finding_count} items to review.
Initial scan found {finding_count} items to review (grouped by impact: {comma-separated counts per userImpactCategory}).
**Next:** Run `/config-audit analyze` to generate your analysis report.
**Next:** Run `/config-audit analyze` to generate your analysis report.
@ -16,6 +16,7 @@ Compare current configuration against a saved baseline to see what changed.
- A target path (default: current working directory)
- A target path (default: current working directory)
- `--save`: Save current state as baseline
- `--save`: Save current state as baseline
- `--baseline <name>`: Compare against a specific named baseline (default: "default")
- `--baseline <name>`: Compare against a specific named baseline (default: "default")
- `--raw`: Pass-through to the scanner; produces v5.0.0 verbatim diff output (bypasses the humanizer). Use when piping into v5.0.0-baseline diff tooling that depends on byte-stable output.
## Implementation
## Implementation
@ -26,7 +27,9 @@ If `--save` is present:
Tell the user: **"Saving current configuration as baseline..."**
Tell the user: **"Saving current configuration as baseline..."**
Read stdout. If baseline not found, tell the user:
Read stdout. In default mode the diff sections are humanized — finding titles, descriptions, and recommendations have already been replaced with plain-language equivalents. New/resolved/changed finding lists carry `userImpactCategory`, `userActionLanguage`, and `relevanceContext` so you can group and prioritize without re-deriving severity prose. If `--raw` was passed, the v5.0.0 diff is verbatim — present it in a code block as-is.
If baseline not found, tell the user:
```
```
No baseline found. Save one first with:
No baseline found. Save one first with:
/config-audit drift --save
/config-audit drift --save
```
```
Otherwise, parse and present the drift report:
Otherwise, parse and present the drift report. Use the Read tool on the captured stdout (or pipe it into a tmpfile first if you prefer):
```markdown
```markdown
### Configuration Drift
### Configuration Drift
@ -65,15 +72,15 @@ Otherwise, parse and present the drift report:
@ -82,6 +89,8 @@ Otherwise, parse and present the drift report:
| ... | ... | ... | ... |
| ... | ... | ... | ... |
```
```
When iterating new/resolved findings, prefer `userActionLanguage` over raw `severity` for the "Action" column — the humanizer already mapped severity to plain-language phrasing, and surfacing it consistently keeps the toolchain coherent. Mention `relevanceContext` when it isn't `affects-everyone` (the user wants to know if a fix touches shared config or just their machine).
@ -20,9 +20,11 @@ Context-aware analysis of Claude Code features that could benefit your specific
## Implementation
## Implementation
### Step 1: Determine target and greet
### Step 1: Determine target and flags
Parse `$ARGUMENTS` for a path (default: current working directory).
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument (default: current working directory). Recognized flags:
- `--raw` — pass-through to the scanner; produces v5.0.0 verbatim envelope (bypasses the humanizer). When `--raw` is set, render with v5.0.0 finding-field shape only — humanizer fields are absent in raw output.
Tell the user:
Tell the user:
@ -38,7 +40,9 @@ Generate session ID (`YYYYMMDD_HHmmss`) if no active session exists.
If exit code is non-zero: "Assessment couldn't run. Check that the path exists and contains configuration files."
If exit code is non-zero: "Assessment couldn't run. Check that the path exists and contains configuration files."
@ -59,49 +63,51 @@ ls <target-path>/*.py <target-path>/requirements.txt <target-path>/pyproject.tom
Read `${CLAUDE_PLUGIN_ROOT}/knowledge/gap-closure-templates.md` for implementation templates.
Read `${CLAUDE_PLUGIN_ROOT}/knowledge/gap-closure-templates.md` for implementation templates.
Group GAP findings into three sections. Number them sequentially across sections:
Group GAP findings by their humanized fields rather than re-deriving tier-to-prose mappings. In default mode (no `--raw`) each finding carries:
- `userImpactCategory` (e.g., "Missed opportunity") — the impact bucket
- `userActionLanguage` (e.g., "Fix soon", "Fix when convenient", "Optional cleanup", "FYI") — the urgency phrasing the rest of the toolchain uses
- `relevanceContext` ("affects-everyone" / "affects-this-machine-only" / "test-fixture-no-impact") — the scope so the user knows whether the change touches shared config or just their own machine
Group findings into three sections by `userActionLanguage`: "Fix this now" + "Fix soon" → **High Impact**, "Fix when convenient" → **Worth Considering**, "Optional cleanup" + "FYI" → **Explore When Ready**. Number sequentially across sections. Skip findings whose `relevanceContext === "test-fixture-no-impact"` unless the user explicitly asked to include fixtures.
The humanizer has already replaced jargon-heavy strings with plain-language equivalents in `title`, `description`, and `recommendation` — render those verbatim. Do not paraphrase. Do not introduce inline tier-to-prose tables ("Tier 1 means…"); the categories are pre-translated.
If `--raw` was passed, the v5.0.0 envelope is in effect — humanizer fields are absent. Fall back to grouping by `category` ("t1"/"t2"/"t3"/"t4") and render `title` + `recommendation` directly.
Render shape (default mode):
```markdown
```markdown
### High Impact
### High Impact
These address correctness or safety — consider them seriously.
{For each finding where userActionLanguage is "Fix this now" or "Fix soon":}
**1.** Add permissions.deny for sensitive paths
**{N}.** {title}
→ Settings enforcement is stronger than CLAUDE.md instructions.
→ {description}
→ Effort: Low (5 min)
→ {recommendation}
→ Effort: {from gap-closure-templates.md}
**2.** Configure at least one hook for safety automation
→ Hooks guarantee the action happens. CLAUDE.md instructions are advisory.
→ Effort: Medium (15 min)
### Worth Considering
### Worth Considering
These improve workflow efficiency for projects like yours.
{For each finding where userActionLanguage is "Fix when convenient":}
**3.** Split CLAUDE.md into focused modules with @imports
**{N}.** {title}
→ Files over 200 lines degrade Claude's adherence to instructions.
→ {description}
→ Effort: Low (10 min)
→ {recommendation}
**4.** Add path-scoped rules for different file types
→ Unscoped rules load every session regardless of relevance.
→ Effort: Low (10 min)
### Explore When Ready
### Explore When Ready
Nice-to-have. Skip if your current setup works well.
{For each finding where userActionLanguage is "Optional cleanup" or "FYI":}
**5.** Custom keybindings (Shift+Enter for newline)
**{N}.** {title}
→ Effort: Low (2 min)
→ {recommendation}
**6.** Status line configuration
→ Effort: Low (2 min)
```
```
Each recommendation MUST have:
Each recommendation MUST have:
- A number
- A number
- A one-line description
- The humanizer-provided `title`
- A "Why" with evidence
- The humanizer-provided `description` (where shown)
- A target path (default: current working directory)
- A target path (default: current working directory)
- `--dry-run`: Show fix plan without applying
- `--dry-run`: Show fix plan without applying
- `--raw`: Pass-through to scanners; produces v5.0.0 verbatim envelope (bypasses the humanizer) for byte-stable diff tooling
## Implementation
## Implementation
@ -28,44 +29,50 @@ Tell the user:
Scanning for auto-fixable issues...
Scanning for auto-fixable issues...
```
```
Run scanners silently:
Parse flags and run scanners silently. Default mode emits humanized JSON — each finding carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` alongside the v5.0.0 fields:
Exit code 3 → tell user: "Scanner error. Try `/config-audit posture` to check your configuration."
Exit code 3 → tell user: "Scanner error. Try `/config-audit posture` to check your configuration."
### Step 2: Plan fixes
### Step 2: Plan fixes
Run fix planner silently:
Run fix planner silently. The fix-cli emits humanized prose to stderr in default mode and v5.0.0-shape JSON to stdout when `--json` is set; we use `--json` here for structured data and let the humanizer-aware rendering layer (this command's prose output below) supply the plain-language wording from the scan envelope above:
Read the JSON output. Categorize fixes into auto-fixable and manual.
Read the JSON output using the Read tool. Cross-reference each fix-plan entry against the humanized scan envelope (`/tmp/config-audit-fix-scan-$$.json`) by finding ID to recover the humanized `title`/`description`/`recommendation` plus `userImpactCategory`/`userActionLanguage` for grouping.
### Step 3: Present fix plan
### Step 3: Present fix plan
Show what will be fixed and what needs manual attention:
Show what will be fixed and what needs manual attention. Group by `userActionLanguage` so the urgency phrasing stays consistent with the rest of the toolchain:
```markdown
```markdown
### Fix Plan
### Fix Plan
**Auto-fixable ({N} issues):**
**Auto-fixable ({N} issues), grouped by impact:**
{For each userActionLanguage bucket in priority order — "Fix this now" → "Fix soon" → "Fix when convenient" → "Optional cleanup" → "FYI":}
**Manual ({M} issues — require human judgment), grouped by impact:**
{Same userActionLanguage grouping. Render humanized title and recommendation verbatim — the humanizer already produced plain-language strings, do not paraphrase.}
| # | ID | Issue | Recommendation |
| # | ID | Issue | Recommendation |
|---|-----|-------|----------------|
|---|-----|-------|----------------|
| 1 | CA-CML-003 | CLAUDE.md exceeds 200 lines | Split content into @imports or .claude/rules/ |
description: Show all available config-audit commands
description: Show all available config-audit commands
allowed-tools: Read
allowed-tools: Read, Bash
model: sonnet
model: sonnet
---
---
@ -11,6 +11,19 @@ model: sonnet
Just run `/config-audit` — it auto-detects your project scope and runs a full audit. No setup needed.
Just run `/config-audit` — it auto-detects your project scope and runs a full audit. No setup needed.
The default output is written in plain language: each finding is grouped by impact ("Configuration mistake," "Conflict," "Wasted tokens," "Missed opportunity," "Dead config") and led with an urgency phrase ("Fix this now," "Fix soon," "Fix when convenient," "Optional cleanup," "FYI").
If you prefer the v5.0.0 verbatim output (technical IDs, raw severity, no plain-language wording), pass `--raw` to any command — it's threaded through every CLI in the toolchain. Use the Read tool on the saved JSON to consume it programmatically.
```bash
# Examples — every command accepts --raw for byte-stable v5.0.0 output
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
# /config-audit posture --raw
# /config-audit tokens --raw
# /config-audit fix --raw
```
## All Commands
## All Commands
### Core
### Core
@ -18,17 +31,19 @@ Just run `/config-audit` — it auto-detects your project scope and runs a full
| Command | Description |
| Command | Description |
|---------|-------------|
|---------|-------------|
| `/config-audit` | Full audit with auto-scope detection |
| `/config-audit` | Full audit with auto-scope detection |
| `/config-audit posture` | Quick scorecard with A-F grades per area |
| `/config-audit posture` | Quick scorecard with A-F grades per area (10 areas) |
@ -14,13 +14,22 @@ Execute the action plan with full backup, verification, and rollback support.
- Must have completed Phase 4 (plan)
- Must have completed Phase 4 (plan)
- Action plan at `~/.claude/config-audit/sessions/{session-id}/action-plan.md`
- Action plan at `~/.claude/config-audit/sessions/{session-id}/action-plan.md`
## Arguments
- `$ARGUMENTS` may contain `--raw` to forward to the implementer-agent's instructions; in `--raw` mode the agent renders v5.0.0 verbatim severity prefiks instead of humanized `userActionLanguage` urgency phrasing.
## Implementation
## Implementation
### Step 1: Load and verify
### Step 1: Parse flags, load and verify
```bash
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
```
Find the most recent session with a plan. If none: "No action plan found. Run `/config-audit plan` first."
Find the most recent session with a plan. If none: "No action plan found. Run `/config-audit plan` first."
Read the action plan and count actions. Tell the user:
Use the Read tool on the action plan and count actions. Tell the user:
@ -17,10 +17,21 @@ AskUserQuestion requires synchronous terminal interaction and does not work when
## Prerequisites
## Prerequisites
- Must have completed Phase 2 (analysis)
- Must have completed Phase 2 (analysis)
- Read analysis from `~/.claude/config-audit/sessions/{session-id}/analysis-report.md`
- Use the Read tool on the analysis at `~/.claude/config-audit/sessions/{session-id}/analysis-report.md`
## Arguments
- `$ARGUMENTS` may contain `--raw` — pass-through accepted for CLI surface consistency. Interview is interactive prose only (no scanner output, no findings prose), so `--raw` is a no-op here.
## Implementation Steps
## Implementation Steps
0. **Parse flags**:
```bash
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
2. **Conduct interview inline**: Use AskUserQuestion tool directly (NOT via Task). Adapt questions based on analysis findings.
2. **Conduct interview inline**: Use AskUserQuestion tool directly (NOT via Task). Adapt questions based on analysis findings.
3. **Save interview results**: Write to `~/.claude/config-audit/sessions/{session-id}/interview.md`
3. **Save interview results**: Write to `~/.claude/config-audit/sessions/{session-id}/interview.md`
@ -29,10 +40,10 @@ AskUserQuestion requires synchronous terminal interaction and does not work when
## Interview Questions
## Interview Questions
Ask these using AskUserQuestion (skip questions that don't apply based on analysis):
Ask these using AskUserQuestion (skip questions that don't apply based on analysis). Where the analysis report references finding IDs, use the humanized title from the report rather than re-deriving prose:
1. **Config Style** — Centralized vs Distributed vs Hybrid organization
1. **Config Style** — Centralized vs Distributed vs Hybrid organization
2. **Unused Hooks** — Wire up, review individually, delete, or leave (only if found)
2. **Unused automation that runs at specific events** — Wire up, review individually, delete, or leave (only if the analysis report flagged one)
3. **Duplicate Permissions** — Remove from local, consolidate, or keep (only if found)
3. **Duplicate Permissions** — Remove from local, consolidate, or keep (only if found)
4. **Modular Rules** — Use .claude/rules/ pattern? Yes/No
4. **Modular Rules** — Use .claude/rules/ pattern? Yes/No
5. **Path-Scoped Rules** — Which patterns (tests, src, config, docs) — only if Q4=Yes
5. **Path-Scoped Rules** — Which patterns (tests, src, config, docs) — only if Q4=Yes
description: Show ranked token-source manifest — every CLAUDE.md, plugin, skill, MCP server, and hook ordered DESC by estimated tokens
argument-hint: "[path] [--json]"
allowed-tools: Read, Bash
model: sonnet
---
# Config-Audit: Manifest
Produce a ranked, single-table view of every token source loaded for a given repo path. Where `whats-active` shows separate tables per category, `manifest` collapses everything into one ordered list — making it easy to see what's costing the most regardless of category.
## UX Rules (MANDATORY — from `.claude/rules/ux-rules.md`)
1. **Never show raw JSON or stderr output.** Always use `--output-file` + `2>/dev/null`.
2. **Narrate before acting.** Tell the user what you're about to do.
3. **Read, don't dump.** Read the JSON file and render a formatted table.
4. **End with context-sensitive next steps.**
## Implementation
### Step 1: Parse `$ARGUMENTS`
First non-flag argument is the path (default `.`). Recognized flags:
- `--json` — emit raw JSON instead of the rendered table.
- `--raw` — pass-through to the scanner; accepted for CLI surface consistency with the other config-audit commands. The manifest CLI is data-table only (no findings prose), so `--raw` is a no-op here, but the flag is still threaded through so users get uniform behaviour across `--raw`.
### Step 2: Run the CLI silently
Tell the user: **"Building token-source manifest for `<path>`..."**
```bash
TMPFILE="/tmp/ca-manifest-$$.json"
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
@ -14,11 +14,15 @@ Generate a prioritized action plan based on analysis results.
- Must have completed Phase 2 (analysis)
- Must have completed Phase 2 (analysis)
- Phase 3 (interview) is optional — plan works with or without it
- Phase 3 (interview) is optional — plan works with or without it
## Arguments
- `$ARGUMENTS` may contain `--raw` to forward to the planner-agent's instructions; in `--raw` mode the agent renders v5.0.0 verbatim severity prefiks instead of humanized `userActionLanguage` urgency phrasing.
## Implementation
## Implementation
### Step 1: Verify session state
### Step 1: Verify session state
Find the most recent session with analysis completed. If none found: "No analysis results found. Run `/config-audit` first to scan your configuration."
Find the most recent session with analysis completed using the Read tool on `~/.claude/config-audit/sessions/*/state.yaml`. If none found: "No analysis results found. Run `/config-audit` first to scan your configuration."
### Step 2: Tell the user what's happening
### Step 2: Tell the user what's happening
@ -29,7 +33,12 @@ Building a prioritized plan based on your analysis results...
Actions are ordered by impact, with risk assessment and dependency tracking.
Actions are ordered by impact, with risk assessment and dependency tracking.
```
```
### Step 3: Spawn planner agent
### Step 3: Parse flags and spawn planner agent
```bash
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
```
Tell the user: **"Generating your action plan (this takes about 30 seconds)..."**
Tell the user: **"Generating your action plan (this takes about 30 seconds)..."**
@ -14,6 +14,7 @@ Audit Claude Code plugin structure and quality — validates plugin.json, CLAUDE
- `$ARGUMENTS` may contain a path to a specific plugin directory
- `$ARGUMENTS` may contain a path to a specific plugin directory
- If omitted: scans all plugins in the marketplace root
- If omitted: scans all plugins in the marketplace root
- `--raw`: pass-through to the scanner; produces v5.0.0 verbatim envelope (bypasses the humanizer) for byte-stable diff tooling
## Implementation
## Implementation
@ -31,13 +32,15 @@ Auditing {N} plugin(s) for structure, frontmatter quality, and cross-plugin conf
### Step 2: Run scanner
### Step 2: Run scanner
Run silently for each plugin:
Run silently for each plugin. Default mode emits a humanized JSON envelope where each PLH finding carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` alongside the v5.0.0 fields. `--raw` is passed through verbatim when present.
Group findings within each plugin by `userImpactCategory` (e.g., "Configuration mistake", "Conflict") and lead each line with `userActionLanguage` ("Fix this now", "Fix soon", "Optional cleanup"). The humanizer already produced the plain-language `title`/`recommendation` strings — render them verbatim, do not paraphrase.
@ -13,15 +13,19 @@ Quick, deterministic configuration health scorecard. No agents needed — runs a
## What the user gets
## What the user gets
- Health grade (A-F) with plain-language explanation
- Health grade (A-F) with plain-language explanation
- Per-area breakdown for 7 quality areas with grades and actionable notes
- Per-area breakdown for 10 quality areas (incl. Token Efficiency, Plugin Hygiene) with grades and actionable notes
- Opportunity count — how many features could enhance their setup (not a grade)
- Opportunity count — how many features could enhance their setup (not a grade)
- Grade-appropriate next steps
- Grade-appropriate next steps
## Implementation
## Implementation
### Step 1: Determine target
### Step 1: Determine target and flags
Parse `$ARGUMENTS` for a path (default: current working directory). Resolve relative paths.
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument (default: current working directory). Resolve relative paths. Recognized flags:
- `--raw` — pass-through to the scanner; produces v5.0.0 verbatim output (bypasses the humanizer). Power-user mode for byte-stable diffs and machine consumption.
- `--drift` — append a "Configuration Drift" section (see Step 5).
- `--plugin-health` — append a "Plugin Health" section (see Step 5).
Run silently — JSON goes to a file, the humanized scorecard prints to stderr (default mode). The humanized stderr scorecard already includes the grade headline and area-score lines in plain language, so render those directly rather than re-deriving prose tables.
If exit code is non-zero, tell the user: "Assessment couldn't complete. Check that the path exists and contains Claude Code configuration files."
If exit code is non-zero, tell the user: "Assessment couldn't complete. Check that the path exists and contains Claude Code configuration files."
If `--raw` was passed, treat the captured stderr as v5.0.0-shape verbatim text and present it as-is in a code block; skip the humanized rendering steps below.
### Step 3: Read and interpret results
### Step 3: Read and interpret results
Read the JSON output file using the Read tool. Extract:
Read the JSON output file using the Read tool. Extract:
- `overallGrade`, `opportunityCount`
- `overallGrade`, `opportunityCount`
- `areas[]` — each with `name`, `grade`, `score`, `findingCount`
- `areas[]` — each with `name`, `grade`, `score`, `findingCount`
- `scannerEnvelope.scanners[].findings[]` — when surfacing individual findings, prefer the humanizer-provided fields: `userImpactCategory` (e.g., "Configuration mistake", "Wasted tokens"), `userActionLanguage` (e.g., "Fix this now", "Fix soon", "Optional cleanup"), and `relevanceContext` ("affects-everyone", "affects-this-machine-only", "test-fixture-no-impact"). These let you group and prioritize without hardcoded severity-to-prose mappings.
Also Read the captured stderr file — its body is the humanized scorecard (grade headline, area-score block, opportunity hint). You can present it verbatim or interleave its lines with the JSON-driven table.
### Step 4: Present the scorecard
### Step 4: Present the scorecard
```markdown
```markdown
**Health: {overallGrade}** | {qualityAreaCount} areas scanned
**Health: {overallGrade}** | {qualityAreaCount} areas scanned
{grade-based context — pick ONE:}
{Use the headline line from the humanized stderr scorecard — it carries grade-context prose already (e.g., " Health: A (97/100) — Healthy setup, only minor polish needed"). Do not re-derive an A/B/C/D prose table here; the humanizer owns that vocabulary.}
- A: "Your configuration is correct and well-maintained."
- B: "Solid configuration with minor improvements available."
- C: "Working configuration with some issues worth addressing."
- D: "Configuration needs attention in several areas."
- F: "Significant issues found — addressing these will improve your experience."
### Area Scores
### Area Scores
@ -73,22 +79,13 @@ Read the JSON output file using the Read tool. Extract:
### What's next
### What's next
```
```
**Grade A or B:**
Group "what's next" suggestions by `userActionLanguage` from the humanized findings:
```
Your configuration health is strong. Re-run after major changes to catch regressions.
For feature recommendations: `/config-audit feature-gap`
```
**Grade C:**
- Findings tagged "Fix this now" / "Fix soon" → suggest `/config-audit fix` first, then `/config-audit plan`.
```
- Findings tagged "Fix when convenient" / "Optional cleanup" → suggest `/config-audit feature-gap` and routine maintenance.
Run `/config-audit fix` to auto-fix what's possible, then `/config-audit plan` for a prioritized improvement path.
- No high-urgency findings → suggest `/config-audit feature-gap` for opportunities and re-running posture after major config changes.
```
**Grade D or F:**
Avoid hardcoded grade-to-prose ladders here — the humanized scorecard headline already supplies grade context, and `userActionLanguage` supplies finding-level urgency.
```
Start with `/config-audit fix` — it handles the most impactful issues automatically with backup and rollback.
Then run `/config-audit plan` for a step-by-step path to a better configuration.
@ -13,12 +13,19 @@ Restore configuration files from a previous backup. Without arguments, lists ava
## Arguments
## Arguments
- `$ARGUMENTS` may contain a backup ID (format: `YYYYMMDD_HHMMSS`)
- `$ARGUMENTS` may contain a backup ID (format: `YYYYMMDD_HHMMSS`)
- `--raw`: pass-through flag accepted for CLI surface consistency. Rollback is file restoration only (no scanner output, no findings prose), so `--raw` is a no-op here, but the flag is still parsed so users get uniform behaviour across the toolchain.
## Behavior
## Behavior
### List mode (no argument)
### List mode (no argument)
List available backups from `~/.claude/config-audit/backups/`:
Parse flags and list available backups from `~/.claude/config-audit/backups/`:
```bash
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
ls -1 ~/.claude/config-audit/backups/
```
```
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@ -33,11 +40,11 @@ List available backups from `~/.claude/config-audit/backups/`:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
```
Read each backup's `manifest.yaml` to extract file list and timestamps.
Use the Read tool on each backup's `manifest.yaml` (the list of changes captured at backup time) to extract the file list and timestamps.
### Restore mode (with backup ID)
### Restore mode (with backup ID)
1. Read manifest from `~/.claude/config-audit/backups/{backup-id}/manifest.yaml`
1. Read the list of changes from `~/.claude/config-audit/backups/{backup-id}/manifest.yaml` using the Read tool
2. Show files that will be restored — ask for confirmation:
2. Show files that will be restored — ask for confirmation:
```
```
AskUserQuestion:
AskUserQuestion:
@ -46,10 +53,10 @@ Read each backup's `manifest.yaml` to extract file list and timestamps.
- "Yes, restore"
- "Yes, restore"
- "Cancel"
- "Cancel"
```
```
3. For each file in manifest:
3. For each file in the list of changes:
a. Read backup file from `~/.claude/config-audit/backups/{backup-id}/files/{safeName}`
a. Read the backup file from `~/.claude/config-audit/backups/{backup-id}/files/{safeName}`
b. Write to original path
b. Write to the original path
c. Verify checksum matches manifest
c. Verify the checksum matches the recorded value in the list of changes
description: Show current session state and available actions
description: Show current session state and available actions
allowed-tools: Read, Glob
allowed-tools: Read, Glob, Bash
model: sonnet
model: sonnet
---
---
@ -13,18 +13,40 @@ Display current session state and guide next actions.
```
```
/config-audit status
/config-audit status
/config-audit status --raw # show the raw v5.0.0 phase identifiers (current_phase: "discover", etc.) instead of humanized labels
```
```
## Phase-label translation
The `state.yaml` field `current_phase` is the machine contract — never rename it. The user-facing label is humanized. Map the field value to a plain-language label when rendering (default mode):
description: Show ranked token hotspots and Opus 4.7 pattern findings — what's costing the most per turn and how to reduce it
argument-hint: "[path] [--global]"
allowed-tools: Read, Bash
model: sonnet
---
# Config-Audit: Token Hotspots
Show the configuration sources that contribute the most tokens per turn, ranked by estimated tokens, with Opus 4.7-specific recommendations for reducing prompt-cache misses, schema bloat, and deep import chains.
- **`tokens`** = action view (what to trim and why).
## UX Rules (MANDATORY — from `.claude/rules/ux-rules.md`)
1. **Never show raw JSON or stderr output.** Always use `--output-file` + `2>/dev/null`.
2. **Narrate before acting.** Tell the user what you're about to do.
3. **Read, don't dump.** Read the JSON file and render formatted tables.
4. **End with context-sensitive next steps.**
## Implementation
### Step 1: Parse `$ARGUMENTS`
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument. Default to `.` (current working directory). Recognized flags:
- `--global` — also include the user-level `~/.claude/` cascade
- `--json` — emit raw JSON instead of rendered tables (power-user mode; bypasses the humanizer for byte-stable v5.0.0 output)
- `--raw` — pass-through to the scanner; produces v5.0.0 verbatim JSON (bypasses the humanizer). Use when piping into v5.0.0-baseline diff tooling.
- `--with-telemetry-recipe` — include `telemetry_recipe_path` in the JSON output, pointing to `knowledge/cache-telemetry-recipe.md`. Use this when you want to verify a structural fix actually improved cache hit rate (manual jq recipe, opt-in)
### Step 2: Run the CLI silently
Tell the user: **"Analysing token hotspots for `<path>`..."**
Default mode (no `--json`, no `--raw`) emits a humanized JSON envelope: each finding carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` in addition to the v5.0.0 fields. Pass `--raw` through verbatim if the user requested it.
```bash
TMPFILE="/tmp/config-audit-tokens-$$.json"
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
- `3` → tell user: "Couldn't analyse tokens. Check that the path exists and is a directory." Stop.
### Step 3: If `--json` was requested, cat the file and stop
```bash
cat "$TMPFILE"
```
Do NOT render tables in JSON mode.
### Step 4: Read JSON and render
Use the Read tool on `$TMPFILE`. Extract:
- `total_estimated_tokens` — top-line number
- `hotspots[]` — top 10 ranked sources
- `findings[]` — Opus 4.7 pattern findings (CA-TOK-001..003); each finding in default mode carries humanizer fields (`userImpactCategory`, `userActionLanguage`, `relevanceContext`) alongside the v5.0.0 fields
- `counts` — severity breakdown
Render as markdown. Group findings by `userImpactCategory` (e.g., "Wasted tokens" vs "Configuration mistake") rather than re-deriving severity prose; lead each line with `userActionLanguage` ("Fix this now", "Fix soon", "Optional cleanup", etc.) so the urgency phrasing stays consistent with the rest of the toolchain. The humanizer already replaced jargon-heavy `title`/`description`/`recommendation` strings with plain-language equivalents — render them verbatim.
```markdown
**Token hotspots for `<path>`** — ~{total_estimated_tokens} estimated tokens loaded per turn
### Top hotspots (ranked by estimated tokens)
| Rank | Source | Tokens | Recommendations |
|------|--------|--------|-----------------|
| {rank} | `{source}` | ~{estimated_tokens} | {recommendations joined as `· ` bullets} |
### Findings, grouped by impact
{Group findings[] by their userImpactCategory. Within each group, sort by userActionLanguage urgency (Fix this now → Fix soon → Fix when convenient → Optional cleanup → FYI), then render:}
- **{userActionLanguage}** — {title} ({id})
- {description}
- **Fix:** {recommendation}
- _{relevanceContext}_ when not "affects-everyone" (mention the scope so the user knows whether a fix touches shared config or just their machine)
- See `knowledge/opus-4.7-patterns.md` for the full pattern catalogue (CA-TOK-001 … 003)
- **Verify cache hit rate after a fix:** rerun with `--with-telemetry-recipe` to surface the path to `knowledge/cache-telemetry-recipe.md` — a copy-paste `jq` recipe that reads cache hit rate from your session transcripts. Opt-in. The TOK scanner is structural; this recipe is the runtime escape hatch.
```
## Scope and limits
- **Read-only.** Inspects config files; never writes.
- **Single repo.** Scans one path per invocation.
- **Structural only.** Hotspots are deterministic byte→token estimates from disk; runtime cache hit-rate is out of scope.
- **Heuristic estimates.** ~4 chars/token for markdown, ~3.5 for JSON. Real counts vary ±20%.
## Error handling
| Condition | Action |
|-----------|--------|
| Exit code 3 | Tell user path is invalid, suggest checking path exists |
| JSON parse fails | Tell user to re-run, mention as a bug to report |
| Empty hotspots | Suggest adding a CLAUDE.md or running `/config-audit feature-gap` first |
Show a complete, read-only inventory of everything Claude Code loads for a given repo — plugins, skills, MCP servers, hooks, CLAUDE.md cascade — with source attribution and rough token estimates. Helps identify candidates for disabling without guessing.
## UX Rules (MANDATORY — from `.claude/rules/ux-rules.md`)
1. **Never show raw JSON or stderr output.** Always use `--output-file` + `2>/dev/null`.
2. **Narrate before acting.** Tell the user what you're about to do.
3. **Read, don't dump.** Read the JSON file and render formatted tables.
4. **End with context-sensitive next steps.**
## Implementation
### Step 1: Parse `$ARGUMENTS`
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument. Default to `.` (current working directory). Recognized flags:
- `--json` — emit raw JSON instead of rendered tables (power-user mode)
- `--raw` — pass-through to the scanner; accepted for CLI surface consistency. `whats-active` is an inventory-only output (no findings prose), so `--raw` is a no-op here, but the flag is still threaded through for uniform behaviour across the toolchain.
For each CLAUDE.md file, skill, and plugin, include a nested "Details" list with bytes, lines, and full path.
### Step 6: If `--suggest-disables`, show candidates
First show deterministic signals from `suggestDisables.candidates[]`:
```markdown
### Disable candidates (deterministic)
| Kind | Name | Reason | Confidence |
|------|------|--------|------------|
| {kind} | {name} | {reason} | {confidence} |
```
Then run LLM judgment — check `git log --oneline -20` and project manifests (package.json/Cargo.toml/etc.) to propose up to **3** additional candidates. For each candidate, you MUST:
1. Name the specific redundancy
2. Name the signal the user should check to confirm
Do NOT suggest items you can't name concrete redundancy for. If you can't find 3 strong candidates, return fewer or zero.
### Step 7: Cleanup and next steps
```bash
rm -f "$TMPFILE"
```
```markdown
### What's next
- **`/config-audit posture`** — check configuration health (A-F grades per area)
- **`/config-audit feature-gap`** — context-aware recommendations for features you aren't using
- **Disable a plugin:** edit `~/.claude/settings.json` → `enabledPlugins` (remove the entry)
- **Disable an MCP server:** edit `~/.claude.json` → `projects.<path>.disabledMcpjsonServers`
- **Re-run with flags:**`/config-audit whats-active --verbose` (details) or `--suggest-disables` (pruning help)
```
## Scope and limits
- **Read-only.** This command never writes to configuration files — no mkdir, no edits, no deletes.
- **Single repo.** Scans one repo path per invocation. Cross-repo rollups are out of scope.
- **Ballpark token counts.** Estimates are deterministic but not calibrated against Claude's tokenizer. Use them to compare categories, not to predict exact billing.
- **No runtime queries.** We inspect config files only — we do not connect to MCP servers or invoke hooks.
## Error handling
| Condition | Action |
|-----------|--------|
| Exit code 3 | Tell user path is invalid, suggest checking path exists |
| JSON parse fails (shouldn't happen — CLI writes valid JSON) | Tell user to re-run, mention this as a bug to report |
| No plugins, no CLAUDE.md, no hooks found | Still render with zeroes; suggest `/config-audit feature-gap` for setup help |
Default output of all 18 commands routes through `humanizeEnvelope` from `lib/humanizer.mjs`. Findings are decorated with three additive fields and may have title/description/recommendation replaced when a translation exists.
## Output modes
| Flag | Behavior |
|------|----------|
| (default, no flag) | Plain-language: humanizer applied, findings group by user-impact, titles lead with prose. Self-audit terminal render also humanized. |
| `--raw` | Byte-stable v5.0.0 verbatim — humanizer bypassed, technical IDs and severity-only labels. For tooling that scrapes stderr from v5.0.0. |
| `--json` | Unchanged from v5.0.0 — humanizer bypassed, byte-stable JSON envelope. Always preferred for programmatic consumption over `--raw`. |
`--raw` is threaded through every CLI: `posture.mjs`, `scan-orchestrator.mjs`, `token-hotspots-cli.mjs`, `manifest.mjs`, `whats-active.mjs`, `fix-cli.mjs`, `drift-cli.mjs`, `self-audit.mjs`.
## Vocabularies
User-impact category (added to each finding as `userImpactCategory`, derived from scanner prefix):
- Posture's stderr scorecard is rendered prose-side and is not part of the JSON envelope; `humanized.areas[].titleHumanized` referenced by command templates lives only in the prose render.
- Posture's `--output-file` writes raw v5.0.0-shape JSON because `posture.mjs` does not call `humanizeEnvelope`. If session-files should later be humanized, posture needs its own humanize pass — out of v5.1.0 scope.
- The default-output snapshot at `tests/snapshots/default-output/posture.json` is frozen — change requires `UPDATE_SNAPSHOT=1` plus intent confirmation.
| `cache-prefix-scanner.mjs` | CPS | Volatile content in lines 31–150 of CLAUDE.md cascade (beyond Pattern A's top-30 window) |
| `disabled-in-schema-scanner.mjs` | DIS | Tools listed in BOTH `permissions.deny` AND `permissions.allow` — deny wins, allow entries are dead config |
| `collision-scanner.mjs` | COL | Cross-plugin skill name collisions (low); user-vs-plugin overlaps (medium); `details.namespaces` payload |
| `humanizer.mjs` | Plain-language output translator (v5.1.0): `humanizeFinding`, `humanizeFindings`, `humanizeEnvelope`, `computeRelevanceContext`. Pure functions; never mutate inputs. Adds `userImpactCategory`, `userActionLanguage`, `relevanceContext` fields and replaces title/description/recommendation when a translation exists. Bypassed by `--raw` and `--json` paths. |
| `humanizer-data.mjs` | TRANSLATIONS table for 13 scanner prefixes (CML/SET/HKV/RUL/MCP/IMP/CNF/COL/TOK/CPS/DIS/GAP/PLH). Three-step lookup: exact title → regex pattern → `_default` → fall through to original |
- **v5.0.0** — Full suite grønn, README oppdatert, CHANGELOG, versjonssync, self-audit grade A.
- **v5.1.0+ (post-release)** — N7 (cache-hit-digest) når data-tilgang er løst.
---
## 1. Hvorfor v5.0.0
v4.0.0 markedsfører seg som "Opus 4.7-aware token optimization" (TOK-scanner, `/config-audit tokens`, Token Efficiency som 8. kvalitetsområde). Kritisk review viser at markedsføringen ikke holder:
- TOK-scanneren importerer `readActiveConfig` og bruker den eksplisitt ikke (`void readActiveConfig` i `scanners/token-hotspots.mjs:31`) — scanneren ser aldri på plugins, skills, MCP-servere eller CLAUDE.md-kaskade som aggregert token-kost.
- 4 TOK-mønstre dekker 29% av 14 identifiserte Opus 4.7-kostdrivere. De største sinkene (MCP tool-schema-eksplosjon, skill-description-bloat, CLAUDE.md-kaskade-sum) har null dekning.
- `estimateTokens` (`scanners/lib/active-config-reader.mjs:29-39`) flater MCP-servere og hooks til 15 tokens hver. En bruker med 5 MCP-servere får rapportert 75 tokens der virkeligheten er 10-20k.
- Area-score ignorerer severity helt (`scanners/lib/scoring.mjs:184`): 1 kritisk og 1 info gir identisk areascore.
- Pattern D (`detectSonnetEra`) motsier pluginens egen v3.0-policy om at minimalt korrekt oppsett = Grade A.
Resten av pluginen (8 strukturelle scannere, backup/rollback, suppression, plugin-health) fungerer og skal ikke rives ned. v5.0.0 er en token-economy-runde, ikke en totalombygging.
---
## 2. Mål for v5.0.0
**Primært:** Gjøre pluginens token-optimalisering reality-based. Etter v5.0.0 skal en bruker som kjører `/config-audit tokens` få konkret, kalibrert innsikt i hva som faktisk koster tokens i deres oppsett — MCP, skills, CLAUDE.md-kaskade, hooks inkludert.
**Sekundært:**
- Severity reflekterer estimert tokens/tur, ikke "hvor trivielt mønsteret er å detektere".
- Area-score tar hensyn til severity.
- README/CLAUDE.md-tall samsvarer med faktisk kode.
- Knowledge-basen reflekterer Opus 4.7-prioriteringer (cache-reuse og schema-disiplin), ikke Sonnet-æra-"tokens er billige".
**Ikke-mål:**
- Runtime-telemetri som kjernefunksjon (bare som opt-in recipe; krever transcript-parsing).
- Full tiktoken-bundling (opt-in `--accurate-tokens` via API er akseptabelt; default skal være zero-deps-heuristikk).
- Kryssrepo-benchmarking eller cloud-telemetri.
- Endringer i secret/credential-scanning-scope (fortsatt delegert til llm-security).
---
## 3. Scope
### Must-fix (7 kritiske)
| ID | Fil/linje | Hva |
|----|-----------|-----|
| F1 | `scanners/token-hotspots.mjs:31` | TOK må faktisk bruke `readActiveConfig` — ikke bare importere den |
| F2 | `scanners/lib/active-config-reader.mjs:29-39` | `estimateTokens` må type-differensiere MCP/hooks, ikke flat 15 tokens |
| F3 | `scanners/lib/scoring.mjs:184` | Area-score må vekte findings etter severity (gjenbruk `riskScore`-WEIGHTS) |
5. **MCP tool-count flagges.** Fixture med `.mcp.json` pluss `tools/list`-mock med 20 tools: TOK-scanner produserer `CA-TOK-005` finding.
6. **System-prompt-manifest fungerer.**`node scanners/manifest.mjs <path>` returnerer en rangert liste med kilde + tokens DESC, totalt innenfor ±20% av faktisk summert byte-estimat.
7. **Cache-prefix-analyse.** CLAUDE.md med volatile midt-seksjon genererer finding, ikke bare hvis volatilitet er i topp-30.
8. **Kollisjons-scanner.** Fixture med to plugins som begge eksponerer skill `review`: collision-finding produseres.
9. **Knowledge-basen oppdatert.** Grep etter "Keep under 200 lines" (Sonnet-æra-formulering) i `knowledge/configuration-best-practices.md` returnerer 0 — erstattet av cache-stabilitets-rettet guidance.
10. **Suite-helse.**`node --test 'tests/**/*.test.mjs'` ≥ 600 tester grønne (fra 543 i v4.0.0). Ny scanner-funksjonalitet har fixture-dekning.
---
## 5. Risikoer og avhengigheter
- **Tokenizer-kalibrering** — ingen zero-deps-tokenizer gir 100% nøyaktighet. Godta ±20% default; markér opt-in `--accurate-tokens` som eksperimentell.
- **MCP `tools/list`-tilgang** — krever kjørende MCP-server. Fallback: parse serverens manifest hvis det finnes, ellers bruk cache/estimat.
- **Schema-drift på `.claude.json`-format** — Anthropic kan endre formatet. `readClaudeJsonProjectSlice` har allerede longest-prefix-matching; nye felter må detekteres robust.
- **Breaking changes** — v5.0.0 er major bump. TOK-finding-IDer består (`CA-TOK-001..004`), nye legges til fra `CA-TOK-005`. Suppression-filer fra v4.x skal fortsatt fungere.
- **Self-audit-failure etter bump** — README-sjekken (F6) kan feile ved første push. Godta midlertidig rød self-audit under v5-arbeid; krav om grønn før release-tag.
| 3 | Audit `baseline-all-a` fixture | ✓ no changes needed — fixture is genuinely info-only, posture-grade-stability still all-A | (no commit) |
| 4 | `'mcp'` kind in `estimateTokens` (F2 fn) | ✓ green (+4 tests, base 500, +200/tool) | `48d560a` feat(config-audit): add 'mcp' kind to estimateTokens (v5 F2) |
| 5 | MCP callers use `'mcp'` kind (F2 caller) | ✓ green (+1 test, hooks keep `'item'`) | `ce7c42f` fix(config-audit): MCP token callers use 'mcp' kind (v5 F2) |
| 6 | TOK consumes `readActiveConfig` (F1) | ✓ green (+3 tests, new fixture `tok-active-config/`, MCP servers expand into hotspots, `result.activeConfig` summary exposed, try/catch fallback) | `34669d5` feat(config-audit): TOK consumes readActiveConfig (v5 F1) |
| 7 | Remove `take` + padding (F4) | ✓ green (+2 tests for uniqueness + max-bound, `HOTSPOTS_MIN` constant deleted) | `0d8a9af` fix(config-audit): remove TOK dead take + hotspot padding (v5 F4) |
| 8 | Remove Pattern D `detectSonnetEra` (F5) | ✓ green (+ updated sonnet-era test to assert zero findings) | `2810ee6` feat(config-audit): remove TOK Pattern D detectSonnetEra (v5 F5) |
- Step 6 test had to compare against `opus-47/sonnet-era` (smaller baseline) instead of `healthy-project`; both pull in user's ambient `~/.claude.json`/plugins via `readActiveConfig`, so `healthy-project` ended up only ~30 tokens different. `sonnet-era` has no `.mcp.json`, so the +1000 tokens from the new fixture's 2 servers shows clearly.
- Step 8 had a surprise: Pattern D didn't actually fire on `opus-47/sonnet-era` even before removal, because `discovery.files` for that fixture have `scope: 'plugin'` (the file-discovery mistakes the test layout for a plugin). The "emits no findings above info severity" assertion was passing vacuously. New assertion is stricter (`findings.length === 0`) and now genuinely tests the removal.
- PathGuard hook blocked `Write` to `tests/fixtures/tok-active-config/.claude-plugin/plugin.json` (false positive on test fixtures); used `Bash printf` to create the file. Hook should likely allow `tests/fixtures/**` paths in a future hardening pass.
- `void readActiveConfig` placeholder in `scanners/token-hotspots.mjs` removed in Step 6.
- Total tests: 543 → 563 (+20).
**No blockers carried into Session 2.**
---
---
## Session 2 — alpha.2 (2026-05-01)
**Outcome:** All 8 steps shipped. 569 → 586 tests, all green. Direct-to-main on Forgejo (autorisert).
**Per-step result:**
| # | Step | Result | Commit |
|---|------|--------|--------|
| 10 | F7 — recalibrate TOK severities + calibration_note | ✓ green (+6 tests, table-driven by title — TOK IDs are sequential per scan, not semantic per pattern) | `58d6b5b` feat(config-audit): recalibrate TOK severities for tokens/turn (v5 F7) |
| 16 | F6 — `self-audit --check-readme` flag | ✓ green (+4 tests, helper `checkReadmeBadges` + `runSelfAudit({checkReadme:true})`, fixture `readme-desynced`; real plugin self-check intentionally red — scanners 10 vs 9, tests 31 vs 543, deferred to Step 28) | `3c79f95` feat(config-audit): self-audit --check-readme flag (v5 F6) |
| 17 | CHANGELOG 5.0.0-alpha.2 entry | ✓ added with F7/M1/M2/M4-M6/F6 summary, breakdown of new fixtures, and notes on alpha-phase passed===false acceptance | `55cedbe` docs(config-audit): CHANGELOG 5.0.0-alpha.2 entry |
**Notable observations / deviations:**
- **Step 10 plan vs reality:** Plan's table used `findingId: 'CA-TOK-NNN'` mapping IDs to patterns. Actual TOK finding IDs are sequential per scan (output.mjs:31), not semantic per pattern — when only Pattern B fires (redundant-tools fixture), it gets CA-TOK-001 not CA-TOK-002. Test was rewritten to identify findings by title regex instead.
- **Step 13 scope:** Plan said "walk activeConfig.skills". Implementation walks only `discovery.files` of type `skill-md`. Reason: walking activeConfig.skills pulls in user's `~/.claude/skills/` (11 user skills + 54 plugin skills, of which 22 had > 500-char descriptions in this user's ambient state) — none of which are actionable in a project-scoped audit. Discovery-only matches what `/config-audit <path>` is asking about.
- **Step 14 fixture committed via gitignore exception:**`node_modules/` is repo-wide ignored; added `!tests/fixtures/**/node_modules/**` so the `mcp-heavy/package.json` fixture stays under version control.
- **Step 14 hook command path:** Initial fixture used `node ./hooks/scripts/loud.mjs` but `extractScriptPath` resolves relative paths from `dirname(file.absPath)` which is already `hooks/`, so the path needed to be `./scripts/loud.mjs` (no leading `hooks/`).
- **Step 16 plan deviation on tests count:** Plan's heuristic "count `.test.mjs` files in `tests/`" yields 31 for the real plugin, but the README badge says "543+" (test cases, not files). Both are legitimate measurements — alpha phase explicitly does not require `passed === true`. Step 28 will reconcile.
- **`[skip-docs]` tag on every feat commit:** pre-commit-docs-gate hook requires README/CLAUDE.md updates on `feat:` commits to Forgejo; v5 plan explicitly fences off doc updates until Session 5. Each commit message ends with `[skip-docs]` and a reason; logged to `~/.claude/audit/docs-gate-skips.log`.
- Total tests: 569 → 586 (+17 from new + already-counted F7 in 569 baseline).
**No blockers carried into Session 3.**
---
---
## Session 3 — beta.1 (2026-05-01)
**Outcome:** All 7 steps shipped. 586 → 625 tests, all green. Direct-to-main on Forgejo (autorisert).
**Per-step result:**
| # | Step | Result | Commit |
|---|------|--------|--------|
| 18 | N1 — `CA-TOK-005` MCP tool-schema budget | ✓ green (+7 tests; tiered severity 14/25/60/120/unknown via fixtures with inline `tools` arrays in `.mcp.json`; scoped to project-local `.mcp.json` to avoid ambient ~/.claude.json plugin-MCP leakage) | `b2407a0` feat(config-audit): CA-TOK-005 MCP tool-schema budget (v5 N1) |
| 19 | N2 — System-Prompt Manifest scanner + CLI | ✓ green (+11 tests; both real-config and `buildRichManifestRepo` fixture paths; CLAUDE.md per-file tokens distributed proportional to bytes) | `0420b8c` feat(config-audit): /config-audit manifest command (v5 N2) |
| 20 | N3 — Cache-Prefix Stability scanner (CPS) | ✓ green (+7 tests; CACHED_PREFIX_LINES=150; volatile patterns extend Pattern A with `!` shell-exec and `${VAR}`; skips lines 1-30 to avoid Pattern A overlap; required `scoreByArea` dedup-by-area to keep 9-area contract for shared "Token Efficiency") | `65087e6` feat(config-audit): cache-prefix stability scanner CPS (v5 N3) |
| 21 | N4 — Disabled-In-Schema scanner (DIS) | ✓ green (+6 tests; per-file deny+allow overlap detection by bare tool name; healthy-project as negative case) | `cc349d6` feat(config-audit): disabled-in-schema scanner DIS (v5 N4) |
| 22a | Namespace research spike | ✓ written to `docs/v5-namespace-research.md` (gitignored); confidence: medium; verdicts: plugin-vs-plugin = low collision possible, user-vs-plugin = medium, built-in = uncertain (deferred to v5.0.1) | (no commit; .gitignore folded into 22b) |
| 22b | N6 — Cross-plugin collision scanner (COL) | ✓ green (+8 tests; user-vs-plugin medium, plugin-vs-plugin low, with `details.namespaces` array; new "Plugin Hygiene" area; `output.mjs:finding()` helper now passes through `details`; posture test bumped 9→10 areas) | `cd25c1e` feat(config-audit): cross-plugin collision scanner COL (v5 N6) |
| 23 | beta.1 wrap CHANGELOG | ✓ added with Known breaking changes section on `CA-TOK-*` glob now matching CA-TOK-005, plus explicit note on plugin-vs-built-in deferred to v5.0.1 | `5a1e7cb` docs(config-audit): CHANGELOG 5.0.0-beta.1 + N1 breaking note |
**Notable observations / deviations:**
- **Step 18 ambient leakage rerun:** initial implementation iterated all `activeConfig.mcpServers` and tripped on user's plugin-bundled MCP servers (e.g. `sadhguru-wisdom` showed up in the `sonnet-era` fixture's findings). Fix: scope to `m.source === '.mcp.json'` (project-local). Plugin/user MCP servers are surfaced by Step 19's manifest scanner instead. Tests filter by fixture-specific server name (`budget-srv-N`).
- **Step 18 detection-order pinning:** plan said "5th detection block AFTER A/B/C". Patterns F (skill desc) + E (cascade > 10k) were already present from alpha.2. Inserted N1 between Pattern F and Pattern E. Tests assert title + severity (not exact ID) since IDs are sequential per scan.
- **Step 19 CLAUDE.md per-file tokens:**`claudeMd.estimatedTokens` is computed for the whole cascade. Decided to distribute across files proportional to `bytes` rather than recompute per file — single source of truth for the cascade total.
- **Step 20 dedup-by-area refactor:** CPS shares the "Token Efficiency" area with TOK, but `scoreByArea` was emitting one row per scanner, not per area. Refactored to group results by area name and merge counts. The 9-area contract held until Step 22b added "Plugin Hygiene".
- **Step 21 fixture write succeeded:** PathGuard hook was a Session 2 watch-out for fixture `settings.json` writes. Used `cat <<EOF` via Bash this time — passed through. (Either the hook was relaxed since alpha.2, or the path-guard rule applies to specific edits not new fixtures.)
- **Step 22a confidence: medium.** The plugin-prefix in `name:` frontmatter is freeform (e.g. `llm-security` plugin uses `security:` prefix, not `llm-security:`), so collision IS possible if two authors choose the same prefix word. Built-in collision (e.g. plugin shadows `/help`) is not testable from research alone — left as info-only in CHANGELOG.
- **Step 22b `details` field:** had to extend `output.mjs:finding()` helper to pass through `details`. Existing scanners don't break (the field is optional, only present when set). First scanner to use it.
- **Step 22b posture test:** the `assert.equal(result.areas.length, 9)` assertion broke because COL added a 10th area. Bumped to 10 with a note in the test message (v5 adds Plugin Hygiene from COL). This is a deliberate v5 design change.
- **Step 22b suppression-glob test surfaced an API bug:** my first test passed `[{ id: 'CA-TOK-*', ... }]` to `applySuppressions`. The actual key is `pattern`, not `id`. Updated. No code change — just test fixed.
- Total tests: 586 → 625 (+39). Per-step: +7, +11, +7, +6, +8 (no test for 22a research, 0 for Step 23).
- **Step 24 — M8 knowledge rensing.** Replaced "Keep CLAUDE.md under 200 lines" with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Added footnote explaining the 200-line rule was a Sonnet-era adherence heuristic. Verified: `grep -q "Keep under 200 lines"` returns no match. Commit: `e1e23ed``docs(config-audit): knowledge rensing — Opus 4.7 cache-stability guidance (v5 M8)`.
- **Step 25 — M7 cache-telemetry recipe.**
- New `knowledge/cache-telemetry-recipe.md` — copy-paste `jq` recipe that sums `cache_read_input_tokens` and `cache_creation_input_tokens` per turn from `~/.claude/projects/<slug>/*.jsonl`. Hit-rate interpretation table, per-turn breakdown for spotting regression turns, design-rationale note explaining why this is a recipe and not a scanner.
- `--with-telemetry-recipe` flag on `token-hotspots-cli.mjs`. When present, emits `telemetry_recipe_path` field in JSON output. Without the flag, output unchanged (committed as default deliverable, opt-in at invocation).
- `commands/tokens.md` updated: flag documented in Step 1 args, surfaced in next-steps as the cache-verification path after a structural fix.
- Tests (×3): negative test (flag absent → field absent), positive test (flag present → string ending in `cache-telemetry-recipe.md`), existing 2 tests still pass. 627 → 628 tests.
- Commit: `df6e012``docs(config-audit): cache-telemetry recipe + --with-telemetry-recipe flag (v5 M7)`.
- **Step 26 — N5 `--accurate-tokens` API calibration.**
- New `scanners/lib/tokenizer-api.mjs`: `callCountTokensApi(text, apiKey, options)` wraps Anthropic's `count_tokens` endpoint. Required headers (`x-api-key`, `anthropic-version: 2023-06-01`, `content-type`). 5-second AbortController timeout. Exponential backoff on HTTP 429 (max 3 retries: 1s, 2s, 4s — base configurable for tests). Non-429 HTTP errors throw `count_tokens API failed (key sk-ant-X...): HTTP <status>` with the body deliberately omitted to avoid echo-leak. Network/abort errors masked similarly. `maskKey()` exported as a utility.
- `--accurate-tokens` flag on `token-hotspots-cli.mjs`. When `ANTHROPIC_API_KEY` is present, calls the API for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When absent, `calibration = { skipped: 'no-api-key' }` plus stderr warning. On API error, `calibration = { skipped: 'api-error', error: <masked-message> }`.
- **Mocking pattern correction:** v5-plan specified `mock.method(tokenizerApi, 'callCountTokensApi', ...)` but ESM read-only export bindings reject property redefinition (`TypeError: Cannot redefine property: callCountTokensApi`). Switched to mocking `globalThis.fetch` instead — equivalent coverage at the actual external-dependency boundary. Documented in CHANGELOG Notes and the test-file comment.
- Tests (×8): 2× CLI subprocess (no-key skip + flag absence), 6× tokenizer-api unit (key-masking on network error, body-leak protection on 401, AbortController signal threaded, 429 retry with mocked fetch, headers asserted, happy-path fetch mock).
- Test count: 628 → 635 (+7 net; the +1 from the "absent-flag" test was added in Step 25 above so the Step 26 delta sees 7 new tests).
- Commit: `b741430``feat(config-audit): --accurate-tokens API calibration (v5 N5) [skip-docs]`.
- **Step 27 — rc.1 wrap.** Added `## [5.0.0-rc.1]` entry to `CHANGELOG.md` with Summary / Added / Changed / Tests / Notes. Documented the SC-6b release-gate carve-out (manual verification before tagging) and the `mock.method` → `fetch` mocking pivot. Commit: `1ce26fe``docs(config-audit): CHANGELOG 5.0.0-rc.1 entry`.
### Result
- 4 steps shipped, all green. Pushed to Forgejo `main` (autorisert).
- Test count: 625 → 635 (+10).
- New files: `knowledge/cache-telemetry-recipe.md`, `scanners/lib/tokenizer-api.mjs`, `tests/scanners/accurate-tokens.test.mjs`.
- Untouched (scope fence): `README.md`, `CLAUDE.md`, `.claude-plugin/plugin.json` — all wait for Session 5.
### Observations carried into Session 5
- **SC-6b release gate is open.** Before tagging `v5.0.0`, KTG must run `--accurate-tokens` against a known fixture with a real `ANTHROPIC_API_KEY`, manually compare `calibration.actual_tokens` against the byte-estimated value for that fixture, and confirm error ≤ ±5%. If error exceeds ±5%, the heuristic in `estimateTokens` must be re-tuned before tagging.
- **`mock.method` for ESM modules is a known footgun** — record this in REMEMBER for future scanners that try to stub library exports. Use `globalThis.fetch` mocking, dependency-injection seams, or `vi.mock`-style loaders if needed; do NOT rely on `mock.method` against ESM module namespaces.
- **`--check-readme` will still fail in beta state.** Self-audit's badge mismatch report (scanners 12 vs 9, tests now 31 vs 543) is by-design until Step 28's straggler sweep aligns README/CLAUDE.md with filesystem truth. Posture-test still expects 10 areas (unchanged in this session).
- **`fetch` global confirmed working** on Node 25.8.2 (KTG's machine). No fallback to `node:https` needed.
**No blockers carried into Session 5.**
---
## Session 5 — release (2026-05-01)
**Outcome:** All 3 steps shipped. v5.0.0 tagged and pushed (`config-audit/v5.0.0` on Forgejo). 635 tests still green. SC-6b release-gate **PASS** at −0.85% delta.
### Per-step result
| # | Step | Result | Commit |
|---|------|--------|--------|
| 28 | README + CLAUDE.md straggler-sweep | ✓ green; `--check-readme` PASSES (counts: scanners 12, commands 18, tests 635, knowledge 8, agents 6, hooks 4); self-audit also updated to (a) exclude `plugin-health-scanner.mjs` from `countScannerShape` so the orchestrated-scanner count matches the README badge taxonomy, and (b) `countTestCases` runs `node --test` to count test cases (635) instead of test files (36) — required for badge accuracy | `5bf500e``docs(config-audit): straggler sweep for v5.0.0 — sync all badge counts` |
| 29 | Version bump 4.0.0 → 5.0.0 + consolidated CHANGELOG | ✓ `plugin.json` bumped, README version badge bumped, Version History row added, marketplace root README updated (Config-Audit row v4.0.0 → v5.0.0 + counts), `## [5.0.0]` consolidated entry written from alpha.1/alpha.2/beta.1/rc.1 | `dcf8087``chore(config-audit): bump version to 5.0.0` |
| 30 | Final self-audit + SC-6b gate + tag | ✓ verdict PASS (config A 97/100, plugin A 100/100, readmeCheck PASS); SC-6b gate PASS at 0.85% delta; tag `config-audit/v5.0.0` created and pushed | `6cfca82``fix(config-audit): expose hotspot.path for --accurate-tokens calibration + SC-6b PASS` (incl. tag) |
### SC-6b release-gate outcome
- **PASS — verified at release time with live `ANTHROPIC_API_KEY`.**
- No tuning of `estimateTokens` heuristic required.
### Notable observations / deviations
- **Step 30 surfaced a latent N5 bug.** The rc.1 implementation of `--accurate-tokens` looked up `hotspot.path` but the scanner only emitted `source` — every iteration hit the `if (!hotspot?.path) continue` guard and `actual_tokens` stayed at 0. Detected when running the gate. Minimal fix: file-backed hotspots now expose `path: h.absPath` in the JSON output; MCP-server hotspots intentionally leave `path` unset. Tests updated coverage already in place; no test changes required (the bug was a missing field, not a logic error). After the fix, the calibration produced the expected 589 actual_tokens for CLAUDE.md.
- **Self-audit `--check-readme` now counts test cases by spawning `node --test`.** Slow (~16s on the full plugin) but produces the canonical test count (635) that matches the README badge. `countTestFiles` retained as fallback when the subprocess fails (timeout, parse failure).
- **`plugin-health-scanner.mjs` excluded from `countScannerShape`.** It exports `scan` but is documented under "Standalone Scanner" in README/CLAUDE.md and runs separately from `scan-orchestrator.mjs`. Aligning self-audit's counter with the human/badge taxonomy.
- **API key retrieved from macOS keychain** via `security find-generic-password -a ktg -s anthropic-api-key -w` per global CLAUDE.md convention. Key was masked to `sk-ant-a...` in all error paths (verified: tokenizer-api.mjs maskKey).
- **`sampled_hotspots: 3`** in the calibration JSON is slightly misleading — the slice length is 3 but only 1 had a readable path (other 2 are MCP virtuals). Substantive result is correct: 1 file-backed sample, 0.85% delta. A follow-up could change this to `samples_calibrated: actualCount` for clarity (v5.0.1 candidate).
- **`pre-commit-docs-gate` hook** did not trigger on Session 5 commits — all were `docs:`, `chore:`, or `fix:` types (gate only blocks `feat:`).
Generated by Wave 0 / Step 0 pre-flight on 2026-05-01.
This document is the authoritative change list for **Step 4** (replace title-string assertions with ID-based or shape-based assertions). Step 5 cannot wire the humanizer until every "WILL BREAK" entry below is converted.
## Classification key
- **(a) shape-only** — checks existence, type, or test-fixture input; not affected by humanization.
- **(b) literal-string WILL BREAK** — exact equality or substring match against scanner-produced title prose. Humanization rewrites these strings; the assertion must be re-anchored to `finding.id`, `finding.scanner`, or `finding.evidence`.
- **(c) ID-based** — already anchored on `finding.id` or scanner prefix. No change needed.
## Audit summary
| Test file | Matches | Will break (b) | Safe (a/c) |
| 46 | `assert.strictEqual(f.title, 'Test')` | (a) shape-only | None — `'Test'` is the test's own input to `finding()` constructor, not a scanner-produced title. |
### `tests/scanners/feature-gap-scanner.test.mjs`
| Line | Code | Class | Action |
|------|------|-------|--------|
| 45 | `f.title === 'No CLAUDE.md file'` | (b) WILL BREAK | Replace with `f.id === '<GAP-ID-for-no-CLAUDE.md>'`. Anchor on ID. |
| 49 | `f.title === 'No MCP servers configured'` | (b) WILL BREAK | Replace with ID anchor. |
| 53 | `f.title === 'No hooks configured'` | (b) WILL BREAK | Replace with ID anchor. |
| 96 | `f.title === 'No hooks configured'` | (b) WILL BREAK | Replace with ID anchor. |
| 100 | `f.title === 'No MCP servers configured'` | (b) WILL BREAK | Replace with ID anchor. |
| 150 | `f.title === 'No CLAUDE.md file'` | (b) WILL BREAK | Replace with ID anchor. |
> **Implementation note for Step 4:** look up the actual GAP finding IDs via `grep -n "title:" scanners/feature-gap-scanner.mjs` and substitute. For shape only: `assert.ok(f.id.startsWith('CA-GAP-'))` is acceptable when the test only cares that a GAP finding fired.
### `tests/scanners/hook-validator.test.mjs`
| Line | Code | Class | Action |
|------|------|-------|--------|
| 30 | `serious.map(f => f.title).join(', ')` | (a) shape-only | None — title used only for error-message formatting in failed assert; not the assertion itself. |
| 49 | `f.title === 'Unknown hook event'` | (b) WILL BREAK | Replace with ID anchor. |
| 54 | `f.title.includes('Matcher must be a string')` | (b) WILL BREAK | Replace with ID anchor or `.evidence.includes(...)`. |
| 59 | `f.title === 'Invalid hook handler type'` | (b) WILL BREAK | Replace with ID anchor. |
| 64 | `f.title.includes('timeout')` | (b) WILL BREAK | Replace with ID anchor. |
| 69 | `f.title === 'Unknown hook event'` | (b) WILL BREAK | Replace with ID anchor. |
| 80 | `/verbose hook output/i.test(x.title \|\| '')` | (b) WILL BREAK | Replace with ID anchor. |
| 81 | `result.findings.map(x => x.title).join(' \| ')` | (a) shape-only | Used only in error-message formatting. None. |
| 91 | `/verbose hook output/i.test(x.title \|\| '')` | (b) WILL BREAK | Replace with ID anchor. |
| 92 | `f?.title` | (a) shape-only | Used only in error-message formatting. None. |
### `tests/lib/diff-engine.test.mjs`
| Line | Code | Class | Action |
|------|------|-------|--------|
| 66 | `diff.newFindings[0].title === 'New issue'` | (a) shape-only | None — `'New issue'` is the test's synthetic finding input, not scanner-produced. |
| 52 | `f.title.includes('Missing required field')` | (b) WILL BREAK | Replace with ID anchor or `f.evidence.includes(...)`. |
| 59 | `f.title.includes('missing') && f.title.includes('section')` | (b) WILL BREAK | Replace with ID anchor on the missing-section finding. |
| 68 | `f.title.includes('Missing required field')` | (b) WILL BREAK | Replace with ID anchor. |
| 75 | `f.title === 'Missing CLAUDE.md'` | (b) WILL BREAK | Replace with ID anchor. |
| 82 | `f.title === 'Command missing frontmatter'` | (b) WILL BREAK | Replace with ID anchor. |
| 90 | `f.title.startsWith('Agent missing frontmatter field:')` | (b) WILL BREAK | Replace with ID anchor + `f.evidence.includes(...)` for the field name (humanizer preserves evidence). |
| 93 | `missingAgent.map(f => f.title).join(', ')` | (a) shape-only | Used only in error-message formatting. None. |
| 102 | `result.findings[0].title === 'No plugins found'` | (b) WILL BREAK | Replace with ID anchor. |
| 125 | `assert.ok(f.title)` | (a) shape-only | None — pure existence check. |
### `tests/scanners/settings-validator.test.mjs`
| Line | Code | Class | Action |
|------|------|-------|--------|
| 49 | `f.title === 'Unknown settings key'` | (b) WILL BREAK | Replace with ID anchor (likely `CA-SET-001` or similar — verify). |
| 54 | `f.title === 'Deprecated settings key'` | (b) WILL BREAK | Replace with ID anchor. |
| 59 | `f.title === 'Type mismatch in settings'` | (b) WILL BREAK | Replace with ID anchor. |
| 64 | `f.title === 'Invalid effortLevel value'` | (b) WILL BREAK | Replace with ID anchor. |
| 69 | `f.title.includes('array instead of object')` | (b) WILL BREAK | Replace with ID anchor. |
| 74 | `f.title.includes('array instead of object')` | (b) WILL BREAK | Replace with ID anchor. |
| 86 | `f.title === 'Unknown settings key' && /additionalDirectories/.test(f.evidence)` | (b) WILL BREAK | Keep evidence regex; replace title check with ID anchor. |
| 96 | `/additionalDirectories/i.test(x.title \|\| '')` | (b) WILL BREAK | Replace with ID anchor + evidence regex (additionalDirectories likely appears in evidence already). |
| 98 | `f?.title` | (a) shape-only — but inside breaking assertion | Will become moot after line 96 is fixed. |
| 106 | `/additionalDirectories/i.test(x.title \|\| '')` | (b) WILL BREAK | Replace with ID anchor + evidence regex. |
2. Replace the title check with `f.id === '<exact-id>'`. If the test cares about a sub-variant (e.g., a specific deprecated key), pair the ID anchor with an `f.evidence.includes(...)` substring check — humanizer preserves `evidence` exactly.
3. For broad categorical checks ("any GAP finding fired"), use `f.id.startsWith('CA-GAP-')`.
4. For tests that capture `f.title` only inside `assert` failure-message templates (class (a)): leave them. Humanization changes the displayed string but the assertion still anchors on `f.id`.
5. Re-run `node --test 'tests/**/*.test.mjs'` after changes; expect zero regressions before proceeding to Step 5.
## Total scope for Step 4
- **6 test files** require code changes (`output.test.mjs` and `diff-engine.test.mjs` are clean).
- **34 distinct assertions** to convert.
- Estimated effort: 1–2 hours including ID lookup and verification.
| 0.50–0.85 | Cache works but something near the prefix is shifting. Inspect first 30 lines of CLAUDE.md and any `@import`-ed file. |
| 0.20–0.50 | Cache is being broken most turns. Likely a volatile CLAUDE.md top-of-file (timestamp, session id, rolling activity log) or a `defaultMode` flip. Run `/config-audit tokens` to locate. |
| < 0.20 | Cache is essentially disabled. Either the prefix is rewritten every turn, or the session is so short caching never warmed up. |
### 4. Per-turn breakdown (for spotting the regression turn)
1. **Keep under 200 lines.** Claude's adherence drops on longer files. If the file exceeds 200 lines, extract sections with `@import`.
1. **Optimise for prompt-cache stability.** Place stable content in the first 30 lines (cache-friendly prefix); volatile content (timestamps, dynamic counts, rolling activity logs) goes below that threshold or moves to an `@import`-ed file outside the cache prefix. On Opus 4.7 the dominant cost lever is cache reuse, not file length.[^200lines]
2. **Use `@import` for specs/docs.**`@path/to/spec.md` inlines the file at session start. Max 5 hops. Keeps the main file scannable.
2. **Use `@import` for specs/docs.**`@path/to/spec.md` inlines the file at session start. Max 5 hops, but keep chains ≤ 2 hops — every `@import` boundary fragments the prompt-cache prefix. Keeps the main file scannable.
3. **Use HTML comments for maintainer notes.**`<!-- Updated 2026-01-01: reason -->` is stripped before context injection — zero token cost.
3. **Use HTML comments for maintainer notes.**`<!-- Updated 2026-01-01: reason -->` is stripped before context injection — zero token cost.
4. **Put personal dev notes in `CLAUDE.local.md`**, not `CLAUDE.md`. Add `CLAUDE.local.md` to `.gitignore`. Team members' sandbox URLs should never appear in git.
4. **Put personal dev notes in `CLAUDE.local.md`**, not `CLAUDE.md`. Add `CLAUDE.local.md` to `.gitignore`. Team members' sandbox URLs should never appear in git.
5. **Write `~/.claude/CLAUDE.md` for preferences that apply everywhere.** Communication style, preferred tools, output format — not project-specific config.
5. **Write `~/.claude/CLAUDE.md` for preferences that apply everywhere.** Communication style, preferred tools, output format — not project-specific config.
@ -91,3 +91,7 @@
3. **Use `additionalDirectories` for cross-repo work.** If Claude regularly reads `../shared-lib/`, add it: `{"additionalDirectories": ["../shared-lib/"]}`. Otherwise Claude can't access it without prompts.
3. **Use `additionalDirectories` for cross-repo work.** If Claude regularly reads `../shared-lib/`, add it: `{"additionalDirectories": ["../shared-lib/"]}`. Otherwise Claude can't access it without prompts.
4. **Configure `autoMode.environment` before using auto mode.** Without it, Claude's background safety classifier triggers false positives on your org's internal tool names and domains.
4. **Configure `autoMode.environment` before using auto mode.** Without it, Claude's background safety classifier triggers false positives on your org's internal tool names and domains.
5. **Add `Agent()` deny rules for sensitive agents.**`{"deny": ["Agent(general-purpose)"]}` prevents the most powerful agent from running without explicit permission.
5. **Add `Agent()` deny rules for sensitive agents.**`{"deny": ["Agent(general-purpose)"]}` prevents the most powerful agent from running without explicit permission.
---
[^200lines]: The "keep CLAUDE.md under 200 lines" threshold was a Sonnet-era adherence heuristic — Sonnet's attention quality dropped on longer files, so trimming raw line count was the optimisation lever. Opus 4.7 uses prompt-cache structure as the dominant cost driver: the first 30 lines must stay byte-stable across turns to keep the cache hit, and `@import` boundaries fragment the cached prefix. A 400-line CLAUDE.md with stable structure outperforms a 150-line file whose top contains a daily-rolling activity log. See `knowledge/opus-4.7-patterns.md` for detection IDs (CA-TOK-001..003).
> Timeline of major features, most recent first. Covers features with configuration impact.
> Timeline of major features, most recent first. Covers features with configuration impact.
> Source: Official Claude Code documentation, verified 2026-04-03.
> Source: Official Claude Code documentation, verified 2026-04-03; 2026-04 entries verified via research/03-claude-code-changes-config-surfaces.md (2026-04-19).
---
---
@ -9,6 +9,10 @@
| Approx. Date | Feature | Config Impact |
| Approx. Date | Feature | Config Impact |
|-------------|---------|---------------|
|-------------|---------|---------------|
| 2026-04 (v2.1.111) | **Opus 4.7 + token-efficiency surfaces** | New env vars `ENABLE_PROMPT_CACHING_1H`, `FORCE_PROMPT_CACHING_5M`, `CLAUDE_CODE_DISABLE_1M_CONTEXT`. New settings keys around `tui` / `autoScrollEnabled`. Granular commit attribution via `attribution.commit` / `attribution.pr` (replaces `includeCoAuthoredBy`). |
| Q1 2026 | **Agent Teams (experimental)** | Enable via `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` or env in settings.json. Configure display mode via `~/.claude.json``teammateMode`. Hooks: `TeammateIdle`, `TaskCreated`, `TaskCompleted`. |
| Q1 2026 | **Agent Teams (experimental)** | Enable via `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` or env in settings.json. Configure display mode via `~/.claude.json``teammateMode`. Hooks: `TeammateIdle`, `TaskCreated`, `TaskCompleted`. |
| Q1 2026 | **Elicitation events** | `Elicitation` and `ElicitationResult` hook events added. MCP servers can request user input; hooks control and log these requests. |
| Q1 2026 | **Elicitation events** | `Elicitation` and `ElicitationResult` hook events added. MCP servers can request user input; hooks control and log these requests. |
| Q1 2026 | **`SubagentStart` / `SubagentStop` hooks** | Added hook events for subagent lifecycle. `SubagentStop` is blocking — exit code 2 acts as a quality gate. |
| Q1 2026 | **`SubagentStart` / `SubagentStop` hooks** | Added hook events for subagent lifecycle. `SubagentStop` is blocking — exit code 2 acts as a quality gate. |
> All 26 hook events as of April 2026. Source: code.claude.com/docs/en/hooks.md
> All 26 hook events as of April 2026. Source: code.claude.com/docs/en/hooks.md
> Verified 2026-04-19 against research/03-claude-code-changes-config-surfaces.md — no new hook events introduced in v2.1.83–v2.1.111. Sandbox + managed-only flags (2026-04) operate at the settings layer, not as new hook events.
---
---
Some files were not shown because too many files have changed in this diff
Show more