From 4bd7cd50564aae46d833559dcf9dd86a462337ae Mon Sep 17 00:00:00 2001
From: Kjell Tore Guttormsen
Date: Fri, 1 May 2026 06:10:44 +0200
Subject: [PATCH] docs(config-audit): v5.0.0 brief + implementation plan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Planning artifacts for v5.0.0 (token-economy round):

- v5-brief.md: scope brief with 22 items (F1-F7 + M1-M8 + N1-N7), revised
  with Avklaringer-section after critical review (N7 dropped, M3+N6 merged,
  N5 promoted to v5.0.0, SC-6/SC-10 reformulated)
- v5-plan.md: 31-step implementation plan in 5 sessions (alpha.1 → alpha.2 →
  beta.1 → rc.1 → release). B+ score (84/100) after plan-critic +
  scope-guardian review addressed all blockers/majors/gaps.
- v5-implementation-log.md: per-session status record (skeleton)

Sessions track via state files (REMEMBER.md, TODO.md gitignored;
implementation-log.md committed; NEXT-SESSION-PROMPT.local.md gitignored).

No code changes in this commit — planning only.
---
 plugins/config-audit/docs/v5-brief.md         |  186 +++
 .../docs/v5-implementation-log.md             |   71 ++
 plugins/config-audit/docs/v5-plan.md          | 1007 +++++++++++++++++
 3 files changed, 1264 insertions(+)
 create mode 100644 plugins/config-audit/docs/v5-brief.md
 create mode 100644 plugins/config-audit/docs/v5-implementation-log.md
 create mode 100644 plugins/config-audit/docs/v5-plan.md

diff --git a/plugins/config-audit/docs/v5-brief.md b/plugins/config-audit/docs/v5-brief.md
new file mode 100644
index 0000000..ccfa9cd
--- /dev/null
+++ b/plugins/config-audit/docs/v5-brief.md
@@ -0,0 +1,186 @@
+# config-audit v5.0.0 — Brief
+
+**Status:** Final input to implementation planning (clarified 2026-05-01)
+**Created:** 2026-04-19
+**Starting point:** Critical review of v4.0.0 (Opus 4.7 perspective)
+**Owner:** Kjell Tore Guttormsen
+
+---
+
+## Clarifications from consultation 2026-05-01 (Avklaringer)
+
+These clarifications OVERRIDE the corresponding fields in the sections below. The brief reviewer
+found 9 inconsistencies/ambiguities; the user's decisions are codified here.
+
+### Scope adjustments
+
+- **N7 is dropped from v5.0.0.** Moved to "post-v5.0.0 stretch" (requires transcript parsing,
+  which contradicts the non-goals; data access must be solved separately). SC-12 is withdrawn.
+- **M3 and N6 are merged into N6.** M3 is removed from the should-fix list. N6 moves
+  from `rc.1` to `beta.1`. New finding prefix: `CA-COL-001`.
+- **N5 moves into v5.0.0** (from v5.1.0) and stays opt-in via `--accurate-tokens`.
+  If `ANTHROPIC_API_KEY` is missing: warn + graceful fallback to the zero-deps heuristic.
+  Uses the Anthropic `POST /v1/messages/count_tokens` endpoint.
+
+### Corrected file/line references
+
+- **F7:** Severity assignments are on 4 lines (270, 299, 321, 338) in `token-hotspots.mjs`,
+  not line 298. All four patterns must be recalibrated against tokens/turn.
+- **F3:** Requires `import { riskScore } from './severity.mjs'` in `scoring.mjs`
+  (WEIGHTS lives in severity.mjs, not scoring.mjs).
+- **F2:** The main bug is caller-side: `whats-active.mjs` and similar pass `kind='item'`
+  for MCP servers. The fix requires both a new `'mcp'` kind in `estimateTokens` AND changed call sites.
+
+### Revised success criteria
+
+- **SC-4:** Depends on the `--check-readme` flag that F6 builds. Checkable only after `alpha.2`.
+- **SC-6 is split in two:**
+  - **SC-6a:** `node scanners/manifest.mjs ` returns a ranked source/tokens list
+    with correct structure (independent of tokenizer precision).
+  - **SC-6b:** With `--accurate-tokens`: byte estimate within ±5% of the Anthropic count_tokens API.
+- **SC-10 is replaced:** Instead of "≥600 tests total", require: all 543 v4.0.0 tests
+  still green + ≥1 fixture-backed test per new scanner function (N1-N4, N6) and per
+  structural change (F1, F2, F3, M1-M6).
+- **SC-11 (new):** `node scanners/token-hotspots-cli.mjs --accurate-tokens` exits 0 +
+  the output has a `calibration.actual_tokens` field when an API key is present; `calibration.skipped: "no-api-key"`
+  when it is not.
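A minimal sketch of the warn-and-fallback behaviour that N5 and SC-11 describe, assuming a hypothetical `calibrate` helper; it is not the plugin's implementation. The 4-bytes-per-token ratio, the `estimated_tokens` and `error_pct` fields, and the injected `countTokens` function (which would wrap the `POST /v1/messages/count_tokens` call and resolve to `{ input_tokens }`) are illustrative assumptions. Only `calibration.actual_tokens` and `calibration.skipped: "no-api-key"` come from the brief.

```javascript
// Hypothetical sketch of the --accurate-tokens fallback; all names are illustrative.

// Zero-deps heuristic. The 4-bytes-per-token ratio is an assumption made for this
// sketch, not the ratio the plugin actually uses.
function heuristicTokens(text) {
  return Math.ceil(Buffer.byteLength(text, 'utf8') / 4);
}

// `countTokens` is an injected async function (hypothetical) that would wrap the
// count_tokens request and resolve to an object shaped like { input_tokens: number }.
async function calibrate(text, { apiKey, countTokens } = {}) {
  const estimated = heuristicTokens(text);

  if (!apiKey) {
    // Warn and fall back instead of failing, so the CLI can still exit 0.
    console.warn('ANTHROPIC_API_KEY not set; using heuristic estimate');
    return { estimated_tokens: estimated, calibration: { skipped: 'no-api-key' } };
  }

  const { input_tokens: actual } = await countTokens(text, apiKey);
  // Relative error of the heuristic against the API count, in percent.
  const errorPct = actual === 0 ? 0 : (Math.abs(estimated - actual) / actual) * 100;

  return {
    estimated_tokens: estimated,
    calibration: { actual_tokens: actual, error_pct: Number(errorPct.toFixed(1)) },
  };
}
```

Injecting `countTokens` keeps the network call out of the fallback logic, so both branches can be covered by fixture-backed tests without an API key.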
+
+### Minor adjustments
+
+- **M1 (MCP tool count):** When `tools/list` cannot be run, fall back to:
+  npm package → read the `package.json` `tools` field; cached `tools/list` response; otherwise flag
+  "tool count unknown" as a finding (do not skip).
+- **N1 backward-compat:** Existing `CA-TOK-*` globs in `.config-audit-ignore` will
+  suppress the new `CA-TOK-005`. Flag this explicitly in the CHANGELOG as a "known breaking
+  change for glob suppressions".
+
+### Revised release plan (authoritative)
+
+- **v5.0.0-alpha.1**: F1-F5 (TOK cleanup + estimateTokens fix + scoring severity fix).
+- **v5.0.0-alpha.2**: M1, M2, M4-M6 (M3 removed) + F6, F7.
+- **v5.0.0-beta.1**: N1, N2, N3, N4, N6 (collision scanner moved here from rc.1).
+- **v5.0.0-rc.1**: M7, M8 + N5 (tokenizer calibration).
+- **v5.0.0**: Full suite green, README updated, CHANGELOG, version sync, self-audit grade A.
+- **v5.1.0+ (post-release)**: N7 (cache-hit digest) once data access is solved.
+
+---
+
+## 1. Why v5.0.0
+
+v4.0.0 markets itself as "Opus 4.7-aware token optimization" (TOK scanner, `/config-audit tokens`, Token Efficiency as the 8th quality area). A critical review shows that the marketing does not hold:
+
+- The TOK scanner imports `readActiveConfig` and explicitly does not use it (`void readActiveConfig` in `scanners/token-hotspots.mjs:31`), so the scanner never looks at plugins, skills, MCP servers, or the CLAUDE.md cascade as an aggregated token cost.
+- 4 TOK patterns cover 29% of 14 identified Opus 4.7 cost drivers. The largest sinks (MCP tool-schema explosion, skill-description bloat, CLAUDE.md cascade total) have zero coverage.
+- `estimateTokens` (`scanners/lib/active-config-reader.mjs:29-39`) flattens MCP servers and hooks to 15 tokens each. A user with 5 MCP servers gets 75 tokens reported where the reality is 10-20k.
+- Area score ignores severity entirely (`scanners/lib/scoring.mjs:184`): 1 critical and 1 info give an identical area score.
+- Pattern D (`detectSonnetEra`) contradicts the plugin's own v3.0 policy that a minimal correct setup = Grade A.
+
+The rest of the plugin (8 structural scanners, backup/rollback, suppression, plugin-health) works and should not be torn down. v5.0.0 is a token-economy round, not a total rebuild.
+
+---
+
+## 2. Goals for v5.0.0
+
+**Primary:** Make the plugin's token optimization reality-based. After v5.0.0, a user who runs `/config-audit tokens` should get concrete, calibrated insight into what actually costs tokens in their setup, including MCP, skills, the CLAUDE.md cascade, and hooks.
+
+**Secondary:**
+- Severity reflects estimated tokens/turn, not "how trivial the pattern is to detect".
+- Area score takes severity into account.
+- README/CLAUDE.md numbers match the actual code.
+- The knowledge base reflects Opus 4.7 priorities (cache reuse and schema discipline), not the Sonnet-era "tokens are cheap".
+
+**Non-goals:**
+- Runtime telemetry as a core feature (only as an opt-in recipe; requires transcript parsing).
+- Full tiktoken bundling (opt-in `--accurate-tokens` via API is acceptable; the default must be the zero-deps heuristic).
+- Cross-repo benchmarking or cloud telemetry.
+- Changes to the secret/credential-scanning scope (still delegated to llm-security).
+
+---
+
+## 3. Scope
+
+### Must-fix (7 critical)
+
+| ID | File/line | What |
+|----|-----------|-----|
+| F1 | `scanners/token-hotspots.mjs:31` | TOK must actually use `readActiveConfig`, not just import it |
+| F2 | `scanners/lib/active-config-reader.mjs:29-39` | `estimateTokens` must differentiate MCP/hooks by type, not a flat 15 tokens |
+| F3 | `scanners/lib/scoring.mjs:184` | Area score must weight findings by severity (reuse the `riskScore` WEIGHTS) |
+| F4 | `scanners/token-hotspots.mjs:202-229` | Remove dead `take` logic + fabricated hotspot-padding entries |
+| F5 | `scanners/token-hotspots.mjs:166-178` | Remove pattern D (`detectSonnetEra`) or move it behind `--suggest-features` |
+| F6 | `README.md:15,86,111,280,459-474` + `CLAUDE.md` | Add a self-audit that verifies README numbers against code |
+| F7 | `scanners/token-hotspots.mjs:298` | Severity must follow tokens/turn, not detector complexity |
+
+### Should-fix (8 gaps)
+
+| ID | What |
+|----|-----|
+| M1 | MCP tool count per server (parse manifest/`tools/list`, flag > 15 tools) |
+| M2 | Skill description length (frontmatter, not body): flag > 500 characters |
+| M3 | Plugin skill/command collisions across active plugins |
+| M4 | CLAUDE.md cascade total exposed to TOK: flag > 10k tokens |
+| M5 | Hook stdout/`additionalContext` size: flag hooks that write > 50 lines |
+| M6 | `additionalDirectories` added to `KNOWN_KEYS` + flag > 2 entries |
+| M7 | Cache-telemetry recipe in knowledge/ + `/config-audit tokens --with-telemetry-recipe` |
+| M8 | Knowledge-base cleanup: move Sonnet-era advice (adherence-based 200-line limit, cosmetic tier-3 gaps) toward Opus 4.7 priorities |
+
+### New features (prioritized)
+
+| # | Feature | Rationale |
+|---|---------|-------------|
+| N1 | **MCP Tool-Schema Budget Scanner**: new finding `CA-TOK-005` | Largest token sink; 10-20k/turn potential |
+| N2 | **System-Prompt Manifest**: `/config-audit manifest` command | Makes all other TOK findings understandable |
+| N3 | **Cache-Prefix Stability Analyzer** | Classify segments as stable/volatile, not just the top 30 lines |
+| N4 | **Disabled-Tools-Still-In-Schema Detector** | Common pattern: denied tools are loaded into the schema anyway |
+| N5 | **Live Tokenizer Calibration** (`--accurate-tokens`, opt-in) | Lowers the ±20% uncertainty to ±5% for users who accept API calls |
+| N6 | **Cross-Plugin Skill/Command Collision Scanner** | Correctness under heavy plugin use (relevant for KTG with 8 plugins) |
+| N7 | **Cache-Hit-Rate Session Digest**: `/config-audit cache-digest` | The only source of truth for whether token optimization actually works |
+
+---
+
+## 4. Success criteria (testable)
+
+After v5.0.0 the following must be verifiable:
+
+1. **TOK uses `readActiveConfig`.** `grep -n "readActiveConfig(" scanners/token-hotspots.mjs` must show at least one actual call, not just `void`.
+2. **`estimateTokens` differentiates.** Unit test: an MCP server with 10 tools returns > 2000 estimated tokens, not 15.
+3. **Area score responds to severity.** Unit test: 1 critical gives a lower score than 5 lows, all else being equal.
+4. **README numbers match code.** `node scanners/self-audit.mjs --check-readme` exits with code 0; it checks test-file count, scanner count, command count, agent count, hook count, and knowledge count against the README badges.
+5. **MCP tool count is flagged.** Fixture with `.mcp.json` plus a `tools/list` mock with 20 tools: the TOK scanner produces a `CA-TOK-005` finding.
+6. **System-prompt manifest works.** `node scanners/manifest.mjs ` returns a ranked list with source + tokens DESC, with the total within ±20% of the actual summed byte estimate.
+7. **Cache-prefix analysis.** A CLAUDE.md with a volatile middle section generates a finding, not only when the volatility is in the top 30 lines.
+8. **Collision scanner.** Fixture with two plugins that both expose skill `review`: a collision finding is produced.
+9. **Knowledge base updated.** Grep for "Keep under 200 lines" (Sonnet-era wording) in `knowledge/configuration-best-practices.md` returns 0; it is replaced by cache-stability-oriented guidance.
+10. **Suite health.** `node --test 'tests/**/*.test.mjs'`: ≥ 600 tests green (up from 543 in v4.0.0). New scanner functionality has fixture coverage.
+
+---
+
+## 5. Risks and dependencies
+
+- **Tokenizer calibration**: no zero-deps tokenizer gives 100% accuracy. Accept ±20% by default; mark the opt-in `--accurate-tokens` as experimental.
+- **MCP `tools/list` access**: requires a running MCP server. Fallback: parse the server's manifest if it exists, otherwise use cache/estimate.
+- **Schema drift in the `.claude.json` format**: Anthropic may change the format. `readClaudeJsonProjectSlice` already has longest-prefix matching; new fields must be detected robustly.
+- **Breaking changes**: v5.0.0 is a major bump. TOK finding IDs remain (`CA-TOK-001..004`); new ones are added from `CA-TOK-005`. Suppression files from v4.x must still work.
+- **Self-audit failure after bump**: the README check (F6) may fail on the first push. Accept a temporarily red self-audit during v5 work; green is required before the release tag.
+
+---
+
+## 6. Release plan (high-level)
+
+- **v5.0.0-alpha.1**: F1-F5 (TOK scanner cleanup + estimateTokens fix + scoring severity fix).
+- **v5.0.0-alpha.2**: M1-M6 (missing structural checks) + F6-F7 (README sync + severity recalibration).
+- **v5.0.0-beta.1**: N1-N4 (MCP budget, manifest, cache-prefix, disabled-in-schema).
+- **v5.0.0-rc.1**: M7-M8 (Opus 4.7 cleanup of the knowledge base) + N6 (collision scanner).
+- **v5.0.0**: Full suite green, README updated, CHANGELOG, version sync, self-audit grade A.
+- **v5.1.0** (post-release): N5 (tokenizer) + N7 (cache-hit digest) as opt-in features.
+
+---
+
+## 7. References
+
+- **Critical review (full):** inline in the 2026-04-19 session (KTG consultation, Opus 4.7 perspective).
+- **TOK scanner:** `scanners/token-hotspots.mjs`
+- **Token heuristic:** `scanners/lib/active-config-reader.mjs` + `knowledge/opus-4.7-patterns.md`
+- **Area scoring:** `scanners/lib/scoring.mjs`
+- **Active v4.0.0:** `README.md`, `CLAUDE.md`
+- **Opus 4.7 coverage mapping:** the review's "Mangler" (gaps) section (14 items, 10 uncovered).
diff --git a/plugins/config-audit/docs/v5-implementation-log.md b/plugins/config-audit/docs/v5-implementation-log.md
new file mode 100644
index 0000000..d94beb3
--- /dev/null
+++ b/plugins/config-audit/docs/v5-implementation-log.md
@@ -0,0 +1,71 @@
+# config-audit v5.0.0 — Implementation Log
+
+Per-session record of what was done, what was deferred, and what failed.
+Written at the end of each session. State for the next session lives in
+`NEXT-SESSION-PROMPT.local.md` (gitignored).
+
+---
+
+## Planning session (2026-05-01)
+
+**Outcome:** Plan ready for execution.
+
+**Completed:**
+- Read `v5-brief.md` (drafted 2026-04-19)
+- Brief reviewer ran — 5 findings requiring user input
+- User decisions captured:
+  - N7 (cache-hit-digest) dropped from v5.0.0 — moved to post-release
+  - N5 (live tokenizer) moved into v5.0.0 with warn-and-fallback
+  - M3 merged into N6 (single collision scanner)
+  - M1 manifest-fallback approach approved (cache → package.json → "tool count unknown" finding)
+  - SC-6 split to 6a/6b
+  - SC-10 replaced with per-feature coverage requirement
+  - N1 backward-compat for `CA-TOK-*` glob suppression flagged in CHANGELOG
+- Brief revised with "Avklaringer fra konsultasjon 2026-05-01" section (authoritative)
+- Exploration: 7 parallel agents (architecture, task-finder, dependency-tracer, risk-assessor, test-strategist, git-historian, convention-scanner)
+- Plan written: `docs/v5-plan.md` — 31 steps in 5 sessions
+- Adversarial review: plan-critic verdict REPLAN (Grade C, 5 blockers + 8 majors); scope-guardian MIXED (4 gaps)
+- Plan revised to address all 5 blockers + 8 majors + 4 scope-gaps; new score B+ (84/100)
+
+**Open assumptions** (carry into execution):
+1. Anthropic `count_tokens` endpoint accepts plain-text payload, returns `{input_tokens: number}` (Step 26)
+2. MCP servers expose tool count via `tools/list` or `package.json` `tools` field (Steps 14, 18)
+3. `readActiveConfig` performant enough for TOK at scale (Step 6)
+4. Cross-plugin namespace model — to be verified by Step 22a research spike before Step 22b
+5. `baseline-all-a` fixture is genuinely info-only after F3 — Step 3 audit verifies
+
+**Next session:** Session 1 — alpha.1 (F1-F5 + reference cleanup). See `NEXT-SESSION-PROMPT.local.md`.
+
+---
+
+## Session 1 — alpha.1 (TBD)
+
+*Start when ready. Replace this stub with actual log at session end.*
+
+**Steps planned:** 1-9 (incl. 8b)
+
+**Branch strategy:** direct-to-main (Forgejo, pre-authorized).
+
+---
+
+## Session 2 — alpha.2 (TBD)
+
+*Steps 10-17.*
+
+---
+
+## Session 3 — beta.1 (TBD)
+
+*Steps 18, 19, 20, 21, 22a, 22b, 23.*
+
+---
+
+## Session 4 — rc.1 (TBD)
+
+*Steps 24-27.*
+
+---
+
+## Session 5 — release (TBD)
+
+*Steps 28-30, including SC-6b release gate.*
diff --git a/plugins/config-audit/docs/v5-plan.md b/plugins/config-audit/docs/v5-plan.md
new file mode 100644
index 0000000..8176f95
--- /dev/null
+++ b/plugins/config-audit/docs/v5-plan.md
@@ -0,0 +1,1007 @@
+# config-audit v5.0.0 — Implementation Plan
+
+> **Plan quality:** B+ (84/100) — adversarial review complete, revisions applied
+>
+> Generated by ultraplan-local v3.0.0 on 2026-05-01 — `plan_version: 1.7`
+> Source brief: `docs/v5-brief.md` (revised 2026-05-01)
+> Revised after plan-critic + scope-guardian review on 2026-05-01 (see Revisions section)
+
+## Context
+
+config-audit v4.0.0 markets itself as "Opus 4.7-aware token optimization" but the
+critical review (briefed 2026-04-19, revised 2026-05-01) shows the marketing does not hold:
+
+- TOK scanner imports `readActiveConfig` and explicitly voids it (`void readActiveConfig`
+  at `scanners/token-hotspots.mjs:31`) — never sees plugins, skills, MCP, or cascade.
+- 4 TOK patterns cover ~29% of identified Opus 4.7 cost drivers; the largest sinks
+  (MCP tool-schema bloat, skill-description bloat, CLAUDE.md cascade total) have zero coverage.
+- `estimateTokens` flattens MCP servers and hooks to 15 tokens each via three caller sites
+  passing `kind='item'` (`active-config-reader.mjs:556, 593, 618`). Reality is 2k–20k per MCP.
+- `scoreByArea` treats severities equally: 1 critical and 1 info produce identical area score.
+- Pattern D (`detectSonnetEra`) contradicts the plugin's own v3.0 policy
+  (minimal correct = Grade A).
+
+v5.0.0 is a **token-economy round**, not a rewrite. The 8 structural scanners,
+backup/rollback, suppression, plugin-health all stay. The TOK scanner is reworked
+in place; new scanners (MCP budget, manifest, cache-prefix, disabled-in-schema,
+collision) are added; severity-aware scoring lands; tokenizer calibration via
+Anthropic `count_tokens` ships as opt-in.
+
+## Architecture Diagram
+
+```mermaid
+graph TD
+  subgraph "v5.0.0 changes"
+    TOK[token-hotspots.mjs<br/>F1/F4/F5/F7]
+    ACR[active-config-reader.mjs<br/>F2 + 'mcp' kind]
+    SCR[scoring.mjs<br/>F3 severity-weighted]
+    SVR[severity.mjs<br/>WEIGHTS export]
+    SAU[self-audit.mjs<br/>F6 --check-readme]
+    ORC[scan-orchestrator.mjs<br/>register new scanners]
+    CLI[token-hotspots-cli.mjs<br/>N5 --accurate-tokens]
+
+    MCB[NEW: mcp-budget-scanner.mjs<br/>N1 CA-TOK-005]
+    MAN[NEW: manifest.mjs<br/>N2 CLI + scanner]
+    CPS[NEW: cache-prefix-scanner.mjs<br/>N3]
+    DIS[NEW: disabled-in-schema-scanner.mjs<br/>N4]
+    COL[NEW: collision-scanner.mjs<br/>N6 CA-COL-001]
+
+    SET[settings-validator.mjs<br/>M6 additionalDirectories]
+    KB[knowledge/<br/>M7 cache-recipe + M8 cleanup]
+
+    ORC --> TOK & MCB & CPS & DIS & COL
+    TOK --> ACR
+    SCR --> SVR
+    ACR -. F2 fix .-> ACR
+    CLI --> ACR
+    SAU --> README
+  end
+```
+
+## Codebase Analysis
+
+- **Tech stack:** Node.js >= 18, ES modules (.mjs), `node:test`, zero external deps
+- **Test framework:** `node:test` + `node:assert/strict` — 543 tests across 31 files in v4.0.0
+- **Key patterns:**
+  - Scanner orchestrator + shared `discovery` object (`scan-orchestrator.mjs:73`)
+  - Finding factory `finding({scanner, severity, ...})` produces `CA-{SCANNER}-{NNN}` IDs
+    (`output.mjs:31`); counter is process-global, reset per scan
+  - CLI direct-run guard pattern via `import.meta.url`
+  - Manual argv parsing — no external libs
+  - Test fixtures under `tests/fixtures//`
+- **Relevant files (verified):**
+  - `scanners/token-hotspots.mjs` (lines 31, 166-178, 202-229, 270, 299, 321, 338)
+  - `scanners/lib/active-config-reader.mjs` (lines 29-39, 556, 593, 618)
+  - `scanners/lib/scoring.mjs` (lines 6, 169-200, 184)
+  - `scanners/lib/severity.mjs` (lines 14, 21-27)
+  - `scanners/scan-orchestrator.mjs` (lines 18-58)
+  - `scanners/self-audit.mjs` (lines 154-177)
+  - `scanners/settings-validator.mjs` (lines 16-35)
+  - `scanners/lib/suppression.mjs` (lines 117-128)
+  - `knowledge/configuration-best-practices.md` (line 9)
+  - `knowledge/opus-4.7-patterns.md` (1-57)
+  - `tests/lib/active-config-reader.test.mjs`, `tests/lib/scoring.test.mjs`,
+    `tests/scanners/token-hotspots.test.mjs`, `tests/scanners/posture-grade-stability.test.mjs`
+- **Reusable code:**
+  - `tokenKind()` at `token-hotspots.mjs:54-63` — extend to map MCP types
+  - `enumeratePlugins()` at `active-config-reader.mjs:262-305` — for N6 collision
+  - `countPluginItems()` + `findSkillMdFiles()` at `active-config-reader.mjs:332-399` — for N6
+  - `parseFrontmatter()` from `lib/yaml-parser.mjs` — for M2 skill description
+  - `discoverConfigFiles()` from `lib/file-discovery.mjs` — for new CLIs
+  - `buildRichRepo()` test helper at `tests/lib/active-config-reader.test.mjs` — extend for MCP fixtures
+  - `runScanner()` helper pattern in `tests/scanners/token-hotspots.test.mjs` — model for new scanner tests
+- **External tech (researched):** Anthropic `POST /v1/messages/count_tokens` endpoint for N5 (rate-limited 1000 req/min)
+- **Recent git activity:**
+  - `token-hotspots.mjs`, `active-config-reader.mjs`, `severity.mjs` are all single-commit cold files (born in 4f1cc7e or a090ed3, never revised)
+  - `scoring.mjs` has 2 commits (born + TOK wiring)
+  - Single owner (KTG); no concurrent branches; all work merges to main
+  - **Straggler-sweep risk:** 4 historical events where badge counts/area counts drifted across multiple files in a single feature batch — must plan dedicated doc-consistency pass
+
+## Research Sources
+
+| Technology | Source | Key Findings | Confidence |
+|-----------|--------|--------------|------------|
+| Anthropic count_tokens API | Anthropic public docs | `POST /v1/messages/count_tokens` returns `{input_tokens: number}`; 1000 req/min rate limit; requires `ANTHROPIC_API_KEY` | high |
+| MCP tool count detection | MCP spec | `tools/list` requires running server; package.json `tools` field is convention-only, not standard | medium |
+
+## Implementation Plan
+
+Steps grouped by release stage. Each step has manifest, on-failure, checkpoint.
+Steps within a stage may be reordered if test gates allow; cross-stage ordering is fixed.
+
+---
+
+### STAGE alpha.1 — TOK cleanup + scoring/estimateTokens fix (F1-F5)
+
+#### Step 1: Export `WEIGHTS` and `riskScore` from severity.mjs (F3 prep)
+
+- **Files:** `scanners/lib/severity.mjs`
+- **Changes:** Promote `WEIGHTS` const to named export. Verify `riskScore` already exported.
+- **Reuses:** Existing `WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1, info: 0 }` at line 14.
+- **Test first:** `tests/lib/severity.test.mjs` — assert `WEIGHTS.critical === 25` via named import.
+- **Verify:** `node --test tests/lib/severity.test.mjs` → PASS
+- **On failure:** revert (single-file change)
+- **Checkpoint:** `git commit -m "feat(config-audit): export WEIGHTS from severity.mjs (v5 F3 prep)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [scanners/lib/severity.mjs]
+  must_contain:
+    - { path: scanners/lib/severity.mjs, pattern: "export const WEIGHTS" }
+  commit_message_pattern: "^feat\\(config-audit\\):"
+  ```
+
+#### Step 2: Severity-weighted `scoreByArea` (F3)
+
+- **Files:** `scanners/lib/scoring.mjs`, `tests/lib/scoring.test.mjs`
+- **Changes:**
+  1. Add `import { WEIGHTS, riskScore } from './severity.mjs'`
+  2. Rewrite `scoreByArea` non-GAP path (lines 182-186) to penalize via severity-weighted sum:
+     `penalty = sum(count[s] * WEIGHTS[s]) / maxBudget; passRate = max(0, 100 - penalty)`
+  3. Add `scoringVersion: 'v5'` to returned struct (for cross-version drift detection)
+- **Reuses:** `WEIGHTS` from Step 1; existing GAP-tier logic untouched.
+- **Test first:** Add `describe('scoreByArea — severity weighting')` with new factory `makeScannerResultWithSeverities`. Assert: 1 critical → score < 5 lows → score; clean → 100/A.
+- **Verify:** `node --test tests/lib/scoring.test.mjs tests/scanners/posture-grade-stability.test.mjs` → PASS
+- **On failure:** revert + re-evaluate maxBudget formula. Likely tweak: `maxBudget = max(10, findingCount * 4)`.
+- **Checkpoint:** `git commit -m "feat(config-audit): severity-weighted scoreByArea (v5 F3)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [scanners/lib/scoring.mjs, tests/lib/scoring.test.mjs]
+  must_contain:
+    - { path: scanners/lib/scoring.mjs, pattern: "import.*WEIGHTS.*riskScore" }
+    - { path: scanners/lib/scoring.mjs, pattern: "scoringVersion" }
+  commit_message_pattern: "^feat\\(config-audit\\):"
+  ```
+
+#### Step 3: Audit `baseline-all-a` fixture for F3 compatibility
+
+- **Files:** `tests/fixtures/baseline-all-a/` (read-only audit)
+- **Changes:** Run scoring against fixture; if any non-info findings drop below 90 score after F3, document and either (a) update fixture to truly minimal correct config, or (b) update test expectations to match v5 semantics with explanatory comment.
+- **Reuses:** Existing fixture.
+- **Test first:** `tests/scanners/posture-grade-stability.test.mjs` already asserts grade A on this fixture; if it fails after Step 2, fix fixture.
+- **Verify:** `node --test tests/scanners/posture-grade-stability.test.mjs` → PASS
+- **On failure:** retry — tweak fixture to be truly clean (remove any medium+ findings).
+- **Checkpoint:** `git commit -m "test(config-audit): align baseline-all-a fixture with v5 scoring"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [tests/scanners/posture-grade-stability.test.mjs]
+  commit_message_pattern: "^(test|fix|chore)\\(config-audit\\):"
+  ```
+
+#### Step 4: Add `'mcp'` kind to `estimateTokens` (F2 — function side)
+
+- **Files:** `scanners/lib/active-config-reader.mjs`, `tests/lib/active-config-reader.test.mjs`
+- **Changes:** Extend `estimateTokens(bytes, kind)` (lines 29-39):
+  - new branch `kind === 'mcp'`: if `bytes > 0` use `ceil(bytes / 3.5)` (json-rate); else `500` (base overhead floor)
+  - Optional third arg `{toolCount}` via overload: `estimateTokens(bytes, 'mcp', {toolCount}) → max(base, toolCount * 200)`
+- **Reuses:** Existing `'json'` and `'item'` branches as patterns.
+- **Test first:** Add cases: `'mcp'` with 0 bytes → ≥500; `'mcp'` with `{toolCount: 10}` → ≥2000; ratio `mcp / item` ≥ 30 for 10-tool server.
+- **Verify:** `node --test tests/lib/active-config-reader.test.mjs` → PASS
+- **On failure:** revert. Adjust formula if test thresholds unrealistic — but keep the order-of-magnitude differentiation.
+- **Checkpoint:** `git commit -m "feat(config-audit): add 'mcp' kind to estimateTokens (v5 F2)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
+  must_contain:
+    - { path: scanners/lib/active-config-reader.mjs, pattern: "kind === 'mcp'" }
+  commit_message_pattern: "^feat\\(config-audit\\):"
+  ```
+
+#### Step 5: Migrate MCP/hook callers to use `'mcp'` kind (F2 — caller side)
+
+- **Files:** `scanners/lib/active-config-reader.mjs`
+- **Changes:** Three call sites:
+  - Line 556 (`collectHookEntries`): keep `'item'` for hooks (hooks don't have schemas) but pass actual byte size when available.
+  - Line 593 (`collectMcpFromFile`): `kind='mcp'`, pass `{ toolCount: server.tools?.length ?? 0 }` (will be 0 until N1 wires tool detection — that's fine; base 500 still beats flat 15).
+  - Line 618 (`readActiveMcpServers` from .claude.json): same as 593.
+- **Reuses:** New `'mcp'` kind from Step 4.
+- **Test first:** Extend `tests/lib/active-config-reader.test.mjs` `buildRichRepo` to include MCP servers; assert returned `mcpServers[].estimatedTokens >= 500` (not 15).
+- **Verify:** `node --test tests/lib/active-config-reader.test.mjs` → PASS
+- **On failure:** revert. Re-check call sites if test still shows 15.
+- **Checkpoint:** `git commit -m "fix(config-audit): MCP token callers use 'mcp' kind (v5 F2)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
+  forbidden_paths: []
+  commit_message_pattern: "^fix\\(config-audit\\):"
+  ```
+
+#### Step 6: Wire `readActiveConfig` into TOK scanner (F1)
+
+- **Files:** `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`, `tests/fixtures/tok-active-config/` *(new)*
+- **Changes:**
+  - Remove `void readActiveConfig;` at line 31.
+  - Inside `scan(targetPath, discovery)`: call `await readActiveConfig(targetPath, {})` once; if it throws (non-git target), catch and continue with `discovery`-only behavior. Merge its `mcpServers`, `plugins`, `skills`, `claudeMd.estimatedTokens` into hotspot ranking input.
+  - Add new finding source category `'mcp-server'`, `'plugin'`, `'skill'` for hotspots.
+  - **Unify token estimation paths:** the `tokenKind()` mapper at line 54-63 is used for `discovery.files`. After Step 5, MCP files in discovery still map to `'json'` while MCP servers from `readActiveConfig` use `'mcp'`. Within TOK, prefer `readActiveConfig` data for MCP/skills/plugins; fall back to `discovery` only for files not covered by `readActiveConfig` (e.g., loose `claude.json`). Document in a 1-line comment.
+- **Reuses:** `readActiveConfig` shape from `active-config-reader.mjs:738-827`.
+- **Test first:** New fixture `tok-active-config/` with `.mcp.json` (2 servers), `CLAUDE.md`, and `.claude-plugin/plugin.json` + `commands/sample.md` (plugin-skeleton). New describe block: assert (a) `hotspots.some(h => h.source.includes('mcp'))`; (b) total estimated tokens > minimal-project total; (c) `claudeMd.estimatedTokens > 0` is observable when readActiveConfig was called.
+- **Verify:** `node --test tests/scanners/token-hotspots.test.mjs` → PASS
+- **On failure:** revert. Common cause: `readActiveConfig` requires git root; the try/catch above handles this. Verify discovery-only fallback path works.
+- **Checkpoint:** `git commit -m "feat(config-audit): TOK consumes readActiveConfig (v5 F1)"`
+- **Manifest:**
+  ```yaml
+  expected_paths:
+    - scanners/token-hotspots.mjs
+    - tests/scanners/token-hotspots.test.mjs
+    - tests/fixtures/tok-active-config/.mcp.json
+    - tests/fixtures/tok-active-config/CLAUDE.md
+    - tests/fixtures/tok-active-config/.claude-plugin/plugin.json
+    - tests/fixtures/tok-active-config/commands/sample.md
+  must_contain:
+    - { path: scanners/token-hotspots.mjs, pattern: "readActiveConfig\\(targetPath" }
+  forbidden_paths: []
+  commit_message_pattern: "^feat\\(config-audit\\):"
+  ```
+
+#### Step 7: Remove `take` dead-code and hotspot padding (F4)
+
+- **Files:** `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`
+- **Changes:** Delete `take` computation (lines 202-205) and padding while-loop (lines 219-229). Replace with: `return ranked.slice(0, HOTSPOTS_MAX)` and accept that fewer than `HOTSPOTS_MIN` may be returned for small projects.
+- **Reuses:** `HOTSPOTS_MAX` constant.
+- **Test first:** Add assertion: every `hotspot.source` is unique; `hotspots.length <= discovery.files.length`.
+- **Verify:** `node --test tests/scanners/token-hotspots.test.mjs` → PASS
+- **On failure:** revert. If hotspots-contract test breaks because some test expects min count, update test to allow fewer.
+- **Checkpoint:** `git commit -m "fix(config-audit): remove TOK dead take + hotspot padding (v5 F4)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
+  must_contain:
+    - { path: scanners/token-hotspots.mjs, pattern: "ranked\\.slice\\(0, HOTSPOTS_MAX\\)" }
+  commit_message_pattern: "^fix\\(config-audit\\):"
+  ```
+
+#### Step 8: Remove Pattern D `detectSonnetEra` (F5)
+
+- **Files:** `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`
+- **Changes:** Delete `detectSonnetEra()` function (lines 166-178) and its finding emission (lines 335-350). Pattern D and `CA-TOK-004` no longer exist.
+- **Reuses:** —
+- **Test first:** Update `opus-47/sonnet-era` describe block: assert `result.findings.every(f => f.id !== 'CA-TOK-004')` AND that the existing fixture now produces zero TOK findings.
+- **Verify:** `node --test tests/scanners/token-hotspots.test.mjs` → PASS AND `! grep -q "detectSonnetEra" scanners/token-hotspots.mjs`
+- **On failure:** revert. CA-TOK-004 may still exist if any other path emits it; grep confirms none.
+- **Checkpoint:** `git commit -m "feat(config-audit): remove TOK Pattern D detectSonnetEra (v5 F5)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [scanners/token-hotspots.mjs]
+  forbidden_paths: []
+  must_not_contain:
+    - { path: scanners/token-hotspots.mjs, pattern: "detectSonnetEra" }
+    - { path: scanners/token-hotspots.mjs, pattern: "CA-TOK-004" }
+  commit_message_pattern: "^feat\\(config-audit\\):"
+  ```
+
+#### Step 8b: Sweep CA-TOK-004 references from docs after F5
+
+- **Files:** `commands/tokens.md`, `knowledge/opus-4.7-patterns.md`
+- **Changes:**
+  - `commands/tokens.md`: replace any `CA-TOK-001..004` reference with `CA-TOK-001..003` (or list explicitly). Verify no `CA-TOK-004` remains.
+  - `knowledge/opus-4.7-patterns.md`: remove the Pattern D row from the catalogue table and any text referencing "Pattern D" or `CA-TOK-004`. Update the pattern count in the document header if mentioned.
+- **Reuses:** —
+- **Test first:** None (docs).
+- **Verify:** `! grep -q "CA-TOK-004" commands/tokens.md knowledge/opus-4.7-patterns.md`
+- **On failure:** revert.
+- **Checkpoint:** `git commit -m "docs(config-audit): remove CA-TOK-004 references after F5 (v5)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [commands/tokens.md, knowledge/opus-4.7-patterns.md]
+  must_not_contain:
+    - { path: commands/tokens.md, pattern: "CA-TOK-004" }
+    - { path: knowledge/opus-4.7-patterns.md, pattern: "CA-TOK-004" }
+  commit_message_pattern: "^docs\\(config-audit\\):"
+  ```
+
+#### Step 9: alpha.1 wrap — release notes draft
+
+- **Files:** `CHANGELOG.md`
+- **Changes:** Add `## [5.0.0-alpha.1]` entry summarizing F1-F5. Note BREAKING for F3 (severity weighting) and F2 (MCP estimate jump).
+- **Reuses:** v4.0.0 entry format.
+- **Test first:** None (docs).
+- **Verify:** `grep -c "5.0.0-alpha.1" CHANGELOG.md` → 1
+- **On failure:** revert.
+- **Checkpoint:** `git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.1 entry"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [CHANGELOG.md]
+  must_contain:
+    - { path: CHANGELOG.md, pattern: "5\\.0\\.0-alpha\\.1" }
+  commit_message_pattern: "^docs\\(config-audit\\):"
+  ```
+
+---
+
+### STAGE alpha.2 — Structural gaps + README self-audit (M1, M2, M4-M6, F6, F7)
+
+#### Step 10: F7 — Severity recalibration for TOK patterns
+
+- **Files:** `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`
+- **Changes:** Recalibrate severity for all 3 remaining patterns based on tokens/turn (Pattern D removed in F5). Each decision is explicit and testable:
+  - **Pattern A (volatile top, line 270):** `medium` → `high`. Reason: volatile content in cached prefix triggers full re-read of cascade every turn (10k+ tokens/turn cost). Highest single-pattern impact.
+  - **Pattern B (redundant perms, line 299):** `low` → `medium`. Reason: duplicate tool entries inflate the tool-schema payload sent every turn (~50-200 tokens/turn per duplicate, scales with turns).
+  - **Pattern C (deep imports, line 321):** `medium` → `low`. Reason: depth alone is structural and only matters at first-load; cache benefits remain. Lower per-turn cost than originally rated. **This is an explicit recalibration, not "unchanged".**
+  - Add `calibration_note` field to each finding's evidence: `"severity reflects estimated tokens/turn based on structural heuristic; not measured against runtime telemetry"`.
+- **Reuses:** `SEVERITY` constants.
+- **Test first:** Table-driven test:
+  ```js
+  const SEVERITY_TABLE = [
+    { fixture: 'opus-47/cache-breaking', findingId: 'CA-TOK-001', expected: 'high' },
+    { fixture: 'opus-47/redundant-tools', findingId: 'CA-TOK-002', expected: 'medium' },
+    { fixture: 'opus-47/deep-imports', findingId: 'CA-TOK-003', expected: 'low' },
+  ];
+  ```
+  Iterate with `for...of` generating `it(...)` blocks. Each asserts `finding.severity === expected`.
+- **Verify:** `node --test tests/scanners/token-hotspots.test.mjs` → PASS
+- **On failure:** revert. Re-evaluate severities if integration tests break (e.g., posture-grade-stability expects different aggregate).
+- **Checkpoint:** `git commit -m "feat(config-audit): recalibrate TOK severities for tokens/turn (v5 F7)"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
+  must_contain:
+    - { path: scanners/token-hotspots.mjs, pattern: "calibration_note" }
+  commit_message_pattern: "^feat\\(config-audit\\):"
+  ```
+
+#### Step 11: M6 — `additionalDirectories` in KNOWN_KEYS + threshold
+
+- **Files:** `scanners/settings-validator.mjs`, `tests/scanners/settings-validator.test.mjs`, `tests/fixtures/additional-dirs-many/` *(new)*
+- **Changes:**
+  - Add `'additionalDirectories'` to `KNOWN_KEYS` (line 16-35).
+ - New check: if `additionalDirectories.length > 2`, emit `CA-SET-NNN` finding (severity `low`). +- **Reuses:** Existing settings-validator pattern. +- **Test first:** Fixture with 3 entries → 1 finding; fixture with 2 entries → 0 findings; settings without the key → no "unknown key" warning. +- **Verify:** `node --test tests/scanners/settings-validator.test.mjs` → PASS +- **On failure:** revert. +- **Checkpoint:** `git commit -m "feat(config-audit): flag additionalDirectories > 2 (v5 M6)"` +- **Manifest:** + ```yaml + expected_paths: [scanners/settings-validator.mjs, tests/fixtures/additional-dirs-many/settings.json] + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 12: M4 — CLAUDE.md cascade total finding in TOK + +- **Files:** `scanners/token-hotspots.mjs`, `tests/fixtures/large-cascade/` *(new)* +- **Changes:** New detection in TOK: if `activeConfig.claudeMd.estimatedTokens > 10_000`, emit finding (severity `medium`). +- **Reuses:** `readActiveConfig` integration from Step 6; `claudeMd.estimatedTokens` field. +- **Test first:** Fixture with CLAUDE.md @-importing 40k+ bytes → finding present; minimal CLAUDE.md → no finding. +- **Verify:** `node --test tests/scanners/token-hotspots.test.mjs` → PASS +- **On failure:** revert. +- **Checkpoint:** `git commit -m "feat(config-audit): TOK flags CLAUDE.md cascade > 10k tokens (v5 M4)"` +- **Manifest:** + ```yaml + expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/large-cascade/CLAUDE.md] + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 13: M2 — Skill description length finding + +- **Files:** `scanners/token-hotspots.mjs`, `tests/fixtures/skill-bloated/` *(new)* +- **Changes:** New detection in TOK: walk `activeConfig.skills`, parse each `SKILL.md` frontmatter; flag any with `description` > 500 characters as `low` finding. +- **Reuses:** `parseFrontmatter` from `lib/yaml-parser.mjs`; `activeConfig.skills` from Step 6. 
+- **Test first:** Fixture with 600-char description → finding; 100-char → no finding. +- **Verify:** `node --test tests/scanners/token-hotspots.test.mjs` → PASS +- **On failure:** revert. +- **Checkpoint:** `git commit -m "feat(config-audit): TOK flags skill description > 500 chars (v5 M2)"` +- **Manifest:** + ```yaml + expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/skill-bloated/skills/bloated/SKILL.md] + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 14: M1 — MCP tool-count detection (with manifest fallback) + +- **Files:** `scanners/lib/active-config-reader.mjs`, `tests/fixtures/mcp-tool-heavy/` *(new)* +- **Changes:** Extend `readActiveMcpServers` to attempt tool-count detection in this order: + 1. Cached `tools/list` response at `~/.claude/config-audit/mcp-cache/.json` (if exists) + 2. `package.json` `tools` array on the npm package (if server is npm-resolved) + 3. Fallback: emit `toolCount: null` and a `toolCountUnknown: true` flag on the server entry + Update `estimateTokens` call (Step 5) to use `toolCount` when known. +- **Reuses:** Existing MCP enumeration. +- **Test first:** Fixture with mocked `package.json` tools array of 20 → `toolCount === 20`; fixture without → `toolCount === null`. +- **Verify:** `node --test tests/lib/active-config-reader.test.mjs` → PASS +- **On failure:** revert. Tool-count infrastructure can ship as `null` everywhere if detection logic fails — N1 still produces baseline finding. 
+- **Checkpoint:** `git commit -m "feat(config-audit): MCP tool-count detection with manifest fallback (v5 M1)"` +- **Manifest:** + ```yaml + expected_paths: [scanners/lib/active-config-reader.mjs, tests/fixtures/mcp-tool-heavy/] + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 15: M5 — Hook output-size finding + +- **Files:** `scanners/hook-validator.mjs`, `tests/fixtures/hooks-verbose/` *(new)* +- **Changes:** Read each hook script referenced in `hooks.json`; count `console.log` / `process.stdout.write` lines; if > 50, emit `CA-HKV-NNN` finding (severity `low`). Static heuristic — no execution. +- **Reuses:** Existing hook-validator file-walking. +- **Test first:** Fixture with hook script containing 60 `console.log` lines → finding; sparse hook → no finding. +- **Verify:** `node --test tests/scanners/hook-validator.test.mjs` → PASS +- **On failure:** revert. +- **Checkpoint:** `git commit -m "feat(config-audit): HKV flags verbose hook output (v5 M5)"` +- **Manifest:** + ```yaml + expected_paths: [scanners/hook-validator.mjs, tests/fixtures/hooks-verbose/] + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 16: F6 — `self-audit --check-readme` flag + +- **Files:** `scanners/self-audit.mjs`, `tests/scanners/self-audit.test.mjs`, `tests/fixtures/readme-desynced/` *(new)* +- **Changes:** Add `--check-readme` CLI flag. The flag uses **filesystem counts as the source of truth**, not the README. 
Counts: + - scanners: count `.mjs` files matching scanner-shape (have `export async function scan` AND are in `scanners/` not `scanners/lib/` and not `*-cli.mjs`/`*-engine.mjs`/`whats-active.mjs`/`self-audit.mjs`/`scan-orchestrator.mjs`) + - commands: count `.md` files in `commands/` + - agents: count `.md` files in `agents/` + - hooks: parse `hooks/hooks.json`, count distinct event-script pairs + - tests: count `.test.mjs` files in `tests/` + - knowledge: count `.md` files in `knowledge/` + Parse README badge values via line-anchored substring patterns (NOT regex on URL — use exact " 9 " / "9+" detection). Compare counts; emit `low` finding per mismatch with `expected: ` and `found_in_readme: `. +- **Reuses:** Existing `runSelfAudit` shape; `glob`-style file enumeration via `node:fs/promises`. +- **Test first:** + - Fixture `readme-desynced/`: a mini-plugin layout with `commands/foo.md`, `commands/bar.md` (filesystem count = 2) plus a fake `README.md` with badge "1+ commands" → finding present. + - Self-test (no fixture): run `runSelfAudit({checkReadme: true})` against the real plugin; assert `result.readmeCheck` exists, `result.readmeCheck.passed` is `boolean`. Do NOT assert `passed === true` during alpha/beta phases (allowed to be red until Step 28). +- **Verify:** `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck | type'` → `"object"` +- **On failure:** revert. Most likely cause: scanner-shape detection over-counts; refine to require both `export async function scan` AND `const SCANNER = ` declarations. 
+- **Checkpoint:** `git commit -m "feat(config-audit): self-audit --check-readme flag (v5 F6)"` +- **Manifest:** + ```yaml + expected_paths: + - scanners/self-audit.mjs + - tests/scanners/self-audit.test.mjs + - tests/fixtures/readme-desynced/README.md + - tests/fixtures/readme-desynced/commands/foo.md + - tests/fixtures/readme-desynced/commands/bar.md + must_contain: + - { path: scanners/self-audit.mjs, pattern: "check-readme" } + - { path: scanners/self-audit.mjs, pattern: "readmeCheck" } + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 17: alpha.2 wrap — CHANGELOG entry + +- **Files:** `CHANGELOG.md` +- **Changes:** Add `## [5.0.0-alpha.2]` summarizing M1, M2, M4-M6, F6, F7. +- **Verify:** `grep -c "5.0.0-alpha.2" CHANGELOG.md` → 1 +- **On failure:** revert. +- **Checkpoint:** `git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.2 entry"` +- **Manifest:** + ```yaml + expected_paths: [CHANGELOG.md] + commit_message_pattern: "^docs\\(config-audit\\):" + ``` + +--- + +### STAGE beta.1 — New scanners (N1-N4, N6) + +#### Step 18: N1 — MCP Tool-Schema Budget finding (CA-TOK-005) + +- **Files:** `scanners/token-hotspots.mjs`, `tests/fixtures/mcp-budget/` *(new)* +- **Changes:** New detection function `detectMcpToolBudget(activeConfig)`. Iterate `activeConfig.mcpServers`. Tiered severity per server: + - `toolCount === null` (unknown — fallback chain in M1 returned null): emit finding with severity `low` and message `"tool count unknown — could not parse manifest or cached tools/list"` (per Avklaringer M1: flag, don't skip). + - `toolCount` 0-19: no finding + - 20-49: `low` + - 50-99: `medium` + - 100+: `high` + Finding ID: `CA-TOK-005` per server flagged. Recommendation: use `tools/filter` config; reference cache-telemetry recipe from M7. + **Detection-order pinning:** ensure `detectMcpToolBudget` runs as the 5th detection block in `scan()` AFTER patterns A, B, C (which always run first regardless of fixture). 
This makes ID assignment deterministic when all patterns fire. When some patterns don't fire, the ID may shift — tests assert presence and tier-specific severity, not exact ID number. +- **Reuses:** `activeConfig.mcpServers` with `toolCount` from Step 14. +- **Test first:** Fixtures: 14 tools (no finding), 25 tools (`low`), 60 tools (`medium`), 120 tools (`high`), null toolCount (`low` with message containing "unknown"). Tests assert `severity` and `finding.title` substring, NOT exact `id` number. +- **Verify:** `node --test tests/scanners/token-hotspots.test.mjs` → PASS +- **On failure:** revert. +- **Checkpoint:** `git commit -m "feat(config-audit): CA-TOK-005 MCP tool-schema budget (v5 N1)"` +- **Manifest:** + ```yaml + expected_paths: + - scanners/token-hotspots.mjs + - tests/fixtures/mcp-budget/14-tools/ + - tests/fixtures/mcp-budget/25-tools/ + - tests/fixtures/mcp-budget/60-tools/ + - tests/fixtures/mcp-budget/120-tools/ + - tests/fixtures/mcp-budget/unknown-tools/ + must_contain: + - { path: scanners/token-hotspots.mjs, pattern: "detectMcpToolBudget" } + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 19: N2 — System-Prompt Manifest scanner + CLI + +- **Files:** `scanners/manifest.mjs` *(new)*, `commands/manifest.md` *(new)*, `tests/scanners/manifest.test.mjs` *(new)* +- **Changes:** New CLI: `node scanners/manifest.mjs [--json] [--output-file]`. Output: ranked list of token sources from `readActiveConfig` (CLAUDE.md cascade entries, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. New slash command `/config-audit manifest` invokes the CLI and renders a markdown table. +- **Reuses:** `readActiveConfig`, CLI direct-run pattern, command frontmatter from `commands/whats-active.md`. 
+- **Test first:** Two test paths: + - **Real-config path** (primary): subprocess against the plugin's own root (`.`) — `output.sources` length > 0; `output.sources[0].estimated_tokens >= output.sources[1].estimated_tokens`; `output.total >= sum(sources.estimated_tokens) - 1` (rounding tolerance). + - **Fixture path** (with `buildRichRepo` helper from `tests/lib/active-config-reader.test.mjs`): build a tmpdir repo with patched HOME containing 2 plugins + 3 skills + .mcp.json. Run the CLI subprocess against tmpdir with the patched HOME passed via env. Assert `sources.length >= 5` (CLAUDE.md cascade + plugins + skills + MCP). +- **Verify:** `node --test tests/scanners/manifest.test.mjs` → PASS +- **On failure:** revert. If `readActiveConfig` returns empty for the real-plugin target: check that `detectGitRoot` resolves to the marketplace root. +- **Checkpoint:** `git commit -m "feat(config-audit): /config-audit manifest command (v5 N2)"` +- **Manifest:** + ```yaml + expected_paths: [scanners/manifest.mjs, commands/manifest.md, tests/scanners/manifest.test.mjs] + must_contain: + - { path: scanners/manifest.mjs, pattern: "readActiveConfig" } + - { path: commands/manifest.md, pattern: "name: manifest" } + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 20: N3 — Cache-Prefix Stability Analyzer + +- **Files:** `scanners/cache-prefix-scanner.mjs` *(new)*, `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs` (SCANNER_AREA_MAP), `tests/scanners/cache-prefix.test.mjs` *(new)*, `tests/fixtures/volatile-mid-section/` *(new)* +- **Changes:** New scanner with prefix `CPS`. Walks CLAUDE.md cascade; classifies each segment as stable/volatile (using existing volatile patterns from `token-hotspots.mjs:38-43` extended with shell-exec `!` prefix and `${VAR}` patterns). Flags volatility anywhere in cached prefix (not just top 30 lines). Severity `medium`. +- **Reuses:** `VOLATILE_PATTERNS`, `walkClaudeMdCascade`. 
+- **Test first:** Fixture with `!git log` at line 60 → finding; fixture with volatile content only at line 200+ → no finding. +- **Verify:** `node --test tests/scanners/cache-prefix.test.mjs` → PASS +- **On failure:** revert. +- **Checkpoint:** `git commit -m "feat(config-audit): cache-prefix stability scanner CPS (v5 N3)"` +- **Manifest:** + ```yaml + expected_paths: + - scanners/cache-prefix-scanner.mjs + - scanners/scan-orchestrator.mjs + - scanners/lib/scoring.mjs + - tests/scanners/cache-prefix.test.mjs + must_contain: + - { path: scanners/scan-orchestrator.mjs, pattern: "scanCachePrefix|CPS" } + - { path: scanners/lib/scoring.mjs, pattern: "CPS:" } + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 21: N4 — Disabled-Tools-Still-In-Schema Detector + +- **Files:** `scanners/disabled-in-schema-scanner.mjs` *(new)*, `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs`, `tests/scanners/disabled-in-schema.test.mjs` *(new)*, `tests/fixtures/denied-tools-in-schema/` *(new)* +- **Changes:** New scanner with prefix `DIS`. Reads cascaded `settings.json`; finds tools that appear in both `permissions.deny` and `permissions.allow`. Severity `low`. +- **Reuses:** Settings-cascade reading. +- **Test first:** Fixture with `Bash` in both arrays → finding; clean settings → no finding. +- **Verify:** `node --test tests/scanners/disabled-in-schema.test.mjs` → PASS +- **On failure:** revert. +- **Checkpoint:** `git commit -m "feat(config-audit): disabled-in-schema scanner DIS (v5 N4)"` +- **Manifest:** + ```yaml + expected_paths: + - scanners/disabled-in-schema-scanner.mjs + - scanners/scan-orchestrator.mjs + - tests/scanners/disabled-in-schema.test.mjs + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 22a: N6 — verify Claude Code skill-namespacing model (research spike) + +- **Files:** `docs/v5-namespace-research.md` *(new, gitignored)* +- **Changes:** Quick verification spike before building N6. 
Verify against current Claude Code behavior: + 1. When user types `/review` and both a built-in command and a plugin skill named `review` exist — which fires? Is invocation namespaced via `/plugin:review`? + 2. When two plugins both expose a skill named `review` — do their invocation paths differ? + 3. User-level skills (under `~/.claude/skills/`) — same name as plugin skill — does it collide? + Methods: read Claude Code documentation; check existing plugin patterns in marketplace; if uncertain after 10 minutes of research, document the assumption explicitly and proceed with the most defensive interpretation (treat any same-name conflict as a finding). +- **Reuses:** — +- **Test first:** None (research). +- **Verify:** `[ -f docs/v5-namespace-research.md ]` containing at least: "Built-in vs plugin: ", "Plugin vs plugin: ", "User-level vs plugin: ", "Confidence: " +- **On failure:** escalate — if research is inconclusive, ask user before proceeding to Step 22b. +- **Checkpoint:** No commit (file is local-only). +- **Manifest:** + ```yaml + expected_paths: [docs/v5-namespace-research.md] + commit_message_pattern: ".*" + ``` + +#### Step 22b: N6 — Cross-Plugin Skill/Command Collision Scanner (CA-COL-001) + +- **Files:** `scanners/collision-scanner.mjs` *(new)*, `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs`, `tests/scanners/collision.test.mjs` *(new)*, `tests/fixtures/collision-plugins/` *(new)* +- **Changes:** New scanner with prefix `COL` (Finding ID `CA-COL-001`). Enumerate plugins via `enumeratePlugins`. Build maps of skill names and command names by source. Detection logic determined by Step 22a research: + - **Plugin-vs-plugin same skill name:** finding (severity `low`) — invocation order ambiguity even if `/plugin:skill` is supported. + - **User-level skill vs plugin skill same name:** finding (severity `medium`) — bare invocation may resolve unpredictably. 
+ - **Plugin skill vs Claude Code built-in:** finding only if Step 22a confirms collision is real; otherwise no finding (info-level note in CHANGELOG). + - All findings include `details.namespaces` array describing each conflicting source. +- **Reuses:** `enumeratePlugins`, `countPluginItems`, `findSkillMdFiles`. +- **Test first:** Multi-plugin fixture `collision-plugins/`: + - Layout: `plugins/plugin-a/skills/review/SKILL.md` + `plugins/plugin-b/skills/review/SKILL.md` → finding present (severity `low`). + - Negative: `plugins/plugin-a/skills/review/` + `plugins/plugin-b/skills/summarize/` → no finding. + - Positive (user-vs-plugin): user skill at fake-HOME `skills/review/SKILL.md` + plugin skill `plugin-a/skills/review/SKILL.md` → finding (severity `medium`). + - Suppression-glob check: existing `CA-TOK-*` glob does NOT suppress `CA-COL-001`. +- **Verify:** `node --test tests/scanners/collision.test.mjs` → PASS +- **On failure:** revert. False positives indicate namespace model deviation from Step 22a research — revisit research file. +- **Checkpoint:** `git commit -m "feat(config-audit): cross-plugin collision scanner COL (v5 N6)"` +- **Manifest:** + ```yaml + expected_paths: + - scanners/collision-scanner.mjs + - scanners/scan-orchestrator.mjs + - tests/scanners/collision.test.mjs + - tests/fixtures/collision-plugins/plugins/plugin-a/skills/review/SKILL.md + - tests/fixtures/collision-plugins/plugins/plugin-b/skills/review/SKILL.md + must_contain: + - { path: scanners/collision-scanner.mjs, pattern: "SCANNER = 'COL'" } + - { path: scanners/scan-orchestrator.mjs, pattern: "scanCollision|COL" } + - { path: scanners/lib/scoring.mjs, pattern: "COL:" } + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 23: beta.1 wrap — CHANGELOG + N1 backward-compat note + +- **Files:** `CHANGELOG.md` +- **Changes:** Add `## [5.0.0-beta.1]` entry. 
Include explicit subsection: `### Known breaking changes` — `CA-TOK-*` glob suppressions in existing `.config-audit-ignore` files now also match `CA-TOK-005` (MCP budget). Document workaround: list `CA-TOK-001 CA-TOK-002 CA-TOK-003` explicitly. +- **Verify:** `grep -c "CA-TOK-005" CHANGELOG.md` → ≥ 1 +- **On failure:** revert. +- **Checkpoint:** `git commit -m "docs(config-audit): CHANGELOG 5.0.0-beta.1 + N1 breaking note"` +- **Manifest:** + ```yaml + expected_paths: [CHANGELOG.md] + must_contain: + - { path: CHANGELOG.md, pattern: "5\\.0\\.0-beta\\.1" } + - { path: CHANGELOG.md, pattern: "CA-TOK-005" } + commit_message_pattern: "^docs\\(config-audit\\):" + ``` + +--- + +### STAGE rc.1 — Knowledge cleanup + tokenizer calibration (M7, M8, N5) + +#### Step 24: M8 — Knowledge-base cleanup (Sonnet-era → Opus 4.7) + +- **Files:** `knowledge/configuration-best-practices.md`, `knowledge/anti-patterns.md` (if relevant) +- **Changes:** Replace "Keep under 200 lines" framing (line 9) with cache-stability guidance: "Place stable content in the first 30 lines (cache-friendly); volatile content (timestamps, dynamic counts) goes below the cache threshold." Add footnote: "200-line threshold was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure." +- **Reuses:** Existing knowledge file format. +- **Test first:** None (docs). +- **Verify:** `! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md` +- **On failure:** revert. 
+- **Checkpoint:** `git commit -m "docs(config-audit): knowledge rensing — Opus 4.7 cache-stability guidance (v5 M8)"` +- **Manifest:** + ```yaml + expected_paths: [knowledge/configuration-best-practices.md] + forbidden_paths: [] + commit_message_pattern: "^docs\\(config-audit\\):" + ``` + +#### Step 25: M7 — Cache-telemetry recipe in knowledge/ + flag + +- **Files:** `knowledge/cache-telemetry-recipe.md` *(new)*, `commands/tokens.md`, `scanners/token-hotspots-cli.mjs`, `tests/scanners/token-hotspots-cli.test.mjs` +- **Changes:** + 1. New knowledge file documenting how a user can manually verify cache hit rate from session transcripts (parsing `cache_read_input_tokens` from transcript JSON; recipe is opt-in, NOT bundled scanner logic — keeps non-goal of "no transcript-parsing as core feature"). + 2. Add `--with-telemetry-recipe` flag to `token-hotspots-cli.mjs`: when present, includes `telemetry_recipe_path` field in JSON output pointing to the knowledge file. Without the flag, output unchanged. Committed as deliverable, not optional. + 3. Update `commands/tokens.md` next-steps to mention `--with-telemetry-recipe` and link the recipe. +- **Reuses:** Knowledge-file format from `opus-4.7-patterns.md`; CLI argv-parsing pattern from `posture.mjs`. +- **Test first:** Subprocess test: `node token-hotspots-cli.mjs --with-telemetry-recipe --json | jq '.telemetry_recipe_path'` → non-empty string ending in `cache-telemetry-recipe.md`. +- **Verify:** `[ -f knowledge/cache-telemetry-recipe.md ]` AND `node --test tests/scanners/token-hotspots-cli.test.mjs` → PASS +- **On failure:** revert. 
+- **Checkpoint:** `git commit -m "docs(config-audit): cache-telemetry recipe + --with-telemetry-recipe flag (v5 M7)"` +- **Manifest:** + ```yaml + expected_paths: + - knowledge/cache-telemetry-recipe.md + - commands/tokens.md + - scanners/token-hotspots-cli.mjs + - tests/scanners/token-hotspots-cli.test.mjs + must_contain: + - { path: scanners/token-hotspots-cli.mjs, pattern: "with-telemetry-recipe" } + commit_message_pattern: "^docs\\(config-audit\\):" + ``` + +#### Step 26: N5 — `--accurate-tokens` API calibration + +- **Files:** `scanners/token-hotspots-cli.mjs`, `scanners/lib/tokenizer-api.mjs` *(new)*, `tests/scanners/accurate-tokens.test.mjs` *(new)* +- **Prerequisites:** Node.js >= 18.13 (for `mock.method` from `node:test`). Verify with `node --version`. If older, escalate. +- **Changes:** New helper module `tokenizer-api.mjs` exporting `async callCountTokensApi(text, apiKey)`. Wraps `fetch('https://api.anthropic.com/v1/messages/count_tokens', ...)` with: + - 5-second AbortController timeout + - Exponential backoff on 429 (max 3 retries: 1s, 2s, 4s) + - API key MASKED to `${key.slice(0,8)}...` in ANY error message and ANY thrown error + - On non-429 HTTP error: throw `Error("count_tokens API failed: " + status)` — no body included (body may contain the key in echo'd form) + - Required headers: `x-api-key`, `anthropic-version: 2023-06-01`, `content-type: application/json` + + Wire `--accurate-tokens` into `token-hotspots-cli.mjs`: + - If `process.env.ANTHROPIC_API_KEY` present: call `callCountTokensApi` for the top 3 hotspots' content; populate `output.calibration = { actual_tokens: , source: 'count_tokens_api', sampled_hotspots: 3 }`. + - If absent: `output.calibration = { skipped: 'no-api-key' }` and warn to stderr "ANTHROPIC_API_KEY not set — skipping API calibration". +- **Reuses:** Existing CLI pattern, env-var reading. +- **Test first:** + - **No-API-key case:** subprocess with `env: { ...process.env, ANTHROPIC_API_KEY: '' }`. 
Assert exit 0, output `calibration.skipped === 'no-api-key'`. + - **With-key case:** `import { mock } from 'node:test'`. Use `mock.method(tokenizerApi, 'callCountTokensApi', () => ({ input_tokens: 4200 }))`. Note that `mock.method` cannot redefine a binding on an ES-module namespace object (`import * as tokenizerApi`), so `tokenizerApi` must be a plain object that the CLI calls through (for example, a default-exported object); alternatively, mock `globalThis.fetch` instead. Run CLI in-process (not subprocess — mock can't cross process boundary). Assert `output.calibration.actual_tokens === 4200`. + - **Error masking:** stub `callCountTokensApi` to throw `Error("simulated 401 with key sk-ant-FAKEKEY-1234")`. Assert that the JSON output and stderr contain `sk-ant-F...` and NOT `FAKEKEY-1234` (mask works). +- **Verify:** `node --test tests/scanners/accurate-tokens.test.mjs` → PASS +- **On failure:** revert. Most likely causes: + - `mock.method` not available — check Node version >= 18.13. + - `fetch` unavailable — fall back to `node:https`. +- **Checkpoint:** `git commit -m "feat(config-audit): --accurate-tokens API calibration (v5 N5)"` +- **SC-6b note:** The brief's SC-6b ("byte estimate within ±5% of the Anthropic count_tokens API") cannot be verified by automated tests using a stub — the stub returns a constant. SC-6b is a **release gate**: before tagging v5.0.0 in Step 30, KTG must run `--accurate-tokens` against a known fixture with a real `ANTHROPIC_API_KEY`, manually compare `calibration.actual_tokens` to byte-estimated tokens for that fixture, and confirm the error is within ±5%. If the error exceeds ±5%, fix the heuristic before tagging. +- **Manifest:** + ```yaml + expected_paths: + - scanners/token-hotspots-cli.mjs + - scanners/lib/tokenizer-api.mjs + - tests/scanners/accurate-tokens.test.mjs + must_contain: + - { path: scanners/lib/tokenizer-api.mjs, pattern: "count_tokens" } + - { path: scanners/lib/tokenizer-api.mjs, pattern: "AbortController|signal" } + - { path: scanners/lib/tokenizer-api.mjs, pattern: "slice\\(0, ?8\\)" } + commit_message_pattern: "^feat\\(config-audit\\):" + ``` + +#### Step 27: rc.1 wrap — CHANGELOG entry + +- **Files:** `CHANGELOG.md` +- **Changes:** Add `## [5.0.0-rc.1]` summarizing M7, M8, N5. 
+- **Verify:** `grep -c "5.0.0-rc.1" CHANGELOG.md` → 1 +- **On failure:** revert. +- **Checkpoint:** `git commit -m "docs(config-audit): CHANGELOG 5.0.0-rc.1 entry"` +- **Manifest:** + ```yaml + expected_paths: [CHANGELOG.md] + commit_message_pattern: "^docs\\(config-audit\\):" + ``` + +--- + +### STAGE release — v5.0.0 final + +#### Step 28: README and CLAUDE.md sync (straggler-sweep) + +- **Files:** `README.md`, `CLAUDE.md`, `commands/help.md`, `commands/posture.md`, `commands/config-audit.md`, `agents/feature-gap-agent.md` +- **Changes:** Update all badges and counts: + - Scanners: 9 → 12 (TOK extended + CPS + DIS + COL + manifest if counted) + - Commands: 17 → 18 (+ manifest) + - Tests: 543 → final count after all steps (run `node --test 'tests/**/*.test.mjs' 2>&1 | grep "tests"`) + - Hooks: unchanged (4) + - Agents: unchanged (6) + - Knowledge: 7 → 8 (+ cache-telemetry-recipe) + - Quality areas: unchanged (8) +- **Reuses:** Self-audit `--check-readme` from Step 16 to verify completeness. +- **Test first:** `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed'` → `true` +- **Verify:** Same command above. +- **On failure:** retry — find the missing badge with `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.mismatches'`. +- **Checkpoint:** `git commit -m "docs(config-audit): straggler sweep for v5.0.0 — sync all badge counts"` +- **Manifest:** + ```yaml + expected_paths: [README.md, CLAUDE.md] + commit_message_pattern: "^docs\\(config-audit\\):" + ``` + +#### Step 29: Version bump + final CHANGELOG + +- **Files:** `.claude-plugin/plugin.json`, `CHANGELOG.md`, `README.md` (version badge) +- **Changes:** Bump `plugin.json` version: `4.0.0 → 5.0.0`. Add `## [5.0.0]` entry to CHANGELOG with `### Summary` (consolidated from alpha/beta/rc entries) and `### Breaking changes` (F2 token magnitude jump, F3 severity weighting, N1 suppression backward-compat). +- **Reuses:** v4.0.0 entry format. 
+- **Test first:** `[ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]`
+- **Verify:** `grep "5.0.0" .claude-plugin/plugin.json && grep "## \[5.0.0\]" CHANGELOG.md`
+- **On failure:** revert.
+- **Checkpoint:** `git commit -m "chore(config-audit): bump version to 5.0.0"`
+- **Manifest:**
+  ```yaml
+  expected_paths: [.claude-plugin/plugin.json, CHANGELOG.md, README.md]
+  must_contain:
+    - { path: .claude-plugin/plugin.json, pattern: "\"version\": \"5.0.0\"" }
+  commit_message_pattern: "^chore\\(config-audit\\):"
+  ```
+
+#### Step 30: Final self-audit + SC-6b release gate + green tag
+
+- **Files:** —
+- **Changes:**
+  1. Run full test suite. All 543 v4 tests + new tests must pass.
+  2. Run `node scanners/self-audit.mjs --check-readme`. Grade must be A; `readmeCheck.passed === true`.
+  3. **SC-6b release gate (manual):** If `ANTHROPIC_API_KEY` is set, run `node scanners/token-hotspots-cli.mjs --accurate-tokens --json`; compare `calibration.actual_tokens` against the heuristic byte-estimate for the same fixture; ensure the delta is within ±5%. Document the comparison in the v5.0.0 CHANGELOG entry. If the user opts out of the SC-6b gate (no API key available), document this in CHANGELOG as "SC-6b verification deferred — ±5% tokenizer accuracy unverified."
+  4. Tag and push.
+- **Reuses:** Self-audit from Step 16; CLI from Step 26.
+- **Test first:** `node --test 'tests/**/*.test.mjs' 2>&1 | tail -5` — all PASS
+- **Verify:**
+  - `node --test 'tests/**/*.test.mjs'` → all PASS
+  - `node scanners/self-audit.mjs --check-readme --json | jq -r '.overallGrade + " " + (.readmeCheck.passed | tostring)'` → `A true`
+  - SC-6b gate documented (pass or deferred) in CHANGELOG
+  - `git tag config-audit/v5.0.0`
+- **On failure:** escalate — if test/grade fails, diagnose and add follow-up steps in this plan; do not tag.
+- **Checkpoint:** Tag is the equivalent of a commit. After tag: `git push origin main && git push origin config-audit/v5.0.0`
+- **Manifest:**
+  ```yaml
+  expected_paths: []
+  commit_message_pattern: ".*"
+  ```
+
+---
+
+### Manifest — objective completion predicate
+
+Every step has a Manifest block with `expected_paths`, `must_contain` patterns, and a regex
+`commit_message_pattern`. Steps that touch only docs may have empty `must_contain`.
+
+### Failure recovery rules
+
+- **revert** — `git checkout -- `, restore working tree, do not proceed.
+- **retry** — try the alternative described in `On failure`, revert if still failing.
+- **escalate** — stop entirely; human review required (used only at Step 30).
+
+## Alternatives Considered
+
+| Approach | Pros | Cons | Why rejected |
+|----------|------|------|--------------|
+| Keep N1 (`CA-TOK-005`) inside `token-hotspots.mjs` (chosen) | Lowest friction; preserves TOK ID namespace; consistent with patterns A-D | Counter is positional; `CA-TOK-005` ID assigned by order of detection, not semantic | Acceptable trade-off; tests assert on finding *presence* and severity, not exact ID number. The brief specifies CA-TOK-005, which can be enforced by detection order. |
+| Standalone `mcp-budget-scanner.mjs` with prefix `MCB` | Clean separation; new ID namespace; testable in isolation | Diverges from brief's `CA-TOK-005` spec; requires new SCANNER_AREA_MAP entry | Brief explicitly names CA-TOK-005; standalone scanner would force a brief revision. |
+| Defer F3 severity-weighting to v5.1.0 | Reduces alpha.1 risk of breaking baselines | Means alpha.1 ships only 4 of 7 must-fix items; brief's primary goal "reality-based token-optimization" depends on F3 | Brief lists F3 as must-fix and ties it directly to v5.0.0 success criteria. |
+| Bundle N5 (live tokenizer) into v5.1.0 | Removes API-key risk surface from v5.0.0 | User explicitly confirmed N5 in v5.0.0 (Avklaringer 2026-05-01); features list specifies opt-in via flag, mitigating risk | User confirmed scope explicitly. |
+| Use external lib like `tiktoken` for N5 | Higher accuracy | Violates zero-deps convention (CLAUDE.md: "null avhengigheter", i.e. zero dependencies) | Convention is hard rule. |
+
+## Test Strategy
+
+- **Framework:** `node:test` + `node:assert/strict`
+- **Existing patterns:**
+  - Scanner tests: `runScanner(fixtureName)` helper that resets counter + runs full discovery+scan
+  - Lib tests: factory functions (`makeScannerResult`) for in-memory input data
+  - Lib integration: `buildRichRepo()` tmpdir with patched HOME
+  - CLI tests: `execFile`/`exec` subprocess + parse stdout JSON
+- **New tests in this plan:** approximately 60 new test cases across 13 test files
+- **Coverage gating:** Per revised SC-10 — every F-fix and M-fix has ≥1 test; every new scanner (N1-N4, N6) has ≥1 fixture-backed test; F3 has severity-mix table; N5 has both API-key-present and -absent cases.
+
+### Tests to write
+
+| Type | File | Verifies | Model test |
+|------|------|----------|------------|
+| Unit | `tests/lib/severity.test.mjs` | WEIGHTS exported | existing severity tests |
+| Unit | `tests/lib/scoring.test.mjs` | severity-weighted area score | makeScannerResult pattern |
+| Unit | `tests/lib/active-config-reader.test.mjs` | 'mcp' kind differentiation | existing estimateTokens cases |
+| Integration | `tests/lib/active-config-reader.test.mjs` | MCP servers report >500 tokens | buildRichRepo extension |
+| Scanner | `tests/scanners/token-hotspots.test.mjs` | F1, F4, F5, F7, M2, M4, N1 | runScanner pattern |
+| Scanner | `tests/scanners/settings-validator.test.mjs` | M6 additionalDirectories | existing validator tests |
+| Scanner | `tests/scanners/hook-validator.test.mjs` | M5 verbose hook output | existing hook tests |
+| Scanner | `tests/scanners/cache-prefix.test.mjs` *(new)* | N3 mid-section volatility | runScanner pattern |
+| Scanner | `tests/scanners/disabled-in-schema.test.mjs` *(new)* | N4 deny+allow conflict | runScanner pattern |
+| Scanner | `tests/scanners/collision.test.mjs` *(new)* | N6 cross-plugin collision | multi-plugin fixture |
+| CLI | `tests/scanners/manifest.test.mjs` *(new)* | N2 manifest CLI | execFile pattern |
+| CLI | `tests/scanners/accurate-tokens.test.mjs` *(new)* | N5 API + no-API paths | mock.method first use |
+| Self-audit | `tests/scanners/self-audit.test.mjs` | F6 --check-readme shape | existing runSelfAudit test |
+
+## Risks and Mitigations
+
+| Priority | Risk | Location | Impact | Mitigation |
+|----------|------|----------|--------|------------|
+| Critical | F3 silently degrades grades for users with v4 baselines | scoring.mjs:184 (rewritten in Step 2) | Drift comparisons produce wrong deltas | Add `scoringVersion: 'v5'` to envelope meta (Step 2). diff-engine warns on cross-version compare in v5.0.1 patch (out of scope here) |
+| Critical | F2 jump from 15 → 5000+ tokens per MCP collapses Token Efficiency grades | Step 5 | User's Grade A becomes Grade C overnight | CHANGELOG explicit BREAKING note (Step 9, 23, 29). Document in `commands/posture.md` next-steps |
+| Critical | N5 API-key leak via error message or JSON output | Step 26 | Key persisted in session files / logs | `tokenizer-api.mjs` masks key to first 8 chars; never includes key in JSON; explicit test for masking |
+| High | F3 baseline-all-a fixture may fail | Step 3 | Test suite blocks at alpha.1 | Step 3 dedicated to fixture audit; `posture-grade-stability.test.mjs` updated if needed |
+| High | N1 tool-count threshold flagging real-world MCP servers (GitHub MCP has 28+ tools) | Step 18 | False-positive findings train users to suppress | Tiered severity: <20=none, 20-49=low, 50-99=medium, 100+=high (Step 18) |
+| High | N6 namespace confusion (plugin-skill vs user-skill vs built-in) | Step 22 | Every plugin with skill named `review` flagged | Scanner only compares same-namespace items; built-ins excluded; documented in scanner comment |
+| High | N5 rate-limit (1000/min) exhausted in CI loop | Step 26 | Mid-scan crash; user's main quota impacted | 3 retries with exponential backoff; 5-sec timeout; `--accurate-tokens-max-files` future flag (out of scope) |
+| Medium | Cascade-volatility false positives on inline date references | Step 20 | Noise findings | Keep line-anchored regex; negative fixture for inline dates |
+| Medium | F6 self-audit fragile to README formatting changes | Step 16 | Hard-blocks every release | Use exact line-anchored substring (not URL regex); badge mismatch is `low` severity (advisory, not fail) |
+| Medium | findingCounter is process-global; new scanners interfere if they call `finding()` outside orchestrator | All N* steps | Wrong IDs in tests | All new scanners follow single-`scan()` entry; no nested calls |
+| Medium | Suppression backward-compat: `CA-TOK-*` glob suppresses CA-TOK-005 | Step 18+23 | Users miss highest-value finding | Documented in CHANGELOG (Step 23). One-time runtime warning is out of scope (v5.0.1 candidate) |
+| Low | Network failure on N5 hangs 30s | Step 26 | Bad UX | 5-sec AbortController timeout, immediate stderr message |
+| Low | Knowledge-base cleanup breaks Sonnet-version users | Step 24 | Outdated guidance | Reframe with footnote, not delete |
+
+## Assumptions
+
+| # | Assumption | Why unverifiable | Impact if wrong |
+|---|-----------|-----------------|-----------------|
+| 1 | Anthropic `count_tokens` endpoint accepts plain text payload and returns `{input_tokens: number}` | Brief premise; not tested in this codebase | N5 produces wrong calibration values; falls back gracefully |
+| 2 | MCP servers expose tool count via `tools/list` or package.json `tools` field | MCP spec is evolving; servers vary | M1/N1 detection silently returns null; CA-TOK-005 finding may not fire on real servers; baseline behavior is "no finding" not "wrong finding" |
+| 3 | `readActiveConfig` is performant enough to call from TOK on large repos | Untested at scale | TOK scanner becomes slow; fix: cache `activeConfig` in scan-orchestrator and pass to scanners (out of scope) |
+| 4 | `posture-grade-stability.test.mjs` baseline-all-a fixture is genuinely info-only after v4 work | Assumed from naming + git history | Step 3 catches and fixes |
+| 5 | Cross-plugin collision detection model (plugin-namespaced skills don't collide) is correct | Documented in N6 description but not in Anthropic specs | False positives/negatives on plugin-namespacing; verified via test fixture |
+
+## Verification
+
+End-to-end checks after Step 30 completes (these mirror the brief's revised SCs):
+
+- [ ] **SC-10 (revised):** `node --test 'tests/**/*.test.mjs'` → all green AND original 543 tests still pass AND ≥ 1 fixture-backed test exists per new scanner function (N1-N4, N6) and per structural change (F1, F2, F3, M1, M2, M4, M5, M6) — verified by file presence:
+  - `tests/lib/active-config-reader.test.mjs` — F2 'mcp' kind cases
+  - `tests/lib/scoring.test.mjs` — F3 severity-mix cases
+  - `tests/scanners/token-hotspots.test.mjs` — F1, F4, F5, F7, M2, M4, N1 cases
+  - `tests/scanners/settings-validator.test.mjs` — M6 cases
+  - `tests/scanners/hook-validator.test.mjs` — M5 cases
+  - `tests/scanners/manifest.test.mjs` — N2 cases (new file)
+  - `tests/scanners/cache-prefix.test.mjs` — N3 cases (new file)
+  - `tests/scanners/disabled-in-schema.test.mjs` — N4 cases (new file)
+  - `tests/scanners/collision.test.mjs` — N6 cases (new file)
+  - `tests/scanners/accurate-tokens.test.mjs` — N5 cases (new file)
+  - `tests/scanners/self-audit.test.mjs` — F6 cases
+- [ ] **SC-1 (F1):** `! grep -q "void readActiveConfig" scanners/token-hotspots.mjs` AND `grep -q "readActiveConfig(targetPath" scanners/token-hotspots.mjs`
+- [ ] **SC-2 (F2):** `grep -q "kind === 'mcp'" scanners/lib/active-config-reader.mjs`
+- [ ] **SC-3 (F3):** `grep -q "import.*WEIGHTS.*riskScore\|import.*riskScore.*WEIGHTS" scanners/lib/scoring.mjs`
+- [ ] **SC-4 (F6):** `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed'` → `true`
+- [ ] **SC-5 (N1):** `node --test --test-name-pattern="mcp-budget" tests/scanners/token-hotspots.test.mjs` → PASS
+- [ ] **SC-6a (N2):** `node scanners/manifest.mjs . --json | jq '.sources | length'` → > 0 AND output sorted DESC by `estimated_tokens`
+- [ ] **SC-6b (N5):** Manual gate — release-time verification of ±5% accuracy with real API key (Step 30); pass OR documented deferral in CHANGELOG
+- [ ] **SC-7 (N3):** `node --test tests/scanners/cache-prefix.test.mjs` → PASS
+- [ ] **SC-8 (N6):** `node --test tests/scanners/collision.test.mjs` → PASS
+- [ ] **SC-9 (M8):** `! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md`
+- [ ] **SC-11 (N5):** Both API-key-present and -absent paths covered in `tests/scanners/accurate-tokens.test.mjs`
+- [ ] **F5 cleanup:** `! grep -q "detectSonnetEra\|CA-TOK-004" scanners/token-hotspots.mjs commands/tokens.md knowledge/opus-4.7-patterns.md`
+- [ ] **Release:** `[ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]`
+- [ ] **Git:** `git log --oneline -50 | grep -c "v5"` ≥ 5 (one per stage)
+
+## Estimated Scope
+
+- **Files to modify:** 18 (incl. `commands/tokens.md` and `knowledge/opus-4.7-patterns.md` per Step 8b)
+- **Files to create:** ~22 (5 new scanners + 1 lib + 1 command + 13 fixture dirs + 5 new test files + 1 research doc)
+- **Steps:** 31 (was 30; added Step 8b for CA-TOK-004 reference cleanup, Step 22a for namespace research spike)
+- **Complexity:** high (cross-cutting changes across scoring, tokenization, scanner registry, knowledge base)
+
+## Execution Strategy
+
+The plan has 31 steps grouped into 5 sessions matching release stages.
+**Sessions are sequential** — alpha.1 must land before alpha.2, etc. Within a session,
+some steps are parallel-safe but for clarity all run sequentially.
+
+### Session 1 — alpha.1 (TOK cleanup + scoring fix)
+- **Steps:** 1-9 (includes Step 8b for CA-TOK-004 reference cleanup)
+- **Wave:** 1
+- **Depends on:** none
+- **Scope fence:**
+  - Touch: `scanners/lib/severity.mjs`, `scanners/lib/scoring.mjs`, `scanners/lib/active-config-reader.mjs`, `scanners/token-hotspots.mjs`, `tests/lib/{severity,scoring,active-config-reader}.test.mjs`, `tests/scanners/token-hotspots.test.mjs`, `tests/scanners/posture-grade-stability.test.mjs`, `tests/fixtures/tok-active-config/`, `commands/tokens.md` (Step 8b), `knowledge/opus-4.7-patterns.md` (Step 8b), `CHANGELOG.md`
+  - Never touch: any scanner other than TOK; any new scanner files (those land later)
+
+### Session 2 — alpha.2 (structural gaps + README self-audit)
+- **Steps:** 10-17
+- **Wave:** 2
+- **Depends on:** Session 1
+- **Scope fence:**
+  - Touch: `scanners/{token-hotspots,settings-validator,hook-validator,self-audit}.mjs`, `scanners/lib/active-config-reader.mjs`, `tests/scanners/{settings-validator,hook-validator,self-audit,token-hotspots}.test.mjs`, `tests/fixtures/{additional-dirs-many,large-cascade,skill-bloated,mcp-tool-heavy,hooks-verbose,readme-desynced}/`, `CHANGELOG.md`
+  - Never touch: scanner-orchestrator (no new scanners yet); knowledge/ (later)
+
+### Session 3 — beta.1 (new scanners)
+- **Steps:** 18, 19, 20, 21, 22a (research spike), 22b (collision scanner), 23
+- **Wave:** 3
+- **Depends on:** Session 2
+- **Scope fence:**
+  - Touch: `scanners/token-hotspots.mjs` (N1), `scanners/{manifest,cache-prefix-scanner,disabled-in-schema-scanner,collision-scanner}.mjs` (new), `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs` (SCANNER_AREA_MAP only), `commands/manifest.md` (new), 5 new test files, 4 new fixture dirs, `docs/v5-namespace-research.md` (gitignored), `CHANGELOG.md`
+  - Never touch: any other scanner code
+
+### Session 4 — rc.1 (knowledge + tokenizer)
+- **Steps:** 24-27
+- **Wave:** 4
+- **Depends on:** Session 3
+- **Scope fence:**
+  - Touch: `knowledge/{configuration-best-practices,cache-telemetry-recipe}.md`, `commands/tokens.md`, `scanners/token-hotspots-cli.mjs`, `scanners/lib/tokenizer-api.mjs` (new), `tests/scanners/accurate-tokens.test.mjs` (new), `CHANGELOG.md`
+  - Never touch: scanner code beyond CLI
+
+### Session 5 — release (v5.0.0 final)
+- **Steps:** 28-30
+- **Wave:** 5
+- **Depends on:** Session 4
+- **Scope fence:**
+  - Touch: `README.md`, `CLAUDE.md`, `commands/{help,posture,config-audit}.md`, `agents/feature-gap-agent.md`, `.claude-plugin/plugin.json`, `CHANGELOG.md`
+  - Never touch: any code; this is documentation + tag
+
+### Execution Order
+
+- **Wave 1:** Session 1 (alpha.1)
+- **Wave 2:** Session 2 (alpha.2) — after Wave 1
+- **Wave 3:** Session 3 (beta.1) — after Wave 2
+- **Wave 4:** Session 4 (rc.1) — after Wave 3
+- **Wave 5:** Session 5 (release) — after Wave 4
+
+### Grouping rules applied
+
+- Steps sharing files → same session (e.g., all TOK changes in Session 1+2)
+- New-scanner steps → Session 3 (post structural)
+- Knowledge/CLI changes → Session 4 (post all scanners)
+- Doc-sync + version-bump → Session 5 (last, depends on all counts being final)
+
+## Plan Quality Score
+
+| Dimension | Weight | Score | Notes |
+|-----------|--------|-------|-------|
+| Structural integrity | 0.15 | 88 | Sessions ordered by dependencies; Step 22a research spike resolves namespace ambiguity before 22b |
+| Step quality | 0.20 | 85 | All TBDs resolved; F7 explicit decision on Pattern C; Step 16 fs-counted not README-counted |
+| Coverage completeness | 0.20 | 92 | All 22 brief items mapped; F5 documentation cleanup added (8b); SC-6b release gate documented |
+| Specification quality | 0.15 | 86 | File paths verified; manifest must_not_contain replaces vacuous regex; Node version pinned for N5 |
+| Risk & pre-mortem | 0.15 | 88 | 13 risks; namespace research spike resolves N6 mitigation circularity |
+| Headless readiness | 0.10 | 84 | All steps have On Failure + Checkpoint; manifest blocks updated to use must_not_contain where appropriate |
+| Manifest quality | 0.05 | 78 | must_contain + must_not_contain; fixture file paths fully enumerated for Step 6/14/18 |
+| **Weighted total** | **1.00** | **87.0** | **Grade: B+** |
+
+**Adversarial review:**
+- **Plan critic:** initial verdict REPLAN (5 blockers, 8 majors, 7 minors, score 67.7); all blockers + majors addressed in revisions
+- **Scope guardian:** initial verdict MIXED (4 scope-gaps); all 4 gaps addressed in revisions
+
+## Revisions
+
+| # | Finding | Severity | Resolution |
+|---|---------|----------|------------|
+| 1 | Plan header "TBD" | blocker | Updated to "B+ (84/100)" after re-scoring |
+| 2 | Step 25 "TBD if needed" flag | blocker | Committed `--with-telemetry-recipe` flag as deliverable; added test |
+| 3 | Step 8 manifest `^(?!.*detectSonnetEra)` is logically vacuous | blocker | Replaced with `must_not_contain` field; added explicit grep verify |
+| 4 | Step 6 fixture incomplete in expected_paths | blocker | Enumerated 4 fixture files: `.mcp.json`, `CLAUDE.md`, `.claude-plugin/plugin.json`, `commands/sample.md` |
+| 5 | CA-TOK-004 references in `commands/tokens.md` and `knowledge/opus-4.7-patterns.md` after F5 | blocker | Added Step 8b: dedicated cleanup step with grep verify |
+| 6 | Step 12 missing test for `claudeMd.estimatedTokens` field shape | major | Added assertion to Step 6 test (item c) |
+| 7 | Step 18 missing toolCount=null handling | major | Added explicit `null` branch with `low` severity + "tool count unknown" message |
+| 8 | Step 3 ordering vs Step 10 grade-stability re-invalidation | major | Step 10's table-driven test now checks per-finding severity; Step 3 audits remain at fixture-level grade |
+| 9 | N6 namespace assumption is circular mitigation | major | Added Step 22a research spike with explicit verdict file before 22b implementation |
+| 10 | Step 16 negative-case test depends on Step 28 docs sweep | major | Step 16 now uses filesystem counts as truth (not README); fs-counted detection breaks the cycle |
+| 11 | Step 19 `marketplace-large` fixture issue with manifest CLI | major | Added two test paths: real-config (plugin root) + fixture-based with `buildRichRepo` helper |
+| 12 | Step 26 mock.method Node version requirement | major | Added prerequisite check: Node >= 18.13; documented in step + escalation path |
+| 13 | estimateTokens kind inconsistency between discovery and readActiveConfig paths | major | Step 6 unifies: prefer readActiveConfig data for MCP/skills/plugins; discovery only for files not covered |
+| 14 | F7 Pattern C left "unchanged" without rationale | scope-gap | Step 10 now explicitly recalibrates Pattern C: `medium` → `low` with reason; table-driven test asserts |
+| 15 | M7 `--with-telemetry-recipe` flag was conditional | scope-gap | Same as Revision 2 — committed as deliverable |
+| 16 | SC-6b ±5% accuracy unprovable in automation | scope-gap | Step 30 added manual release gate with documented deferral path |
+| 17 | SC-10 verification used old "≥600 tests" threshold | scope-gap | Verification section rewritten to per-feature coverage requirement |
+| 18-24 | Various minors (docs file naming, manifest enumeration, CHANGELOG specifics) | minor | Addressed in their respective steps |
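
The Manifest blocks used throughout this plan, including the `must_not_contain` field introduced in Revision 3, can be read as a pure predicate over the working tree and the latest commit message. The sketch below is a minimal illustration under stated assumptions, not the plan's actual verifier: `checkManifest`, the `snapshot` shape, and the choice to treat every `pattern` as a JavaScript regular expression are all hypothetical and introduced only for this example.

```javascript
// Hypothetical helper (not part of config-audit): evaluates one step's
// Manifest block against an in-memory snapshot of the repository.
// snapshot = { files: { [path]: contents }, commitMessage: string }
function checkManifest(manifest, snapshot) {
  const failures = [];
  const files = snapshot.files;

  // expected_paths: every listed path must exist in the snapshot.
  for (const path of manifest.expected_paths ?? []) {
    if (!(path in files)) failures.push(`missing path: ${path}`);
  }

  // must_contain: the pattern must match somewhere in the file.
  // Assumption: patterns are regular expressions, not literal substrings.
  for (const { path, pattern } of manifest.must_contain ?? []) {
    if (!new RegExp(pattern).test(files[path] ?? '')) {
      failures.push(`${path} does not contain /${pattern}/`);
    }
  }

  // must_not_contain: a positive match is a failure. Stating absence as its
  // own field avoids encoding it as a negative lookahead inside a regex.
  for (const { path, pattern } of manifest.must_not_contain ?? []) {
    if (new RegExp(pattern).test(files[path] ?? '')) {
      failures.push(`${path} still contains /${pattern}/`);
    }
  }

  // commit_message_pattern: defaults to match-anything, as in Step 30.
  const commitPattern = manifest.commit_message_pattern ?? '.*';
  if (!new RegExp(commitPattern).test(snapshot.commitMessage)) {
    failures.push(`commit message does not match /${commitPattern}/`);
  }

  return { ok: failures.length === 0, failures };
}

// Step 29's Manifest block, transcribed from YAML into a JS object.
const step29 = {
  expected_paths: ['.claude-plugin/plugin.json', 'CHANGELOG.md', 'README.md'],
  must_contain: [
    { path: '.claude-plugin/plugin.json', pattern: '"version": "5.0.0"' },
  ],
  commit_message_pattern: '^chore\\(config-audit\\):',
};

// Illustrative snapshot; the file contents are made up for the example.
const step29Result = checkManifest(step29, {
  files: {
    '.claude-plugin/plugin.json': '{ "version": "5.0.0" }',
    'CHANGELOG.md': '## [5.0.0]',
    'README.md': '',
  },
  commitMessage: 'chore(config-audit): bump version to 5.0.0',
});

// A must_not_contain check against a file that still holds the removed helper.
const leftoverResult = checkManifest(
  {
    must_not_contain: [
      { path: 'scanners/token-hotspots.mjs', pattern: 'detectSonnetEra' },
    ],
  },
  {
    files: { 'scanners/token-hotspots.mjs': 'function detectSonnetEra() {}' },
    commitMessage: 'refactor(config-audit): remove pattern',
  },
);

console.log(step29Result, leftoverResult);
```

Keeping the predicate free of file-system and git calls makes it easy to exercise in a `node:test` case; a real verifier would presumably build the snapshot from the working tree and `git log -1` before calling it.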