Kjell Tore Guttormsen 4bd7cd5056 docs(config-audit): v5.0.0 brief + implementation plan

Planning artifacts for v5.0.0 (token-economy round):

- v5-brief.md: scope brief with 22 items (F1-F7 + M1-M8 + N1-N7), revised
  with an Avklaringer (Clarifications) section after critical review (N7 dropped, M3+N6 merged,
  N5 promoted to v5.0.0, SC-6/SC-10 reformulated)
- v5-plan.md: 31-step implementation plan in 5 sessions
  (alpha.1 → alpha.2 → beta.1 → rc.1 → release). B+ score (84/100) after
  plan-critic + scope-guardian review addressed all blockers/majors/gaps.
- v5-implementation-log.md: per-session status record (skeleton)

Sessions track via state files (REMEMBER.md, TODO.md gitignored;
implementation-log.md committed; NEXT-SESSION-PROMPT.local.md gitignored).

No code changes in this commit — planning only.

2026-05-01 06:10:44 +02:00


config-audit v5.0.0 — Implementation Plan

Plan quality: B+ (84/100) — adversarial review complete, revisions applied

Generated by ultraplan-local v3.0.0 on 2026-05-01 (plan_version: 1.7)
Source brief: docs/v5-brief.md (revised 2026-05-01)
Revised after plan-critic + scope-guardian review on 2026-05-01 (see Revisions section)

Context

config-audit v4.0.0 markets itself as "Opus 4.7-aware token optimization" but the critical review (briefed 2026-04-19, revised 2026-05-01) shows the marketing does not hold:

- The TOK scanner imports readActiveConfig and explicitly voids it (void readActiveConfig at scanners/token-hotspots.mjs:31), so it never sees plugins, skills, MCP, or cascade.
- The 4 TOK patterns cover ~29% of identified Opus 4.7 cost drivers; the largest sinks (MCP tool-schema bloat, skill-description bloat, CLAUDE.md cascade total) have zero coverage.
- estimateTokens flattens MCP servers and hooks to 15 tokens each via three caller sites passing kind='item' (active-config-reader.mjs:556, 593, 618). Reality is 2k–20k tokens per MCP server.
- scoreByArea treats all severities equally: 1 critical and 1 info produce an identical area score.
- Pattern D (detectSonnetEra) contradicts the plugin's own v3.0 policy (minimal correct = Grade A).

v5.0.0 is a token-economy round, not a rewrite. The 8 structural scanners, backup/rollback, suppression, and plugin-health all stay. The TOK scanner is reworked in place; new scanners (MCP budget, manifest, cache-prefix, disabled-in-schema, collision) are added; severity-aware scoring lands; and tokenizer calibration via Anthropic count_tokens ships as opt-in.

Architecture Diagram

graph TD
    subgraph "v5.0.0 changes"
        TOK[token-hotspots.mjs<br/>F1/F4/F5/F7]
        ACR[active-config-reader.mjs<br/>F2 + 'mcp' kind]
        SCR[scoring.mjs<br/>F3 severity-weighted]
        SVR[severity.mjs<br/>WEIGHTS export]
        SAU[self-audit.mjs<br/>F6 --check-readme]
        ORC[scan-orchestrator.mjs<br/>register new scanners]
        CLI[token-hotspots-cli.mjs<br/>N5 --accurate-tokens]

        MCB[NEW: mcp-budget-scanner.mjs<br/>N1 CA-TOK-005]
        MAN[NEW: manifest.mjs<br/>N2 CLI + scanner]
        CPS[NEW: cache-prefix-scanner.mjs<br/>N3]
        DIS[NEW: disabled-in-schema-scanner.mjs<br/>N4]
        COL[NEW: collision-scanner.mjs<br/>N6 CA-COL-001]

        SET[settings-validator.mjs<br/>M6 additionalDirectories]
        KB[knowledge/<br/>M7 cache-recipe + M8 cleanup]

        ORC --> TOK & MCB & CPS & DIS & COL
        TOK --> ACR
        SCR --> SVR
        ACR -. F2 fix .-> ACR
        CLI --> ACR
        SAU --> README
    end

Codebase Analysis

Tech stack: Node.js >= 18, ES modules (.mjs), node:test, zero external deps
Test framework: node:test + node:assert/strict — 543 tests across 31 files in v4.0.0
Key patterns:
- Scanner orchestrator + shared discovery object (scan-orchestrator.mjs:73)
- Finding factory finding({scanner, severity, ...}) produces CA-{SCANNER}-{NNN} IDs (output.mjs:31); counter is process-global, reset per scan
- CLI direct-run guard pattern via import.meta.url
- Manual argv parsing — no external libs
- Test fixtures under tests/fixtures/<scenario-name>/
Relevant files (verified):
- scanners/token-hotspots.mjs (lines 31, 166-178, 202-229, 270, 299, 321, 338)
- scanners/lib/active-config-reader.mjs (lines 29-39, 556, 593, 618)
- scanners/lib/scoring.mjs (lines 6, 169-200, 184)
- scanners/lib/severity.mjs (lines 14, 21-27)
- scanners/scan-orchestrator.mjs (lines 18-58)
- scanners/self-audit.mjs (lines 154-177)
- scanners/settings-validator.mjs (lines 16-35)
- scanners/lib/suppression.mjs (lines 117-128)
- knowledge/configuration-best-practices.md (line 9)
- knowledge/opus-4.7-patterns.md (1-57)
- tests/lib/active-config-reader.test.mjs, tests/lib/scoring.test.mjs, tests/scanners/token-hotspots.test.mjs, tests/scanners/posture-grade-stability.test.mjs
Reusable code:
- tokenKind() at token-hotspots.mjs:54-63 — extend to map MCP types
- enumeratePlugins() at active-config-reader.mjs:262-305 — for N6 collision
- countPluginItems() + findSkillMdFiles() at active-config-reader.mjs:332-399 — for N6
- parseFrontmatter() from lib/yaml-parser.mjs — for M2 skill description
- discoverConfigFiles() from lib/file-discovery.mjs — for new CLIs
- buildRichRepo() test helper at tests/lib/active-config-reader.test.mjs — extend for MCP fixtures
- runScanner() helper pattern in tests/scanners/token-hotspots.test.mjs — model for new scanner tests
External tech (researched): Anthropic POST /v1/messages/count_tokens endpoint for N5 (rate-limited 1000 req/min)
Recent git activity:
- token-hotspots.mjs, active-config-reader.mjs, severity.mjs are all single-commit cold files (born in 4f1cc7e or a090ed3, never revised)
- scoring.mjs has 2 commits (born + TOK wiring)
- Single owner (KTG); no concurrent branches; all work merges to main
- Straggler-sweep risk: 4 historical events where badge counts/area counts drifted across multiple files in a single feature batch — must plan dedicated doc-consistency pass

Research Sources

| Technology | Source | Key Findings | Confidence |
| --- | --- | --- | --- |
| Anthropic count_tokens API | Anthropic public docs | `POST /v1/messages/count_tokens` returns `{input_tokens: number}`; 1000 req/min rate limit; requires `ANTHROPIC_API_KEY` | high |
| MCP tool count detection | MCP spec | `tools/list` requires running server; package.json `tools` field is convention-only, not standard | medium |

Implementation Plan

Steps are grouped by release stage. Each step has a manifest, an on-failure action, and a checkpoint. Steps within a stage may be reordered if test gates allow; cross-stage ordering is fixed.

STAGE alpha.1 — TOK cleanup + scoring/estimateTokens fix (F1-F5)

Step 1: Export `WEIGHTS` and `riskScore` from severity.mjs (F3 prep)

Files: scanners/lib/severity.mjs
Changes: Promote WEIGHTS const to named export. Verify riskScore already exported.
Reuses: Existing WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1, info: 0 } at line 14.
Test first: tests/lib/severity.test.mjs — assert WEIGHTS.critical === 25 via named import.
Verify: node --test tests/lib/severity.test.mjs → PASS
On failure: revert (single-file change)
Checkpoint: git commit -m "feat(config-audit): export WEIGHTS from severity.mjs (v5 F3 prep)"

Manifest:

expected_paths: [scanners/lib/severity.mjs]
must_contain:
  - { path: scanners/lib/severity.mjs, pattern: "export const WEIGHTS" }
commit_message_pattern: "^feat\\(config-audit\\):"

Step 2: Severity-weighted `scoreByArea` (F3)

Files: scanners/lib/scoring.mjs, tests/lib/scoring.test.mjs
Changes:
1. Add import { WEIGHTS, riskScore } from './severity.mjs'
2. Rewrite scoreByArea non-GAP path (lines 182-186) to penalize via severity-weighted sum: penalty = sum(count[s] * WEIGHTS[s]) / maxBudget; passRate = max(0, 100 - penalty)
3. Add scoringVersion: 'v5' to returned struct (for cross-version drift detection)
Reuses: WEIGHTS from Step 1; existing GAP-tier logic untouched.
Test first: Add describe('scoreByArea — severity weighting') with new factory makeScannerResultWithSeverities. Assert: the score for 1 critical finding is lower than the score for 5 low findings; a clean result scores 100 (grade A).
Verify: node --test tests/lib/scoring.test.mjs tests/scanners/posture-grade-stability.test.mjs → PASS
On failure: revert + re-evaluate maxBudget formula. Likely tweak: maxBudget = max(10, findingCount * 4).
Checkpoint: git commit -m "feat(config-audit): severity-weighted scoreByArea (v5 F3)"

Manifest:

expected_paths: [scanners/lib/scoring.mjs, tests/lib/scoring.test.mjs]
must_contain:
  - { path: scanners/lib/scoring.mjs, pattern: "import.*WEIGHTS.*riskScore" }
  - { path: scanners/lib/scoring.mjs, pattern: "scoringVersion" }
commit_message_pattern: "^feat\\(config-audit\\):"
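
The penalty formula in change 2 of Step 2 can be sketched as below. This is a minimal standalone illustration, not the plugin's actual scoreByArea: the function name severityWeightedPassRate and the counts input shape are hypothetical, WEIGHTS is copied inline rather than imported, and the default budget borrows the max(10, findingCount * 4) tweak suggested under On failure.

```javascript
// Severity weights as listed in Step 1 (copied inline so the sketch runs alone).
const WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1, info: 0 };

// Hypothetical helper: `counts` maps severity -> number of findings in one area.
function severityWeightedPassRate(counts, maxBudget) {
  const severities = Object.keys(WEIGHTS);
  const findingCount = severities.reduce((n, s) => n + (counts[s] ?? 0), 0);
  const weightedSum = severities.reduce(
    (sum, s) => sum + (counts[s] ?? 0) * WEIGHTS[s],
    0
  );
  // Assumed default budget, taken from the "On failure" suggestion in Step 2.
  const budget = maxBudget ?? Math.max(10, findingCount * 4);
  const penalty = weightedSum / budget;
  return Math.max(0, 100 - penalty);
}

console.log(severityWeightedPassRate({ critical: 1 })); // 97.5
console.log(severityWeightedPassRate({ low: 5 }));      // 99.75
console.log(severityWeightedPassRate({}));              // 100
```

Under these assumptions the ordering required by Test first holds (one critical scores below five lows), although the size of the gap depends heavily on the chosen budget.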

Step 3: Audit `baseline-all-a` fixture for F3 compatibility

Files: tests/fixtures/baseline-all-a/ (read-only audit)
Changes: Run scoring against fixture; if any non-info finding drops the score below 90 after F3, document it and either (a) update the fixture to a truly minimal correct config, or (b) update test expectations to match v5 semantics with an explanatory comment.
Reuses: Existing fixture.
Test first: tests/scanners/posture-grade-stability.test.mjs already asserts grade A on this fixture; if it fails after Step 2, fix fixture.
Verify: node --test tests/scanners/posture-grade-stability.test.mjs → PASS
On failure: retry — tweak fixture to be truly clean (remove any medium+ findings).
Checkpoint: git commit -m "test(config-audit): align baseline-all-a fixture with v5 scoring"

Manifest:

expected_paths: [tests/scanners/posture-grade-stability.test.mjs]
commit_message_pattern: "^(test|fix|chore)\\(config-audit\\):"

Step 4: Add `'mcp'` kind to `estimateTokens` (F2 — function side)

Files: scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs
Changes: Extend estimateTokens(bytes, kind) (lines 29-39):
- new branch kind === 'mcp': if bytes > 0 use ceil(bytes / 3.5) (json-rate); else 500 (base overhead floor)
- Optional third argument carrying toolCount: estimateTokens(bytes, 'mcp', {toolCount}) → max(base, toolCount * 200)
Reuses: Existing 'json' and 'item' branches as patterns.
Test first: Add cases: 'mcp' with 0 bytes → ≥500; 'mcp' with {toolCount: 10} → ≥2000; ratio mcp / item ≥ 30 for 10-tool server.
Verify: node --test tests/lib/active-config-reader.test.mjs → PASS
On failure: revert. Adjust formula if test thresholds unrealistic — but keep the order-of-magnitude differentiation.
Checkpoint: git commit -m "feat(config-audit): add 'mcp' kind to estimateTokens (v5 F2)"

Manifest:

expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
must_contain:
  - { path: scanners/lib/active-config-reader.mjs, pattern: "kind === 'mcp'" }
commit_message_pattern: "^feat\\(config-audit\\):"
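
A minimal sketch of the 'mcp' branch described in Step 4, following the stated formula literally. It is a standalone stand-in, not the real estimateTokens: the constant names are hypothetical, the 'item' value of 15 comes from the Context section, and the fallback branch only approximates the existing 'json' rate.

```javascript
// Standalone sketch of the proposed 'mcp' branch; not the plugin's real function.
const MCP_BASE_TOKENS = 500; // base overhead floor from the plan
const TOKENS_PER_TOOL = 200; // per-tool multiplier from the plan
const ITEM_TOKENS = 15;      // flat 'item' estimate cited in Context

function estimateTokens(bytes, kind, options = {}) {
  if (kind === 'item') return ITEM_TOKENS;
  if (kind === 'mcp') {
    // Byte-based estimate at the json rate, or the base overhead when size is unknown.
    const base = bytes > 0 ? Math.ceil(bytes / 3.5) : MCP_BASE_TOKENS;
    const { toolCount } = options;
    return typeof toolCount === 'number'
      ? Math.max(base, toolCount * TOKENS_PER_TOOL)
      : base;
  }
  // Assumption: every other kind uses the json rate; real branches are not reproduced.
  return Math.ceil(bytes / 3.5);
}

console.log(estimateTokens(0, 'mcp'));                    // 500
console.log(estimateTokens(0, 'mcp', { toolCount: 10 })); // 2000
console.log(estimateTokens(700, 'mcp'));                  // 200
```

Read literally, a server with a small nonzero byte size (700 bytes above) estimates below 500. Whether 500 should also act as a floor for the byte-based estimate is left open in this sketch.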

Step 5: Migrate MCP/hook callers to use `'mcp'` kind (F2 — caller side)

Files: scanners/lib/active-config-reader.mjs
Changes: Three call sites:
- Line 556 (collectHookEntries): keep 'item' for hooks (hooks don't have schemas) but pass actual byte size when available.
- Line 593 (collectMcpFromFile): kind='mcp', pass { toolCount: server.tools?.length ?? 0 } (will be 0 until N1 wires tool detection — that's fine; base 500 still beats flat 15).
- Line 618 (readActiveMcpServers from .claude.json): same as 593.
Reuses: New 'mcp' kind from Step 4.
Test first: Extend tests/lib/active-config-reader.test.mjs buildRichRepo to include MCP servers; assert returned mcpServers[].estimatedTokens >= 500 (not 15).
Verify: node --test tests/lib/active-config-reader.test.mjs → PASS
On failure: revert. Re-check call sites if test still shows 15.
Checkpoint: git commit -m "fix(config-audit): MCP token callers use 'mcp' kind (v5 F2)"

Manifest:

expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
forbidden_paths: []
commit_message_pattern: "^fix\\(config-audit\\):"

Step 6: Wire `readActiveConfig` into TOK scanner (F1)

Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs, tests/fixtures/tok-active-config/ (new)
Changes:
- Remove void readActiveConfig; at line 31.
- Inside scan(targetPath, discovery): call await readActiveConfig(targetPath, {}) once; if it throws (non-git target), catch and continue with discovery-only behavior. Merge its mcpServers, plugins, skills, claudeMd.estimatedTokens into hotspot ranking input.
- Add new finding source category 'mcp-server', 'plugin', 'skill' for hotspots.
- Unify token estimation paths: the tokenKind() mapper at lines 54-63 is used for discovery.files. After Step 5, MCP files in discovery still map to 'json' while MCP servers from readActiveConfig use 'mcp'. Within TOK, prefer readActiveConfig data for MCP/skills/plugins; fall back to discovery only for files not covered by readActiveConfig (e.g., loose claude.json). Document in a 1-line comment.
Reuses: readActiveConfig shape from active-config-reader.mjs:738-827.
Test first: New fixture tok-active-config/ with .mcp.json (2 servers), CLAUDE.md, and .claude-plugin/plugin.json + commands/sample.md (plugin-skeleton). New describe block: assert (a) hotspots.some(h => h.source.includes('mcp')); (b) total estimated tokens > minimal-project total; (c) claudeMd.estimatedTokens > 0 is observable when readActiveConfig was called.
Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
On failure: revert. Common cause: readActiveConfig requires git root; the try/catch above handles this. Verify discovery-only fallback path works.
Checkpoint: git commit -m "feat(config-audit): TOK consumes readActiveConfig (v5 F1)"

Manifest:

expected_paths:
  - scanners/token-hotspots.mjs
  - tests/scanners/token-hotspots.test.mjs
  - tests/fixtures/tok-active-config/.mcp.json
  - tests/fixtures/tok-active-config/CLAUDE.md
  - tests/fixtures/tok-active-config/.claude-plugin/plugin.json
  - tests/fixtures/tok-active-config/commands/sample.md
must_contain:
  - { path: scanners/token-hotspots.mjs, pattern: "readActiveConfig\\(targetPath" }
forbidden_paths: []
commit_message_pattern: "^feat\\(config-audit\\):"

Step 7: Remove `take` dead-code and hotspot padding (F4)

Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs
Changes: Delete take computation (lines 202-205) and padding while-loop (lines 219-229). Replace with: return ranked.slice(0, HOTSPOTS_MAX) and accept that fewer than HOTSPOTS_MIN may be returned for small projects.
Reuses: HOTSPOTS_MAX constant.
Test first: Add assertion: every hotspot.source is unique; hotspots.length <= discovery.files.length.
Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
On failure: revert. If hotspots-contract test breaks because some test expects min count, update test to allow fewer.
Checkpoint: git commit -m "fix(config-audit): remove TOK dead take + hotspot padding (v5 F4)"

Manifest:

expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
must_contain:
  - { path: scanners/token-hotspots.mjs, pattern: "ranked\\.slice\\(0, HOTSPOTS_MAX\\)" }
commit_message_pattern: "^fix\\(config-audit\\):"
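
The replacement logic in Step 7 amounts to "rank, then slice, never pad". A minimal sketch, assuming each candidate carries a source and an estimatedTokens field; the function name topHotspots and the HOTSPOTS_MAX value of 10 are hypothetical stand-ins for whatever token-hotspots.mjs defines.

```javascript
// Hypothetical value; the real constant lives in token-hotspots.mjs.
const HOTSPOTS_MAX = 10;

// Rank candidates by estimated tokens (descending) and cap the list.
function topHotspots(candidates) {
  const ranked = [...candidates].sort(
    (a, b) => b.estimatedTokens - a.estimatedTokens
  );
  // No padding: small projects simply return fewer entries.
  return ranked.slice(0, HOTSPOTS_MAX);
}

const small = topHotspots([
  { source: 'CLAUDE.md', estimatedTokens: 1200 },
  { source: '.mcp.json', estimatedTokens: 4000 },
]);
console.log(small.map((h) => h.source)); // [ '.mcp.json', 'CLAUDE.md' ]
```

Because nothing is padded, every returned source is unique as long as the input sources are unique, which is what the new assertion checks.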

Step 8: Remove Pattern D `detectSonnetEra` (F5)

Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs
Changes: Delete detectSonnetEra() function (lines 166-178) and its finding emission (lines 335-350). Pattern D and CA-TOK-004 no longer exist.
Reuses: —
Test first: Update opus-47/sonnet-era describe block: assert result.findings.every(f => f.id !== 'CA-TOK-004') AND that the existing fixture now produces zero TOK findings.
Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS AND ! grep -q "detectSonnetEra" scanners/token-hotspots.mjs
On failure: revert. CA-TOK-004 may still exist if any other path emits it; grep confirms none.
Checkpoint: git commit -m "feat(config-audit): remove TOK Pattern D detectSonnetEra (v5 F5)"

Manifest:

expected_paths: [scanners/token-hotspots.mjs]
forbidden_paths: []
must_not_contain:
  - { path: scanners/token-hotspots.mjs, pattern: "detectSonnetEra" }
  - { path: scanners/token-hotspots.mjs, pattern: "CA-TOK-004" }
commit_message_pattern: "^feat\\(config-audit\\):"

Step 8b: Sweep CA-TOK-004 references from docs after F5

Files: commands/tokens.md, knowledge/opus-4.7-patterns.md
Changes:
- commands/tokens.md: replace any CA-TOK-001..004 reference with CA-TOK-001..003 (or list explicitly). Verify no CA-TOK-004 remains.
- knowledge/opus-4.7-patterns.md: remove the Pattern D row from the catalogue table and any text referencing "Pattern D" or CA-TOK-004. Update the pattern count in the document header if mentioned.
Reuses: —
Test first: None (docs).
Verify: ! grep -q "CA-TOK-004" commands/tokens.md knowledge/opus-4.7-patterns.md
On failure: revert.
Checkpoint: git commit -m "docs(config-audit): remove CA-TOK-004 references after F5 (v5)"

Manifest:

expected_paths: [commands/tokens.md, knowledge/opus-4.7-patterns.md]
must_not_contain:
  - { path: commands/tokens.md, pattern: "CA-TOK-004" }
  - { path: knowledge/opus-4.7-patterns.md, pattern: "CA-TOK-004" }
commit_message_pattern: "^docs\\(config-audit\\):"

Step 9: alpha.1 wrap — release notes draft

Files: CHANGELOG.md
Changes: Add ## [5.0.0-alpha.1] entry summarizing F1-F5. Note BREAKING for F3 (severity weighting) and F2 (MCP estimate jump).
Reuses: v4.0.0 entry format.
Test first: None (docs).
Verify: grep -c "5.0.0-alpha.1" CHANGELOG.md → 1
On failure: revert.
Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.1 entry"

Manifest:

expected_paths: [CHANGELOG.md]
must_contain:
  - { path: CHANGELOG.md, pattern: "5\\.0\\.0-alpha\\.1" }
commit_message_pattern: "^docs\\(config-audit\\):"

STAGE alpha.2 — Structural gaps + README self-audit (M1, M2, M4-M6, F6, F7)

Step 10: F7 — Severity recalibration for TOK patterns

Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs
Changes: Recalibrate severity for all 3 remaining patterns based on tokens/turn (Pattern D removed in F5). Each decision is explicit and testable:
- Pattern A (volatile top, line 270): medium → high. Reason: volatile content in cached prefix triggers full re-read of cascade every turn (10k+ tokens/turn cost). Highest single-pattern impact.
- Pattern B (redundant perms, line 299): low → medium. Reason: duplicate tool entries inflate the tool-schema payload sent every turn (~50-200 tokens/turn per duplicate, scales with turns).
- Pattern C (deep imports, line 321): medium → low. Reason: depth alone is structural and only matters at first-load; cache benefits remain. Lower per-turn cost than originally rated. This is an explicit recalibration, not "unchanged".
- Add calibration_note field to each finding's evidence: "severity reflects estimated tokens/turn based on structural heuristic; not measured against runtime telemetry".
Reuses: SEVERITY constants.

Test first: Table-driven test:

const SEVERITY_TABLE = [
  { fixture: 'opus-47/cache-breaking', findingId: 'CA-TOK-001', expected: 'high' },
  { fixture: 'opus-47/redundant-tools', findingId: 'CA-TOK-002', expected: 'medium' },
  { fixture: 'opus-47/deep-imports', findingId: 'CA-TOK-003', expected: 'low' },
];

Iterate with for...of generating it(...) blocks. Each asserts finding.severity === expected.

Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
On failure: revert. Re-evaluate severities if integration tests break (e.g., posture-grade-stability expects different aggregate).
Checkpoint: git commit -m "feat(config-audit): recalibrate TOK severities for tokens/turn (v5 F7)"

Manifest:

expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
must_contain:
  - { path: scanners/token-hotspots.mjs, pattern: "calibration_note" }
commit_message_pattern: "^feat\\(config-audit\\):"

Step 11: M6 — `additionalDirectories` in KNOWN_KEYS + threshold

Files: scanners/settings-validator.mjs, tests/scanners/settings-validator.test.mjs, tests/fixtures/additional-dirs-many/ (new)
Changes:
- Add 'additionalDirectories' to KNOWN_KEYS (lines 16-35).
- New check: if additionalDirectories.length > 2, emit CA-SET-NNN finding (severity low).
Reuses: Existing settings-validator pattern.
Test first: Fixture with 3 entries → 1 finding; fixture with 2 entries → 0 findings; settings without the key → no "unknown key" warning.
Verify: node --test tests/scanners/settings-validator.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "feat(config-audit): flag additionalDirectories > 2 (v5 M6)"

Manifest:

expected_paths: [scanners/settings-validator.mjs, tests/fixtures/additional-dirs-many/settings.json]
commit_message_pattern: "^feat\\(config-audit\\):"
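
The threshold check in Step 11 could look roughly like this. It is a sketch under assumptions: the key is read from the top level of the parsed settings object (as the KNOWN_KEYS change implies), and the function name and finding shape are hypothetical rather than the validator's real output.

```javascript
const MAX_ADDITIONAL_DIRECTORIES = 2; // threshold from the plan

// Returns an array of findings (empty when the setting is absent or small).
function checkAdditionalDirectories(settings) {
  const dirs = settings.additionalDirectories;
  if (!Array.isArray(dirs) || dirs.length <= MAX_ADDITIONAL_DIRECTORIES) {
    return [];
  }
  return [
    {
      scanner: 'SET',
      severity: 'low',
      title: `additionalDirectories has ${dirs.length} entries (threshold ${MAX_ADDITIONAL_DIRECTORIES})`,
    },
  ];
}

console.log(checkAdditionalDirectories({ additionalDirectories: ['a', 'b', 'c'] }).length); // 1
console.log(checkAdditionalDirectories({ additionalDirectories: ['a', 'b'] }).length);      // 0
```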

Step 12: M4 — CLAUDE.md cascade total finding in TOK

Files: scanners/token-hotspots.mjs, tests/fixtures/large-cascade/ (new)
Changes: New detection in TOK: if activeConfig.claudeMd.estimatedTokens > 10_000, emit finding (severity medium).
Reuses: readActiveConfig integration from Step 6; claudeMd.estimatedTokens field.
Test first: Fixture with CLAUDE.md @-importing 40k+ bytes → finding present; minimal CLAUDE.md → no finding.
Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "feat(config-audit): TOK flags CLAUDE.md cascade > 10k tokens (v5 M4)"

Manifest:

expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/large-cascade/CLAUDE.md]
commit_message_pattern: "^feat\\(config-audit\\):"

Step 13: M2 — Skill description length finding

Files: scanners/token-hotspots.mjs, tests/fixtures/skill-bloated/ (new)
Changes: New detection in TOK: walk activeConfig.skills, parse each SKILL.md frontmatter; flag any with description > 500 characters as low finding.
Reuses: parseFrontmatter from lib/yaml-parser.mjs; activeConfig.skills from Step 6.
Test first: Fixture with 600-char description → finding; 100-char → no finding.
Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "feat(config-audit): TOK flags skill description > 500 chars (v5 M2)"

Manifest:

expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/skill-bloated/skills/bloated/SKILL.md]
commit_message_pattern: "^feat\\(config-audit\\):"
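
A minimal sketch of the description-length check in Step 13. It assumes the frontmatter has already been parsed (the real code would use parseFrontmatter); the skills input shape, function name, and finding shape are hypothetical.

```javascript
const MAX_DESCRIPTION_CHARS = 500; // threshold from the plan

// `skills` is assumed to be [{ name, frontmatter: { description } }, ...].
function flagLongSkillDescriptions(skills) {
  return skills
    .filter(
      (skill) =>
        (skill.frontmatter?.description ?? '').length > MAX_DESCRIPTION_CHARS
    )
    .map((skill) => ({
      scanner: 'TOK',
      severity: 'low',
      title: `Skill "${skill.name}" description is ${skill.frontmatter.description.length} chars`,
    }));
}

const findings = flagLongSkillDescriptions([
  { name: 'bloated', frontmatter: { description: 'x'.repeat(600) } },
  { name: 'lean', frontmatter: { description: 'x'.repeat(100) } },
]);
console.log(findings.length); // 1
```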

Step 14: M1 — MCP tool-count detection (with manifest fallback)

Files: scanners/lib/active-config-reader.mjs, tests/fixtures/mcp-tool-heavy/ (new)
Changes: Extend readActiveMcpServers to attempt tool-count detection in this order:
1. Cached tools/list response at ~/.claude/config-audit/mcp-cache/<server>.json (if exists)
2. package.json tools array on the npm package (if server is npm-resolved)
3. Fallback: emit toolCount: null and a toolCountUnknown: true flag on the server entry
Update estimateTokens call (Step 5) to use toolCount when known.
Reuses: Existing MCP enumeration.
Test first: Fixture with mocked package.json tools array of 20 → toolCount === 20; fixture without → toolCount === null.
Verify: node --test tests/lib/active-config-reader.test.mjs → PASS
On failure: revert. Tool-count infrastructure can ship as null everywhere if detection logic fails — N1 still produces baseline finding.
Checkpoint: git commit -m "feat(config-audit): MCP tool-count detection with manifest fallback (v5 M1)"

Manifest:

expected_paths: [scanners/lib/active-config-reader.mjs, tests/fixtures/mcp-tool-heavy/]
commit_message_pattern: "^feat\\(config-audit\\):"
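
The detection order in Step 14 can be sketched as a pure function over already-loaded inputs. The function name resolveToolCount is hypothetical, file reading is left out, and the shape of the cached file (a top-level tools array, mirroring a tools/list result) is an assumption rather than something the plan specifies.

```javascript
// cachedToolsList: parsed JSON from the mcp-cache file, or null if absent.
// packageJson: parsed package.json of an npm-resolved server, or null.
function resolveToolCount({ cachedToolsList = null, packageJson = null } = {}) {
  // 1. Cached tools/list response (assumed to expose a `tools` array).
  if (Array.isArray(cachedToolsList?.tools)) {
    return { toolCount: cachedToolsList.tools.length, toolCountUnknown: false };
  }
  // 2. Convention-only `tools` array in package.json.
  if (Array.isArray(packageJson?.tools)) {
    return { toolCount: packageJson.tools.length, toolCountUnknown: false };
  }
  // 3. Fallback: unknown count, flagged so later steps can report it.
  return { toolCount: null, toolCountUnknown: true };
}

console.log(resolveToolCount({ packageJson: { tools: new Array(20).fill('t') } }));
// { toolCount: 20, toolCountUnknown: false }
console.log(resolveToolCount());
// { toolCount: null, toolCountUnknown: true }
```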

Step 15: M5 — Hook output-size finding

Files: scanners/hook-validator.mjs, tests/fixtures/hooks-verbose/ (new)
Changes: Read each hook script referenced in hooks.json; count console.log / process.stdout.write lines; if > 50, emit CA-HKV-NNN finding (severity low). Static heuristic — no execution.
Reuses: Existing hook-validator file-walking.
Test first: Fixture with hook script containing 60 console.log lines → finding; sparse hook → no finding.
Verify: node --test tests/scanners/hook-validator.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "feat(config-audit): HKV flags verbose hook output (v5 M5)"

Manifest:

expected_paths: [scanners/hook-validator.mjs, tests/fixtures/hooks-verbose/]
commit_message_pattern: "^feat\\(config-audit\\):"
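
The static heuristic in Step 15 counts lines rather than executing anything. A minimal sketch, assuming the hook script source has already been read into a string; the regex and function names are illustrative, not the validator's actual implementation.

```javascript
// Matches a call to console.log( or process.stdout.write( anywhere on a line.
const OUTPUT_CALL = /\b(console\.log|process\.stdout\.write)\s*\(/;
const MAX_OUTPUT_LINES = 50; // threshold from the plan

// Count source lines that contain at least one output call.
function countOutputLines(source) {
  return source.split('\n').filter((line) => OUTPUT_CALL.test(line)).length;
}

function isVerboseHook(source) {
  return countOutputLines(source) > MAX_OUTPUT_LINES;
}

const verbose = new Array(60).fill("console.log('x');").join('\n');
console.log(countOutputLines(verbose)); // 60
console.log(isVerboseHook(verbose));    // true
```

Counting lines is deliberately crude: it ignores loops and conditionals, so it approximates how chatty a script looks rather than how much it prints at runtime.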

Step 16: F6 — `self-audit --check-readme` flag

Files: scanners/self-audit.mjs, tests/scanners/self-audit.test.mjs, tests/fixtures/readme-desynced/ (new)
Changes: Add --check-readme CLI flag. The flag uses filesystem counts as the source of truth, not the README. Counts:
- scanners: count .mjs files matching scanner-shape (have export async function scan AND are in scanners/ not scanners/lib/ and not *-cli.mjs/*-engine.mjs/whats-active.mjs/self-audit.mjs/scan-orchestrator.mjs)
- commands: count .md files in commands/
- agents: count .md files in agents/
- hooks: parse hooks/hooks.json, count distinct event-script pairs
- tests: count .test.mjs files in tests/
- knowledge: count .md files in knowledge/
Parse README badge values via line-anchored substring patterns (NOT regex on URL; use exact " 9 " / "9+" detection). Compare counts; emit a low finding per mismatch with expected: <fs_count> and found_in_readme: <badge_value>.
Reuses: Existing runSelfAudit shape; glob-style file enumeration via node:fs/promises.
Test first:
- Fixture readme-desynced/: a mini-plugin layout with commands/foo.md, commands/bar.md (filesystem count = 2) plus a fake README.md with badge "1+ commands" → finding present.
- Self-test (no fixture): run runSelfAudit({checkReadme: true}) against the real plugin; assert result.readmeCheck exists, result.readmeCheck.passed is boolean. Do NOT assert passed === true during alpha/beta phases (allowed to be red until Step 28).
Verify: node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck | type' → "object"
On failure: revert. Most likely cause: scanner-shape detection over-counts; refine to require both export async function scan AND const SCANNER = declarations.
Checkpoint: git commit -m "feat(config-audit): self-audit --check-readme flag (v5 F6)"

Manifest:

expected_paths:
  - scanners/self-audit.mjs
  - tests/scanners/self-audit.test.mjs
  - tests/fixtures/readme-desynced/README.md
  - tests/fixtures/readme-desynced/commands/foo.md
  - tests/fixtures/readme-desynced/commands/bar.md
must_contain:
  - { path: scanners/self-audit.mjs, pattern: "check-readme" }
  - { path: scanners/self-audit.mjs, pattern: "readmeCheck" }
commit_message_pattern: "^feat\\(config-audit\\):"
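
The comparison half of Step 16 (filesystem counts as source of truth, one low finding per mismatch) can be sketched as follows. Badge parsing and file counting are omitted; both inputs are assumed to be plain objects of numbers, and the function name is hypothetical. The readmeCheck-style return value only mirrors the passed boolean mentioned in Test first.

```javascript
// fsCounts: counts taken from the filesystem (source of truth).
// readmeCounts: numeric values already parsed from README badges.
function compareReadmeCounts(fsCounts, readmeCounts) {
  const findings = [];
  for (const [kind, expected] of Object.entries(fsCounts)) {
    const found = readmeCounts[kind];
    if (found !== expected) {
      findings.push({
        severity: 'low',
        kind,
        expected,
        found_in_readme: found ?? null, // null when the README has no such badge
      });
    }
  }
  return { passed: findings.length === 0, findings };
}

const check = compareReadmeCounts({ commands: 2 }, { commands: 1 });
console.log(check.passed);          // false
console.log(check.findings.length); // 1
```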

Step 17: alpha.2 wrap — CHANGELOG entry

Files: CHANGELOG.md
Changes: Add ## [5.0.0-alpha.2] summarizing M1, M2, M4-M6, F6, F7.
Verify: grep -c "5.0.0-alpha.2" CHANGELOG.md → 1
On failure: revert.
Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.2 entry"

Manifest:

expected_paths: [CHANGELOG.md]
commit_message_pattern: "^docs\\(config-audit\\):"

STAGE beta.1 — New scanners (N1-N4, N6)

Step 18: N1 — MCP Tool-Schema Budget finding (CA-TOK-005)

Files: scanners/token-hotspots.mjs, tests/fixtures/mcp-budget/ (new)
Changes: New detection function detectMcpToolBudget(activeConfig). Iterate activeConfig.mcpServers. Tiered severity per server:
- toolCount === null (unknown — fallback chain in M1 returned null): emit finding with severity low and message "tool count unknown — could not parse manifest or cached tools/list" (per Avklaringer M1: flag, don't skip).
- toolCount 0-19: no finding
- 20-49: low
- 50-99: medium
- 100+: high
Finding ID: CA-TOK-005 per server flagged.
Recommendation: use tools/filter config; reference cache-telemetry recipe from M7.
Detection-order pinning: ensure detectMcpToolBudget runs as the 5th detection block in scan() AFTER patterns A, B, C (which always run first regardless of fixture). This makes ID assignment deterministic when all patterns fire. When some patterns don't fire, the ID may shift; tests assert presence and tier-specific severity, not the exact ID number.
Reuses: activeConfig.mcpServers with toolCount from Step 14.
Test first: Fixtures: 14 tools (no finding), 25 tools (low), 60 tools (medium), 120 tools (high), null toolCount (low with message containing "unknown"). Tests assert severity and finding.title substring, NOT exact id number.
Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "feat(config-audit): CA-TOK-005 MCP tool-schema budget (v5 N1)"

Manifest:

expected_paths:
  - scanners/token-hotspots.mjs
  - tests/fixtures/mcp-budget/14-tools/
  - tests/fixtures/mcp-budget/25-tools/
  - tests/fixtures/mcp-budget/60-tools/
  - tests/fixtures/mcp-budget/120-tools/
  - tests/fixtures/mcp-budget/unknown-tools/
must_contain:
  - { path: scanners/token-hotspots.mjs, pattern: "detectMcpToolBudget" }
commit_message_pattern: "^feat\\(config-audit\\):"
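
The severity tiers in Step 18 reduce to a small lookup. A minimal sketch; the function name mcpBudgetSeverity is hypothetical, and returning null to mean "no finding" is a choice made for this illustration only.

```javascript
// Maps a server's tool count to a finding severity, or null for "no finding".
function mcpBudgetSeverity(toolCount) {
  if (toolCount === null) return 'low'; // unknown count: flag, don't skip
  if (toolCount >= 100) return 'high';
  if (toolCount >= 50) return 'medium';
  if (toolCount >= 20) return 'low';
  return null; // 0-19 tools: no finding
}

console.log([14, 25, 60, 120, null].map((n) => mcpBudgetSeverity(n)));
// [ null, 'low', 'medium', 'high', 'low' ]
```

The inputs in the usage line match the five fixtures listed under Test first.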

Step 19: N2 — System-Prompt Manifest scanner + CLI

Files: scanners/manifest.mjs (new), commands/manifest.md (new), tests/scanners/manifest.test.mjs (new)
Changes: New CLI: node scanners/manifest.mjs <path> [--json] [--output-file]. Output: ranked list of token sources from readActiveConfig (CLAUDE.md cascade entries, plugins, skills, MCP servers, hooks) sorted DESC by estimated_tokens. New slash command /config-audit manifest invokes the CLI and renders a markdown table.
Reuses: readActiveConfig, CLI direct-run pattern, command frontmatter from commands/whats-active.md.
Test first: Two test paths:
- Real-config path (primary): subprocess against the plugin's own root (.) — output.sources length > 0; output.sources[0].estimated_tokens >= output.sources[1].estimated_tokens; output.total >= sum(sources.estimated_tokens) - 1 (rounding tolerance).
- Fixture path (with buildRichRepo helper from tests/lib/active-config-reader.test.mjs): build a tmpdir repo with patched HOME containing 2 plugins + 3 skills + .mcp.json. Run the CLI subprocess against tmpdir with the patched HOME passed via env. Assert sources.length >= 5 (CLAUDE.md cascade + plugins + skills + MCP).
Verify: node --test tests/scanners/manifest.test.mjs → PASS
On failure: revert. If readActiveConfig returns empty for the real-plugin target: check that detectGitRoot resolves to the marketplace root.
Checkpoint: git commit -m "feat(config-audit): /config-audit manifest command (v5 N2)"

Manifest:

expected_paths: [scanners/manifest.mjs, commands/manifest.md, tests/scanners/manifest.test.mjs]
must_contain:
  - { path: scanners/manifest.mjs, pattern: "readActiveConfig" }
  - { path: commands/manifest.md, pattern: "name: manifest" }
commit_message_pattern: "^feat\\(config-audit\\):"
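
The ranked output described in Step 19 is essentially a descending sort plus a total. A minimal sketch, assuming each source is an object with a name and an estimated_tokens field; the function name buildManifest and the sample numbers are hypothetical, and collecting the sources from readActiveConfig is not shown.

```javascript
// Sort token sources by estimated_tokens (descending) and sum them.
function buildManifest(sources) {
  const ranked = [...sources].sort(
    (a, b) => b.estimated_tokens - a.estimated_tokens
  );
  const total = ranked.reduce((sum, s) => sum + s.estimated_tokens, 0);
  return { sources: ranked, total };
}

const manifest = buildManifest([
  { name: 'CLAUDE.md', estimated_tokens: 1200 },
  { name: 'mcp:example-server', estimated_tokens: 4000 },
  { name: 'skill:review', estimated_tokens: 300 },
]);
console.log(manifest.sources[0].name); // mcp:example-server
console.log(manifest.total);           // 5500
```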

Step 20: N3 — Cache-Prefix Stability Analyzer

Files: scanners/cache-prefix-scanner.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs (SCANNER_AREA_MAP), tests/scanners/cache-prefix.test.mjs (new), tests/fixtures/volatile-mid-section/ (new)
Changes: New scanner with prefix CPS. Walks CLAUDE.md cascade; classifies each segment as stable/volatile (using existing volatile patterns from token-hotspots.mjs:38-43 extended with shell-exec ! prefix and ${VAR} patterns). Flags volatility anywhere in cached prefix (not just top 30 lines). Severity medium.
Reuses: VOLATILE_PATTERNS, walkClaudeMdCascade.
Test first: Fixture with !git log at line 60 → finding; fixture with volatile content only at line 200+ → no finding.
Verify: node --test tests/scanners/cache-prefix.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "feat(config-audit): cache-prefix stability scanner CPS (v5 N3)"

Manifest:

expected_paths:
  - scanners/cache-prefix-scanner.mjs
  - scanners/scan-orchestrator.mjs
  - scanners/lib/scoring.mjs
  - tests/scanners/cache-prefix.test.mjs
must_contain:
  - { path: scanners/scan-orchestrator.mjs, pattern: "scanCachePrefix|CPS" }
  - { path: scanners/lib/scoring.mjs, pattern: "CPS:" }
commit_message_pattern: "^feat\\(config-audit\\):"

Step 21: N4 — Disabled-Tools-Still-In-Schema Detector

Files: scanners/disabled-in-schema-scanner.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs, tests/scanners/disabled-in-schema.test.mjs (new), tests/fixtures/denied-tools-in-schema/ (new)
Changes: New scanner with prefix DIS. Reads cascaded settings.json; finds tools that appear in both permissions.deny and permissions.allow. Severity low.
Reuses: Settings-cascade reading.
Test first: Fixture with Bash in both arrays → finding; clean settings → no finding.
Verify: node --test tests/scanners/disabled-in-schema.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "feat(config-audit): disabled-in-schema scanner DIS (v5 N4)"

Manifest:

expected_paths:
  - scanners/disabled-in-schema-scanner.mjs
  - scanners/scan-orchestrator.mjs
  - tests/scanners/disabled-in-schema.test.mjs
commit_message_pattern: "^feat\\(config-audit\\):"
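The deny/allow overlap check from Step 21 amounts to a set intersection over the cascaded settings. The sketch below assumes each settings layer is already parsed into an object with `permissions.allow` and `permissions.deny` arrays, and that rules are compared as exact strings. The name `findDenyAllowConflicts` is hypothetical, and a real scanner might need to normalize rule syntax before comparing.

```javascript
// layers: parsed settings.json objects in cascade order (assumed input shape).
function findDenyAllowConflicts(layers) {
  const allow = new Set();
  const deny = new Set();
  for (const layer of layers) {
    for (const rule of layer?.permissions?.allow ?? []) allow.add(rule);
    for (const rule of layer?.permissions?.deny ?? []) deny.add(rule);
  }
  // A rule present in both sets is reported once, at the plan's "low" severity.
  return [...deny]
    .filter((rule) => allow.has(rule))
    .sort()
    .map((tool) => ({ tool, severity: 'low' }));
}
```

Collecting rules across all layers before intersecting means a tool allowed at user level and denied at project level is still reported.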

Step 22a: N6 — verify Claude Code skill-namespacing model (research spike)

Files: docs/v5-namespace-research.md (new, gitignored)
Changes: Quick verification spike before building N6. Verify against current Claude Code behavior:
1. When user types /review and both a built-in command and a plugin skill named review exist — which fires? Is invocation namespaced via /plugin:review?
2. When two plugins both expose a skill named review — do their invocation paths differ?
3. User-level skills (under ~/.claude/skills/) with the same name as a plugin skill: do they collide?
Methods: read the Claude Code documentation; check existing plugin patterns in the marketplace; if still uncertain after 10 minutes of research, document the assumption explicitly and proceed with the most defensive interpretation (treat any same-name conflict as a finding).
Reuses: —
Test first: None (research).
Verify: [ -f docs/v5-namespace-research.md ] containing at least: "Built-in vs plugin: ", "Plugin vs plugin: ", "User-level vs plugin: ", "Confidence: <high/medium/low>"
On failure: escalate — if research is inconclusive, ask user before proceeding to Step 22b.
Checkpoint: No commit (file is local-only).

Manifest:

expected_paths: [docs/v5-namespace-research.md]
commit_message_pattern: ".*"

Step 22b: N6 — Cross-Plugin Skill/Command Collision Scanner (CA-COL-001)

Files: scanners/collision-scanner.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs, tests/scanners/collision.test.mjs (new), tests/fixtures/collision-plugins/ (new)
Changes: New scanner with prefix COL (Finding ID CA-COL-001). Enumerate plugins via enumeratePlugins. Build maps of skill names and command names by source. Detection logic determined by Step 22a research:
- Plugin-vs-plugin same skill name: finding (severity low) — invocation order ambiguity even if /plugin:skill is supported.
- User-level skill vs plugin skill same name: finding (severity medium) — bare invocation may resolve unpredictably.
- Plugin skill vs Claude Code built-in: finding only if Step 22a confirms collision is real; otherwise no finding (info-level note in CHANGELOG).
- All findings include details.namespaces array describing each conflicting source.
Reuses: enumeratePlugins, countPluginItems, findSkillMdFiles.
Test first: Multi-plugin fixture collision-plugins/:
- Layout: plugins/plugin-a/skills/review/SKILL.md + plugins/plugin-b/skills/review/SKILL.md → finding present (severity low).
- Negative: plugins/plugin-a/skills/review/ + plugins/plugin-b/skills/summarize/ → no finding.
- Positive (user-vs-plugin): user skill at fake-HOME skills/review/SKILL.md + plugin skill plugin-a/skills/review/SKILL.md → finding (severity medium).
- Suppression-glob check: existing CA-TOK-* glob does NOT suppress CA-COL-001.
Verify: node --test tests/scanners/collision.test.mjs → PASS
On failure: revert. False positives indicate namespace model deviation from Step 22a research — revisit research file.
Checkpoint: git commit -m "feat(config-audit): cross-plugin collision scanner COL (v5 N6)"

Manifest:

expected_paths:
  - scanners/collision-scanner.mjs
  - scanners/scan-orchestrator.mjs
  - tests/scanners/collision.test.mjs
  - tests/fixtures/collision-plugins/plugins/plugin-a/skills/review/SKILL.md
  - tests/fixtures/collision-plugins/plugins/plugin-b/skills/review/SKILL.md
must_contain:
  - { path: scanners/collision-scanner.mjs, pattern: "SCANNER = 'COL'" }
  - { path: scanners/scan-orchestrator.mjs, pattern: "scanCollision|COL" }
  - { path: scanners/lib/scoring.mjs, pattern: "COL:" }
commit_message_pattern: "^feat\\(config-audit\\):"
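The detection rules in Step 22b can be illustrated with a small grouping function. This is a sketch under assumptions: the input is a flat list of `{ name, scope, source }` records (a hypothetical shape; the real scanner would build it from enumeratePlugins and findSkillMdFiles), and the built-in case is left out because it depends on the Step 22a verdict.

```javascript
// skills: [{ name: 'review', scope: 'plugin' | 'user', source: 'plugin-a' }, ...]
function findSkillCollisions(skills) {
  const byName = new Map();
  for (const skill of skills) {
    if (!byName.has(skill.name)) byName.set(skill.name, []);
    byName.get(skill.name).push(skill);
  }
  const findings = [];
  for (const [name, entries] of byName) {
    const namespaces = [...new Set(entries.map((e) => `${e.scope}:${e.source}`))].sort();
    if (namespaces.length < 2) continue; // a single source cannot collide with itself
    // User-vs-plugin is medium, plugin-vs-plugin is low (per Step 22b).
    const severity = entries.some((e) => e.scope === 'user') ? 'medium' : 'low';
    findings.push({ scanner: 'COL', name, severity, details: { namespaces } });
  }
  return findings;
}
```

Each finding carries a `details.namespaces` array, so a report can list every conflicting source rather than only the first pair.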

Step 23: beta.1 wrap — CHANGELOG + N1 backward-compat note

Files: CHANGELOG.md
Changes: Add ## [5.0.0-beta.1] entry. Include explicit subsection: ### Known breaking changes — CA-TOK-* glob suppressions in existing .config-audit-ignore files now also match CA-TOK-005 (MCP budget). Document workaround: list CA-TOK-001 CA-TOK-002 CA-TOK-003 explicitly.
Verify: grep -c "CA-TOK-005" CHANGELOG.md → ≥ 1
On failure: revert.
Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-beta.1 + N1 breaking note"

Manifest:

expected_paths: [CHANGELOG.md]
must_contain:
  - { path: CHANGELOG.md, pattern: "5\\.0\\.0-beta\\.1" }
  - { path: CHANGELOG.md, pattern: "CA-TOK-005" }
commit_message_pattern: "^docs\\(config-audit\\):"
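The breaking change noted in Step 23 follows directly from how a glob suppression matches finding IDs. The sketch below is illustrative only: `globToRegExp` and `isSuppressed` are hypothetical helpers, and the actual matching logic behind .config-audit-ignore may differ.

```javascript
// Convert a simple "*" glob into an anchored regular expression.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters except "*"
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// True when any suppression pattern matches the finding ID.
function isSuppressed(findingId, patterns) {
  return patterns.some((pattern) => globToRegExp(pattern).test(findingId));
}

const legacyGlob = ['CA-TOK-*'];
const explicitList = ['CA-TOK-001', 'CA-TOK-002', 'CA-TOK-003'];

console.log(isSuppressed('CA-TOK-005', legacyGlob));   // true: the new MCP budget finding is hidden
console.log(isSuppressed('CA-TOK-005', explicitList)); // false: the workaround keeps it visible
```

Under the same assumption, `CA-COL-001` is not matched by `CA-TOK-*`, which is what the suppression-glob check in Step 22b expects.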

STAGE rc.1 — Knowledge cleanup + tokenizer calibration (M7, M8, N5)

Step 24: M8 — Knowledge-base cleanup (Sonnet-era → Opus 4.7)

Files: knowledge/configuration-best-practices.md, knowledge/anti-patterns.md (if relevant)
Changes: Replace "Keep under 200 lines" framing (line 9) with cache-stability guidance: "Place stable content in the first 30 lines (cache-friendly); volatile content (timestamps, dynamic counts) goes below the cache threshold." Add footnote: "200-line threshold was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure."
Reuses: Existing knowledge file format.
Test first: None (docs).
Verify: ! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md
On failure: revert.
Checkpoint: git commit -m "docs(config-audit): knowledge rensing — Opus 4.7 cache-stability guidance (v5 M8)"

Manifest:

expected_paths: [knowledge/configuration-best-practices.md]
forbidden_paths: []
commit_message_pattern: "^docs\\(config-audit\\):"

Step 25: M7 — Cache-telemetry recipe in knowledge/ + flag

Files: knowledge/cache-telemetry-recipe.md (new), commands/tokens.md, scanners/token-hotspots-cli.mjs, tests/scanners/token-hotspots-cli.test.mjs
Changes:
1. New knowledge file documenting how a user can manually verify cache hit rate from session transcripts (parsing cache_read_input_tokens from transcript JSON; recipe is opt-in, NOT bundled scanner logic — keeps non-goal of "no transcript-parsing as core feature").
2. Add --with-telemetry-recipe flag to token-hotspots-cli.mjs: when present, includes telemetry_recipe_path field in JSON output pointing to the knowledge file. Without the flag, output unchanged. Committed as deliverable, not optional.
3. Update commands/tokens.md next-steps to mention --with-telemetry-recipe and link the recipe.
Reuses: Knowledge-file format from opus-4.7-patterns.md; CLI argv-parsing pattern from posture.mjs.
Test first: Subprocess test: node scanners/token-hotspots-cli.mjs <fixture> --with-telemetry-recipe --json | jq '.telemetry_recipe_path' → non-empty string ending in cache-telemetry-recipe.md.
Verify: [ -f knowledge/cache-telemetry-recipe.md ] AND node --test tests/scanners/token-hotspots-cli.test.mjs → PASS
On failure: revert.
Checkpoint: git commit -m "docs(config-audit): cache-telemetry recipe + --with-telemetry-recipe flag (v5 M7)"

Manifest:

expected_paths:
  - knowledge/cache-telemetry-recipe.md
  - commands/tokens.md
  - scanners/token-hotspots-cli.mjs
  - tests/scanners/token-hotspots-cli.test.mjs
must_contain:
  - { path: scanners/token-hotspots-cli.mjs, pattern: "with-telemetry-recipe" }
commit_message_pattern: "^docs\\(config-audit\\):"
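As an illustration of what the recipe in Step 25 might walk a user through, the sketch below computes a cache hit rate from transcript text. It assumes the transcript is JSON Lines and that each relevant entry carries a `usage` object (either at the top level or under `message`) with the `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields. The transcript layout and the `cacheHitRate` name are assumptions, not something the plan specifies.

```javascript
// Returns cache_read / (cache_read + cache_creation + uncached input), or null if no usage data.
function cacheHitRate(jsonlText) {
  let read = 0;
  let total = 0;
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    const usage = entry?.message?.usage ?? entry?.usage;
    if (!usage) continue;
    const cacheRead = usage.cache_read_input_tokens ?? 0;
    const cacheWrite = usage.cache_creation_input_tokens ?? 0;
    const uncached = usage.input_tokens ?? 0;
    read += cacheRead;
    total += cacheRead + cacheWrite + uncached;
  }
  return total === 0 ? null : read / total;
}
```

Keeping this as a documented recipe rather than scanner logic matches the plan's non-goal of avoiding transcript parsing as a core feature.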

Step 26: N5 — `--accurate-tokens` API calibration

Files: scanners/token-hotspots-cli.mjs, scanners/lib/tokenizer-api.mjs (new), tests/scanners/accurate-tokens.test.mjs (new)
Prerequisites: Node.js >= 18.13 (for mock.method from node:test). Verify with node --version. If older, escalate.
Changes: New helper module tokenizer-api.mjs exporting async callCountTokensApi(text, apiKey). Wraps fetch('https://api.anthropic.com/v1/messages/count_tokens', ...) with:
- 5-second AbortController timeout
- Exponential backoff on 429 (max 3 retries: 1s, 2s, 4s)
- API key MASKED to ${key.slice(0,8)}... in ANY error message and ANY thrown error
- On non-429 HTTP error: throw Error("count_tokens API failed: " + status) — no body included (body may contain the key in echo'd form)
- Required headers: x-api-key, anthropic-version: 2023-06-01, content-type: application/json
Wire --accurate-tokens into token-hotspots-cli.mjs:
- If process.env.ANTHROPIC_API_KEY present: call callCountTokensApi for the top 3 hotspots' content; populate output.calibration = { actual_tokens: <number>, source: 'count_tokens_api', sampled_hotspots: 3 }.
- If absent: output.calibration = { skipped: 'no-api-key' } and warn to stderr "ANTHROPIC_API_KEY not set — skipping API calibration".
Reuses: Existing CLI pattern, env-var reading.
Test first:
- No-API-key case: subprocess with env: { ...process.env, ANTHROPIC_API_KEY: '' }. Assert exit 0, output calibration.skipped === 'no-api-key'.
- With-key case: import { mock } from 'node:test'. Use mock.method(tokenizerApi, 'callCountTokensApi', () => ({ input_tokens: 4200 })). Run CLI in-process (not subprocess — mock can't cross process boundary). Assert output.calibration.actual_tokens === 4200.
- Error masking: stub callCountTokensApi to throw Error("simulated 401 with key sk-ant-FAKEKEY-1234"). Assert that the JSON output and stderr contain sk-ant-F... and NOT FAKEKEY-1234 (mask works).
Verify: node --test tests/scanners/accurate-tokens.test.mjs → PASS
On failure: revert. Most likely causes:
- mock.method not available — check Node version >= 18.13.
- fetch unavailable — fall back to node:https.
Checkpoint: git commit -m "feat(config-audit): --accurate-tokens API calibration (v5 N5)"
SC-6b note: The brief's SC-6b ("byte estimate within ±5% of the Anthropic count_tokens API") cannot be verified by automated tests that use a stub, because the stub returns a constant. SC-6b is therefore a release gate: before tagging v5.0.0 in Step 30, KTG must run --accurate-tokens against a known fixture with a real ANTHROPIC_API_KEY, manually compare calibration.actual_tokens to the byte-estimated tokens for that fixture, and confirm the error is within ±5%. If the error exceeds ±5%, fix the heuristic before tagging.

Manifest:

expected_paths:
  - scanners/token-hotspots-cli.mjs
  - scanners/lib/tokenizer-api.mjs
  - tests/scanners/accurate-tokens.test.mjs
must_contain:
  - { path: scanners/lib/tokenizer-api.mjs, pattern: "count_tokens" }
  - { path: scanners/lib/tokenizer-api.mjs, pattern: "AbortController|signal" }
  - { path: scanners/lib/tokenizer-api.mjs, pattern: "slice\\(0, ?8\\)" }
commit_message_pattern: "^feat\\(config-audit\\):"
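The behaviour listed for tokenizer-api.mjs in Step 26 (timeout, 429 backoff, key masking) could look roughly like the sketch below. Several things here are assumptions rather than part of the plan: the `opts` parameter with injectable `fetchImpl` and `sleep`, the `model` option (the endpoint is assumed to take a `{ model, messages }` body and return `{ input_tokens }`), and the `scrub` helper. Injecting `fetchImpl` is one way to test the with-key path without patching a named ES-module export, which can be awkward to mock.

```javascript
const ENDPOINT = 'https://api.anthropic.com/v1/messages/count_tokens';

// Mask the key to its first 8 characters, as Step 26 requires.
function maskKey(key) {
  return `${key.slice(0, 8)}...`;
}

// Replace every occurrence of the full key in a message with its masked form.
function scrub(message, key) {
  return key ? message.split(key).join(maskKey(key)) : message;
}

async function callCountTokensApi(text, apiKey, opts = {}) {
  const {
    model, // caller-supplied model id (assumed to be required by the endpoint)
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;
  if (!model) throw new Error('model option is required');
  const backoffMs = [1000, 2000, 4000]; // max 3 retries on 429
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 5000); // 5-second timeout
    try {
      const res = await fetchImpl(ENDPOINT, {
        method: 'POST',
        headers: {
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
          'content-type': 'application/json',
        },
        body: JSON.stringify({ model, messages: [{ role: 'user', content: text }] }),
        signal: controller.signal,
      });
      if (res.status === 429 && attempt < backoffMs.length) {
        await sleep(backoffMs[attempt]);
        continue;
      }
      // Status only: the response body is never included in the error.
      if (!res.ok) throw new Error(`count_tokens API failed: ${res.status}`);
      const data = await res.json();
      return data.input_tokens;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      throw new Error(scrub(message, apiKey)); // never let the full key escape
    } finally {
      clearTimeout(timer);
    }
  }
}
```

In a test, `fetchImpl` can be a stub that returns a 429 followed by a 200, and `sleep` can resolve immediately so the retry path runs without real delays.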

Step 27: rc.1 wrap — CHANGELOG entry

Files: CHANGELOG.md
Changes: Add ## [5.0.0-rc.1] summarizing M7, M8, N5.
Verify: grep -c "5.0.0-rc.1" CHANGELOG.md → 1
On failure: revert.
Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-rc.1 entry"

Manifest:

expected_paths: [CHANGELOG.md]
commit_message_pattern: "^docs\\(config-audit\\):"

STAGE release — v5.0.0 final

Step 28: README and CLAUDE.md sync (straggler-sweep)

Files: README.md, CLAUDE.md, commands/help.md, commands/posture.md, commands/config-audit.md, agents/feature-gap-agent.md
Changes: Update all badges and counts:
- Scanners: 9 → 12 (adds CPS, DIS, and COL; TOK is extended rather than new; 13 if the manifest scanner is counted)
- Commands: 17 → 18 (+ manifest)
- Tests: 543 → final count after all steps (run node --test 'tests/**/*.test.mjs' 2>&1 | grep "tests")
- Hooks: unchanged (4)
- Agents: unchanged (6)
- Knowledge: 7 → 8 (+ cache-telemetry-recipe)
- Quality areas: unchanged (8)
Reuses: Self-audit --check-readme from Step 16 to verify completeness.
Test first: node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed' → true
Verify: Same command above.
On failure: retry — find the missing badge with node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.mismatches'.
Checkpoint: git commit -m "docs(config-audit): straggler sweep for v5.0.0 — sync all badge counts"

Manifest:

expected_paths: [README.md, CLAUDE.md]
commit_message_pattern: "^docs\\(config-audit\\):"

Step 29: Version bump + final CHANGELOG

Files: .claude-plugin/plugin.json, CHANGELOG.md, README.md (version badge)
Changes: Bump plugin.json version: 4.0.0 → 5.0.0. Add ## [5.0.0] entry to CHANGELOG with ### Summary (consolidated from alpha/beta/rc entries) and ### Breaking changes (F2 token magnitude jump, F3 severity weighting, N1 suppression backward-compat).
Reuses: v4.0.0 entry format.
Test first: [ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]
Verify: grep "5.0.0" .claude-plugin/plugin.json && grep "## \[5.0.0\]" CHANGELOG.md
On failure: revert.
Checkpoint: git commit -m "chore(config-audit): bump version to 5.0.0"

Manifest:

expected_paths: [.claude-plugin/plugin.json, CHANGELOG.md, README.md]
must_contain:
  - { path: .claude-plugin/plugin.json, pattern: "\"version\": \"5.0.0\"" }
commit_message_pattern: "^chore\\(config-audit\\):"

Step 30: Final self-audit + SC-6b release gate + green tag

Files: —
Changes:
1. Run full test suite. All 543 v4 tests + new tests must pass.
2. Run node scanners/self-audit.mjs --check-readme. Grade must be A; readmeCheck.passed === true.
3. SC-6b release gate (manual): If ANTHROPIC_API_KEY is set, run node scanners/token-hotspots-cli.mjs <known-fixture> --accurate-tokens --json; compare calibration.actual_tokens against the heuristic byte-estimate for the same fixture; ensure delta ≤ ±5%. Document the comparison in the v5.0.0 CHANGELOG entry. If the user opts out of the SC-6b gate (no API key available), document this in CHANGELOG as "SC-6b verification deferred — ±5% tokenizer accuracy unverified."
4. Tag and push.
Reuses: Self-audit from Step 16; CLI from Step 26.
Test first: node --test 'tests/**/*.test.mjs' 2>&1 | tail -5 — all PASS
Verify:
- node --test 'tests/**/*.test.mjs' → all PASS
- node scanners/self-audit.mjs --check-readme --json | jq -r '.overallGrade + " " + (.readmeCheck.passed | tostring)' → "A true"
- SC-6b gate documented (pass or deferred) in CHANGELOG
- git tag config-audit/v5.0.0
On failure: escalate — if test/grade fails, diagnose and add follow-up steps in this plan; do not tag.
Checkpoint: Tag is the equivalent of a commit. After tag: git push origin main && git push origin config-audit/v5.0.0

Manifest:

expected_paths: []
commit_message_pattern: ".*"
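The manual SC-6b comparison in Step 30 reduces to a relative-error check. A minimal sketch, with the hypothetical helper name `sc6bGate` and the assumption that the error is measured relative to the API's count:

```javascript
// actualTokens: calibration.actual_tokens from the API; estimatedTokens: the byte-based heuristic.
function sc6bGate(actualTokens, estimatedTokens, tolerance = 0.05) {
  if (!(actualTokens > 0)) throw new Error('actualTokens must be a positive number');
  const delta = (estimatedTokens - actualTokens) / actualTokens;
  return { delta, passed: Math.abs(delta) <= tolerance };
}

console.log(sc6bGate(4200, 4000).passed); // true: about 4.8% under the API count
console.log(sc6bGate(4200, 3900).passed); // false: about 7.1% under the API count
```

Recording `delta` alongside the pass/fail result gives the CHANGELOG entry a concrete number to cite.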

Manifest — objective completion predicate

Every step has a Manifest block with expected_paths, must_contain patterns, and a regex commit_message_pattern. Steps that touch only docs may have empty must_contain.

Failure recovery rules

revert — git checkout -- <files>, restore working tree, do not proceed.
retry — try the alternative described in On failure, revert if still failing.
escalate — stop entirely; human review required (used at Steps 22a, 26, and 30).

Alternatives Considered

Approach	Pros	Cons	Why rejected
Keep N1 (`CA-TOK-005`) inside `token-hotspots.mjs` (chosen)	Lowest friction; preserves TOK ID namespace; consistent with patterns A-D	Counter is positional; `CA-TOK-005` ID assigned by order of detection, not semantic	Acceptable trade-off; tests assert on finding presence and severity, not exact ID number. The brief specifies CA-TOK-005, which can be enforced by detection order.
Standalone `mcp-budget-scanner.mjs` with prefix `MCB`	Clean separation; new ID namespace; testable in isolation	Diverges from brief's `CA-TOK-005` spec; requires new SCANNER_AREA_MAP entry	Brief explicitly names CA-TOK-005; standalone scanner would force a brief revision.
Defer F3 severity-weighting to v5.1.0	Reduces alpha.1 risk of breaking baselines	Means alpha.1 ships only 4 of 7 must-fix items; brief's primary goal "reality-based token-optimization" depends on F3	Brief lists F3 as must-fix and ties it directly to v5.0.0 success criteria.
Bundle N5 (live tokenizer) into v5.1.0	Removes API-key risk surface from v5.0.0	User explicitly confirmed N5 in v5.0.0 (Avklaringer 2026-05-01); features list specifies opt-in via flag, mitigating risk	User confirmed scope explicitly.
Use external lib like `tiktoken` for N5	Higher accuracy	Violates zero-deps convention (CLAUDE.md: "null avhengigheter", i.e. zero dependencies)	Convention is a hard rule.

Test Strategy

Framework: node:test + node:assert/strict
Existing patterns:
- Scanner tests: runScanner(fixtureName) helper that resets counter + runs full discovery+scan
- Lib tests: factory functions (makeScannerResult) for in-memory input data
- Lib integration: buildRichRepo() tmpdir with patched HOME
- CLI tests: execFile/exec subprocess + parse stdout JSON
New tests in this plan: approximately 60 new test cases across 13 test files
Coverage gating: Per revised SC-10 — every F-fix and M-fix has ≥1 test; every new scanner (N1-N4, N6) has ≥1 fixture-backed test; F3 has severity-mix table; N5 has both API-key-present and -absent cases.

Tests to write

Type	File	Verifies	Model test
Unit	`tests/lib/severity.test.mjs`	WEIGHTS exported	existing severity tests
Unit	`tests/lib/scoring.test.mjs`	severity-weighted area score	makeScannerResult pattern
Unit	`tests/lib/active-config-reader.test.mjs`	'mcp' kind differentiation	existing estimateTokens cases
Integration	`tests/lib/active-config-reader.test.mjs`	MCP servers report >500 tokens	buildRichRepo extension
Scanner	`tests/scanners/token-hotspots.test.mjs`	F1, F4, F5, F7, M2, M4, N1	runScanner pattern
Scanner	`tests/scanners/settings-validator.test.mjs`	M6 additionalDirectories	existing validator tests
Scanner	`tests/scanners/hook-validator.test.mjs`	M5 verbose hook output	existing hook tests
Scanner	`tests/scanners/cache-prefix.test.mjs` (new)	N3 mid-section volatility	runScanner pattern
Scanner	`tests/scanners/disabled-in-schema.test.mjs` (new)	N4 deny+allow conflict	runScanner pattern
Scanner	`tests/scanners/collision.test.mjs` (new)	N6 cross-plugin collision	multi-plugin fixture
CLI	`tests/scanners/manifest.test.mjs` (new)	N2 manifest CLI	execFile pattern
CLI	`tests/scanners/accurate-tokens.test.mjs` (new)	N5 API + no-API paths	mock.method first use
Self-audit	`tests/scanners/self-audit.test.mjs`	F6 --check-readme shape	existing runSelfAudit test

Risks and Mitigations

Priority	Risk	Location	Impact	Mitigation
Critical	F3 silently degrades grades for users with v4 baselines	scoring.mjs:184 (rewritten in Step 2)	Drift comparisons produce wrong deltas	Add `scoringVersion: 'v5'` to envelope meta (Step 2). diff-engine warns on cross-version compare in v5.0.1 patch (out of scope here)
Critical	F2 jump from 15 → 5000+ tokens per MCP collapses Token Efficiency grades	Step 5	User's Grade A becomes Grade C overnight	CHANGELOG explicit BREAKING note (Step 9, 23, 29). Document in `commands/posture.md` next-steps
Critical	N5 API-key leak via error message or JSON output	Step 26	Key persisted in session files / logs	`tokenizer-api.mjs` masks key to first 8 chars; never includes key in JSON; explicit test for masking
High	F3 baseline-all-a fixture may fail	Step 3	Test suite blocks at alpha.1	Step 3 dedicated to fixture audit; `posture-grade-stability.test.mjs` updated if needed
High	N1 tool-count threshold flagging real-world MCP servers (GitHub MCP has 28+ tools)	Step 18	False-positive findings train users to suppress	Tiered severity: <20=none, 20-49=low, 50-99=medium, 100+=high (Step 18)
High	N6 namespace confusion (plugin-skill vs user-skill vs built-in)	Steps 22a, 22b	Every plugin with skill named `review` flagged	Step 22a research spike settles the namespace model before 22b is built; built-in collisions are reported only if 22a confirms them; documented in scanner comment
High	N5 rate-limit (1000/min) exhausted in CI loop	Step 26	Mid-scan crash; user's main quota impacted	3 retries with exponential backoff; 5-sec timeout; `--accurate-tokens-max-files` future flag (out of scope)
Medium	Cascade-volatility false positives on inline date references	Step 20	Noise findings	Keep line-anchored regex; negative fixture for inline dates
Medium	F6 self-audit fragile to README formatting changes	Step 16	Hard-blocks every release	Use exact line-anchored substring (not URL regex); badge mismatch is `low` severity (advisory, not fail)
Medium	findingCounter is process-global; new scanners interfere if they call `finding()` outside orchestrator	All N* steps	Wrong IDs in tests	All new scanners follow single-`scan()` entry; no nested calls
Medium	Suppression backward-compat: `CA-TOK-*` glob suppresses CA-TOK-005	Step 18+23	Users miss highest-value finding	Documented in CHANGELOG (Step 23). One-time runtime warning is out of scope (v5.0.1 candidate)
Low	Network failure on N5 hangs 30s	Step 26	Bad UX	5-sec AbortController timeout, immediate stderr message
Low	Knowledge-base cleanup breaks Sonnet-version users	Step 24	Outdated guidance	Reframe with a footnote rather than deleting
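The tiered N1 severity named in the risk table, together with the null branch described in Revision 7, can be written as a small lookup. The function name `mcpBudgetSeverity` and the returned object shape are assumptions for illustration.

```javascript
// Maps an MCP server's tool count to a severity tier; null means "no finding".
function mcpBudgetSeverity(toolCount) {
  if (toolCount === null || toolCount === undefined) {
    return { severity: 'low', note: 'tool count unknown' };
  }
  if (toolCount >= 100) return { severity: 'high' };
  if (toolCount >= 50) return { severity: 'medium' };
  if (toolCount >= 20) return { severity: 'low' };
  return null; // under 20 tools: nothing to report
}

console.log(mcpBudgetSeverity(28)); // { severity: 'low' }
console.log(mcpBudgetSeverity(19)); // null
```

With these tiers a server exposing 28 tools produces a low-severity finding rather than a high one, which is the plan's answer to the false-positive risk.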

Assumptions

#	Assumption	Why unverifiable	Impact if wrong
1	Anthropic `count_tokens` endpoint accepts plain text payload and returns `{input_tokens: number}`	Brief premise; not tested in this codebase	N5 produces wrong calibration values; falls back gracefully
2	MCP servers expose tool count via `tools/list` or package.json `tools` field	MCP spec is evolving; servers vary	M1/N1 detection silently returns null; CA-TOK-005 finding may not fire on real servers; baseline behavior is "no finding" not "wrong finding"
3	`readActiveConfig` is performant enough to call from TOK on large repos	Untested at scale	TOK scanner becomes slow; fix: cache `activeConfig` in scan-orchestrator and pass to scanners (out of scope)
4	`posture-grade-stability.test.mjs` baseline-all-a fixture is genuinely info-only after v4 work	Assumed from naming + git history	Step 3 catches and fixes
5	Cross-plugin collision detection model (plugin-namespaced skills don't collide) is correct	Documented in N6 description but not in Anthropic specs	False positives/negatives on plugin-namespacing; verified via test fixture

Verification

End-to-end checks after Step 30 completes (these mirror the brief's revised SCs):

SC-10 (revised): node --test 'tests/**/*.test.mjs' → all green AND original 543 tests still pass AND ≥ 1 fixture-backed test exists per new scanner function (F1, F2, F3, M1, M2, M4, M5, M6, N1-N4, N6) — verified by file presence:
- tests/lib/active-config-reader.test.mjs — F2 'mcp' kind cases
- tests/lib/scoring.test.mjs — F3 severity-mix cases
- tests/scanners/token-hotspots.test.mjs — F1, F4, F5, F7, M2, M4, N1 cases
- tests/scanners/settings-validator.test.mjs — M6 cases
- tests/scanners/hook-validator.test.mjs — M5 cases
- tests/scanners/manifest.test.mjs — N2 cases (new file)
- tests/scanners/cache-prefix.test.mjs — N3 cases (new file)
- tests/scanners/disabled-in-schema.test.mjs — N4 cases (new file)
- tests/scanners/collision.test.mjs — N6 cases (new file)
- tests/scanners/accurate-tokens.test.mjs — N5 cases (new file)
- tests/scanners/self-audit.test.mjs — F6 cases
SC-1 (F1): ! grep -q "void readActiveConfig" scanners/token-hotspots.mjs AND grep -q "readActiveConfig(targetPath" scanners/token-hotspots.mjs
SC-2 (F2): grep -q "kind === 'mcp'" scanners/lib/active-config-reader.mjs
SC-3 (F3): grep -q "import.*WEIGHTS.*riskScore\|import.*riskScore.*WEIGHTS" scanners/lib/scoring.mjs
SC-4 (F6): node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed' → true
SC-5 (N1): node --test --test-name-pattern="mcp-budget" tests/scanners/token-hotspots.test.mjs → PASS
SC-6a (N2): node scanners/manifest.mjs . --json | jq '.sources | length' → > 0 AND output sorted DESC by estimated_tokens
SC-6b (N5): Manual gate — release-time verification of ±5% accuracy with real API key (Step 30); pass OR documented deferral in CHANGELOG
SC-7 (N3): node --test tests/scanners/cache-prefix.test.mjs → PASS
SC-8 (N6): node --test tests/scanners/collision.test.mjs → PASS
SC-9 (M8): ! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md
SC-11 (N5): Both API-key-present and -absent paths covered in tests/scanners/accurate-tokens.test.mjs
F5 cleanup: ! grep -q "detectSonnetEra\|CA-TOK-004" scanners/token-hotspots.mjs commands/tokens.md knowledge/opus-4.7-patterns.md
Release: [ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]
Git: git log --oneline -50 | grep -c "v5" ≥ 5 (one per stage)
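SC-6a asks that the manifest output be sorted in descending order of `estimated_tokens`, but the jq command only checks the length. A small helper along the following lines could cover the ordering half of the check. It assumes `sources` is an array of objects with a numeric `estimated_tokens` field, and the name `isSortedDesc` is hypothetical.

```javascript
// True when every entry is no larger than the one before it.
function isSortedDesc(sources) {
  return sources.every(
    (entry, i) => i === 0 || sources[i - 1].estimated_tokens >= entry.estimated_tokens,
  );
}

console.log(isSortedDesc([{ estimated_tokens: 5000 }, { estimated_tokens: 1200 }, { estimated_tokens: 1200 }])); // true
console.log(isSortedDesc([{ estimated_tokens: 100 }, { estimated_tokens: 900 }]));                               // false
```

Ties are accepted, so two sources with the same estimate do not fail the check.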

Estimated Scope

Files to modify: 18 (incl. commands/tokens.md and knowledge/opus-4.7-patterns.md per Step 8b)
Files to create: ~26 (5 new scanners + 1 lib + 1 command + 13 fixture dirs + 5 new test files + 1 research doc)
Steps: 31 (was 30; added Step 8b for CA-TOK-004 reference cleanup, Step 22a for namespace research spike)
Complexity: high (cross-cutting changes across scoring, tokenization, scanner registry, knowledge base)

Execution Strategy

The plan has 31 steps grouped into 5 sessions matching release stages. Sessions are sequential: alpha.1 must land before alpha.2, and so on. Within a session some steps are parallel-safe, but for clarity all run sequentially.

Session 1 — alpha.1 (TOK cleanup + scoring fix)

Steps: 1-9 (includes Step 8b for CA-TOK-004 reference cleanup)
Wave: 1
Depends on: none
Scope fence:
- Touch: scanners/lib/severity.mjs, scanners/lib/scoring.mjs, scanners/lib/active-config-reader.mjs, scanners/token-hotspots.mjs, tests/lib/{severity,scoring,active-config-reader}.test.mjs, tests/scanners/token-hotspots.test.mjs, tests/scanners/posture-grade-stability.test.mjs, tests/fixtures/tok-active-config/, commands/tokens.md (Step 8b), knowledge/opus-4.7-patterns.md (Step 8b), CHANGELOG.md
- Never touch: any scanner other than TOK; any new scanner files (those land later)

Session 2 — alpha.2 (structural gaps + README self-audit)

Steps: 10-17
Wave: 2
Depends on: Session 1
Scope fence:
- Touch: scanners/{token-hotspots,settings-validator,hook-validator,self-audit}.mjs, scanners/lib/active-config-reader.mjs, tests/scanners/{settings-validator,hook-validator,self-audit,token-hotspots}.test.mjs, tests/fixtures/{additional-dirs-many,large-cascade,skill-bloated,mcp-tool-heavy,hooks-verbose,readme-desynced}/, CHANGELOG.md
- Never touch: scan-orchestrator (no new scanners yet); knowledge/ (later)

Session 3 — beta.1 (new scanners)

Steps: 18, 19, 20, 21, 22a (research spike), 22b (collision scanner), 23
Wave: 3
Depends on: Session 2
Scope fence:
- Touch: scanners/token-hotspots.mjs (N1), scanners/{manifest,cache-prefix-scanner,disabled-in-schema-scanner,collision-scanner}.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs (SCANNER_AREA_MAP only), commands/manifest.md (new), 5 new test files, 4 new fixture dirs, docs/v5-namespace-research.md (gitignored), CHANGELOG.md
- Never touch: any other scanner code

Session 4 — rc.1 (knowledge + tokenizer)

Steps: 24-27
Wave: 4
Depends on: Session 3
Scope fence:
- Touch: knowledge/{configuration-best-practices,cache-telemetry-recipe}.md, commands/tokens.md, scanners/token-hotspots-cli.mjs, scanners/lib/tokenizer-api.mjs (new), tests/scanners/accurate-tokens.test.mjs (new), CHANGELOG.md
- Never touch: scanner code beyond CLI

Session 5 — release (v5.0.0 final)

Steps: 28-30
Wave: 5
Depends on: Session 4
Scope fence:
- Touch: README.md, CLAUDE.md, commands/{help,posture,config-audit}.md, agents/feature-gap-agent.md, .claude-plugin/plugin.json, CHANGELOG.md
- Never touch: any code; this is documentation + tag

Execution Order

Wave 1: Session 1 (alpha.1)
Wave 2: Session 2 (alpha.2) — after Wave 1
Wave 3: Session 3 (beta.1) — after Wave 2
Wave 4: Session 4 (rc.1) — after Wave 3
Wave 5: Session 5 (release) — after Wave 4

Grouping rules applied

Steps sharing files → same session (e.g., all TOK changes in Session 1+2)
New-scanner steps → Session 3 (post structural)
Knowledge/CLI changes → Session 4 (post all scanners)
Doc-sync + version-bump → Session 5 (last, depends on all counts being final)

Plan Quality Score

Dimension	Weight	Score	Notes
Structural integrity	0.15	88	Sessions ordered by dependencies; Step 22a research spike resolves namespace ambiguity before 22b
Step quality	0.20	85	All TBDs resolved; F7 explicit decision on Pattern C; Step 16 fs-counted not README-counted
Coverage completeness	0.20	92	All 22 brief items mapped; F5 documentation cleanup added (8b); SC-6b release gate documented
Specification quality	0.15	86	File paths verified; manifest must_not_contain replaces vacuous regex; Node version pinned for N5
Risk & pre-mortem	0.15	88	13 risks; namespace research spike resolves N6 mitigation circularity
Headless readiness	0.10	84	All steps have On Failure + Checkpoint; manifest blocks updated to use must_not_contain where appropriate
Manifest quality	0.05	78	must_contain + must_not_contain; fixture file paths fully enumerated for Step 6/14/18
Weighted total	1.00	87.0	Grade: B+

Adversarial review:

Plan critic: initial verdict REPLAN (5 blockers, 8 majors, 7 minors, score 67.7); all blockers + majors addressed in revisions
Scope guardian: initial verdict MIXED (4 scope-gaps); all 4 gaps addressed in revisions

Revisions

#	Finding	Severity	Resolution
1	Plan header "TBD"	blocker	Updated to "B+ (84/100)" after re-scoring
2	Step 25 "TBD if needed" flag	blocker	Committed `--with-telemetry-recipe` flag as deliverable; added test
3	Step 8 manifest `^(?!.*detectSonnetEra)` is logically vacuous	blocker	Replaced with `must_not_contain` field; added explicit grep verify
4	Step 6 fixture incomplete in expected_paths	blocker	Enumerated 4 fixture files: `.mcp.json`, `CLAUDE.md`, `.claude-plugin/plugin.json`, `commands/sample.md`
5	CA-TOK-004 references in `commands/tokens.md` and `knowledge/opus-4.7-patterns.md` after F5	blocker	Added Step 8b: dedicated cleanup step with grep verify
6	Step 12 missing test for `claudeMd.estimatedTokens` field shape	major	Added assertion to Step 6 test (item c)
7	Step 18 missing toolCount=null handling	major	Added explicit `null` branch with `low` severity + "tool count unknown" message
8	Step 3 ordering vs Step 10 grade-stability re-invalidation	major	Step 10's table-driven test now checks per-finding severity; Step 3 audits remain at fixture-level grade
9	N6 namespace assumption is circular mitigation	major	Added Step 22a research spike with explicit verdict file before 22b implementation
10	Step 16 negative-case test depends on Step 28 docs sweep	major	Step 16 now uses filesystem counts as truth (not README); fs-counted detection breaks the cycle
11	Step 19 `marketplace-large` fixture issue with manifest CLI	major	Added two test paths: real-config (plugin root) + fixture-based with `buildRichRepo` helper
12	Step 26 mock.method Node version requirement	major	Added prerequisite check: Node >= 18.13; documented in step + escalation path
13	estimateTokens kind inconsistency between discovery and readActiveConfig paths	major	Step 6 unifies: prefer readActiveConfig data for MCP/skills/plugins; discovery only for files not covered
14	F7 Pattern C left "unchanged" without rationale	scope-gap	Step 10 now explicitly recalibrates Pattern C: `medium` → `low` with reason; table-driven test asserts
15	M7 `--with-telemetry-recipe` flag was conditional	scope-gap	Same as Revision 2 — committed as deliverable
16	SC-6b ±5% accuracy unprovable in automation	scope-gap	Step 30 added manual release gate with documented deferral path
17	SC-10 verification used old "≥600 tests" threshold	scope-gap	Verification section rewritten to per-feature coverage requirement
18-24	Various minors (docs file naming, manifest enumeration, CHANGELOG specifics)	minor	Addressed in their respective steps


Step 4: Add `'mcp'` kind to `estimateTokens` (F2 — function side)

Step 5: Migrate MCP/hook callers to use `'mcp'` kind (F2 — caller side)

Step 6: Wire `readActiveConfig` into TOK scanner (F1)

Step 7: Remove `take` dead-code and hotspot padding (F4)

Step 8: Remove Pattern D `detectSonnetEra` (F5)

Step 11: M6 — `additionalDirectories` in KNOWN_KEYS + threshold

Step 16: F6 — `self-audit --check-readme` flag

Step 26: N5 — `--accurate-tokens` API calibration