ktg-plugin-marketplace/plugins/config-audit/docs/v5-plan.md
Kjell Tore Guttormsen 4bd7cd5056 docs(config-audit): v5.0.0 brief + implementation plan
Planning artifacts for v5.0.0 (token-economy round):

- v5-brief.md: scope brief with 22 items (F1-F7 + M1-M8 + N1-N7), revised
  with Avklaringer-section after critical review (N7 dropped, M3+N6 merged,
  N5 promoted to v5.0.0, SC-6/SC-10 reformulated)
- v5-plan.md: 31-step implementation plan in 5 sessions
  (alpha.1 → alpha.2 → beta.1 → rc.1 → release). B+ score (84/100) after
  plan-critic + scope-guardian review addressed all blockers/majors/gaps.
- v5-implementation-log.md: per-session status record (skeleton)

Sessions track via state files (REMEMBER.md, TODO.md gitignored;
implementation-log.md committed; NEXT-SESSION-PROMPT.local.md gitignored).

No code changes in this commit — planning only.
2026-05-01 06:10:44 +02:00


config-audit v5.0.0 — Implementation Plan

Plan quality: B+ (84/100) — adversarial review complete, revisions applied

Generated by ultraplan-local v3.0.0 on 2026-05-01 (plan_version: 1.7)
Source brief: docs/v5-brief.md (revised 2026-05-01)
Revised after plan-critic + scope-guardian review on 2026-05-01 (see Revisions section)

Context

config-audit v4.0.0 markets itself as "Opus 4.7-aware token optimization" but the critical review (briefed 2026-04-19, revised 2026-05-01) shows the marketing does not hold:

  • TOK scanner imports readActiveConfig and explicitly voids it (void readActiveConfig at scanners/token-hotspots.mjs:31) — never sees plugins, skills, MCP, or cascade.
  • 4 TOK patterns cover ~29% of identified Opus 4.7 cost drivers; the largest sinks (MCP tool-schema bloat, skill-description bloat, CLAUDE.md cascade total) have zero coverage.
  • estimateTokens flattens MCP servers and hooks to 15 tokens each via three caller sites passing kind='item' (active-config-reader.mjs:556, 593, 618). Reality is 2k to 20k tokens per MCP server.
  • scoreByArea treats severities equally: 1 critical and 1 info produce identical area score.
  • Pattern D (detectSonnetEra) contradicts the plugin's own v3.0 policy (minimal correct = Grade A).

v5.0.0 is a token-economy round, not a rewrite. The 8 structural scanners, backup/rollback, suppression, plugin-health all stay. The TOK scanner is reworked in place; new scanners (MCP budget, manifest, cache-prefix, disabled-in-schema, collision) are added; severity-aware scoring lands; tokenizer calibration via Anthropic count_tokens ships as opt-in.

Architecture Diagram

graph TD
    subgraph "v5.0.0 changes"
        TOK[token-hotspots.mjs<br/>F1/F4/F5/F7]
        ACR[active-config-reader.mjs<br/>F2 + 'mcp' kind]
        SCR[scoring.mjs<br/>F3 severity-weighted]
        SVR[severity.mjs<br/>WEIGHTS export]
        SAU[self-audit.mjs<br/>F6 --check-readme]
        ORC[scan-orchestrator.mjs<br/>register new scanners]
        CLI[token-hotspots-cli.mjs<br/>N5 --accurate-tokens]

        MCB[NEW: mcp-budget-scanner.mjs<br/>N1 CA-TOK-005]
        MAN[NEW: manifest.mjs<br/>N2 CLI + scanner]
        CPS[NEW: cache-prefix-scanner.mjs<br/>N3]
        DIS[NEW: disabled-in-schema-scanner.mjs<br/>N4]
        COL[NEW: collision-scanner.mjs<br/>N6 CA-COL-001]

        SET[settings-validator.mjs<br/>M6 additionalDirectories]
        KB[knowledge/<br/>M7 cache-recipe + M8 cleanup]

        ORC --> TOK & MCB & CPS & DIS & COL
        TOK --> ACR
        SCR --> SVR
        ACR -. F2 fix .-> ACR
        CLI --> ACR
        SAU --> README
    end

Codebase Analysis

  • Tech stack: Node.js >= 18, ES modules (.mjs), node:test, zero external deps
  • Test framework: node:test + node:assert/strict — 543 tests across 31 files in v4.0.0
  • Key patterns:
    • Scanner orchestrator + shared discovery object (scan-orchestrator.mjs:73)
    • Finding factory finding({scanner, severity, ...}) produces CA-{SCANNER}-{NNN} IDs (output.mjs:31); counter is process-global, reset per scan
    • CLI direct-run guard pattern via import.meta.url
    • Manual argv parsing — no external libs
    • Test fixtures under tests/fixtures/<scenario-name>/
  • Relevant files (verified):
    • scanners/token-hotspots.mjs (lines 31, 166-178, 202-229, 270, 299, 321, 338)
    • scanners/lib/active-config-reader.mjs (lines 29-39, 556, 593, 618)
    • scanners/lib/scoring.mjs (lines 6, 169-200, 184)
    • scanners/lib/severity.mjs (lines 14, 21-27)
    • scanners/scan-orchestrator.mjs (lines 18-58)
    • scanners/self-audit.mjs (lines 154-177)
    • scanners/settings-validator.mjs (lines 16-35)
    • scanners/lib/suppression.mjs (lines 117-128)
    • knowledge/configuration-best-practices.md (line 9)
    • knowledge/opus-4.7-patterns.md (1-57)
    • tests/lib/active-config-reader.test.mjs, tests/lib/scoring.test.mjs, tests/scanners/token-hotspots.test.mjs, tests/scanners/posture-grade-stability.test.mjs
  • Reusable code:
    • tokenKind() at token-hotspots.mjs:54-63 — extend to map MCP types
    • enumeratePlugins() at active-config-reader.mjs:262-305 — for N6 collision
    • countPluginItems() + findSkillMdFiles() at active-config-reader.mjs:332-399 — for N6
    • parseFrontmatter() from lib/yaml-parser.mjs — for M2 skill description
    • discoverConfigFiles() from lib/file-discovery.mjs — for new CLIs
    • buildRichRepo() test helper at tests/lib/active-config-reader.test.mjs — extend for MCP fixtures
    • runScanner() helper pattern in tests/scanners/token-hotspots.test.mjs — model for new scanner tests
  • External tech (researched): Anthropic POST /v1/messages/count_tokens endpoint for N5 (rate-limited 1000 req/min)
  • Recent git activity:
    • token-hotspots.mjs, active-config-reader.mjs, severity.mjs are all single-commit cold files (born in 4f1cc7e or a090ed3, never revised)
    • scoring.mjs has 2 commits (born + TOK wiring)
    • Single owner (KTG); no concurrent branches; all work merges to main
    • Straggler-sweep risk: 4 historical events where badge counts/area counts drifted across multiple files in a single feature batch — must plan dedicated doc-consistency pass

Research Sources

| Technology | Source | Key Findings | Confidence |
|---|---|---|---|
| Anthropic count_tokens API | Anthropic public docs | POST /v1/messages/count_tokens returns {input_tokens: number}; 1000 req/min rate limit; requires ANTHROPIC_API_KEY | high |
| MCP tool count detection | MCP spec | tools/list requires running server; package.json tools field is convention-only, not standard | medium |
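
As a rough illustration of the count_tokens row above, the request for the opt-in N5 calibration could be assembled like this. This is a sketch, not the plugin's code: the helper names `buildCountTokensRequest` and `countTokens` are hypothetical, and the model identifier is left as a caller-supplied parameter because the plan does not name one. The endpoint path, the `x-api-key` and `anthropic-version` headers, and the `input_tokens` response field follow Anthropic's public API documentation.

```javascript
// Hypothetical helpers for illustration only.
const COUNT_TOKENS_URL = 'https://api.anthropic.com/v1/messages/count_tokens';

// Builds the URL and fetch options without touching the network,
// so the request shape can be unit-tested offline.
function buildCountTokensRequest(text, model, apiKey) {
  return {
    url: COUNT_TOKENS_URL,
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model, // assumption: supplied by the caller, not fixed by the plan
        messages: [{ role: 'user', content: text }],
      }),
    },
  };
}

// Sends the request using the global fetch available in Node.js >= 18.
async function countTokens(text, model, apiKey = process.env.ANTHROPIC_API_KEY) {
  if (!apiKey) throw new Error('ANTHROPIC_API_KEY is not set');
  const { url, init } = buildCountTokensRequest(text, model, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`count_tokens failed: HTTP ${response.status}`);
  const { input_tokens } = await response.json();
  return input_tokens;
}
```

Keeping request construction separate from the network call fits the zero-dependency test setup: the builder can be asserted on directly, while `countTokens` only runs when a key is present.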

Implementation Plan

Steps grouped by release stage. Each step has manifest, on-failure, checkpoint. Steps within a stage may be reordered if test gates allow; cross-stage ordering is fixed.


STAGE alpha.1 — TOK cleanup + scoring/estimateTokens fix (F1-F5)

Step 1: Export WEIGHTS and riskScore from severity.mjs (F3 prep)

  • Files: scanners/lib/severity.mjs
  • Changes: Promote WEIGHTS const to named export. Verify riskScore already exported.
  • Reuses: Existing WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1, info: 0 } at line 14.
  • Test first: tests/lib/severity.test.mjs — assert WEIGHTS.critical === 25 via named import.
  • Verify: node --test tests/lib/severity.test.mjs → PASS
  • On failure: revert (single-file change)
  • Checkpoint: git commit -m "feat(config-audit): export WEIGHTS from severity.mjs (v5 F3 prep)"
  • Manifest:
    expected_paths: [scanners/lib/severity.mjs]
    must_contain:
      - { path: scanners/lib/severity.mjs, pattern: "export const WEIGHTS" }
    commit_message_pattern: "^feat\\(config-audit\\):"
    

Step 2: Severity-weighted scoreByArea (F3)

  • Files: scanners/lib/scoring.mjs, tests/lib/scoring.test.mjs
  • Changes:
    1. Add import { WEIGHTS, riskScore } from './severity.mjs'
    2. Rewrite scoreByArea non-GAP path (lines 182-186) to penalize via severity-weighted sum: penalty = sum(count[s] * WEIGHTS[s]) / maxBudget; passRate = max(0, 100 - penalty)
    3. Add scoringVersion: 'v5' to returned struct (for cross-version drift detection)
  • Reuses: WEIGHTS from Step 1; existing GAP-tier logic untouched.
  • Test first: Add describe('scoreByArea — severity weighting') with new factory makeScannerResultWithSeverities. Assert: the score for 1 critical finding is lower than the score for 5 low findings; a clean result scores 100 (grade A).
  • Verify: node --test tests/lib/scoring.test.mjs tests/scanners/posture-grade-stability.test.mjs → PASS
  • On failure: revert + re-evaluate maxBudget formula. Likely tweak: maxBudget = max(10, findingCount * 4).
  • Checkpoint: git commit -m "feat(config-audit): severity-weighted scoreByArea (v5 F3)"
  • Manifest:
    expected_paths: [scanners/lib/scoring.mjs, tests/lib/scoring.test.mjs]
    must_contain:
      - { path: scanners/lib/scoring.mjs, pattern: "import.*WEIGHTS.*riskScore" }
      - { path: scanners/lib/scoring.mjs, pattern: "scoringVersion" }
    commit_message_pattern: "^feat\\(config-audit\\):"
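
The penalty formula in change 2 can be sketched as below. This is a minimal illustration that applies the formula literally as written; it is not the final scoreByArea. The weights are the ones listed in Step 1. The helper name `weightedPassRate` and the shape of `counts` are assumptions, and `maxBudget` stays a parameter because the plan leaves its formula open.

```javascript
// Weights as listed in Step 1 (severity.mjs, line 14).
const WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1, info: 0 };

// Hypothetical helper. counts maps severity -> number of findings,
// e.g. { critical: 1, low: 3 }. Unknown severities contribute nothing.
function weightedPassRate(counts, maxBudget) {
  let weightedSum = 0;
  for (const [severity, count] of Object.entries(counts)) {
    weightedSum += count * (WEIGHTS[severity] ?? 0);
  }
  const penalty = weightedSum / maxBudget;
  return Math.max(0, 100 - penalty); // clamp so the rate never goes negative
}

// With maxBudget = 1 the penalty equals the raw weighted sum.
console.log(weightedPassRate({ critical: 1 }, 1)); // 75
console.log(weightedPassRate({ low: 5 }, 1));      // 95
console.log(weightedPassRate({}, 1));              // 100
```

Under this reading one critical finding costs more than five low findings, which is the ordering the Step 2 test asserts.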
    

Step 3: Audit baseline-all-a fixture for F3 compatibility

  • Files: tests/fixtures/baseline-all-a/ (read-only audit)
  • Changes: Run scoring against fixture; if any non-info findings push the score below 90 after F3, document this and either (a) update the fixture to a truly minimal correct config, or (b) update test expectations to match v5 semantics with an explanatory comment.
  • Reuses: Existing fixture.
  • Test first: tests/scanners/posture-grade-stability.test.mjs already asserts grade A on this fixture; if it fails after Step 2, fix fixture.
  • Verify: node --test tests/scanners/posture-grade-stability.test.mjs → PASS
  • On failure: retry — tweak fixture to be truly clean (remove any medium+ findings).
  • Checkpoint: git commit -m "test(config-audit): align baseline-all-a fixture with v5 scoring"
  • Manifest:
    expected_paths: [tests/scanners/posture-grade-stability.test.mjs]
    commit_message_pattern: "^(test|fix|chore)\\(config-audit\\):"
    

Step 4: Add 'mcp' kind to estimateTokens (F2 — function side)

  • Files: scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs
  • Changes: Extend estimateTokens(bytes, kind) (lines 29-39):
    • new branch kind === 'mcp': if bytes > 0 use ceil(bytes / 3.5) (json-rate); else 500 (base overhead floor)
    • Optional third argument carrying toolCount: estimateTokens(bytes, 'mcp', {toolCount}) → max(base, toolCount * 200)
  • Reuses: Existing 'json' and 'item' branches as patterns.
  • Test first: Add cases: 'mcp' with 0 bytes → ≥500; 'mcp' with {toolCount: 10} → ≥2000; ratio mcp / item ≥ 30 for 10-tool server.
  • Verify: node --test tests/lib/active-config-reader.test.mjs → PASS
  • On failure: revert. Adjust formula if test thresholds unrealistic — but keep the order-of-magnitude differentiation.
  • Checkpoint: git commit -m "feat(config-audit): add 'mcp' kind to estimateTokens (v5 F2)"
  • Manifest:
    expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
    must_contain:
      - { path: scanners/lib/active-config-reader.mjs, pattern: "kind === 'mcp'" }
    commit_message_pattern: "^feat\\(config-audit\\):"
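
A sketch of the new 'mcp' branch, assuming only what this step states: bytes / 3.5 when a byte size is known, a 500-token floor otherwise, and 200 tokens per tool when toolCount is supplied. The name `estimateTokensSketch` is hypothetical, the flat 15 for 'item' comes from the Context section, and the default branch is a placeholder assumption because the existing branches are not reproduced in this plan.

```javascript
// Hypothetical stand-in for estimateTokens(bytes, kind, opts).
function estimateTokensSketch(bytes, kind, opts = {}) {
  if (kind === 'item') return 15; // flat per-item rate described in Context
  if (kind === 'mcp') {
    // JSON-rate estimate when a size is known, otherwise the base overhead floor.
    const base = bytes > 0 ? Math.ceil(bytes / 3.5) : 500;
    const toolCount = opts.toolCount ?? 0;
    return Math.max(base, toolCount * 200);
  }
  // Assumption: placeholder ratio for kinds not described in this plan.
  return Math.ceil(bytes / 4);
}

console.log(estimateTokensSketch(0, 'mcp'));                    // 500
console.log(estimateTokensSketch(0, 'mcp', { toolCount: 10 })); // 2000
```

For a 10-tool server the 'mcp' estimate is 2000 against 15 for 'item', a ratio of about 133, which clears the ratio ≥ 30 threshold in the test plan.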
    

Step 5: Migrate MCP/hook callers to use 'mcp' kind (F2 — caller side)

  • Files: scanners/lib/active-config-reader.mjs
  • Changes: Three call sites:
    • Line 556 (collectHookEntries): keep 'item' for hooks (hooks don't have schemas) but pass actual byte size when available.
    • Line 593 (collectMcpFromFile): kind='mcp', pass { toolCount: server.tools?.length ?? 0 } (will be 0 until N1 wires tool detection — that's fine; base 500 still beats flat 15).
    • Line 618 (readActiveMcpServers from .claude.json): same as 593.
  • Reuses: New 'mcp' kind from Step 4.
  • Test first: Extend tests/lib/active-config-reader.test.mjs buildRichRepo to include MCP servers; assert returned mcpServers[].estimatedTokens >= 500 (not 15).
  • Verify: node --test tests/lib/active-config-reader.test.mjs → PASS
  • On failure: revert. Re-check call sites if test still shows 15.
  • Checkpoint: git commit -m "fix(config-audit): MCP token callers use 'mcp' kind (v5 F2)"
  • Manifest:
    expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
    forbidden_paths: []
    commit_message_pattern: "^fix\\(config-audit\\):"
    

Step 6: Wire readActiveConfig into TOK scanner (F1)

  • Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs, tests/fixtures/tok-active-config/ (new)
  • Changes:
    • Remove void readActiveConfig; at line 31.
    • Inside scan(targetPath, discovery): call await readActiveConfig(targetPath, {}) once; if it throws (non-git target), catch and continue with discovery-only behavior. Merge its mcpServers, plugins, skills, claudeMd.estimatedTokens into hotspot ranking input.
    • Add new finding source category 'mcp-server', 'plugin', 'skill' for hotspots.
    • Unify token estimation paths: the tokenKind() mapper at line 54-63 is used for discovery.files. After Step 5, MCP files in discovery still map to 'json' while MCP servers from readActiveConfig use 'mcp'. Within TOK, prefer readActiveConfig data for MCP/skills/plugins; fall back to discovery only for files not covered by readActiveConfig (e.g., loose claude.json). Document in a 1-line comment.
  • Reuses: readActiveConfig shape from active-config-reader.mjs:738-827.
  • Test first: New fixture tok-active-config/ with .mcp.json (2 servers), CLAUDE.md, and .claude-plugin/plugin.json + commands/sample.md (plugin-skeleton). New describe block: assert (a) hotspots.some(h => h.source.includes('mcp')); (b) total estimated tokens > minimal-project total; (c) claudeMd.estimatedTokens > 0 is observable when readActiveConfig was called.
  • Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
  • On failure: revert. Common cause: readActiveConfig requires git root; the try/catch above handles this. Verify discovery-only fallback path works.
  • Checkpoint: git commit -m "feat(config-audit): TOK consumes readActiveConfig (v5 F1)"
  • Manifest:
    expected_paths:
      - scanners/token-hotspots.mjs
      - tests/scanners/token-hotspots.test.mjs
      - tests/fixtures/tok-active-config/.mcp.json
      - tests/fixtures/tok-active-config/CLAUDE.md
      - tests/fixtures/tok-active-config/.claude-plugin/plugin.json
      - tests/fixtures/tok-active-config/commands/sample.md
    must_contain:
      - { path: scanners/token-hotspots.mjs, pattern: "readActiveConfig\\(targetPath" }
    forbidden_paths: []
    commit_message_pattern: "^feat\\(config-audit\\):"
    

Step 7: Remove take dead-code and hotspot padding (F4)

  • Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs
  • Changes: Delete take computation (lines 202-205) and padding while-loop (lines 219-229). Replace with: return ranked.slice(0, HOTSPOTS_MAX) and accept that fewer than HOTSPOTS_MIN may be returned for small projects.
  • Reuses: HOTSPOTS_MAX constant.
  • Test first: Add assertion: every hotspot.source is unique; hotspots.length <= discovery.files.length.
  • Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
  • On failure: revert. If hotspots-contract test breaks because some test expects min count, update test to allow fewer.
  • Checkpoint: git commit -m "fix(config-audit): remove TOK dead take + hotspot padding (v5 F4)"
  • Manifest:
    expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
    must_contain:
      - { path: scanners/token-hotspots.mjs, pattern: "ranked\\.slice\\(0, HOTSPOTS_MAX\\)" }
    commit_message_pattern: "^fix\\(config-audit\\):"
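
The replacement logic is small enough to show in full. In this sketch the value of HOTSPOTS_MAX, the `estimatedTokens` field name, and the helper name `topHotspots` are assumptions for illustration; only the `ranked.slice(0, HOTSPOTS_MAX)` expression comes from the step itself.

```javascript
const HOTSPOTS_MAX = 10; // assumption: the real constant's value is not stated in this plan

// Hypothetical helper: rank candidates by estimated tokens, keep at most HOTSPOTS_MAX.
// No padding: a small project simply yields fewer hotspots.
function topHotspots(candidates) {
  const ranked = [...candidates].sort((a, b) => b.estimatedTokens - a.estimatedTokens);
  return ranked.slice(0, HOTSPOTS_MAX);
}

const hotspots = topHotspots([
  { source: 'CLAUDE.md', estimatedTokens: 1200 },
  { source: '.mcp.json', estimatedTokens: 4000 },
  { source: 'settings.json', estimatedTokens: 300 },
]);
console.log(hotspots.map((h) => h.source)); // [ '.mcp.json', 'CLAUDE.md', 'settings.json' ]
```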
    

Step 8: Remove Pattern D detectSonnetEra (F5)

  • Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs
  • Changes: Delete detectSonnetEra() function (lines 166-178) and its finding emission (lines 335-350). Pattern D and CA-TOK-004 no longer exist.
  • Reuses: None.
  • Test first: Update opus-47/sonnet-era describe block: assert result.findings.every(f => f.id !== 'CA-TOK-004') AND that the existing fixture now produces zero TOK findings.
  • Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS AND ! grep -q "detectSonnetEra" scanners/token-hotspots.mjs
  • On failure: revert. CA-TOK-004 may still exist if any other path emits it; grep confirms none.
  • Checkpoint: git commit -m "feat(config-audit): remove TOK Pattern D detectSonnetEra (v5 F5)"
  • Manifest:
    expected_paths: [scanners/token-hotspots.mjs]
    forbidden_paths: []
    must_not_contain:
      - { path: scanners/token-hotspots.mjs, pattern: "detectSonnetEra" }
      - { path: scanners/token-hotspots.mjs, pattern: "CA-TOK-004" }
    commit_message_pattern: "^feat\\(config-audit\\):"
    

Step 8b: Sweep CA-TOK-004 references from docs after F5

  • Files: commands/tokens.md, knowledge/opus-4.7-patterns.md
  • Changes:
    • commands/tokens.md: replace any CA-TOK-001..004 reference with CA-TOK-001..003 (or list explicitly). Verify no CA-TOK-004 remains.
    • knowledge/opus-4.7-patterns.md: remove the Pattern D row from the catalogue table and any text referencing "Pattern D" or CA-TOK-004. Update the pattern count in the document header if mentioned.
  • Reuses: None.
  • Test first: None (docs).
  • Verify: ! grep -q "CA-TOK-004" commands/tokens.md knowledge/opus-4.7-patterns.md
  • On failure: revert.
  • Checkpoint: git commit -m "docs(config-audit): remove CA-TOK-004 references after F5 (v5)"
  • Manifest:
    expected_paths: [commands/tokens.md, knowledge/opus-4.7-patterns.md]
    must_not_contain:
      - { path: commands/tokens.md, pattern: "CA-TOK-004" }
      - { path: knowledge/opus-4.7-patterns.md, pattern: "CA-TOK-004" }
    commit_message_pattern: "^docs\\(config-audit\\):"
    

Step 9: alpha.1 wrap — release notes draft

  • Files: CHANGELOG.md
  • Changes: Add ## [5.0.0-alpha.1] entry summarizing F1-F5. Note BREAKING for F3 (severity weighting) and F2 (MCP estimate jump).
  • Reuses: v4.0.0 entry format.
  • Test first: None (docs).
  • Verify: grep -c "5.0.0-alpha.1" CHANGELOG.md → 1
  • On failure: revert.
  • Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.1 entry"
  • Manifest:
    expected_paths: [CHANGELOG.md]
    must_contain:
      - { path: CHANGELOG.md, pattern: "5\\.0\\.0-alpha\\.1" }
    commit_message_pattern: "^docs\\(config-audit\\):"
    

STAGE alpha.2 — Structural gaps + README self-audit (M1, M2, M4-M6, F6, F7)

Step 10: F7 — Severity recalibration for TOK patterns

  • Files: scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs
  • Changes: Recalibrate severity for all 3 remaining patterns based on tokens/turn (Pattern D removed in F5). Each decision is explicit and testable:
    • Pattern A (volatile top, line 270): medium → high. Reason: volatile content in cached prefix triggers full re-read of cascade every turn (10k+ tokens/turn cost). Highest single-pattern impact.
    • Pattern B (redundant perms, line 299): low → medium. Reason: duplicate tool entries inflate the tool-schema payload sent every turn (~50-200 tokens/turn per duplicate, scales with turns).
    • Pattern C (deep imports, line 321): medium → low. Reason: depth alone is structural and only matters at first-load; cache benefits remain. Lower per-turn cost than originally rated. This is an explicit recalibration, not "unchanged".
    • Add calibration_note field to each finding's evidence: "severity reflects estimated tokens/turn based on structural heuristic; not measured against runtime telemetry".
  • Reuses: SEVERITY constants.
  • Test first: Table-driven test:
    const SEVERITY_TABLE = [
      { fixture: 'opus-47/cache-breaking', findingId: 'CA-TOK-001', expected: 'high' },
      { fixture: 'opus-47/redundant-tools', findingId: 'CA-TOK-002', expected: 'medium' },
      { fixture: 'opus-47/deep-imports', findingId: 'CA-TOK-003', expected: 'low' },
    ];
    
    Iterate with for...of generating it(...) blocks. Each asserts finding.severity === expected.
  • Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
  • On failure: revert. Re-evaluate severities if integration tests break (e.g., posture-grade-stability expects different aggregate).
  • Checkpoint: git commit -m "feat(config-audit): recalibrate TOK severities for tokens/turn (v5 F7)"
  • Manifest:
    expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
    must_contain:
      - { path: scanners/token-hotspots.mjs, pattern: "calibration_note" }
    commit_message_pattern: "^feat\\(config-audit\\):"
    

Step 11: M6 — additionalDirectories in KNOWN_KEYS + threshold

  • Files: scanners/settings-validator.mjs, tests/scanners/settings-validator.test.mjs, tests/fixtures/additional-dirs-many/ (new)
  • Changes:
    • Add 'additionalDirectories' to KNOWN_KEYS (line 16-35).
    • New check: if additionalDirectories.length > 2, emit CA-SET-NNN finding (severity low).
  • Reuses: Existing settings-validator pattern.
  • Test first: Fixture with 3 entries → 1 finding; fixture with 2 entries → 0 findings; settings without the key → no "unknown key" warning.
  • Verify: node --test tests/scanners/settings-validator.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "feat(config-audit): flag additionalDirectories > 2 (v5 M6)"
  • Manifest:
    expected_paths: [scanners/settings-validator.mjs, tests/fixtures/additional-dirs-many/settings.json]
    commit_message_pattern: "^feat\\(config-audit\\):"
    

Step 12: M4 — CLAUDE.md cascade total finding in TOK

  • Files: scanners/token-hotspots.mjs, tests/fixtures/large-cascade/ (new)
  • Changes: New detection in TOK: if activeConfig.claudeMd.estimatedTokens > 10_000, emit finding (severity medium).
  • Reuses: readActiveConfig integration from Step 6; claudeMd.estimatedTokens field.
  • Test first: Fixture with CLAUDE.md @-importing 40k+ bytes → finding present; minimal CLAUDE.md → no finding.
  • Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "feat(config-audit): TOK flags CLAUDE.md cascade > 10k tokens (v5 M4)"
  • Manifest:
    expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/large-cascade/CLAUDE.md]
    commit_message_pattern: "^feat\\(config-audit\\):"
    

Step 13: M2 — Skill description length finding

  • Files: scanners/token-hotspots.mjs, tests/fixtures/skill-bloated/ (new)
  • Changes: New detection in TOK: walk activeConfig.skills, parse each SKILL.md frontmatter; flag any with description > 500 characters as low finding.
  • Reuses: parseFrontmatter from lib/yaml-parser.mjs; activeConfig.skills from Step 6.
  • Test first: Fixture with 600-char description → finding; 100-char → no finding.
  • Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "feat(config-audit): TOK flags skill description > 500 chars (v5 M2)"
  • Manifest:
    expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/skill-bloated/skills/bloated/SKILL.md]
    commit_message_pattern: "^feat\\(config-audit\\):"
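
The length check itself might look like the sketch below. It assumes the descriptions have already been extracted by parseFrontmatter, so the input is a plain array of `{ name, description }` objects; that shape, the helper name `findBloatedSkillDescriptions`, and the returned fields are illustrative assumptions, not the plugin's finding format.

```javascript
const DESCRIPTION_MAX_CHARS = 500; // threshold from this step

// Hypothetical helper: skills is [{ name, description }], already parsed.
function findBloatedSkillDescriptions(skills) {
  return skills
    .filter((skill) => typeof skill.description === 'string'
      && skill.description.length > DESCRIPTION_MAX_CHARS)
    .map((skill) => ({
      severity: 'low',
      skill: skill.name,
      descriptionLength: skill.description.length,
    }));
}
```

Skills without a description are skipped rather than flagged, since a missing description is a different kind of problem from a bloated one.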
    

Step 14: M1 — MCP tool-count detection (with manifest fallback)

  • Files: scanners/lib/active-config-reader.mjs, tests/fixtures/mcp-tool-heavy/ (new)
  • Changes: Extend readActiveMcpServers to attempt tool-count detection in this order:
    1. Cached tools/list response at ~/.claude/config-audit/mcp-cache/<server>.json (if exists)
    2. package.json tools array on the npm package (if server is npm-resolved)
    3. Fallback: emit toolCount: null and a toolCountUnknown: true flag on the server entry
    Update the estimateTokens call (Step 5) to use toolCount when known.
  • Reuses: Existing MCP enumeration.
  • Test first: Fixture with mocked package.json tools array of 20 → toolCount === 20; fixture without → toolCount === null.
  • Verify: node --test tests/lib/active-config-reader.test.mjs → PASS
  • On failure: revert. Tool-count infrastructure can ship as null everywhere if detection logic fails — N1 still produces baseline finding.
  • Checkpoint: git commit -m "feat(config-audit): MCP tool-count detection with manifest fallback (v5 M1)"
  • Manifest:
    expected_paths: [scanners/lib/active-config-reader.mjs, tests/fixtures/mcp-tool-heavy/]
    commit_message_pattern: "^feat\\(config-audit\\):"
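
The detection order above can be sketched as a pure function over already-loaded inputs. This assumes the cache file stores a tools/list result with a `tools` array and that the convention-only package.json field is also an array named `tools`; the helper name `resolveToolCount` and its argument shape are hypothetical.

```javascript
// Hypothetical helper. Both inputs are parsed JSON objects, or null when absent.
function resolveToolCount({ cachedToolsList = null, packageJson = null } = {}) {
  // 1. A cached tools/list response wins when present.
  if (Array.isArray(cachedToolsList?.tools)) {
    return { toolCount: cachedToolsList.tools.length, toolCountUnknown: false };
  }
  // 2. Convention-only "tools" array in package.json.
  if (Array.isArray(packageJson?.tools)) {
    return { toolCount: packageJson.tools.length, toolCountUnknown: false };
  }
  // 3. Unknown: report null instead of guessing a number.
  return { toolCount: null, toolCountUnknown: true };
}
```

Returning null rather than 0 for the unknown case keeps "no tools" and "could not tell" distinguishable, which the N1 budget finding relies on.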
    

Step 15: M5 — Hook output-size finding

  • Files: scanners/hook-validator.mjs, tests/fixtures/hooks-verbose/ (new)
  • Changes: Read each hook script referenced in hooks.json; count console.log / process.stdout.write lines; if > 50, emit CA-HKV-NNN finding (severity low). Static heuristic — no execution.
  • Reuses: Existing hook-validator file-walking.
  • Test first: Fixture with hook script containing 60 console.log lines → finding; sparse hook → no finding.
  • Verify: node --test tests/scanners/hook-validator.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "feat(config-audit): HKV flags verbose hook output (v5 M5)"
  • Manifest:
    expected_paths: [scanners/hook-validator.mjs, tests/fixtures/hooks-verbose/]
    commit_message_pattern: "^feat\\(config-audit\\):"
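
A minimal sketch of the static heuristic, assuming the hook script has already been read into a string. The names `countOutputLines` and `isVerboseHook` are hypothetical; the threshold of 50 is the one given in this step.

```javascript
// Matches a line containing a console.log( or process.stdout.write( call.
// No "g" flag, so RegExp.prototype.test stays stateless between lines.
const OUTPUT_CALL = /\b(?:console\.log|process\.stdout\.write)\s*\(/;

// Hypothetical helper: counts lines with at least one output call.
function countOutputLines(source) {
  return source.split('\n').filter((line) => OUTPUT_CALL.test(line)).length;
}

// Hypothetical helper: true when the script exceeds the output-line threshold.
function isVerboseHook(source, threshold = 50) {
  return countOutputLines(source) > threshold;
}
```

Because nothing is executed, commented-out calls and calls inside rarely taken branches are counted too, which is an accepted imprecision for a low-severity finding.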
    

Step 16: F6 — self-audit --check-readme flag

  • Files: scanners/self-audit.mjs, tests/scanners/self-audit.test.mjs, tests/fixtures/readme-desynced/ (new)
  • Changes: Add --check-readme CLI flag. The flag uses filesystem counts as the source of truth, not the README. Counts:
    • scanners: count .mjs files matching scanner-shape (have export async function scan AND are in scanners/ not scanners/lib/ and not *-cli.mjs/*-engine.mjs/whats-active.mjs/self-audit.mjs/scan-orchestrator.mjs)
    • commands: count .md files in commands/
    • agents: count .md files in agents/
    • hooks: parse hooks/hooks.json, count distinct event-script pairs
    • tests: count .test.mjs files in tests/
    • knowledge: count .md files in knowledge/
    Parse README badge values via line-anchored substring patterns (NOT regex on URL; use exact " 9 " / "9+" detection). Compare counts; emit low finding per mismatch with expected: <fs_count> and found_in_readme: <badge_value>.
  • Reuses: Existing runSelfAudit shape; glob-style file enumeration via node:fs/promises.
  • Test first:
    • Fixture readme-desynced/: a mini-plugin layout with commands/foo.md, commands/bar.md (filesystem count = 2) plus a fake README.md with badge "1+ commands" → finding present.
    • Self-test (no fixture): run runSelfAudit({checkReadme: true}) against the real plugin; assert result.readmeCheck exists, result.readmeCheck.passed is boolean. Do NOT assert passed === true during alpha/beta phases (allowed to be red until Step 28).
  • Verify: node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck | type' → "object"
  • On failure: revert. Most likely cause: scanner-shape detection over-counts; refine to require both export async function scan AND const SCANNER = declarations.
  • Checkpoint: git commit -m "feat(config-audit): self-audit --check-readme flag (v5 F6)"
  • Manifest:
    expected_paths:
      - scanners/self-audit.mjs
      - tests/scanners/self-audit.test.mjs
      - tests/fixtures/readme-desynced/README.md
      - tests/fixtures/readme-desynced/commands/foo.md
      - tests/fixtures/readme-desynced/commands/bar.md
    must_contain:
      - { path: scanners/self-audit.mjs, pattern: "check-readme" }
      - { path: scanners/self-audit.mjs, pattern: "readmeCheck" }
    commit_message_pattern: "^feat\\(config-audit\\):"
    

Step 17: alpha.2 wrap — CHANGELOG entry

  • Files: CHANGELOG.md
  • Changes: Add ## [5.0.0-alpha.2] summarizing M1, M2, M4-M6, F6, F7.
  • Verify: grep -c "5.0.0-alpha.2" CHANGELOG.md → 1
  • On failure: revert.
  • Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.2 entry"
  • Manifest:
    expected_paths: [CHANGELOG.md]
    commit_message_pattern: "^docs\\(config-audit\\):"
    

STAGE beta.1 — New scanners (N1-N4, N6)

Step 18: N1 — MCP Tool-Schema Budget finding (CA-TOK-005)

  • Files: scanners/token-hotspots.mjs, tests/fixtures/mcp-budget/ (new)
  • Changes: New detection function detectMcpToolBudget(activeConfig). Iterate activeConfig.mcpServers. Tiered severity per server:
    • toolCount === null (unknown — fallback chain in M1 returned null): emit finding with severity low and message "tool count unknown — could not parse manifest or cached tools/list" (per Avklaringer M1: flag, don't skip).
    • toolCount 0-19: no finding
    • 20-49: low
    • 50-99: medium
    • 100+: high
    Finding ID: CA-TOK-005 per server flagged. Recommendation: use tools/filter config; reference cache-telemetry recipe from M7.
    Detection-order pinning: ensure detectMcpToolBudget runs as the 5th detection block in scan() AFTER patterns A, B, C (which always run first regardless of fixture). This makes ID assignment deterministic when all patterns fire. When some patterns don't fire, the ID may shift, so tests assert presence and tier-specific severity, not the exact ID number.
  • Reuses: activeConfig.mcpServers with toolCount from Step 14.
  • Test first: Fixtures: 14 tools (no finding), 25 tools (low), 60 tools (medium), 120 tools (high), null toolCount (low with message containing "unknown"). Tests assert severity and finding.title substring, NOT exact id number.
  • Verify: node --test tests/scanners/token-hotspots.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "feat(config-audit): CA-TOK-005 MCP tool-schema budget (v5 N1)"
  • Manifest:
    expected_paths:
      - scanners/token-hotspots.mjs
      - tests/fixtures/mcp-budget/14-tools/
      - tests/fixtures/mcp-budget/25-tools/
      - tests/fixtures/mcp-budget/60-tools/
      - tests/fixtures/mcp-budget/120-tools/
      - tests/fixtures/mcp-budget/unknown-tools/
    must_contain:
      - { path: scanners/token-hotspots.mjs, pattern: "detectMcpToolBudget" }
    commit_message_pattern: "^feat\\(config-audit\\):"
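
The tier table maps directly onto a small pure function. The name `mcpBudgetSeverity` is hypothetical, and returning null to mean "no finding" is an assumption about how detectMcpToolBudget would consume it.

```javascript
// Hypothetical helper: maps a server's toolCount to a severity, or null for no finding.
function mcpBudgetSeverity(toolCount) {
  if (toolCount === null) return 'low'; // unknown count: flag, don't skip
  if (toolCount >= 100) return 'high';
  if (toolCount >= 50) return 'medium';
  if (toolCount >= 20) return 'low';
  return null; // 0-19 tools: within budget
}

console.log([14, 25, 60, 120, null].map(mcpBudgetSeverity));
// [ null, 'low', 'medium', 'high', 'low' ]
```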
    

Step 19: N2 — System-Prompt Manifest scanner + CLI

  • Files: scanners/manifest.mjs (new), commands/manifest.md (new), tests/scanners/manifest.test.mjs (new)
  • Changes: New CLI: node scanners/manifest.mjs <path> [--json] [--output-file]. Output: ranked list of token sources from readActiveConfig (CLAUDE.md cascade entries, plugins, skills, MCP servers, hooks) sorted DESC by estimated_tokens. New slash command /config-audit manifest invokes the CLI and renders a markdown table.
  • Reuses: readActiveConfig, CLI direct-run pattern, command frontmatter from commands/whats-active.md.
  • Test first: Two test paths:
    • Real-config path (primary): subprocess against the plugin's own root (.) — output.sources length > 0; output.sources[0].estimated_tokens >= output.sources[1].estimated_tokens; output.total >= sum(sources.estimated_tokens) - 1 (rounding tolerance).
    • Fixture path (with buildRichRepo helper from tests/lib/active-config-reader.test.mjs): build a tmpdir repo with patched HOME containing 2 plugins + 3 skills + .mcp.json. Run the CLI subprocess against tmpdir with the patched HOME passed via env. Assert sources.length >= 5 (CLAUDE.md cascade + plugins + skills + MCP).
  • Verify: node --test tests/scanners/manifest.test.mjs → PASS
  • On failure: revert. If readActiveConfig returns empty for the real-plugin target: check that detectGitRoot resolves to the marketplace root.
  • Checkpoint: git commit -m "feat(config-audit): /config-audit manifest command (v5 N2)"
  • Manifest:
    expected_paths: [scanners/manifest.mjs, commands/manifest.md, tests/scanners/manifest.test.mjs]
    must_contain:
      - { path: scanners/manifest.mjs, pattern: "readActiveConfig" }
      - { path: commands/manifest.md, pattern: "name: manifest" }
    commit_message_pattern: "^feat\\(config-audit\\):"
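
The ranking step of the manifest output could be sketched as follows. The `{ sources, total }` shape mirrors the fields the tests in this step assert on; the helper name `buildManifest` and the `name` field on each source are assumptions.

```javascript
// Hypothetical helper: sources is [{ name, estimated_tokens }].
function buildManifest(sources) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...sources].sort((a, b) => b.estimated_tokens - a.estimated_tokens);
  const total = sorted.reduce((sum, source) => sum + source.estimated_tokens, 0);
  return { sources: sorted, total };
}

const manifest = buildManifest([
  { name: 'CLAUDE.md', estimated_tokens: 10 },
  { name: 'mcp:example', estimated_tokens: 500 },
  { name: 'skill:sample', estimated_tokens: 15 },
]);
console.log(manifest.sources[0].name, manifest.total); // mcp:example 525
```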
    

Step 20: N3 — Cache-Prefix Stability Analyzer

  • Files: scanners/cache-prefix-scanner.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs (SCANNER_AREA_MAP), tests/scanners/cache-prefix.test.mjs (new), tests/fixtures/volatile-mid-section/ (new)
  • Changes: New scanner with prefix CPS. Walks CLAUDE.md cascade; classifies each segment as stable/volatile (using existing volatile patterns from token-hotspots.mjs:38-43 extended with shell-exec ! prefix and ${VAR} patterns). Flags volatility anywhere in cached prefix (not just top 30 lines). Severity medium.
  • Reuses: VOLATILE_PATTERNS, walkClaudeMdCascade.
  • Test first: Fixture with !git log at line 60 → finding; fixture with volatile content only at line 200+ → no finding.
  • Verify: node --test tests/scanners/cache-prefix.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "feat(config-audit): cache-prefix stability scanner CPS (v5 N3)"
  • Manifest:
    expected_paths:
      - scanners/cache-prefix-scanner.mjs
      - scanners/scan-orchestrator.mjs
      - scanners/lib/scoring.mjs
      - tests/scanners/cache-prefix.test.mjs
    must_contain:
      - { path: scanners/scan-orchestrator.mjs, pattern: "scanCachePrefix|CPS" }
      - { path: scanners/lib/scoring.mjs, pattern: "CPS:" }
    commit_message_pattern: "^feat\\(config-audit\\):"
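
A rough sketch of the segment classification in Step 20, under stated assumptions: the two regexes below are illustrative stand-ins for the shell-exec and ${VAR} extensions (the real VOLATILE_PATTERNS list lives in token-hotspots.mjs and is not reproduced here), and the 200-line cached-prefix boundary is inferred from the "line 200+" fixture rather than specified by the plan.

```javascript
// Assumed patterns only; not the actual VOLATILE_PATTERNS from token-hotspots.mjs.
const EXTRA_VOLATILE_PATTERNS = [
  // A line starting with "!" followed by a command name, e.g. "!git log".
  { name: 'shell-exec', regex: /^\s*!`?[A-Za-z]/ },
  // A ${VAR}-style placeholder anywhere in the line.
  { name: 'env-var', regex: /\$\{[A-Za-z_][A-Za-z0-9_]*\}/ },
];

// Returns one finding per volatile line inside the assumed cached prefix.
// `cachedPrefixLines` is a hypothetical parameter.
function findVolatileLines(text, cachedPrefixLines = 200) {
  const findings = [];
  const lines = text.split('\n').slice(0, cachedPrefixLines);
  lines.forEach((line, index) => {
    for (const { name, regex } of EXTRA_VOLATILE_PATTERNS) {
      if (regex.test(line)) {
        findings.push({ line: index + 1, pattern: name, severity: 'medium' });
        break; // one finding per line is enough
      }
    }
  });
  return findings;
}
```

The regexes carry no global flag, so repeated test() calls stay stateless across lines.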
    

Step 21: N4 — Disabled-Tools-Still-In-Schema Detector

  • Files: scanners/disabled-in-schema-scanner.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs, tests/scanners/disabled-in-schema.test.mjs (new), tests/fixtures/denied-tools-in-schema/ (new)
  • Changes: New scanner with prefix DIS. Reads cascaded settings.json; finds tools that appear in both permissions.deny and permissions.allow. Severity low.
  • Reuses: Settings-cascade reading.
  • Test first: Fixture with Bash in both arrays → finding; clean settings → no finding.
  • Verify: node --test tests/scanners/disabled-in-schema.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "feat(config-audit): disabled-in-schema scanner DIS (v5 N4)"
  • Manifest:
    expected_paths:
      - scanners/disabled-in-schema-scanner.mjs
      - scanners/scan-orchestrator.mjs
      - tests/scanners/disabled-in-schema.test.mjs
    commit_message_pattern: "^feat\\(config-audit\\):"
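
The deny/allow overlap check in Step 21 could look roughly like this. It is a sketch under assumptions: findDenyAllowConflicts is a hypothetical name, the cascade is modeled as a plain array of parsed settings.json objects, and entries are compared as exact strings (the planned scanner may normalize rule syntax differently).

```javascript
// Hypothetical helper: collect tools listed in both permissions.allow and
// permissions.deny anywhere in the settings cascade.
function findDenyAllowConflicts(settingsCascade) {
  const allow = new Set();
  const deny = new Set();
  for (const settings of settingsCascade) {
    const permissions = settings.permissions ?? {};
    for (const tool of permissions.allow ?? []) allow.add(tool);
    for (const tool of permissions.deny ?? []) deny.add(tool);
  }
  // Sorted output keeps findings deterministic across runs.
  return [...deny].filter((tool) => allow.has(tool)).sort();
}

// Example: user-level settings allow Bash, project-level settings deny it.
const conflicts = findDenyAllowConflicts([
  { permissions: { allow: ['Bash', 'Read'] } },
  { permissions: { deny: ['Bash'] } },
]);
console.log(conflicts);
```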
    

Step 22a: N6 — verify Claude Code skill-namespacing model (research spike)

  • Files: docs/v5-namespace-research.md (new, gitignored)
  • Changes: Quick verification spike before building N6. Verify against current Claude Code behavior:
    1. When user types /review and both a built-in command and a plugin skill named review exist — which fires? Is invocation namespaced via /plugin:review?
    2. When two plugins both expose a skill named review — do their invocation paths differ?
    3. When a user-level skill (under ~/.claude/skills/) has the same name as a plugin skill, do they collide?
    Methods: read Claude Code documentation; check existing plugin patterns in marketplace; if uncertain after 10 minutes of research, document the assumption explicitly and proceed with the most defensive interpretation (treat any same-name conflict as a finding).
  • Reuses:
  • Test first: None (research).
  • Verify: [ -f docs/v5-namespace-research.md ] containing at least: "Built-in vs plugin: ", "Plugin vs plugin: ", "User-level vs plugin: ", "Confidence: <high/medium/low>"
  • On failure: escalate — if research is inconclusive, ask user before proceeding to Step 22b.
  • Checkpoint: No commit (file is local-only).
  • Manifest:
    expected_paths: [docs/v5-namespace-research.md]
    commit_message_pattern: ".*"
    

Step 22b: N6 — Cross-Plugin Skill/Command Collision Scanner (CA-COL-001)

  • Files: scanners/collision-scanner.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs, tests/scanners/collision.test.mjs (new), tests/fixtures/collision-plugins/ (new)
  • Changes: New scanner with prefix COL (Finding ID CA-COL-001). Enumerate plugins via enumeratePlugins. Build maps of skill names and command names by source. Detection logic determined by Step 22a research:
    • Plugin-vs-plugin same skill name: finding (severity low) — invocation order ambiguity even if /plugin:skill is supported.
    • User-level skill vs plugin skill same name: finding (severity medium) — bare invocation may resolve unpredictably.
    • Plugin skill vs Claude Code built-in: finding only if Step 22a confirms collision is real; otherwise no finding (info-level note in CHANGELOG).
    • All findings include details.namespaces array describing each conflicting source.
  • Reuses: enumeratePlugins, countPluginItems, findSkillMdFiles.
  • Test first: Multi-plugin fixture collision-plugins/:
    • Layout: plugins/plugin-a/skills/review/SKILL.md + plugins/plugin-b/skills/review/SKILL.md → finding present (severity low).
    • Negative: plugins/plugin-a/skills/review/ + plugins/plugin-b/skills/summarize/ → no finding.
    • Positive (user-vs-plugin): user skill at fake-HOME skills/review/SKILL.md + plugin skill plugin-a/skills/review/SKILL.md → finding (severity medium).
    • Suppression-glob check: existing CA-TOK-* glob does NOT suppress CA-COL-001.
  • Verify: node --test tests/scanners/collision.test.mjs → PASS
  • On failure: revert. False positives indicate namespace model deviation from Step 22a research — revisit research file.
  • Checkpoint: git commit -m "feat(config-audit): cross-plugin collision scanner COL (v5 N6)"
  • Manifest:
    expected_paths:
      - scanners/collision-scanner.mjs
      - scanners/scan-orchestrator.mjs
      - tests/scanners/collision.test.mjs
      - tests/fixtures/collision-plugins/plugins/plugin-a/skills/review/SKILL.md
      - tests/fixtures/collision-plugins/plugins/plugin-b/skills/review/SKILL.md
    must_contain:
      - { path: scanners/collision-scanner.mjs, pattern: "SCANNER = 'COL'" }
      - { path: scanners/scan-orchestrator.mjs, pattern: "scanCollision|COL" }
      - { path: scanners/lib/scoring.mjs, pattern: "COL:" }
    commit_message_pattern: "^feat\\(config-audit\\):"
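
The detection rules in Step 22b can be illustrated with a small in-memory sketch. The input shape ({ name, source, owner }) and the function name are hypothetical; the severities and the details.namespaces array follow the step text, and the built-in case is left out because it depends on the Step 22a verdict.

```javascript
// Hypothetical input: one entry per discovered skill.
// source is 'plugin' or 'user'; owner is the plugin name or 'user'.
function findSkillCollisions(skills) {
  const byName = new Map();
  for (const skill of skills) {
    if (!byName.has(skill.name)) byName.set(skill.name, []);
    byName.get(skill.name).push(skill);
  }

  const findings = [];
  for (const [name, entries] of byName) {
    if (entries.length < 2) continue; // unique name, no collision
    const involvesUserSkill = entries.some((e) => e.source === 'user');
    findings.push({
      name,
      // User-vs-plugin is medium; plugin-vs-plugin is low (per Step 22b).
      severity: involvesUserSkill ? 'medium' : 'low',
      details: { namespaces: entries.map((e) => `${e.source}:${e.owner}`) },
    });
  }
  return findings;
}
```

Grouping by name first keeps the scan linear in the number of skills and gives each finding its full list of conflicting sources in one pass.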
    

Step 23: beta.1 wrap — CHANGELOG + N1 backward-compat note

  • Files: CHANGELOG.md
  • Changes: Add ## [5.0.0-beta.1] entry. Include explicit subsection ### Known breaking changes: CA-TOK-* glob suppressions in existing .config-audit-ignore files now also match CA-TOK-005 (MCP budget). Document workaround: list CA-TOK-001 CA-TOK-002 CA-TOK-003 explicitly.
  • Verify: grep -c "CA-TOK-005" CHANGELOG.md → ≥ 1
  • On failure: revert.
  • Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-beta.1 + N1 breaking note"
  • Manifest:
    expected_paths: [CHANGELOG.md]
    must_contain:
      - { path: CHANGELOG.md, pattern: "5\\.0\\.0-beta\\.1" }
      - { path: CHANGELOG.md, pattern: "CA-TOK-005" }
    commit_message_pattern: "^docs\\(config-audit\\):"
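
To make the breaking change in Step 23 concrete, the sketch below shows why a CA-TOK-* glob now swallows CA-TOK-005 and why the explicit-list workaround does not. The glob semantics here (only * as a wildcard, whole-ID match) are an assumption; the actual .config-audit-ignore matcher may differ.

```javascript
// Assumed glob semantics: "*" matches any run of characters, everything
// else is literal, and the pattern must match the whole finding ID.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

function isSuppressed(findingId, patterns) {
  return patterns.some((pattern) => globToRegExp(pattern).test(findingId));
}

// v4-era ignore file: the glob also hides the new MCP budget finding.
console.log(isSuppressed('CA-TOK-005', ['CA-TOK-*']));
// Workaround: list the old IDs explicitly so CA-TOK-005 stays visible.
console.log(isSuppressed('CA-TOK-005', ['CA-TOK-001', 'CA-TOK-002', 'CA-TOK-003']));
```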
    

STAGE rc.1 — Knowledge cleanup + tokenizer calibration (M7, M8, N5)

Step 24: M8 — Knowledge-base cleanup (Sonnet-era → Opus 4.7)

  • Files: knowledge/configuration-best-practices.md, knowledge/anti-patterns.md (if relevant)
  • Changes: Replace "Keep under 200 lines" framing (line 9) with cache-stability guidance: "Place stable content in the first 30 lines (cache-friendly); volatile content (timestamps, dynamic counts) goes below the cache threshold." Add footnote: "200-line threshold was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure."
  • Reuses: Existing knowledge file format.
  • Test first: None (docs).
  • Verify: ! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md
  • On failure: revert.
  • Checkpoint: git commit -m "docs(config-audit): knowledge rensing — Opus 4.7 cache-stability guidance (v5 M8)"
  • Manifest:
    expected_paths: [knowledge/configuration-best-practices.md]
    forbidden_paths: []
    commit_message_pattern: "^docs\\(config-audit\\):"
    

Step 25: M7 — Cache-telemetry recipe in knowledge/ + flag

  • Files: knowledge/cache-telemetry-recipe.md (new), commands/tokens.md, scanners/token-hotspots-cli.mjs, tests/scanners/token-hotspots-cli.test.mjs
  • Changes:
    1. New knowledge file documenting how a user can manually verify cache hit rate from session transcripts (parsing cache_read_input_tokens from transcript JSON; recipe is opt-in, NOT bundled scanner logic — keeps non-goal of "no transcript-parsing as core feature").
    2. Add --with-telemetry-recipe flag to token-hotspots-cli.mjs: when present, includes telemetry_recipe_path field in JSON output pointing to the knowledge file. Without the flag, output unchanged. Committed as deliverable, not optional.
    3. Update commands/tokens.md next-steps to mention --with-telemetry-recipe and link the recipe.
  • Reuses: Knowledge-file format from opus-4.7-patterns.md; CLI argv-parsing pattern from posture.mjs.
  • Test first: Subprocess test: node token-hotspots-cli.mjs <fixture> --with-telemetry-recipe --json | jq '.telemetry_recipe_path' → non-empty string ending in cache-telemetry-recipe.md.
  • Verify: [ -f knowledge/cache-telemetry-recipe.md ] AND node --test tests/scanners/token-hotspots-cli.test.mjs → PASS
  • On failure: revert.
  • Checkpoint: git commit -m "docs(config-audit): cache-telemetry recipe + --with-telemetry-recipe flag (v5 M7)"
  • Manifest:
    expected_paths:
      - knowledge/cache-telemetry-recipe.md
      - commands/tokens.md
      - scanners/token-hotspots-cli.mjs
      - tests/scanners/token-hotspots-cli.test.mjs
    must_contain:
      - { path: scanners/token-hotspots-cli.mjs, pattern: "with-telemetry-recipe" }
    commit_message_pattern: "^docs\\(config-audit\\):"
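
The manual recipe in Step 25 might boil down to something like the following. This is only a sketch: it assumes the transcript is JSON Lines with a message.usage object per assistant turn, which is not confirmed by the plan, and it defines hit rate as cached reads over all input tokens. Only cache_read_input_tokens is named in the step; the other usage fields are assumptions.

```javascript
// Assumed transcript shape: one JSON object per line, with token usage
// under entry.message.usage. Lines that do not fit are skipped.
function cacheHitRate(jsonlText) {
  let cachedRead = 0;
  let uncached = 0;
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // tolerate partial or corrupt lines
    }
    const usage = entry?.message?.usage;
    if (!usage) continue;
    cachedRead += usage.cache_read_input_tokens ?? 0;
    uncached +=
      (usage.cache_creation_input_tokens ?? 0) + (usage.input_tokens ?? 0);
  }
  const total = cachedRead + uncached;
  // null signals "no usage data found" rather than a 0% hit rate.
  return total === 0 ? null : cachedRead / total;
}
```

Returning null for an empty transcript keeps "no data" distinguishable from a genuinely cold cache.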
    

Step 26: N5 — --accurate-tokens API calibration

  • Files: scanners/token-hotspots-cli.mjs, scanners/lib/tokenizer-api.mjs (new), tests/scanners/accurate-tokens.test.mjs (new)

  • Prerequisites: Node.js >= 18.13 (for mock.method from node:test). Verify with node --version. If older, escalate.

  • Changes: New helper module tokenizer-api.mjs exporting async callCountTokensApi(text, apiKey). Wraps fetch('https://api.anthropic.com/v1/messages/count_tokens', ...) with:

    • 5-second AbortController timeout
    • Exponential backoff on 429 (max 3 retries: 1s, 2s, 4s)
    • API key MASKED to ${key.slice(0,8)}... in ANY error message and ANY thrown error
    • On non-429 HTTP error: throw Error("count_tokens API failed: " + status) — no body included (body may contain the key in echo'd form)
    • Required headers: x-api-key, anthropic-version: 2023-06-01, content-type: application/json

    Wire --accurate-tokens into token-hotspots-cli.mjs:

    • If process.env.ANTHROPIC_API_KEY present: call callCountTokensApi for the top 3 hotspots' content; populate output.calibration = { actual_tokens: <number>, source: 'count_tokens_api', sampled_hotspots: 3 }.
    • If absent: output.calibration = { skipped: 'no-api-key' } and warn to stderr "ANTHROPIC_API_KEY not set — skipping API calibration".
  • Reuses: Existing CLI pattern, env-var reading.

  • Test first:

    • No-API-key case: subprocess with env: { ...process.env, ANTHROPIC_API_KEY: '' }. Assert exit 0, output calibration.skipped === 'no-api-key'.
    • With-key case: import { mock } from 'node:test'. Use mock.method(tokenizerApi, 'callCountTokensApi', () => ({ input_tokens: 4200 })). Run CLI in-process (not subprocess — mock can't cross process boundary). Assert output.calibration.actual_tokens === 4200.
    • Error masking: stub callCountTokensApi to throw Error("simulated 401 with key sk-ant-FAKEKEY-1234"). Assert that the JSON output and stderr contain sk-ant-F... and NOT FAKEKEY-1234 (mask works).
  • Verify: node --test tests/scanners/accurate-tokens.test.mjs → PASS

  • On failure: revert. Most likely causes:

    • mock.method not available — check Node version >= 18.13.
    • fetch unavailable — fall back to node:https.
  • Checkpoint: git commit -m "feat(config-audit): --accurate-tokens API calibration (v5 N5)"

  • SC-6b note: The brief's SC-6b ("byte estimate within ±5% of the Anthropic count_tokens API") cannot be verified by automated tests using a stub, because the stub returns a constant. SC-6b is therefore a release gate: before tagging v5.0.0 in Step 30, KTG must run --accurate-tokens against a known fixture with a real ANTHROPIC_API_KEY, manually compare calibration.actual_tokens to the byte-estimated tokens for that fixture, and confirm the error is within ±5%. If the error exceeds ±5%, fix the heuristic before tagging.

  • Manifest:

    expected_paths:
      - scanners/token-hotspots-cli.mjs
      - scanners/lib/tokenizer-api.mjs
      - tests/scanners/accurate-tokens.test.mjs
    must_contain:
      - { path: scanners/lib/tokenizer-api.mjs, pattern: "count_tokens" }
      - { path: scanners/lib/tokenizer-api.mjs, pattern: "AbortController|signal" }
      - { path: scanners/lib/tokenizer-api.mjs, pattern: "slice\\(0, ?8\\)" }
    commit_message_pattern: "^feat\\(config-audit\\):"
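
One possible shape for the tokenizer-api.mjs helper from Step 26 is sketched below. Several parts are assumptions rather than the planned implementation: the third options argument (model, fetchImpl, sleep) is hypothetical and exists mainly so tests can inject fakes; the model default is a placeholder, not a real model ID; and the request body uses a model-plus-messages shape, which should be checked against the API reference because the plan's Assumption 1 speaks of a plain text payload.

```javascript
const COUNT_TOKENS_URL = 'https://api.anthropic.com/v1/messages/count_tokens';

// Mask the key to its first 8 characters, as required by the step.
function maskKey(key) {
  return `${key.slice(0, 8)}...`;
}

// Replace any raw occurrence of the key in a message with the masked form.
function scrub(message, key) {
  return key ? String(message).split(key).join(maskKey(key)) : String(message);
}

async function callCountTokensApi(text, apiKey, options = {}) {
  const {
    model = 'MODEL_ID_PLACEHOLDER', // hypothetical; pass a real model ID
    fetchImpl = fetch, // injectable for tests
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  const backoffMs = [1000, 2000, 4000]; // max 3 retries on 429

  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 5000);
    let response;
    try {
      response = await fetchImpl(COUNT_TOKENS_URL, {
        method: 'POST',
        headers: {
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
          'content-type': 'application/json',
        },
        body: JSON.stringify({
          model,
          messages: [{ role: 'user', content: text }],
        }),
        signal: controller.signal,
      });
    } catch (err) {
      // Network or timeout errors: rethrow with the key scrubbed.
      throw new Error(scrub(`count_tokens request failed: ${err.message}`, apiKey));
    } finally {
      clearTimeout(timer);
    }

    if (response.status === 429 && attempt < backoffMs.length) {
      await sleep(backoffMs[attempt]);
      continue;
    }
    if (!response.ok) {
      // Status only; the body is deliberately not included.
      throw new Error('count_tokens API failed: ' + response.status);
    }
    return response.json();
  }
}
```

Injecting fetchImpl and sleep sidesteps the need to patch module exports in the test, since a fake can be passed directly and the backoff delays recorded without waiting.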
    

Step 27: rc.1 wrap — CHANGELOG entry

  • Files: CHANGELOG.md
  • Changes: Add ## [5.0.0-rc.1] summarizing M7, M8, N5.
  • Verify: grep -c "5.0.0-rc.1" CHANGELOG.md → 1
  • On failure: revert.
  • Checkpoint: git commit -m "docs(config-audit): CHANGELOG 5.0.0-rc.1 entry"
  • Manifest:
    expected_paths: [CHANGELOG.md]
    commit_message_pattern: "^docs\\(config-audit\\):"
    

STAGE release — v5.0.0 final

Step 28: README and CLAUDE.md sync (straggler-sweep)

  • Files: README.md, CLAUDE.md, commands/help.md, commands/posture.md, commands/config-audit.md, agents/feature-gap-agent.md
  • Changes: Update all badges and counts:
    • Scanners: 9 → 12 (TOK extended + CPS + DIS + COL + manifest if counted)
    • Commands: 17 → 18 (+ manifest)
    • Tests: 543 → final count after all steps (run node --test 'tests/**/*.test.mjs' 2>&1 | grep "tests")
    • Hooks: unchanged (4)
    • Agents: unchanged (6)
    • Knowledge: 7 → 8 (+ cache-telemetry-recipe)
    • Quality areas: unchanged (8)
  • Reuses: Self-audit --check-readme from Step 16 to verify completeness.
  • Test first: node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed' → true
  • Verify: Same command above.
  • On failure: retry — find the missing badge with node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.mismatches'.
  • Checkpoint: git commit -m "docs(config-audit): straggler sweep for v5.0.0 — sync all badge counts"
  • Manifest:
    expected_paths: [README.md, CLAUDE.md]
    commit_message_pattern: "^docs\\(config-audit\\):"
    

Step 29: Version bump + final CHANGELOG

  • Files: .claude-plugin/plugin.json, CHANGELOG.md, README.md (version badge)
  • Changes: Bump plugin.json version: 4.0.0 → 5.0.0. Add ## [5.0.0] entry to CHANGELOG with ### Summary (consolidated from alpha/beta/rc entries) and ### Breaking changes (F2 token magnitude jump, F3 severity weighting, N1 suppression backward-compat).
  • Reuses: v4.0.0 entry format.
  • Test first: [ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]
  • Verify: grep "5.0.0" .claude-plugin/plugin.json && grep "## \[5.0.0\]" CHANGELOG.md
  • On failure: revert.
  • Checkpoint: git commit -m "chore(config-audit): bump version to 5.0.0"
  • Manifest:
    expected_paths: [.claude-plugin/plugin.json, CHANGELOG.md, README.md]
    must_contain:
      - { path: .claude-plugin/plugin.json, pattern: "\"version\": \"5.0.0\"" }
    commit_message_pattern: "^chore\\(config-audit\\):"
    

Step 30: Final self-audit + SC-6b release gate + green tag

  • Files:
  • Changes:
    1. Run full test suite. All 543 v4 tests + new tests must pass.
    2. Run node scanners/self-audit.mjs --check-readme. Grade must be A; readmeCheck.passed === true.
    3. SC-6b release gate (manual): If ANTHROPIC_API_KEY is set, run node scanners/token-hotspots-cli.mjs <known-fixture> --accurate-tokens --json; compare calibration.actual_tokens against the heuristic byte-estimate for the same fixture; ensure delta ≤ ±5%. Document the comparison in the v5.0.0 CHANGELOG entry. If the user opts out of the SC-6b gate (no API key available), document this in CHANGELOG as "SC-6b verification deferred — ±5% tokenizer accuracy unverified."
    4. Tag and push.
  • Reuses: Self-audit from Step 16; CLI from Step 26.
  • Test first: node --test 'tests/**/*.test.mjs' 2>&1 | tail -5 — all PASS
  • Verify:
    • node --test 'tests/**/*.test.mjs' → all PASS
    • node scanners/self-audit.mjs --check-readme --json | jq -r '.overallGrade + " " + (.readmeCheck.passed | tostring)' → "A true"
    • SC-6b gate documented (pass or deferred) in CHANGELOG
    • git tag config-audit/v5.0.0
  • On failure: escalate — if test/grade fails, diagnose and add follow-up steps in this plan; do not tag.
  • Checkpoint: Tag is the equivalent of a commit. After tag: git push origin main && git push origin config-audit/v5.0.0
  • Manifest:
    expected_paths: []
    commit_message_pattern: ".*"
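
The manual SC-6b comparison in Step 30 amounts to a relative-error check. A minimal sketch, assuming the error is measured relative to the API's actual_tokens (the plan does not say which value is the denominator) and using hypothetical function names:

```javascript
// Signed relative error of the byte-based estimate against the API count.
function calibrationError(estimatedTokens, actualTokens) {
  if (!(actualTokens > 0)) {
    throw new RangeError('actualTokens must be a positive number');
  }
  return (estimatedTokens - actualTokens) / actualTokens;
}

// True when the estimate is within the tolerance (default ±5%).
function passesSc6b(estimatedTokens, actualTokens, tolerance = 0.05) {
  return Math.abs(calibrationError(estimatedTokens, actualTokens)) <= tolerance;
}

// Example: heuristic says 4100, API says 4200 (about 2.4% under).
console.log(passesSc6b(4100, 4200));
```

Keeping the signed error available makes it easier to record in the CHANGELOG whether the heuristic over- or under-estimates.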
    

Manifest — objective completion predicate

Every step has a Manifest block with expected_paths, must_contain patterns, and a regex commit_message_pattern. Steps that touch only docs may have empty must_contain.

Failure recovery rules

  • revert: run git checkout -- <files> to restore the working tree; do not proceed.
  • retry: try the alternative described in On failure; revert if it still fails.
  • escalate: stop entirely; human review required (used sparingly, e.g. Steps 22a, 26, and 30).

Alternatives Considered

| Approach | Pros | Cons | Why rejected |
| --- | --- | --- | --- |
| Keep N1 (CA-TOK-005) inside token-hotspots.mjs (chosen) | Lowest friction; preserves TOK ID namespace; consistent with patterns A-D | Counter is positional; CA-TOK-005 ID assigned by order of detection, not semantic | Acceptable trade-off; tests assert on finding presence and severity, not exact ID number. The brief specifies CA-TOK-005, which can be enforced by detection order. |
| Standalone mcp-budget-scanner.mjs with prefix MCB | Clean separation; new ID namespace; testable in isolation | Diverges from brief's CA-TOK-005 spec; requires new SCANNER_AREA_MAP entry | Brief explicitly names CA-TOK-005; standalone scanner would force a brief revision. |
| Defer F3 severity-weighting to v5.1.0 | Reduces alpha.1 risk of breaking baselines | Means alpha.1 ships only 4 of 7 must-fix items; brief's primary goal "reality-based token-optimization" depends on F3 | Brief lists F3 as must-fix and ties it directly to v5.0.0 success criteria. |
| Bundle N5 (live tokenizer) into v5.1.0 | Removes API-key risk surface from v5.0.0 | User explicitly confirmed N5 in v5.0.0 (Avklaringer 2026-05-01); features list specifies opt-in via flag, mitigating risk | User confirmed scope explicitly. |
| Use external lib like tiktoken for N5 | Higher accuracy | Violates zero-deps convention (CLAUDE.md "null avhengigheter", i.e. zero dependencies) | Convention is hard rule. |

Test Strategy

  • Framework: node:test + node:assert/strict
  • Existing patterns:
    • Scanner tests: runScanner(fixtureName) helper that resets counter + runs full discovery+scan
    • Lib tests: factory functions (makeScannerResult) for in-memory input data
    • Lib integration: buildRichRepo() tmpdir with patched HOME
    • CLI tests: execFile/exec subprocess + parse stdout JSON
  • New tests in this plan: approximately 60 new test cases across 13 test files
  • Coverage gating: Per revised SC-10 — every F-fix and M-fix has ≥1 test; every new scanner (N1-N4, N6) has ≥1 fixture-backed test; F3 has severity-mix table; N5 has both API-key-present and -absent cases.

Tests to write

| Type | File | Verifies | Model test |
| --- | --- | --- | --- |
| Unit | tests/lib/severity.test.mjs | WEIGHTS exported | existing severity tests |
| Unit | tests/lib/scoring.test.mjs | severity-weighted area score | makeScannerResult pattern |
| Unit | tests/lib/active-config-reader.test.mjs | 'mcp' kind differentiation | existing estimateTokens cases |
| Integration | tests/lib/active-config-reader.test.mjs | MCP servers report >500 tokens | buildRichRepo extension |
| Scanner | tests/scanners/token-hotspots.test.mjs | F1, F4, F5, F7, M2, M4, N1 | runScanner pattern |
| Scanner | tests/scanners/settings-validator.test.mjs | M6 additionalDirectories | existing validator tests |
| Scanner | tests/scanners/hook-validator.test.mjs | M5 verbose hook output | existing hook tests |
| Scanner | tests/scanners/cache-prefix.test.mjs (new) | N3 mid-section volatility | runScanner pattern |
| Scanner | tests/scanners/disabled-in-schema.test.mjs (new) | N4 deny+allow conflict | runScanner pattern |
| Scanner | tests/scanners/collision.test.mjs (new) | N6 cross-plugin collision | multi-plugin fixture |
| CLI | tests/scanners/manifest.test.mjs (new) | N2 manifest CLI | execFile pattern |
| CLI | tests/scanners/accurate-tokens.test.mjs (new) | N5 API + no-API paths | mock.method first use |
| Self-audit | tests/scanners/self-audit.test.mjs | F6 --check-readme shape | existing runSelfAudit test |

Risks and Mitigations

| Priority | Risk | Location | Impact | Mitigation |
| --- | --- | --- | --- | --- |
| Critical | F3 silently degrades grades for users with v4 baselines | scoring.mjs:184 (rewritten in Step 2) | Drift comparisons produce wrong deltas | Add scoringVersion: 'v5' to envelope meta (Step 2). diff-engine warns on cross-version compare in v5.0.1 patch (out of scope here) |
| Critical | F2 jump from 15 → 5000+ tokens per MCP collapses Token Efficiency grades | Step 5 | User's Grade A becomes Grade C overnight | CHANGELOG explicit BREAKING note (Step 9, 23, 29). Document in commands/posture.md next-steps |
| Critical | N5 API-key leak via error message or JSON output | Step 26 | Key persisted in session files / logs | tokenizer-api.mjs masks key to first 8 chars; never includes key in JSON; explicit test for masking |
| High | F3 baseline-all-a fixture may fail | Step 3 | Test suite blocks at alpha.1 | Step 3 dedicated to fixture audit; posture-grade-stability.test.mjs updated if needed |
| High | N1 tool-count threshold flagging real-world MCP servers (GitHub MCP has 28+ tools) | Step 18 | False-positive findings train users to suppress | Tiered severity: <20=none, 20-49=low, 50-99=medium, 100+=high (Step 18) |
| High | N6 namespace confusion (plugin-skill vs user-skill vs built-in) | Step 22 | Every plugin with skill named review flagged | Scanner only compares same-namespace items; built-ins excluded; documented in scanner comment |
| High | N5 rate-limit (1000/min) exhausted in CI loop | Step 26 | Mid-scan crash; user's main quota impacted | 3 retries with exponential backoff; 5-sec timeout; --accurate-tokens-max-files future flag (out of scope) |
| Medium | Cascade-volatility false positives on inline date references | Step 20 | Noise findings | Keep line-anchored regex; negative fixture for inline dates |
| Medium | F6 self-audit fragile to README formatting changes | Step 16 | Hard-blocks every release | Use exact line-anchored substring (not URL regex); badge mismatch is low severity (advisory, not fail) |
| Medium | findingCounter is process-global; new scanners interfere if they call finding() outside orchestrator | All N* steps | Wrong IDs in tests | All new scanners follow single-scan() entry; no nested calls |
| Medium | Suppression backward-compat: `CA-TOK-*` glob suppresses CA-TOK-005 | Step 18+23 | Users miss highest-value finding | Documented in CHANGELOG (Step 23). One-time runtime warning is out of scope (v5.0.1 candidate) |
| Low | Network failure on N5 hangs 30s | Step 26 | Bad UX | 5-sec AbortController timeout, immediate stderr message |
| Low | Knowledge-base cleanup breaks Sonnet-version users | Step 24 | Outdated guidance | Reframe with footnote, not delete |
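
The tiered N1 severity from the risk table (<20=none, 20-49=low, 50-99=medium, 100+=high) can be written as a small lookup. The function name is hypothetical, and the null branch follows Revision 7 (unknown tool count reported at low severity).

```javascript
// Map an MCP server's tool count to a finding severity.
// Returns null when no finding should be emitted.
function severityForToolCount(toolCount) {
  if (toolCount === null || toolCount === undefined) {
    return 'low'; // tool count unknown: report at low severity
  }
  if (toolCount >= 100) return 'high';
  if (toolCount >= 50) return 'medium';
  if (toolCount >= 20) return 'low';
  return null; // under 20 tools: no finding
}

// A server with 28 tools lands in the low tier rather than being ignored
// or over-flagged.
console.log(severityForToolCount(28));
```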

Assumptions

| # | Assumption | Why unverifiable | Impact if wrong |
| --- | --- | --- | --- |
| 1 | Anthropic count_tokens endpoint accepts plain text payload and returns {input_tokens: number} | Brief premise; not tested in this codebase | N5 produces wrong calibration values; falls back gracefully |
| 2 | MCP servers expose tool count via tools/list or package.json tools field | MCP spec is evolving; servers vary | M1/N1 detection silently returns null; CA-TOK-005 finding may not fire on real servers; baseline behavior is "no finding" not "wrong finding" |
| 3 | readActiveConfig is performant enough to call from TOK on large repos | Untested at scale | TOK scanner becomes slow; fix: cache activeConfig in scan-orchestrator and pass to scanners (out of scope) |
| 4 | posture-grade-stability.test.mjs baseline-all-a fixture is genuinely info-only after v4 work | Assumed from naming + git history | Step 3 catches and fixes |
| 5 | Cross-plugin collision detection model (plugin-namespaced skills don't collide) is correct | Documented in N6 description but not in Anthropic specs | False positives/negatives on plugin-namespacing; verified via test fixture |

Verification

End-to-end checks after Step 30 completes (these mirror the brief's revised SCs):

  • SC-10 (revised): node --test 'tests/**/*.test.mjs' → all green AND original 543 tests still pass AND ≥ 1 fixture-backed test exists per new scanner function (F1, F2, F3, M1, M2, M4, M5, M6, N1-N4, N6) — verified by file presence:
    • tests/lib/active-config-reader.test.mjs — F2 'mcp' kind cases
    • tests/lib/scoring.test.mjs — F3 severity-mix cases
    • tests/scanners/token-hotspots.test.mjs — F1, F4, F5, F7, M2, M4, N1 cases
    • tests/scanners/settings-validator.test.mjs — M6 cases
    • tests/scanners/hook-validator.test.mjs — M5 cases
    • tests/scanners/manifest.test.mjs — N2 cases (new file)
    • tests/scanners/cache-prefix.test.mjs — N3 cases (new file)
    • tests/scanners/disabled-in-schema.test.mjs — N4 cases (new file)
    • tests/scanners/collision.test.mjs — N6 cases (new file)
    • tests/scanners/accurate-tokens.test.mjs — N5 cases (new file)
    • tests/scanners/self-audit.test.mjs — F6 cases
  • SC-1 (F1): ! grep -q "void readActiveConfig" scanners/token-hotspots.mjs AND grep -q "readActiveConfig(targetPath" scanners/token-hotspots.mjs
  • SC-2 (F2): grep -q "kind === 'mcp'" scanners/lib/active-config-reader.mjs
  • SC-3 (F3): grep -q "import.*WEIGHTS.*riskScore\|import.*riskScore.*WEIGHTS" scanners/lib/scoring.mjs
  • SC-4 (F6): node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed' → true
  • SC-5 (N1): node --test --test-name-pattern="mcp-budget" tests/scanners/token-hotspots.test.mjs → PASS
  • SC-6a (N2): node scanners/manifest.mjs . --json | jq '.sources | length' → > 0 AND output sorted DESC by estimated_tokens
  • SC-6b (N5): Manual gate — release-time verification of ±5% accuracy with real API key (Step 30); pass OR documented deferral in CHANGELOG
  • SC-7 (N3): node --test tests/scanners/cache-prefix.test.mjs → PASS
  • SC-8 (N6): node --test tests/scanners/collision.test.mjs → PASS
  • SC-9 (M8): ! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md
  • SC-11 (N5): Both API-key-present and -absent paths covered in tests/scanners/accurate-tokens.test.mjs
  • F5 cleanup: ! grep -q "detectSonnetEra\|CA-TOK-004" scanners/token-hotspots.mjs commands/tokens.md knowledge/opus-4.7-patterns.md
  • Release: [ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]
  • Git: git log --oneline -50 | grep -c "v5" ≥ 5 (one per stage)

Estimated Scope

  • Files to modify: 18 (incl. commands/tokens.md and knowledge/opus-4.7-patterns.md per Step 8b)
  • Files to create: ~22 (5 new scanners + 1 lib + 1 command + 13 fixture dirs + 5 new test files + 1 research doc)
  • Steps: 31 (was 30; added Step 8b for CA-TOK-004 reference cleanup, Step 22a for namespace research spike)
  • Complexity: high (cross-cutting changes across scoring, tokenization, scanner registry, knowledge base)

Execution Strategy

The plan has 31 steps grouped into 5 sessions matching release stages. Sessions are sequential: alpha.1 must land before alpha.2, and so on. Within a session some steps are parallel-safe, but for clarity all run sequentially.

Session 1 — alpha.1 (TOK cleanup + scoring fix)

  • Steps: 1-9 (includes Step 8b for CA-TOK-004 reference cleanup)
  • Wave: 1
  • Depends on: none
  • Scope fence:
    • Touch: scanners/lib/severity.mjs, scanners/lib/scoring.mjs, scanners/lib/active-config-reader.mjs, scanners/token-hotspots.mjs, tests/lib/{severity,scoring,active-config-reader}.test.mjs, tests/scanners/token-hotspots.test.mjs, tests/scanners/posture-grade-stability.test.mjs, tests/fixtures/tok-active-config/, commands/tokens.md (Step 8b), knowledge/opus-4.7-patterns.md (Step 8b), CHANGELOG.md
    • Never touch: any scanner other than TOK; any new scanner files (those land later)

Session 2 — alpha.2 (structural gaps + README self-audit)

  • Steps: 10-17
  • Wave: 2
  • Depends on: Session 1
  • Scope fence:
    • Touch: scanners/{token-hotspots,settings-validator,hook-validator,self-audit}.mjs, scanners/lib/active-config-reader.mjs, tests/scanners/{settings-validator,hook-validator,self-audit,token-hotspots}.test.mjs, tests/fixtures/{additional-dirs-many,large-cascade,skill-bloated,mcp-tool-heavy,hooks-verbose,readme-desynced}/, CHANGELOG.md
    • Never touch: scan-orchestrator (no new scanners yet); knowledge/ (later)

Session 3 — beta.1 (new scanners)

  • Steps: 18, 19, 20, 21, 22a (research spike), 22b (collision scanner), 23
  • Wave: 3
  • Depends on: Session 2
  • Scope fence:
    • Touch: scanners/token-hotspots.mjs (N1), scanners/{manifest,cache-prefix-scanner,disabled-in-schema-scanner,collision-scanner}.mjs (new), scanners/scan-orchestrator.mjs, scanners/lib/scoring.mjs (SCANNER_AREA_MAP only), commands/manifest.md (new), 5 new test files, 4 new fixture dirs, docs/v5-namespace-research.md (gitignored), CHANGELOG.md
    • Never touch: any other scanner code

Session 4 — rc.1 (knowledge + tokenizer)

  • Steps: 24-27
  • Wave: 4
  • Depends on: Session 3
  • Scope fence:
    • Touch: knowledge/{configuration-best-practices,cache-telemetry-recipe}.md, commands/tokens.md, scanners/token-hotspots-cli.mjs, scanners/lib/tokenizer-api.mjs (new), tests/scanners/accurate-tokens.test.mjs (new), CHANGELOG.md
    • Never touch: scanner code beyond CLI

Session 5 — release (v5.0.0 final)

  • Steps: 28-30
  • Wave: 5
  • Depends on: Session 4
  • Scope fence:
    • Touch: README.md, CLAUDE.md, commands/{help,posture,config-audit}.md, agents/feature-gap-agent.md, .claude-plugin/plugin.json, CHANGELOG.md
    • Never touch: any code; this is documentation + tag

Execution Order

  • Wave 1: Session 1 (alpha.1)
  • Wave 2: Session 2 (alpha.2) — after Wave 1
  • Wave 3: Session 3 (beta.1) — after Wave 2
  • Wave 4: Session 4 (rc.1) — after Wave 3
  • Wave 5: Session 5 (release) — after Wave 4

Grouping rules applied

  • Steps sharing files → same session (e.g., all TOK changes in Session 1+2)
  • New-scanner steps → Session 3 (post structural)
  • Knowledge/CLI changes → Session 4 (post all scanners)
  • Doc-sync + version-bump → Session 5 (last, depends on all counts being final)

Plan Quality Score

| Dimension | Weight | Score | Notes |
| --- | --- | --- | --- |
| Structural integrity | 0.15 | 88 | Sessions ordered by dependencies; Step 22a research spike resolves namespace ambiguity before 22b |
| Step quality | 0.20 | 85 | All TBDs resolved; F7 explicit decision on Pattern C; Step 16 fs-counted not README-counted |
| Coverage completeness | 0.20 | 92 | All 22 brief items mapped; F5 documentation cleanup added (8b); SC-6b release gate documented |
| Specification quality | 0.15 | 86 | File paths verified; manifest must_not_contain replaces vacuous regex; Node version pinned for N5 |
| Risk & pre-mortem | 0.15 | 88 | 13 risks; namespace research spike resolves N6 mitigation circularity |
| Headless readiness | 0.10 | 84 | All steps have On Failure + Checkpoint; manifest blocks updated to use must_not_contain where appropriate |
| Manifest quality | 0.05 | 78 | must_contain + must_not_contain; fixture file paths fully enumerated for Step 6/14/18 |
| Weighted total | 1.00 | 86.6 | Grade: B+ |

Adversarial review:

  • Plan critic: initial verdict REPLAN (5 blockers, 8 majors, 7 minors, score 67.7); all blockers + majors addressed in revisions
  • Scope guardian: initial verdict MIXED (4 scope-gaps); all 4 gaps addressed in revisions

Revisions

| # | Finding | Severity | Resolution |
| --- | --- | --- | --- |
| 1 | Plan header "TBD" | blocker | Updated to "B+ (84/100)" after re-scoring |
| 2 | Step 25 "TBD if needed" flag | blocker | Committed --with-telemetry-recipe flag as deliverable; added test |
| 3 | Step 8 manifest `^(?!.*detectSonnetEra)` is logically vacuous | blocker | Replaced with must_not_contain field; added explicit grep verify |
| 4 | Step 6 fixture incomplete in expected_paths | blocker | Enumerated 4 fixture files: .mcp.json, CLAUDE.md, .claude-plugin/plugin.json, commands/sample.md |
| 5 | CA-TOK-004 references in commands/tokens.md and knowledge/opus-4.7-patterns.md after F5 | blocker | Added Step 8b: dedicated cleanup step with grep verify |
| 6 | Step 12 missing test for claudeMd.estimatedTokens field shape | major | Added assertion to Step 6 test (item c) |
| 7 | Step 18 missing toolCount=null handling | major | Added explicit null branch with low severity + "tool count unknown" message |
| 8 | Step 3 ordering vs Step 10 grade-stability re-invalidation | major | Step 10's table-driven test now checks per-finding severity; Step 3 audits remain at fixture-level grade |
| 9 | N6 namespace assumption is circular mitigation | major | Added Step 22a research spike with explicit verdict file before 22b implementation |
| 10 | Step 16 negative-case test depends on Step 28 docs sweep | major | Step 16 now uses filesystem counts as truth (not README); fs-counted detection breaks the cycle |
| 11 | Step 19 marketplace-large fixture issue with manifest CLI | major | Added two test paths: real-config (plugin root) + fixture-based with buildRichRepo helper |
| 12 | Step 26 mock.method Node version requirement | major | Added prerequisite check: Node >= 18.13; documented in step + escalation path |
| 13 | estimateTokens kind inconsistency between discovery and readActiveConfig paths | major | Step 6 unifies: prefer readActiveConfig data for MCP/skills/plugins; discovery only for files not covered |
| 14 | F7 Pattern C left "unchanged" without rationale | scope-gap | Step 10 now explicitly recalibrates Pattern C: medium → low with reason; table-driven test asserts |
| 15 | M7 --with-telemetry-recipe flag was conditional | scope-gap | Same as Revision 2: committed as deliverable |
| 16 | SC-6b ±5% accuracy unprovable in automation | scope-gap | Step 30 added manual release gate with documented deferral path |
| 17 | SC-10 verification used old "≥600 tests" threshold | scope-gap | Verification section rewritten to per-feature coverage requirement |
| 18-24 | Various minors (docs file naming, manifest enumeration, CHANGELOG specifics) | minor | Addressed in their respective steps |