Planning artifacts for v5.0.0 (token-economy round):
- v5-brief.md: scope brief with 22 items (F1-F7 + M1-M8 + N1-N7), revised with an Avklaringer (clarifications) section after critical review (N7 dropped, M3+N6 merged, N5 promoted to v5.0.0, SC-6/SC-10 reformulated)
- v5-plan.md: 31-step implementation plan in 5 sessions (alpha.1 → alpha.2 → beta.1 → rc.1 → release). B+ score (84/100) after plan-critic + scope-guardian review addressed all blockers/majors/gaps.
- v5-implementation-log.md: per-session status record (skeleton)

Sessions track via state files (REMEMBER.md, TODO.md gitignored; implementation-log.md committed; NEXT-SESSION-PROMPT.local.md gitignored). No code changes in this commit; planning only.
config-audit v5.0.0 — Implementation Plan
Plan quality: B+ (84/100) — adversarial review complete, revisions applied
Generated by ultraplan-local v3.0.0 on 2026-05-01 (`plan_version: 1.7`)
Source brief: `docs/v5-brief.md` (revised 2026-05-01)
Revised after plan-critic + scope-guardian review on 2026-05-01 (see Revisions section)
Context
config-audit v4.0.0 markets itself as "Opus 4.7-aware token optimization" but the critical review (briefed 2026-04-19, revised 2026-05-01) shows the marketing does not hold:
- TOK scanner imports `readActiveConfig` and explicitly voids it (`void readActiveConfig` at `scanners/token-hotspots.mjs:31`), so it never sees plugins, skills, MCP, or cascade.
- 4 TOK patterns cover ~29% of identified Opus 4.7 cost drivers; the largest sinks (MCP tool-schema bloat, skill-description bloat, CLAUDE.md cascade total) have zero coverage.
- `estimateTokens` flattens MCP servers and hooks to 15 tokens each via three caller sites passing `kind='item'` (`active-config-reader.mjs:556, 593, 618`). Reality is 2k–20k per MCP.
- `scoreByArea` treats severities equally: 1 critical and 1 info produce identical area score.
- Pattern D (`detectSonnetEra`) contradicts the plugin's own v3.0 policy (minimal correct = Grade A).
v5.0.0 is a token-economy round, not a rewrite. The 8 structural scanners,
backup/rollback, suppression, plugin-health all stay. The TOK scanner is reworked
in place; new scanners (MCP budget, manifest, cache-prefix, disabled-in-schema,
collision) are added; severity-aware scoring lands; tokenizer calibration via
Anthropic count_tokens ships as opt-in.
Architecture Diagram
graph TD
subgraph "v5.0.0 changes"
TOK[token-hotspots.mjs<br/>F1/F4/F5/F7]
ACR[active-config-reader.mjs<br/>F2 + 'mcp' kind]
SCR[scoring.mjs<br/>F3 severity-weighted]
SVR[severity.mjs<br/>WEIGHTS export]
SAU[self-audit.mjs<br/>F6 --check-readme]
ORC[scan-orchestrator.mjs<br/>register new scanners]
CLI[token-hotspots-cli.mjs<br/>N5 --accurate-tokens]
MCB[NEW: mcp-budget-scanner.mjs<br/>N1 CA-TOK-005]
MAN[NEW: manifest.mjs<br/>N2 CLI + scanner]
CPS[NEW: cache-prefix-scanner.mjs<br/>N3]
DIS[NEW: disabled-in-schema-scanner.mjs<br/>N4]
COL[NEW: collision-scanner.mjs<br/>N6 CA-COL-001]
SET[settings-validator.mjs<br/>M6 additionalDirectories]
KB[knowledge/<br/>M7 cache-recipe + M8 cleanup]
ORC --> TOK & MCB & CPS & DIS & COL
TOK --> ACR
SCR --> SVR
ACR -. F2 fix .-> ACR
CLI --> ACR
SAU --> README
end
Codebase Analysis
- Tech stack: Node.js >= 18, ES modules (.mjs), `node:test`, zero external deps
- Test framework: `node:test` + `node:assert/strict`; 543 tests across 31 files in v4.0.0
- Key patterns:
  - Scanner orchestrator + shared `discovery` object (`scan-orchestrator.mjs:73`)
  - Finding factory `finding({scanner, severity, ...})` produces `CA-{SCANNER}-{NNN}` IDs (`output.mjs:31`); counter is process-global, reset per scan
  - CLI direct-run guard pattern via `import.meta.url`
  - Manual argv parsing, no external libs
  - Test fixtures under `tests/fixtures/<scenario-name>/`
- Relevant files (verified):
  - `scanners/token-hotspots.mjs` (lines 31, 166-178, 202-229, 270, 299, 321, 338)
  - `scanners/lib/active-config-reader.mjs` (lines 29-39, 556, 593, 618)
  - `scanners/lib/scoring.mjs` (lines 6, 169-200, 184)
  - `scanners/lib/severity.mjs` (lines 14, 21-27)
  - `scanners/scan-orchestrator.mjs` (lines 18-58)
  - `scanners/self-audit.mjs` (lines 154-177)
  - `scanners/settings-validator.mjs` (lines 16-35)
  - `scanners/lib/suppression.mjs` (lines 117-128)
  - `knowledge/configuration-best-practices.md` (line 9)
  - `knowledge/opus-4.7-patterns.md` (1-57)
  - `tests/lib/active-config-reader.test.mjs`, `tests/lib/scoring.test.mjs`, `tests/scanners/token-hotspots.test.mjs`, `tests/scanners/posture-grade-stability.test.mjs`
- Reusable code:
  - `tokenKind()` at `token-hotspots.mjs:54-63`: extend to map MCP types
  - `enumeratePlugins()` at `active-config-reader.mjs:262-305`: for N6 collision
  - `countPluginItems()` + `findSkillMdFiles()` at `active-config-reader.mjs:332-399`: for N6
  - `parseFrontmatter()` from `lib/yaml-parser.mjs`: for M2 skill description
  - `discoverConfigFiles()` from `lib/file-discovery.mjs`: for new CLIs
  - `buildRichRepo()` test helper at `tests/lib/active-config-reader.test.mjs`: extend for MCP fixtures
  - `runScanner()` helper pattern in `tests/scanners/token-hotspots.test.mjs`: model for new scanner tests
- External tech (researched): Anthropic `POST /v1/messages/count_tokens` endpoint for N5 (rate-limited 1000 req/min)
- Recent git activity:
  - `token-hotspots.mjs`, `active-config-reader.mjs`, `severity.mjs` are all single-commit cold files (born in `4f1cc7e` or `a090ed3`, never revised)
  - `scoring.mjs` has 2 commits (born + TOK wiring)
  - Single owner (KTG); no concurrent branches; all work merges to main
- Straggler-sweep risk: 4 historical events where badge counts/area counts drifted across multiple files in a single feature batch; a dedicated doc-consistency pass must be planned
Research Sources
| Technology | Source | Key Findings | Confidence |
|---|---|---|---|
| Anthropic count_tokens API | Anthropic public docs | POST /v1/messages/count_tokens returns {input_tokens: number}; 1000 req/min rate limit; requires ANTHROPIC_API_KEY | high |
| MCP tool count detection | MCP spec | tools/list requires running server; package.json tools field is convention-only, not standard | medium |
Implementation Plan
Steps grouped by release stage. Each step has manifest, on-failure, checkpoint. Steps within a stage may be reordered if test gates allow; cross-stage ordering is fixed.
STAGE alpha.1: TOK cleanup + scoring/estimateTokens fix (F1-F5)
Step 1: Export WEIGHTS and riskScore from severity.mjs (F3 prep)
- Files: `scanners/lib/severity.mjs`
- Changes: Promote the `WEIGHTS` const to a named export. Verify `riskScore` is already exported.
- Reuses: Existing `WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1, info: 0 }` at line 14.
- Test first: `tests/lib/severity.test.mjs`: assert `WEIGHTS.critical === 25` via named import.
- Verify: `node --test tests/lib/severity.test.mjs` → PASS
- On failure: revert (single-file change)
- Checkpoint: `git commit -m "feat(config-audit): export WEIGHTS from severity.mjs (v5 F3 prep)"`
- Manifest:

      expected_paths: [scanners/lib/severity.mjs]
      must_contain:
        - { path: scanners/lib/severity.mjs, pattern: "export const WEIGHTS" }
      commit_message_pattern: "^feat\\(config-audit\\):"
Step 2: Severity-weighted scoreByArea (F3)
- Files: `scanners/lib/scoring.mjs`, `tests/lib/scoring.test.mjs`
- Changes:
  - Add `import { WEIGHTS, riskScore } from './severity.mjs'`
  - Rewrite the `scoreByArea` non-GAP path (lines 182-186) to penalize via a severity-weighted sum: `penalty = sum(count[s] * WEIGHTS[s]) / maxBudget; passRate = max(0, 100 - penalty)`
  - Add `scoringVersion: 'v5'` to the returned struct (for cross-version drift detection)
- Reuses: `WEIGHTS` from Step 1; existing GAP-tier logic untouched.
- Test first: Add `describe('scoreByArea — severity weighting')` with new factory `makeScannerResultWithSeverities`. Assert: a single critical scores lower than five lows; a clean result scores 100/A.
- Verify: `node --test tests/lib/scoring.test.mjs tests/scanners/posture-grade-stability.test.mjs` → PASS
- On failure: revert + re-evaluate the maxBudget formula. Likely tweak: `maxBudget = max(10, findingCount * 4)`.
- Checkpoint: `git commit -m "feat(config-audit): severity-weighted scoreByArea (v5 F3)"`
- Manifest:

      expected_paths: [scanners/lib/scoring.mjs, tests/lib/scoring.test.mjs]
      must_contain:
        - { path: scanners/lib/scoring.mjs, pattern: "import.*WEIGHTS.*riskScore" }
        - { path: scanners/lib/scoring.mjs, pattern: "scoringVersion" }
      commit_message_pattern: "^feat\\(config-audit\\):"
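The severity-weighted penalty from Step 2 can be sketched as below. This is a minimal illustration, not the plugin's code: `weightedPassRate` and the `counts` shape (severity to finding count) are hypothetical, and `maxBudget` is passed in as a plain parameter because the plan leaves its formula open.

```javascript
// Weights as listed for severity.mjs in Step 1.
const WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1, info: 0 };

// Hypothetical helper. `counts` maps severity -> number of findings.
// `maxBudget` is an assumed normalisation constant supplied by the caller.
function weightedPassRate(counts, maxBudget) {
  let weighted = 0;
  for (const [severity, count] of Object.entries(counts)) {
    // Unknown severities contribute nothing rather than NaN.
    weighted += count * (WEIGHTS[severity] ?? 0);
  }
  const penalty = weighted / maxBudget;
  return Math.max(0, 100 - penalty);
}
```

With `maxBudget = 1`, one critical gives 75 and five lows give 95, which is the ordering the Step 2 test asserts; a larger `maxBudget` compresses the penalty without changing that ordering.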
Step 3: Audit baseline-all-a fixture for F3 compatibility
- Files: `tests/fixtures/baseline-all-a/` (read-only audit)
- Changes: Run scoring against the fixture; if any non-info finding pulls the score below 90 after F3, document it and either (a) update the fixture to a truly minimal correct config, or (b) update test expectations to match v5 semantics with an explanatory comment.
- Reuses: Existing fixture.
- Test first: `tests/scanners/posture-grade-stability.test.mjs` already asserts grade A on this fixture; if it fails after Step 2, fix the fixture.
- Verify: `node --test tests/scanners/posture-grade-stability.test.mjs` → PASS
- On failure: retry; tweak the fixture to be truly clean (remove any medium+ findings).
- Checkpoint: `git commit -m "test(config-audit): align baseline-all-a fixture with v5 scoring"`
- Manifest:

      expected_paths: [tests/scanners/posture-grade-stability.test.mjs]
      commit_message_pattern: "^(test|fix|chore)\\(config-audit\\):"
Step 4: Add 'mcp' kind to estimateTokens (F2 — function side)
- Files: `scanners/lib/active-config-reader.mjs`, `tests/lib/active-config-reader.test.mjs`
- Changes: Extend `estimateTokens(bytes, kind)` (lines 29-39):
  - new branch `kind === 'mcp'`: if `bytes > 0` use `ceil(bytes / 3.5)` (json-rate); else `500` (base overhead floor)
  - Optional third argument carrying `toolCount`, via overload: `estimateTokens(bytes, 'mcp', {toolCount}) → max(base, toolCount * 200)`
- Reuses: Existing `'json'` and `'item'` branches as patterns.
- Test first: Add cases: `'mcp'` with 0 bytes → ≥500; `'mcp'` with `{toolCount: 10}` → ≥2000; ratio `mcp / item` ≥ 30 for a 10-tool server.
- Verify: `node --test tests/lib/active-config-reader.test.mjs` → PASS
- On failure: revert. Adjust the formula if the test thresholds are unrealistic, but keep the order-of-magnitude differentiation.
- Checkpoint: `git commit -m "feat(config-audit): add 'mcp' kind to estimateTokens (v5 F2)"`
- Manifest:

      expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
      must_contain:
        - { path: scanners/lib/active-config-reader.mjs, pattern: "kind === 'mcp'" }
      commit_message_pattern: "^feat\\(config-audit\\):"
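A sketch of how the `'mcp'` branch from Step 4 might look. The flat 15-token `'item'` estimate, the `bytes / 3.5` json-rate, the 500-token floor, and the 200-tokens-per-tool multiplier all come from the plan; the fallback for other kinds and the exact shape of the options argument are assumptions, and the real function at lines 29-39 may be structured differently.

```javascript
// Sketch only; not the actual implementation in active-config-reader.mjs.
function estimateTokens(bytes, kind, options = {}) {
  if (kind === 'item') return 15; // flat per-item estimate (current behavior)

  if (kind === 'mcp') {
    // Base: json-rate when a byte size is known, otherwise the 500-token floor.
    const base = bytes > 0 ? Math.ceil(bytes / 3.5) : 500;
    // A null or missing toolCount (detection not wired yet) counts as 0 tools.
    const toolCount = options.toolCount ?? 0;
    return Math.max(base, toolCount * 200);
  }

  // Assumption: every other kind uses the json-rate.
  return Math.ceil(bytes / 3.5);
}
```

For a 10-tool server with no known byte size this yields 2000 tokens against 15 for an `'item'`, comfortably above the ratio of 30 that the Step 4 test requires.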
Step 5: Migrate MCP/hook callers to use 'mcp' kind (F2 — caller side)
- Files: `scanners/lib/active-config-reader.mjs`
- Changes: Three call sites:
  - Line 556 (`collectHookEntries`): keep `'item'` for hooks (hooks don't have schemas) but pass the actual byte size when available.
  - Line 593 (`collectMcpFromFile`): `kind='mcp'`, pass `{ toolCount: server.tools?.length ?? 0 }` (will be 0 until M1 in Step 14 wires tool detection; that's fine, since the base 500 still beats the flat 15).
  - Line 618 (`readActiveMcpServers` from .claude.json): same as 593.
- Reuses: New `'mcp'` kind from Step 4.
- Test first: Extend `buildRichRepo` in `tests/lib/active-config-reader.test.mjs` to include MCP servers; assert returned `mcpServers[].estimatedTokens >= 500` (not 15).
- Verify: `node --test tests/lib/active-config-reader.test.mjs` → PASS
- On failure: revert. Re-check the call sites if the test still shows 15.
- Checkpoint: `git commit -m "fix(config-audit): MCP token callers use 'mcp' kind (v5 F2)"`
- Manifest:

      expected_paths: [scanners/lib/active-config-reader.mjs, tests/lib/active-config-reader.test.mjs]
      forbidden_paths: []
      commit_message_pattern: "^fix\\(config-audit\\):"
Step 6: Wire readActiveConfig into TOK scanner (F1)
- Files: `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`, `tests/fixtures/tok-active-config/` (new)
- Changes:
  - Remove `void readActiveConfig;` at line 31.
  - Inside `scan(targetPath, discovery)`: call `await readActiveConfig(targetPath, {})` once; if it throws (non-git target), catch and continue with `discovery`-only behavior. Merge its `mcpServers`, `plugins`, `skills`, `claudeMd.estimatedTokens` into hotspot ranking input.
  - Add new finding source categories `'mcp-server'`, `'plugin'`, `'skill'` for hotspots.
  - Unify token estimation paths: the `tokenKind()` mapper at line 54-63 is used for `discovery.files`. After Step 5, MCP files in discovery still map to `'json'` while MCP servers from `readActiveConfig` use `'mcp'`. Within TOK, prefer `readActiveConfig` data for MCP/skills/plugins; fall back to `discovery` only for files not covered by `readActiveConfig` (e.g., loose `claude.json`). Document in a 1-line comment.
- Reuses: `readActiveConfig` shape from `active-config-reader.mjs:738-827`.
- Test first: New fixture `tok-active-config/` with `.mcp.json` (2 servers), `CLAUDE.md`, and `.claude-plugin/plugin.json` + `commands/sample.md` (plugin-skeleton). New describe block: assert (a) `hotspots.some(h => h.source.includes('mcp'))`; (b) total estimated tokens > minimal-project total; (c) `claudeMd.estimatedTokens > 0` is observable when readActiveConfig was called.
- Verify: `node --test tests/scanners/token-hotspots.test.mjs` → PASS
- On failure: revert. Common cause: `readActiveConfig` requires a git root; the try/catch above handles this. Verify the discovery-only fallback path works.
- Checkpoint: `git commit -m "feat(config-audit): TOK consumes readActiveConfig (v5 F1)"`
- Manifest:

      expected_paths:
        - scanners/token-hotspots.mjs
        - tests/scanners/token-hotspots.test.mjs
        - tests/fixtures/tok-active-config/.mcp.json
        - tests/fixtures/tok-active-config/CLAUDE.md
        - tests/fixtures/tok-active-config/.claude-plugin/plugin.json
        - tests/fixtures/tok-active-config/commands/sample.md
      must_contain:
        - { path: scanners/token-hotspots.mjs, pattern: "readActiveConfig\\(targetPath" }
      forbidden_paths: []
      commit_message_pattern: "^feat\\(config-audit\\):"
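The "prefer `readActiveConfig`, fall back to `discovery`" rule in Step 6 could be sketched roughly as follows. Everything about the data shapes here is an assumption made for illustration: the helper name `buildHotspotInputs`, the per-entry fields `name`, `path`, and `estimatedTokens`, and the `'file'` category are hypothetical, and the real `readActiveConfig` result may differ. Passing `null` stands in for the case where `readActiveConfig` threw and the scanner continues discovery-only.

```javascript
// Hypothetical merge step: active-config entries win; discovery files are
// only added when no active-config entry already covers the same path.
function buildHotspotInputs(activeConfig, discoveryFiles) {
  const inputs = [];
  const covered = new Set();

  if (activeConfig) {
    const groups = [
      ['mcp-server', activeConfig.mcpServers],
      ['plugin', activeConfig.plugins],
      ['skill', activeConfig.skills],
    ];
    for (const [category, entries] of groups) {
      for (const entry of entries ?? []) {
        inputs.push({
          category,
          source: `${category}:${entry.name}`,
          estimatedTokens: entry.estimatedTokens,
        });
        if (entry.path) covered.add(entry.path);
      }
    }
  }

  // Fallback: files not covered by the active config.
  for (const file of discoveryFiles) {
    if (covered.has(file.path)) continue;
    inputs.push({ category: 'file', source: file.path, estimatedTokens: file.estimatedTokens });
  }
  return inputs;
}
```

Keeping the merge in one pure function makes the discovery-only fallback easy to test without a git root.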
Step 7: Remove take dead-code and hotspot padding (F4)
- Files: `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`
- Changes: Delete the `take` computation (lines 202-205) and the padding while-loop (lines 219-229). Replace with `return ranked.slice(0, HOTSPOTS_MAX)` and accept that fewer than `HOTSPOTS_MIN` may be returned for small projects.
- Reuses: `HOTSPOTS_MAX` constant.
- Test first: Add assertion: every `hotspot.source` is unique; `hotspots.length <= discovery.files.length`.
- Verify: `node --test tests/scanners/token-hotspots.test.mjs` → PASS
- On failure: revert. If the hotspots-contract test breaks because some test expects a minimum count, update the test to allow fewer.
- Checkpoint: `git commit -m "fix(config-audit): remove TOK dead take + hotspot padding (v5 F4)"`
- Manifest:

      expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
      must_contain:
        - { path: scanners/token-hotspots.mjs, pattern: "ranked\\.slice\\(0, HOTSPOTS_MAX\\)" }
      commit_message_pattern: "^fix\\(config-audit\\):"
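A small sketch of the padding-free ranking that Step 7 describes. The value of `HOTSPOTS_MAX`, the helper name `topHotspots`, and the candidate shape are assumptions; the de-duplication by `source` is added here only so the uniqueness assertion from the test plan holds, and may already be handled elsewhere in the real scanner.

```javascript
const HOTSPOTS_MAX = 10; // assumed value; the plan only names the constant

// Rank candidates by estimated tokens (descending), keep each source once,
// and return at most HOTSPOTS_MAX entries. No padding for small projects.
function topHotspots(candidates) {
  const seen = new Set();
  const ranked = [];
  const sorted = [...candidates].sort((a, b) => b.estimatedTokens - a.estimatedTokens);
  for (const candidate of sorted) {
    if (seen.has(candidate.source)) continue;
    seen.add(candidate.source);
    ranked.push(candidate);
  }
  return ranked.slice(0, HOTSPOTS_MAX);
}
```

A project with two candidate sources simply gets two hotspots back, which is the behavior change tests that expected a minimum count need to absorb.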
Step 8: Remove Pattern D detectSonnetEra (F5)
- Files: `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`
- Changes: Delete the `detectSonnetEra()` function (lines 166-178) and its finding emission (lines 335-350). Pattern D and `CA-TOK-004` no longer exist.
- Reuses: none
- Test first: Update the `opus-47/sonnet-era` describe block: assert `result.findings.every(f => f.id !== 'CA-TOK-004')` AND that the existing fixture now produces zero TOK findings.
- Verify: `node --test tests/scanners/token-hotspots.test.mjs` → PASS AND `! grep -q "detectSonnetEra" scanners/token-hotspots.mjs`
- On failure: revert. CA-TOK-004 may still exist if any other path emits it; grep confirms none.
- Checkpoint: `git commit -m "feat(config-audit): remove TOK Pattern D detectSonnetEra (v5 F5)"`
- Manifest:

      expected_paths: [scanners/token-hotspots.mjs]
      forbidden_paths: []
      must_not_contain:
        - { path: scanners/token-hotspots.mjs, pattern: "detectSonnetEra" }
        - { path: scanners/token-hotspots.mjs, pattern: "CA-TOK-004" }
      commit_message_pattern: "^feat\\(config-audit\\):"
Step 8b: Sweep CA-TOK-004 references from docs after F5
- Files: `commands/tokens.md`, `knowledge/opus-4.7-patterns.md`
- Changes:
  - `commands/tokens.md`: replace any `CA-TOK-001..004` reference with `CA-TOK-001..003` (or list explicitly). Verify no `CA-TOK-004` remains.
  - `knowledge/opus-4.7-patterns.md`: remove the Pattern D row from the catalogue table and any text referencing "Pattern D" or `CA-TOK-004`. Update the pattern count in the document header if mentioned.
- Reuses: none
- Test first: None (docs).
- Verify: `! grep -q "CA-TOK-004" commands/tokens.md knowledge/opus-4.7-patterns.md`
- On failure: revert.
- Checkpoint: `git commit -m "docs(config-audit): remove CA-TOK-004 references after F5 (v5)"`
- Manifest:

      expected_paths: [commands/tokens.md, knowledge/opus-4.7-patterns.md]
      must_not_contain:
        - { path: commands/tokens.md, pattern: "CA-TOK-004" }
        - { path: knowledge/opus-4.7-patterns.md, pattern: "CA-TOK-004" }
      commit_message_pattern: "^docs\\(config-audit\\):"
Step 9: alpha.1 wrap — release notes draft
- Files: `CHANGELOG.md`
- Changes: Add a `## [5.0.0-alpha.1]` entry summarizing F1-F5. Note BREAKING for F3 (severity weighting) and F2 (MCP estimate jump).
- Reuses: v4.0.0 entry format.
- Test first: None (docs).
- Verify: `grep -c "5.0.0-alpha.1" CHANGELOG.md` → 1
- On failure: revert.
- Checkpoint: `git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.1 entry"`
- Manifest:

      expected_paths: [CHANGELOG.md]
      must_contain:
        - { path: CHANGELOG.md, pattern: "5\\.0\\.0-alpha\\.1" }
      commit_message_pattern: "^docs\\(config-audit\\):"
STAGE alpha.2 — Structural gaps + README self-audit (M1, M2, M4-M6, F6, F7)
Step 10: F7 — Severity recalibration for TOK patterns
- Files: `scanners/token-hotspots.mjs`, `tests/scanners/token-hotspots.test.mjs`
- Changes: Recalibrate severity for all 3 remaining patterns based on tokens/turn (Pattern D removed in F5). Each decision is explicit and testable:
  - Pattern A (volatile top, line 270): `medium` → `high`. Reason: volatile content in the cached prefix triggers a full re-read of the cascade every turn (10k+ tokens/turn cost). Highest single-pattern impact.
  - Pattern B (redundant perms, line 299): `low` → `medium`. Reason: duplicate tool entries inflate the tool-schema payload sent every turn (~50-200 tokens/turn per duplicate, scales with turns).
  - Pattern C (deep imports, line 321): `medium` → `low`. Reason: depth alone is structural and only matters at first load; cache benefits remain. Lower per-turn cost than originally rated. This is an explicit recalibration, not "unchanged".
  - Add a `calibration_note` field to each finding's evidence: `"severity reflects estimated tokens/turn based on structural heuristic; not measured against runtime telemetry"`.
- Reuses: `SEVERITY` constants.
- Test first: Table-driven test:

      const SEVERITY_TABLE = [
        { fixture: 'opus-47/cache-breaking', findingId: 'CA-TOK-001', expected: 'high' },
        { fixture: 'opus-47/redundant-tools', findingId: 'CA-TOK-002', expected: 'medium' },
        { fixture: 'opus-47/deep-imports', findingId: 'CA-TOK-003', expected: 'low' },
      ];

  Iterate with `for...of`, generating `it(...)` blocks. Each asserts `finding.severity === expected`.
- Verify: `node --test tests/scanners/token-hotspots.test.mjs` → PASS
- On failure: revert. Re-evaluate severities if integration tests break (e.g., posture-grade-stability expects a different aggregate).
- Checkpoint: `git commit -m "feat(config-audit): recalibrate TOK severities for tokens/turn (v5 F7)"`
- Manifest:

      expected_paths: [scanners/token-hotspots.mjs, tests/scanners/token-hotspots.test.mjs]
      must_contain:
        - { path: scanners/token-hotspots.mjs, pattern: "calibration_note" }
      commit_message_pattern: "^feat\\(config-audit\\):"
Step 11: M6 — additionalDirectories in KNOWN_KEYS + threshold
- Files: `scanners/settings-validator.mjs`, `tests/scanners/settings-validator.test.mjs`, `tests/fixtures/additional-dirs-many/` (new)
- Changes:
  - Add `'additionalDirectories'` to `KNOWN_KEYS` (line 16-35).
  - New check: if `additionalDirectories.length > 2`, emit a `CA-SET-NNN` finding (severity `low`).
- Reuses: Existing settings-validator pattern.
- Test first: Fixture with 3 entries → 1 finding; fixture with 2 entries → 0 findings; settings without the key → no "unknown key" warning.
- Verify: `node --test tests/scanners/settings-validator.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "feat(config-audit): flag additionalDirectories > 2 (v5 M6)"`
- Manifest:

      expected_paths: [scanners/settings-validator.mjs, tests/fixtures/additional-dirs-many/settings.json]
      commit_message_pattern: "^feat\\(config-audit\\):"
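The threshold check in Step 11 is small enough to sketch in full. The helper name `checkAdditionalDirectories` and the finding fields are hypothetical (the real scanner goes through the shared `finding()` factory), and the sketch reads `additionalDirectories` as a top-level settings key because that is how the plan adds it to `KNOWN_KEYS`.

```javascript
const MAX_ADDITIONAL_DIRECTORIES = 2; // threshold from the plan

// Returns zero or one finding-like object for a parsed settings.json.
function checkAdditionalDirectories(settings) {
  const dirs = settings?.additionalDirectories;
  // A missing key or a non-array value is not this check's concern.
  if (!Array.isArray(dirs) || dirs.length <= MAX_ADDITIONAL_DIRECTORIES) return [];
  return [
    {
      severity: 'low',
      title: `additionalDirectories has ${dirs.length} entries (threshold ${MAX_ADDITIONAL_DIRECTORIES})`,
      evidence: { count: dirs.length },
    },
  ];
}
```

Returning an array keeps the call site uniform with checks that can emit several findings.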
Step 12: M4 — CLAUDE.md cascade total finding in TOK
- Files: `scanners/token-hotspots.mjs`, `tests/fixtures/large-cascade/` (new)
- Changes: New detection in TOK: if `activeConfig.claudeMd.estimatedTokens > 10_000`, emit a finding (severity `medium`).
- Reuses: `readActiveConfig` integration from Step 6; `claudeMd.estimatedTokens` field.
- Test first: Fixture with CLAUDE.md @-importing 40k+ bytes → finding present; minimal CLAUDE.md → no finding.
- Verify: `node --test tests/scanners/token-hotspots.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "feat(config-audit): TOK flags CLAUDE.md cascade > 10k tokens (v5 M4)"`
- Manifest:

      expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/large-cascade/CLAUDE.md]
      commit_message_pattern: "^feat\\(config-audit\\):"
Step 13: M2 — Skill description length finding
- Files: `scanners/token-hotspots.mjs`, `tests/fixtures/skill-bloated/` (new)
- Changes: New detection in TOK: walk `activeConfig.skills`, parse each `SKILL.md` frontmatter; flag any with `description` > 500 characters as a `low` finding.
- Reuses: `parseFrontmatter` from `lib/yaml-parser.mjs`; `activeConfig.skills` from Step 6.
- Test first: Fixture with 600-char description → finding; 100-char → no finding.
- Verify: `node --test tests/scanners/token-hotspots.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "feat(config-audit): TOK flags skill description > 500 chars (v5 M2)"`
- Manifest:

      expected_paths: [scanners/token-hotspots.mjs, tests/fixtures/skill-bloated/skills/bloated/SKILL.md]
      commit_message_pattern: "^feat\\(config-audit\\):"
Step 14: M1 — MCP tool-count detection (with manifest fallback)
- Files: `scanners/lib/active-config-reader.mjs`, `tests/fixtures/mcp-tool-heavy/` (new)
- Changes: Extend `readActiveMcpServers` to attempt tool-count detection in this order:
  - Cached `tools/list` response at `~/.claude/config-audit/mcp-cache/<server>.json` (if exists)
  - `package.json` `tools` array on the npm package (if server is npm-resolved)
  - Fallback: emit `toolCount: null` and a `toolCountUnknown: true` flag on the server entry

  Update the `estimateTokens` call (Step 5) to use `toolCount` when known.
- Reuses: Existing MCP enumeration.
- Test first: Fixture with mocked `package.json` tools array of 20 → `toolCount === 20`; fixture without → `toolCount === null`.
- Verify: `node --test tests/lib/active-config-reader.test.mjs` → PASS
- On failure: revert. Tool-count infrastructure can ship as `null` everywhere if detection logic fails; N1 still produces a baseline finding.
- Checkpoint: `git commit -m "feat(config-audit): MCP tool-count detection with manifest fallback (v5 M1)"`
- Manifest:

      expected_paths: [scanners/lib/active-config-reader.mjs, tests/fixtures/mcp-tool-heavy/]
      commit_message_pattern: "^feat\\(config-audit\\):"
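The three-step fallback chain in Step 14 could look roughly like this once the two inputs have been read from disk. The helper name `resolveToolCount` is hypothetical, and both input shapes are assumptions: the cache file is taken to hold a `tools/list` result of the form `{ tools: [...] }`, and `package.json` is taken to carry a top-level `tools` array, which the Research Sources table notes is a convention rather than a standard.

```javascript
// cachedToolsList: parsed cache file, or null when no cache exists.
// packageJson: parsed package.json, or null when the server is not npm-resolved.
function resolveToolCount(cachedToolsList, packageJson) {
  // 1. Cached tools/list response wins when present.
  if (Array.isArray(cachedToolsList?.tools)) {
    return { toolCount: cachedToolsList.tools.length, toolCountUnknown: false };
  }
  // 2. Convention-only `tools` array in package.json.
  if (Array.isArray(packageJson?.tools)) {
    return { toolCount: packageJson.tools.length, toolCountUnknown: false };
  }
  // 3. Nothing usable: report unknown rather than guessing 0.
  return { toolCount: null, toolCountUnknown: true };
}
```

Distinguishing `null` from `0` matters downstream: a server with a known count of zero produces no budget finding, while an unknown count is flagged.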
Step 15: M5 — Hook output-size finding
- Files: `scanners/hook-validator.mjs`, `tests/fixtures/hooks-verbose/` (new)
- Changes: Read each hook script referenced in `hooks.json`; count `console.log`/`process.stdout.write` lines; if > 50, emit a `CA-HKV-NNN` finding (severity `low`). Static heuristic, no execution.
- Reuses: Existing hook-validator file-walking.
- Test first: Fixture with hook script containing 60 `console.log` lines → finding; sparse hook → no finding.
- Verify: `node --test tests/scanners/hook-validator.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "feat(config-audit): HKV flags verbose hook output (v5 M5)"`
- Manifest:

      expected_paths: [scanners/hook-validator.mjs, tests/fixtures/hooks-verbose/]
      commit_message_pattern: "^feat\\(config-audit\\):"
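The static heuristic in Step 15 might be sketched as a line counter over the hook script's source text. The names `countOutputLines` and `isVerboseHook` and the exact regular expression are assumptions; the plan only specifies counting `console.log`/`process.stdout.write` lines against a threshold of 50.

```javascript
// Matches a call to console.log(...) or process.stdout.write(...).
// Deliberately non-global so repeated .test() calls carry no state.
const OUTPUT_CALL = /\b(?:console\.log|process\.stdout\.write)\s*\(/;
const MAX_OUTPUT_LINES = 50; // threshold from the plan

// Counts source lines that contain at least one output call.
function countOutputLines(source) {
  return source.split(/\r?\n/).filter((line) => OUTPUT_CALL.test(line)).length;
}

function isVerboseHook(source) {
  return countOutputLines(source) > MAX_OUTPUT_LINES;
}
```

Because nothing is executed, commented-out calls are counted too and output produced in loops is undercounted; that imprecision is the trade-off of a purely static check.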
Step 16: F6 — self-audit --check-readme flag
- Files: `scanners/self-audit.mjs`, `tests/scanners/self-audit.test.mjs`, `tests/fixtures/readme-desynced/` (new)
- Changes: Add a `--check-readme` CLI flag. The flag uses filesystem counts as the source of truth, not the README. Counts:
  - scanners: count `.mjs` files matching scanner-shape (have `export async function scan` AND are in `scanners/`, not `scanners/lib/`, and not `*-cli.mjs`/`*-engine.mjs`/`whats-active.mjs`/`self-audit.mjs`/`scan-orchestrator.mjs`)
  - commands: count `.md` files in `commands/`
  - agents: count `.md` files in `agents/`
  - hooks: parse `hooks/hooks.json`, count distinct event-script pairs
  - tests: count `.test.mjs` files in `tests/`
  - knowledge: count `.md` files in `knowledge/`

  Parse README badge values via line-anchored substring patterns (NOT regex on URL; use exact " 9 " / "9+" detection). Compare counts; emit a `low` finding per mismatch with `expected: <fs_count>` and `found_in_readme: <badge_value>`.
- Reuses: Existing `runSelfAudit` shape; `glob`-style file enumeration via `node:fs/promises`.
- Test first:
  - Fixture `readme-desynced/`: a mini-plugin layout with `commands/foo.md`, `commands/bar.md` (filesystem count = 2) plus a fake `README.md` with badge "1+ commands" → finding present.
  - Self-test (no fixture): run `runSelfAudit({checkReadme: true})` against the real plugin; assert `result.readmeCheck` exists, `result.readmeCheck.passed` is `boolean`. Do NOT assert `passed === true` during alpha/beta phases (allowed to be red until Step 28).
- Verify: `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck | type'` → `"object"`
- On failure: revert. Most likely cause: scanner-shape detection over-counts; refine to require both `export async function scan` AND `const SCANNER =` declarations.
- Checkpoint: `git commit -m "feat(config-audit): self-audit --check-readme flag (v5 F6)"`
- Manifest:

      expected_paths:
        - scanners/self-audit.mjs
        - tests/scanners/self-audit.test.mjs
        - tests/fixtures/readme-desynced/README.md
        - tests/fixtures/readme-desynced/commands/foo.md
        - tests/fixtures/readme-desynced/commands/bar.md
      must_contain:
        - { path: scanners/self-audit.mjs, pattern: "check-readme" }
        - { path: scanners/self-audit.mjs, pattern: "readmeCheck" }
      commit_message_pattern: "^feat\\(config-audit\\):"
Step 17: alpha.2 wrap — CHANGELOG entry
- Files: `CHANGELOG.md`
- Changes: Add `## [5.0.0-alpha.2]` summarizing M1, M2, M4-M6, F6, F7.
- Verify: `grep -c "5.0.0-alpha.2" CHANGELOG.md` → 1
- On failure: revert.
- Checkpoint: `git commit -m "docs(config-audit): CHANGELOG 5.0.0-alpha.2 entry"`
- Manifest:

      expected_paths: [CHANGELOG.md]
      commit_message_pattern: "^docs\\(config-audit\\):"
STAGE beta.1 — New scanners (N1-N4, N6)
Step 18: N1 — MCP Tool-Schema Budget finding (CA-TOK-005)
- Files: `scanners/token-hotspots.mjs`, `tests/fixtures/mcp-budget/` (new)
- Changes: New detection function `detectMcpToolBudget(activeConfig)`. Iterate `activeConfig.mcpServers`. Tiered severity per server:
  - `toolCount === null` (unknown; the fallback chain in M1 returned null): emit a finding with severity `low` and message `"tool count unknown — could not parse manifest or cached tools/list"` (per Avklaringer M1: flag, don't skip).
  - `toolCount` 0-19: no finding
  - 20-49: `low`
  - 50-99: `medium`
  - 100+: `high`

  Finding ID: `CA-TOK-005` per server flagged. Recommendation: use `tools/filter` config; reference the cache-telemetry recipe from M7.

  Detection-order pinning: ensure `detectMcpToolBudget` runs as the 5th detection block in `scan()` AFTER patterns A, B, C (which always run first regardless of fixture). This makes ID assignment deterministic when all patterns fire. When some patterns don't fire, the ID may shift, so tests assert presence and tier-specific severity, not the exact ID number.
- Reuses: `activeConfig.mcpServers` with `toolCount` from Step 14.
- Test first: Fixtures: 14 tools (no finding), 25 tools (`low`), 60 tools (`medium`), 120 tools (`high`), null toolCount (`low` with message containing "unknown"). Tests assert `severity` and `finding.title` substring, NOT the exact `id` number.
- Verify: `node --test tests/scanners/token-hotspots.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "feat(config-audit): CA-TOK-005 MCP tool-schema budget (v5 N1)"`
- Manifest:

      expected_paths:
        - scanners/token-hotspots.mjs
        - tests/fixtures/mcp-budget/14-tools/
        - tests/fixtures/mcp-budget/25-tools/
        - tests/fixtures/mcp-budget/60-tools/
        - tests/fixtures/mcp-budget/120-tools/
        - tests/fixtures/mcp-budget/unknown-tools/
      must_contain:
        - { path: scanners/token-hotspots.mjs, pattern: "detectMcpToolBudget" }
      commit_message_pattern: "^feat\\(config-audit\\):"
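The severity tiers in Step 18 map directly onto a small lookup. This is a sketch of the tiering only: `mcpToolBudgetSeverity` is a hypothetical helper name, and the real `detectMcpToolBudget` would wrap it with finding construction and the "unknown" message.

```javascript
// Returns the severity for one MCP server, or null when no finding applies.
function mcpToolBudgetSeverity(toolCount) {
  if (toolCount === null) return 'low'; // unknown count: flag, don't skip
  if (toolCount >= 100) return 'high';
  if (toolCount >= 50) return 'medium';
  if (toolCount >= 20) return 'low';
  return null; // 0-19 tools: within budget
}
```

Checking the tiers from the top down keeps each boundary in exactly one place, which makes the five fixture cases (14, 25, 60, 120, unknown) straightforward to assert.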
Step 19: N2 — System-Prompt Manifest scanner + CLI
- Files: `scanners/manifest.mjs` (new), `commands/manifest.md` (new), `tests/scanners/manifest.test.mjs` (new)
- Changes: New CLI: `node scanners/manifest.mjs <path> [--json] [--output-file]`. Output: ranked list of token sources from `readActiveConfig` (CLAUDE.md cascade entries, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. New slash command `/config-audit manifest` invokes the CLI and renders a markdown table.
- Reuses: `readActiveConfig`, CLI direct-run pattern, command frontmatter from `commands/whats-active.md`.
- Test first: Two test paths:
  - Real-config path (primary): subprocess against the plugin's own root (`.`). Assert `output.sources` length > 0; `output.sources[0].estimated_tokens >= output.sources[1].estimated_tokens`; `output.total >= sum(sources.estimated_tokens) - 1` (rounding tolerance).
  - Fixture path (with `buildRichRepo` helper from `tests/lib/active-config-reader.test.mjs`): build a tmpdir repo with patched HOME containing 2 plugins + 3 skills + .mcp.json. Run the CLI subprocess against tmpdir with the patched HOME passed via env. Assert `sources.length >= 5` (CLAUDE.md cascade + plugins + skills + MCP).
- Verify: `node --test tests/scanners/manifest.test.mjs` → PASS
- On failure: revert. If `readActiveConfig` returns empty for the real-plugin target: check that `detectGitRoot` resolves to the marketplace root.
- Checkpoint: `git commit -m "feat(config-audit): /config-audit manifest command (v5 N2)"`
- Manifest:

      expected_paths: [scanners/manifest.mjs, commands/manifest.md, tests/scanners/manifest.test.mjs]
      must_contain:
        - { path: scanners/manifest.mjs, pattern: "readActiveConfig" }
        - { path: commands/manifest.md, pattern: "name: manifest" }
      commit_message_pattern: "^feat\\(config-audit\\):"
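The ranking and total that the Step 19 tests check could be produced by something like the following. The helper name `buildManifest` and the `name` field are hypothetical; `sources`, `estimated_tokens`, and `total` are the output fields named in the test plan.

```javascript
// Sorts token sources by estimated_tokens (descending) and sums the total.
// The input array is copied so the caller's ordering is left untouched.
function buildManifest(sources) {
  const ranked = [...sources].sort((a, b) => b.estimated_tokens - a.estimated_tokens);
  const total = ranked.reduce((sum, source) => sum + source.estimated_tokens, 0);
  return { sources: ranked, total };
}
```

Summing integer estimates gives an exact total here; the "- 1" rounding tolerance in the test presumably covers cases where the real total is computed or rounded separately from the per-source values.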
Step 20: N3 — Cache-Prefix Stability Analyzer
- Files: `scanners/cache-prefix-scanner.mjs` (new), `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs` (SCANNER_AREA_MAP), `tests/scanners/cache-prefix.test.mjs` (new), `tests/fixtures/volatile-mid-section/` (new)
- Changes: New scanner with prefix `CPS`. Walks the CLAUDE.md cascade; classifies each segment as stable/volatile (using existing volatile patterns from `token-hotspots.mjs:38-43` extended with shell-exec `!` prefix and `${VAR}` patterns). Flags volatility anywhere in the cached prefix (not just the top 30 lines). Severity `medium`.
- Reuses: `VOLATILE_PATTERNS`, `walkClaudeMdCascade`.
- Test first: Fixture with `!git log` at line 60 → finding; fixture with volatile content only at line 200+ → no finding.
- Verify: `node --test tests/scanners/cache-prefix.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "feat(config-audit): cache-prefix stability scanner CPS (v5 N3)"`
- Manifest:

      expected_paths:
        - scanners/cache-prefix-scanner.mjs
        - scanners/scan-orchestrator.mjs
        - scanners/lib/scoring.mjs
        - tests/scanners/cache-prefix.test.mjs
      must_contain:
        - { path: scanners/scan-orchestrator.mjs, pattern: "scanCachePrefix|CPS" }
        - { path: scanners/lib/scoring.mjs, pattern: "CPS:" }
      commit_message_pattern: "^feat\\(config-audit\\):"
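A sketch of the line-level classification behind Step 20, covering only the two pattern extensions the plan names. Several things here are assumptions: the helper name `findVolatileLines`, the exact regular expressions, and above all `prefixLines`, since the plan does not state where the cached prefix ends (the test plan only implies a boundary somewhere between line 60 and line 200). The existing `VOLATILE_PATTERNS` from `token-hotspots.mjs` are not reproduced.

```javascript
// Assumed forms of the two new patterns:
//  - a line starting with "!" followed by a command (markdown images "![" excluded)
//  - a ${VAR}-style placeholder anywhere in the line
const EXTRA_VOLATILE_PATTERNS = [
  /^\s*!(?!\[)\S/,
  /\$\{[A-Za-z_][A-Za-z0-9_]*\}/,
];

// Returns the 1-based line numbers of volatile lines inside the cached prefix.
// `prefixLines` is a caller-supplied assumption about how long that prefix is.
function findVolatileLines(text, prefixLines) {
  const lines = text.split(/\r?\n/);
  const limit = Math.min(lines.length, prefixLines);
  const hits = [];
  for (let i = 0; i < limit; i++) {
    if (EXTRA_VOLATILE_PATTERNS.some((pattern) => pattern.test(lines[i]))) {
      hits.push(i + 1);
    }
  }
  return hits;
}
```

With `prefixLines` set to 150, a `!git log` at line 60 is reported and a `${HOME}` at line 200 is not, matching the two fixture expectations.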
Step 21: N4 — Disabled-Tools-Still-In-Schema Detector
- Files: `scanners/disabled-in-schema-scanner.mjs` (new), `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs`, `tests/scanners/disabled-in-schema.test.mjs` (new), `tests/fixtures/denied-tools-in-schema/` (new)
- Changes: New scanner with prefix `DIS`. Reads cascaded `settings.json`; finds tools that appear in both `permissions.deny` and `permissions.allow`. Severity `low`.
- Reuses: Settings-cascade reading.
- Test first: Fixture with `Bash` in both arrays → finding; clean settings → no finding.
- Verify: `node --test tests/scanners/disabled-in-schema.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "feat(config-audit): disabled-in-schema scanner DIS (v5 N4)"`
- Manifest:

      expected_paths:
        - scanners/disabled-in-schema-scanner.mjs
        - scanners/scan-orchestrator.mjs
        - tests/scanners/disabled-in-schema.test.mjs
      commit_message_pattern: "^feat\\(config-audit\\):"
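The core of Step 21 is a set intersection over the merged settings. In this sketch the helper name `findDeniedAndAllowed` is hypothetical and matching is by exact string only, which is an assumption: a scoped rule such as `Bash(git:*)` would not be treated as overlapping with a bare `Bash` entry.

```javascript
// Returns the tool entries that appear in both permissions.deny and
// permissions.allow of an already-merged settings object.
function findDeniedAndAllowed(settings) {
  const allow = new Set(settings?.permissions?.allow ?? []);
  const deny = settings?.permissions?.deny ?? [];
  // De-duplicate deny first so each conflicting tool is reported once.
  return [...new Set(deny)].filter((tool) => allow.has(tool));
}
```

Each returned entry would become one `low` finding; an empty array means the settings are clean.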
Step 22a: N6 — verify Claude Code skill-namespacing model (research spike)
- Files: `docs/v5-namespace-research.md` (new, gitignored)
- Changes: Quick verification spike before building N6. Verify against current Claude Code behavior:
  - When the user types `/review` and both a built-in command and a plugin skill named `review` exist, which fires? Is invocation namespaced via `/plugin:review`?
  - When two plugins both expose a skill named `review`, do their invocation paths differ?
  - When a user-level skill (under `~/.claude/skills/`) has the same name as a plugin skill, does it collide?

  Methods: read the Claude Code documentation; check existing plugin patterns in the marketplace; if still uncertain after 10 minutes of research, document the assumption explicitly and proceed with the most defensive interpretation (treat any same-name conflict as a finding).
- Reuses: none
- Test first: None (research).
- Verify: `[ -f docs/v5-namespace-research.md ]`, and the file contains at least: "Built-in vs plugin: ", "Plugin vs plugin: ", "User-level vs plugin: ", `Confidence: <high/medium/low>`
- On failure: escalate. If the research is inconclusive, ask the user before proceeding to Step 22b.
- Checkpoint: No commit (file is local-only).
- Manifest:

      expected_paths: [docs/v5-namespace-research.md]
      commit_message_pattern: ".*"
Step 22b: N6 — Cross-Plugin Skill/Command Collision Scanner (CA-COL-001)
- Files: `scanners/collision-scanner.mjs` (new), `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs`, `tests/scanners/collision.test.mjs` (new), `tests/fixtures/collision-plugins/` (new)
- Changes: New scanner with prefix `COL` (finding ID `CA-COL-001`). Enumerate plugins via `enumeratePlugins`. Build maps of skill names and command names by source. Detection logic is determined by the Step 22a research:
  - Plugin-vs-plugin same skill name: finding (severity `low`), because invocation order is ambiguous even if `/plugin:skill` is supported.
  - User-level skill vs plugin skill same name: finding (severity `medium`), because bare invocation may resolve unpredictably.
  - Plugin skill vs Claude Code built-in: finding only if Step 22a confirms the collision is real; otherwise no finding (info-level note in CHANGELOG).
  - All findings include a `details.namespaces` array describing each conflicting source.
- Reuses: `enumeratePlugins`, `countPluginItems`, `findSkillMdFiles`.
- Test first: Multi-plugin fixture `collision-plugins/`:
  - Layout: `plugins/plugin-a/skills/review/SKILL.md` + `plugins/plugin-b/skills/review/SKILL.md` → finding present (severity `low`).
  - Negative: `plugins/plugin-a/skills/review/` + `plugins/plugin-b/skills/summarize/` → no finding.
  - Positive (user-vs-plugin): user skill at fake-HOME `skills/review/SKILL.md` + plugin skill `plugin-a/skills/review/SKILL.md` → finding (severity `medium`).
  - Suppression-glob check: existing `CA-TOK-*` glob does NOT suppress `CA-COL-001`.
- Verify: `node --test tests/scanners/collision.test.mjs` → PASS
- On failure: revert. False positives indicate that the namespace model deviates from the Step 22a research; revisit the research file.
- Checkpoint: `git commit -m "feat(config-audit): cross-plugin collision scanner COL (v5 N6)"`
- Manifest:

      expected_paths:
        - scanners/collision-scanner.mjs
        - scanners/scan-orchestrator.mjs
        - tests/scanners/collision.test.mjs
        - tests/fixtures/collision-plugins/plugins/plugin-a/skills/review/SKILL.md
        - tests/fixtures/collision-plugins/plugins/plugin-b/skills/review/SKILL.md
      must_contain:
        - { path: scanners/collision-scanner.mjs, pattern: "SCANNER = 'COL'" }
        - { path: scanners/scan-orchestrator.mjs, pattern: "scanCollision|COL" }
        - { path: scanners/lib/scoring.mjs, pattern: "COL:" }
      commit_message_pattern: "^feat\\(config-audit\\):"
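
One way to picture the detection rules in Step 22b is a name-to-sources map. The input shape (`{ namespace, skills }`), the `'user'` namespace label, and the function name are hypothetical; the real scanner would build its input from `enumeratePlugins` and `findSkillMdFiles`, and built-in handling depends on the Step 22a verdict, so it is omitted.

```javascript
// Groups skill names by the source that provides them and reports every
// name provided by more than one source.
function findSkillCollisions(sources) {
  const namespacesByName = new Map();
  for (const { namespace, skills } of sources) {
    for (const name of skills) {
      if (!namespacesByName.has(name)) namespacesByName.set(name, []);
      namespacesByName.get(name).push(namespace);
    }
  }

  const findings = [];
  for (const [name, namespaces] of namespacesByName) {
    if (namespaces.length < 2) continue;
    // User-level vs plugin is rated medium; plugin vs plugin is rated low.
    const severity = namespaces.includes('user') ? 'medium' : 'low';
    findings.push({ id: 'CA-COL-001', name, severity, details: { namespaces } });
  }
  return findings;
}
```

Keeping the conflicting sources in `details.namespaces` mirrors the requirement that every finding describes each conflicting source.
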
Step 23: beta.1 wrap — CHANGELOG + N1 backward-compat note
- Files: `CHANGELOG.md`
- Changes: Add a `## [5.0.0-beta.1]` entry. Include an explicit subsection `### Known breaking changes`: `CA-TOK-*` glob suppressions in existing `.config-audit-ignore` files now also match `CA-TOK-005` (MCP budget). Document the workaround: list `CA-TOK-001 CA-TOK-002 CA-TOK-003` explicitly.
- Verify: `grep -c "CA-TOK-005" CHANGELOG.md` → ≥ 1
- On failure: revert.
- Checkpoint: `git commit -m "docs(config-audit): CHANGELOG 5.0.0-beta.1 + N1 breaking note"`
- Manifest:

      expected_paths: [CHANGELOG.md]
      must_contain:
        - { path: CHANGELOG.md, pattern: "5\\.0\\.0-beta\\.1" }
        - { path: CHANGELOG.md, pattern: "CA-TOK-005" }
      commit_message_pattern: "^docs\\(config-audit\\):"
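
The breaking change documented in Step 23 follows from how a trailing-wildcard glob matches finding IDs. The sketch below is a generic glob matcher written for illustration; it is not the plugin's actual suppression code, and both function names are hypothetical.

```javascript
// Converts a simple glob (only "*" is special) into an anchored RegExp.
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*');                 // "*" matches any run of characters
  return new RegExp(`^${escaped}$`);
}

// True when any suppression rule matches the finding ID.
function isSuppressed(findingId, rules) {
  return rules.some((rule) => globToRegExp(rule).test(findingId));
}

// The glob also hides the new MCP budget finding; the explicit list does not.
console.log(isSuppressed('CA-TOK-005', ['CA-TOK-*']));
console.log(isSuppressed('CA-TOK-005', ['CA-TOK-001', 'CA-TOK-002', 'CA-TOK-003']));
```
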
STAGE rc.1 — Knowledge cleanup + tokenizer calibration (M7, M8, N5)
Step 24: M8 — Knowledge-base cleanup (Sonnet-era → Opus 4.7)
- Files: `knowledge/configuration-best-practices.md`, `knowledge/anti-patterns.md` (if relevant)
- Changes: Replace the "Keep under 200 lines" framing (line 9) with cache-stability guidance: "Place stable content in the first 30 lines (cache-friendly); volatile content (timestamps, dynamic counts) goes below the cache threshold." Add footnote: "200-line threshold was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure."
- Reuses: Existing knowledge file format.
- Test first: None (docs).
- Verify: `! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md`
- On failure: revert.
- Checkpoint: `git commit -m "docs(config-audit): knowledge rensing — Opus 4.7 cache-stability guidance (v5 M8)"`
- Manifest:

      expected_paths: [knowledge/configuration-best-practices.md]
      forbidden_paths: []
      commit_message_pattern: "^docs\\(config-audit\\):"
Step 25: M7 — Cache-telemetry recipe in knowledge/ + flag
- Files: `knowledge/cache-telemetry-recipe.md` (new), `commands/tokens.md`, `scanners/token-hotspots-cli.mjs`, `tests/scanners/token-hotspots-cli.test.mjs`
- Changes:
  - New knowledge file documenting how a user can manually verify the cache hit rate from session transcripts (parsing `cache_read_input_tokens` from transcript JSON). The recipe is opt-in, NOT bundled scanner logic, which keeps the non-goal of "no transcript-parsing as core feature".
  - Add a `--with-telemetry-recipe` flag to `token-hotspots-cli.mjs`: when present, the JSON output includes a `telemetry_recipe_path` field pointing to the knowledge file. Without the flag, output is unchanged. Committed as a deliverable, not optional.
  - Update `commands/tokens.md` next-steps to mention `--with-telemetry-recipe` and link the recipe.
- Reuses: Knowledge-file format from `opus-4.7-patterns.md`; CLI argv-parsing pattern from `posture.mjs`.
- Test first: Subprocess test: `node token-hotspots-cli.mjs <fixture> --with-telemetry-recipe --json | jq '.telemetry_recipe_path'` → non-empty string ending in `cache-telemetry-recipe.md`.
- Verify: `[ -f knowledge/cache-telemetry-recipe.md ]` AND `node --test tests/scanners/token-hotspots-cli.test.mjs` → PASS
- On failure: revert.
- Checkpoint: `git commit -m "docs(config-audit): cache-telemetry recipe + --with-telemetry-recipe flag (v5 M7)"`
- Manifest:

      expected_paths:
        - knowledge/cache-telemetry-recipe.md
        - commands/tokens.md
        - scanners/token-hotspots-cli.mjs
        - tests/scanners/token-hotspots-cli.test.mjs
      must_contain:
        - { path: scanners/token-hotspots-cli.mjs, pattern: "with-telemetry-recipe" }
      commit_message_pattern: "^docs\\(config-audit\\):"
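
The kind of manual check the recipe in Step 25 describes could look roughly like the sketch below. It assumes the transcript is JSON Lines with a `message.usage` object per assistant turn; that layout and the `cacheHitRate` name are assumptions for illustration. The three usage field names follow the Anthropic API usage object.

```javascript
// Computes cached input tokens as a share of all input tokens across a
// transcript. Returns null when no usage data is found.
function cacheHitRate(jsonlText) {
  let cachedTokens = 0;
  let totalInputTokens = 0;
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    const usage = entry?.message?.usage; // assumed location of usage data
    if (!usage) continue;
    const read = usage.cache_read_input_tokens ?? 0;
    const created = usage.cache_creation_input_tokens ?? 0;
    const uncached = usage.input_tokens ?? 0;
    cachedTokens += read;
    totalInputTokens += read + created + uncached;
  }
  return totalInputTokens === 0 ? null : cachedTokens / totalInputTokens;
}
```

A rate that stays near zero across later turns of a session would suggest that something volatile is invalidating the cached prefix.
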
Step 26: N5 — --accurate-tokens API calibration
- Files: `scanners/token-hotspots-cli.mjs`, `scanners/lib/tokenizer-api.mjs` (new), `tests/scanners/accurate-tokens.test.mjs` (new)
- Prerequisites: Node.js >= 18.13 (for `mock.method` from `node:test`). Verify with `node --version`. If older, escalate.
- Changes: New helper module `tokenizer-api.mjs` exporting `async callCountTokensApi(text, apiKey)`. Wraps `fetch('https://api.anthropic.com/v1/messages/count_tokens', ...)` with:
  - 5-second `AbortController` timeout
  - Exponential backoff on 429 (max 3 retries: 1s, 2s, 4s)
  - API key MASKED to `${key.slice(0,8)}...` in ANY error message and ANY thrown error
  - On non-429 HTTP error: throw `Error("count_tokens API failed: " + status)` with no body included (the body may contain the key in echoed form)
  - Required headers: `x-api-key`, `anthropic-version: 2023-06-01`, `content-type: application/json`

  Wire `--accurate-tokens` into `token-hotspots-cli.mjs`:
  - If `process.env.ANTHROPIC_API_KEY` is present: call `callCountTokensApi` for the top 3 hotspots' content; populate `output.calibration = { actual_tokens: <number>, source: 'count_tokens_api', sampled_hotspots: 3 }`.
  - If absent: set `output.calibration = { skipped: 'no-api-key' }` and warn to stderr "ANTHROPIC_API_KEY not set — skipping API calibration".
- Reuses: Existing CLI pattern, env-var reading.
- Test first:
  - No-API-key case: subprocess with `env: { ...process.env, ANTHROPIC_API_KEY: '' }`. Assert exit 0 and output `calibration.skipped === 'no-api-key'`.
  - With-key case: `import { mock } from 'node:test'`. Use `mock.method(tokenizerApi, 'callCountTokensApi', () => ({ input_tokens: 4200 }))`. Run the CLI in-process (not as a subprocess, because a mock cannot cross a process boundary). Assert `output.calibration.actual_tokens === 4200`.
  - Error masking: stub `callCountTokensApi` to throw `Error("simulated 401 with key sk-ant-FAKEKEY-1234")`. Assert that the JSON output and stderr contain `sk-ant-F...` and NOT `FAKEKEY-1234` (mask works).
- Verify: `node --test tests/scanners/accurate-tokens.test.mjs` → PASS
- On failure: revert. Most likely causes: `mock.method` not available (check Node version >= 18.13); `fetch` unavailable (fall back to `node:https`).
- Checkpoint: `git commit -m "feat(config-audit): --accurate-tokens API calibration (v5 N5)"`
- SC-6b note: The brief's SC-6b ("byte estimate within ±5% of the Anthropic count_tokens API") cannot be verified by automated tests using a stub, because the stub returns a constant. SC-6b is a release gate: before tagging v5.0.0 in Step 30, KTG must run `--accurate-tokens` against a known fixture with a real `ANTHROPIC_API_KEY`, manually compare `calibration.actual_tokens` to the byte-estimated tokens for that fixture, and confirm the error is within ±5%. If the error exceeds ±5%, fix the heuristic before tagging.
- Manifest:

      expected_paths:
        - scanners/token-hotspots-cli.mjs
        - scanners/lib/tokenizer-api.mjs
        - tests/scanners/accurate-tokens.test.mjs
      must_contain:
        - { path: scanners/lib/tokenizer-api.mjs, pattern: "count_tokens" }
        - { path: scanners/lib/tokenizer-api.mjs, pattern: "AbortController|signal" }
        - { path: scanners/lib/tokenizer-api.mjs, pattern: "slice\\(0, ?8\\)" }
      commit_message_pattern: "^feat\\(config-audit\\):"
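
A minimal sketch of the helper described in Step 26 follows, under several assumptions: the request body uses a `model` plus a `messages` array (check the current API reference before relying on this), and the third `options` argument with `model`, `fetchImpl`, `sleep`, and `timeoutMs` is a hypothetical addition to the planned `(text, apiKey)` signature so the retry and masking paths can be exercised without network access. It is written as plain functions; the real module would export `callCountTokensApi`.

```javascript
const ENDPOINT = 'https://api.anthropic.com/v1/messages/count_tokens';

// Keeps only the first 8 characters of the key.
function maskKey(key) {
  return `${key.slice(0, 8)}...`;
}

// Replaces every occurrence of the key in a message with its masked form.
function scrub(message, key) {
  return key ? String(message).split(key).join(maskKey(key)) : String(message);
}

async function callCountTokensApi(text, apiKey, options = {}) {
  const {
    model = 'MODEL_ID_PLACEHOLDER', // hypothetical; supply a real model id
    fetchImpl = fetch,               // injectable for tests (assumption)
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    timeoutMs = 5000,
  } = options;
  const backoffMs = [1000, 2000, 4000]; // at most 3 retries on HTTP 429

  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    // The timer bounds the wait for the response headers.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    let response;
    try {
      response = await fetchImpl(ENDPOINT, {
        method: 'POST',
        signal: controller.signal,
        headers: {
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
          'content-type': 'application/json',
        },
        body: JSON.stringify({ model, messages: [{ role: 'user', content: text }] }),
      });
    } catch (err) {
      // Network and abort errors are rethrown with the key scrubbed.
      throw new Error(scrub(`count_tokens request failed: ${err.message}`, apiKey));
    } finally {
      clearTimeout(timer);
    }
    if (response.status === 429 && attempt < backoffMs.length) {
      await sleep(backoffMs[attempt]);
      continue;
    }
    if (!response.ok) {
      // Status only; the response body is deliberately left out.
      throw new Error('count_tokens API failed: ' + response.status);
    }
    return response.json();
  }
}
```

Injecting `fetchImpl` is a design choice of this sketch, not something the plan prescribes; it keeps the tests in-process in the same spirit as the `mock.method` approach listed under "Test first".
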
Step 27: rc.1 wrap — CHANGELOG entry
- Files: `CHANGELOG.md`
- Changes: Add a `## [5.0.0-rc.1]` entry summarizing M7, M8, N5.
- Verify: `grep -c "5.0.0-rc.1" CHANGELOG.md` → 1
- On failure: revert.
- Checkpoint: `git commit -m "docs(config-audit): CHANGELOG 5.0.0-rc.1 entry"`
- Manifest:

      expected_paths: [CHANGELOG.md]
      commit_message_pattern: "^docs\\(config-audit\\):"
STAGE release — v5.0.0 final
Step 28: README and CLAUDE.md sync (straggler-sweep)
- Files: `README.md`, `CLAUDE.md`, `commands/help.md`, `commands/posture.md`, `commands/config-audit.md`, `agents/feature-gap-agent.md`
- Changes: Update all badges and counts:
  - Scanners: 9 → 12 (TOK extended + CPS + DIS + COL + manifest if counted)
  - Commands: 17 → 18 (+ manifest)
  - Tests: 543 → final count after all steps (run `node --test 'tests/**/*.test.mjs' 2>&1 | grep "tests"`)
  - Hooks: unchanged (4)
  - Agents: unchanged (6)
  - Knowledge: 7 → 8 (+ cache-telemetry-recipe)
  - Quality areas: unchanged (8)
- Reuses: Self-audit `--check-readme` from Step 16 to verify completeness.
- Test first: `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed'` → `true`
- Verify: Same command as above.
- On failure: retry. Find the missing badge with `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.mismatches'`.
- Checkpoint: `git commit -m "docs(config-audit): straggler sweep for v5.0.0 — sync all badge counts"`
- Manifest:

      expected_paths: [README.md, CLAUDE.md]
      commit_message_pattern: "^docs\\(config-audit\\):"
Step 29: Version bump + final CHANGELOG
- Files: `.claude-plugin/plugin.json`, `CHANGELOG.md`, `README.md` (version badge)
- Changes: Bump the `plugin.json` version: `4.0.0 → 5.0.0`. Add a `## [5.0.0]` entry to CHANGELOG with `### Summary` (consolidated from the alpha/beta/rc entries) and `### Breaking changes` (F2 token magnitude jump, F3 severity weighting, N1 suppression backward-compat).
- Reuses: v4.0.0 entry format.
- Test first: `[ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]`
- Verify: `grep "5.0.0" .claude-plugin/plugin.json && grep "## \[5.0.0\]" CHANGELOG.md`
- On failure: revert.
- Checkpoint: `git commit -m "chore(config-audit): bump version to 5.0.0"`
- Manifest:

      expected_paths: [.claude-plugin/plugin.json, CHANGELOG.md, README.md]
      must_contain:
        - { path: .claude-plugin/plugin.json, pattern: "\"version\": \"5.0.0\"" }
      commit_message_pattern: "^chore\\(config-audit\\):"
Step 30: Final self-audit + SC-6b release gate + green tag
- Files: none
- Changes:
  - Run the full test suite. All 543 v4 tests + new tests must pass.
  - Run `node scanners/self-audit.mjs --check-readme`. Grade must be A; `readmeCheck.passed === true`.
  - SC-6b release gate (manual): If `ANTHROPIC_API_KEY` is set, run `node scanners/token-hotspots-cli.mjs <known-fixture> --accurate-tokens --json`; compare `calibration.actual_tokens` against the heuristic byte-estimate for the same fixture; ensure the delta is within ±5%. Document the comparison in the v5.0.0 CHANGELOG entry. If the user opts out of the SC-6b gate (no API key available), document this in CHANGELOG as "SC-6b verification deferred — ±5% tokenizer accuracy unverified."
  - Tag and push.
- Reuses: Self-audit from Step 16; CLI from Step 26.
- Test first: `node --test 'tests/**/*.test.mjs' 2>&1 | tail -5`, all PASS
- Verify:
  - `node --test 'tests/**/*.test.mjs'` → all PASS
  - `node scanners/self-audit.mjs --check-readme --json | jq -r '.overallGrade + " " + (.readmeCheck.passed | tostring)'` → `"A true"`
  - SC-6b gate documented (pass or deferred) in CHANGELOG
  - `git tag config-audit/v5.0.0`
- On failure: escalate. If a test or the grade fails, diagnose and add follow-up steps in this plan; do not tag.
- Checkpoint: Tag is the equivalent of a commit. After tag: `git push origin main && git push origin config-audit/v5.0.0`
- Manifest:

      expected_paths: []
      commit_message_pattern: ".*"
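
The SC-6b comparison in Step 30 is a relative-error check. The sketch below assumes the error is measured relative to the API count (the plan does not say which value is the denominator), and the function name is hypothetical.

```javascript
// Compares a heuristic token estimate with the API-reported count.
// Assumption: error is relative to the API count, tolerance is 5%.
function checkCalibration(estimatedTokens, actualTokens, tolerance = 0.05) {
  if (!(actualTokens > 0)) {
    throw new RangeError('actualTokens must be a positive number');
  }
  const relativeError = Math.abs(estimatedTokens - actualTokens) / actualTokens;
  return { relativeError, pass: relativeError <= tolerance };
}

// Usage: an estimate of 4100 against an API count of 4200 is off by 100/4200,
// roughly 2.4%, so the gate passes; 4500 against 4200 is off by about 7.1%.
console.log(checkCalibration(4100, 4200).pass);
console.log(checkCalibration(4500, 4200).pass);
```
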
Manifest — objective completion predicate
Every step has a Manifest block with `expected_paths`, `must_contain` patterns, and a regex `commit_message_pattern`. Steps that touch only docs may have an empty `must_contain`.
Failure recovery rules
- revert: `git checkout -- <files>`, restore the working tree, do not proceed.
- retry: try the alternative described in On failure; revert if still failing.
- escalate: stop entirely; human review required (used at Steps 22a, 26, and 30).
Alternatives Considered
| Approach | Pros | Cons | Why rejected |
|---|---|---|---|
| Keep N1 (`CA-TOK-005`) inside `token-hotspots.mjs` (chosen) | Lowest friction; preserves TOK ID namespace; consistent with patterns A-D | Counter is positional; `CA-TOK-005` ID assigned by order of detection, not semantic | Acceptable trade-off; tests assert on finding presence and severity, not exact ID number. The brief specifies `CA-TOK-005`, which can be enforced by detection order. |
| Standalone `mcp-budget-scanner.mjs` with prefix `MCB` | Clean separation; new ID namespace; testable in isolation | Diverges from brief's `CA-TOK-005` spec; requires new `SCANNER_AREA_MAP` entry | Brief explicitly names `CA-TOK-005`; standalone scanner would force a brief revision. |
| Defer F3 severity-weighting to v5.1.0 | Reduces alpha.1 risk of breaking baselines | Means alpha.1 ships only 4 of 7 must-fix items; brief's primary goal "reality-based token-optimization" depends on F3 | Brief lists F3 as must-fix and ties it directly to v5.0.0 success criteria. |
| Bundle N5 (live tokenizer) into v5.1.0 | Removes API-key risk surface from v5.0.0 | User explicitly confirmed N5 in v5.0.0 (Avklaringer 2026-05-01); features list specifies opt-in via flag, mitigating risk | User confirmed scope explicitly. |
| Use external lib like `tiktoken` for N5 | Higher accuracy | Violates zero-deps convention (CLAUDE.md: "null avhengigheter", i.e. zero dependencies) | Convention is a hard rule. |
Test Strategy
- Framework: `node:test` + `node:assert/strict`
- Existing patterns:
  - Scanner tests: `runScanner(fixtureName)` helper that resets the counter and runs full discovery + scan
  - Lib tests: factory functions (`makeScannerResult`) for in-memory input data
  - Lib integration: `buildRichRepo()` tmpdir with patched HOME
  - CLI tests: `execFile`/`exec` subprocess + parse stdout JSON
- New tests in this plan: approximately 60 new test cases across 13 test files
- Coverage gating: Per revised SC-10, every F-fix and M-fix has ≥1 test; every new scanner (N1-N4, N6) has ≥1 fixture-backed test; F3 has a severity-mix table; N5 has both API-key-present and -absent cases.
Tests to write
| Type | File | Verifies | Model test |
|---|---|---|---|
| Unit | `tests/lib/severity.test.mjs` | `WEIGHTS` exported | existing severity tests |
| Unit | `tests/lib/scoring.test.mjs` | severity-weighted area score | `makeScannerResult` pattern |
| Unit | `tests/lib/active-config-reader.test.mjs` | 'mcp' kind differentiation | existing `estimateTokens` cases |
| Integration | `tests/lib/active-config-reader.test.mjs` | MCP servers report >500 tokens | `buildRichRepo` extension |
| Scanner | `tests/scanners/token-hotspots.test.mjs` | F1, F4, F5, F7, M2, M4, N1 | `runScanner` pattern |
| Scanner | `tests/scanners/settings-validator.test.mjs` | M6 `additionalDirectories` | existing validator tests |
| Scanner | `tests/scanners/hook-validator.test.mjs` | M5 verbose hook output | existing hook tests |
| Scanner | `tests/scanners/cache-prefix.test.mjs` (new) | N3 mid-section volatility | `runScanner` pattern |
| Scanner | `tests/scanners/disabled-in-schema.test.mjs` (new) | N4 deny+allow conflict | `runScanner` pattern |
| Scanner | `tests/scanners/collision.test.mjs` (new) | N6 cross-plugin collision | multi-plugin fixture |
| CLI | `tests/scanners/manifest.test.mjs` (new) | N2 manifest CLI | `execFile` pattern |
| CLI | `tests/scanners/accurate-tokens.test.mjs` (new) | N5 API + no-API paths | `mock.method` first use |
| Self-audit | `tests/scanners/self-audit.test.mjs` | F6 `--check-readme` shape | existing `runSelfAudit` test |
Risks and Mitigations
| Priority | Risk | Location | Impact | Mitigation |
|---|---|---|---|---|
| Critical | F3 silently degrades grades for users with v4 baselines | `scoring.mjs:184` (rewritten in Step 2) | Drift comparisons produce wrong deltas | Add `scoringVersion: 'v5'` to envelope meta (Step 2). diff-engine warns on cross-version compare in v5.0.1 patch (out of scope here) |
| Critical | F2 jump from 15 → 5000+ tokens per MCP collapses Token Efficiency grades | Step 5 | User's Grade A becomes Grade C overnight | CHANGELOG explicit BREAKING note (Step 9, 23, 29). Document in `commands/posture.md` next-steps |
| Critical | N5 API-key leak via error message or JSON output | Step 26 | Key persisted in session files / logs | `tokenizer-api.mjs` masks key to first 8 chars; never includes key in JSON; explicit test for masking |
| High | F3 `baseline-all-a` fixture may fail | Step 3 | Test suite blocks at alpha.1 | Step 3 dedicated to fixture audit; `posture-grade-stability.test.mjs` updated if needed |
| High | N1 tool-count threshold flagging real-world MCP servers (GitHub MCP has 28+ tools) | Step 18 | False-positive findings train users to suppress | Tiered severity: <20=none, 20-49=low, 50-99=medium, 100+=high (Step 18) |
| High | N6 namespace confusion (plugin-skill vs user-skill vs built-in) | Step 22 | Every plugin with skill named `review` flagged | Scanner only compares same-namespace items; built-ins excluded; documented in scanner comment |
| High | N5 rate-limit (1000/min) exhausted in CI loop | Step 26 | Mid-scan crash; user's main quota impacted | 3 retries with exponential backoff; 5-sec timeout; `--accurate-tokens-max-files` future flag (out of scope) |
| Medium | Cascade-volatility false positives on inline date references | Step 20 | Noise findings | Keep line-anchored regex; negative fixture for inline dates |
| Medium | F6 self-audit fragile to README formatting changes | Step 16 | Hard-blocks every release | Use exact line-anchored substring (not URL regex); badge mismatch is `low` severity (advisory, not fail) |
| Medium | `findingCounter` is process-global; new scanners interfere if they call `finding()` outside orchestrator | All N* steps | Wrong IDs in tests | All new scanners follow single-`scan()` entry; no nested calls |
| Medium | Suppression backward-compat: `CA-TOK-*` glob suppresses `CA-TOK-005` | Step 18+23 | Users miss highest-value finding | Documented in CHANGELOG (Step 23). One-time runtime warning is out of scope (v5.0.1 candidate) |
| Low | Network failure on N5 hangs 30s | Step 26 | Bad UX | 5-sec `AbortController` timeout, immediate stderr message |
| Low | Knowledge-base cleanup breaks Sonnet-version users | Step 24 | Outdated guidance | Reframe with footnote, not delete |
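
The tiered severity named as the N1 mitigation maps a tool count to a severity band. A minimal sketch, using the thresholds from the table and the "unknown count is low" rule from Revision 7; the function name is hypothetical.

```javascript
// Maps an MCP server's tool count to a finding severity.
// Returns null when no finding should be raised.
function severityForToolCount(toolCount) {
  if (toolCount == null) return 'low'; // tool count unknown (Revision 7)
  if (toolCount >= 100) return 'high';
  if (toolCount >= 50) return 'medium';
  if (toolCount >= 20) return 'low';
  return null;
}

// Usage: a server with 28 tools lands in the "low" band rather than
// being flagged at a higher severity.
console.log(severityForToolCount(28));
```
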
Assumptions
| # | Assumption | Why unverifiable | Impact if wrong |
|---|---|---|---|
| 1 | Anthropic `count_tokens` endpoint accepts plain text payload and returns `{input_tokens: number}` | Brief premise; not tested in this codebase | N5 produces wrong calibration values; falls back gracefully |
| 2 | MCP servers expose tool count via `tools/list` or `package.json` `tools` field | MCP spec is evolving; servers vary | M1/N1 detection silently returns null; `CA-TOK-005` finding may not fire on real servers; baseline behavior is "no finding" not "wrong finding" |
| 3 | `readActiveConfig` is performant enough to call from TOK on large repos | Untested at scale | TOK scanner becomes slow; fix: cache `activeConfig` in scan-orchestrator and pass to scanners (out of scope) |
| 4 | `posture-grade-stability.test.mjs` `baseline-all-a` fixture is genuinely info-only after v4 work | Assumed from naming + git history | Step 3 catches and fixes |
| 5 | Cross-plugin collision detection model (plugin-namespaced skills don't collide) is correct | Documented in N6 description but not in Anthropic specs | False positives/negatives on plugin-namespacing; verified via test fixture |
Verification
End-to-end checks after Step 30 completes (these mirror the brief's revised SCs):
- SC-10 (revised): `node --test 'tests/**/*.test.mjs'` → all green AND original 543 tests still pass AND ≥ 1 fixture-backed test exists per new scanner function (F1, F2, F3, M1, M2, M4, M5, M6, N1-N4, N6), verified by file presence:
  - `tests/lib/active-config-reader.test.mjs`: F2 'mcp' kind cases
  - `tests/lib/scoring.test.mjs`: F3 severity-mix cases
  - `tests/scanners/token-hotspots.test.mjs`: F1, F4, F5, F7, M2, M4, N1 cases
  - `tests/scanners/settings-validator.test.mjs`: M6 cases
  - `tests/scanners/hook-validator.test.mjs`: M5 cases
  - `tests/scanners/manifest.test.mjs`: N2 cases (new file)
  - `tests/scanners/cache-prefix.test.mjs`: N3 cases (new file)
  - `tests/scanners/disabled-in-schema.test.mjs`: N4 cases (new file)
  - `tests/scanners/collision.test.mjs`: N6 cases (new file)
  - `tests/scanners/accurate-tokens.test.mjs`: N5 cases (new file)
  - `tests/scanners/self-audit.test.mjs`: F6 cases
- SC-1 (F1): `! grep -q "void readActiveConfig" scanners/token-hotspots.mjs` AND `grep -q "readActiveConfig(targetPath" scanners/token-hotspots.mjs`
- SC-2 (F2): `grep -q "kind === 'mcp'" scanners/lib/active-config-reader.mjs`
- SC-3 (F3): `grep -q "import.*WEIGHTS.*riskScore\|import.*riskScore.*WEIGHTS" scanners/lib/scoring.mjs`
- SC-4 (F6): `node scanners/self-audit.mjs --check-readme --json | jq '.readmeCheck.passed'` → `true`
- SC-5 (N1): `node --test --test-name-pattern="mcp-budget" tests/scanners/token-hotspots.test.mjs` → PASS
- SC-6a (N2): `node scanners/manifest.mjs . --json | jq '.sources | length'` → > 0 AND output sorted DESC by `estimated_tokens`
- SC-6b (N5): Manual gate: release-time verification of ±5% accuracy with a real API key (Step 30); pass OR documented deferral in CHANGELOG
- SC-7 (N3): `node --test tests/scanners/cache-prefix.test.mjs` → PASS
- SC-8 (N6): `node --test tests/scanners/collision.test.mjs` → PASS
- SC-9 (M8): `! grep -q "Keep under 200 lines" knowledge/configuration-best-practices.md`
- SC-11 (N5): Both API-key-present and -absent paths covered in `tests/scanners/accurate-tokens.test.mjs`
- F5 cleanup: `! grep -q "detectSonnetEra\|CA-TOK-004" scanners/token-hotspots.mjs commands/tokens.md knowledge/opus-4.7-patterns.md`
- Release: `[ "$(jq -r .version .claude-plugin/plugin.json)" = "5.0.0" ]`
- Git: `git log --oneline -50 | grep -c "v5"` ≥ 5 (one per stage)
Estimated Scope
- Files to modify: 18 (incl. `commands/tokens.md` and `knowledge/opus-4.7-patterns.md` per Step 8b)
- Files to create: ~22 (5 new scanners + 1 lib + 1 command + 13 fixture dirs + 5 new test files + 1 research doc)
- Steps: 31 (was 30; added Step 8b for CA-TOK-004 reference cleanup, Step 22a for namespace research spike)
- Complexity: high (cross-cutting changes across scoring, tokenization, scanner registry, knowledge base)
Execution Strategy
The plan has 31 steps grouped into 5 sessions matching the release stages. Sessions are sequential: alpha.1 must land before alpha.2, and so on. Within a session some steps are parallel-safe, but for clarity all run sequentially.
Session 1 — alpha.1 (TOK cleanup + scoring fix)
- Steps: 1-9 (includes Step 8b for CA-TOK-004 reference cleanup)
- Wave: 1
- Depends on: none
- Scope fence:
  - Touch: `scanners/lib/severity.mjs`, `scanners/lib/scoring.mjs`, `scanners/lib/active-config-reader.mjs`, `scanners/token-hotspots.mjs`, `tests/lib/{severity,scoring,active-config-reader}.test.mjs`, `tests/scanners/token-hotspots.test.mjs`, `tests/scanners/posture-grade-stability.test.mjs`, `tests/fixtures/tok-active-config/`, `commands/tokens.md` (Step 8b), `knowledge/opus-4.7-patterns.md` (Step 8b), `CHANGELOG.md`
  - Never touch: any scanner other than TOK; any new scanner files (those land later)
Session 2 — alpha.2 (structural gaps + README self-audit)
- Steps: 10-17
- Wave: 2
- Depends on: Session 1
- Scope fence:
  - Touch: `scanners/{token-hotspots,settings-validator,hook-validator,self-audit}.mjs`, `scanners/lib/active-config-reader.mjs`, `tests/scanners/{settings-validator,hook-validator,self-audit,token-hotspots}.test.mjs`, `tests/fixtures/{additional-dirs-many,large-cascade,skill-bloated,mcp-tool-heavy,hooks-verbose,readme-desynced}/`, `CHANGELOG.md`
  - Never touch: `scan-orchestrator` (no new scanners yet); `knowledge/` (later)
Session 3 — beta.1 (new scanners)
- Steps: 18, 19, 20, 21, 22a (research spike), 22b (collision scanner), 23
- Wave: 3
- Depends on: Session 2
- Scope fence:
  - Touch: `scanners/token-hotspots.mjs` (N1), `scanners/{manifest,cache-prefix-scanner,disabled-in-schema-scanner,collision-scanner}.mjs` (new), `scanners/scan-orchestrator.mjs`, `scanners/lib/scoring.mjs` (`SCANNER_AREA_MAP` only), `commands/manifest.md` (new), 5 new test files, 4 new fixture dirs, `docs/v5-namespace-research.md` (gitignored), `CHANGELOG.md`
  - Never touch: any other scanner code
Session 4 — rc.1 (knowledge + tokenizer)
- Steps: 24-27
- Wave: 4
- Depends on: Session 3
- Scope fence:
  - Touch: `knowledge/{configuration-best-practices,cache-telemetry-recipe}.md`, `commands/tokens.md`, `scanners/token-hotspots-cli.mjs`, `scanners/lib/tokenizer-api.mjs` (new), `tests/scanners/accurate-tokens.test.mjs` (new), `CHANGELOG.md`
  - Never touch: scanner code beyond CLI
Session 5 — release (v5.0.0 final)
- Steps: 28-30
- Wave: 5
- Depends on: Session 4
- Scope fence:
  - Touch: `README.md`, `CLAUDE.md`, `commands/{help,posture,config-audit}.md`, `agents/feature-gap-agent.md`, `.claude-plugin/plugin.json`, `CHANGELOG.md`
  - Never touch: any code; this is documentation + tag
Execution Order
- Wave 1: Session 1 (alpha.1)
- Wave 2: Session 2 (alpha.2) — after Wave 1
- Wave 3: Session 3 (beta.1) — after Wave 2
- Wave 4: Session 4 (rc.1) — after Wave 3
- Wave 5: Session 5 (release) — after Wave 4
Grouping rules applied
- Steps sharing files → same session (e.g., all TOK changes in Session 1+2)
- New-scanner steps → Session 3 (post structural)
- Knowledge/CLI changes → Session 4 (post all scanners)
- Doc-sync + version-bump → Session 5 (last, depends on all counts being final)
Plan Quality Score
| Dimension | Weight | Score | Notes |
|---|---|---|---|
| Structural integrity | 0.15 | 88 | Sessions ordered by dependencies; Step 22a research spike resolves namespace ambiguity before 22b |
| Step quality | 0.20 | 85 | All TBDs resolved; F7 explicit decision on Pattern C; Step 16 fs-counted not README-counted |
| Coverage completeness | 0.20 | 92 | All 22 brief items mapped; F5 documentation cleanup added (8b); SC-6b release gate documented |
| Specification quality | 0.15 | 86 | File paths verified; manifest must_not_contain replaces vacuous regex; Node version pinned for N5 |
| Risk & pre-mortem | 0.15 | 88 | 13 risks; namespace research spike resolves N6 mitigation circularity |
| Headless readiness | 0.10 | 84 | All steps have On Failure + Checkpoint; manifest blocks updated to use must_not_contain where appropriate |
| Manifest quality | 0.05 | 78 | must_contain + must_not_contain; fixture file paths fully enumerated for Step 6/14/18 |
| Weighted total | 1.00 | 87.0 | Grade: B+ |
Adversarial review:
- Plan critic: initial verdict REPLAN (5 blockers, 8 majors, 7 minors, score 67.7); all blockers + majors addressed in revisions
- Scope guardian: initial verdict MIXED (4 scope-gaps); all 4 gaps addressed in revisions
Revisions
| # | Finding | Severity | Resolution |
|---|---|---|---|
| 1 | Plan header "TBD" | blocker | Updated to "B+ (84/100)" after re-scoring |
| 2 | Step 25 "TBD if needed" flag | blocker | Committed --with-telemetry-recipe flag as deliverable; added test |
| 3 | Step 8 manifest `^(?!.*detectSonnetEra)` is logically vacuous | blocker | Replaced with `must_not_contain` field; added explicit grep verify |
| 4 | Step 6 fixture incomplete in `expected_paths` | blocker | Enumerated 4 fixture files: `.mcp.json`, `CLAUDE.md`, `.claude-plugin/plugin.json`, `commands/sample.md` |
| 5 | CA-TOK-004 references in `commands/tokens.md` and `knowledge/opus-4.7-patterns.md` after F5 | blocker | Added Step 8b: dedicated cleanup step with grep verify |
| 6 | Step 12 missing test for `claudeMd.estimatedTokens` field shape | major | Added assertion to Step 6 test (item c) |
| 7 | Step 18 missing `toolCount=null` handling | major | Added explicit null branch with `low` severity + "tool count unknown" message |
| 8 | Step 3 ordering vs Step 10 grade-stability re-invalidation | major | Step 10's table-driven test now checks per-finding severity; Step 3 audits remain at fixture-level grade |
| 9 | N6 namespace assumption is circular mitigation | major | Added Step 22a research spike with explicit verdict file before 22b implementation |
| 10 | Step 16 negative-case test depends on Step 28 docs sweep | major | Step 16 now uses filesystem counts as truth (not README); fs-counted detection breaks the cycle |
| 11 | Step 19 `marketplace-large` fixture issue with manifest CLI | major | Added two test paths: real-config (plugin root) + fixture-based with `buildRichRepo` helper |
| 12 | Step 26 `mock.method` Node version requirement | major | Added prerequisite check: Node >= 18.13; documented in step + escalation path |
| 13 | `estimateTokens` kind inconsistency between discovery and `readActiveConfig` paths | major | Step 6 unifies: prefer `readActiveConfig` data for MCP/skills/plugins; discovery only for files not covered |
| 14 | F7 Pattern C left "unchanged" without rationale | scope-gap | Step 10 now explicitly recalibrates Pattern C: medium → low with reason; table-driven test asserts |
| 15 | M7 `--with-telemetry-recipe` flag was conditional | scope-gap | Same as Revision 2: committed as deliverable |
| 16 | SC-6b ±5% accuracy unprovable in automation | scope-gap | Step 30 added manual release gate with documented deferral path |
| 17 | SC-10 verification used old "≥600 tests" threshold | scope-gap | Verification section rewritten to per-feature coverage requirement |
| 18-24 | Various minors (docs file naming, manifest enumeration, CHANGELOG specifics) | minor | Addressed in their respective steps |