Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[5.0.0] - 2026-05-01

Summary

Reality-based token-optimization release. v4.0.0 shipped Opus-4.7 token surfaces aligned to a Sonnet-era cost model; v5.0.0 rebuilds the foundations against verified Opus-4.7 cost dynamics. Three pillars: honest token estimation (severity-weighted scoring, MCP estimates 15 → 500+, optional --accurate-tokens API calibration), new structural scanners (cache-prefix stability, dead tool grants, plugin collisions), and new diagnostic surfaces (/config-audit manifest, /config-audit tokens extended, knowledge-base cleanup aligned to Opus 4.7 cache dynamics).

Consolidated from 5.0.0-alpha.1 (F1-F5 token-economy round), 5.0.0-alpha.2 (M1, M2, M4-M6, F6, F7 structural gaps + README self-audit), 5.0.0-beta.1 (N1-N4, N6 new scanners + manifest CLI), and 5.0.0-rc.1 (M7, M8 knowledge cleanup + N5 tokenizer calibration).

Added

3 new scanners (9 → 12 deterministic):
- CPS — Cache-Prefix Stability (CA-CPS-NNN): volatile content in lines 31–150 of CLAUDE.md cascade, beyond TOK Pattern A's top-30 window. Volatile-pattern set extends Pattern A with shell-exec lines (! prefix) and ${VAR} substitutions.
- DIS — Disabled-In-Schema (CA-DIS-NNN): tools listed in BOTH permissions.deny AND permissions.allow. Tool identity uses bare name (Bash(npm:*) and Bash are the same tool). Severity low.
- COL — Cross-Plugin Skill Collision (CA-COL-001): plugin-vs-plugin same skill name → low; user-vs-plugin → medium. details.namespaces payload identifies conflicting sources.
TOK extensions:
- CA-TOK-005 MCP tool-schema budget: per-server tiered finding (< 20 none, 20–49 low, 50–99 medium, 100+ high; null low + "tool count unknown"). Scoped to project-local .mcp.json.
- Pattern E — Oversized cascade: medium when activeConfig.claudeMd.estimatedTokens > 10_000.
- Pattern F — Bloated SKILL.md description: low when frontmatter description > 500 chars (loads every turn). Scoped to discovery.files.
/config-audit manifest + scanners/manifest.mjs CLI — single ranked table of every system-prompt token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by estimated_tokens. CLAUDE.md per-file tokens distributed proportional to bytes.
--accurate-tokens flag on token-hotspots-cli.mjs (N5): when ANTHROPIC_API_KEY is set, calls Anthropic's count_tokens for the top 3 hotspots and populates output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }. When absent: calibration = { skipped: 'no-api-key' } plus stderr warning.
scanners/lib/tokenizer-api.mjs — count_tokens wrapper. 5s AbortController timeout. Exponential backoff on 429 (3 retries: 1s/2s/4s). API key masked to ${key.slice(0,8)}... in every error; HTTP body never included in errors (it may echo the key on auth failures). maskKey() exported.
--with-telemetry-recipe flag on the same CLI (M7): emits telemetry_recipe_path field pointing to knowledge/cache-telemetry-recipe.md.
knowledge/cache-telemetry-recipe.md (M7): manual jq recipe summing cache_read_input_tokens + cache_creation_input_tokens per turn from session transcripts. Hit-rate interpretation table.
'mcp' kind on estimateTokens (F2): active MCP servers estimate ≥ 500 tokens (base + schema overhead) instead of v4's flat 15. Optional {toolCount} raises to 500 + toolCount × 200.
MCP tool-count detection (M1): readActiveMcpServers resolves count via cache → node_modules/<pkg>/package.json → {toolCount: null, toolCountUnknown: true} fallback.
additionalDirectories settings key (M6): added to KNOWN_KEYS; new low-severity finding when length > 2.
HKV verbose hook output (M5): low-severity finding when referenced hook script contains > 50 console.log/process.stdout.write lines (static, no execution).
self-audit --check-readme flag (F6): filesystem counts compared against README badges. Helper checkReadmeBadges(pluginDir). Step 28 of v5 plan reconciled all badges.
scoringVersion: 'v5' field on scoreByArea output for cross-version drift detection.
WEIGHTS named export from scanners/lib/severity.mjs (frozen).
details field on findings (output.mjs:finding()): optional structured payload for scanner-specific data (used by COL).
Plugin Hygiene as 10th quality area (from COL). Posture JSON now reports 10 areas.
TOK-readActiveConfig integration (F1): one hotspot per active MCP server; result.activeConfig summary (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount); try/catch fallback when scope-limited.
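The CA-TOK-005 tiers and the DIS bare-name rule from the list above can be sketched roughly as follows. This is a minimal illustration, not the scanners' actual code: the function names `mcpBudgetSeverity`, `bareToolName`, and `deadAllowEntries` are hypothetical, and the input is assumed to be an already-parsed `permissions` object with `allow` and `deny` arrays.

```javascript
// Hypothetical sketch of the CA-TOK-005 tiers: returns a severity or null (no finding).
function mcpBudgetSeverity(toolCount) {
  if (toolCount === null || toolCount === undefined) return 'low'; // tool count unknown
  if (toolCount >= 100) return 'high';
  if (toolCount >= 50) return 'medium';
  if (toolCount >= 20) return 'low';
  return null; // fewer than 20 tools: no finding
}

// Hypothetical sketch of DIS tool identity: everything before the opening parenthesis.
function bareToolName(entry) {
  const open = entry.indexOf('(');
  return (open === -1 ? entry : entry.slice(0, open)).trim();
}

// Allow entries whose bare tool name also appears in the deny list.
function deadAllowEntries(permissions) {
  const denied = new Set((permissions.deny || []).map(bareToolName));
  return (permissions.allow || []).filter((entry) => denied.has(bareToolName(entry)));
}
```

With the `mcp-budget` fixture sizes (14, 25, 60, 120, unknown), the first function would yield no finding, low, medium, high, and low respectively.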

Changed

F3 — scoreByArea is severity-weighted. Penalty = Σ count[s] × WEIGHTS[s]; passRate = max(0, 100 − penalty / max(10, findingCount × 4) × 100). Lows no longer crater an area's grade; criticals/highs do. baseline-all-a fixture remains all-A (no critical/high present).
F7 — TOK pattern severities recalibrated for tokens-per-turn impact: Pattern A medium → high, Pattern B low → medium, Pattern C medium → low. Each finding carries a calibration_note evidence field documenting the heuristic basis.
scoreByArea deduplicates by area name (N3 prep): TOK + CPS share "Token Efficiency"; SET + DIS share "Settings". Combined row with merged finding counts.
M8 (knowledge cleanup): replaced "Keep CLAUDE.md under 200 lines" in knowledge/configuration-best-practices.md with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). A footnote explains that the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references knowledge/opus-4.7-patterns.md.
commands/tokens.md next-steps: documents --with-telemetry-recipe as the cache-verification path.
Scanner count: 9 → 12. Command count: 17 → 18. Knowledge: 7 → 8. Quality areas: 8 → 10.
.gitignore — unignore rules for tests/fixtures/**/node_modules/ so the mcp-tool-heavy fixture stays under version control.
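The F3 formula above is easier to follow with numbers plugged in. The sketch below is an illustration under stated assumptions: the weight values are hypothetical (the changelog does not list the contents of `WEIGHTS`), `findingCount` is assumed to be the total number of findings in the area, and `sketchPassRate` is not the plugin's function name.

```javascript
// Hypothetical weights; the real WEIGHTS export may use different values.
const SKETCH_WEIGHTS = Object.freeze({ critical: 8, high: 4, medium: 2, low: 1 });

// passRate = max(0, 100 - penalty / max(10, findingCount * 4) * 100)
function sketchPassRate(counts) {
  let penalty = 0;
  let findingCount = 0;
  for (const [severity, count] of Object.entries(counts)) {
    penalty += count * (SKETCH_WEIGHTS[severity] || 0);
    findingCount += count;
  }
  const budget = Math.max(10, findingCount * 4);
  return Math.max(0, 100 - (penalty / budget) * 100);
}
```

Under these assumed weights, twenty low findings give a penalty of 20 against a budget of 80 (pass rate 75), while two criticals give a penalty of 16 against the minimum budget of 10 (pass rate 0). That matches the stated intent: lows dilute, criticals and highs dominate.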

Removed

F4 — TOK hotspot padding loop and take dead-code. Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
F5 — Pattern D / CA-TOK-004 (sonnet-era signature). Catalogue entry removed from knowledge/opus-4.7-patterns.md and commands/tokens.md. Suppression entries for CA-TOK-004 are now no-ops.

Breaking changes

F2 — MCP token estimates jump from flat 15 to ≥ 500. Token Efficiency grades for projects with MCP servers may shift. whats-active totals report higher numbers. Documented in commands/posture.md next-steps.
F3 — scoreByArea is severity-weighted. Posture JSON consumers reading areas[*].score will see different values for non-clean configs. Use result.scoringVersion === 'v5' to detect the change. Drift comparisons across v4↔v5 baselines may show artificial deltas — re-baseline after upgrade.
F5 — Pattern D / CA-TOK-004 no longer emitted. Existing exact CA-TOK-004 suppression entries are harmless but obsolete.
N1 suppression backward-compat — CA-TOK-* glob now also matches CA-TOK-005. To preserve prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
```
CA-TOK-001
CA-TOK-002
CA-TOK-003
```
A one-time runtime warning for this case is a v5.0.1 candidate.
Posture areas count: 9 → 10 (Plugin Hygiene from COL). Consumers hard-coding 9 must update.
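To see why the `CA-TOK-*` glob now catches CA-TOK-005 while the explicit-ID list does not, here is a minimal glob matcher. It is a sketch only: `globToRegExp` and `isSuppressed` are hypothetical names, and the assumption that `*` is the only wildcard is mine, not a statement about the suppression engine.

```javascript
// Hypothetical sketch: treat "*" as "any run of characters", everything else literally.
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// True when any ignore entry (exact ID or glob) matches the finding ID.
function isSuppressed(findingId, entries) {
  return entries.some((entry) => globToRegExp(entry).test(findingId));
}

const globEntries = ['CA-TOK-*'];
const explicitEntries = ['CA-TOK-001', 'CA-TOK-002', 'CA-TOK-003'];

console.log(isSuppressed('CA-TOK-005', globEntries));     // true
console.log(isSuppressed('CA-TOK-005', explicitEntries)); // false
```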

Migration notes

CA-TOK-* glob suppressions: explicit-ID list recommended if CA-TOK-005 should not be suppressed.
CA-TOK-004 exact-ID suppression entries: safe to remove.
Drift baselines created against v4 should be re-saved post-upgrade to avoid artificial F3 weighting deltas.
Posture JSON consumers must update any hardcoded areas.length === 8 or === 9 assertions to >= 10.
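For consumers of the posture JSON, one way to follow these notes is to look areas up by `id` rather than by position or count, and to compare scores only between payloads with the same `scoringVersion`. The sketch below assumes a payload shaped like `{ scoringVersion, areas: [{ id, score }] }`; that shape and both helper names are illustrative assumptions, not the documented schema.

```javascript
// Hypothetical consumer-side helpers; payload shape is an assumption.
function readAreaScore(posture, areaId) {
  const area = (posture.areas || []).find((candidate) => candidate.id === areaId);
  return area ? area.score : null;
}

// Scores are only comparable when both payloads used the same scoring model.
function isComparable(current, baseline) {
  return current.scoringVersion === baseline.scoringVersion;
}

const v4Baseline = { areas: [{ id: 'token_efficiency', score: 90 }] };
const v5Result = { scoringVersion: 'v5', areas: [{ id: 'token_efficiency', score: 72 }] };

if (!isComparable(v5Result, v4Baseline)) {
  console.log('Baseline predates v5 scoring; re-save it before diffing.');
}
```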

Tests

543 → 635 (+92): F1-F7 + M1, M2, M4-M6 (alpha rounds = +43), N1-N4 + N6 (beta = +39), M7 + M8 + N5 (rc = +10). 36 test files (12 lib + 23 scanner + 1 hook).
New fixtures: tok-active-config/, additional-dirs-many/, additional-dirs-ok/, large-cascade/, small-cascade/, skill-bloated/, skill-tight/, mcp-tool-heavy/ (with mocked node_modules/), hooks-verbose/, hooks-quiet/, readme-desynced/, mcp-budget/{14,25,60,120,unknown}-tools/, volatile-mid-section/{volatile-line-60,volatile-line-200}/, denied-tools-in-schema/, collision-plugins/fake-home/ (plugin-a + plugin-b + plugin-c + user-level review skill).
New test files: tests/scanners/manifest.test.mjs, tests/scanners/cache-prefix.test.mjs, tests/scanners/disabled-in-schema.test.mjs, tests/scanners/collision.test.mjs, tests/scanners/accurate-tokens.test.mjs.

Notes

mock.method against ESM module exports does not work (Node 18+ ESM read-only export bindings). v5 tests use globalThis.fetch mocking for --accurate-tokens instead — equivalent coverage at the actual external-dependency boundary.
Plugin-vs-built-in collision detection is intentionally not implemented. Step 22a research spike (docs/v5-namespace-research.md, gitignored) could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in. Treated as info-only; v5.0.1 candidate.
README/CLAUDE.md badge reconciliation done in Step 28 (this release). self-audit --check-readme PASSES against the filesystem. The test counter now counts test cases instead of test files, parsed from a node --test subprocess run.
hotspot.path exposed on file-backed hotspots (Step 30 fix). The rc.1 --accurate-tokens implementation looked up hotspot.path but the scanner only emitted source. File-backed hotspots now carry path (absolute path); MCP-server hotspots leave it unset (they are virtual entries representing runtime tool-schema cost, not file content).
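Mocking at the `globalThis.fetch` boundary, as described in the first note, can look roughly like the helper below. It is a sketch, not the project's test code: `withMockedFetch` and `fakeJsonResponse` are hypothetical names, and the `{ input_tokens }` body is an assumed response shape used only for illustration.

```javascript
// Hypothetical test helper: swap globalThis.fetch while fn runs, then always restore it.
async function withMockedFetch(fakeFetch, fn) {
  const original = globalThis.fetch;
  globalThis.fetch = fakeFetch;
  try {
    return await fn();
  } finally {
    globalThis.fetch = original;
  }
}

// Minimal stand-in for a fetch Response that carries a JSON body.
function fakeJsonResponse(body, status = 200) {
  return { ok: status >= 200 && status < 300, status, json: async () => body };
}
```

Because the code under test resolves `fetch` from the global object at call time, no module export has to be rebound, which sidesteps the read-only ESM binding problem.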

SC-6b release-gate result (verified 2026-05-01)

PASS: the byte heuristic over-estimated by 0.85% against the real count_tokens API (594 estimated vs. 589 actual).
Fixture: tests/fixtures/marketplace-large/. Top-3 hotspots = 1 file-backed (CLAUDE.md) + 2 MCP virtuals. MCP entries skipped per design (no readable content; their tokens are formula-based at 500 + toolCount × 200).
CLAUDE.md actual: 589 tokens (Anthropic count_tokens, claude-opus-4-7). Estimated: 594 tokens (byte heuristic at 4 bytes/token via estimateTokens). Delta: −5 tokens, −0.85% — well within the ±5% gate.
No tuning of estimateTokens heuristic required for v5.0.0.
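The gate arithmetic can be reproduced in a few lines. The sketch assumes the delta is computed as actual minus estimated and expressed relative to the actual count, which is the convention that yields the −0.85% reported above; `calibrationDelta` is a hypothetical helper name.

```javascript
// Hypothetical helper: delta = actual - estimated, as a percentage of actual.
function calibrationDelta(actualTokens, estimatedTokens, gatePercent = 5) {
  const deltaTokens = actualTokens - estimatedTokens;
  const deltaPercent = (deltaTokens / actualTokens) * 100;
  return {
    deltaTokens,
    deltaPercent,
    withinGate: Math.abs(deltaPercent) <= gatePercent,
  };
}

const sc6b = calibrationDelta(589, 594);
console.log(sc6b.deltaTokens, sc6b.deltaPercent.toFixed(2), sc6b.withinGate);
// prints: -5 -0.85 true
```

A negative delta means the estimate came in above the actual count.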

[5.0.0-rc.1] - 2026-05-01

Summary

Release candidate for v5.0.0: knowledge cleanup and tokenizer calibration. Three deliverables: M8 (Sonnet-era → Opus 4.7 best-practices rewrite), M7 (cache-telemetry recipe in knowledge/ plus an opt-in CLI flag), and N5 (--accurate-tokens API calibration via Anthropic's count_tokens endpoint).

Added

N5 — --accurate-tokens flag on scanners/token-hotspots-cli.mjs. When ANTHROPIC_API_KEY is set, the CLI calls Anthropic's count_tokens endpoint for the top 3 hotspots and populates output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }. When the key is absent, calibration = { skipped: 'no-api-key' } and a stderr warning is emitted. Designed for the manual SC-6b release-gate verification, not routine use.
scanners/lib/tokenizer-api.mjs — wrapper around count_tokens with a 5-second AbortController timeout, exponential-backoff retry on HTTP 429 (max 3 retries: 1s, 2s, 4s), and required headers (x-api-key, anthropic-version: 2023-06-01, content-type). API key is masked to ${key.slice(0,8)}... in every error message and every thrown error; non-429 HTTP errors throw status code only — response body is never included (it may echo the key on auth failures). maskKey() is exported for callers that need safe logging.
M7 — knowledge/cache-telemetry-recipe.md (new). Manual jq recipe for verifying prompt-cache hit rate from Claude Code session transcripts (~/.claude/projects/<slug>/*.jsonl). Sums cache_read_input_tokens and cache_creation_input_tokens per turn and reports a hit-rate ratio. Recipe-form (not bundled scanner) keeps the project's "no transcript-parsing as core feature" non-goal intact while giving users a runtime escape hatch.
M7 — --with-telemetry-recipe flag on the same CLI. When passed, emits telemetry_recipe_path in the JSON output pointing to the recipe file. Without the flag, output is unchanged. Committed as a default deliverable, opt-in at invocation time.
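The key-masking and 429 backoff behavior described for tokenizer-api.mjs can be sketched as below. This is an illustration under assumptions, not the module's source: `sketchMaskKey`, `backoffDelaysMs`, and `retryOn429` are hypothetical names, and the request is injected as a function so the retry loop can be exercised without a network call.

```javascript
// Keep only the first 8 characters of the key, as in `${key.slice(0,8)}...`.
function sketchMaskKey(key) {
  return `${String(key).slice(0, 8)}...`;
}

// Exponential schedule: 1000, 2000, 4000 ms for three retries.
function backoffDelaysMs(maxRetries = 3, baseMs = 1000) {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

const realSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry only on HTTP 429; any other response is returned to the caller as-is.
async function retryOn429(request, { sleep = realSleep, delays = backoffDelaysMs() } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    const response = await request();
    if (response.status !== 429) return response;
    if (attempt >= delays.length) {
      // Status code only: the response body is deliberately left out of the error.
      throw new Error('count_tokens failed: HTTP 429 after retries');
    }
    await sleep(delays[attempt]);
  }
}
```

Injecting `sleep` keeps tests fast: a fake that records the requested delays verifies the schedule without waiting seven seconds.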

Changed

M8 (knowledge-base cleanup): replaced the "Keep CLAUDE.md under 200 lines" rule in knowledge/configuration-best-practices.md with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Added a footnote that the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references knowledge/opus-4.7-patterns.md.
commands/tokens.md next-steps: documents --with-telemetry-recipe as the cache-verification path after a structural fix.

Tests

625 → 635 (+10): --with-telemetry-recipe (×2), tokenizer-api unit tests (×6 — masking, body-leak protection, AbortController signal, 429 retry, header set, fetch mock happy path), --accurate-tokens no-key subprocess test (×1), absent-flag negative test (×1).
New file: tests/scanners/accurate-tokens.test.mjs. No new fixtures (re-uses marketplace-large).

Notes

SC-6b release gate is NOT closed by these commits. Step 26's tests use mocked globalThis.fetch to verify the integration contract; ±5% accuracy against real count_tokens requires a live API key and must be verified manually before tagging v5.0.0 in Session 5.
The plan's specified mock.method(tokenizerApi, 'callCountTokensApi', ...) pattern collides with ESM read-only export bindings in Node 18+. Tests mock at the globalThis.fetch boundary instead — equivalent coverage, no module-export rebinding required.
README/CLAUDE.md badge counts and plugin.json version still target v4.0.0; Step 28+29 will sync those during the release wrap.
[skip-docs] tag on the N5 feat commit; M7 and M8 are docs(...) commits and don't need it.

[5.0.0-beta.1] - 2026-05-01

Summary

First v5.0.0 beta — new scanners. Five new finding sources land: MCP tool-schema budget (CA-TOK-005), system-prompt manifest CLI/command (/config-audit manifest), cache-prefix stability (CPS), disabled-tools-still-in-schema (DIS), and cross-plugin/user-vs-plugin skill collision (COL/CA-COL-001). Plugin Hygiene becomes a 10th area-scorecard column.

Added

N1 — CA-TOK-005 MCP tool-schema budget: per-server tiered finding inside the TOK scanner. Thresholds — < 20 no finding, 20–49 low, 50–99 medium, 100+ high; null (manifest unparseable) low + "tool count unknown" message. Scoped to project-local .mcp.json to keep /config-audit <path> actionable. Recommendation links to the Step 25 cache-telemetry recipe.
N2 — /config-audit manifest: new slash command + scanners/manifest.mjs CLI. Renders a single ranked table of every token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by estimated_tokens. Reuses readActiveConfig; CLAUDE.md per-file tokens are distributed proportional to bytes.
N3 — CPS scanner (CA-CPS-NNN): Cache-Prefix Stability Analyzer. Walks the CLAUDE.md cascade and flags volatile content between lines 31 and 150 — beyond TOK Pattern A's top-30 territory. Volatile-pattern set extends Pattern A with shell-exec lines (! prefix) and ${VAR} substitutions. Severity medium per finding. Skips lines 1–30 (Pattern A's range).
N4: DIS scanner (CA-DIS-NNN), the Disabled-In-Schema Detector. Detects tools that appear in BOTH permissions.deny and permissions.allow within the same settings.json. The deny list wins, so allow entries are dead config but still load every turn. Tool identity is the bare name, meaning everything before the opening parenthesis: Bash(npm:*) and Bash are treated as the same tool. Severity low.
N6 — COL scanner (CA-COL-001): Cross-Plugin Skill Collision detector. Plugin-vs-plugin same skill name → low. User-vs-plugin same skill name → medium. Findings carry details.namespaces array with {source, name, path} for every conflicting source.
details field on findings: output.mjs:finding() helper now passes through optional details for scanner-specific structured payloads (used by COL).
"Plugin Hygiene" area (10th in scorecard): COL contributes here. Posture JSON now reports 10 areas instead of 9.

Changed

scoreByArea deduplicates by area name: when multiple scanners share an area (TOK + CPS → "Token Efficiency", SET + DIS → "Settings"), they produce one combined row with merged finding counts. Existing 9-area contract preserved for non-Plugin-Hygiene areas.
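The merge described above amounts to grouping scanner rows by area name and summing their per-severity counts. A minimal sketch, assuming each row looks like `{ scanner, area, counts }`; that shape and the name `mergeByArea` are illustrative, not the real scoreByArea internals.

```javascript
// Hypothetical sketch: one output row per area, with per-severity counts summed.
function mergeByArea(rows) {
  const merged = new Map();
  for (const { area, counts } of rows) {
    const target = merged.get(area) || { area, counts: {} };
    for (const [severity, count] of Object.entries(counts)) {
      target.counts[severity] = (target.counts[severity] || 0) + count;
    }
    merged.set(area, target);
  }
  return [...merged.values()];
}

const combined = mergeByArea([
  { scanner: 'TOK', area: 'Token Efficiency', counts: { medium: 1, low: 2 } },
  { scanner: 'CPS', area: 'Token Efficiency', counts: { medium: 3 } },
  { scanner: 'SET', area: 'Settings', counts: { low: 1 } },
]);
console.log(combined.length); // 2
```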

Known breaking changes

Suppression backward-compat — CA-TOK-* glob now also matches CA-TOK-005. Existing .config-audit-ignore entries that suppress TOK findings via the CA-TOK-* glob will silently include CA-TOK-005 (MCP budget). To preserve the prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
```
CA-TOK-001
CA-TOK-002
CA-TOK-003
```
A one-time runtime warning for this case is out of scope for v5.0.0 — it is a candidate for v5.0.1.
Plugin-vs-built-in collision is intentionally not implemented. The Step 22a research spike could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in (/help, /clear, /init, /review, /config, /cost, /security-review). Treated as info-only in this release; a follow-up v5.0.1 ticket may add an opt-in check.

Tests

586 → 625 (+39): N1 (×7), N2 (×11), N3 (×7), N4 (×6), N6 (×8).
New fixtures: mcp-budget/{14,25,60,120,unknown}-tools/, volatile-mid-section/{volatile-line-60,volatile-line-200}/, denied-tools-in-schema/, collision-plugins/fake-home/ (plugin-a + plugin-b + plugin-c + user-level review skill).

Notes

[skip-docs] tag used on every feat commit — README/CLAUDE.md badge counts (scanner count, command count, test count) and the architecture sections are intentionally fenced off until Session 5 (Step 28). This keeps the v5 plan's session boundaries clean even when the Forgejo pre-commit-docs-gate hook would otherwise block these commits.

[5.0.0-alpha.2] - 2026-05-01

Summary

Second v5.0.0 alpha — structural gaps + README self-audit. TOK pattern severities recalibrated for tokens/turn impact (F7), three new findings cover settings/skills/cascade structure (M2, M4, M6), MCP tool-count detection wired (M1), HKV gains a verbose-output check (M5), and self-audit grows a --check-readme flag (F6).

Added

F7 — TOK severity recalibration: Pattern A (cache-breaking volatile top) medium → high, Pattern B (redundant permissions) low → medium, Pattern C (deep imports) medium → low. Each finding now carries a calibration_note evidence field documenting the heuristic basis.
M6 — additionalDirectories settings key: added to KNOWN_KEYS so it no longer trips "unknown settings key". New low-severity finding when additionalDirectories.length > 2.
M4 — TOK Pattern E: medium-severity finding when activeConfig.claudeMd.estimatedTokens > 10_000 — flags cascades that bleed budget every turn.
M2 — TOK Pattern F: low-severity finding for project-local SKILL.md whose frontmatter description exceeds 500 characters (description loads on every turn even when the body does not). Scoped to discovery.files; user/plugin skills out of project scope are not flagged.
M1 — MCP tool-count detection: readActiveMcpServers now resolves tool count via cache → node_modules/<pkg>/package.json → {toolCount: null, toolCountUnknown: true} fallback. Tool count drives estimateTokens per server.
M5 — HKV verbose hook output: new low-severity finding when a referenced hook script contains > 50 console.log / process.stdout.write lines (static heuristic, no execution).
F6 — self-audit --check-readme flag: filesystem counts (scanners, commands, agents, hooks, tests, knowledge) compared against README badge values. Helper export: checkReadmeBadges(pluginDir).
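The M5 check is a static line count, so it can be sketched without executing the hook. The helper names below are hypothetical, and the sketch assumes a line counts once even if it contains several output calls.

```javascript
// Count source lines that contain console.log or process.stdout.write (no execution).
function countOutputLines(source) {
  return source
    .split(/\r?\n/)
    .filter((line) => /console\.log|process\.stdout\.write/.test(line)).length;
}

// More than 50 such lines triggers the low-severity finding.
function isVerboseHook(source, threshold = 50) {
  return countOutputLines(source) > threshold;
}
```

Being a text heuristic, this would also count calls inside comments or dead branches.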

Changed

TOK severities (F7) — see Added. Posture aggregates that depended on Pattern A being medium will now reflect the higher-impact rating.
.gitignore — added unignore rules so tests/fixtures/**/node_modules/ are tracked. Required by the mcp-tool-heavy fixture.

Tests

563 → 586 (+23): F7 table-driven (×6), M6 (×3), M4 (×2), M2 (×2), M1 (×4), M5 (×2), F6 (×4).
New fixtures: additional-dirs-many/, additional-dirs-ok/, large-cascade/, small-cascade/, skill-bloated/, skill-tight/, mcp-tool-heavy/ (with mocked node_modules/), hooks-verbose/, hooks-quiet/, readme-desynced/.

Notes

result.readmeCheck.passed === true is not required during alpha/beta phases. The real plugin's own check is currently red (scanners 10 vs README 9, tests 31 vs README 543) — reconciliation deferred to Session 5 Step 28 (README sync).
[skip-docs] tag used on every commit — README/CLAUDE.md badge counts and architecture text are intentionally fenced off until Session 5.

[5.0.0-alpha.1] - 2026-05-01

Summary

First v5.0.0 alpha — token-economy round, F1-F5. The TOK scanner now consumes readActiveConfig (per-MCP-server hotspots, claudeMd cascade tokens), severity weighting replaces flat finding counts in scoreByArea, and MCP servers no longer estimate at a flat 15 tokens. Pattern D (CA-TOK-004 sonnet-era signature) removed — too noisy, not actionable.

Added

'mcp' kind for estimateTokens (F2): an active MCP server now estimates ≥ 500 tokens (base protocol + schema overhead) instead of the v4 flat 15. Optional {toolCount} raises the estimate to 500 + toolCount * 200 once Step 14 wires tool-count detection.
TOK ↔ readActiveConfig integration (F1): the TOK scanner emits one hotspot per active MCP server, sums their tokens into total_estimated_tokens, and exposes result.activeConfig (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount).
scoringVersion: 'v5' field on scoreByArea output for cross-version drift detection.
WEIGHTS named export from scanners/lib/severity.mjs (Object.freeze).
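The F2 estimate above reduces to a base cost plus a per-tool cost. A minimal sketch, assuming an unknown or missing tool count falls back to the 500-token base; `sketchMcpTokens` is a hypothetical name, not the `estimateTokens` signature.

```javascript
const MCP_BASE_TOKENS = 500;     // base protocol + schema overhead (F2)
const MCP_TOKENS_PER_TOOL = 200; // added per known tool (F2)

// Hypothetical sketch of the 'mcp' estimate: 500 + toolCount * 200.
function sketchMcpTokens({ toolCount = null } = {}) {
  // Assumption: a null or invalid tool count contributes no per-tool cost.
  const tools = Number.isInteger(toolCount) && toolCount > 0 ? toolCount : 0;
  return MCP_BASE_TOKENS + tools * MCP_TOKENS_PER_TOOL;
}

console.log(sketchMcpTokens());                  // 500
console.log(sketchMcpTokens({ toolCount: 25 })); // 5500
```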

Changed

BREAKING (intentional, F3): scoreByArea is now severity-weighted. Penalty = Σ count[s] * WEIGHTS[s]; passRate = max(0, 100 - penalty / max(10, findingCount * 4) * 100). Lows no longer crater an area's grade; a single high or critical consumes a large fraction of budget. baseline-all-a fixture remains all-A (no critical/high on that fixture).
BREAKING (intentional, F2): MCP server token estimates jump from a flat 15 to ≥ 500. whats-active totals and TOK hotspots will report higher numbers for any project with active MCP servers.
BREAKING (intentional, F5): Pattern D / CA-TOK-004 (sonnet-era signature) is no longer emitted. Suppression entries for CA-TOK-004 are now no-ops; downstream tools that filter on the ID should drop it. The catalogue entry was removed from knowledge/opus-4.7-patterns.md and commands/tokens.md.
Hotspots contract (F4): the v4 padding loop and take dead-code are gone. Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.

Migration notes

CA-TOK-* glob suppression entries continue to suppress 001-003. Existing exact CA-TOK-004 entries are harmless but obsolete — remove them at convenience.
Posture/JSON consumers reading areas[*].score will see different values for non-clean configs. Use result.scoringVersion === 'v5' to detect.

Tests

543 → 563 across the alpha.1 commits (+9 severity-weighting/scoring, +4 estimateTokens 'mcp', +1 MCP caller migration, +3 readActiveConfig integration, +2 hotspots-uniqueness, +2 sonnet-era zero-finding).
New fixture tests/fixtures/tok-active-config/ — minimal repo with .mcp.json (2 servers), CLAUDE.md, plugin skeleton.

[4.0.0] - 2026-04-19

Summary

Opus 4.7 era upgrade. New TOK scanner detects token-efficiency anti-patterns (cache-breaking volatile content, redundant tool permissions, deep import chains, sonnet-era minimal setups). Token Efficiency joins the quality scorecard as the 8th area. Scanner-agent and verifier-agent migrate from haiku → sonnet per global no-haiku policy.

Added

token-hotspots.mjs scanner (CA-TOK-001..004) — 4 patterns aligned with Opus 4.7 token-cost dynamics:
- CA-TOK-001 cache-breaking volatile content (timestamps/UUIDs in top 30 lines of CLAUDE.md)
- CA-TOK-002 redundant tool permissions (duplicate or subset overlaps)
- CA-TOK-003 deep @import chains (>2 hops on the load path)
- CA-TOK-004 sonnet-era minimal setup (no skills/MCP/hooks/managed/plugins)
/config-audit tokens [path] [--global] — ranked hotspot table + per-pattern findings.
scanners/token-hotspots-cli.mjs — standalone CLI emitting total_estimated_tokens, hotspots, and per-finding output.
Token Efficiency as the 8th quality area in the posture scorecard (now 9 scanners total: CML/SET/HKV/RUL/MCP/IMP/CNF/GAP/TOK).
id field on every area in the scorecard payload (token_efficiency, instruction_clarity, etc.) for stable downstream lookup.
13 new TOK scanner tests + 3 CLI tests + posture grade-stability test for token_efficiency.
Knowledge refresh: knowledge/opus-4.7-patterns.md, plus 2026-04 deltas (v2.1.83–v2.1.111) added to feature-evolution.md, claude-code-capabilities.md, and hook-events-reference.md from research/03-claude-code-changes-config-surfaces.md.

Changed

BREAKING (additive surface): Quality areas count 7 → 8. Posture JSON consumers that hard-coded 7 areas must update.
BREAKING (model migration): scanner-agent and verifier-agent migrated haiku → sonnet. Latency and cost trade-offs accepted; deterministic scanner CLIs preferred over agent invocations.
Scanner count: 8 → 9 (TOK added).
Command count: 16 → 17 (/config-audit tokens added).
Version bump: 3.1.0 → 4.0.0.

[3.1.0] - 2026-04-14

Summary

New read-only command /config-audit whats-active — shows exactly what Claude Code loads for a given repo, with token estimates.

Added

/config-audit whats-active [path] — inventory of active plugins, skills, MCP servers, hooks, and CLAUDE.md cascade for a repo, with source attribution (user/project/plugin) and rough token estimates. Read-only, <2s.
scanners/lib/active-config-reader.mjs — pure async helper: readActiveConfig(), detectGitRoot(), walkClaudeMdCascade(), readClaudeJsonProjectSlice() (longest-prefix matching), enumeratePlugins(), enumerateSkills(), readActiveHooks(), readActiveMcpServers(), estimateTokens().
scanners/whats-active.mjs — thin CLI shim supporting --json, --output-file, --verbose, --suggest-disables.
Optional --suggest-disables flag surfaces deterministic disable candidates (disabled MCP servers, zero-item plugins, unreferenced plugins, orphan skills) and invites an LLM judgment pass in the command.
36 new tests in tests/lib/active-config-reader.test.mjs, plus a rich-repo tmpdir fixture helper.
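Longest-prefix matching, as mentioned for readClaudeJsonProjectSlice(), picks the most specific configured project path that contains the target directory. The sketch below is an assumption-laden illustration: it takes a plain list of POSIX-style absolute paths as keys, and `longestPrefixKey` is a hypothetical name.

```javascript
// Hypothetical sketch: return the key that is the longest path-prefix of target, or null.
function longestPrefixKey(keys, target) {
  let best = null;
  let bestLength = -1;
  for (const key of keys) {
    // Ignore a trailing slash so "/repo/" and "/repo" behave the same.
    const base = key.length > 1 && key.endsWith('/') ? key.slice(0, -1) : key;
    // Match whole path segments only: "/repo" must not match "/repo2".
    const prefix = base === '/' ? '/' : base + '/';
    const matches = target === base || target.startsWith(prefix);
    if (matches && base.length > bestLength) {
      best = key;
      bestLength = base.length;
    }
  }
  return best;
}
```

The segment-boundary check is the part that is easy to get wrong with a bare `startsWith`.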

Changed

Version bump: 3.0.1 → 3.1.0 (minor, additive feature, no breaking changes).
Command count: 15 → 16.

[3.0.1] - 2026-04-04

Summary

Cross-platform fix — scanners, hooks, and lib now work correctly on Windows.

Fixed

file-discovery.mjs: depth calculation, agent/command/plugin path matching now use path.sep
scan-orchestrator.mjs: fixture-path filtering now uses path.sep
post-edit-verify.mjs: rules-dir regex handles both / and \ separators
auto-backup-config.mjs: rules-dir detection now uses path.sep
import-resolver.mjs: circular import display uses basename(), /tmp fallback replaced with os.tmpdir()
string-utils.mjs: normalizePath trailing separator regex handles both / and \
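The separator-tolerant regex approach used in some of these fixes can be illustrated as follows. The `.claude/rules` location and both function names are assumptions made for the example, not a copy of the hook or string-utils code.

```javascript
// Assumed rules-dir location; the character class accepts "/" and "\" alike.
const RULES_DIR_PATTERN = /(^|[\\/])\.claude[\\/]rules[\\/]/;

function isRulesPath(filePath) {
  return RULES_DIR_PATTERN.test(filePath);
}

// Strip any run of trailing separators, whichever style they use.
function stripTrailingSeparators(filePath) {
  return filePath.replace(/[\\/]+$/, '');
}

console.log(isRulesPath('C:\\repo\\.claude\\rules\\style.md')); // true
console.log(isRulesPath('/repo/.claude/rules/style.md'));        // true
```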

Added

4 cross-platform path tests (total 486 tests)

[3.0.0] - 2026-04-04

Summary

Health redesign — configuration health is now quality-only. Feature utilization removed from grades entirely.

Changed

Health = quality only. 7 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF) determine your grade. Feature Coverage is no longer a graded area.
Feature recommendations are opt-in. Unused features shown as "opportunities" via /config-audit feature-gap, grouped by impact (high/medium/explore), backed by Anthropic docs. No more "Feature Coverage: F" for correct minimal setups.
Posture output redesigned. Shows Health: {grade} ({score}/100) with 7 quality areas. Removed utilization %, maturity level, segment label.
Feature-gap is interactive. Users select recommendations to implement directly — no manual file editing required. Backup created automatically.
avgScore bug fixed. Grade letter and displayed score now computed from the same population (quality areas only).

Added

generateHealthScorecard() in scoring.mjs — quality-only scorecard
opportunitySummary() in feature-gap-scanner.mjs — groups findings by impact tier
opportunityCount field in posture JSON output
"Official Configuration Guidance" section in knowledge base (Anthropic docs, proven impacts)
21 new tests (total 482 across 27 test files)

Removed

S2-PROMPT.md and V2-ANNOUNCEMENT.md — v2 development artifacts
Utilization %, maturity level, segment label from posture terminal output and reports
Feature Coverage row from area breakdown tables
"Top Actions" sourced from GAP findings (replaced by opportunities pointer)

Backward Compatibility

JSON output preserves all legacy fields (utilization, maturity, segment) for programmatic consumers
Drift baselines unaffected — GAP findings still present in envelopes
All existing exports maintained (calculateUtilization, determineMaturityLevel, etc.)

[2.2.0] - 2026-04-04

Summary

UX quality fix — fixture filtering, session path migration, output polish.

Added

Automatic test-fixture filtering in scan-orchestrator: findings from tests/, examples/, __tests__/ excluded from grades, stored in env.fixture_findings
--include-fixtures CLI flag for scan-orchestrator and posture to override filtering
scan-orchestrator.test.mjs — 20 new tests for fixture filtering and isFixturePath
Legacy session path detection in cleanup command
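The fixture filter can be pictured as a path-segment test followed by a partition. This is a sketch only: `sketchIsFixturePath` and `partitionFindings` are hypothetical names, and each finding is assumed to carry a `file` property.

```javascript
const FIXTURE_DIRS = new Set(['tests', 'examples', '__tests__']);

// True when any whole path segment is one of the fixture directory names.
function sketchIsFixturePath(filePath) {
  return filePath.split(/[\\/]/).some((segment) => FIXTURE_DIRS.has(segment));
}

// Split findings into those that count toward grades and those set aside.
function partitionFindings(findings) {
  const graded = [];
  const fixture = [];
  for (const finding of findings) {
    (sketchIsFixturePath(finding.file) ? fixture : graded).push(finding);
  }
  return { graded, fixture };
}
```

Matching whole segments keeps a directory such as `contests/` from being mistaken for `tests/`.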

Changed

Session storage moved from ~/.config-audit/ to ~/.claude/config-audit/ (pathguard compatible)
Self-audit grade: F → A (98) after fixture filtering
Combined scanner + posture into single Bash call in default audit command
Removed "F grade is misleading" disclaimer — grades are now accurate
All CLI banners and envelope metadata updated to v2.2.0
461 tests (up from 441), 27 test files (up from 26)

Removed

Manual fixture counting instruction in config-audit.md (orchestrator handles it)
Redundant isFixtureOrExample filter in self-audit.mjs (promoted to orchestrator)

[2.1.0] - 2026-04-03

Summary

UX redesign — auto-scope detection, zero questions, simplified command surface.

Changed

/config-audit now runs full audit automatically (auto-detects scope from git context)
Removed mode selection prompts — scope override via /config-audit full|repo|home|current
Simplified from 17 to 15 commands (removed quick, report, watch; added help)
All CLI banners and envelope metadata updated to v2.1.0

Added

/config-audit help command with categorized command reference
Auto-scope detection from git context (repo vs home vs full-machine)

Removed

/config-audit:quick (merged into default /config-audit)
/config-audit:report (merged into analyze output)
/config-audit:watch (use /config-audit drift instead)

[2.0.0] - 2026-04-03 (v2.0 Complete)

Summary

Complete rewrite from LLM-only prototype to deterministic scanner-backed configuration intelligence. 7 development sessions (S1-S7), ~15,000 lines of code, 408+ tests.

Highlights

8 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF, GAP) + PLH standalone
Feature gap analysis with 25 dimensions across 4 tiers
Auto-fix engine with 9 fix types + backup/rollback
Drift detection with baseline comparison
Suppression engine (.config-audit-ignore)
Self-audit CLI
17 commands, 6 agents, 4 hooks
408+ tests (zero external dependencies)

Added (S7)

Example projects: examples/minimal-setup/ and examples/optimal-setup/
Demo script: examples/run-demo.sh
.config-audit-ignore for self-audit suppressions
V2-ANNOUNCEMENT.md
DEPRECATED.md for capability-auditor skill

Fixed (S7)

hooks.json: SessionStart and Stop timeout 5ms → 5000ms
self-audit.mjs: Suppression now enabled (was hardcoded to suppress: false)

Changed (S7)

README.md: Complete rewrite for public release
CLAUDE.md: Added Suppressions section
.gitignore: Added node_modules/ and S*-PROMPT.md

[1.6.0] - 2026-04-03 (v2.0 S6: Unified Reports + Self-Audit + Suppressions)

Added

Report generator scanners/lib/report-generator.mjs — unified markdown reports: generatePostureReport(), generateDriftReport(), generatePluginHealthReport(), generateFullReport()
Suppression engine scanners/lib/suppression.mjs — .config-audit-ignore file support with exact IDs and glob patterns (CA-SET-*), audit trail via suppressed_findings in envelope
Self-audit CLI scanners/self-audit.mjs — runs all scanners + plugin health on this plugin: node self-audit.mjs [--json] [--fix], exit codes 0/1/2
PostToolUse hook post-edit-verify.mjs — verifies config files after Edit/Write, blocks if new critical/high findings introduced
New command: /config-audit:report — generate unified report (posture + optional drift/plugin-health)
Test fixture .config-audit-ignore in fixable-project
54 new tests (total 408 across 25 test files)
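
As an illustration of how exact IDs and glob patterns such as CA-SET-* could be matched, here is a small sketch. It assumes that `*` is the only wildcard and that each finding carries an `id` field; the function names and the return shape are hypothetical and not taken from suppression.mjs.

```javascript
// Converts a suppression pattern into an anchored regular expression.
// Assumption: "*" is the only wildcard; every other character is literal.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Splits findings into active and suppressed, keeping the suppressed ones
// so that they can be reported as an audit trail.
function applySuppressions(findings, patterns) {
  const matchers = patterns.map(patternToRegExp);
  const active = [];
  const suppressed = [];
  for (const finding of findings) {
    const hit = matchers.some((re) => re.test(finding.id));
    (hit ? suppressed : active).push(finding);
  }
  return { active, suppressed };
}
```

For example, the patterns `CA-SET-*` and `CA-CML-001` would suppress every SET finding plus that single CML finding, and leave other CML findings active.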

Changed

scan-orchestrator.mjs: suppression integration — applies .config-audit-ignore after all scanners run, --no-suppress flag to disable
hooks.json: added PostToolUse event with post-edit-verify

[1.5.0] - 2026-04-03 (v2.0 S5: Drift + Watch + Plugin Health)

Added

Diff engine scanners/lib/diff-engine.mjs — diffEnvelopes() comparing baseline vs current, formatDiffReport() for terminal output
Baseline manager scanners/lib/baseline.mjs — save/load/list/delete named baselines in ~/.claude/config-audit/baselines/
Drift CLI scanners/drift-cli.mjs — standalone: node drift-cli.mjs <path> [--save] [--baseline name] [--json] [--list]
Plugin health scanner scanners/plugin-health-scanner.mjs (PLH) — validates plugin structure, frontmatter, cross-plugin conflicts (runs independently, not in scan-orchestrator)
3 new commands:
- /config-audit:drift — compare current config against saved baseline
- /config-audit:watch — on-demand drift check with baseline monitoring
- /config-audit:plugin-health — audit plugin structure and cross-plugin coherence
Test fixtures test-plugin/ (valid) and broken-plugin/ (invalid) for plugin health tests
48 new tests (total 354 across 21 test files)
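
A baseline comparison of this kind can be pictured with the sketch below. The envelope shape (`findings` array with `id` and `file` fields) and the result keys are assumptions for the example; the real diffEnvelopes() may compare more than this.

```javascript
// Hypothetical identity for a finding: its ID plus the file it was reported in.
const keyOf = (finding) => `${finding.id}::${finding.file}`;

// Compares two envelopes and classifies findings as added, resolved, or unchanged.
function sketchDiffEnvelopes(baseline, current) {
  const before = new Map(baseline.findings.map((f) => [keyOf(f), f]));
  const after = new Map(current.findings.map((f) => [keyOf(f), f]));
  return {
    // Present now, absent in the baseline.
    added: [...after].filter(([k]) => !before.has(k)).map(([, f]) => f),
    // Present in the baseline, gone now.
    resolved: [...before].filter(([k]) => !after.has(k)).map(([, f]) => f),
    // Present in both.
    unchanged: [...after].filter(([k]) => before.has(k)).map(([, f]) => f),
  };
}
```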

[1.4.0] - 2026-04-03 (v2.0 S4: Fix + Rollback Action Pillar)

Added

Fix engine scanners/fix-engine.mjs — deterministic auto-fix for 9 fix types:
- json-key-add (missing $schema), json-key-remove (deprecated keys), json-key-type-fix (type mismatches, invalid effortLevel), json-restructure (hooks array→object, matcher object→string), frontmatter-rename (globs→paths), file-rename (non-.md→.md)
Rollback engine scanners/rollback-engine.mjs — listBackups(), restoreBackup(), deleteBackup() with checksum verification
Fix CLI scanners/fix-cli.mjs — standalone: node fix-cli.mjs <path> [--apply] [--json] [--global], dry-run by default
Backup lib scanners/lib/backup.mjs — shared backup module with checksums and manifests
2 new commands:
- /config-audit:fix — scan, plan, backup, apply, verify in one flow
- /config-audit:rollback — list or restore from backups
PreToolUse hook auto-backup-config.mjs — auto-backup config files before Edit/Write
Test fixture fixable-project/ — fixture with all 9 fixable issue types
38 new tests (total 306 across 17 test files)

Changed

file-discovery.mjs: walkRulesDir now discovers all files (not just .md) for non-.md validation
backup-before-change.mjs: refactored to use shared lib/backup.mjs (no logic duplication)
hooks.json: added PreToolUse event with auto-backup

[1.3.0] - 2026-04-03 (v2.0 S3: Posture + Feature Gap Commands)

Added

Scoring module scanners/lib/scoring.mjs — utilization, maturity (5 levels), segments, area scoring, scorecard generation
Posture CLI scanners/posture.mjs — standalone Node.js tool: node posture.mjs <path> [--json] [--global]
2 new commands:
- /config-audit:posture — quick scorecard with A-F grades, utilization%, maturity level
- /config-audit:feature-gap — deep gap analysis with prioritized next-best-actions
feature-gap-agent — Opus agent for deep analysis, report generation (max 200 lines)
Knowledge file gap-closure-templates.md — 11 templates with effort/gain estimates
HTML report template templates/feature-gap-report.html — visual report with progress bars, grade badges
64 new tests (total 268 across 14 test files)

Changed

Tier weighting: T1 gaps count 3x, T2 count 2x, T3/T4 count 1x in utilization score
Maturity is threshold-based: highest level where ALL requirements are met
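
One plausible reading of these two rules, shown as a sketch: the weights 3/2/1/1 come from the entry above, while the utilization formula (weighted share of features present), the input shapes, and the function names are assumptions and may differ from scoring.mjs.

```javascript
// Tier weights from the changelog entry: T1 counts 3x, T2 2x, T3 and T4 1x.
const TIER_WEIGHTS = { T1: 3, T2: 2, T3: 1, T4: 1 };

// features: [{ tier: "T1", present: true }, ...] (hypothetical shape).
// Returns the weighted fraction of features present, between 0 and 1.
function utilization(features) {
  let total = 0;
  let met = 0;
  for (const feature of features) {
    const weight = TIER_WEIGHTS[feature.tier];
    total += weight;
    if (feature.present) met += weight;
  }
  return total === 0 ? 0 : met / total;
}

// levels: [{ level: 1, requires: ["a"] }, ...]; satisfied: Set of requirement names.
// Returns the highest level whose requirements are all met, or 0 if none is.
function maturityLevel(levels, satisfied) {
  let best = 0;
  for (const { level, requires } of levels) {
    const allMet = requires.every((name) => satisfied.has(name));
    if (allMet && level > best) best = level;
  }
  return best;
}
```

Under this reading a single missing T1 feature costs as much as three missing T3 features, which is the point of the weighting.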

[1.2.0] - 2026-04-03 (v2.0 S2: Advanced Scanners + Knowledge Base)

Added

4 advanced scanners (zero external deps):
- mcp-config-validator.mjs (MCP) — server types, trust levels, env vars, unknown fields
- import-resolver.mjs (IMP) — broken @imports, circular refs, deep chains, tilde paths
- conflict-detector.mjs (CNF) — settings conflicts, permission contradictions, hook duplicates
- feature-gap-scanner.mjs (GAP) — 25 feature gaps across 4 tiers (Foundation/Depth/Advanced/Enterprise)
Knowledge base — 5 reference documents: capabilities, best practices, anti-patterns, hook events, feature evolution
New test fixtures — .mcp.json files, @import chains, conflict-project/ fixture
75 new tests (total 204 across 12 test files)

Changed

Scan orchestrator runs 8 scanners (was 4)
Analyzer agent cross-references scanner findings with knowledge base

[1.1.0] - 2026-04-03 (v2.0 S1: Scanner Foundation)

Added

Deterministic scanner infrastructure — 4 Node.js scanners (zero external deps):
- claude-md-linter.mjs (CML) — CLAUDE.md structure, length, sections, @imports, duplicates
- settings-validator.mjs (SET) — settings.json schema, unknown/deprecated keys, type checks
- hook-validator.mjs (HKV) — hooks.json format, script existence, event validity, timeouts
- rules-validator.mjs (RUL) — .claude/rules/ glob matching, orphan detection, deprecated fields
Scanner lib — 5 shared modules: severity, output, file-discovery, yaml-parser, string-utils
Scan orchestrator — scan-orchestrator.mjs runs all scanners, outputs JSON envelope
Test infrastructure — 129 tests across 8 test files using node:test (zero deps)
Test fixtures — 4 fixture projects (healthy, broken, empty, minimal)
Finding ID format: CA-{SCANNER}-{NNN} (e.g. CA-CML-001)
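
The ID format can be illustrated with a small formatter and parser. The helper names are hypothetical, and the sketch assumes a three-letter uppercase scanner prefix and a three-digit, zero-padded sequence number, as in the CA-CML-001 example.

```javascript
// Builds an ID such as "CA-CML-001" from a scanner prefix and a sequence number.
function formatFindingId(scanner, sequence) {
  return `CA-${scanner}-${String(sequence).padStart(3, "0")}`;
}

// Parses an ID back into its parts; returns null when the shape does not match.
function parseFindingId(id) {
  const match = /^CA-([A-Z]{3})-(\d{3})$/.exec(id);
  if (!match) return null;
  return { scanner: match[1], sequence: Number(match[2]) };
}
```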

Fixed

Agent model mismatches: scanner→haiku, analyzer→sonnet, planner→opus, implementer→sonnet, verifier→haiku

Changed

CLAUDE.md rewritten in English for public release readiness

[1.0.0] - 2026-02-11

Added

Cross-platform support (macOS, Linux, Windows)

Fixed

stop-session-reminder.mjs: Use path.basename/path.dirname instead of hardcoded / split
backup-before-change.mjs: Handle both / and \ path separators in safe filename generation
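
A separator-agnostic safe filename can be produced along these lines. This is a sketch under assumptions: the function name, the `_` replacement character, and the handling of the drive-letter colon are illustrative choices, not the behavior of backup-before-change.mjs.

```javascript
// Flattens a path into a single filename component.
// Runs of "/", "\" and ":" become one underscore; leading underscores are dropped.
function toSafeFilename(filePath) {
  return filePath.replace(/[\\/:]+/g, "_").replace(/^_+/, "");
}
```

Treating both separators in one character class means the same code path serves POSIX paths and Windows paths, which is the cross-platform concern the fix addresses.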

Removed

"Windows: hooks are 100% bash" from known gaps (was incorrect — all hooks are Node.js)

[0.7.0] - 2026-02-07

Note

Version reset from 1.2.0 to reflect actual maturity. Previous version was inflated — this plugin has never been externally tested.

What exists today

6 specialized agents (scanner, analyzer, interviewer, planner, implementer, verifier)
Full machine-wide Claude Code configuration discovery
Scope selection (current project, repo, home, full machine)
Inheritance hierarchy mapping and conflict detection
Mandatory backups before any changes
Rollback support
Syntax validation for all configuration files
Quick audit-only mode
Full optimization workflow with HITL checkpoints

Known gaps

Testing: no automated tests
Onboarding: never verified that a new user can install and use from scratch
External verification: nobody else has ever used this
