ktg-plugin-marketplace/plugins/config-audit/CHANGELOG.md
Kjell Tore Guttormsen 6cfca82885 fix(config-audit): expose hotspot.path for --accurate-tokens calibration + SC-6b PASS
The v5.0.0-rc.1 N5 implementation looked up hotspot.path in
calibrateAgainstApi() but token-hotspots.mjs only emitted hotspot.source —
calibration silently produced 0 actual_tokens because every iteration hit
the `if (!hotspot?.path) continue` guard.

Fix: file-backed hotspots now expose `path: h.absPath` in the JSON output.
MCP-server hotspots intentionally leave path unset — their tokens are
runtime tool-schema (formula-based: 500 + toolCount × 200), not file
content readable by count_tokens.

SC-6b release-gate verified against tests/fixtures/marketplace-large:
- Actual (count_tokens, claude-opus-4-7): 589 tokens for CLAUDE.md
- Estimated (4-bytes/token byte heuristic): 594 tokens
- Delta: -5 tokens / -0.85% — well within ±5% gate. PASS.

CHANGELOG: documented the fix + SC-6b result inline under [5.0.0].

All 635 tests still green. No estimateTokens tuning required for v5.0.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:45:56 +02:00


Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[5.0.0] - 2026-05-01

Summary

Reality-based token-optimization release. v4.0.0 shipped Opus-4.7 token surfaces aligned to a Sonnet-era cost model; v5.0.0 rebuilds the foundations against verified Opus-4.7 cost dynamics. Three pillars: honest token estimation (severity-weighted scoring, MCP estimates 15 → 500+, optional --accurate-tokens API calibration), new structural scanners (cache-prefix stability, dead tool grants, plugin collisions), and new diagnostic surfaces (/config-audit manifest, /config-audit tokens extended, knowledge-base cleanup aligned to Opus 4.7 cache dynamics).

Consolidated from 5.0.0-alpha.1 (F1-F5 token-economy round), 5.0.0-alpha.2 (M1, M2, M4-M6, F6, F7 structural gaps + README self-audit), 5.0.0-beta.1 (N1-N4, N6 new scanners + manifest CLI), and 5.0.0-rc.1 (M7, M8 knowledge cleanup + N5 tokenizer calibration).

Added

  • 3 new scanners (9 → 12 deterministic):
    • CPS — Cache-Prefix Stability (CA-CPS-NNN): volatile content in lines 31–150 of the CLAUDE.md cascade, beyond TOK Pattern A's top-30 window. Volatile-pattern set extends Pattern A with shell-exec lines (! prefix) and ${VAR} substitutions.
    • DIS — Disabled-In-Schema (CA-DIS-NNN): tools listed in BOTH permissions.deny AND permissions.allow. Tool identity uses bare name (Bash(npm:*) and Bash are the same tool). Severity low.
    • COL — Cross-Plugin Skill Collision (CA-COL-001): plugin-vs-plugin same skill name → low; user-vs-plugin → medium. details.namespaces payload identifies conflicting sources.
  • TOK extensions:
    • CA-TOK-005 MCP tool-schema budget: per-server tiered finding (< 20 none, 20–49 low, 50–99 medium, 100+ high; null low + "tool count unknown"). Scoped to project-local .mcp.json.
    • Pattern E — Oversized cascade: medium when activeConfig.claudeMd.estimatedTokens > 10_000.
    • Pattern F — Bloated SKILL.md description: low when frontmatter description > 500 chars (loads every turn). Scoped to discovery.files.
  • /config-audit manifest + scanners/manifest.mjs CLI — single ranked table of every system-prompt token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by estimated_tokens. CLAUDE.md per-file tokens distributed proportional to bytes.
  • --accurate-tokens flag on token-hotspots-cli.mjs (N5): when ANTHROPIC_API_KEY is set, calls Anthropic's count_tokens for the top 3 hotspots and populates output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }. When absent: calibration = { skipped: 'no-api-key' } plus stderr warning.
  • scanners/lib/tokenizer-api.mjs: count_tokens wrapper. 5s AbortController timeout. Exponential backoff on 429 (3 retries: 1s/2s/4s). API key masked to ${key.slice(0,8)}... in every error; HTTP body never included in errors (it may echo the key on auth failures). maskKey() exported.
  • --with-telemetry-recipe flag on the same CLI (M7): emits telemetry_recipe_path field pointing to knowledge/cache-telemetry-recipe.md.
  • knowledge/cache-telemetry-recipe.md (M7): manual jq recipe summing cache_read_input_tokens + cache_creation_input_tokens per turn from session transcripts. Hit-rate interpretation table.
  • 'mcp' kind on estimateTokens (F2): active MCP servers estimate ≥ 500 tokens (base + schema overhead) instead of v4's flat 15. Optional {toolCount} raises to 500 + toolCount × 200.
  • MCP tool-count detection (M1): readActiveMcpServers resolves count via cache → node_modules/<pkg>/package.json → {toolCount: null, toolCountUnknown: true} fallback.
  • additionalDirectories settings key (M6): added to KNOWN_KEYS; new low-severity finding when length > 2.
  • HKV verbose hook output (M5): low-severity finding when referenced hook script contains > 50 console.log/process.stdout.write lines (static, no execution).
  • self-audit --check-readme flag (F6): filesystem counts compared against README badges. Helper checkReadmeBadges(pluginDir). Step 28 of v5 plan reconciled all badges.
  • scoringVersion: 'v5' field on scoreByArea output for cross-version drift detection.
  • WEIGHTS named export from scanners/lib/severity.mjs (frozen).
  • details field on findings (output.mjs:finding()): optional structured payload for scanner-specific data (used by COL).
  • Plugin Hygiene as 10th quality area (from COL). Posture JSON now reports 10 areas.
  • TOK-readActiveConfig integration (F1): one hotspot per active MCP server; result.activeConfig summary (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount); try/catch fallback when scope-limited.
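
To illustrate the DIS rule listed above, here is a minimal sketch of a deny/allow overlap check. It assumes a settings object shaped like settings.json with permissions.allow and permissions.deny arrays; bareToolName and findDeniedButAllowed are hypothetical names, not the scanner's actual exports.

```javascript
// Hypothetical helper: drop an optional "(...)" specifier to get the bare tool name.
function bareToolName(entry) {
  const open = entry.indexOf('(');
  return (open === -1 ? entry : entry.slice(0, open)).trim();
}

// Return the sorted bare tool names present in BOTH permissions.deny and permissions.allow.
function findDeniedButAllowed(settings) {
  const permissions = settings.permissions ?? {};
  const denied = new Set((permissions.deny ?? []).map(bareToolName));
  const overlap = new Set(
    (permissions.allow ?? []).map(bareToolName).filter((name) => denied.has(name))
  );
  return [...overlap].sort();
}

// Example settings (invented for illustration).
const exampleSettings = {
  permissions: {
    allow: ['Bash(npm:*)', 'Read', 'WebFetch'],
    deny: ['Bash', 'WebFetch(domain:example.com)'],
  },
};

console.log(findDeniedButAllowed(exampleSettings)); // [ 'Bash', 'WebFetch' ]
```

Comparing bare names means a scoped allow entry such as Bash(npm:*) still counts as overlapping with a blanket Bash deny, which matches the identity rule stated in the DIS entry.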

Changed

  • F3 — scoreByArea is severity-weighted. Penalty = Σ count[s] × WEIGHTS[s]; passRate = max(0, 100 - penalty / max(10, findingCount × 4) × 100). Lows no longer crater an area's grade; criticals/highs do. baseline-all-a fixture remains all-A (no critical/high present).
  • F7 — TOK pattern severities recalibrated for tokens-per-turn impact: Pattern A medium → high, Pattern B low → medium, Pattern C medium → low. Each finding carries a calibration_note evidence field documenting the heuristic basis.
  • scoreByArea deduplicates by area name (N3 prep): TOK + CPS share "Token Efficiency"; SET + DIS share "Settings". Combined row with merged finding counts.
  • M8 — knowledge cleanup: replaced "Keep CLAUDE.md under 200 lines" in knowledge/configuration-best-practices.md with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Footnote explains the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references knowledge/opus-4.7-patterns.md.
  • commands/tokens.md next-steps: documents --with-telemetry-recipe as the cache-verification path.
  • Scanner count: 9 → 12. Command count: 17 → 18. Knowledge: 7 → 8. Quality areas: 8 → 10.
  • .gitignore — unignore rules for tests/fixtures/**/node_modules/ so the mcp-tool-heavy fixture stays under version control.

Removed

  • F4 — TOK hotspot padding loop and take dead-code. Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
  • F5 — Pattern D / CA-TOK-004 (sonnet-era signature). Catalogue entry removed from knowledge/opus-4.7-patterns.md and commands/tokens.md. Suppression entries for CA-TOK-004 are now no-ops.

Breaking changes

  • F2 — MCP token estimates jump from flat 15 to ≥ 500. Token Efficiency grades for projects with MCP servers may shift. whats-active totals report higher numbers. Documented in commands/posture.md next-steps.
  • F3 — scoreByArea is severity-weighted. Posture JSON consumers reading areas[*].score will see different values for non-clean configs. Use result.scoringVersion === 'v5' to detect the change. Drift comparisons across v4↔v5 baselines may show artificial deltas — re-baseline after upgrade.
  • F5 — Pattern D / CA-TOK-004 no longer emitted. Existing exact CA-TOK-004 suppression entries are harmless but obsolete.
  • N1 suppression backward-compat — CA-TOK-* glob now also matches CA-TOK-005. To preserve prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
    CA-TOK-001
    CA-TOK-002
    CA-TOK-003
    
    A one-time runtime warning for this case is a v5.0.1 candidate.
  • Posture areas count: 9 → 10 (Plugin Hygiene from COL). Consumers hard-coding 9 must update.

Migration notes

  • CA-TOK-* glob suppressions: explicit-ID list recommended if CA-TOK-005 should not be suppressed.
  • CA-TOK-004 exact-ID suppression entries: safe to remove.
  • Drift baselines created against v4 should be re-saved post-upgrade to avoid artificial F3 weighting deltas.
  • Posture JSON consumers must update any hardcoded areas.length === 8 or === 9 assertions to >= 10.

Tests

  • 543 → 635 (+92): F1-F7 (alpha rounds = +43), N1-N4 + N6 (beta = +39), M7 + M8 + N5 (rc = +10). 36 test files (12 lib + 23 scanner + 1 hook).
  • New fixtures: tok-active-config/, additional-dirs-many/, additional-dirs-ok/, large-cascade/, small-cascade/, skill-bloated/, skill-tight/, mcp-tool-heavy/ (with mocked node_modules/), hooks-verbose/, hooks-quiet/, readme-desynced/, mcp-budget/{14,25,60,120,unknown}-tools/, volatile-mid-section/{volatile-line-60,volatile-line-200}/, denied-tools-in-schema/, collision-plugins/fake-home/ (plugin-a + plugin-b + plugin-c + user-level review skill).
  • New test files: tests/scanners/manifest.test.mjs, tests/scanners/cache-prefix.test.mjs, tests/scanners/disabled-in-schema.test.mjs, tests/scanners/collision.test.mjs, tests/scanners/accurate-tokens.test.mjs.

Notes

  • mock.method against ESM module exports does not work (Node 18+ ESM read-only export bindings). v5 tests use globalThis.fetch mocking for --accurate-tokens instead — equivalent coverage at the actual external-dependency boundary.
  • Plugin-vs-built-in collision detection is intentionally not implemented. Step 22a research spike (docs/v5-namespace-research.md, gitignored) could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in. Treated as info-only; v5.0.1 candidate.
  • README/CLAUDE.md badge reconciliation done in Step 28 (this release). self-audit --check-readme PASSES against the filesystem. Test count counter switched from file-count to test-case count via subprocess node --test parse.
  • hotspot.path exposed on file-backed hotspots (Step 30 fix). The rc.1 --accurate-tokens implementation looked up hotspot.path but the scanner only emitted source. File-backed hotspots now carry path (absolute path); MCP-server hotspots leave it unset (they are virtual entries representing runtime tool-schema cost, not file content).
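
The hotspot.path note above can be reduced to a small sketch of why calibration found nothing to measure. The function name, hotspot sources, paths, and the two MCP token figures below are invented for illustration; only the guard itself and the 594-token CLAUDE.md estimate come from this changelog.

```javascript
// Hypothetical reduction of the calibration loop: only hotspots that carry a
// readable file path qualify; virtual MCP entries are skipped by the guard.
function selectCalibratable(hotspots, sampleSize = 3) {
  const selected = [];
  for (const hotspot of hotspots.slice(0, sampleSize)) {
    if (!hotspot?.path) continue; // the guard that every iteration used to hit
    selected.push(hotspot.path);
  }
  return selected;
}

// Before the fix: file-backed hotspots exposed only `source`.
const hotspotsBefore = [
  { source: 'mcp:server-a', estimated_tokens: 900 },
  { source: 'mcp:server-b', estimated_tokens: 700 },
  { source: 'CLAUDE.md', estimated_tokens: 594 },
];

// After the fix: file-backed hotspots also expose an absolute `path`.
const hotspotsAfter = [
  { source: 'mcp:server-a', estimated_tokens: 900 },
  { source: 'mcp:server-b', estimated_tokens: 700 },
  { source: 'CLAUDE.md', path: '/repo/CLAUDE.md', estimated_tokens: 594 },
];

console.log(selectCalibratable(hotspotsBefore)); // []
console.log(selectCalibratable(hotspotsAfter));  // [ '/repo/CLAUDE.md' ]
```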

SC-6b release-gate result (verified 2026-05-01)

  • PASS — the estimate lands within 0.85% of the real count_tokens API result (a 5-token over-estimate: 594 estimated vs 589 actual).
  • Fixture: tests/fixtures/marketplace-large/. Top-3 hotspots = 1 file-backed (CLAUDE.md) + 2 MCP virtuals. MCP entries skipped per design (no readable content; their tokens are formula-based at 500 + toolCount × 200).
  • CLAUDE.md actual: 589 tokens (Anthropic count_tokens, claude-opus-4-7). Estimated: 594 tokens (byte heuristic at 4 bytes/token via estimateTokens). Delta: -5 tokens, -0.85% — well within the ±5% gate.
  • No tuning of estimateTokens heuristic required for v5.0.0.
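
The gate arithmetic can be checked with a short sketch. It assumes the delta is measured as actual minus estimated, relative to the actual count, which is the reading that reproduces the reported figures; calibrationDelta and passesGate are hypothetical names.

```javascript
// Signed error of the estimate, relative to the real token count.
function calibrationDelta(actualTokens, estimatedTokens) {
  const deltaTokens = actualTokens - estimatedTokens;
  const deltaPercent = (deltaTokens / actualTokens) * 100;
  return { deltaTokens, deltaPercent };
}

// The release gate passes when the absolute error stays within the tolerance.
function passesGate(deltaPercent, gatePercent = 5) {
  return Math.abs(deltaPercent) <= gatePercent;
}

const sc6b = calibrationDelta(589, 594);
console.log(sc6b.deltaTokens);              // -5
console.log(sc6b.deltaPercent.toFixed(2));  // -0.85
console.log(passesGate(sc6b.deltaPercent)); // true
```

A negative delta here means the byte heuristic counted slightly more tokens than the API did.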

[5.0.0-rc.1] - 2026-05-01

Summary

Release candidate for v5.0.0 — knowledge cleanup and tokenizer calibration. Three deliverables: M8 (Sonnet-era → Opus 4.7 best-practices rewrite), M7 (cache-telemetry recipe in knowledge/ plus an opt-in CLI flag), and N5 (--accurate-tokens API calibration via Anthropic's count_tokens endpoint).

Added

  • N5 — --accurate-tokens flag on scanners/token-hotspots-cli.mjs. When ANTHROPIC_API_KEY is set, the CLI calls Anthropic's count_tokens endpoint for the top 3 hotspots and populates output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }. When the key is absent, calibration = { skipped: 'no-api-key' } and a stderr warning is emitted. Designed for the manual SC-6b release-gate verification, not routine use.
  • scanners/lib/tokenizer-api.mjs — wrapper around count_tokens with a 5-second AbortController timeout, exponential-backoff retry on HTTP 429 (max 3 retries: 1s, 2s, 4s), and required headers (x-api-key, anthropic-version: 2023-06-01, content-type). API key is masked to ${key.slice(0,8)}... in every error message and every thrown error; non-429 HTTP errors throw status code only — response body is never included (it may echo the key on auth failures). maskKey() is exported for callers that need safe logging.
  • M7 — knowledge/cache-telemetry-recipe.md (new). Manual jq recipe for verifying prompt-cache hit rate from Claude Code session transcripts (~/.claude/projects/<slug>/*.jsonl). Sums cache_read_input_tokens and cache_creation_input_tokens per turn and reports a hit-rate ratio. Recipe-form (not bundled scanner) keeps the project's "no transcript-parsing as core feature" non-goal intact while giving users a runtime escape hatch.
  • M7 — --with-telemetry-recipe flag on the same CLI. When passed, emits telemetry_recipe_path in the JSON output pointing to the recipe file. Without the flag, output is unchanged. Committed as a default deliverable, opt-in at invocation time.
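
The key-masking and retry behaviour described for scanners/lib/tokenizer-api.mjs can be sketched as follows. This is an illustration under assumptions, not the module's code: maskKey follows the ${key.slice(0,8)}... form quoted above, while backoffDelayMs, withRetryOn429, the send callback, and the injectable sleep are hypothetical.

```javascript
// Mask an API key down to its first 8 characters for safe logging.
function maskKey(key) {
  return `${String(key).slice(0, 8)}...`;
}

// Delay before retry number `attempt` (0-based): 1s, 2s, 4s.
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Hypothetical retry driver. `send` stands in for the real HTTP call and must
// resolve to an object with a numeric `status`. `sleep` is injectable so the
// sketch can run without actually waiting.
async function withRetryOn429(send, options = {}) {
  const maxRetries = options.maxRetries ?? 3;
  const sleep = options.sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt += 1) {
    const response = await send();
    if (response.status !== 429) return response;
    if (attempt >= maxRetries) {
      // Status code only: no response body in the error message.
      throw new Error(`count_tokens failed: HTTP 429 after ${maxRetries} retries`);
    }
    await sleep(backoffDelayMs(attempt));
  }
}

console.log(maskKey('sk-ant-api03-EXAMPLE'));    // sk-ant-a...
console.log([0, 1, 2].map((n) => backoffDelayMs(n))); // [ 1000, 2000, 4000 ]
```

Keeping the error message to the status code mirrors the stated rule that response bodies never reach error output.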

Changed

  • M8 — knowledge-base cleanup: replaced the "Keep CLAUDE.md under 200 lines" rule in knowledge/configuration-best-practices.md with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Added a footnote that the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references knowledge/opus-4.7-patterns.md.
  • commands/tokens.md next-steps: documents --with-telemetry-recipe as the cache-verification path after a structural fix.

Tests

  • 625 → 635 (+10): --with-telemetry-recipe (×2), tokenizer-api unit tests (×6 — masking, body-leak protection, AbortController signal, 429 retry, header set, fetch mock happy path), --accurate-tokens no-key subprocess test (×1), absent-flag negative test (×1).
  • New file: tests/scanners/accurate-tokens.test.mjs. No new fixtures (re-uses marketplace-large).

Notes

  • SC-6b release gate is NOT closed by these commits. Step 26's tests use mocked globalThis.fetch to verify the integration contract; ±5% accuracy against real count_tokens requires a live API key and must be verified manually before tagging v5.0.0 in Session 5.
  • The plan's specified mock.method(tokenizerApi, 'callCountTokensApi', ...) pattern collides with ESM read-only export bindings in Node 18+. Tests mock at the globalThis.fetch boundary instead — equivalent coverage, no module-export rebinding required.
  • README/CLAUDE.md badge counts and plugin.json version still target v4.0.0; Step 28+29 will sync those during the release wrap.
  • [skip-docs] tag on the N5 feat commit; M7 and M8 are docs(...) commits and don't need it.

[5.0.0-beta.1] - 2026-05-01

Summary

First v5.0.0 beta — new scanners. Five new finding sources land: MCP tool-schema budget (CA-TOK-005), system-prompt manifest CLI/command (/config-audit manifest), cache-prefix stability (CPS), disabled-tools-still-in-schema (DIS), and cross-plugin/user-vs-plugin skill collision (COL/CA-COL-001). Plugin Hygiene becomes a 10th area-scorecard column.

Added

  • N1 — CA-TOK-005 MCP tool-schema budget: per-server tiered finding inside the TOK scanner. Thresholds — < 20 no finding, 20–49 low, 50–99 medium, 100+ high; null (manifest unparseable) low + "tool count unknown" message. Scoped to project-local .mcp.json to keep /config-audit <path> actionable. Recommendation links to the Step 25 cache-telemetry recipe.
  • N2 — /config-audit manifest: new slash command + scanners/manifest.mjs CLI. Renders a single ranked table of every token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by estimated_tokens. Reuses readActiveConfig; CLAUDE.md per-file tokens are distributed proportional to bytes.
  • N3 — CPS scanner (CA-CPS-NNN): Cache-Prefix Stability Analyzer. Walks the CLAUDE.md cascade and flags volatile content between lines 31 and 150 — beyond TOK Pattern A's top-30 territory. Volatile-pattern set extends Pattern A with shell-exec lines (! prefix) and ${VAR} substitutions. Severity medium per finding. Skips lines 1–30 (Pattern A's range).
  • N4 — DIS scanner (CA-DIS-NNN): Disabled-In-Schema Detector. Detects tools that appear in BOTH permissions.deny and permissions.allow within the same settings.json. The deny list wins, so allow entries are dead config but still load every turn. Tool identity is the bare name (everything before the opening parenthesis); Bash(npm:*) and Bash are treated as the same tool. Severity low.
  • N6 — COL scanner (CA-COL-001): Cross-Plugin Skill Collision detector. Plugin-vs-plugin same skill name → low. User-vs-plugin same skill name → medium. Findings carry details.namespaces array with {source, name, path} for every conflicting source.
  • details field on findings: output.mjs:finding() helper now passes through optional details for scanner-specific structured payloads (used by COL).
  • "Plugin Hygiene" area (10th in scorecard): COL contributes here. Posture JSON now reports 10 areas instead of 9.
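
As a rough illustration of the CPS window described above, the sketch below scans only lines 31 to 150 of a document for volatile content. The three patterns are simplified stand-ins (the scanner's actual pattern set is not reproduced here), and findVolatileMidSection is a hypothetical name.

```javascript
// Simplified stand-in patterns. Note that a bare "!" prefix test would also
// match Markdown image lines; a real scanner would need a stricter rule.
const VOLATILE_PATTERNS = [
  { name: 'shell-exec', test: (line) => line.trimStart().startsWith('!') },
  { name: 'var-substitution', test: (line) => /\$\{[A-Za-z_][A-Za-z0-9_]*\}/.test(line) },
  { name: 'iso-timestamp', test: (line) => /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}/.test(line) },
];

// Report volatile lines inside the 1-based window [firstLine, lastLine].
function findVolatileMidSection(text, firstLine = 31, lastLine = 150) {
  const hits = [];
  text.split('\n').forEach((line, index) => {
    const lineNumber = index + 1;
    if (lineNumber < firstLine || lineNumber > lastLine) return;
    const match = VOLATILE_PATTERNS.find((pattern) => pattern.test(line));
    if (match) hits.push({ line: lineNumber, pattern: match.name });
  });
  return hits;
}

// A 200-line document with volatile content on lines 10, 60, and 200.
const cascadeLines = Array.from({ length: 200 }, (_, i) => `Stable line ${i + 1}`);
cascadeLines[9] = 'Home is ${HOME}';
cascadeLines[59] = '!git status';
cascadeLines[199] = 'Built for ${USER}';

console.log(findVolatileMidSection(cascadeLines.join('\n')));
// [ { line: 60, pattern: 'shell-exec' } ]
```

Line 10 is left to Pattern A's top-30 range and line 200 falls past the window, so only line 60 is reported.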

Changed

  • scoreByArea deduplicates by area name: when multiple scanners share an area (TOK + CPS → "Token Efficiency", SET + DIS → "Settings"), they produce one combined row with merged finding counts. Existing 9-area contract preserved for non-Plugin-Hygiene areas.
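
A minimal sketch of merging scanner results into one row per area, assuming each scanner reports an area name and per-severity counts. The input shape and the mergeByArea name are hypothetical.

```javascript
// Combine results from scanners that share an area into a single row,
// summing their per-severity finding counts.
function mergeByArea(scannerResults) {
  const rows = new Map();
  for (const { scanner, area, counts } of scannerResults) {
    const row = rows.get(area) ?? { area, scanners: [], counts: {} };
    row.scanners.push(scanner);
    for (const [severity, n] of Object.entries(counts)) {
      row.counts[severity] = (row.counts[severity] ?? 0) + n;
    }
    rows.set(area, row);
  }
  return [...rows.values()];
}

// Invented counts for illustration.
const mergedRows = mergeByArea([
  { scanner: 'TOK', area: 'Token Efficiency', counts: { high: 1, low: 2 } },
  { scanner: 'CPS', area: 'Token Efficiency', counts: { medium: 3 } },
  { scanner: 'SET', area: 'Settings', counts: { low: 1 } },
  { scanner: 'DIS', area: 'Settings', counts: { low: 2 } },
]);

for (const row of mergedRows) {
  console.log(row.area, row.scanners.join('+'), JSON.stringify(row.counts));
}
```

Using a Map keyed by area name keeps rows in first-seen order, so adding a scanner to an existing area does not reorder the scorecard.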

Known breaking changes

  • Suppression backward-compat — CA-TOK-* glob now also matches CA-TOK-005. Existing .config-audit-ignore entries that suppress TOK findings via the CA-TOK-* glob will silently include CA-TOK-005 (MCP budget). To preserve the prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
    CA-TOK-001
    CA-TOK-002
    CA-TOK-003
    
    A one-time runtime warning for this case is out of scope for v5.0.0 — it is a candidate for v5.0.1.
  • Plugin-vs-built-in collision is intentionally not implemented. The Step 22a research spike could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in (/help, /clear, /init, /review, /config, /cost, /security-review). Treated as info-only in this release; a follow-up v5.0.1 ticket may add an opt-in check.
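
The suppression caveat above comes down to how a wildcard entry matches finding IDs. The sketch below assumes simple "*" wildcard semantics with whole-ID matching; the actual matching rules of .config-audit-ignore may differ, and both function names are hypothetical.

```javascript
// Convert a suppression pattern with "*" wildcards into an anchored RegExp.
function suppressionToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// True when any suppression entry matches the finding ID.
function isSuppressed(findingId, patterns) {
  return patterns.some((pattern) => suppressionToRegExp(pattern).test(findingId));
}

const globEntries = ['CA-TOK-*'];
const explicitEntries = ['CA-TOK-001', 'CA-TOK-002', 'CA-TOK-003'];

console.log(isSuppressed('CA-TOK-005', globEntries));     // true
console.log(isSuppressed('CA-TOK-005', explicitEntries)); // false
console.log(isSuppressed('CA-TOK-002', explicitEntries)); // true
```

With the explicit list, the MCP budget finding stays visible while patterns A/B/C remain suppressed.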

Tests

  • 586 → 625 (+39): N1 (×7), N2 (×11), N3 (×7), N4 (×6), N6 (×8).
  • New fixtures: mcp-budget/{14,25,60,120,unknown}-tools/, volatile-mid-section/{volatile-line-60,volatile-line-200}/, denied-tools-in-schema/, collision-plugins/fake-home/ (plugin-a + plugin-b + plugin-c + user-level review skill).
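
The mcp-budget fixture names line up with the CA-TOK-005 tiers. A minimal sketch of that mapping follows, assuming each fixture's tool count equals the number in its directory name; mcpBudgetSeverity is a hypothetical name.

```javascript
// Map a tool count to a CA-TOK-005 severity; null return means "no finding".
function mcpBudgetSeverity(toolCount) {
  if (toolCount === null || toolCount === undefined) return 'low'; // tool count unknown
  if (toolCount < 20) return null;
  if (toolCount < 50) return 'low';
  if (toolCount < 100) return 'medium';
  return 'high';
}

// Assumed tool counts, taken from the fixture directory names.
const fixtureToolCounts = {
  '14-tools': 14,
  '25-tools': 25,
  '60-tools': 60,
  '120-tools': 120,
  'unknown-tools': null,
};

for (const [fixture, count] of Object.entries(fixtureToolCounts)) {
  console.log(fixture, '->', mcpBudgetSeverity(count) ?? 'no finding');
}
```

Each fixture sits inside one tier, with the unknown case exercising the null branch.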

Notes

  • [skip-docs] tag used on every feat commit — README/CLAUDE.md badge counts (scanner count, command count, test count) and the architecture sections are intentionally fenced off until Session 5 (Step 28). This keeps the v5 plan's session boundaries clean even when the Forgejo pre-commit-docs-gate hook would otherwise block these commits.

[5.0.0-alpha.2] - 2026-05-01

Summary

Second v5.0.0 alpha — structural gaps + README self-audit. TOK pattern severities recalibrated for tokens/turn impact (F7), three new findings cover settings/skills/cascade structure (M2, M4, M6), MCP tool-count detection wired (M1), HKV gains a verbose-output check (M5), and self-audit grows a --check-readme flag (F6).

Added

  • F7 — TOK severity recalibration: Pattern A (cache-breaking volatile top) medium → high, Pattern B (redundant permissions) low → medium, Pattern C (deep imports) medium → low. Each finding now carries a calibration_note evidence field documenting the heuristic basis.
  • M6 — additionalDirectories settings key: added to KNOWN_KEYS so it no longer trips "unknown settings key". New low-severity finding when additionalDirectories.length > 2.
  • M4 — TOK Pattern E: medium-severity finding when activeConfig.claudeMd.estimatedTokens > 10_000 — flags cascades that bleed budget every turn.
  • M2 — TOK Pattern F: low-severity finding for project-local SKILL.md whose frontmatter description exceeds 500 characters (description loads on every turn even when the body does not). Scoped to discovery.files; user/plugin skills out of project scope are not flagged.
  • M1 — MCP tool-count detection: readActiveMcpServers now resolves tool count via cache → node_modules/<pkg>/package.json → {toolCount: null, toolCountUnknown: true} fallback. Tool count drives estimateTokens per server.
  • M5 — HKV verbose hook output: new low-severity finding when a referenced hook script contains > 50 console.log / process.stdout.write lines (static heuristic, no execution).
  • F6 — self-audit --check-readme flag: filesystem counts (scanners, commands, agents, hooks, tests, knowledge) compared against README badge values. Helper export: checkReadmeBadges(pluginDir).

Changed

  • TOK severities (F7) — see Added. Posture aggregates that depended on Pattern A being medium will now reflect the higher-impact rating.
  • .gitignore — added unignore rules so tests/fixtures/**/node_modules/ are tracked. Required by the mcp-tool-heavy fixture.

Tests

  • 563 → 586 (+23): F7 table-driven (×6), M6 (×3), M4 (×2), M2 (×2), M1 (×4), M5 (×2), F6 (×4).
  • New fixtures: additional-dirs-many/, additional-dirs-ok/, large-cascade/, small-cascade/, skill-bloated/, skill-tight/, mcp-tool-heavy/ (with mocked node_modules/), hooks-verbose/, hooks-quiet/, readme-desynced/.

Notes

  • result.readmeCheck.passed === true is not required during alpha/beta phases. The real plugin's own check is currently red (scanners 10 vs README 9, tests 31 vs README 543) — reconciliation deferred to Session 5 Step 28 (README sync).
  • [skip-docs] tag used on every commit — README/CLAUDE.md badge counts and architecture text are intentionally fenced off until Session 5.

[5.0.0-alpha.1] - 2026-05-01

Summary

First v5.0.0 alpha — token-economy round, F1-F5. The TOK scanner now consumes readActiveConfig (per-MCP-server hotspots, claudeMd cascade tokens), severity weighting replaces flat finding counts in scoreByArea, and MCP servers no longer estimate at a flat 15 tokens. Pattern D (CA-TOK-004 sonnet-era signature) removed — too noisy, not actionable.

Added

  • 'mcp' kind for estimateTokens (F2): an active MCP server now estimates ≥ 500 tokens (base protocol + schema overhead) instead of the v4 flat 15. Optional {toolCount} raises the estimate to 500 + toolCount * 200 once Step 14 wires tool-count detection.
  • TOK ↔ readActiveConfig integration (F1): the TOK scanner emits one hotspot per active MCP server, sums their tokens into total_estimated_tokens, and exposes result.activeConfig (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount).
  • scoringVersion: 'v5' field on scoreByArea output for cross-version drift detection.
  • WEIGHTS named export from scanners/lib/severity.mjs (Object.freeze).
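
The 'mcp' estimate described in F2 is simple arithmetic. Here is a minimal sketch covering that kind only; estimateMcpTokens is a hypothetical stand-in and does not model the other kinds handled by estimateTokens.

```javascript
// Floor of 500 tokens per active MCP server, plus 200 per known tool.
function estimateMcpTokens({ toolCount } = {}) {
  const BASE_TOKENS = 500;
  const TOKENS_PER_TOOL = 200;
  // Unknown or invalid tool count: fall back to the base estimate.
  if (!Number.isInteger(toolCount) || toolCount < 0) return BASE_TOKENS;
  return BASE_TOKENS + toolCount * TOKENS_PER_TOOL;
}

console.log(estimateMcpTokens());                    // 500
console.log(estimateMcpTokens({ toolCount: 12 }));   // 2900
console.log(estimateMcpTokens({ toolCount: null })); // 500
```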

Changed

  • BREAKING (intentional, F3): scoreByArea is now severity-weighted. Penalty = Σ count[s] * WEIGHTS[s]; passRate = max(0, 100 - penalty / max(10, findingCount * 4) * 100). Lows no longer crater an area's grade; a single high or critical consumes a large fraction of budget. baseline-all-a fixture remains all-A (no critical/high on that fixture).
  • BREAKING (intentional, F2): MCP server token estimates jump from a flat 15 to ≥ 500. whats-active totals and TOK hotspots will report higher numbers for any project with active MCP servers.
  • BREAKING (intentional, F5): Pattern D / CA-TOK-004 (sonnet-era signature) is no longer emitted. Suppression entries for CA-TOK-004 are now no-ops; downstream tools that filter on the ID should drop it. The catalogue entry was removed from knowledge/opus-4.7-patterns.md and commands/tokens.md.
  • Hotspots contract (F4): the v4 padding loop and take dead-code are gone. Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
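
The F3 formula can be worked through with a short sketch. The weight values below are placeholders chosen for illustration, since this changelog does not list the real WEIGHTS; passRateFor is a hypothetical name.

```javascript
// Placeholder weights: NOT the values exported from scanners/lib/severity.mjs.
const EXAMPLE_WEIGHTS = Object.freeze({ critical: 10, high: 5, medium: 2, low: 1 });

// passRate = max(0, 100 - penalty / max(10, findingCount * 4) * 100)
function passRateFor(counts, weights = EXAMPLE_WEIGHTS) {
  let penalty = 0;
  let findingCount = 0;
  for (const [severity, count] of Object.entries(counts)) {
    penalty += count * (weights[severity] ?? 0);
    findingCount += count;
  }
  const budget = Math.max(10, findingCount * 4);
  return Math.max(0, 100 - (penalty / budget) * 100);
}

console.log(passRateFor({ low: 3 }));      // 75
console.log(passRateFor({ high: 1 }));     // 50
console.log(passRateFor({ critical: 2 })); // 0
```

Under these placeholder weights, three lows cost 25 points while a single high costs 50, which is the behaviour the entry describes: lows barely move the grade and highs or criticals dominate it.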

Migration notes

  • CA-TOK-* glob suppression entries continue to suppress 001-003. Existing exact CA-TOK-004 entries are harmless but obsolete — remove them at convenience.
  • Posture/JSON consumers reading areas[*].score will see different values for non-clean configs. Use result.scoringVersion === 'v5' to detect.

Tests

  • 543 → 563 across the alpha.1 commits (+9 severity-weighting/scoring, +4 estimateTokens 'mcp', +1 MCP caller migration, +3 readActiveConfig integration, +2 hotspots-uniqueness, +2 sonnet-era zero-finding).
  • New fixture tests/fixtures/tok-active-config/ — minimal repo with .mcp.json (2 servers), CLAUDE.md, plugin skeleton.

[4.0.0] - 2026-04-19

Summary

Opus 4.7 era upgrade. New TOK scanner detects token-efficiency anti-patterns (cache-breaking volatile content, redundant tool permissions, deep import chains, sonnet-era minimal setups). Token Efficiency joins the quality scorecard as the 8th area. Scanner-agent and verifier-agent migrate from haiku → sonnet per global no-haiku policy.

Added

  • token-hotspots.mjs scanner (CA-TOK-001..004) — 4 patterns aligned with Opus 4.7 token-cost dynamics:
    • CA-TOK-001 cache-breaking volatile content (timestamps/UUIDs in top 30 lines of CLAUDE.md)
    • CA-TOK-002 redundant tool permissions (duplicate or subset overlaps)
    • CA-TOK-003 deep @import chains (>2 hops on the load path)
    • CA-TOK-004 sonnet-era minimal setup (no skills/MCP/hooks/managed/plugins)
  • /config-audit tokens [path] [--global] — ranked hotspot table + per-pattern findings.
  • scanners/token-hotspots-cli.mjs — standalone CLI emitting total_estimated_tokens, hotspots, and per-finding output.
  • Token Efficiency as the 8th quality area in the posture scorecard (now 9 scanners total: CML/SET/HKV/RUL/MCP/IMP/CNF/GAP/TOK).
  • id field on every area in the scorecard payload (token_efficiency, instruction_clarity, etc.) for stable downstream lookup.
  • 13 new TOK scanner tests + 3 CLI tests + posture grade-stability test for token_efficiency.
  • Knowledge refresh: knowledge/opus-4.7-patterns.md, plus 2026-04 deltas (v2.1.83–v2.1.111) added to feature-evolution.md, claude-code-capabilities.md, and hook-events-reference.md from research/03-claude-code-changes-config-surfaces.md.

Changed

  • BREAKING (additive surface): Quality areas count 7 → 8. Posture JSON consumers that hard-coded 7 areas must update.
  • BREAKING (model migration): scanner-agent and verifier-agent migrated haiku → sonnet. Latency and cost trade-offs accepted; deterministic scanner CLIs preferred over agent invocations.
  • Scanner count: 8 → 9 (TOK added).
  • Command count: 16 → 17 (/config-audit tokens added).
  • Version bump: 3.1.0 → 4.0.0.

[3.1.0] - 2026-04-14

Summary

New read-only command /config-audit whats-active — shows exactly what Claude Code loads for a given repo, with token estimates.

Added

  • /config-audit whats-active [path] — inventory of active plugins, skills, MCP servers, hooks, and CLAUDE.md cascade for a repo, with source attribution (user/project/plugin) and rough token estimates. Read-only, <2s.
  • scanners/lib/active-config-reader.mjs — pure async helper: readActiveConfig(), detectGitRoot(), walkClaudeMdCascade(), readClaudeJsonProjectSlice() (longest-prefix matching), enumeratePlugins(), enumerateSkills(), readActiveHooks(), readActiveMcpServers(), estimateTokens().
  • scanners/whats-active.mjs — thin CLI shim supporting --json, --output-file, --verbose, --suggest-disables.
  • Optional --suggest-disables flag surfaces deterministic disable candidates (disabled MCP servers, zero-item plugins, unreferenced plugins, orphan skills) and invites an LLM judgment pass in the command.
  • 36 new tests in tests/lib/active-config-reader.test.mjs, plus a rich-repo tmpdir fixture helper.

Changed

  • Version bump: 3.0.1 → 3.1.0 (minor, additive feature, no breaking changes).
  • Command count: 15 → 16.

[3.0.1] - 2026-04-04

Summary

Cross-platform fix — scanners, hooks, and lib now work correctly on Windows.

Fixed

  • file-discovery.mjs: depth calculation, agent/command/plugin path matching now use path.sep
  • scan-orchestrator.mjs: fixture-path filtering now uses path.sep
  • post-edit-verify.mjs: rules-dir regex handles both / and \ separators
  • auto-backup-config.mjs: rules-dir detection now uses path.sep
  • import-resolver.mjs: circular import display uses basename(), /tmp fallback replaced with os.tmpdir()
  • string-utils.mjs: normalizePath trailing separator regex handles both / and \

Added

  • 4 cross-platform path tests (total 486 tests)

[3.0.0] - 2026-04-04

Summary

Health redesign — configuration health is now quality-only. Feature utilization removed from grades entirely.

Changed

  • Health = quality only. 7 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF) determine your grade. Feature Coverage is no longer a graded area.
  • Feature recommendations are opt-in. Unused features shown as "opportunities" via /config-audit feature-gap, grouped by impact (high/medium/explore), backed by Anthropic docs. No more "Feature Coverage: F" for correct minimal setups.
  • Posture output redesigned. Shows Health: {grade} ({score}/100) with 7 quality areas. Removed utilization %, maturity level, segment label.
  • Feature-gap is interactive. Users select recommendations to implement directly — no manual file editing required. Backup created automatically.
  • avgScore bug fixed. Grade letter and displayed score now computed from the same population (quality areas only).

Added

  • generateHealthScorecard() in scoring.mjs — quality-only scorecard
  • opportunitySummary() in feature-gap-scanner.mjs — groups findings by impact tier
  • opportunityCount field in posture JSON output
  • "Official Configuration Guidance" section in knowledge base (Anthropic docs, proven impacts)
  • 21 new tests (total 482 across 27 test files)

Removed

  • S2-PROMPT.md and V2-ANNOUNCEMENT.md — v2 development artifacts
  • Utilization %, maturity level, segment label from posture terminal output and reports
  • Feature Coverage row from area breakdown tables
  • "Top Actions" sourced from GAP findings (replaced by opportunities pointer)

Backward Compatibility

  • JSON output preserves all legacy fields (utilization, maturity, segment) for programmatic consumers
  • Drift baselines unaffected — GAP findings still present in envelopes
  • All existing exports maintained (calculateUtilization, determineMaturityLevel, etc.)

[2.2.0] - 2026-04-04

Summary

UX quality fix — fixture filtering, session path migration, output polish.

Added

  • Automatic test-fixture filtering in scan-orchestrator: findings from tests/, examples/, __tests__/ excluded from grades, stored in env.fixture_findings
  • --include-fixtures CLI flag for scan-orchestrator and posture to override filtering
  • scan-orchestrator.test.mjs — 20 new tests for fixture filtering and isFixturePath
  • Legacy session path detection in cleanup command
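
The fixture filtering described above can be sketched as a path check plus a partition step. This is an assumption-laden illustration: looksLikeFixturePath is a hypothetical stand-in for the project's isFixturePath, and the finding shape ({ id, file }) is invented.

```javascript
// Directory names treated as fixture or example territory.
const FIXTURE_SEGMENTS = new Set(['tests', 'examples', '__tests__']);

// True when any path segment is one of the fixture directory names.
function looksLikeFixturePath(filePath) {
  return filePath.split(/[\\/]/).some((segment) => FIXTURE_SEGMENTS.has(segment));
}

// Split findings into graded ones and fixture findings kept for reference.
function partitionFindings(findings, { includeFixtures = false } = {}) {
  const graded = [];
  const fixtureFindings = [];
  for (const finding of findings) {
    if (!includeFixtures && looksLikeFixturePath(finding.file)) {
      fixtureFindings.push(finding);
    } else {
      graded.push(finding);
    }
  }
  return { graded, fixtureFindings };
}

const sampleFindings = [
  { id: 'CA-SET-001', file: '.claude/settings.json' },
  { id: 'CA-CML-002', file: 'tests/fixtures/broken/CLAUDE.md' },
  { id: 'CA-HKV-003', file: 'examples/minimal-setup/hooks.json' },
];

const filtered = partitionFindings(sampleFindings);
console.log(filtered.graded.length, filtered.fixtureFindings.length); // 1 2
```

Passing { includeFixtures: true } keeps every finding in the graded set, which mirrors the override offered by --include-fixtures.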

Changed

  • Session storage moved from ~/.config-audit/ to ~/.claude/config-audit/ (pathguard compatible)
  • Self-audit grade: F → A (98) after fixture filtering
  • Combined scanner + posture into single Bash call in default audit command
  • Removed "F grade is misleading" disclaimer — grades are now accurate
  • All CLI banners and envelope metadata updated to v2.2.0
  • 461 tests (up from 441), 27 test files (up from 26)

Removed

  • Manual fixture counting instruction in config-audit.md (orchestrator handles it)
  • Redundant isFixtureOrExample filter in self-audit.mjs (promoted to orchestrator)

[2.1.0] - 2026-04-03

Summary

UX redesign — auto-scope detection, zero questions, simplified command surface.

Changed

  • /config-audit now runs full audit automatically (auto-detects scope from git context)
  • Removed mode selection prompts — scope override via /config-audit full|repo|home|current
  • Simplified from 17 to 15 commands (removed quick, report, watch; added help)
  • All CLI banners and envelope metadata updated to v2.1.0

Added

  • /config-audit help command with categorized command reference
  • Auto-scope detection from git context (repo vs home vs full-machine)

Removed

  • /config-audit:quick (merged into default /config-audit)
  • /config-audit:report (merged into analyze output)
  • /config-audit:watch (use /config-audit drift instead)

[2.0.0] - 2026-04-03 (v2.0 Complete)

Summary

Complete rewrite from LLM-only prototype to deterministic scanner-backed configuration intelligence. 7 development sessions (S1-S7), ~15,000 lines of code, 408+ tests.

Highlights

  • 8 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF, GAP) + PLH standalone
  • Feature gap analysis with 25 dimensions across 4 tiers
  • Auto-fix engine with 9 fix types + backup/rollback
  • Drift detection with baseline comparison
  • Suppression engine (.config-audit-ignore)
  • Self-audit CLI
  • 17 commands, 6 agents, 4 hooks
  • 408+ tests (zero external dependencies)

Added (S7)

  • Example projects: examples/minimal-setup/ and examples/optimal-setup/
  • Demo script: examples/run-demo.sh
  • .config-audit-ignore for self-audit suppressions
  • V2-ANNOUNCEMENT.md
  • DEPRECATED.md for capability-auditor skill

Fixed (S7)

  • hooks.json: SessionStart and Stop timeout 5ms → 5000ms
  • self-audit.mjs: Suppression now enabled (was hardcoded to suppress: false)

Changed (S7)

  • README.md: Complete rewrite for public release
  • CLAUDE.md: Added Suppressions section
  • .gitignore: Added node_modules/ and S*-PROMPT.md

[1.6.0] - 2026-04-03 (v2.0 S6: Unified Reports + Self-Audit + Suppressions)

Added

  • Report generator scanners/lib/report-generator.mjs — unified markdown reports: generatePostureReport(), generateDriftReport(), generatePluginHealthReport(), generateFullReport()
  • Suppression engine scanners/lib/suppression.mjs: .config-audit-ignore file support with exact IDs and glob patterns (CA-SET-*), audit trail via suppressed_findings in envelope
  • Self-audit CLI scanners/self-audit.mjs — runs all scanners + plugin health on this plugin: node self-audit.mjs [--json] [--fix], exit codes 0/1/2
  • PostToolUse hook post-edit-verify.mjs — verifies config files after Edit/Write, blocks if new critical/high findings introduced
  • New command: /config-audit:report — generate unified report (posture + optional drift/plugin-health)
  • Test fixture .config-audit-ignore in fixable-project
  • 54 new tests (total 408 across 25 test files)
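
As a rough sketch of how exact IDs and glob patterns such as `CA-SET-*` could be applied to findings: the function names, the one-pattern-per-line file layout, and the `#` comment syntax are assumptions for illustration, not the documented format of .config-audit-ignore. Only the `suppressed_findings` key comes from this entry.

```javascript
// Turn a pattern such as "CA-SET-*" into an anchored RegExp.
// Every regex metacharacter except "*" is escaped; "*" becomes ".*".
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Assumed file layout: one pattern per line, "#" starts a comment line.
function parseIgnoreFile(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"));
}

// Suppressed findings are kept in a separate list so an audit trail
// survives, rather than being dropped silently.
function applySuppressions(findings, patterns) {
  const matchers = patterns.map(patternToRegExp);
  const kept = [];
  const suppressed = [];
  for (const finding of findings) {
    const hit = matchers.some((re) => re.test(finding.id));
    (hit ? suppressed : kept).push(finding);
  }
  return { findings: kept, suppressed_findings: suppressed };
}
```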

Changed

  • scan-orchestrator.mjs: suppression integration — applies .config-audit-ignore after all scanners run, --no-suppress flag to disable
  • hooks.json: added PostToolUse event with post-edit-verify

[1.5.0] - 2026-04-03 (v2.0 S5: Drift + Watch + Plugin Health)

Added

  • Diff engine scanners/lib/diff-engine.mjs — diffEnvelopes() comparing baseline vs current, formatDiffReport() for terminal output
  • Baseline manager scanners/lib/baseline.mjs — save/load/list/delete named baselines in ~/.claude/config-audit/baselines/
  • Drift CLI scanners/drift-cli.mjs — standalone: node drift-cli.mjs <path> [--save] [--baseline name] [--json] [--list]
  • Plugin health scanner scanners/plugin-health-scanner.mjs (PLH) — validates plugin structure, frontmatter, cross-plugin conflicts (runs independently, not in scan-orchestrator)
  • 3 new commands:
    • /config-audit:drift — compare current config against saved baseline
    • /config-audit:watch — on-demand drift check with baseline monitoring
    • /config-audit:plugin-health — audit plugin structure and cross-plugin coherence
  • Test fixtures test-plugin/ (valid) and broken-plugin/ (invalid) for plugin health tests
  • 48 new tests (total 354 across 21 test files)
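
A baseline-versus-current comparison of the kind the diff engine performs might look like the sketch below. The envelope shape (`findings` with `id` and `file` fields) and the `added`/`resolved`/`unchanged` result keys are assumptions; the real `diffEnvelopes()` may key and report findings differently.

```javascript
// Assumed identity of a finding: its ID plus the file it was raised on.
function findingKey(finding) {
  return `${finding.id}::${finding.file}`;
}

// Illustrative stand-in for diffEnvelopes(); not the plugin's implementation.
function diffEnvelopesSketch(baseline, current) {
  const before = new Map(baseline.findings.map((f) => [findingKey(f), f]));
  const after = new Map(current.findings.map((f) => [findingKey(f), f]));
  return {
    // Present now, absent in the baseline: new drift.
    added: [...after.keys()].filter((k) => !before.has(k)).map((k) => after.get(k)),
    // Present in the baseline, absent now: fixed since the baseline.
    resolved: [...before.keys()].filter((k) => !after.has(k)).map((k) => before.get(k)),
    // Present in both.
    unchanged: [...after.keys()].filter((k) => before.has(k)).map((k) => after.get(k)),
  };
}
```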

[1.4.0] - 2026-04-03 (v2.0 S4: Fix + Rollback Action Pillar)

Added

  • Fix engine scanners/fix-engine.mjs — deterministic auto-fix for 9 fix types:
    • json-key-add (missing $schema), json-key-remove (deprecated keys), json-key-type-fix (type mismatches, invalid effortLevel), json-restructure (hooks array→object, matcher object→string), frontmatter-rename (globs→paths), file-rename (non-.md→.md)
  • Rollback engine scanners/rollback-engine.mjs — listBackups(), restoreBackup(), deleteBackup() with checksum verification
  • Fix CLI scanners/fix-cli.mjs — standalone: node fix-cli.mjs <path> [--apply] [--json] [--global], dry-run by default
  • Backup lib scanners/lib/backup.mjs — shared backup module with checksums and manifests
  • 2 new commands:
    • /config-audit:fix — scan, plan, backup, apply, verify in one flow
    • /config-audit:rollback — list or restore from backups
  • PreToolUse hook auto-backup-config.mjs — auto-backup config files before Edit/Write
  • Test fixture fixable-project/ — fixture with all 9 fixable issue types
  • 38 new tests (total 306 across 17 test files)

Changed

  • file-discovery.mjs: walkRulesDir now discovers all files (not just .md) for non-.md validation
  • backup-before-change.mjs: refactored to use shared lib/backup.mjs (no logic duplication)
  • hooks.json: added PreToolUse event with auto-backup

[1.3.0] - 2026-04-03 (v2.0 S3: Posture + Feature Gap Commands)

Added

  • Scoring module scanners/lib/scoring.mjs — utilization, maturity (5 levels), segments, area scoring, scorecard generation
  • Posture CLI scanners/posture.mjs — standalone Node.js tool: node posture.mjs <path> [--json] [--global]
  • 2 new commands:
    • /config-audit:posture — quick scorecard with A-F grades, utilization%, maturity level
    • /config-audit:feature-gap — deep gap analysis with prioritized next-best-actions
  • feature-gap-agent — Opus agent for deep analysis, report generation (max 200 lines)
  • Knowledge file gap-closure-templates.md — 11 templates with effort/gain estimates
  • HTML report template templates/feature-gap-report.html — visual report with progress bars, grade badges
  • 64 new tests (total 268 across 14 test files)

Changed

  • Tier weighting: T1 gaps count 3x, T2 count 2x, T3/T4 count 1x in utilization score
  • Maturity is threshold-based: highest level where ALL requirements are met
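
To make the tier weighting concrete, here is a small worked sketch. The weights (3x, 2x, 1x, 1x) come from this entry; the formula (weighted features present divided by weighted total) and the `tier`/`present` field names are assumptions, and the function in scoring.mjs may compute the score differently.

```javascript
// Weights per tier as stated in the changelog: T1 = 3, T2 = 2, T3/T4 = 1.
const TIER_WEIGHTS = { 1: 3, 2: 2, 3: 1, 4: 1 };

// Assumed formula: weighted share of features that are in use.
function weightedUtilization(features) {
  let total = 0;
  let present = 0;
  for (const feature of features) {
    const weight = TIER_WEIGHTS[feature.tier];
    total += weight;
    if (feature.present) present += weight;
  }
  return total === 0 ? 0 : present / total;
}

// Two of five features are present, but they carry 5 of 10 weight points.
const score = weightedUtilization([
  { tier: 1, present: true },
  { tier: 2, present: true },
  { tier: 3, present: false },
  { tier: 4, present: false },
  { tier: 1, present: false },
]);
console.log(score); // 0.5
```

Under this reading, a missing T1 feature costs three times as much as a missing T3 or T4 feature.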

[1.2.0] - 2026-04-03 (v2.0 S2: Advanced Scanners + Knowledge Base)

Added

  • 4 advanced scanners (zero external deps):
    • mcp-config-validator.mjs (MCP) — server types, trust levels, env vars, unknown fields
    • import-resolver.mjs (IMP) — broken @imports, circular refs, deep chains, tilde paths
    • conflict-detector.mjs (CNF) — settings conflicts, permission contradictions, hook duplicates
    • feature-gap-scanner.mjs (GAP) — 25 feature gaps across 4 tiers (Foundation/Depth/Advanced/Enterprise)
  • Knowledge base — 5 reference documents: capabilities, best practices, anti-patterns, hook events, feature evolution
  • New test fixtures: .mcp.json files, @import chains, conflict-project/ fixture
  • 75 new tests (total 204 across 12 test files)
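
Circular @import detection, one of the checks listed for the import resolver, can be illustrated with a depth-first search over an in-memory import graph. The graph shape (file name mapped to the files it imports) and the function name are hypothetical; the real import-resolver.mjs reads files from disk and is not shown here.

```javascript
// graph: { "CLAUDE.md": ["a.md"], "a.md": ["b.md"], ... }
// Returns the cycle as a list of files (first file repeated at the end),
// or null when no cycle is reachable from `start`.
function findImportCycle(graph, start) {
  const stack = [];          // current chain of imports
  const onStack = new Set(); // fast membership test for the chain
  const done = new Set();    // files already proven cycle-free

  function visit(node) {
    if (onStack.has(node)) {
      return [...stack.slice(stack.indexOf(node)), node];
    }
    if (done.has(node)) return null;
    onStack.add(node);
    stack.push(node);
    for (const next of graph[node] ?? []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    stack.pop();
    onStack.delete(node);
    done.add(node);
    return null;
  }

  return visit(start);
}
```

The length of `stack` at any point is the current chain depth, so the same traversal could also flag deep import chains.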

Changed

  • Scan orchestrator runs 8 scanners (was 4)
  • Analyzer agent cross-references scanner findings with knowledge base

[1.1.0] - 2026-04-03 (v2.0 S1: Scanner Foundation)

Added

  • Deterministic scanner infrastructure — 4 Node.js scanners (zero external deps):
    • claude-md-linter.mjs (CML) — CLAUDE.md structure, length, sections, @imports, duplicates
    • settings-validator.mjs (SET) — settings.json schema, unknown/deprecated keys, type checks
    • hook-validator.mjs (HKV) — hooks.json format, script existence, event validity, timeouts
    • rules-validator.mjs (RUL) — .claude/rules/ glob matching, orphan detection, deprecated fields
  • Scanner lib — 5 shared modules: severity, output, file-discovery, yaml-parser, string-utils
  • Scan orchestrator: scan-orchestrator.mjs runs all scanners, outputs JSON envelope
  • Test infrastructure — 129 tests across 8 test files using node:test (zero deps)
  • Test fixtures — 4 fixture projects (healthy, broken, empty, minimal)
  • Finding ID format: CA-{SCANNER}-{NNN} (e.g. CA-CML-001)
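
The finding ID format can be sketched as a formatter and parser pair. Both function names are hypothetical, and the three-letter prefix and three-digit zero-padded sequence are inferred from the `CA-CML-001` example rather than from the scanner source.

```javascript
// Build an ID such as "CA-CML-001" from a scanner prefix and a sequence number.
function formatFindingId(scanner, sequence) {
  if (!/^[A-Z]{3}$/.test(scanner)) {
    throw new Error(`invalid scanner prefix: ${scanner}`);
  }
  if (!Number.isInteger(sequence) || sequence < 1 || sequence > 999) {
    throw new Error(`sequence out of range: ${sequence}`);
  }
  return `CA-${scanner}-${String(sequence).padStart(3, "0")}`;
}

// Split an ID back into its parts; returns null for anything malformed.
function parseFindingId(id) {
  const match = /^CA-([A-Z]{3})-(\d{3})$/.exec(id);
  return match ? { scanner: match[1], sequence: Number(match[2]) } : null;
}
```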

Fixed

  • Agent model mismatches: scanner→haiku, analyzer→sonnet, planner→opus, implementer→sonnet, verifier→haiku

Changed

  • CLAUDE.md rewritten in English for public release readiness

[1.0.0] - 2026-02-11

Added

  • Cross-platform support (macOS, Linux, Windows)

Fixed

  • stop-session-reminder.mjs: Use path.basename/path.dirname instead of hardcoded / split
  • backup-before-change.mjs: Handle both / and \ path separators in safe filename generation
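
Handling both separators when deriving a safe filename could look like the sketch below. The underscore-joined naming scheme and the function name are assumptions for illustration; the actual scheme in backup-before-change.mjs is not documented in this changelog.

```javascript
// Hypothetical helper: flatten an absolute path into a single filename
// that is valid on macOS, Linux, and Windows.
function toSafeFilename(filePath) {
  return filePath
    .replace(/^([A-Za-z]):/, "$1") // "C:" -> "C" (colons are invalid on Windows)
    .replace(/[\\/]+/g, "_")       // treat "/" and "\" the same way
    .replace(/^_+/, "");           // drop the leading separator of POSIX paths
}

console.log(toSafeFilename("/home/u/.claude/settings.json"));
// home_u_.claude_settings.json
console.log(toSafeFilename("C:\\Users\\u\\.claude\\settings.json"));
// C_Users_u_.claude_settings.json
```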

Removed

  • "Windows: hooks are 100% bash" from known gaps (was incorrect — all hooks are Node.js)

[0.7.0] - 2026-02-07

Note

Version reset from 1.2.0 to reflect actual maturity. Previous version was inflated — this plugin has never been externally tested.

What exists today

  • 6 specialized agents (scanner, analyzer, interviewer, planner, implementer, verifier)
  • Full machine-wide Claude Code configuration discovery
  • Scope selection (current project, repo, home, full machine)
  • Inheritance hierarchy mapping and conflict detection
  • Mandatory backups before any changes
  • Rollback support
  • Syntax validation for all configuration files
  • Quick audit-only mode
  • Full optimization workflow with HITL checkpoints

Known gaps

  • Testing: no automated tests
  • Onboarding: never verified that a new user can install and use from scratch
  • External verification: nobody else has ever used this