ktg-plugin-marketplace/plugins/config-audit/CHANGELOG.md
Kjell Tore Guttormsen 6cfca82885 fix(config-audit): expose hotspot.path for --accurate-tokens calibration + SC-6b PASS
The v5.0.0-rc.1 N5 implementation looked up hotspot.path in
calibrateAgainstApi() but token-hotspots.mjs only emitted hotspot.source —
calibration silently produced 0 actual_tokens because every iteration hit
the `if (!hotspot?.path) continue` guard.

Fix: file-backed hotspots now expose `path: h.absPath` in the JSON output.
MCP-server hotspots intentionally leave path unset — their tokens are
runtime tool-schema (formula-based: 500 + toolCount × 200), not file
content readable by count_tokens.

SC-6b release-gate verified against tests/fixtures/marketplace-large:
- Actual (count_tokens, claude-opus-4-7): 589 tokens for CLAUDE.md
- Estimated (4-bytes/token byte heuristic): 594 tokens
- Delta: -5 tokens / -0.85% — well within ±5% gate. PASS.

CHANGELOG: documented the fix + SC-6b result inline under [5.0.0].

All 635 tests still green. No estimateTokens tuning required for v5.0.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:45:56 +02:00

494 lines
41 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [5.0.0] - 2026-05-01
### Summary
Reality-based token-optimization release. v4.0.0 shipped Opus-4.7 token surfaces aligned to a Sonnet-era cost model; v5.0.0 rebuilds the foundations against verified Opus-4.7 cost dynamics. Three pillars: honest token estimation (severity-weighted scoring, MCP estimates 15 → 500+, optional `--accurate-tokens` API calibration), new structural scanners (cache-prefix stability, dead tool grants, plugin collisions), and new diagnostic surfaces (`/config-audit manifest`, `/config-audit tokens` extended, knowledge-base rensing aligned to Opus 4.7 cache dynamics).
Consolidated from `5.0.0-alpha.1` (F1-F5 token-economy round), `5.0.0-alpha.2` (M1, M2, M4-M6, F6, F7 structural gaps + README self-audit), `5.0.0-beta.1` (N1-N4, N6 new scanners + manifest CLI), and `5.0.0-rc.1` (M7, M8 knowledge rensing + N5 tokenizer calibration).
### Added
- **3 new scanners (9 → 12 deterministic):**
- **CPS — Cache-Prefix Stability** (`CA-CPS-NNN`): volatile content in lines 31150 of CLAUDE.md cascade, beyond TOK Pattern A's top-30 window. Volatile-pattern set extends Pattern A with shell-exec lines (`!` prefix) and `${VAR}` substitutions.
- **DIS — Disabled-In-Schema** (`CA-DIS-NNN`): tools listed in BOTH `permissions.deny` AND `permissions.allow`. Tool identity uses bare name (`Bash(npm:*)` and `Bash` are the same tool). Severity low.
- **COL — Cross-Plugin Skill Collision** (`CA-COL-001`): plugin-vs-plugin same skill name → low; user-vs-plugin → medium. `details.namespaces` payload identifies conflicting sources.
- **TOK extensions:**
- **CA-TOK-005 MCP tool-schema budget:** per-server tiered finding (< 20 none, 2049 low, 5099 medium, 100+ high; null low + "tool count unknown"). Scoped to project-local `.mcp.json`.
- **Pattern E — Oversized cascade:** medium when `activeConfig.claudeMd.estimatedTokens > 10_000`.
- **Pattern F — Bloated SKILL.md description:** low when frontmatter `description > 500 chars` (loads every turn). Scoped to `discovery.files`.
- **`/config-audit manifest`** + `scanners/manifest.mjs` CLI — single ranked table of every system-prompt token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. CLAUDE.md per-file tokens distributed proportional to bytes.
- **`--accurate-tokens` flag** on `token-hotspots-cli.mjs` (N5): when `ANTHROPIC_API_KEY` is set, calls Anthropic's `count_tokens` for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When absent: `calibration = { skipped: 'no-api-key' }` plus stderr warning.
- **`scanners/lib/tokenizer-api.mjs`** — `count_tokens` wrapper. 5s AbortController timeout. Exponential backoff on 429 (3 retries: 1s/2s/4s). API key masked to `${key.slice(0,8)}...` in every error; HTTP body never included in errors (it may echo the key on auth failures). `maskKey()` exported.
- **`--with-telemetry-recipe` flag** on the same CLI (M7): emits `telemetry_recipe_path` field pointing to `knowledge/cache-telemetry-recipe.md`.
- **`knowledge/cache-telemetry-recipe.md`** (M7): manual `jq` recipe summing `cache_read_input_tokens` + `cache_creation_input_tokens` per turn from session transcripts. Hit-rate interpretation table.
- **`'mcp'` kind on `estimateTokens`** (F2): active MCP servers estimate ≥ 500 tokens (base + schema overhead) instead of v4's flat 15. Optional `{toolCount}` raises to `500 + toolCount × 200`.
- **MCP tool-count detection** (M1): `readActiveMcpServers` resolves count via cache → `node_modules/<pkg>/package.json``{toolCount: null, toolCountUnknown: true}` fallback.
- **`additionalDirectories` settings key** (M6): added to `KNOWN_KEYS`; new low-severity finding when length > 2.
- **HKV verbose hook output** (M5): low-severity finding when referenced hook script contains > 50 `console.log`/`process.stdout.write` lines (static, no execution).
- **`self-audit --check-readme` flag** (F6): filesystem counts compared against README badges. Helper `checkReadmeBadges(pluginDir)`. Step 28 of v5 plan reconciled all badges.
- **`scoringVersion: 'v5'`** field on `scoreByArea` output for cross-version drift detection.
- **`WEIGHTS`** named export from `scanners/lib/severity.mjs` (frozen).
- **`details` field on findings** (`output.mjs:finding()`): optional structured payload for scanner-specific data (used by COL).
- **Plugin Hygiene** as 10th quality area (from COL). Posture JSON now reports 10 areas.
- **TOK-readActiveConfig integration** (F1): one hotspot per active MCP server; `result.activeConfig` summary (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount); try/catch fallback when scope-limited.
### Changed
- **F3 — `scoreByArea` is severity-weighted.** Penalty = `Σ count[s] × WEIGHTS[s]`; `passRate = max(0, 100 penalty / max(10, findingCount × 4) × 100)`. Lows no longer crater an area's grade; criticals/highs do. `baseline-all-a` fixture remains all-A (no critical/high present).
- **F7 — TOK pattern severities recalibrated** for tokens-per-turn impact: Pattern A `medium → high`, Pattern B `low → medium`, Pattern C `medium → low`. Each finding carries a `calibration_note` evidence field documenting the heuristic basis.
- **`scoreByArea` deduplicates by area name** (N3 prep): TOK + CPS share "Token Efficiency"; SET + DIS share "Settings". Combined row with merged finding counts.
- **M8 — knowledge rensing:** replaced "Keep CLAUDE.md under 200 lines" in `knowledge/configuration-best-practices.md` with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Footnote explains the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references `knowledge/opus-4.7-patterns.md`.
- **`commands/tokens.md` next-steps:** documents `--with-telemetry-recipe` as the cache-verification path.
- **Scanner count: 9 → 12.** Command count: 17 → 18. Knowledge: 7 → 8. Quality areas: 8 → 10.
- **`.gitignore`** — unignore rules for `tests/fixtures/**/node_modules/` so the `mcp-tool-heavy` fixture stays under version control.
### Removed
- **F4 — TOK hotspot padding loop and `take` dead-code.** Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
- **F5 — Pattern D / `CA-TOK-004` (sonnet-era signature).** Catalogue entry removed from `knowledge/opus-4.7-patterns.md` and `commands/tokens.md`. Suppression entries for `CA-TOK-004` are now no-ops.
### Breaking changes
- **F2 — MCP token estimates jump from flat 15 to ≥ 500.** Token Efficiency grades for projects with MCP servers may shift. `whats-active` totals report higher numbers. Documented in `commands/posture.md` next-steps.
- **F3 — `scoreByArea` is severity-weighted.** Posture JSON consumers reading `areas[*].score` will see different values for non-clean configs. Use `result.scoringVersion === 'v5'` to detect the change. Drift comparisons across v4↔v5 baselines may show artificial deltas — re-baseline after upgrade.
- **F5 — Pattern D / `CA-TOK-004` no longer emitted.** Existing exact `CA-TOK-004` suppression entries are harmless but obsolete.
- **N1 suppression backward-compat — `CA-TOK-*` glob now also matches `CA-TOK-005`.** To preserve prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
```
CA-TOK-001
CA-TOK-002
CA-TOK-003
```
A one-time runtime warning for this case is a v5.0.1 candidate.
- **Posture areas count: 9 → 10** (Plugin Hygiene from COL). Consumers hard-coding 9 must update.
### Migration notes
- `CA-TOK-*` glob suppressions: explicit-ID list recommended if CA-TOK-005 should not be suppressed.
- `CA-TOK-004` exact-ID suppression entries: safe to remove.
- Drift baselines created against v4 should be re-saved post-upgrade to avoid artificial F3 weighting deltas.
- Posture JSON consumers must update any hardcoded `areas.length === 8` or `=== 9` assertions to `>= 10`.
### Tests
- 543 → 635 (+92): F1-F7 (alpha rounds = +43), N1-N4 + N6 (beta = +39), M7 + M8 + N5 (rc = +10). 36 test files (12 lib + 23 scanner + 1 hook).
- New fixtures: `tok-active-config/`, `additional-dirs-many/`, `additional-dirs-ok/`, `large-cascade/`, `small-cascade/`, `skill-bloated/`, `skill-tight/`, `mcp-tool-heavy/` (with mocked `node_modules/`), `hooks-verbose/`, `hooks-quiet/`, `readme-desynced/`, `mcp-budget/{14,25,60,120,unknown}-tools/`, `volatile-mid-section/{volatile-line-60,volatile-line-200}/`, `denied-tools-in-schema/`, `collision-plugins/fake-home/` (plugin-a + plugin-b + plugin-c + user-level review skill).
- New test files: `tests/scanners/manifest.test.mjs`, `tests/scanners/cache-prefix.test.mjs`, `tests/scanners/disabled-in-schema.test.mjs`, `tests/scanners/collision.test.mjs`, `tests/scanners/accurate-tokens.test.mjs`.
### Notes
- **`mock.method` against ESM module exports does not work** (Node 18+ ESM read-only export bindings). v5 tests use `globalThis.fetch` mocking for `--accurate-tokens` instead — equivalent coverage at the actual external-dependency boundary.
- **Plugin-vs-built-in collision detection is intentionally not implemented.** Step 22a research spike (`docs/v5-namespace-research.md`, gitignored) could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in. Treated as info-only; v5.0.1 candidate.
- **README/CLAUDE.md badge reconciliation** done in Step 28 (this release). `self-audit --check-readme` PASSES against the filesystem. Test count counter switched from file-count to test-case count via subprocess `node --test` parse.
- **`hotspot.path` exposed on file-backed hotspots** (Step 30 fix). The rc.1 `--accurate-tokens` implementation looked up `hotspot.path` but the scanner only emitted `source`. File-backed hotspots now carry `path` (absolute path); MCP-server hotspots leave it unset (they are virtual entries representing runtime tool-schema cost, not file content).
### SC-6b release-gate result (verified 2026-05-01)
- **PASS — 0.85% under-estimation against real `count_tokens` API.**
- Fixture: `tests/fixtures/marketplace-large/`. Top-3 hotspots = 1 file-backed (`CLAUDE.md`) + 2 MCP virtuals. MCP entries skipped per design (no readable content; their tokens are formula-based at 500 + toolCount × 200).
- `CLAUDE.md` actual: 589 tokens (Anthropic `count_tokens`, `claude-opus-4-7`). Estimated: 594 tokens (byte heuristic at 4 bytes/token via `estimateTokens`). Delta: **5 tokens, 0.85%** — well within the ±5% gate.
- No tuning of `estimateTokens` heuristic required for v5.0.0.
## [5.0.0-rc.1] - 2026-05-01
### Summary
Release candidate for v5.0.0 — knowledge rensing and tokenizer calibration. Three deliverables: M8 (Sonnet-era → Opus 4.7 best-practices rewrite), M7 (cache-telemetry recipe in `knowledge/` plus an opt-in CLI flag), and N5 (`--accurate-tokens` API calibration via Anthropic's `count_tokens` endpoint).
### Added
- **N5 — `--accurate-tokens` flag** on `scanners/token-hotspots-cli.mjs`. When `ANTHROPIC_API_KEY` is set, the CLI calls Anthropic's `count_tokens` endpoint for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When the key is absent, `calibration = { skipped: 'no-api-key' }` and a stderr warning is emitted. Designed for the manual SC-6b release-gate verification, not routine use.
- **`scanners/lib/tokenizer-api.mjs`** — wrapper around `count_tokens` with a 5-second AbortController timeout, exponential-backoff retry on HTTP 429 (max 3 retries: 1s, 2s, 4s), and required headers (`x-api-key`, `anthropic-version: 2023-06-01`, `content-type`). API key is masked to `${key.slice(0,8)}...` in every error message and every thrown error; non-429 HTTP errors throw status code only — response body is never included (it may echo the key on auth failures). `maskKey()` is exported for callers that need safe logging.
- **M7 — `knowledge/cache-telemetry-recipe.md`** (new). Manual `jq` recipe for verifying prompt-cache hit rate from Claude Code session transcripts (`~/.claude/projects/<slug>/*.jsonl`). Sums `cache_read_input_tokens` and `cache_creation_input_tokens` per turn and reports a hit-rate ratio. Recipe-form (not bundled scanner) keeps the project's "no transcript-parsing as core feature" non-goal intact while giving users a runtime escape hatch.
- **M7 — `--with-telemetry-recipe` flag** on the same CLI. When passed, emits `telemetry_recipe_path` in the JSON output pointing to the recipe file. Without the flag, output is unchanged. Committed as a default deliverable, opt-in at invocation time.
### Changed
- **M8 — knowledge-base rensing:** replaced the "Keep CLAUDE.md under 200 lines" rule in `knowledge/configuration-best-practices.md` with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Added a footnote that the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references `knowledge/opus-4.7-patterns.md`.
- **`commands/tokens.md` next-steps:** documents `--with-telemetry-recipe` as the cache-verification path after a structural fix.
### Tests
- 625 → 635 (+10): `--with-telemetry-recipe` (×2), tokenizer-api unit tests (×6 — masking, body-leak protection, AbortController signal, 429 retry, header set, fetch mock happy path), `--accurate-tokens` no-key subprocess test (×1), absent-flag negative test (×1).
- New file: `tests/scanners/accurate-tokens.test.mjs`. No new fixtures (re-uses `marketplace-large`).
### Notes
- **SC-6b release gate is NOT closed by these commits.** Step 26's tests use mocked `globalThis.fetch` to verify the integration contract; ±5% accuracy against real `count_tokens` requires a live API key and must be verified manually before tagging v5.0.0 in Session 5.
- The plan's specified `mock.method(tokenizerApi, 'callCountTokensApi', ...)` pattern collides with ESM read-only export bindings in Node 18+. Tests mock at the `globalThis.fetch` boundary instead — equivalent coverage, no module-export rebinding required.
- README/CLAUDE.md badge counts and `plugin.json` version still target v4.0.0; Step 28+29 will sync those during the release wrap.
- `[skip-docs]` tag on the N5 feat commit; M7 and M8 are `docs(...)` commits and don't need it.
## [5.0.0-beta.1] - 2026-05-01
### Summary
First v5.0.0 beta — new scanners. Five new finding sources land: MCP tool-schema budget (CA-TOK-005), system-prompt manifest CLI/command (`/config-audit manifest`), cache-prefix stability (CPS), disabled-tools-still-in-schema (DIS), and cross-plugin/user-vs-plugin skill collision (COL/CA-COL-001). Plugin Hygiene becomes a 10th area-scorecard column.
### Added
- **N1 — `CA-TOK-005` MCP tool-schema budget:** per-server tiered finding inside the TOK scanner. Thresholds — `< 20` no finding, `2049` low, `5099` medium, `100+` high; `null` (manifest unparseable) low + "tool count unknown" message. Scoped to project-local `.mcp.json` to keep `/config-audit <path>` actionable. Recommendation links to the Step 25 cache-telemetry recipe.
- **N2 — `/config-audit manifest`:** new slash command + `scanners/manifest.mjs` CLI. Renders a single ranked table of every token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. Reuses `readActiveConfig`; CLAUDE.md per-file tokens are distributed proportional to bytes.
- **N3 — CPS scanner (`CA-CPS-NNN`):** Cache-Prefix Stability Analyzer. Walks the CLAUDE.md cascade and flags volatile content between lines 31 and 150 — beyond TOK Pattern A's top-30 territory. Volatile-pattern set extends Pattern A with shell-exec lines (`!` prefix) and `${VAR}` substitutions. Severity medium per finding. Skips lines 130 (Pattern A's range).
- **N4 — DIS scanner (`CA-DIS-NNN`):** Disabled-In-Schema Detector. Detects tools that appear in BOTH `permissions.deny` and `permissions.allow` within the same `settings.json`. The deny list wins, so allow entries are dead config but still load every turn. Tool identity is the bare name (everything before `(`); `Bash(npm:*)` and `Bash` are treated as the same tool. Severity low.
- **N6 — COL scanner (`CA-COL-001`):** Cross-Plugin Skill Collision detector. Plugin-vs-plugin same skill name → low. User-vs-plugin same skill name → medium. Findings carry `details.namespaces` array with `{source, name, path}` for every conflicting source.
- **`details` field on findings:** `output.mjs:finding()` helper now passes through optional `details` for scanner-specific structured payloads (used by COL).
- **"Plugin Hygiene" area** (10th in scorecard): COL contributes here. Posture JSON now reports 10 areas instead of 9.
### Changed
- **`scoreByArea` deduplicates by area name:** when multiple scanners share an area (TOK + CPS → "Token Efficiency", SET + DIS → "Settings"), they produce one combined row with merged finding counts. Existing 9-area contract preserved for non-Plugin-Hygiene areas.
### Known breaking changes
- **Suppression backward-compat — `CA-TOK-*` glob now also matches `CA-TOK-005`.** Existing `.config-audit-ignore` entries that suppress TOK findings via the `CA-TOK-*` glob will silently include CA-TOK-005 (MCP budget). To preserve the prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
```
CA-TOK-001
CA-TOK-002
CA-TOK-003
```
A one-time runtime warning for this case is out of scope for v5.0.0 — it is a candidate for v5.0.1.
- **Plugin-vs-built-in collision is intentionally not implemented.** The Step 22a research spike could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in (`/help`, `/clear`, `/init`, `/review`, `/config`, `/cost`, `/security-review`). Treated as info-only in this release; a follow-up v5.0.1 ticket may add an opt-in check.
### Tests
- 586 → 625 (+39): N1 (×7), N2 (×11), N3 (×7), N4 (×6), N6 (×8).
- New fixtures: `mcp-budget/{14,25,60,120,unknown}-tools/`, `volatile-mid-section/{volatile-line-60,volatile-line-200}/`, `denied-tools-in-schema/`, `collision-plugins/fake-home/` (plugin-a + plugin-b + plugin-c + user-level review skill).
### Notes
- `[skip-docs]` tag used on every feat commit — README/CLAUDE.md badge counts (scanner count, command count, test count) and the architecture sections are intentionally fenced off until Session 5 (Step 28). This keeps the v5 plan's session boundaries clean even when the Forgejo `pre-commit-docs-gate` hook would otherwise block these commits.
## [5.0.0-alpha.2] - 2026-05-01
### Summary
Second v5.0.0 alpha — structural gaps + README self-audit. TOK pattern severities recalibrated for tokens/turn impact (F7), three new findings cover settings/skills/cascade structure (M2, M4, M6), MCP tool-count detection wired (M1), HKV gains a verbose-output check (M5), and self-audit grows a `--check-readme` flag (F6).
### Added
- **F7 — TOK severity recalibration:** Pattern A (cache-breaking volatile top) `medium → high`, Pattern B (redundant permissions) `low → medium`, Pattern C (deep imports) `medium → low`. Each finding now carries a `calibration_note` evidence field documenting the heuristic basis.
- **M6 — `additionalDirectories` settings key:** added to `KNOWN_KEYS` so it no longer trips "unknown settings key". New low-severity finding when `additionalDirectories.length > 2`.
- **M4 — TOK Pattern E:** medium-severity finding when `activeConfig.claudeMd.estimatedTokens > 10_000` — flags cascades that bleed budget every turn.
- **M2 — TOK Pattern F:** low-severity finding for project-local `SKILL.md` whose frontmatter `description` exceeds 500 characters (description loads on every turn even when the body does not). Scoped to `discovery.files`; user/plugin skills out of project scope are not flagged.
- **M1 — MCP tool-count detection:** `readActiveMcpServers` now resolves tool count via cache → `node_modules/<pkg>/package.json` → `{toolCount: null, toolCountUnknown: true}` fallback. Tool count drives `estimateTokens` per server.
- **M5 — HKV verbose hook output:** new low-severity finding when a referenced hook script contains > 50 `console.log` / `process.stdout.write` lines (static heuristic, no execution).
- **F6 — `self-audit --check-readme` flag:** filesystem counts (scanners, commands, agents, hooks, tests, knowledge) compared against README badge values. Helper export: `checkReadmeBadges(pluginDir)`.
### Changed
- **TOK severities** (F7) — see Added. Posture aggregates that depended on Pattern A being `medium` will now reflect the higher-impact rating.
- **`.gitignore`** — added unignore rules so `tests/fixtures/**/node_modules/` are tracked. Required by the `mcp-tool-heavy` fixture.
### Tests
- 563 → 586 (+23): F7 table-driven (×6), M6 (×3), M4 (×2), M2 (×2), M1 (×4), M5 (×2), F6 (×4).
- New fixtures: `additional-dirs-many/`, `additional-dirs-ok/`, `large-cascade/`, `small-cascade/`, `skill-bloated/`, `skill-tight/`, `mcp-tool-heavy/` (with mocked `node_modules/`), `hooks-verbose/`, `hooks-quiet/`, `readme-desynced/`.
### Notes
- `result.readmeCheck.passed === true` is **not** required during alpha/beta phases. The real plugin's own check is currently red (`scanners` 10 vs README 9, `tests` 31 vs README 543) — reconciliation deferred to Session 5 Step 28 (README sync).
- `[skip-docs]` tag used on every commit — README/CLAUDE.md badge counts and architecture text are intentionally fenced off until Session 5.
## [5.0.0-alpha.1] - 2026-05-01
### Summary
First v5.0.0 alpha — token-economy round, F1-F5. The TOK scanner now consumes `readActiveConfig` (per-MCP-server hotspots, claudeMd cascade tokens), severity weighting replaces flat finding counts in `scoreByArea`, and MCP servers no longer estimate at a flat 15 tokens. Pattern D (CA-TOK-004 sonnet-era signature) removed — too noisy, not actionable.
### Added
- **`'mcp'` kind for `estimateTokens`** (F2): an active MCP server now estimates ≥ 500 tokens (base protocol + schema overhead) instead of the v4 flat 15. Optional `{toolCount}` raises the estimate to `500 + toolCount * 200` once Step 14 wires tool-count detection.
- **TOK ↔ readActiveConfig integration** (F1): the TOK scanner emits one hotspot per active MCP server, sums their tokens into `total_estimated_tokens`, and exposes `result.activeConfig` (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount).
- **`scoringVersion: 'v5'`** field on `scoreByArea` output for cross-version drift detection.
- **`WEIGHTS`** named export from `scanners/lib/severity.mjs` (`Object.freeze`).
### Changed
- **BREAKING (intentional, F3):** `scoreByArea` is now severity-weighted. Penalty = `Σ count[s] * WEIGHTS[s]`; `passRate = max(0, 100 - penalty / max(10, findingCount * 4) * 100)`. Lows no longer crater an area's grade; a single high or critical consumes a large fraction of budget. `baseline-all-a` fixture remains all-A (no critical/high on that fixture).
- **BREAKING (intentional, F2):** MCP server token estimates jump from a flat 15 to ≥ 500. `whats-active` totals and TOK hotspots will report higher numbers for any project with active MCP servers.
- **BREAKING (intentional, F5):** Pattern D / `CA-TOK-004` (sonnet-era signature) is no longer emitted. Suppression entries for `CA-TOK-004` are now no-ops; downstream tools that filter on the ID should drop it. The catalogue entry was removed from `knowledge/opus-4.7-patterns.md` and `commands/tokens.md`.
- **Hotspots contract (F4):** the v4 padding loop and `take` dead-code are gone. Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
### Migration notes
- `CA-TOK-*` glob suppression entries continue to suppress 001-003. Existing exact `CA-TOK-004` entries are harmless but obsolete — remove them at convenience.
- Posture/JSON consumers reading `areas[*].score` will see different values for non-clean configs. Use `result.scoringVersion === 'v5'` to detect.
### Tests
- 543 → 563 across the alpha.1 commits (+9 severity-weighting/scoring, +4 estimateTokens 'mcp', +1 MCP caller migration, +3 readActiveConfig integration, +2 hotspots-uniqueness, +2 sonnet-era zero-finding).
- New fixture `tests/fixtures/tok-active-config/` — minimal repo with `.mcp.json` (2 servers), `CLAUDE.md`, plugin skeleton.
## [4.0.0] - 2026-04-19
### Summary
Opus 4.7 era upgrade. New TOK scanner detects token-efficiency anti-patterns (cache-breaking volatile content, redundant tool permissions, deep import chains, sonnet-era minimal setups). Token Efficiency joins the quality scorecard as the 8th area. Scanner-agent and verifier-agent migrate from haiku → sonnet per global no-haiku policy.
### Added
- **`token-hotspots.mjs`** scanner (CA-TOK-001..004) — 4 patterns aligned with Opus 4.7 token-cost dynamics:
- CA-TOK-001 cache-breaking volatile content (timestamps/UUIDs in top 30 lines of CLAUDE.md)
- CA-TOK-002 redundant tool permissions (duplicate or subset overlaps)
- CA-TOK-003 deep @import chains (>2 hops on the load path)
- CA-TOK-004 sonnet-era minimal setup (no skills/MCP/hooks/managed/plugins)
- **`/config-audit tokens [path] [--global]`** — ranked hotspot table + per-pattern findings.
- **`scanners/token-hotspots-cli.mjs`** — standalone CLI emitting `total_estimated_tokens`, `hotspots`, and per-finding output.
- **Token Efficiency** as the 8th quality area in the posture scorecard (now 9 scanners total: CML/SET/HKV/RUL/MCP/IMP/CNF/GAP/TOK).
- `id` field on every area in the scorecard payload (`token_efficiency`, `instruction_clarity`, etc.) for stable downstream lookup.
- 13 new TOK scanner tests + 3 CLI tests + posture grade-stability test for `token_efficiency`.
- Knowledge refresh: `knowledge/opus-4.7-patterns.md`, plus 2026-04 deltas (v2.1.83v2.1.111) added to `feature-evolution.md`, `claude-code-capabilities.md`, and `hook-events-reference.md` from `research/03-claude-code-changes-config-surfaces.md`.
### Changed
- **BREAKING (additive surface):** Quality areas count 7 → 8. Posture JSON consumers that hard-coded 7 areas must update.
- **BREAKING (model migration):** `scanner-agent` and `verifier-agent` migrated `haiku` → `sonnet`. Latency and cost trade-offs accepted; deterministic scanner CLIs preferred over agent invocations.
- Scanner count: 8 → 9 (TOK added).
- Command count: 16 → 17 (`/config-audit tokens` added).
- Version bump: `3.1.0` → `4.0.0`.
## [3.1.0] - 2026-04-14
### Summary
New read-only command `/config-audit whats-active` — shows exactly what Claude Code loads for a given repo, with token estimates.
### Added
- **`/config-audit whats-active [path]`** — inventory of active plugins, skills, MCP servers, hooks, and CLAUDE.md cascade for a repo, with source attribution (user/project/plugin) and rough token estimates. Read-only, <2s.
- `scanners/lib/active-config-reader.mjs` — pure async helper: `readActiveConfig()`, `detectGitRoot()`, `walkClaudeMdCascade()`, `readClaudeJsonProjectSlice()` (longest-prefix matching), `enumeratePlugins()`, `enumerateSkills()`, `readActiveHooks()`, `readActiveMcpServers()`, `estimateTokens()`.
- `scanners/whats-active.mjs` — thin CLI shim supporting `--json`, `--output-file`, `--verbose`, `--suggest-disables`.
- Optional `--suggest-disables` flag surfaces deterministic disable candidates (disabled MCP servers, zero-item plugins, unreferenced plugins, orphan skills) and invites an LLM judgment pass in the command.
- 36 new tests in `tests/lib/active-config-reader.test.mjs`, plus a `rich-repo` tmpdir fixture helper.
### Changed
- Version bump: `3.0.1` → `3.1.0` (minor, additive feature, no breaking changes).
- Command count: 15 → 16.
## [3.0.1] - 2026-04-04
### Summary
Cross-platform fix — scanners, hooks, and lib now work correctly on Windows.
### Fixed
- `file-discovery.mjs`: depth calculation, agent/command/plugin path matching now use `path.sep`
- `scan-orchestrator.mjs`: fixture-path filtering now uses `path.sep`
- `post-edit-verify.mjs`: rules-dir regex handles both `/` and `\` separators
- `auto-backup-config.mjs`: rules-dir detection now uses `path.sep`
- `import-resolver.mjs`: circular import display uses `basename()`, `/tmp` fallback replaced with `os.tmpdir()`
- `string-utils.mjs`: `normalizePath` trailing separator regex handles both `/` and `\`
### Added
- 4 cross-platform path tests (total 486 tests)
## [3.0.0] - 2026-04-04
### Summary
Health redesign — configuration health is now quality-only. Feature utilization removed from grades entirely.
### Changed
- **Health = quality only.** 7 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF) determine your grade. Feature Coverage is no longer a graded area.
- **Feature recommendations are opt-in.** Unused features shown as "opportunities" via `/config-audit feature-gap`, grouped by impact (high/medium/explore), backed by Anthropic docs. No more "Feature Coverage: F" for correct minimal setups.
- **Posture output redesigned.** Shows `Health: {grade} ({score}/100)` with 7 quality areas. Removed utilization %, maturity level, segment label.
- **Feature-gap is interactive.** Users select recommendations to implement directly — no manual file editing required. Backup created automatically.
- **avgScore bug fixed.** Grade letter and displayed score now computed from the same population (quality areas only).
### Added
- `generateHealthScorecard()` in scoring.mjs — quality-only scorecard
- `opportunitySummary()` in feature-gap-scanner.mjs — groups findings by impact tier
- `opportunityCount` field in posture JSON output
- "Official Configuration Guidance" section in knowledge base (Anthropic docs, proven impacts)
- 21 new tests (total 482 across 27 test files)
### Removed
- `S2-PROMPT.md` and `V2-ANNOUNCEMENT.md` — v2 development artifacts
- Utilization %, maturity level, segment label from posture terminal output and reports
- Feature Coverage row from area breakdown tables
- "Top Actions" sourced from GAP findings (replaced by opportunities pointer)
### Backward Compatibility
- JSON output preserves all legacy fields (utilization, maturity, segment) for programmatic consumers
- Drift baselines unaffected — GAP findings still present in envelopes
- All existing exports maintained (calculateUtilization, determineMaturityLevel, etc.)
## [2.2.0] - 2026-04-04
### Summary
UX quality fix — fixture filtering, session path migration, output polish.
### Added
- Automatic test-fixture filtering in scan-orchestrator: findings from `tests/`, `examples/`, `__tests__/` excluded from grades, stored in `env.fixture_findings`
- `--include-fixtures` CLI flag for scan-orchestrator and posture to override filtering
- `scan-orchestrator.test.mjs` — 20 new tests for fixture filtering and `isFixturePath`
- Legacy session path detection in cleanup command
### Changed
- Session storage moved from `~/.config-audit/` to `~/.claude/config-audit/` (pathguard compatible)
- Self-audit grade: F → A (98) after fixture filtering
- Combined scanner + posture into single Bash call in default audit command
- Removed "F grade is misleading" disclaimer — grades are now accurate
- All CLI banners and envelope metadata updated to v2.2.0
- 461 tests (up from 441), 27 test files (up from 26)
### Removed
- Manual fixture counting instruction in `config-audit.md` (orchestrator handles it)
- Redundant `isFixtureOrExample` filter in `self-audit.mjs` (promoted to orchestrator)
## [2.1.0] - 2026-04-03
### Summary
UX redesign — auto-scope detection, zero questions, simplified command surface.
### Changed
- `/config-audit` now runs full audit automatically (auto-detects scope from git context)
- Removed mode selection prompts — scope override via `/config-audit full|repo|home|current`
- Simplified from 17 to 15 commands (removed quick, report, watch; added help)
- All CLI banners and envelope metadata updated to v2.1.0
### Added
- `/config-audit help` command with categorized command reference
- Auto-scope detection from git context (repo vs home vs full-machine)
### Removed
- `/config-audit:quick` (merged into default `/config-audit`)
- `/config-audit:report` (merged into analyze output)
- `/config-audit:watch` (use `/config-audit drift` instead)
## [2.0.0] - 2026-04-03 (v2.0 Complete)
### Summary
Complete rewrite from LLM-only prototype to deterministic scanner-backed configuration intelligence.
7 development sessions (S1-S7), ~15,000 lines of code, 408+ tests.
### Highlights
- 8 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF, GAP) + PLH standalone
- Feature gap analysis with 25 dimensions across 4 tiers
- Auto-fix engine with 9 fix types + backup/rollback
- Drift detection with baseline comparison
- Suppression engine (.config-audit-ignore)
- Self-audit CLI
- 17 commands, 6 agents, 4 hooks
- 408+ tests (zero external dependencies)
### Added (S7)
- Example projects: `examples/minimal-setup/` and `examples/optimal-setup/`
- Demo script: `examples/run-demo.sh`
- `.config-audit-ignore` for self-audit suppressions
- `V2-ANNOUNCEMENT.md`
- `DEPRECATED.md` for capability-auditor skill
### Fixed (S7)
- `hooks.json`: SessionStart and Stop timeout 5ms → 5000ms
- `self-audit.mjs`: Suppression now enabled (was hardcoded to `suppress: false`)
### Changed (S7)
- README.md: Complete rewrite for public release
- CLAUDE.md: Added Suppressions section
- `.gitignore`: Added `node_modules/` and `S*-PROMPT.md`
## [1.6.0] - 2026-04-03 (v2.0 S6: Unified Reports + Self-Audit + Suppressions)
### Added
- **Report generator** `scanners/lib/report-generator.mjs` — unified markdown reports: generatePostureReport(), generateDriftReport(), generatePluginHealthReport(), generateFullReport()
- **Suppression engine** `scanners/lib/suppression.mjs``.config-audit-ignore` file support with exact IDs and glob patterns (CA-SET-*), audit trail via `suppressed_findings` in envelope
- **Self-audit CLI** `scanners/self-audit.mjs` — runs all scanners + plugin health on this plugin: `node self-audit.mjs [--json] [--fix]`, exit codes 0/1/2
- **PostToolUse hook** `post-edit-verify.mjs` — verifies config files after Edit/Write, blocks if new critical/high findings introduced
- **New command**: `/config-audit:report` — generate unified report (posture + optional drift/plugin-health)
- **Test fixture** `.config-audit-ignore` in fixable-project
- 54 new tests (total 408 across 25 test files)
### Changed
- `scan-orchestrator.mjs`: suppression integration — applies .config-audit-ignore after all scanners run, `--no-suppress` flag to disable
- `hooks.json`: added PostToolUse event with post-edit-verify
## [1.5.0] - 2026-04-03 (v2.0 S5: Drift + Watch + Plugin Health)
### Added
- **Diff engine** `scanners/lib/diff-engine.mjs` — diffEnvelopes() comparing baseline vs current, formatDiffReport() for terminal output
- **Baseline manager** `scanners/lib/baseline.mjs` — save/load/list/delete named baselines in ~/.claude/config-audit/baselines/
- **Drift CLI** `scanners/drift-cli.mjs` — standalone: `node drift-cli.mjs <path> [--save] [--baseline name] [--json] [--list]`
- **Plugin health scanner** `scanners/plugin-health-scanner.mjs` (PLH) — validates plugin structure, frontmatter, cross-plugin conflicts (runs independently, not in scan-orchestrator)
- **3 new commands**:
- `/config-audit:drift` — compare current config against saved baseline
- `/config-audit:watch` — on-demand drift check with baseline monitoring
- `/config-audit:plugin-health` — audit plugin structure and cross-plugin coherence
- **Test fixtures** `test-plugin/` (valid) and `broken-plugin/` (invalid) for plugin health tests
- 48 new tests (total 354 across 21 test files)
## [1.4.0] - 2026-04-03 (v2.0 S4: Fix + Rollback Action Pillar)
### Added
- **Fix engine** `scanners/fix-engine.mjs` — deterministic auto-fix for 9 fix types:
- `json-key-add` (missing $schema), `json-key-remove` (deprecated keys), `json-key-type-fix` (type mismatches, invalid effortLevel), `json-restructure` (hooks array→object, matcher object→string), `frontmatter-rename` (globs→paths), `file-rename` (non-.md→.md)
- **Rollback engine** `scanners/rollback-engine.mjs` — listBackups(), restoreBackup(), deleteBackup() with checksum verification
- **Fix CLI** `scanners/fix-cli.mjs` — standalone: `node fix-cli.mjs <path> [--apply] [--json] [--global]`, dry-run by default
- **Backup lib** `scanners/lib/backup.mjs` — shared backup module with checksums and manifests
- **2 new commands**:
- `/config-audit:fix` — scan, plan, backup, apply, verify in one flow
- `/config-audit:rollback` — list or restore from backups
- **PreToolUse hook** `auto-backup-config.mjs` — auto-backup config files before Edit/Write
- **Test fixture** `fixable-project/` — fixture with all 9 fixable issue types
- 38 new tests (total 306 across 17 test files)
### Changed
- `file-discovery.mjs`: walkRulesDir now discovers all files (not just .md) for non-.md validation
- `backup-before-change.mjs`: refactored to use shared `lib/backup.mjs` (no logic duplication)
- hooks.json: added PreToolUse event with auto-backup
## [1.3.0] - 2026-04-03 (v2.0 S3: Posture + Feature Gap Commands)
### Added
- **Scoring module** `scanners/lib/scoring.mjs` — utilization, maturity (5 levels), segments, area scoring, scorecard generation
- **Posture CLI** `scanners/posture.mjs` — standalone Node.js tool: `node posture.mjs <path> [--json] [--global]`
- **2 new commands**:
- `/config-audit:posture` — quick scorecard with A-F grades, utilization%, maturity level
- `/config-audit:feature-gap` — deep gap analysis with prioritized next-best-actions
- **feature-gap-agent** — Opus agent for deep analysis, report generation (max 200 lines)
- **Knowledge file** `gap-closure-templates.md` — 11 templates with effort/gain estimates
- **HTML report template** `templates/feature-gap-report.html` — visual report with progress bars, grade badges
- 64 new tests (total 268 across 14 test files)
### Changed
- Tier weighting: T1 gaps count 3x, T2 count 2x, T3/T4 count 1x in utilization score
- Maturity is threshold-based: highest level where ALL requirements are met
## [1.2.0] - 2026-04-03 (v2.0 S2: Advanced Scanners + Knowledge Base)
### Added
- **4 advanced scanners** (zero external deps):
- `mcp-config-validator.mjs` (MCP) — server types, trust levels, env vars, unknown fields
- `import-resolver.mjs` (IMP) — broken @imports, circular refs, deep chains, tilde paths
- `conflict-detector.mjs` (CNF) — settings conflicts, permission contradictions, hook duplicates
- `feature-gap-scanner.mjs` (GAP) — 25 feature gaps across 4 tiers (Foundation/Depth/Advanced/Enterprise)
- **Knowledge base** — 5 reference documents: capabilities, best practices, anti-patterns, hook events, feature evolution
- **New test fixtures** — `.mcp.json` files, @import chains, `conflict-project/` fixture
- 75 new tests (total 204 across 12 test files)
### Changed
- Scan orchestrator runs 8 scanners (was 4)
- Analyzer agent cross-references scanner findings with knowledge base
## [1.1.0] - 2026-04-03 (v2.0 S1: Scanner Foundation)
### Added
- **Deterministic scanner infrastructure** — 4 Node.js scanners (zero external deps):
- `claude-md-linter.mjs` (CML) — CLAUDE.md structure, length, sections, @imports, duplicates
- `settings-validator.mjs` (SET) — settings.json schema, unknown/deprecated keys, type checks
- `hook-validator.mjs` (HKV) — hooks.json format, script existence, event validity, timeouts
- `rules-validator.mjs` (RUL) — .claude/rules/ glob matching, orphan detection, deprecated fields
- **Scanner lib** — 5 shared modules: severity, output, file-discovery, yaml-parser, string-utils
- **Scan orchestrator** — `scan-orchestrator.mjs` runs all scanners, outputs JSON envelope
- **Test infrastructure** — 129 tests across 8 test files using node:test (zero deps)
- **Test fixtures** — 4 fixture projects (healthy, broken, empty, minimal)
- Finding ID format: `CA-{SCANNER}-{NNN}` (e.g. `CA-CML-001`)
### Fixed
- Agent model mismatches: scanner→haiku, analyzer→sonnet, planner→opus, implementer→sonnet, verifier→haiku
### Changed
- CLAUDE.md rewritten in English for public release readiness
## [1.0.0] - 2026-02-11
### Added
- Cross-platform support (macOS, Linux, Windows)
### Fixed
- `stop-session-reminder.mjs`: Use `path.basename`/`path.dirname` instead of hardcoded `/` split
- `backup-before-change.mjs`: Handle both `/` and `\` path separators in safe filename generation
### Removed
- "Windows: hooks are 100% bash" from known gaps (was incorrect — all hooks are Node.js)
## [0.7.0] - 2026-02-07
### Note
Version reset from 1.2.0 to reflect actual maturity. Previous version was inflated — this plugin has never been externally tested.
### What exists today
- 6 specialized agents (scanner, analyzer, interviewer, planner, implementer, verifier)
- Full machine-wide Claude Code configuration discovery
- Scope selection (current project, repo, home, full machine)
- Inheritance hierarchy mapping and conflict detection
- Mandatory backups before any changes
- Rollback support
- Syntax validation for all configuration files
- Quick audit-only mode
- Full optimization workflow with HITL checkpoints
### Known gaps
- Testing: no automated tests
- Onboarding: never verified that a new user can install and use from scratch
- External verification: nobody else has ever used this