# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [5.1.0] - 2026-05-01

### Summary
Plain-language UX humanizer release. Default output of all 18 commands now leads with prose; technical IDs surface at end-of-line as references rather than headlines. Non-expert users — the bulk of the OSS audience — now read findings like "Fix soon: The same automation is set up more than once" instead of "[high] CA-CNF-001: Hook duplicate event registration". Scanner internals are unchanged; humanization is a pure output-time transform applied at the rendering layer. The `--raw` flag preserves v5.0.0 verbatim output for tooling that scrapes stderr; `--json` is unchanged from v5.0.0 and remains byte-stable for programmatic consumption.

Delivered in Waves 0 through 6 (Wave 0 baseline → Wave 1 humanizer module → Wave 2 test re-anchoring → Wave 3 CLI wiring → Wave 4 contract tests → Wave 5 templates/agents → Wave 6 release).

### Added
- **`scanners/lib/humanizer.mjs`** — pure-function output translator: `humanizeFinding`, `humanizeFindings`, `humanizeEnvelope`, `computeRelevanceContext`. Never mutates inputs. Adds three additive fields per finding (`userImpactCategory`, `userActionLanguage`, `relevanceContext`) and replaces title/description/recommendation when a translation is available; falls through to originals otherwise.
- **`scanners/lib/humanizer-data.mjs`** — TRANSLATIONS table for 13 scanner prefixes (CML, SET, HKV, RUL, MCP, IMP, CNF, COL, TOK, CPS, DIS, GAP, PLH). Three-step lookup per finding: exact title → regex pattern → `_default` → fall through to scanner original.
- **`--raw` flag** threaded through every CLI: `posture.mjs`, `scan-orchestrator.mjs`, `token-hotspots-cli.mjs`, `manifest.mjs`, `whats-active.mjs`, `fix-cli.mjs`, `drift-cli.mjs`, `self-audit.mjs`. Bypasses humanizer; emits byte-stable v5.0.0 verbatim output.
- **User-impact categories** (5 labels): Configuration mistake, Conflict, Wasted tokens, Missed opportunity, Dead config. Mapped from scanner prefix.
- **Action-language phrases** (5 labels): Fix this now, Fix soon, Fix when convenient, Optional cleanup, FYI. Mapped from severity.
- **Relevance context** (3 values): `test-fixture-no-impact`, `affects-this-machine-only`, `affects-everyone`. Computed from finding's file path — basenames matching `*.local.*` and paths containing `/tests/fixtures/` are recognized.
- **Self-audit terminal humanization** — `formatSelfAudit()` routes through `humanizeEnvelope`. JSON path (`--json`) is unchanged; humanization applies only to the prose terminal render.
- **Forbidden-words lint** (`tests/lint-forbidden-words.json` + runner) — 3-tier vocabulary blocklist enforced over default-mode output, ensuring humanized prose stays in plain language.
- **Scenario read-test** (`tests/scenario-read-test.mjs` + 5 scenarios) — corpus-driven readability check covering broken hook, duplicate keys, stale @import, dead tool, oversized cascade.
- **`tests/snapshots/v5.0.0/`** + **`tests/snapshots/v5.0.0-stderr/`** — frozen byte-equal references for SC-6 (--json) and SC-7 (--raw) backwards-compatibility tests across 8 CLIs.
- **`tests/snapshots/default-output/`** — humanized-prose snapshots for SC-5 default-output stability.
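
The lookup and mapping rules above can be sketched as follows. This is a minimal illustration, not the shipped `humanizer.mjs`: the table entries, the `file` field name, the severity names, and the helper names (`translateTitle`, `relevanceFor`, `humanizeFindingSketch`) are assumptions made for the example, and the `userImpactCategory` mapping is omitted because the prefix-to-category table is not listed here.

```javascript
// Hypothetical, trimmed-down table. The real TRANSLATIONS data is not
// reproduced in this changelog; these entries are illustrative only.
const TRANSLATIONS = {
  CNF: {
    exact: {
      'Hook duplicate event registration':
        'The same automation is set up more than once',
    },
    patterns: [
      { match: /duplicate .* key/i, title: 'The same setting appears more than once' },
    ],
    _default: 'Two parts of your configuration disagree',
  },
};

// Severity -> action phrase. The five phrases are from this release;
// the severity names on the left are an assumption.
const ACTION_LANGUAGE = {
  critical: 'Fix this now',
  high: 'Fix soon',
  medium: 'Fix when convenient',
  low: 'Optional cleanup',
  info: 'FYI',
};

// Three-step lookup: exact title, then regex pattern, then _default,
// and finally the scanner's original title.
function translateTitle(prefix, title) {
  const table = TRANSLATIONS[prefix];
  if (!table) return title;
  if (Object.prototype.hasOwnProperty.call(table.exact ?? {}, title)) {
    return table.exact[title];
  }
  const hit = (table.patterns ?? []).find((p) => p.match.test(title));
  if (hit) return hit.title;
  return table._default ?? title;
}

// Path-based relevance. Checking the fixture rule first is an assumption.
function relevanceFor(filePath) {
  const normalized = String(filePath ?? '').replace(/\\/g, '/');
  if (normalized.includes('/tests/fixtures/')) return 'test-fixture-no-impact';
  const basename = normalized.split('/').pop();
  if (/.\.local\../.test(basename)) return 'affects-this-machine-only';
  return 'affects-everyone';
}

// Returns a new object; the input finding is never mutated.
function humanizeFindingSketch(finding) {
  const prefix = finding.id.split('-')[1]; // 'CA-CNF-001' -> 'CNF'
  return {
    ...finding,
    title: translateTitle(prefix, finding.title),
    userActionLanguage: ACTION_LANGUAGE[finding.severity] ?? 'FYI',
    relevanceContext: relevanceFor(finding.file),
  };
}
```

Spreading into a fresh object keeps the transform pure, which is what lets `--json` and `--raw` bypass it without touching scanner output.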

### Changed
- **Default output of all 18 commands** now uses plain-language descriptions. Findings group by user-impact category; titles lead with prose; technical IDs (`CA-CML-001`, `CA-TOK-005`, …) surface at end-of-line as references.
- **All 21 command and agent templates** updated to render humanized output by default and pass `--raw` through when the user requests v5.0.0 verbatim mode.
- **CLI flag inventory** — every CLI now accepts `--raw` (new) in addition to `--json` (existing, unchanged). `--output-file <path>` still writes raw v5.0.0-shape JSON regardless of mode (humanizer-bypassed, posture-specific).

### Migration
- **No action required for existing automation** that consumes `--json` — the JSON envelope shape is byte-stable with v5.0.0 and humanizer fields are bypassed in `--json` and `--raw` paths.
- **Tooling that scrapes stderr** from default mode (e.g., `posture.mjs`'s scorecard) needs review — default stderr now uses prose vocabulary. Pass `--raw` for byte-stable v5.0.0 verbatim stderr.
- **No scanner-internal changes.** Finding IDs, severity ladders, scoring weights, and area scorecards are unchanged. Upgrades are presentation-layer only.

### Test count
- 635 → 792 tests across 52 test files (+157 humanizer tests added in Waves 0–5).
- New top-level tests: `json-backcompat.test.mjs`, `raw-backcompat.test.mjs`, `scenario-read-test.test.mjs`, `snapshot-default-output.test.mjs`.
- New lib tests: `humanizer.test.mjs`, `humanizer-data.test.mjs`, `scoring-humanizer.test.mjs`.
- New scanner tests: `posture-humanizer.test.mjs`, `scan-orchestrator-humanizer.test.mjs`, `cli-humanizer.test.mjs`.

### Out of scope (deferred to v5.1.1+)
- **Posture `--output-file` humanization** — `posture.mjs` does not call `humanizeEnvelope`, so files written via `--output-file` are raw v5.0.0-shape JSON. Future revision: drop `--output-file` from command templates or add a `--humanized-json` flag.
- **Knowledge cross-references** (Step 17 of plan) — not delivered per user decision (2a).
- **Scoring scorecard JSON headline emission** — currently rendered prose-side only; command templates that want to skip stderr parsing would benefit.

### Verification
- 792/792 tests pass (`node --test 'tests/**/*.test.mjs'`)
- `node scanners/self-audit.mjs --json --check-readme` returns `configGrade: A` (97), `pluginGrade: A` (100), `readmeCheck.passed: true`
- README badge updated: `tests-635+` → `tests-792+`

## [5.0.0] - 2026-05-01

### Summary
Reality-based token-optimization release. v4.0.0 shipped Opus-4.7 token surfaces aligned to a Sonnet-era cost model; v5.0.0 rebuilds the foundations against verified Opus-4.7 cost dynamics. Three pillars: honest token estimation (severity-weighted scoring, MCP estimates 15 → 500+, optional `--accurate-tokens` API calibration), new structural scanners (cache-prefix stability, dead tool grants, plugin collisions), and new diagnostic surfaces (`/config-audit manifest`, `/config-audit tokens` extended, knowledge-base cleanup aligned to Opus 4.7 cache dynamics).

Consolidated from `5.0.0-alpha.1` (F1-F5 token-economy round), `5.0.0-alpha.2` (M1, M2, M4-M6, F6, F7 structural gaps + README self-audit), `5.0.0-beta.1` (N1-N4, N6 new scanners + manifest CLI), and `5.0.0-rc.1` (M7, M8 knowledge cleanup + N5 tokenizer calibration).

### Added
- **3 new scanners (9 → 12 deterministic):**
  - **CPS — Cache-Prefix Stability** (`CA-CPS-NNN`): volatile content in lines 31–150 of CLAUDE.md cascade, beyond TOK Pattern A's top-30 window. Volatile-pattern set extends Pattern A with shell-exec lines (`!` prefix) and `${VAR}` substitutions.
  - **DIS — Disabled-In-Schema** (`CA-DIS-NNN`): tools listed in BOTH `permissions.deny` AND `permissions.allow`. Tool identity uses bare name (`Bash(npm:*)` and `Bash` are the same tool). Severity low.
  - **COL — Cross-Plugin Skill Collision** (`CA-COL-001`): plugin-vs-plugin same skill name → low; user-vs-plugin → medium. `details.namespaces` payload identifies conflicting sources.
- **TOK extensions:**
  - **CA-TOK-005 MCP tool-schema budget:** per-server tiered finding (< 20 none, 20–49 low, 50–99 medium, 100+ high; null low + "tool count unknown"). Scoped to project-local `.mcp.json`.
  - **Pattern E — Oversized cascade:** medium when `activeConfig.claudeMd.estimatedTokens > 10_000`.
  - **Pattern F — Bloated SKILL.md description:** low when frontmatter `description > 500 chars` (loads every turn). Scoped to `discovery.files`.
- **`/config-audit manifest`** + `scanners/manifest.mjs` CLI — single ranked table of every system-prompt token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. CLAUDE.md per-file tokens distributed proportional to bytes.
- **`--accurate-tokens` flag** on `token-hotspots-cli.mjs` (N5): when `ANTHROPIC_API_KEY` is set, calls Anthropic's `count_tokens` for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When absent: `calibration = { skipped: 'no-api-key' }` plus stderr warning.
- **`scanners/lib/tokenizer-api.mjs`** — `count_tokens` wrapper. 5s AbortController timeout. Exponential backoff on 429 (3 retries: 1s/2s/4s). API key masked to `${key.slice(0,8)}...` in every error; HTTP body never included in errors (it may echo the key on auth failures). `maskKey()` exported.
- **`--with-telemetry-recipe` flag** on the same CLI (M7): emits `telemetry_recipe_path` field pointing to `knowledge/cache-telemetry-recipe.md`.
- **`knowledge/cache-telemetry-recipe.md`** (M7): manual `jq` recipe summing `cache_read_input_tokens` + `cache_creation_input_tokens` per turn from session transcripts. Hit-rate interpretation table.
- **`'mcp'` kind on `estimateTokens`** (F2): active MCP servers estimate ≥ 500 tokens (base + schema overhead) instead of v4's flat 15. Optional `{toolCount}` raises to `500 + toolCount × 200`.
- **MCP tool-count detection** (M1): `readActiveMcpServers` resolves count via cache → `node_modules/<pkg>/package.json` → `{toolCount: null, toolCountUnknown: true}` fallback.
- **`additionalDirectories` settings key** (M6): added to `KNOWN_KEYS`; new low-severity finding when length > 2.
- **HKV verbose hook output** (M5): low-severity finding when referenced hook script contains > 50 `console.log`/`process.stdout.write` lines (static, no execution).
- **`self-audit --check-readme` flag** (F6): filesystem counts compared against README badges. Helper `checkReadmeBadges(pluginDir)`. Step 28 of v5 plan reconciled all badges.
- **`scoringVersion: 'v5'`** field on `scoreByArea` output for cross-version drift detection.
- **`WEIGHTS`** named export from `scanners/lib/severity.mjs` (frozen).
- **`details` field on findings** (`output.mjs:finding()`): optional structured payload for scanner-specific data (used by COL).
- **Plugin Hygiene** as 10th quality area (from COL). Posture JSON now reports 10 areas.
- **TOK-readActiveConfig integration** (F1): one hotspot per active MCP server; `result.activeConfig` summary (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount); try/catch fallback when scope-limited.
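
The MCP estimate and the CA-TOK-005 tiers listed above reduce to two small functions. The sketch below uses hypothetical helper names (`estimateMcpTokens`, `mcpBudgetSeverity`) and assumes that a server with an unknown tool count is estimated at the 500-token base; it is not the project's `estimateTokens` implementation.

```javascript
// Formula from this changelog: 500 base tokens, plus 200 per tool
// when the tool count is known. Returning the bare base for an
// unknown count is an assumption.
function estimateMcpTokens(toolCount) {
  if (toolCount == null) return 500;
  return 500 + toolCount * 200;
}

// CA-TOK-005 tiers: < 20 no finding, 20-49 low, 50-99 medium,
// 100+ high, unknown count -> low ("tool count unknown").
function mcpBudgetSeverity(toolCount) {
  if (toolCount == null) return 'low';
  if (toolCount < 20) return null;
  if (toolCount < 50) return 'low';
  if (toolCount < 100) return 'medium';
  return 'high';
}

// Tool counts mirror the mcp-budget fixture names.
for (const count of [14, 25, 60, 120, null]) {
  console.log(count, estimateMcpTokens(count), mcpBudgetSeverity(count));
}
```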

### Changed
- **F3 — `scoreByArea` is severity-weighted.** Penalty = `Σ count[s] × WEIGHTS[s]`; `passRate = max(0, 100 − penalty / max(10, findingCount × 4) × 100)`. Lows no longer crater an area's grade; criticals/highs do. `baseline-all-a` fixture remains all-A (no critical/high present).
- **F7 — TOK pattern severities recalibrated** for tokens-per-turn impact: Pattern A `medium → high`, Pattern B `low → medium`, Pattern C `medium → low`. Each finding carries a `calibration_note` evidence field documenting the heuristic basis.
- **`scoreByArea` deduplicates by area name** (N3 prep): TOK + CPS share "Token Efficiency"; SET + DIS share "Settings". Combined row with merged finding counts.
- **M8 (knowledge cleanup):** replaced "Keep CLAUDE.md under 200 lines" in `knowledge/configuration-best-practices.md` with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Footnote explains the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references `knowledge/opus-4.7-patterns.md`.
- **`commands/tokens.md` next-steps:** documents `--with-telemetry-recipe` as the cache-verification path.
- **Scanner count: 9 → 12.** Command count: 17 → 18. Knowledge: 7 → 8. Quality areas: 8 → 10.
- **`.gitignore`** — unignore rules for `tests/fixtures/**/node_modules/` so the `mcp-tool-heavy` fixture stays under version control.
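
To make the F3 formula concrete, here is a minimal sketch of the severity-weighted pass rate. The weight values are placeholders chosen for the example (the real `WEIGHTS` export is not listed in this changelog), and `passRate` is a hypothetical helper name, so the numbers it produces are illustrative only.

```javascript
// Placeholder weights: the real values live in scanners/lib/severity.mjs
// and are not listed in this changelog.
const WEIGHTS = Object.freeze({ critical: 10, high: 5, medium: 2, low: 1, info: 0 });

// counts maps severity -> number of findings, e.g. { high: 1, low: 3 }.
function passRate(counts) {
  let penalty = 0;
  let findingCount = 0;
  for (const [severity, count] of Object.entries(counts)) {
    penalty += count * (WEIGHTS[severity] ?? 0);
    findingCount += count;
  }
  // The budget never drops below 10, so a lone low finding costs little.
  const budget = Math.max(10, findingCount * 4);
  return Math.max(0, 100 - (penalty / budget) * 100);
}

console.log(passRate({ low: 3 }));      // 75 with these placeholder weights
console.log(passRate({ high: 1 }));     // 50
console.log(passRate({ critical: 2 })); // 0 (clamped)
```

With any weights of this shape, a handful of lows is diluted by the growing budget, while a single critical can exhaust it.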

### Removed
- **F4 — TOK hotspot padding loop and `take` dead-code.** Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.
- **F5 — Pattern D / `CA-TOK-004` (sonnet-era signature).** Catalogue entry removed from `knowledge/opus-4.7-patterns.md` and `commands/tokens.md`. Suppression entries for `CA-TOK-004` are now no-ops.

### Breaking changes
- **F2 — MCP token estimates jump from flat 15 to ≥ 500.** Token Efficiency grades for projects with MCP servers may shift. `whats-active` totals report higher numbers. Documented in `commands/posture.md` next-steps.
- **F3 — `scoreByArea` is severity-weighted.** Posture JSON consumers reading `areas[*].score` will see different values for non-clean configs. Use `result.scoringVersion === 'v5'` to detect the change. Drift comparisons across v4↔v5 baselines may show artificial deltas — re-baseline after upgrade.
- **F5 — Pattern D / `CA-TOK-004` no longer emitted.** Existing exact `CA-TOK-004` suppression entries are harmless but obsolete.
- **N1 suppression backward-compat — `CA-TOK-*` glob now also matches `CA-TOK-005`.** To preserve prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
  ```
  CA-TOK-001
  CA-TOK-002
  CA-TOK-003
  ```
  A one-time runtime warning for this case is a v5.0.1 candidate.
- **Posture areas count: 9 → 10** (Plugin Hygiene from COL). Consumers hard-coding 9 must update.
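
The N1 suppression change is easiest to see with a small glob matcher. This sketch assumes `*` is the only wildcard and matches any run of characters; `globToRegExp` and `isSuppressed` are hypothetical names, not the plugin's suppression code.

```javascript
// Convert a suppression entry to an anchored RegExp.
// Assumption: "*" is the only wildcard and matches any run of characters.
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*');                 // expand the wildcard
  return new RegExp(`^${escaped}$`);
}

function isSuppressed(findingId, entries) {
  return entries.some((entry) => globToRegExp(entry).test(findingId));
}

const globEntries = ['CA-TOK-*'];
const explicitEntries = ['CA-TOK-001', 'CA-TOK-002', 'CA-TOK-003'];

console.log(isSuppressed('CA-TOK-005', globEntries));     // true: the glob now swallows the MCP budget finding
console.log(isSuppressed('CA-TOK-005', explicitEntries)); // false: explicit IDs leave it visible
```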

### Migration notes
- `CA-TOK-*` glob suppressions: explicit-ID list recommended if CA-TOK-005 should not be suppressed.
- `CA-TOK-004` exact-ID suppression entries: safe to remove.
- Drift baselines created against v4 should be re-saved post-upgrade to avoid artificial F3 weighting deltas.
- Posture JSON consumers must update any hardcoded `areas.length === 8` or `=== 9` assertions to `>= 10`.

### Tests
- 543 → 635 (+92): F1-F7 (alpha rounds = +43), N1-N4 + N6 (beta = +39), M7 + M8 + N5 (rc = +10). 36 test files (12 lib + 23 scanner + 1 hook).
- New fixtures: `tok-active-config/`, `additional-dirs-many/`, `additional-dirs-ok/`, `large-cascade/`, `small-cascade/`, `skill-bloated/`, `skill-tight/`, `mcp-tool-heavy/` (with mocked `node_modules/`), `hooks-verbose/`, `hooks-quiet/`, `readme-desynced/`, `mcp-budget/{14,25,60,120,unknown}-tools/`, `volatile-mid-section/{volatile-line-60,volatile-line-200}/`, `denied-tools-in-schema/`, `collision-plugins/fake-home/` (plugin-a + plugin-b + plugin-c + user-level review skill).
- New test files: `tests/scanners/manifest.test.mjs`, `tests/scanners/cache-prefix.test.mjs`, `tests/scanners/disabled-in-schema.test.mjs`, `tests/scanners/collision.test.mjs`, `tests/scanners/accurate-tokens.test.mjs`.

### Notes
- **`mock.method` against ESM module exports does not work** (Node 18+ ESM read-only export bindings). v5 tests use `globalThis.fetch` mocking for `--accurate-tokens` instead — equivalent coverage at the actual external-dependency boundary.
- **Plugin-vs-built-in collision detection is intentionally not implemented.** Step 22a research spike (`docs/v5-namespace-research.md`, gitignored) could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in. Treated as info-only; v5.0.1 candidate.
- **README/CLAUDE.md badge reconciliation** done in Step 28 (this release). `self-audit --check-readme` PASSES against the filesystem. Test count counter switched from file-count to test-case count via subprocess `node --test` parse.
- **`hotspot.path` exposed on file-backed hotspots** (Step 30 fix). The rc.1 `--accurate-tokens` implementation looked up `hotspot.path` but the scanner only emitted `source`. File-backed hotspots now carry `path` (absolute path); MCP-server hotspots leave it unset (they are virtual entries representing runtime tool-schema cost, not file content).

### SC-6b release-gate result (verified 2026-05-01)
- **PASS: the estimate lands within 0.85% of the real `count_tokens` API result.**
- Fixture: `tests/fixtures/marketplace-large/`. Top-3 hotspots = 1 file-backed (`CLAUDE.md`) + 2 MCP virtuals. MCP entries skipped per design (no readable content; their tokens are formula-based at 500 + toolCount × 200).
- `CLAUDE.md` actual: 589 tokens (Anthropic `count_tokens`, `claude-opus-4-7`). Estimated: 594 tokens (byte heuristic at 4 bytes/token via `estimateTokens`). Delta (actual minus estimated): **−5 tokens, −0.85%**, i.e. the heuristic over-estimates by 5 tokens, well within the ±5% gate.
- No tuning of `estimateTokens` heuristic required for v5.0.0.
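
The gate arithmetic can be reproduced in a few lines. The helper name `calibrationDelta` is hypothetical; the sketch assumes the delta is actual minus estimated, expressed relative to the actual count, which is the reading that yields the reported −0.85%.

```javascript
// Delta is actual minus estimated, relative to the actual count.
// gatePercent defaults to the ±5% release gate.
function calibrationDelta(actualTokens, estimatedTokens, gatePercent = 5) {
  const deltaTokens = actualTokens - estimatedTokens;
  const deltaPercent = (deltaTokens / actualTokens) * 100;
  return {
    deltaTokens,
    deltaPercent,
    withinGate: Math.abs(deltaPercent) <= gatePercent,
  };
}

const result = calibrationDelta(589, 594);
console.log(result.deltaTokens, result.deltaPercent.toFixed(2), result.withinGate);
// prints: -5 -0.85 true
```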

## [5.0.0-rc.1] - 2026-05-01

### Summary
Release candidate for v5.0.0: knowledge cleanup and tokenizer calibration. Three deliverables: M8 (Sonnet-era → Opus 4.7 best-practices rewrite), M7 (cache-telemetry recipe in `knowledge/` plus an opt-in CLI flag), and N5 (`--accurate-tokens` API calibration via Anthropic's `count_tokens` endpoint).

### Added
- **N5 — `--accurate-tokens` flag** on `scanners/token-hotspots-cli.mjs`. When `ANTHROPIC_API_KEY` is set, the CLI calls Anthropic's `count_tokens` endpoint for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When the key is absent, `calibration = { skipped: 'no-api-key' }` and a stderr warning is emitted. Designed for the manual SC-6b release-gate verification, not routine use.
- **`scanners/lib/tokenizer-api.mjs`** — wrapper around `count_tokens` with a 5-second AbortController timeout, exponential-backoff retry on HTTP 429 (max 3 retries: 1s, 2s, 4s), and required headers (`x-api-key`, `anthropic-version: 2023-06-01`, `content-type`). API key is masked to `${key.slice(0,8)}...` in every error message and every thrown error; non-429 HTTP errors throw status code only — response body is never included (it may echo the key on auth failures). `maskKey()` is exported for callers that need safe logging.
- **M7 — `knowledge/cache-telemetry-recipe.md`** (new). Manual `jq` recipe for verifying prompt-cache hit rate from Claude Code session transcripts (`~/.claude/projects/<slug>/*.jsonl`). Sums `cache_read_input_tokens` and `cache_creation_input_tokens` per turn and reports a hit-rate ratio. Recipe-form (not bundled scanner) keeps the project's "no transcript-parsing as core feature" non-goal intact while giving users a runtime escape hatch.
- **M7 — `--with-telemetry-recipe` flag** on the same CLI. When passed, emits `telemetry_recipe_path` in the JSON output pointing to the recipe file. Without the flag, output is unchanged. Committed as a default deliverable, opt-in at invocation time.

### Changed
- **M8 (knowledge-base cleanup):** replaced the "Keep CLAUDE.md under 200 lines" rule in `knowledge/configuration-best-practices.md` with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Added a footnote that the 200-line rule was a Sonnet-era adherence heuristic; Opus 4.7 uses prompt-cache structure as the dominant cost lever. Cross-references `knowledge/opus-4.7-patterns.md`.
- **`commands/tokens.md` next-steps:** documents `--with-telemetry-recipe` as the cache-verification path after a structural fix.

### Tests
- 625 → 635 (+10): `--with-telemetry-recipe` (×2), tokenizer-api unit tests (×6 — masking, body-leak protection, AbortController signal, 429 retry, header set, fetch mock happy path), `--accurate-tokens` no-key subprocess test (×1), absent-flag negative test (×1).
- New file: `tests/scanners/accurate-tokens.test.mjs`. No new fixtures (re-uses `marketplace-large`).

### Notes
- **SC-6b release gate is NOT closed by these commits.** Step 26's tests use mocked `globalThis.fetch` to verify the integration contract; ±5% accuracy against real `count_tokens` requires a live API key and must be verified manually before tagging v5.0.0 in Session 5.
- The plan's specified `mock.method(tokenizerApi, 'callCountTokensApi', ...)` pattern collides with ESM read-only export bindings in Node 18+. Tests mock at the `globalThis.fetch` boundary instead — equivalent coverage, no module-export rebinding required.
- README/CLAUDE.md badge counts and `plugin.json` version still target v4.0.0; Step 28+29 will sync those during the release wrap.
- `[skip-docs]` tag on the N5 feat commit; M7 and M8 are `docs(...)` commits and don't need it.

## [5.0.0-beta.1] - 2026-05-01

### Summary
First v5.0.0 beta — new scanners. Five new finding sources land: MCP tool-schema budget (CA-TOK-005), system-prompt manifest CLI/command (`/config-audit manifest`), cache-prefix stability (CPS), disabled-tools-still-in-schema (DIS), and cross-plugin/user-vs-plugin skill collision (COL/CA-COL-001). Plugin Hygiene becomes a 10th area-scorecard column.

### Added
- **N1 — `CA-TOK-005` MCP tool-schema budget:** per-server tiered finding inside the TOK scanner. Thresholds — `< 20` no finding, `20–49` low, `50–99` medium, `100+` high; `null` (manifest unparseable) low + "tool count unknown" message. Scoped to project-local `.mcp.json` to keep `/config-audit <path>` actionable. Recommendation links to the Step 25 cache-telemetry recipe.
- **N2 — `/config-audit manifest`:** new slash command + `scanners/manifest.mjs` CLI. Renders a single ranked table of every token source (CLAUDE.md cascade, plugins, skills, MCP servers, hooks) sorted DESC by `estimated_tokens`. Reuses `readActiveConfig`; CLAUDE.md per-file tokens are distributed proportional to bytes.
- **N3 — CPS scanner (`CA-CPS-NNN`):** Cache-Prefix Stability Analyzer. Walks the CLAUDE.md cascade and flags volatile content between lines 31 and 150 — beyond TOK Pattern A's top-30 territory. Volatile-pattern set extends Pattern A with shell-exec lines (`!` prefix) and `${VAR}` substitutions. Severity medium per finding. Skips lines 1–30 (Pattern A's range).
- **N4 — DIS scanner (`CA-DIS-NNN`):** Disabled-In-Schema Detector. Detects tools that appear in BOTH `permissions.deny` and `permissions.allow` within the same `settings.json`. The deny list wins, so allow entries are dead config but still load every turn. Tool identity is the bare name (everything before `(`); `Bash(npm:*)` and `Bash` are treated as the same tool. Severity low.
- **N6 — COL scanner (`CA-COL-001`):** Cross-Plugin Skill Collision detector. Plugin-vs-plugin same skill name → low. User-vs-plugin same skill name → medium. Findings carry `details.namespaces` array with `{source, name, path}` for every conflicting source.
- **`details` field on findings:** `output.mjs:finding()` helper now passes through optional `details` for scanner-specific structured payloads (used by COL).
- **"Plugin Hygiene" area** (10th in scorecard): COL contributes here. Posture JSON now reports 10 areas instead of 9.
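
The DIS bare-name rule can be illustrated with a short sketch. The function names are hypothetical and the input is assumed to be the `permissions` object from a single `settings.json`; the actual scanner may differ in detail.

```javascript
// Tool identity is everything before the first "(".
function bareToolName(entry) {
  const paren = entry.indexOf('(');
  return (paren === -1 ? entry : entry.slice(0, paren)).trim();
}

// Allow entries whose tool is also denied are dead config: deny wins.
function findDeadAllowEntries(permissions) {
  const denied = new Set((permissions.deny ?? []).map(bareToolName));
  return (permissions.allow ?? []).filter((entry) => denied.has(bareToolName(entry)));
}

const permissions = {
  allow: ['Bash(npm:*)', 'Read'],
  deny: ['Bash'],
};
console.log(findDeadAllowEntries(permissions)); // [ 'Bash(npm:*)' ]
```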

### Changed
- **`scoreByArea` deduplicates by area name:** when multiple scanners share an area (TOK + CPS → "Token Efficiency", SET + DIS → "Settings"), they produce one combined row with merged finding counts. Existing 9-area contract preserved for non-Plugin-Hygiene areas.

### Known breaking changes
- **Suppression backward-compat — `CA-TOK-*` glob now also matches `CA-TOK-005`.** Existing `.config-audit-ignore` entries that suppress TOK findings via the `CA-TOK-*` glob will silently include CA-TOK-005 (MCP budget). To preserve the prior behavior of suppressing only patterns A/B/C, replace the glob with explicit IDs:
  ```
  CA-TOK-001
  CA-TOK-002
  CA-TOK-003
  ```
  A one-time runtime warning for this case is out of scope for v5.0.0 — it is a candidate for v5.0.1.
- **Plugin-vs-built-in collision is intentionally not implemented.** The Step 22a research spike could not verify Claude Code's resolution behavior when a plugin command shares a name with a built-in (`/help`, `/clear`, `/init`, `/review`, `/config`, `/cost`, `/security-review`). Treated as info-only in this release; a follow-up v5.0.1 ticket may add an opt-in check.

### Tests
- 586 → 625 (+39): N1 (×7), N2 (×11), N3 (×7), N4 (×6), N6 (×8).
- New fixtures: `mcp-budget/{14,25,60,120,unknown}-tools/`, `volatile-mid-section/{volatile-line-60,volatile-line-200}/`, `denied-tools-in-schema/`, `collision-plugins/fake-home/` (plugin-a + plugin-b + plugin-c + user-level review skill).

### Notes
- `[skip-docs]` tag used on every feat commit — README/CLAUDE.md badge counts (scanner count, command count, test count) and the architecture sections are intentionally fenced off until Session 5 (Step 28). This keeps the v5 plan's session boundaries clean even when the Forgejo `pre-commit-docs-gate` hook would otherwise block these commits.

## [5.0.0-alpha.2] - 2026-05-01

### Summary
Second v5.0.0 alpha — structural gaps + README self-audit. TOK pattern severities recalibrated for tokens/turn impact (F7), three new findings cover settings/skills/cascade structure (M2, M4, M6), MCP tool-count detection wired (M1), HKV gains a verbose-output check (M5), and self-audit grows a `--check-readme` flag (F6).

### Added
- **F7 — TOK severity recalibration:** Pattern A (cache-breaking volatile top) `medium → high`, Pattern B (redundant permissions) `low → medium`, Pattern C (deep imports) `medium → low`. Each finding now carries a `calibration_note` evidence field documenting the heuristic basis.
- **M6 — `additionalDirectories` settings key:** added to `KNOWN_KEYS` so it no longer trips "unknown settings key". New low-severity finding when `additionalDirectories.length > 2`.
- **M4 — TOK Pattern E:** medium-severity finding when `activeConfig.claudeMd.estimatedTokens > 10_000` — flags cascades that bleed budget every turn.
- **M2 — TOK Pattern F:** low-severity finding for project-local `SKILL.md` whose frontmatter `description` exceeds 500 characters (description loads on every turn even when the body does not). Scoped to `discovery.files`; user/plugin skills out of project scope are not flagged.
- **M1 — MCP tool-count detection:** `readActiveMcpServers` now resolves tool count via cache → `node_modules/<pkg>/package.json` → `{toolCount: null, toolCountUnknown: true}` fallback. Tool count drives `estimateTokens` per server.
- **M5 — HKV verbose hook output:** new low-severity finding when a referenced hook script contains > 50 `console.log` / `process.stdout.write` lines (static heuristic, no execution).
- **F6 — `self-audit --check-readme` flag:** filesystem counts (scanners, commands, agents, hooks, tests, knowledge) compared against README badge values. Helper export: `checkReadmeBadges(pluginDir)`.

### Changed
- **TOK severities** (F7) — see Added. Posture aggregates that depended on Pattern A being `medium` will now reflect the higher-impact rating.
- **`.gitignore`** — added unignore rules so `tests/fixtures/**/node_modules/` are tracked. Required by the `mcp-tool-heavy` fixture.

### Tests
- 563 → 586 (+23): F7 table-driven (×6), M6 (×3), M4 (×2), M2 (×2), M1 (×4), M5 (×2), F6 (×4).
- New fixtures: `additional-dirs-many/`, `additional-dirs-ok/`, `large-cascade/`, `small-cascade/`, `skill-bloated/`, `skill-tight/`, `mcp-tool-heavy/` (with mocked `node_modules/`), `hooks-verbose/`, `hooks-quiet/`, `readme-desynced/`.

### Notes
- `result.readmeCheck.passed === true` is **not** required during alpha/beta phases. The real plugin's own check is currently red (`scanners` 10 vs README 9, `tests` 31 vs README 543) — reconciliation deferred to Session 5 Step 28 (README sync).
- `[skip-docs]` tag used on every commit — README/CLAUDE.md badge counts and architecture text are intentionally fenced off until Session 5.

## [5.0.0-alpha.1] - 2026-05-01

### Summary
First v5.0.0 alpha — token-economy round, F1-F5. The TOK scanner now consumes `readActiveConfig` (per-MCP-server hotspots, claudeMd cascade tokens), severity weighting replaces flat finding counts in `scoreByArea`, and MCP servers no longer estimate at a flat 15 tokens. Pattern D (CA-TOK-004 sonnet-era signature) removed — too noisy, not actionable.

### Added
- **`'mcp'` kind for `estimateTokens`** (F2): an active MCP server now estimates ≥ 500 tokens (base protocol + schema overhead) instead of the v4 flat 15. Optional `{toolCount}` raises the estimate to `500 + toolCount * 200` once Step 14 wires tool-count detection.
- **TOK ↔ readActiveConfig integration** (F1): the TOK scanner emits one hotspot per active MCP server, sums their tokens into `total_estimated_tokens`, and exposes `result.activeConfig` (claudeMd cascade tokens, mcpServerCount, pluginCount, skillCount).
- **`scoringVersion: 'v5'`** field on `scoreByArea` output for cross-version drift detection.
- **`WEIGHTS`** named export from `scanners/lib/severity.mjs` (`Object.freeze`).

### Changed
- **BREAKING (intentional, F3):** `scoreByArea` is now severity-weighted. Penalty = `Σ count[s] * WEIGHTS[s]`; `passRate = max(0, 100 - penalty / max(10, findingCount * 4) * 100)`. Lows no longer crater an area's grade; a single high or critical consumes a large fraction of budget. `baseline-all-a` fixture remains all-A (no critical/high on that fixture).
- **BREAKING (intentional, F2):** MCP server token estimates jump from a flat 15 to ≥ 500. `whats-active` totals and TOK hotspots will report higher numbers for any project with active MCP servers.
- **BREAKING (intentional, F5):** Pattern D / `CA-TOK-004` (sonnet-era signature) is no longer emitted. Suppression entries for `CA-TOK-004` are now no-ops; downstream tools that filter on the ID should drop it. The catalogue entry was removed from `knowledge/opus-4.7-patterns.md` and `commands/tokens.md`.
- **Hotspots contract (F4):** the v4 padding loop and `take` dead-code are gone. Hotspots may now contain fewer than 3 entries for tiny projects (the honest answer); contract still bounds at ≤ 10.

### Migration notes
- `CA-TOK-*` glob suppression entries continue to suppress 001-003. Existing exact `CA-TOK-004` entries are harmless but obsolete — remove them at convenience.
- Posture/JSON consumers reading `areas[*].score` will see different values for non-clean configs. Use `result.scoringVersion === 'v5'` to detect.

### Tests
- 543 → 563 across the alpha.1 commits (+9 severity-weighting/scoring, +4 estimateTokens 'mcp', +1 MCP caller migration, +3 readActiveConfig integration, +2 hotspots-uniqueness, +2 sonnet-era zero-finding).
- New fixture `tests/fixtures/tok-active-config/` — minimal repo with `.mcp.json` (2 servers), `CLAUDE.md`, plugin skeleton.

## [4.0.0] - 2026-04-19

### Summary
Opus 4.7 era upgrade. New TOK scanner detects token-efficiency anti-patterns (cache-breaking volatile content, redundant tool permissions, deep import chains, sonnet-era minimal setups). Token Efficiency joins the quality scorecard as the 8th area. Scanner-agent and verifier-agent migrate from haiku to sonnet per the global no-haiku policy.

### Added
- **`token-hotspots.mjs`** scanner (CA-TOK-001..004) — 4 patterns aligned with Opus 4.7 token-cost dynamics:
  - CA-TOK-001 cache-breaking volatile content (timestamps/UUIDs in top 30 lines of CLAUDE.md)
  - CA-TOK-002 redundant tool permissions (duplicate or subset overlaps)
  - CA-TOK-003 deep @import chains (>2 hops on the load path)
  - CA-TOK-004 sonnet-era minimal setup (no skills/MCP/hooks/managed/plugins)
- **`/config-audit tokens [path] [--global]`** — ranked hotspot table + per-pattern findings.
- **`scanners/token-hotspots-cli.mjs`** — standalone CLI emitting `total_estimated_tokens`, `hotspots`, and per-finding output.
- **Token Efficiency** as the 8th quality area in the posture scorecard (now 9 scanners total: CML/SET/HKV/RUL/MCP/IMP/CNF/GAP/TOK).
- `id` field on every area in the scorecard payload (`token_efficiency`, `instruction_clarity`, etc.) for stable downstream lookup.
- 13 new TOK scanner tests + 3 CLI tests + posture grade-stability test for `token_efficiency`.
- Knowledge refresh: `knowledge/opus-4.7-patterns.md`, plus 2026-04 deltas (v2.1.83–v2.1.111) added to `feature-evolution.md`, `claude-code-capabilities.md`, and `hook-events-reference.md` from `research/03-claude-code-changes-config-surfaces.md`.

### Changed
- **BREAKING (additive surface):** Quality areas count 7 → 8. Posture JSON consumers that hard-coded 7 areas must update.
- **BREAKING (model migration):** `scanner-agent` and `verifier-agent` migrated `haiku` → `sonnet`. The latency and cost trade-offs are accepted; deterministic scanner CLIs remain preferred over agent invocations.
- Scanner count: 8 → 9 (TOK added).
- Command count: 16 → 17 (`/config-audit tokens` added).
- Version bump: `3.1.0` → `4.0.0`.
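
For consumers affected by the 7 → 8 change, a lookup keyed on the area `id` avoids depending on position or count. The sketch below assumes a payload shaped like `{ areas: [{ id, grade, score }] }`; that shape, the `findArea` helper, and the sample values are illustrative assumptions, not the documented schema.

```javascript
// Assumed (hypothetical) payload shape: { areas: [{ id, grade, score }, ...] }.
// Only the existence of an `id` on each area comes from the changelog.
function findArea(scorecard, id) {
  const area = scorecard.areas.find((candidate) => candidate.id === id);
  return area ?? null; // null when the area is absent (e.g. older payloads)
}

// Sample data for illustration only.
const sampleScorecard = {
  areas: [
    { id: 'instruction_clarity', grade: 'A', score: 95 },
    { id: 'token_efficiency', grade: 'B', score: 82 },
  ],
};

// Brittle: sampleScorecard.areas[7] breaks whenever areas are added or reordered.
// More robust: resolve by id and handle the missing case explicitly.
const tokenArea = findArea(sampleScorecard, 'token_efficiency');
console.log(tokenArea ? tokenArea.grade : 'area not present');
```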

## [3.1.0] - 2026-04-14

### Summary
New read-only command `/config-audit whats-active` — shows exactly what Claude Code loads for a given repo, with token estimates.

### Added
- **`/config-audit whats-active [path]`** — inventory of active plugins, skills, MCP servers, hooks, and CLAUDE.md cascade for a repo, with source attribution (user/project/plugin) and rough token estimates. Read-only, <2s.
- `scanners/lib/active-config-reader.mjs` — pure async helper: `readActiveConfig()`, `detectGitRoot()`, `walkClaudeMdCascade()`, `readClaudeJsonProjectSlice()` (longest-prefix matching), `enumeratePlugins()`, `enumerateSkills()`, `readActiveHooks()`, `readActiveMcpServers()`, `estimateTokens()`.
- `scanners/whats-active.mjs` — thin CLI shim supporting `--json`, `--output-file`, `--verbose`, `--suggest-disables`.
- Optional `--suggest-disables` flag surfaces deterministic disable candidates (disabled MCP servers, zero-item plugins, unreferenced plugins, orphan skills) and invites an LLM judgment pass in the command.
- 36 new tests in `tests/lib/active-config-reader.test.mjs`, plus a `rich-repo` tmpdir fixture helper.
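
The longest-prefix matching mentioned for `readClaudeJsonProjectSlice()` can be sketched as follows. This is an assumption-laden illustration: `longestPrefixKey` is a hypothetical helper, it assumes project keys are POSIX-style absolute paths, and it only matches on whole path segments.

```javascript
// Hypothetical helper; not the implementation in active-config-reader.mjs.
// Picks the key that is the longest path-segment prefix of `target`.
function longestPrefixKey(keys, target) {
  let bestKey = null;
  let bestLength = -1;
  for (const key of keys) {
    // Normalise a trailing slash so "/a/b/" and "/a/b" behave the same.
    const base = key.length > 1 && key.endsWith('/') ? key.slice(0, -1) : key;
    const boundary = base === '/' ? '/' : base + '/';
    // Require a segment boundary so "/repo" does not match "/repo-other".
    const isPrefix = target === base || target.startsWith(boundary);
    if (isPrefix && base.length > bestLength) {
      bestKey = key;
      bestLength = base.length;
    }
  }
  return bestKey;
}

const projectKeys = ['/home/u', '/home/u/repo', '/home/u/repo-other'];
console.log(longestPrefixKey(projectKeys, '/home/u/repo/src')); // /home/u/repo
```

Matching on segment boundaries rather than raw string prefixes is the design choice that keeps sibling directories with a shared name prefix from shadowing each other.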

### Changed
- Version bump: `3.0.1` → `3.1.0` (minor, additive feature, no breaking changes).
- Command count: 15 → 16.

## [3.0.1] - 2026-04-04

### Summary
Cross-platform fix — scanners, hooks, and lib now work correctly on Windows.

### Fixed
- `file-discovery.mjs`: depth calculation and agent/command/plugin path matching now use `path.sep`
- `scan-orchestrator.mjs`: fixture-path filtering now uses `path.sep`
- `post-edit-verify.mjs`: rules-dir regex handles both `/` and `\` separators
- `auto-backup-config.mjs`: rules-dir detection now uses `path.sep`
- `import-resolver.mjs`: circular import display uses `basename()`, `/tmp` fallback replaced with `os.tmpdir()`
- `string-utils.mjs`: `normalizePath` trailing separator regex handles both `/` and `\`
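
The two techniques behind these fixes, splitting on `path.sep` and writing regexes that accept both separators, look roughly like this. `depthBelow` and `RULES_DIR_RE` are hypothetical names for illustration and are not taken from the plugin's source.

```javascript
import path from 'node:path';

// Hypothetical helper: depth of `file` below `root`, counted in path segments.
// `p` defaults to the host platform's path module; passing path.win32 or
// path.posix makes the behaviour testable on any operating system.
function depthBelow(root, file, p = path) {
  const relative = p.relative(root, file);
  return relative === '' ? 0 : relative.split(p.sep).length;
}

// Hypothetical pattern: matches a `.claude/rules/` directory whether the
// path uses forward slashes or backslashes.
const RULES_DIR_RE = /[\\/]\.claude[\\/]rules[\\/]/;

console.log(depthBelow('/repo', '/repo/.claude/rules/style.md', path.posix)); // 3
console.log(depthBelow('C:\\repo', 'C:\\repo\\.claude\\rules\\style.md', path.win32)); // 3
console.log(RULES_DIR_RE.test('C:\\repo\\.claude\\rules\\style.md')); // true
```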

### Added
- 4 cross-platform path tests (total 486 tests)

## [3.0.0] - 2026-04-04

### Summary
Health redesign — configuration health is now quality-only. Feature utilization removed from grades entirely.

### Changed
- **Health = quality only.** 7 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF) determine your grade. Feature Coverage is no longer a graded area.
- **Feature recommendations are opt-in.** Unused features shown as "opportunities" via `/config-audit feature-gap`, grouped by impact (high/medium/explore), backed by Anthropic docs. No more "Feature Coverage: F" for correct minimal setups.
- **Posture output redesigned.** Shows `Health: {grade} ({score}/100)` with 7 quality areas. Removed utilization %, maturity level, segment label.
- **Feature-gap is interactive.** Users select recommendations to implement directly — no manual file editing required. Backup created automatically.
- **avgScore bug fixed.** Grade letter and displayed score now computed from the same population (quality areas only).

### Added
- `generateHealthScorecard()` in scoring.mjs — quality-only scorecard
- `opportunitySummary()` in feature-gap-scanner.mjs — groups findings by impact tier
- `opportunityCount` field in posture JSON output
- "Official Configuration Guidance" section in knowledge base (Anthropic docs, proven impacts)
- 21 new tests (total 482 across 27 test files)
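
Grouping opportunities by impact tier, as `opportunitySummary()` is described as doing, could look like the sketch below. The `impact` field on each finding, the `groupByImpact` name, and the fallback to `explore` for unknown tiers are all assumptions made for illustration.

```javascript
// Hypothetical sketch; the real opportunitySummary() may be shaped differently.
// The three tier names come from the changelog entry above.
const IMPACT_TIERS = ['high', 'medium', 'explore'];

// Buckets findings by their (assumed) `impact` field. Findings with a missing
// or unrecognised tier fall back to "explore"; that fallback is an assumption.
function groupByImpact(findings) {
  const groups = Object.fromEntries(IMPACT_TIERS.map((tier) => [tier, []]));
  for (const finding of findings) {
    const tier = IMPACT_TIERS.includes(finding.impact) ? finding.impact : 'explore';
    groups[tier].push(finding);
  }
  return groups;
}

// Sample data for illustration only.
const sampleOpportunities = [
  { id: 'gap-1', impact: 'high' },
  { id: 'gap-2', impact: 'explore' },
  { id: 'gap-3', impact: 'high' },
  { id: 'gap-4' },
];

const grouped = groupByImpact(sampleOpportunities);
console.log(Object.fromEntries(IMPACT_TIERS.map((tier) => [tier, grouped[tier].length])));
```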

### Removed
- `S2-PROMPT.md` and `V2-ANNOUNCEMENT.md` — v2 development artifacts
- Utilization %, maturity level, segment label from posture terminal output and reports
- Feature Coverage row from area breakdown tables
- "Top Actions" sourced from GAP findings (replaced by opportunities pointer)

### Backward Compatibility
- JSON output preserves all legacy fields (utilization, maturity, segment) for programmatic consumers
- Drift baselines unaffected — GAP findings still present in envelopes
- All existing exports maintained (`calculateUtilization`, `determineMaturityLevel`, etc.)

## [2.2.0] - 2026-04-04

### Summary
UX quality fix — fixture filtering, session path migration, output polish.

### Added
- Automatic test-fixture filtering in scan-orchestrator: findings from `tests/`, `examples/`, `__tests__/` excluded from grades, stored in `env.fixture_findings`
- `--include-fixtures` CLI flag for scan-orchestrator and posture to override filtering
- `scan-orchestrator.test.mjs` — 20 new tests for fixture filtering and `isFixturePath`
- Legacy session path detection in cleanup command
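
As a rough illustration of the fixture filtering described above, the sketch below separates graded findings from fixture findings. The name `isFixturePath` appears in the changelog, but this body is a guess at its behaviour; the `file` field on findings and the `partitionFindings` helper are assumptions.

```javascript
// Directory names taken from the changelog entry above.
const FIXTURE_DIRS = new Set(['tests', 'examples', '__tests__']);

// Sketch only: true when any whole path segment is a fixture directory.
// Splitting on both separators keeps the check platform-neutral.
function isFixturePath(filePath) {
  return filePath.split(/[\\/]/).some((segment) => FIXTURE_DIRS.has(segment));
}

// Hypothetical helper: keeps fixture findings out of grading without
// discarding them (the changelog stores them in env.fixture_findings).
function partitionFindings(findings) {
  const graded = [];
  const fixtureFindings = [];
  for (const finding of findings) {
    (isFixturePath(finding.file) ? fixtureFindings : graded).push(finding);
  }
  return { graded, fixtureFindings };
}

const { graded, fixtureFindings } = partitionFindings([
  { id: 'CA-CML-001', file: 'repo/CLAUDE.md' },
  { id: 'CA-SET-002', file: 'repo/tests/fixtures/broken/settings.json' },
]);
console.log(graded.length, fixtureFindings.length); // 1 1
```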

### Changed
- Session storage moved from `~/.config-audit/` to `~/.claude/config-audit/` (pathguard compatible)
- Self-audit grade: F → A (98) after fixture filtering
- Combined scanner + posture into single Bash call in default audit command
- Removed "F grade is misleading" disclaimer — grades are now accurate
- All CLI banners and envelope metadata updated to v2.2.0
- 461 tests (up from 441), 27 test files (up from 26)

### Removed
- Manual fixture counting instruction in `config-audit.md` (orchestrator handles it)
- Redundant `isFixtureOrExample` filter in `self-audit.mjs` (promoted to orchestrator)

## [2.1.0] - 2026-04-03

### Summary
UX redesign — auto-scope detection, zero questions, simplified command surface.

### Changed
- `/config-audit` now runs full audit automatically (auto-detects scope from git context)
- Removed mode selection prompts — scope override via `/config-audit full|repo|home|current`
- Simplified from 17 to 15 commands (removed quick, report, watch; added help)
- All CLI banners and envelope metadata updated to v2.1.0

### Added
- `/config-audit help` command with categorized command reference
- Auto-scope detection from git context (repo vs home vs full-machine)

### Removed
- `/config-audit:quick` (merged into default `/config-audit`)
- `/config-audit:report` (merged into analyze output)
- `/config-audit:watch` (use `/config-audit drift` instead)

## [2.0.0] - 2026-04-03 (v2.0 Complete)

### Summary
Complete rewrite from LLM-only prototype to deterministic scanner-backed configuration intelligence.
7 development sessions (S1-S7), ~15,000 lines of code, 408+ tests.

### Highlights
- 8 deterministic scanners (CML, SET, HKV, RUL, MCP, IMP, CNF, GAP) + PLH standalone
- Feature gap analysis with 25 dimensions across 4 tiers
- Auto-fix engine with 9 fix types + backup/rollback
- Drift detection with baseline comparison
- Suppression engine (.config-audit-ignore)
- Self-audit CLI
- 17 commands, 6 agents, 4 hooks
- 408+ tests (zero external dependencies)

### Added (S7)
- Example projects: `examples/minimal-setup/` and `examples/optimal-setup/`
- Demo script: `examples/run-demo.sh`
- `.config-audit-ignore` for self-audit suppressions
- `V2-ANNOUNCEMENT.md`
- `DEPRECATED.md` for capability-auditor skill

### Fixed (S7)
- `hooks.json`: SessionStart and Stop timeout 5ms → 5000ms
- `self-audit.mjs`: Suppression now enabled (was hardcoded to `suppress: false`)

### Changed (S7)
- README.md: Complete rewrite for public release
- CLAUDE.md: Added Suppressions section
- `.gitignore`: Added `node_modules/` and `S*-PROMPT.md`

## [1.6.0] - 2026-04-03 (v2.0 S6: Unified Reports + Self-Audit + Suppressions)

### Added
- **Report generator** `scanners/lib/report-generator.mjs` — unified markdown reports: generatePostureReport(), generateDriftReport(), generatePluginHealthReport(), generateFullReport()
- **Suppression engine** `scanners/lib/suppression.mjs` — `.config-audit-ignore` file support with exact IDs and glob patterns (CA-SET-*), audit trail via `suppressed_findings` in envelope
- **Self-audit CLI** `scanners/self-audit.mjs` — runs all scanners + plugin health on this plugin: `node self-audit.mjs [--json] [--fix]`, exit codes 0/1/2
- **PostToolUse hook** `post-edit-verify.mjs` — verifies config files after Edit/Write, blocks if new critical/high findings introduced
- **New command**: `/config-audit:report` — generate unified report (posture + optional drift/plugin-health)
- **Test fixture** `.config-audit-ignore` in fixable-project
- 54 new tests (total 408 across 25 test files)
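
The exact-ID and glob matching used by the suppression engine can be sketched as below. The ignore-file syntax (one pattern per line, `#` for comments) and all function names are assumptions; only the `CA-SET-*` glob form and the `suppressed_findings` key come from the entry above.

```javascript
// Assumed file syntax: one pattern per line, blank lines and "#" comments ignored.
function parseIgnoreFile(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'));
}

// Converts a pattern such as "CA-SET-*" into an anchored RegExp.
// Every regex metacharacter except "*" is escaped; "*" matches any run of characters.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Splits findings into kept and suppressed, preserving the suppressed ones
// so the envelope can carry an audit trail.
function applySuppressions(findings, patterns) {
  const matchers = patterns.map(patternToRegExp);
  const kept = [];
  const suppressed = [];
  for (const finding of findings) {
    (matchers.some((re) => re.test(finding.id)) ? suppressed : kept).push(finding);
  }
  return { findings: kept, suppressed_findings: suppressed };
}

const patterns = parseIgnoreFile('# known noise\nCA-SET-*\nCA-CML-001\n');
const result = applySuppressions(
  [{ id: 'CA-SET-003' }, { id: 'CA-CML-001' }, { id: 'CA-CML-002' }, { id: 'CA-HKV-001' }],
  patterns,
);
console.log(result.findings.map((f) => f.id)); // [ 'CA-CML-002', 'CA-HKV-001' ]
```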

### Changed
- `scan-orchestrator.mjs`: suppression integration — applies .config-audit-ignore after all scanners run, `--no-suppress` flag to disable
- `hooks.json`: added PostToolUse event with post-edit-verify

## [1.5.0] - 2026-04-03 (v2.0 S5: Drift + Watch + Plugin Health)

### Added
- **Diff engine** `scanners/lib/diff-engine.mjs` — diffEnvelopes() comparing baseline vs current, formatDiffReport() for terminal output
- **Baseline manager** `scanners/lib/baseline.mjs` — save/load/list/delete named baselines in ~/.claude/config-audit/baselines/
- **Drift CLI** `scanners/drift-cli.mjs` — standalone: `node drift-cli.mjs <path> [--save] [--baseline name] [--json] [--list]`
- **Plugin health scanner** `scanners/plugin-health-scanner.mjs` (PLH) — validates plugin structure, frontmatter, cross-plugin conflicts (runs independently, not in scan-orchestrator)
- **3 new commands**:
  - `/config-audit:drift` — compare current config against saved baseline
  - `/config-audit:watch` — on-demand drift check with baseline monitoring
  - `/config-audit:plugin-health` — audit plugin structure and cross-plugin coherence
- **Test fixtures** `test-plugin/` (valid) and `broken-plugin/` (invalid) for plugin health tests
- 48 new tests (total 354 across 21 test files)
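
A baseline-versus-current comparison of the kind `diffEnvelopes()` performs might be sketched as follows. The envelope shape (`{ findings: [{ id, file }] }`), the identity key, and the `diffFindings` name are assumptions; the real diff engine may track more than these three buckets.

```javascript
// Assumed identity of a finding: its id plus the file it was reported in.
const findingKey = (finding) => `${finding.id}::${finding.file ?? ''}`;

// Hypothetical sketch: classifies findings as added, resolved, or unchanged
// relative to a saved baseline envelope.
function diffFindings(baseline, current) {
  const before = new Map(baseline.findings.map((f) => [findingKey(f), f]));
  const after = new Map(current.findings.map((f) => [findingKey(f), f]));
  return {
    added: [...after].filter(([key]) => !before.has(key)).map(([, f]) => f),
    resolved: [...before].filter(([key]) => !after.has(key)).map(([, f]) => f),
    unchanged: [...after].filter(([key]) => before.has(key)).map(([, f]) => f),
  };
}

// Sample envelopes for illustration only.
const baselineEnvelope = {
  findings: [
    { id: 'CA-SET-001', file: 'settings.json' },
    { id: 'CA-CML-002', file: 'CLAUDE.md' },
  ],
};
const currentEnvelope = {
  findings: [
    { id: 'CA-CML-002', file: 'CLAUDE.md' },
    { id: 'CA-HKV-003', file: 'hooks.json' },
  ],
};

const drift = diffFindings(baselineEnvelope, currentEnvelope);
console.log(drift.added.map((f) => f.id), drift.resolved.map((f) => f.id));
```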

## [1.4.0] - 2026-04-03 (v2.0 S4: Fix + Rollback Action Pillar)

### Added
- **Fix engine** `scanners/fix-engine.mjs` — deterministic auto-fix for 9 fix types:
  - `json-key-add` (missing $schema), `json-key-remove` (deprecated keys), `json-key-type-fix` (type mismatches, invalid effortLevel), `json-restructure` (hooks array→object, matcher object→string), `frontmatter-rename` (globs→paths), `file-rename` (non-.md→.md)
- **Rollback engine** `scanners/rollback-engine.mjs` — listBackups(), restoreBackup(), deleteBackup() with checksum verification
- **Fix CLI** `scanners/fix-cli.mjs` — standalone: `node fix-cli.mjs <path> [--apply] [--json] [--global]`, dry-run by default
- **Backup lib** `scanners/lib/backup.mjs` — shared backup module with checksums and manifests
- **2 new commands**:
  - `/config-audit:fix` — scan, plan, backup, apply, verify in one flow
  - `/config-audit:rollback` — list or restore from backups
- **PreToolUse hook** `auto-backup-config.mjs` — auto-backup config files before Edit/Write
- **Test fixture** `fixable-project/` — fixture with all 9 fixable issue types
- 38 new tests (total 306 across 17 test files)
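
Checksum-verified backups, as used by the backup lib and rollback engine, follow a simple pattern: hash on backup, re-hash before restore. The sketch below assumes SHA-256 and a manifest entry of `{ path, sha256, bytes }`; the changelog does not state the algorithm or manifest layout, and both helper names are hypothetical.

```javascript
import { createHash } from 'node:crypto';

// SHA-256 is an assumption; the changelog only says "checksums".
function sha256Hex(content) {
  return createHash('sha256').update(content).digest('hex');
}

// Hypothetical manifest entry recorded when a file is backed up.
function makeManifestEntry(filePath, content) {
  return { path: filePath, sha256: sha256Hex(content), bytes: Buffer.byteLength(content) };
}

// Before restoring, confirm the stored backup still matches its manifest entry.
function verifyBackup(entry, backupContent) {
  return sha256Hex(backupContent) === entry.sha256;
}

const manifestEntry = makeManifestEntry('settings.json', '{"effortLevel":"high"}');
console.log(verifyBackup(manifestEntry, '{"effortLevel":"high"}')); // true
console.log(verifyBackup(manifestEntry, '{"effortLevel":"low"}')); // false
```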

### Changed
- `file-discovery.mjs`: `walkRulesDir` now discovers all files (not just `.md`) for non-`.md` validation
- `backup-before-change.mjs`: refactored to use shared `lib/backup.mjs` (no logic duplication)
- `hooks.json`: added PreToolUse event with auto-backup

## [1.3.0] - 2026-04-03 (v2.0 S3: Posture + Feature Gap Commands)

### Added
- **Scoring module** `scanners/lib/scoring.mjs` — utilization, maturity (5 levels), segments, area scoring, scorecard generation
- **Posture CLI** `scanners/posture.mjs` — standalone Node.js tool: `node posture.mjs <path> [--json] [--global]`
- **2 new commands**:
  - `/config-audit:posture` — quick scorecard with A-F grades, utilization%, maturity level
  - `/config-audit:feature-gap` — deep gap analysis with prioritized next-best-actions
- **feature-gap-agent** — Opus agent for deep analysis, report generation (max 200 lines)
- **Knowledge file** `gap-closure-templates.md` — 11 templates with effort/gain estimates
- **HTML report template** `templates/feature-gap-report.html` — visual report with progress bars, grade badges
- 64 new tests (total 268 across 14 test files)

### Changed
- Tier weighting: T1 gaps count 3x, T2 count 2x, T3/T4 count 1x in utilization score
- Maturity is threshold-based: highest level where ALL requirements are met
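
The two rules above can be illustrated with a small sketch. The tier weights (3/2/1/1) and the "highest level whose requirements are all met" rule come from the entries above; the utilization formula (weighted features present over weighted total), the data shapes, and the level definitions are assumptions for illustration only.

```javascript
// Weights from the changelog: T1 counts 3x, T2 2x, T3/T4 1x.
const TIER_WEIGHT = { T1: 3, T2: 2, T3: 1, T4: 1 };

// Assumed formula: weighted share of features that are present, as a percentage.
function utilization(features) {
  let total = 0;
  let used = 0;
  for (const feature of features) {
    const weight = TIER_WEIGHT[feature.tier];
    total += weight;
    if (feature.present) used += weight;
  }
  return total === 0 ? 0 : Math.round((used / total) * 100);
}

// Threshold-based maturity: the highest level whose requirements are ALL met.
// Returns 0 when no level qualifies.
function maturityLevel(levels, presentFeatureIds) {
  let reached = 0;
  for (const { level, requires } of levels) {
    if (requires.every((id) => presentFeatureIds.has(id))) {
      reached = Math.max(reached, level);
    }
  }
  return reached;
}

// Sample data; the feature ids and level requirements are hypothetical.
const sampleFeatures = [
  { tier: 'T1', present: true },
  { tier: 'T1', present: false },
  { tier: 'T2', present: true },
  { tier: 'T3', present: false },
];
const sampleLevels = [
  { level: 1, requires: ['claude_md'] },
  { level: 2, requires: ['claude_md', 'hooks'] },
  { level: 3, requires: ['hooks', 'mcp'] },
];

console.log(utilization(sampleFeatures)); // 56  (5 of 9 weighted points)
console.log(maturityLevel(sampleLevels, new Set(['claude_md', 'hooks']))); // 2
```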

## [1.2.0] - 2026-04-03 (v2.0 S2: Advanced Scanners + Knowledge Base)

### Added
- **4 advanced scanners** (zero external deps):
  - `mcp-config-validator.mjs` (MCP) — server types, trust levels, env vars, unknown fields
  - `import-resolver.mjs` (IMP) — broken @imports, circular refs, deep chains, tilde paths
  - `conflict-detector.mjs` (CNF) — settings conflicts, permission contradictions, hook duplicates
  - `feature-gap-scanner.mjs` (GAP) — 25 feature gaps across 4 tiers (Foundation/Depth/Advanced/Enterprise)
- **Knowledge base** — 5 reference documents: capabilities, best practices, anti-patterns, hook events, feature evolution
- **New test fixtures** — `.mcp.json` files, @import chains, `conflict-project/` fixture
- 75 new tests (total 204 across 12 test files)

### Changed
- Scan orchestrator runs 8 scanners (was 4)
- Analyzer agent cross-references scanner findings with knowledge base

## [1.1.0] - 2026-04-03 (v2.0 S1: Scanner Foundation)

### Added
- **Deterministic scanner infrastructure** — 4 Node.js scanners (zero external deps):
  - `claude-md-linter.mjs` (CML) — CLAUDE.md structure, length, sections, @imports, duplicates
  - `settings-validator.mjs` (SET) — settings.json schema, unknown/deprecated keys, type checks
  - `hook-validator.mjs` (HKV) — hooks.json format, script existence, event validity, timeouts
  - `rules-validator.mjs` (RUL) — .claude/rules/ glob matching, orphan detection, deprecated fields
- **Scanner lib** — 5 shared modules: severity, output, file-discovery, yaml-parser, string-utils
- **Scan orchestrator** — `scan-orchestrator.mjs` runs all scanners, outputs JSON envelope
- **Test infrastructure** — 129 tests across 8 test files using node:test (zero deps)
- **Test fixtures** — 4 fixture projects (healthy, broken, empty, minimal)
- Finding ID format: `CA-{SCANNER}-{NNN}` (e.g. `CA-CML-001`)
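
Tooling that consumes findings may want to parse or build IDs in this format. The sketch below assumes a three-letter uppercase scanner prefix and a zero-padded three-digit number, which matches the examples in this changelog; the helper names are hypothetical.

```javascript
// Assumption: scanner prefix is three uppercase letters, number is three digits.
const FINDING_ID_RE = /^CA-([A-Z]{3})-(\d{3})$/;

// Returns { scanner, number } for a well-formed ID, or null otherwise.
function parseFindingId(id) {
  const match = FINDING_ID_RE.exec(id);
  return match ? { scanner: match[1], number: Number(match[2]) } : null;
}

// Builds an ID, zero-padding the number to three digits.
function formatFindingId(scanner, number) {
  return `CA-${scanner}-${String(number).padStart(3, '0')}`;
}

console.log(parseFindingId('CA-CML-001')); // { scanner: 'CML', number: 1 }
console.log(formatFindingId('TOK', 4)); // CA-TOK-004
```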

### Fixed
- Agent model mismatches: scanner→haiku, analyzer→sonnet, planner→opus, implementer→sonnet, verifier→haiku

### Changed
- CLAUDE.md rewritten in English for public release readiness

## [1.0.0] - 2026-02-11

### Added
- Cross-platform support (macOS, Linux, Windows)

### Fixed
- `stop-session-reminder.mjs`: Use `path.basename`/`path.dirname` instead of hardcoded `/` split
- `backup-before-change.mjs`: Handle both `/` and `\` path separators in safe filename generation

### Removed
- "Windows: hooks are 100% bash" from known gaps (was incorrect — all hooks are Node.js)

## [0.7.0] - 2026-02-07

### Note
Version reset from 1.2.0 to reflect actual maturity. Previous version was inflated — this plugin has never been externally tested.

### What exists today
- 6 specialized agents (scanner, analyzer, interviewer, planner, implementer, verifier)
- Full machine-wide Claude Code configuration discovery
- Scope selection (current project, repo, home, full machine)
- Inheritance hierarchy mapping and conflict detection
- Mandatory backups before any changes
- Rollback support
- Syntax validation for all configuration files
- Quick audit-only mode
- Full optimization workflow with HITL checkpoints

### Known gaps
- Testing: no automated tests
- Onboarding: never verified that a new user can install and use from scratch
- External verification: nobody else has ever used this