diff --git a/plugins/config-audit/docs/v5-implementation-log.md b/plugins/config-audit/docs/v5-implementation-log.md index 090cbe4..780735d 100644 --- a/plugins/config-audit/docs/v5-implementation-log.md +++ b/plugins/config-audit/docs/v5-implementation-log.md @@ -136,9 +136,47 @@ Written at the end of each session. State for the next session lives in --- -## Session 4 — rc.1 (TBD) +## Session 4 — rc.1 (2026-05-01) -*Steps 24-27.* +**Goal:** ship `v5.0.0-rc.1` — knowledge rensing + tokenizer calibration. Steps 24-27. + +### Steps + +- **Step 24 — M8 knowledge rensing.** Replaced "Keep CLAUDE.md under 200 lines" with cache-stability guidance (first 30 lines stable, volatile content below the cache threshold). Added footnote explaining the 200-line rule was a Sonnet-era adherence heuristic. Verified: `grep -q "Keep under 200 lines"` returns no match. Commit: `e1e23ed` `docs(config-audit): knowledge rensing — Opus 4.7 cache-stability guidance (v5 M8)`. + +- **Step 25 — M7 cache-telemetry recipe.** + - New `knowledge/cache-telemetry-recipe.md` — copy-paste `jq` recipe that sums `cache_read_input_tokens` and `cache_creation_input_tokens` per turn from `~/.claude/projects//*.jsonl`. Hit-rate interpretation table, per-turn breakdown for spotting regression turns, design-rationale note explaining why this is a recipe and not a scanner. + - `--with-telemetry-recipe` flag on `token-hotspots-cli.mjs`. When present, emits `telemetry_recipe_path` field in JSON output. Without the flag, output unchanged (committed as default deliverable, opt-in at invocation). + - `commands/tokens.md` updated: flag documented in Step 1 args, surfaced in next-steps as the cache-verification path after a structural fix. + - Tests (×3): negative test (flag absent → field absent), positive test (flag present → string ending in `cache-telemetry-recipe.md`), existing 2 tests still pass. 627 → 628 tests. + - Commit: `df6e012` `docs(config-audit): cache-telemetry recipe + --with-telemetry-recipe flag (v5 M7)`. + +- **Step 26 — N5 `--accurate-tokens` API calibration.** + - New `scanners/lib/tokenizer-api.mjs`: `callCountTokensApi(text, apiKey, options)` wraps Anthropic's `count_tokens` endpoint. Required headers (`x-api-key`, `anthropic-version: 2023-06-01`, `content-type`). 5-second AbortController timeout. Exponential backoff on HTTP 429 (max 3 retries: 1s, 2s, 4s — base configurable for tests). Non-429 HTTP errors throw `count_tokens API failed (key sk-ant-X...): HTTP ` with the body deliberately omitted to avoid echo-leak. Network/abort errors masked similarly. `maskKey()` exported as a utility. + - `--accurate-tokens` flag on `token-hotspots-cli.mjs`. When `ANTHROPIC_API_KEY` is present, calls the API for the top 3 hotspots and populates `output.calibration = { actual_tokens, source: 'count_tokens_api', sampled_hotspots: 3 }`. When absent, `calibration = { skipped: 'no-api-key' }` plus stderr warning. On API error, `calibration = { skipped: 'api-error', error: }`. + - **Mocking pattern correction:** v5-plan specified `mock.method(tokenizerApi, 'callCountTokensApi', ...)` but ESM read-only export bindings reject property redefinition (`TypeError: Cannot redefine property: callCountTokensApi`). Switched to mocking `globalThis.fetch` instead — equivalent coverage at the actual external-dependency boundary. Documented in CHANGELOG Notes and the test-file comment. + - Tests (×8): 2× CLI subprocess (no-key skip + flag absence), 6× tokenizer-api unit (key-masking on network error, body-leak protection on 401, AbortController signal threaded, 429 retry with mocked fetch, headers asserted, happy-path fetch mock). + - Test count: 628 → 635 (+7 net; the +1 from the "absent-flag" test was added in Step 25 above so the Step 26 delta sees 7 new tests). + - Commit: `b741430` `feat(config-audit): --accurate-tokens API calibration (v5 N5) [skip-docs]`. + +- **Step 27 — rc.1 wrap.** Added `## [5.0.0-rc.1]` entry to `CHANGELOG.md` with Summary / Added / Changed / Tests / Notes. Documented the SC-6b release-gate carve-out (manual verification before tagging) and the `mock.method` → `fetch` mocking pivot. Commit: `1ce26fe` `docs(config-audit): CHANGELOG 5.0.0-rc.1 entry`. + +### Result + +- 4 steps shipped, all green. Pushed to Forgejo `main` (autorisert). +- Test count: 625 → 635 (+10). +- New files: `knowledge/cache-telemetry-recipe.md`, `scanners/lib/tokenizer-api.mjs`, `tests/scanners/accurate-tokens.test.mjs`. +- Modified: `knowledge/configuration-best-practices.md`, `scanners/token-hotspots-cli.mjs`, `commands/tokens.md`, `tests/scanners/token-hotspots-cli.test.mjs`, `CHANGELOG.md`. +- Untouched (scope fence): `README.md`, `CLAUDE.md`, `.claude-plugin/plugin.json` — all wait for Session 5. + +### Observations carried into Session 5 + +- **SC-6b release gate is open.** Before tagging `v5.0.0`, KTG must run `--accurate-tokens` against a known fixture with a real `ANTHROPIC_API_KEY`, manually compare `calibration.actual_tokens` against the byte-estimated value for that fixture, and confirm error ≤ ±5%. If error exceeds ±5%, the heuristic in `estimateTokens` must be re-tuned before tagging. +- **`mock.method` for ESM modules is a known footgun** — record this in REMEMBER for future scanners that try to stub library exports. Use `globalThis.fetch` mocking, dependency-injection seams, or `vi.mock`-style loaders if needed; do NOT rely on `mock.method` against ESM module namespaces. +- **`--check-readme` will still fail in beta state.** Self-audit's badge mismatch report (scanners 12 vs 9, tests now 31 vs 543) is by-design until Step 28's straggler sweep aligns README/CLAUDE.md with filesystem truth. Posture-test still expects 10 areas (unchanged in this session). +- **`fetch` global confirmed working** on Node 25.8.2 (KTG's machine). No fallback to `node:https` needed. + +**No blockers carried into Session 5.** ---