# Cache Telemetry Recipe
> Manual recipe for verifying prompt-cache hit rate from Claude Code session
> transcripts. Opt-in. The TOK scanner is structural — it estimates token cost
> from disk content but never reads runtime telemetry. This recipe closes that
> gap when you need to confirm a structural fix actually improved cache reuse.
>
> Last verified 2026-05-01 against Claude Code transcript schema.
## Synopsis
Each turn in a Claude Code session is logged as a JSONL entry under
`~/.claude/projects/<slug>/`. Anthropic's API response includes
`cache_read_input_tokens` and `cache_creation_input_tokens` per turn, and Claude
Code persists these in the transcript. Summing them gives a per-session cache
hit rate without needing the API key or any external service.

A high cache-read share (≥ 70%) suggests structural fixes are working. A low
share (< 30%) suggests something at the top of the prompt is changing per turn,
typically a CLAUDE.md timestamp, a rolling counter, or a deep `@import`
boundary. Step 3 splits the range into finer bands. Cross-reference with
`/config-audit tokens` to find the culprit.
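
The hit rate used throughout this recipe is the cache-read share of all input tokens, the same ratio the jq filter in step 2 computes. A minimal sketch of that arithmetic, assuming `awk` is available; `hit_rate` is a hypothetical helper name, not part of the plugin:

```bash
# Hypothetical helper: cache-read share of all input tokens, to 4 decimal places.
# Arguments: cache_read cache_creation input_no_cache
hit_rate() {
  awk -v cr="$1" -v cc="$2" -v nc="$3" 'BEGIN {
    total = cr + cc + nc
    # Guard against an empty session, as the jq filter in step 2 does.
    printf "%.4f\n", (total > 0 ? cr / total : 0)
  }'
}

hit_rate 458320 12440 5120   # prints 0.9631
```

Cache-creation tokens count toward the denominator only, so a session that keeps rewriting its cache scores low even when the uncached input is small.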
## Recipe
### 1. Locate the transcript
```bash
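# Newest transcript for the current project.
# Assumes the slug is the working-directory path with "/" replaced by "-";
# if nothing is found, list ~/.claude/projects/ to confirm the directory name.
PROJECT_SLUG=$(pwd | sed 's|/|-|g')
TRANSCRIPT=$(ls -t "$HOME/.claude/projects/${PROJECT_SLUG}"/*.jsonl 2>/dev/null | head -1)
echo "Transcript: $TRANSCRIPT"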
```
If no transcript exists, run a few turns in Claude Code first.
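
Before moving on, it can help to confirm that a transcript was actually found and that `jq` is installed, since an empty `$TRANSCRIPT` makes the later filters fail with less obvious errors. A minimal sketch; `check_transcript` is a hypothetical helper name and the return codes are arbitrary choices:

```bash
# Hypothetical helper: verify the prerequisites for steps 2 and 4.
# Returns 1 if the transcript is missing or empty, 2 if jq is not installed.
check_transcript() {
  transcript="$1"
  if [ -z "$transcript" ] || [ ! -s "$transcript" ]; then
    echo "No non-empty transcript at: ${transcript:-(unset)}" >&2
    return 1
  fi
  if ! command -v jq >/dev/null 2>&1; then
    echo "jq not found on PATH" >&2
    return 2
  fi
  return 0
}

# Usage: check_transcript "$TRANSCRIPT" || echo "fix the above before continuing"
```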
### 2. Sum cache tokens per turn
```bash
# Requires jq. Sums cache_read and cache_creation across all turns.
jq -s '
  [.[] | select(.type == "assistant" and .message.usage)]
  | {
      turns: length,
      cache_read: ([.[] | .message.usage.cache_read_input_tokens // 0] | add),
      cache_creation: ([.[] | .message.usage.cache_creation_input_tokens // 0] | add),
      input_no_cache: ([.[] | .message.usage.input_tokens // 0] | add)
    }
  | .total_input = (.cache_read + .cache_creation + .input_no_cache)
  | .hit_rate = (if .total_input > 0 then .cache_read / .total_input else 0 end)
' "$TRANSCRIPT"
```
Example output (`hit_rate` shown rounded to four decimal places; jq prints full precision):
```json
{
  "turns": 18,
  "cache_read": 458320,
  "cache_creation": 12440,
  "input_no_cache": 5120,
  "total_input": 475880,
  "hit_rate": 0.9631
}
```
### 3. Interpret
| Hit rate | Reading |
|----------|---------|
| ≥ 0.85 | Cache structure healthy. Structural fixes are paying off. |
| 0.50–0.85 | Cache works but something near the prefix is shifting. Inspect first 30 lines of CLAUDE.md and any `@import`-ed file. |
| 0.20–0.50 | Cache is being broken most turns. Likely a volatile CLAUDE.md top-of-file (timestamp, session id, rolling activity log) or a `defaultMode` flip. Run `/config-audit tokens` to locate. |
| < 0.20 | Cache is essentially disabled. Either the prefix is rewritten every turn, or the session is so short caching never warmed up. |
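
The bands in the table can be applied mechanically when comparing several sessions. A minimal sketch, assuming `awk` is available; `classify_hit_rate` and its band labels are hypothetical names, and treating each lower bound as inclusive is an assumption, since the table does not say which band owns the exact values 0.50 and 0.20:

```bash
# Hypothetical helper: map a hit rate (0..1) to one of the four bands above.
# Lower bounds are treated as inclusive (an assumption, see lead-in).
classify_hit_rate() {
  awk -v r="$1" 'BEGIN {
    if (r >= 0.85)      print "healthy"
    else if (r >= 0.50) print "prefix-shifting"
    else if (r >= 0.20) print "mostly-broken"
    else                print "effectively-disabled"
  }'
}

classify_hit_rate 0.9631   # prints healthy
```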
### 4. Per-turn breakdown (for spotting the regression turn)
```bash
jq -c '
select(.type == "assistant" and .message.usage)
| {
ts: .timestamp,
cache_read: (.message.usage.cache_read_input_tokens // 0),
cache_creation: (.message.usage.cache_creation_input_tokens // 0)
}
' "$TRANSCRIPT" | head -20
```
Look for turns where `cache_read` drops sharply and `cache_creation` spikes —
that's a cache invalidation event. Whatever changed in CLAUDE.md, settings.json,
or the active `@import` chain at that moment is the cause.
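
Scanning for that pattern by eye gets tedious in long sessions. The sketch below flags candidate invalidation turns from tab-separated per-turn values. `flag_invalidations` is a hypothetical helper, and the thresholds (read falls below half of the previous turn while creation rises above the previous turn) are arbitrary assumptions to tune, not values the plugin defines:

```bash
# Hypothetical helper: reads "timestamp<TAB>cache_read<TAB>cache_creation"
# lines on stdin and prints the turns that look like cache invalidations.
flag_invalidations() {
  awk -F'\t' '
    # Flag a turn when cache_read halves (or worse) and cache_creation rises.
    NR > 1 && $2 < prev_read / 2 && $3 > prev_creation {
      print $1 "\tread " prev_read " -> " $2 "\tcreation " prev_creation " -> " $3
    }
    { prev_read = $2; prev_creation = $3 }
  '
}

# Usage, feeding it from the transcript with the same fields as step 4:
# jq -r 'select(.type == "assistant" and .message.usage)
#        | [.timestamp,
#           (.message.usage.cache_read_input_tokens // 0),
#           (.message.usage.cache_creation_input_tokens // 0)]
#        | @tsv' "$TRANSCRIPT" | flag_invalidations
```

Each flagged line carries the timestamp of the turn, which can then be matched against edits to CLAUDE.md, settings.json, or imported files.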
## Why this is a recipe, not a scanner
Parsing transcripts as a core scanner feature was rejected during v5 planning:

1. Transcripts are user-private session data. Bundling parsing logic implies
   the plugin reads transcripts by default, which crosses a privacy boundary.
2. Transcript schema is undocumented and may change without notice. A scanner
   would silently drift.
3. The recipe form (jq one-liner) is auditable in 30 seconds. A bundled parser
   is not.

Surface area stays read-only and structural. This file is the escape hatch
when structural signal alone isn't enough.
## See also
- `knowledge/opus-4.7-patterns.md` — structural patterns the TOK scanner detects (CA-TOK-001..005)
- `knowledge/configuration-best-practices.md` — CLAUDE.md cache-stability guidance
- `/config-audit tokens --with-telemetry-recipe` — surfaces a pointer to this file in JSON output