# Cache Telemetry Recipe
Manual recipe for verifying prompt-cache hit rate from Claude Code session transcripts. Opt-in. The TOK scanner is structural — it estimates token cost from disk content but never reads runtime telemetry. This recipe closes that gap when you need to confirm a structural fix actually improved cache reuse.
Last verified 2026-05-01 against the Claude Code transcript schema.
## Synopsis
Each turn in a Claude Code session is logged as a JSONL entry under `~/.claude/projects/<slug>/`. Anthropic's API response includes `cache_read_input_tokens` and `cache_creation_input_tokens` per turn, and Claude Code persists these in the transcript. Summing them gives a per-session cache hit rate without needing the API key or any external service.
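A single relevant entry looks roughly like this (illustrative values; only the field names this recipe reads are shown, since the full schema is undocumented):

```json
{
  "type": "assistant",
  "timestamp": "2026-05-01T12:00:00Z",
  "message": {
    "usage": {
      "input_tokens": 210,
      "cache_read_input_tokens": 25460,
      "cache_creation_input_tokens": 0
    }
  }
}
```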
A high cache-read share (≥ 70%) means structural fixes are working. A low share (< 30%) means something at the top of the prompt is changing per turn — typically a `CLAUDE.md` timestamp, a rolling counter, or a deep `@import` boundary. Cross-reference with `/config-audit tokens` to find the culprit.
## Recipe
### 1. Locate the transcript
```shell
# Newest transcript for the current project
PROJECT_SLUG=$(pwd | sed 's|/|-|g')
TRANSCRIPT=$(ls -t ~/.claude/projects/${PROJECT_SLUG}/*.jsonl 2>/dev/null | head -1)
echo "Transcript: $TRANSCRIPT"
```
If no transcript exists, run a few turns in Claude Code first.
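The `sed` expression above mirrors how the project directory appears to be named: the absolute path with every `/` replaced by `-`. That rule is inferred from observed directory names, not documented behavior, so a tiny helper makes it easy to spot-check against what is actually on disk:

```shell
# Assumed slug rule: absolute path with "/" replaced by "-". Verify the
# result against the real directory names under ~/.claude/projects/
# before trusting it.
slug_for() { printf '%s\n' "$1" | sed 's|/|-|g'; }
slug_for /home/user/my-repo   # → -home-user-my-repo
```

If the slug does not match, `ls -t ~/.claude/projects/` and pick the directory by hand.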
### 2. Sum cache tokens per turn
```shell
# Requires jq. Sums cache_read and cache_creation across all turns.
# The "add // 0" guards report 0 (instead of erroring) on empty sessions.
jq -s '
  [.[] | select(.type == "assistant" and .message.usage)]
  | {
      turns: length,
      cache_read: ([.[] | .message.usage.cache_read_input_tokens // 0] | add // 0),
      cache_creation: ([.[] | .message.usage.cache_creation_input_tokens // 0] | add // 0),
      input_no_cache: ([.[] | .message.usage.input_tokens // 0] | add // 0)
    }
  | . + {
      total_input: (.cache_read + .cache_creation + .input_no_cache),
      hit_rate: (if (.cache_read + .cache_creation + .input_no_cache) > 0
                 then (.cache_read / (.cache_read + .cache_creation + .input_no_cache))
                 else 0 end)
    }
' "$TRANSCRIPT"
```
Example output:
```json
{
  "turns": 18,
  "cache_read": 458320,
  "cache_creation": 12440,
  "input_no_cache": 5120,
  "total_input": 475880,
  "hit_rate": 0.9631
}
```
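The `hit_rate` field is nothing more than `cache_read` divided by `total_input`; redoing the arithmetic on the example numbers above confirms it:

```shell
# hit_rate = cache_read / (cache_read + cache_creation + input_no_cache)
awk 'BEGIN { printf "%.4f\n", 458320 / (458320 + 12440 + 5120) }'   # → 0.9631
```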
### 3. Interpret
| Hit rate | Reading |
|---|---|
| ≥ 0.85 | Cache structure healthy. Structural fixes are paying off. |
| 0.50–0.85 | Cache works but something near the prefix is shifting. Inspect the first 30 lines of `CLAUDE.md` and any `@import`-ed file. |
| 0.20–0.50 | Cache is being broken most turns. Likely a volatile `CLAUDE.md` top-of-file (timestamp, session id, rolling activity log) or a `defaultMode` flip. Run `/config-audit tokens` to locate it. |
| < 0.20 | Cache is essentially disabled. Either the prefix is rewritten every turn, or the session is so short caching never warmed up. |
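For scripting, the bands above can be folded into a one-word verdict. A minimal sketch, with thresholds taken from this recipe's table rather than any official guidance:

```shell
# Maps a hit rate (0..1) to the interpretation bands in the table above.
classify_hit_rate() {
  awk -v h="$1" 'BEGIN {
    if      (h >= 0.85) print "healthy"
    else if (h >= 0.50) print "prefix-shifting"
    else if (h >= 0.20) print "breaking-most-turns"
    else                print "effectively-disabled"
  }'
}
classify_hit_rate 0.9631   # → healthy
```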
### 4. Per-turn breakdown (for spotting the regression turn)
```shell
jq -c '
  select(.type == "assistant" and .message.usage)
  | {
      ts: .timestamp,
      cache_read: (.message.usage.cache_read_input_tokens // 0),
      cache_creation: (.message.usage.cache_creation_input_tokens // 0)
    }
' "$TRANSCRIPT" | head -20
```
Look for turns where `cache_read` drops sharply and `cache_creation` spikes — that's a cache invalidation event. Whatever changed in `CLAUDE.md`, `settings.json`, or the active `@import` chain at that moment is the cause.
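Scanning twenty lines by eye works, but the same signal can be filtered mechanically. A sketch that prints only suspect turns, using `cache_creation > cache_read` as a crude invalidation heuristic (the threshold is an assumption of this sketch, not part of the transcript format):

```shell
# Prints turns that wrote more tokens to cache than they read from it,
# a rough proxy for a prefix invalidation on that turn.
FILTER='
  select(.type == "assistant" and .message.usage)
  | select((.message.usage.cache_creation_input_tokens // 0)
           > (.message.usage.cache_read_input_tokens // 0))
  | {ts: .timestamp, cache_creation: .message.usage.cache_creation_input_tokens}
'
jq -c "$FILTER" "${TRANSCRIPT:-/dev/null}"
```

The first turn of a session will always match (everything is a cache write then); any later match is worth investigating.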
## Why this is a recipe, not a scanner
Parsing transcripts as a core scanner feature was rejected during v5 planning:
- Transcripts are user-private session data. Bundling parsing logic implies the plugin reads transcripts by default, which crosses a privacy boundary.
- Transcript schema is undocumented and may change without notice. A scanner would silently drift.
- The recipe form (a `jq` one-liner) is auditable in 30 seconds. A bundled parser is not.
Surface area stays read-only and structural. This file is the escape hatch when structural signal alone isn't enough.
## See also
- `knowledge/opus-4.7-patterns.md` — structural patterns the TOK scanner detects (CA-TOK-001..005)
- `knowledge/configuration-best-practices.md` — CLAUDE.md cache-stability guidance
- `/config-audit tokens --with-telemetry-recipe` — surfaces a pointer to this file in JSON output