Kjell Tore Guttormsen 486f544d39 test(ultraplan-local): add skill-factory calibration fixtures

3 source/draft pairs that pin n-gram-overlap verdicts to representative prose:
- accepted (containment 0.014 / longestRun 3)
- needs-review (containment 0.211 / longestRun 12)
- rejected (containment 0.676 / longestRun 74)

Topics: session-start hooks, subagent delegation, output styles — proximate to
the production source material the skill-factory will ingest.

Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 5)

2026-04-18 15:16:28 +02:00

4.9 KiB

Raw Blame History

Session-Start Hooks Reference

Session-start hooks fire when a Claude Code conversation begins. Their job is to inject opening context into the agent's working memory before the user has typed anything. Practitioners reach for them when a project carries cross-cutting state that must be surfaced on every session entry: open todo items, recent commit history, current branch, the location of relevant config files, and so on.

The runtime invokes these hooks once per session. Each hook receives a JSON payload describing the session id, the working directory at startup, the launch flags, and a few environment hints. Output written to stdout is appended verbatim into the system prompt area before the agent reads its first user message. A non-zero exit code blocks the session from progressing. Implementations should treat that path with care — a misbehaving session-start hook can soft-brick the project for the operator until they remember to disable the binding.

A typical pattern is the briefing script. The hook reads a TODO file, summarizes recent activity in git, and emits a short markdown block with three or four headings. The agent then begins its first turn already aware of where the user left off. This collapses the warm-up phase that would otherwise consume several tool calls every time someone opens the project.

Performance discipline matters. Session-start hooks add latency to every cold start. Operators have reported sessions that take eight or ten seconds just to render the first prompt because a chain of synchronous network calls runs at session start. Cache where possible. Keep the hot path local. If you genuinely need a remote lookup, make it asynchronous and wrap the result so a slow upstream cannot stall the entire session.

Conflict handling shows up when multiple layers register a session-start hook. Plugin hooks land first, then user-level hooks, then project-level hooks. Each layer's stdout output is concatenated in that order. There is no merging or deduplication. A plugin and the user might both decide to print the current branch name, and the agent will see both. The runtime makes no attempt to resolve such collisions; the operator must coordinate.

Failure recovery deserves explicit thought. If a session-start hook crashes mid-way, partial stdout that was already emitted may still reach the agent. This can produce a confusing initial state where the briefing was truncated and the agent reasons over a half-written summary. Defensive implementations buffer their entire output and emit it as a single write at the very end, after all I/O has succeeded. That way a crash leaves a clean empty briefing rather than a partial one.

Testing these hooks is awkward because there is no native simulation harness. The standard workaround is to invoke the hook script directly with a synthesized JSON payload on stdin, then inspect the resulting stdout. Operators with tight feedback loops keep a small fixture file containing the canonical payload and run the hook through it after every change. This catches the most common failure modes — JSON parse errors, missing keys, broken pipes — without needing to spin up a real session.

Security boundaries deserve a second look. Session-start hooks run with the operator's full filesystem privileges, before any sandbox is established for the agent itself. A compromised hook can read any path the operator can read. Plugin marketplaces distribute hooks as plain shell or node scripts, often without strong signing. The mitigation is review discipline: read the hook source before installing a plugin, and pin plugin versions so an upstream rug-pull does not silently swap the script.

Observability is mostly DIY. The runtime does not log hook execution to any structured channel. If you want telemetry — how often each hook ran, how long it took, whether it succeeded — you must wire it yourself, typically by appending to a local jsonl file from inside the hook script. A few teams have built lightweight aggregators that scrape these jsonl files into a dashboard, but no shared community tool exists for this yet.

The flag landscape is small. There is no built-in flag to skip session-start hooks for a particular run. Operators sometimes work around this by toggling a sentinel environment variable that their own hook checks at the top, exiting fast when present. This is purely a convention; the runtime will not interpret it.

Looking ahead, the most-requested feature on community boards is selective firing — the ability to bind a session-start hook only when the session opens in a particular subdirectory, or only when a particular file pattern matches. The current matcher syntax is too coarse for this, and operators end up writing internal switch logic inside the hook to bail out early when the heuristic does not apply. A future runtime release may address this, but as of the current shipping version it remains an open gap that operators must paper over themselves.

4.9 KiB Raw Blame History

Session-Start Hooks Reference

4.9 KiB

Raw Blame History