test(ultraplan-local): add skill-factory calibration fixtures

3 source/draft pairs that pin n-gram-overlap verdicts to representative prose: - accepted (containment 0.014 / longestRun 3) - needs-review (containment 0.211 / longestRun 12) - rejected (containment 0.676 / longestRun 74) Topics: session-start hooks, subagent delegation, output styles — proximate to the production source material the skill-factory will ingest. Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 5)
2026-04-18 15:16:28 +02:00 · 2026-04-18 15:16:28 +02:00 · 486f544d39
commit 486f544d39
parent 491711119a
7 changed files with 280 additions and 0 deletions
--- a/plugins/ultraplan-local/tests/fixtures/skill-factory/draft-accepted.md
+++ b/plugins/ultraplan-local/tests/fixtures/skill-factory/draft-accepted.md
@ -0,0 +1,55 @@
+---
+name: session-startup-hook-pattern
+description: Pattern for warming up agent context at conversation boot
+layer: pattern
+cc_feature: hooks
+source: ./source-accepted.md
+concept: warm-start briefing via boot hook
+last_verified: 2026-04-18
+ngram_overlap_score: null
+review_status: pending
+---
+
+## Use this when
+
+You want each new conversation to begin with the agent already aware of project status — pending tasks, recent commits, key file paths — instead of burning early turns on orientation.
+
+## Shape
+
+Bind one script to the boot lifecycle event. Inside, gather whatever situational data justifies opening a fresh chat with: git log summary, todo file digest, branch identity, env snapshot. Format as terse markdown headings. Emit on stdout. The runtime grafts whatever you print into the model's opening context window.
+
+## Forces
+
+Cold turns cost real seconds. Anything you compute inside this script delays the first user prompt visibly. Therefore: prefer cached lookups, avoid synchronous calls to remote services, and timebox expensive operations. If a piece of data takes longer than roughly 200 ms to fetch, demote it from the briefing or front-load a cache refresh asynchronously elsewhere.
+
+Multiple installation layers can each register their own boot script. They run in this order: marketplace-supplied first, your personal home-level second, and project-rooted third. Outputs concatenate without merging. Two scripts both reporting the branch name will both show up. There is no built-in dedupe.
+
+Crash semantics are partial-write friendly: anything you streamed to stdout before exiting non-zero still reaches the agent. The fix is to assemble the full briefing in a buffer and only flush on the success path. That way a midpoint failure leaves an empty intro rather than a torn one.
+
+## Gotchas
+
+- No native simulator. Test by piping a hand-crafted payload directly into the script and reading the stdout it returns. Keep one canonical payload checked in.
+- No selective firing — boot scripts cannot be scoped to subdirectories or file patterns. Workaround: branch inside the script body and short-circuit early when the heuristic does not match.
+- Privileges are unfiltered: these scripts run with full operator credentials before any agent sandbox spins up. Audit third-party scripts before enabling them, and pin their versions to defeat silent updates.
+- Telemetry is your problem: the runtime does not emit structured execution logs. If you want metrics, append to a local jsonl yourself from inside the script.
+
+## Anti-patterns
+
+- Doing remote network calls on the hot path without caching.
+- Skipping output buffering and streaming partial briefings.
+- Allowing two layers to print conflicting information without coordinating.
+- Trusting installed marketplace scripts implicitly.
+
+## Decision quick-check
+
+Pick this approach when:
+
+- Pending state across sessions is non-trivial enough to merit the boot delay.
+- You can keep the script under a 250 ms wall-clock budget.
+- The information surfaced changes meaningfully between sessions.
+
+Skip it when:
+
+- Project state rarely shifts; static instructions in CLAUDE.md serve the same purpose at zero startup cost.
+- The data you would print is sensitive and could leak into copies of the conversation transcript.
+- You are still iterating on what should appear; build the briefing as a plain script the operator runs manually before nailing it down as a boot binding.