test(ultraplan-local): add skill-factory calibration fixtures
3 source/draft pairs that pin n-gram-overlap verdicts to representative prose: - accepted (containment 0.014 / longestRun 3) - needs-review (containment 0.211 / longestRun 12) - rejected (containment 0.676 / longestRun 74) Topics: session-start hooks, subagent delegation, output styles — proximate to the production source material the skill-factory will ingest. Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 5)
This commit is contained in:
parent
491711119a
commit
486f544d39
7 changed files with 280 additions and 0 deletions
55
plugins/ultraplan-local/tests/fixtures/skill-factory/draft-accepted.md
vendored
Normal file
55
plugins/ultraplan-local/tests/fixtures/skill-factory/draft-accepted.md
vendored
Normal file
|
|
@ -0,0 +1,55 @@
|
|||
---
|
||||
name: session-startup-hook-pattern
|
||||
description: Pattern for warming up agent context at conversation boot
|
||||
layer: pattern
|
||||
cc_feature: hooks
|
||||
source: ./source-accepted.md
|
||||
concept: warm-start briefing via boot hook
|
||||
last_verified: 2026-04-18
|
||||
ngram_overlap_score: null
|
||||
review_status: pending
|
||||
---
|
||||
|
||||
## Use this when
|
||||
|
||||
You want each new conversation to begin with the agent already aware of project status — pending tasks, recent commits, key file paths — instead of burning early turns on orientation.
|
||||
|
||||
## Shape
|
||||
|
||||
Bind one script to the boot lifecycle event. Inside, gather whatever situational data justifies opening a fresh chat with: git log summary, todo file digest, branch identity, env snapshot. Format as terse markdown headings. Emit on stdout. The runtime grafts whatever you print into the model's opening context window.
|
||||
|
||||
## Forces
|
||||
|
||||
Cold turns cost real seconds. Anything you compute inside this script delays the first user prompt visibly. Therefore: prefer cached lookups, avoid synchronous calls to remote services, and timebox expensive operations. If a piece of data takes longer than roughly 200 ms to fetch, demote it from the briefing or front-load a cache refresh asynchronously elsewhere.
|
||||
|
||||
Multiple installation layers can each register their own boot script. They run in this order: marketplace-supplied first, your personal home-level second, and project-rooted third. Outputs concatenate without merging. Two scripts both reporting the branch name will both show up. There is no built-in dedupe.
|
||||
|
||||
Crash semantics are partial-write friendly: anything you streamed to stdout before exiting non-zero still reaches the agent. The fix is to assemble the full briefing in a buffer and only flush on the success path. That way a midpoint failure leaves an empty intro rather than a torn one.
|
||||
|
||||
## Gotchas
|
||||
|
||||
- No native simulator. Test by piping a hand-crafted payload directly into the script and reading the stdout it returns. Keep one canonical payload checked in.
|
||||
- No selective firing — boot scripts cannot be scoped to subdirectories or file patterns. Workaround: branch inside the script body and short-circuit early when the heuristic does not match.
|
||||
- Privileges are unfiltered: these scripts run with full operator credentials before any agent sandbox spins up. Audit third-party scripts before enabling them, and pin their versions to defeat silent updates.
|
||||
- Telemetry is your problem: the runtime does not emit structured execution logs. If you want metrics, append to a local jsonl yourself from inside the script.
|
||||
|
||||
## Anti-patterns
|
||||
|
||||
- Doing remote network calls on the hot path without caching.
|
||||
- Skipping output buffering and streaming partial briefings.
|
||||
- Allowing two layers to print conflicting information without coordinating.
|
||||
- Trusting installed marketplace scripts implicitly.
|
||||
|
||||
## Decision quick-check
|
||||
|
||||
Pick this approach when:
|
||||
|
||||
- Pending state across sessions is non-trivial enough to merit the boot delay.
|
||||
- You can keep the script under a 250 ms wall-clock budget.
|
||||
- The information surfaced changes meaningfully between sessions.
|
||||
|
||||
Skip it when:
|
||||
|
||||
- Project state rarely shifts; static instructions in CLAUDE.md serve the same purpose at zero startup cost.
|
||||
- The data you would print is sensitive and could leak into copies of the conversation transcript.
|
||||
- You are still iterating on what should appear; build the briefing as a plain script the operator runs manually before nailing it down as a boot binding.
|
||||
Loading…
Add table
Add a link
Reference in a new issue