Subagent Delegation Reference
Subagents are isolated worker contexts that the primary agent can spawn to handle bounded sub-tasks. Each subagent runs in its own conversation, with its own context window, and returns a single structured response when finished. The orchestrating agent treats this response as a tool result and continues from there. The pattern shines when a piece of work is independent enough that running it in the main thread would either pollute the context or block other concurrent work.
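The spawn-and-return shape can be sketched as follows. Every name here (SubagentResult, spawn_subagent) is illustrative, not a real API; the point is that only one structured value crosses back to the orchestrator.

```python
from dataclasses import dataclass

@dataclass
class SubagentResult:
    summary: str           # the single structured response the orchestrator sees
    artifacts: list[str]   # e.g. file paths the subagent claims to have touched

def spawn_subagent(prompt: str, run) -> SubagentResult:
    """Run a bounded sub-task in an isolated context and return one result.

    `run` stands in for the model call; everything it reads or generates
    stays inside this function. Only the SubagentResult crosses back."""
    return run(prompt)

# The orchestrator treats the result like any other tool result:
result = spawn_subagent(
    "Find callers of parse_config in src/",
    run=lambda p: SubagentResult(summary="3 callers found", artifacts=[]),
)
```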
The headline benefit is context isolation. The primary thread can stay focused on the user's intent while a subagent burns its own context on noisy intermediate steps: searching the codebase, reading large files, summarizing transcripts. Whatever the subagent reads or generates does not appear in the orchestrator's transcript. Only the final return value crosses the boundary. This boundary is precious in long sessions where every token of orchestrator context counts.
The second benefit is parallelism. The orchestrator can launch several subagents in a single turn and wait for them all to finish. The runtime executes them concurrently. A research task that would have taken three sequential subagent calls finishes in roughly the time of the slowest single call. For workflows that legitimately decompose into independent investigations, the speedup is substantial.
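A concurrency sketch, assuming an async runtime; `fake_subagent` is a stand-in for a real subagent call, with sleeps simulating work. Total wall time is roughly the slowest single call, not the sum.

```python
import asyncio

async def fake_subagent(task: str, seconds: float) -> str:
    await asyncio.sleep(seconds)   # simulates the subagent doing its work
    return f"done: {task}"

async def orchestrate() -> list[str]:
    # Launch all three in one turn and wait for them together.
    return await asyncio.gather(
        fake_subagent("search codebase", 0.03),
        fake_subagent("read large file", 0.02),
        fake_subagent("summarize transcript", 0.01),
    )

results = asyncio.run(orchestrate())
```

`asyncio.gather` preserves input order in its results, so the orchestrator can match each return value to the investigation that produced it.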
There are real costs. Each subagent loads a fresh system prompt and any allow-listed tool schemas. That startup cost is several thousand tokens per subagent. Spawning ten subagents to do work that one careful Grep would have handled is wasteful. The break-even threshold depends on subagent complexity and on how much orchestrator context you save by not inlining the work, but the heuristic is: do not spawn a subagent for anything you could complete in two or three direct tool calls in the main thread.
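The heuristic above can be encoded as a guard. The threshold and overhead figure here are illustrative assumptions, not measured values:

```python
STARTUP_OVERHEAD_TOKENS = 3000   # assumed cost: fresh system prompt + tool schemas

def should_spawn(estimated_direct_tool_calls: int,
                 orchestrator_tokens_saved: int) -> bool:
    # Never spawn for work a few direct tool calls would finish inline.
    if estimated_direct_tool_calls <= 3:
        return False
    # Otherwise spawn only when the context saved outweighs the startup cost.
    return orchestrator_tokens_saved > STARTUP_OVERHEAD_TOKENS

should_spawn(2, 10_000)   # trivial task: returns False, do it inline
should_spawn(8, 10_000)   # big noisy task: returns True, worth isolating
```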
The biggest pitfall is trust failure. The orchestrator only sees the subagent's final structured response. If the subagent hallucinates that it wrote a file, the orchestrator has no easy way to verify. Best practice is to have the orchestrator independently confirm every claim that matters: read the file, check the test result, inspect the diff. Treat the subagent's self-report as input to verification, not as ground truth.
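A minimal verification sketch for the "it wrote a file" case; the helper name is hypothetical, and the temporary directory stands in for a real workspace:

```python
import os
import tempfile

def verify_file_claim(claimed_path: str, must_contain: str) -> bool:
    """Treat the subagent's self-report as input to verification,
    not as ground truth: read the file ourselves."""
    if not os.path.isfile(claimed_path):
        return False
    with open(claimed_path) as f:
        return must_contain in f.read()

# A subagent reports it wrote a config file; the orchestrator checks.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "settings.json")
    ok_before = verify_file_claim(path, '"debug"')   # claim fails: no file exists
    with open(path, "w") as f:
        f.write('{"debug": true}')
    ok_after = verify_file_claim(path, '"debug"')    # now the claim checks out
```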
Subagent prompts are the second pitfall. The subagent does not see the orchestrator's full conversation; it only receives the prompt the orchestrator constructs at spawn time. Anything the orchestrator implicitly knew about the user's intent, the file paths discussed, or the constraints in play must be re-stated in the spawn prompt. Otherwise the subagent will go off on tangents that look plausible in isolation but miss the point of the parent task. Briefing matters.
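One way to make the briefing systematic is a prompt builder that forces the orchestrator to re-state intent, files, and constraints explicitly. The structure below is an illustrative convention, not a required format:

```python
def build_spawn_prompt(task: str, intent: str,
                       files: list[str], constraints: list[str]) -> str:
    # Everything the orchestrator implicitly knows must be written down,
    # because the subagent sees only this prompt, not the conversation.
    lines = [
        f"Task: {task}",
        f"User intent: {intent}",
        "Relevant files:",
        *(f"  - {p}" for p in files),
        "Constraints:",
        *(f"  - {c}" for c in constraints),
        "Return a short structured summary only.",
    ]
    return "\n".join(lines)

prompt = build_spawn_prompt(
    task="Audit error handling in the parser",
    intent="User wants fewer silent failures in CI",
    files=["src/parser.py"],
    constraints=["Do not modify files", "Report line numbers"],
)
```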
Tool scoping is the third dimension. The orchestrator can restrict the tool set the subagent receives. A code-search subagent does not need Write or Edit. A summarizer subagent does not need Bash. Tighter scopes reduce both the chance of off-task behavior and the system prompt overhead. Most workflows benefit from this discipline. The default open-tool-set is appropriate only for general-purpose subagents that genuinely need flexibility.
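A scoping table makes the discipline concrete. The role names and tool names below mirror the examples in the paragraph and are illustrative, not a fixed schema:

```python
SCOPES: dict[str, frozenset[str]] = {
    "code-search": frozenset({"Grep", "Glob", "Read"}),    # no Write or Edit
    "summarizer":  frozenset({"Read"}),                    # no Bash
    "general":     frozenset({"Read", "Write", "Edit", "Bash", "Grep"}),
}

def tools_for(role: str) -> frozenset[str]:
    # Tighter scopes shrink both off-task risk and system-prompt overhead;
    # fall back to the open set only for genuinely general-purpose workers.
    return SCOPES.get(role, SCOPES["general"])
```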
Failure modes deserve explicit thought. If a subagent crashes or returns an error, the orchestrator receives that as a tool error and must decide whether to retry, fall back to in-line execution, or propagate the failure to the user. The runtime does not auto-retry. Build the retry policy into the orchestrator's logic. Limit retries to two or three to prevent loops. Log the failure mode so the operator can iterate on the prompt.
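A bounded retry wrapper might look like this; `spawn` stands in for the real subagent call, and the flaky stub only simulates transient failures:

```python
def run_with_retries(spawn, prompt: str, max_attempts: int = 3):
    failures = []
    for attempt in range(max_attempts):
        try:
            return spawn(prompt)
        except RuntimeError as err:     # a subagent crash arrives as a tool error
            failures.append(str(err))   # log it so the operator can iterate
    # Out of attempts: propagate rather than loop forever; the caller
    # decides whether to fall back to in-line execution.
    raise RuntimeError(f"subagent failed {max_attempts}x: {failures}")

calls = {"n": 0}
def flaky(prompt):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky, "summarize diff")
```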
Cost accounting matters. Subagents bill against the same account as the orchestrator. A workflow that spawns five subagents per orchestrator turn can multiply token spend by an order of magnitude. Watch the burn rate. If your workflow is hitting cost ceilings, the first place to look is whether subagents are being used for problems that do not warrant the overhead.
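A back-of-envelope check makes the multiplier visible. All token figures here are illustrative assumptions:

```python
def turn_cost(orchestrator_tokens: int, subagents: int,
              startup_tokens: int, work_tokens: int) -> int:
    # Each subagent bills its own startup overhead plus its work,
    # all against the same account as the orchestrator.
    return orchestrator_tokens + subagents * (startup_tokens + work_tokens)

inline = turn_cost(5_000, 0, 3_000, 0)       # everything in the main thread
fanout = turn_cost(5_000, 5, 3_000, 8_000)   # five subagents per turn
ratio = fanout / inline                      # an order-of-magnitude multiplier
```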
Composability deserves a final note. Subagents can themselves spawn further subagents, producing a tree. In practice this rarely pays off; the overhead compounds at every layer and trust verification becomes intractable. Keep the tree shallow — one orchestrator, one layer of workers — unless the problem genuinely requires deeper hierarchy.
Looking forward, community discussions show steady demand for richer return-value schemas, persistent subagent state across calls, and cleaner cancellation semantics. None of these are shipping today. Operators must work within the constraints of single-shot stateless workers with the result schema they define themselves at spawn time.