ktg-plugin-marketplace/plugins/ultraplan-local/tests/fixtures/skill-factory/draft-needs-review.md
Kjell Tore Guttormsen 486f544d39 test(ultraplan-local): add skill-factory calibration fixtures
3 source/draft pairs that pin n-gram-overlap verdicts to representative prose:
- accepted (containment 0.014 / longestRun 3)
- needs-review (containment 0.211 / longestRun 12)
- rejected (containment 0.676 / longestRun 74)

Topics: session-start hooks, subagent delegation, output styles — proximate to
the production source material the skill-factory will ingest.

Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 5)
2026-04-18 15:16:28 +02:00

4.7 KiB

name description layer cc_feature source concept last_verified ngram_overlap_score review_status
subagent-delegation-pattern Pattern for spawning isolated worker contexts to handle bounded sub-tasks pattern subagents ./source-needs-review.md context-isolated worker delegation 2026-04-18 null pending

Use this when

A chunk of work is logically self-contained, would otherwise drown the main thread in noise, or could profitably run in parallel with other branches of investigation. Reach for delegation whenever the orchestrator's transcript would be polluted by intermediate steps that nobody upstream cares about.

Shape

The orchestrator constructs a complete prompt at spawn time and hands it off to a worker. That worker operates in its own conversation, with its own context window, and answers with one structured response when finished. The parent treats that response as a tool result and proceeds. Whatever the worker reads, generates, or discards never appears in the orchestrator's transcript — only the final return value crosses the boundary. That bound is precious in long sessions where every token of orchestrator context counts.

For parallel decomposition, the parent kicks off several workers in a single turn and waits for them all to complete. The runtime executes them concurrently, so a research task that would otherwise be three sequential delegations finishes in roughly the time of the slowest single one. For workflows that legitimately decompose into independent investigations, the speedup is substantial.

Forces

Each spawn carries a startup tax. The runtime loads a fresh system prompt and the relevant tool schemas — several thousand tokens before the worker has done anything useful. Spawning ten workers to do work that one careful Grep would have handled is wasteful. Heuristic: do not delegate anything you could finish in two or three direct tool calls in the main thread.

Tool scoping cuts both noise and overhead. A search worker has no need for write or edit. A summarizer has no business running shell commands. Tightening the toolset reduces the chance of off-task behavior and shrinks the system prompt overhead. The default open-tool-set is appropriate only for general-purpose subagents that genuinely need flexibility.

Briefing is the second discipline. The worker has no view into the parent's broader conversation. Anything implicit upstream — user intent, file paths already in play, active constraints — has to be restated explicitly inside the spawn payload. Skip that step and the worker drifts into plausible-looking tangents that have nothing to do with what the parent was actually trying to accomplish.

Gotchas

  • Trust failure is the largest pitfall. The parent only sees the worker's final structured response. If the worker hallucinates that it wrote a file, the orchestrator has no easy way to verify. Best practice: independently confirm every claim that matters — read the file, check the test result, inspect the diff.
  • Failure handling is on you. The runtime does not auto-retry. When a worker crashes, decide deliberately whether to retry, fall back to in-line execution, or surface the failure to the user. Cap retries at two or three to prevent loops, and log enough to iterate on the prompt later.
  • Cost compounds quickly. Workers bill against the same account as the orchestrator. A workflow that spawns five workers per orchestrator turn can multiply token spend by an order of magnitude. Watch the burn rate and treat hitting cost ceilings as a signal that something is over-delegated.
  • Deep hierarchies seldom pay off. Workers can spawn further workers, producing a tree, but in practice the overhead compounds at every layer and trust verification becomes intractable. One orchestrator, one layer of workers — that is the default.

Anti-patterns

  • Delegating work that two or three direct tool calls would have covered.
  • Granting the open default toolset when a narrow scope would suffice.
  • Trusting a worker's self-report without separate verification.
  • Building deep hierarchies where every additional layer compounds overhead and weakens the trust chain.

Decision quick-check

Pick this approach when:

  • The sub-task is bounded and produces an output you can structure ahead of time.
  • You can brief it adequately in a single spawn prompt without dragging in the whole transcript.
  • The context savings or parallelism gain pay back the spawn tax.

Skip it when:

  • The work fits in two or three direct tool calls from the main thread.
  • The brief required would essentially recapitulate the orchestrator's whole conversation anyway.
  • The result is hard for you to verify and the cost of a fabricated success would be material.