Kjell Tore Guttormsen d1befac35a feat(ultraplan-local): v1.7.0 — self-verifying plan chain

Wave 1 of a 6-session parallel build revealed three failure modes:
(1) hallucinated completion (status=completed after 2/5 steps, last
tool call was an arbitrary file review), (2) fail-late bash (3/6
sessions had push blocked inside sub-agent sandbox after all work
was done), (3) no objective verification (plans were prose).

v1.7 closes all three by making the plan an executable contract.

Per-step YAML manifest (expected_paths, commit_message_pattern,
bash_syntax_check, forbidden_paths, must_contain) is the objective
completion predicate. Plan-critic dimension 10 (Manifest quality)
is a hard gate. Session decomposer propagates manifests verbatim
and emits an obligatory Step 0 pre-flight (git push --dry-run,
exit 77 sentinel) in every session spec.

ultraexecute-local gets Phase 7.5 (independent manifest audit from
git log + filesystem, ignoring agent bookkeeping) and Phase 7.6
(bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17
forbids marking a step passed without manifest verification. Hard
Rule 18 forbids ending on an arbitrary tool call before reporting.

Division of labor is made explicit:
- /ultraresearch-local gathers context (no build decisions)
- /ultraplan-local produces an executable contract (manifests,
  plan-critic gate)
- /ultraexecute-local executes with discipline (does NOT compensate
  for weak plans; escalates instead)

Code complete. Docs partial (Arbeidsdeling [division of labor] table
and manifest section added to plugin and marketplace READMEs).
Verification tests (10-sequence) pending; see REMEMBER.md.

Backward compat: v1.6 plans without plan_version marker get
legacy mode with synthesized manifests and legacy_plan: true in
progress file. Plan-critic emits advisory, not block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-12 07:38:16 +02:00


{Task Title}

Plan quality: {grade} ({score}/100) — {APPROVE | APPROVE_WITH_NOTES | REVISE | REPLAN}

Generated by ultraplan-local v{version} on {YYYY-MM-DD} — plan_version: 1.7

Context

Why this change is needed. The problem or need it addresses, what prompted it, and the intended outcome. Reference the spec file if one was used.

Architecture Diagram

graph TD
    subgraph "Changes in this plan"
        %% C4-style component diagram showing what the plan touches
        %% Highlight modified components, new components, and connections
    end

Replace this placeholder with an actual Mermaid diagram showing the components this plan modifies, their relationships, and the data flow between them.

Codebase Analysis

Tech stack: {languages, frameworks, build tools}
Key patterns: {architecture patterns, conventions observed}
Relevant files: {paths to files that will be read or modified}
Reusable code: {existing functions, utilities, abstractions to leverage}
External tech (researched): {technologies that were looked up via research-scout}
Recent git activity: {relevant recent commits, active branches, code ownership}

Research Sources

Omit this section when no external research was conducted.

| Technology | Source | Key Findings | Confidence |
|---|---|---|---|
| {name} | {URL} | {summary} | {high/med/low} |

Implementation Plan

Each step targets 1–2 files and one focused change. Steps follow TDD structure when the project has tests.

Step 1: {description}

Files: path/to/file.ts
Changes: {exactly what to modify — no placeholders, no "update as needed"}
Reuses: {existing function/pattern from codebase, with file path}
Test first:
- File: path/to/test.ts (existing | new)
- Verifies: {what the test checks}
- Pattern: path/to/existing-test.ts (follow this style)
Verify: {exact command} → expected: {output}
On failure: {revert | retry | skip | escalate} — {specific instructions}
Checkpoint: git commit -m "{conventional commit message}"

Manifest:

manifest:
  expected_paths:
    - path/to/file.ts
  min_file_count: 1
  commit_message_pattern: "^feat\\(scope\\):"
  bash_syntax_check: []
  forbidden_paths: []
  must_contain: []

Step 2: {description}

Files: path/to/file.ts
Changes: {exactly what to modify}
Reuses: {existing function/pattern}
Test first:
- File: path/to/test.ts (existing | new)
- Verifies: {what the test checks}
- Pattern: path/to/existing-test.ts
Verify: {exact command} → expected: {output}
On failure: {revert | retry | skip | escalate} — {specific instructions}
Checkpoint: git commit -m "{conventional commit message}"

Manifest:

manifest:
  expected_paths:
    - path/to/file.ts
  min_file_count: 1
  commit_message_pattern: "^feat\\(scope\\):"
  bash_syntax_check: []
  forbidden_paths: []
  must_contain:
    - path: path/to/file.ts
      pattern: "expected content marker"

For projects without tests: omit "Test first" and keep "Verify" with a concrete command (e.g., run the app, check output, curl an endpoint).

Manifest — objective completion predicate

Every step MUST have a Manifest block. It is the machine-checkable contract that ultraexecute-local verifies after the Verify command passes. A step is not complete until its manifest verifies: a successful exit code from the Verify command is not sufficient on its own.

expected_paths: files that must exist after this step. Existing files must already be present in the repo; new files must be marked (new file) in the step's prose.
min_file_count: minimum number of expected_paths that must exist. Typically equal to len(expected_paths).
commit_message_pattern: regex that MUST match the HEAD commit message after Checkpoint runs. Escape regex metacharacters, and double each backslash inside a double-quoted YAML string (e.g., \\(scope\\) to match a literal "(scope)").
bash_syntax_check: list of .sh files that must pass bash -n. Auto-include any .sh file listed in expected_paths.
forbidden_paths: files this step must NOT modify (defense-in-depth beyond the Scope Fence).
must_contain: optional grep assertions, given as path + pattern pairs that must match in created or modified files.
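
As an illustration of how these fields could act as a completion predicate, here is a minimal Python sketch. It is not the ultraexecute-local implementation: `verify_manifest` and its parameters are hypothetical names, the manifest is assumed to be already parsed into a dict, and the HEAD commit message is passed in as a string rather than read from git. The `forbidden_paths` and `bash_syntax_check` fields are left out because they need a git diff and a bash binary.

```python
import re
from pathlib import Path


def verify_manifest(manifest: dict, repo_root: Path, head_commit_message: str) -> list[str]:
    """Hypothetical checker: return failure messages; an empty list means the manifest verifies."""
    failures = []

    # expected_paths / min_file_count: count how many expected files exist.
    expected = manifest.get("expected_paths", [])
    present = [p for p in expected if (repo_root / p).is_file()]
    # Assumption: a missing min_file_count defaults to len(expected_paths).
    required = manifest.get("min_file_count", len(expected))
    if len(present) < required:
        missing = sorted(set(expected) - set(present))
        failures.append(f"expected_paths: {len(present)}/{required} present, missing {missing}")

    # commit_message_pattern: regex searched against the HEAD commit message.
    pattern = manifest.get("commit_message_pattern")
    if pattern and not re.search(pattern, head_commit_message):
        failures.append(f"commit_message_pattern: {pattern!r} does not match HEAD message")

    # must_contain: grep-style assertions on file content.
    for check in manifest.get("must_contain", []):
        target = repo_root / check["path"]
        text = target.read_text(encoding="utf-8") if target.is_file() else ""
        if not re.search(check["pattern"], text):
            failures.append(f"must_contain: {check['pattern']!r} not found in {check['path']}")

    return failures
```

A step would count as passed only when the returned list is empty. Returning every failure, rather than stopping at the first, gives a recovery pass the full picture in one run.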

Failure recovery rules

On failure: revert — undo this step's changes (git checkout -- {files}), do NOT proceed
On failure: retry — attempt once more with the alternative approach described, then revert if still failing
On failure: skip — this step is non-critical; continue to next step and note the skip
On failure: escalate — stop execution entirely; the issue requires human judgment
Checkpoint — after each step succeeds, commit changes so subsequent failures cannot corrupt completed work
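
The four policies above can be read as a small decision table. The Python sketch below is one possible encoding, not the executor's actual logic: `OnFailure`, `next_action`, and the returned action strings are hypothetical names chosen for illustration.

```python
from enum import Enum


class OnFailure(Enum):
    """The four policies a step may declare (values match the template's keywords)."""
    REVERT = "revert"
    RETRY = "retry"
    SKIP = "skip"
    ESCALATE = "escalate"


def next_action(policy: OnFailure, attempts_so_far: int) -> str:
    """Map a failed step's policy to what a hypothetical executor does next.

    attempts_so_far counts failed attempts of this step, including the one
    that just failed (so it is 1 after the first failure).
    """
    if policy is OnFailure.RETRY:
        # One extra attempt with the alternative approach, then fall back to revert.
        return "retry" if attempts_so_far < 2 else "revert_and_stop"
    if policy is OnFailure.REVERT:
        return "revert_and_stop"
    if policy is OnFailure.SKIP:
        return "continue_and_note_skip"
    return "stop_for_human"
```

For example, `next_action(OnFailure("retry"), 1)` yields `"retry"`, while a second failure of the same step yields `"revert_and_stop"`.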

Alternatives Considered

| Approach | Pros | Cons | Why rejected |
|---|---|---|---|
| {name} | ... | ... | ... |

Test Strategy

Framework: {test framework and runner}
Existing patterns: {how tests are structured in this codebase}
New tests in this plan: {N} tests across {N} steps

Tests to write

| Type | File | Verifies | Model test |
|---|---|---|---|
| Unit | `path/to/test` | {what it tests} | `path/to/existing-test` |

For projects without tests: describe manual verification approach instead.

Risks and Mitigations

| Priority | Risk | Location | Impact | Mitigation |
|---|---|---|---|---|
| {Critical/High/Medium/Low} | {description} | `file:line` | {what happens} | {how to handle} |

Assumptions

Things the planner could not verify from the codebase or research. Each assumption is a risk, so review them all before executing.

| # | Assumption | Why unverifiable | Impact if wrong |
|---|---|---|---|
| 1 | {what we assumed} | {why we couldn't check} | {what breaks} |

If this list has 3+ items, the plan may need additional investigation before execution.

Verification

Per-step manifest verification runs automatically during execution (every step's Manifest block is objectively checked by ultraexecute-local before the step is marked passed). This section is for end-to-end integration checks that cross step boundaries — complete workflows, system-level behavior.

{exact command} → expected: {exact output or behavior}
{exact command} → expected: {exact output or behavior}

Estimated Scope

Files to modify: {N}
Files to create: {N}
Complexity: {low | medium | high}

Execution Strategy

Include this section when the plan has more than 5 implementation steps. Omit it for small plans (≤ 5 steps); ultraexecute-local runs those sequentially in a single session.

The execution strategy groups steps into sessions and organizes sessions into waves. Sessions in the same wave can run in parallel. Sessions in later waves depend on earlier waves completing first.

Session 1: {title}

Steps: {step numbers, e.g., 1, 2, 3}
Wave: {wave number}
Depends on: {session numbers, or "none"}
Scope fence:
- Touch: {files this session may modify}
- Never touch: {files reserved for other sessions}

Session 2: {title}

Steps: {step numbers}
Wave: {wave number}
Depends on: {session numbers, or "none"}
Scope fence:
- Touch: {files}
- Never touch: {files}

Execution Order

Wave 1: {session list} (parallel)
Wave 2: {session list} (after Wave 1)
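
One way to derive wave numbers from the "Depends on" fields is to place each session one wave after its latest dependency. The Python sketch below assumes that rule; `assign_waves` is a hypothetical helper, and the session decomposer may order waves differently.

```python
def assign_waves(depends_on: dict[int, list[int]]) -> dict[int, int]:
    """Map each session number to a 1-based wave number (hypothetical helper).

    A session with no dependencies is in wave 1; otherwise its wave is one
    more than the latest wave among the sessions it depends on.
    """
    waves: dict[int, int] = {}
    in_progress: set[int] = set()

    def wave_of(session: int) -> int:
        if session in waves:
            return waves[session]
        if session in in_progress:
            raise ValueError(f"dependency cycle involving session {session}")
        in_progress.add(session)
        deps = depends_on.get(session, [])
        waves[session] = 1 + max((wave_of(d) for d in deps), default=0)
        in_progress.discard(session)
        return waves[session]

    for session in depends_on:
        wave_of(session)
    return waves
```

With sessions 1 and 2 independent, session 3 depending on both, and session 4 depending on 3, this places 1 and 2 in wave 1, 3 in wave 2, and 4 in wave 3. A dependency cycle raises `ValueError` instead of looping.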

Grouping rules applied

Steps sharing files → same session
Steps in independent modules → separate sessions (parallelizable)
3–5 steps per session (target)
Sessions ordered by dependency, waves by independence
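
The first rule (steps sharing files go in the same session) amounts to finding connected groups of steps. A minimal Python sketch using union-find follows; `group_steps_by_shared_files` is a hypothetical name, and the sketch does not enforce the 3–5 steps target or the dependency ordering.

```python
def group_steps_by_shared_files(step_files: dict[int, set[str]]) -> list[list[int]]:
    """Group step numbers so that steps touching a common file land together."""
    parent = {step: step for step in step_files}

    def find(step: int) -> int:
        # Follow parent links to the group's representative, halving the path as we go.
        while parent[step] != step:
            parent[step] = parent[parent[step]]
            step = parent[step]
        return step

    first_step_for_file: dict[str, int] = {}
    for step, files in step_files.items():
        for path in files:
            if path in first_step_for_file:
                # Merge this step's group with the group that already touches the file.
                parent[find(step)] = find(first_step_for_file[path])
            else:
                first_step_for_file[path] = step

    groups: dict[int, list[int]] = {}
    for step in step_files:
        groups.setdefault(find(step), []).append(step)
    return sorted(sorted(g) for g in groups.values())
```

Sharing is transitive here: if step 1 and step 2 share one file and step 2 and step 4 share another, all three end up in one group even though steps 1 and 4 have no file in common.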

Plan Quality Score

| Dimension | Weight | Score | Notes |
|---|---|---|---|
| Structural integrity | 0.15 | {0–100} | {step ordering, dependencies} |
| Step quality | 0.20 | {0–100} | {granularity, specificity, TDD} |
| Coverage completeness | 0.20 | {0–100} | {spec → steps, no gaps} |
| Specification quality | 0.15 | {0–100} | {no placeholders, clear criteria} |
| Risk & pre-mortem | 0.15 | {0–100} | {failure modes addressed} |
| Headless readiness | 0.10 | {0–100} | {On failure + Checkpoint per step} |
| Manifest quality | 0.05 | {0–100} | {all steps have valid, checkable manifests} |
| Weighted total | 1.00 | {score} | Grade: {A/B/C/D} |
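
The weighted total is the sum of each dimension's score multiplied by its weight. The Python sketch below uses the weights from the table; `WEIGHTS` and `weighted_total` are hypothetical names, and the mapping from total to grade is omitted because the template does not state the grade thresholds.

```python
# Weights copied from the Plan Quality Score table; they sum to 1.00.
WEIGHTS = {
    "Structural integrity": 0.15,
    "Step quality": 0.20,
    "Coverage completeness": 0.20,
    "Specification quality": 0.15,
    "Risk & pre-mortem": 0.15,
    "Headless readiness": 0.10,
    "Manifest quality": 0.05,
}


def weighted_total(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0 to 100) into the weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)
```

A plan scoring 100 on every dimension except a 0 on Manifest quality would still total 95. The commit message describes Manifest quality as a hard gate, so the total alone would presumably not be enough to approve such a plan.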

Adversarial review:

Plan critic: {verdict — findings count by severity, key issues}
Scope guardian: {verdict — ALIGNED / CREEP / GAP / MIXED}

Revisions

Added by adversarial review. Omit if no revisions were needed.

| # | Finding | Severity | Resolution |
|---|---|---|---|
| 1 | {what was wrong} | {blocker/major/minor} | {how it was fixed} |
