feat(ultraplan-local): v1.7.0 — self-verifying plan chain

Wave 1 of a 6-session parallel build revealed three failure modes: (1) hallucinated completion (status=completed after 2/5 steps, last tool call was an arbitrary file review), (2) fail-late bash (3/6 sessions had push blocked inside sub-agent sandbox after all work was done), (3) no objective verification (plans were prose). v1.7 closes all three by making the plan an executable contract. Per-step YAML manifest (expected_paths, commit_message_pattern, bash_syntax_check, forbidden_paths, must_contain) is the objective completion predicate. Plan-critic dimension 10 (Manifest quality) is a hard gate. Session decomposer propagates manifests verbatim and emits an obligatory Step 0 pre-flight (git push --dry-run, exit 77 sentinel) in every session spec. ultraexecute-local gets Phase 7.5 (independent manifest audit from git log + filesystem, ignoring agent bookkeeping) and Phase 7.6 (bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17 forbids marking a step passed without manifest verification. Hard Rule 18 forbids ending on an arbitrary tool call before reporting. Division of labor is made explicit: - /ultraresearch-local gathers context (no build decisions) - /ultraplan-local produces an executable contract (manifests, plan-critic gate) - /ultraexecute-local executes disciplined (does NOT compensate for weak plans — escalates) Code complete. Docs partial (Arbeidsdeling table + manifest section added to plugin + marketplace READMEs). Verification tests (10-sequence) pending — see REMEMBER.md. Backward compat: v1.6 plans without plan_version marker get legacy mode with synthesized manifests and legacy_plan: true in progress file. Plan-critic emits advisory, not block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 07:38:16 +02:00 · 2026-04-12 07:38:16 +02:00 · d1befac35a
commit d1befac35a
parent 72f2e8f6c9
11 changed files with 651 additions and 27 deletions
--- a/plugins/ultraplan-local/agents/plan-critic.md
+++ b/plugins/ultraplan-local/agents/plan-critic.md
@ -113,6 +113,42 @@ Steps missing On failure or Checkpoint clauses are **major** findings
 (not blockers — the plan is still valid for interactive use, but it
 cannot be decomposed into headless sessions).

+### 10. Manifest quality (hard gate)
+
+Manifests are the objective completion predicate. ultraexecute-local uses
+them to determine whether a step is actually done — not just whether the
+Verify command returned 0. A plan without valid manifests cannot drive
+deterministic execution.
+
+Check plans with `plan_version: 1.7` (or later) against these rules:
+
+- Does EVERY step have a `Manifest:` block with YAML content?
+- Are `expected_paths` entries all either existing files OR explicitly marked
+  `(new file)` in the step's Changes prose?
+- Is `expected_paths` a subset of `Files:` (no orphan paths)?
+- Does `commit_message_pattern` compile as a valid regex? (check with a
+  mental regex-parse — e.g., unbalanced `(`, `[` is invalid)
+- Does the `commit_message_pattern` actually match the literal Checkpoint
+  commit message declared in the step?
+- Are all `bash_syntax_check` entries `.sh` files that appear in
+  `expected_paths` (not references to external scripts)?
+- Do `forbidden_paths` avoid overlap with `expected_paths` (contradiction)?
+- Does the step create shell scripts that are NOT listed in
+  `bash_syntax_check`? (minor finding — suggests incomplete manifest)
+
+**Severity:**
+- Missing Manifest block on any step → **major** (same tier as missing On failure)
+- Invalid regex in commit_message_pattern → **major**
+- Pattern doesn't match declared Checkpoint → **major**
+- `expected_paths` references non-existent path not marked new → **major**
+- `forbidden_paths` overlaps `expected_paths` → **blocker** (contradiction)
+- Missing bash_syntax_check for declared `.sh` files → **minor**
+
+**Backward compat:** For plans without `plan_version: 1.7` (legacy), emit
+a single advisory note ("Plan is v1.6 legacy format — manifests will be
+synthesized by ultraexecute with reduced audit precision") and skip this
+dimension's scoring.
+
 ## Rating system

 Rate each finding:
@ -131,10 +167,15 @@ After reviewing all findings, produce a quantitative score:
 | Coverage completeness | 0.20 | Spec-to-steps mapping, no gaps |
 | Specification quality | 0.15 | No placeholders, clear criteria |
 | Risk & pre-mortem | 0.15 | Failure modes addressed, mitigations realistic |
-| Headless readiness | 0.15 | On failure clauses, checkpoints, circuit breakers |
+| Headless readiness | 0.10 | On failure clauses, checkpoints, circuit breakers |
+| Manifest quality | 0.05 | Every step has a valid, checkable manifest (v1.7+) |

 Score each dimension 0–100, then compute the weighted total.

+**Weighting note (v1.7):** Headless readiness reduced 0.15→0.10, Manifest
+quality added at 0.05. Total still 1.00. For legacy v1.6 plans, Manifest
+quality is not scored and Headless readiness returns to 0.15.
+
 **Grade thresholds:**
 - **A** (90–100): APPROVE
 - **B** (75–89): APPROVE_WITH_NOTES
@ -166,7 +207,8 @@ Score each dimension 0–100, then compute the weighted total.
 | Coverage completeness | 0.20 | {0–100} | {assessment} |
 | Specification quality | 0.15 | {0–100} | {assessment} |
 | Risk & pre-mortem | 0.15 | {0–100} | {assessment} |
-| Headless readiness | 0.15 | {0–100} | {assessment} |
+| Headless readiness | 0.10 | {0–100} | {assessment} |
+| Manifest quality | 0.05 | {0–100} | {assessment — omit for legacy v1.6} |
 | **Weighted total** | **1.00** | **{score}** | **Grade: {A/B/C/D}** |

 ## Summary
--- a/plugins/ultraplan-local/agents/planning-orchestrator.md
+++ b/plugins/ultraplan-local/agents/planning-orchestrator.md
@ -186,6 +186,50 @@ Write a comprehensive implementation plan including:
 - Test Strategy (if test-strategist was used)
 - Verification (concrete commands), Estimated Scope

+**Plan-version header:** Include `plan_version: 1.7` in the metadata line below
+the title. This signals to ultraexecute-local that the plan includes per-step
+verification manifests and enables strict audit mode. Plans without this
+marker are treated as legacy v1.6 with synthesized minimal manifests.
+
+### Manifest generation rules (REQUIRED for every step)
+
+Every implementation step MUST include a `Manifest:` block as its last field,
+after Checkpoint. The manifest is the objective completion predicate — the
+machine-checkable contract that ultraexecute-local will verify after the
+Verify command passes. A step cannot be marked passed if its manifest does
+not verify.
+
+Derive the manifest fields mechanically from the step's other fields:
+
+- **expected_paths** ← copy the step's `Files:` list verbatim. Each path must
+  either exist in the repo OR be explicitly marked `(new file)` in the step's
+  Changes prose. Do not list paths that neither exist nor are declared new.
+- **min_file_count** ← default to `len(expected_paths)`. Lower only when the
+  step explicitly allows partial creation (rare).
+- **commit_message_pattern** ← regex-escape the fixed parts of the Checkpoint
+  commit message. Preserve Conventional Commit structure. Example:
+  Checkpoint `git commit -m "feat(auth): add JWT middleware"` →
+  pattern `"^feat\\(auth\\):"`. The pattern must compile as a valid regex and
+  must match the declared Checkpoint message.
+- **bash_syntax_check** ← auto-include every `.sh` file appearing in
+  expected_paths. Add other shell scripts the step creates transitively.
+- **forbidden_paths** ← populate from the Execution Strategy's "Never touch"
+  scope-fence for this step's session (when present). Defense-in-depth.
+- **must_contain** ← optional. Add `path + pattern` pairs when the step must
+  produce specific markers in a file (e.g., a new config section, a required
+  export, a migration boundary).
+
+**Validation before writing plan:**
+1. Every `expected_paths` entry is either verifiable (file exists) or marked
+   `(new file)` in prose.
+2. Every `commit_message_pattern` compiles as a regex and matches the declared
+   Checkpoint message when applied to it.
+3. Every `bash_syntax_check` entry has a `.sh` suffix and appears in
+   `expected_paths`.
+4. No `forbidden_paths` overlaps with `expected_paths` (contradiction).
+
+If any validation fails, fix the plan before handing to Phase 6 review.
+
 ### Failure recovery (REQUIRED for every step)

 Each implementation step MUST include:
@ -237,12 +281,19 @@ Create directories if needed.
 Launch two review agents **in parallel**:

 - `plan-critic` — find missing steps, wrong ordering, fragile assumptions,
-  missing error handling, scope creep, underspecified steps
+  missing error handling, scope creep, underspecified steps, AND manifest
+  quality (dimension 10: every step has a valid, regex-compilable,
+  path-verified manifest). Missing or invalid manifest = **major** finding.
 - `scope-guardian` — verify plan matches spec requirements, find scope
  creep and scope gaps, validate file/function references

 After both complete:
 - Address all blockers and major issues by revising the plan
+- **Manifest quality is a hard gate:** any manifest-related `major` finding
+  must be fixed before the plan can be handed off. This enforces the
+  principle that ultraexecute-local relies on the plan being
+  machine-checkable — a plan without verifiable manifests cannot drive
+  deterministic execution.
 - Add a "Revisions" note at the bottom documenting changes

 ### Phase 7 — Completion
--- a/plugins/ultraplan-local/agents/session-decomposer.md
+++ b/plugins/ultraplan-local/agents/session-decomposer.md
@ -50,9 +50,23 @@ Extract from the plan:
 3. Per-step dependencies (explicit or implicit from step ordering)
 4. Per-step verification commands
 5. Per-step failure recovery (if present)
-6. The overall verification section
-7. Context and codebase analysis sections
-8. Check for an existing `## Execution Strategy` section
+6. **Per-step verification manifest (v1.7+)** — the `Manifest:` YAML block
+   following Checkpoint. Parse it as YAML. Preserve all fields:
+   `expected_paths`, `min_file_count`, `commit_message_pattern`,
+   `bash_syntax_check`, `forbidden_paths`, `must_contain`.
+7. The overall verification section
+8. Context and codebase analysis sections
+9. The `plan_version` marker (if present in the header line)
+10. Check for an existing `## Execution Strategy` section
+
+**Manifest handling:**
+- If `plan_version: 1.7` or later AND any step is missing a Manifest block:
+  STOP with error "Plan claims v1.7 but step N lacks Manifest. Re-run
+  planning-orchestrator." Do not attempt to synthesize.
+- If no `plan_version` marker is present: treat as legacy v1.6. Synthesize
+  minimal manifests from `Files:` (expected_paths) and the Checkpoint commit
+  message (commit_message_pattern escaped). Mark output session specs with
+  `legacy_synthesis: true` in their Session Manifest.

 **If an Execution Strategy already exists:**
 - Log: "Existing Execution Strategy detected — using as primary input."
@ -135,6 +149,59 @@ For each session, write a spec file to the output directory:
 5. **Failure handling** — what to do on failure (copied from plan's On failure fields,
   or default to "stop and report")
 6. **Handoff state** — what this session produces that other sessions need
+7. **Per-step Manifest blocks** — copy each plan step's Manifest YAML verbatim
+   into the corresponding session-spec step. Do NOT edit or summarize.
+8. **Session Manifest aggregate** — synthesize a top-level `## Session Manifest`
+   block aggregating all per-step manifests in the session:
+   - `expected_paths`: union of all steps' expected_paths (deduplicated)
+   - `commit_count`: number of implementation steps in this session (excludes Step 0)
+   - `commit_message_patterns`: list of per-step patterns, in step order
+   - `bash_syntax_check`: union of all steps' bash_syntax_check
+   - `scope_touch`: from Scope Fence Touch (already present)
+   - `scope_forbidden`: from Scope Fence Never Touch + union of step
+     forbidden_paths
+   - `plan_version`: from the source plan
+   - `legacy_synthesis`: true/false based on Step 1's handling
+
+### Step 5.5 — Emit obligatory Step 0 pre-flight
+
+Every generated session spec MUST begin its `## Steps` list with a synthetic
+**Step 0: Sandbox pre-flight** that validates the subagent bash sandbox can
+reach the remote before any real work is done. This catches the fail-late
+push-denial observed in Wave 1 (3/6 sessions all lost their pushes at the
+very end).
+
+The Step 0 block to prepend verbatim:
+
+```markdown
+### Step 0: Sandbox pre-flight (auto-generated — do not modify)
+
+- **Files:** none (read-only test)
+- **Changes:** verify git push permissions are available in this sandbox
+- **Verify:**
+  ```
+  git push --dry-run origin HEAD 2>&1 | tee /tmp/push-dryrun-$$.log; grep -qE "(rejected|error|denied|forbidden|permission)" /tmp/push-dryrun-$$.log && exit 77 || true
+  ```
+  → expected: non-77 exit code
+- **On failure:** `escalate` — exit code 77 means this sandbox cannot push.
+  Abort immediately; do not attempt any work. Main orchestrator will
+  re-spawn with correct permissions.
+- **Checkpoint:** none (no file changes)
+- **Manifest:**
+  ```yaml
+  manifest:
+    expected_paths: []
+    min_file_count: 0
+    commit_message_pattern: ""
+    bash_syntax_check: []
+    forbidden_paths: []
+    must_contain: []
+    sandbox_preflight: true
+  ```
+```
+
+Do NOT skip Step 0 for any session. It is the only early-detection mechanism
+for sandbox-blocked bash.

 ### Step 6 — Generate the dependency diagram