feat(ultraplan-local): v1.7.0 — self-verifying plan chain

Wave 1 of a 6-session parallel build revealed three failure modes: (1) hallucinated completion (status=completed after 2/5 steps, last tool call was an arbitrary file review), (2) fail-late bash (3/6 sessions had push blocked inside sub-agent sandbox after all work was done), (3) no objective verification (plans were prose). v1.7 closes all three by making the plan an executable contract. Per-step YAML manifest (expected_paths, commit_message_pattern, bash_syntax_check, forbidden_paths, must_contain) is the objective completion predicate. Plan-critic dimension 10 (Manifest quality) is a hard gate. Session decomposer propagates manifests verbatim and emits an obligatory Step 0 pre-flight (git push --dry-run, exit 77 sentinel) in every session spec. ultraexecute-local gets Phase 7.5 (independent manifest audit from git log + filesystem, ignoring agent bookkeeping) and Phase 7.6 (bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17 forbids marking a step passed without manifest verification. Hard Rule 18 forbids ending on an arbitrary tool call before reporting. Division of labor is made explicit: - /ultraresearch-local gathers context (no build decisions) - /ultraplan-local produces an executable contract (manifests, plan-critic gate) - /ultraexecute-local executes disciplined (does NOT compensate for weak plans — escalates) Code complete. Docs partial (Arbeidsdeling table + manifest section added to plugin + marketplace READMEs). Verification tests (10-sequence) pending — see REMEMBER.md. Backward compat: v1.6 plans without plan_version marker get legacy mode with synthesized manifests and legacy_plan: true in progress file. Plan-critic emits advisory, not block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 07:38:16 +02:00 · 2026-04-12 07:38:16 +02:00 · d1befac35a
commit d1befac35a
parent 72f2e8f6c9
11 changed files with 651 additions and 27 deletions
--- a/plugins/ultraplan-local/commands/ultraexecute-local.md
+++ b/plugins/ultraplan-local/commands/ultraexecute-local.md
@ -87,10 +87,48 @@ Extract every `### Step N: {description}` heading (in order). For each step, ext
 - **Verify** — command to run after implementation
 - **On failure** — recovery action (revert/retry/skip/escalate)
 - **Checkpoint** — git commit command after success
+- **Manifest** — YAML block following Checkpoint (v1.7+)

 If a step is missing `On failure`, default to `escalate` and record a parse warning.
 If a step is missing `Verify`, record a parse warning.

+### Parse the Manifest block (v1.7+)
+
+Read the plan header for a `plan_version` marker. Treat ≥ `1.7` as strict
+mode; absence or <1.7 as legacy mode.
+
+**Strict mode (plan_version ≥ 1.7):**
+- Every step MUST have a `Manifest:` block with a YAML fenced payload.
+- Parse the YAML. Required keys: `expected_paths` (list), `min_file_count`
+  (integer), `commit_message_pattern` (string), `bash_syntax_check` (list),
+  `forbidden_paths` (list), `must_contain` (list of `{path, pattern}` dicts).
+- Validate each `commit_message_pattern` compiles as a regex (use a tolerant
+  parser — do not execute untrusted input against shell).
+- If any step is missing its Manifest, or YAML is malformed, STOP with:
+  ```
+  Error: plan_version=1.7 but step {N} has invalid/missing Manifest.
+  Re-run planning-orchestrator — plan is not executable.
+  ```
+
+**Legacy mode (no plan_version, or < 1.7):**
+- Synthesize a minimal manifest per step:
+  - `expected_paths` ← step's `Files:` list
+  - `min_file_count` ← `len(expected_paths)`
+  - `commit_message_pattern` ← regex-escape first 3 words of Checkpoint msg
+  - `bash_syntax_check` ← auto-detect `.sh` in Files
+  - `forbidden_paths` ← []
+  - `must_contain` ← []
+- Record `legacy_plan: true` in the progress file.
+- Emit warning: `Legacy plan (v1.6 or earlier) — manifests synthesized with
+  reduced audit precision.`
+
+### Parse Session Manifest (session specs only)
+
+If the file is a session spec (v1.7+), parse the `## Session Manifest` block
+into a YAML dict. Preserve `session_manifest.*` fields for Phase 7.5 audit.
+If missing and `plan_version ≥ 1.7`, record a parse warning but continue —
+the per-step manifests are still available for audit.
+
 ### Parse session spec fields (if applicable)

 - **Entry condition** from `## Dependencies`
@ -721,7 +759,7 @@ Read the step's `Files:` and `Changes:` fields. Implement exactly as described.
 - If `Reuses:` references existing code, read that code first for context
 - Only touch files listed in `Files:` — nothing else

-#### Sub-step D — Verification
+#### Sub-step D1 — Command verification

 **Security check (mandatory):** Before running the Verify command, check it against
 the executor security denylist. If the command matches ANY of these patterns,
@ -763,7 +801,50 @@ Result: {PASS | FAIL} (exit code {N})
 {if FAIL}: Output (first 10 lines): {output}
 ```

-If **PASS**: proceed to Sub-step F (checkpoint).
+**Step 0 sentinel:** if the Verify command exits `77` AND the step's manifest
+has `sandbox_preflight: true`, do NOT treat as a normal failure. Set step
+status = `blocked` and apply `On failure: escalate`. Skip D2 manifest
+verification (nothing to verify for a read-only sandbox test). Commit none —
+jump straight to Phase 7 with a structured "sandbox-blocked" reason.
+
+If **PASS**: proceed to Sub-step D2 (manifest verification).
+
+#### Sub-step D2 — Manifest verification
+
+After the Verify command passes, verify the step's Manifest block. This is
+the objective completion predicate: a step is not passed until its manifest
+holds, regardless of Verify exit code.
+
+**Checks to run (in order):**
+
+1. **Expected paths exist:** for each `expected_paths` entry, verify the
+   file exists in the repo. Count how many exist.
+2. **min_file_count satisfied:** count from (1) must be ≥ `min_file_count`.
+3. **Forbidden paths untouched:** for each `forbidden_paths` entry, run
+   `git diff --name-only HEAD~{attempts} HEAD -- {path}` (since this step
+   began). Any modified forbidden path = fail.
+4. **Bash syntax check:** for each entry in `bash_syntax_check` AND any
+   `.sh` file that appears in `git diff --name-only HEAD~1 HEAD` (safety
+   net for unlisted scripts), run `bash -n {path}`. Non-zero exit = fail.
+5. **must_contain patterns:** for each `{path, pattern}` pair, run
+   `grep -E "{pattern}" {path}`. No match = fail.
+
+**On manifest failure:**
+
+```
+Manifest verification FAIL for step {N}:
+- {which check failed with detail}
+```
+
+Apply the step's `On failure:` clause (same as Sub-step D1 failure). Increment
+attempts. Manifest failures count against the retry cap equally with command
+failures.
+
+If all manifest checks pass: proceed to Sub-step F (checkpoint).
+
+Record per-step manifest audit result in progress file:
+`steps.{N}.manifest_audit = "pass" | "fail"`,
+`steps.{N}.manifest_drift = [{check: reason, ...}]` on fail.

 #### Sub-step E — On failure handling

@ -816,8 +897,15 @@ The step's verification already passed — the commit is bookkeeping.
 Step {N}: PASS (committed: {hash})
 ```

+**Commit-message-pattern check:** after the commit succeeds, read the HEAD
+commit message (`git log -1 --pretty=%s`) and match it against the step's
+`commit_message_pattern`. Mismatch does NOT fail the step (verification
+already passed) but is recorded as `checkpoint_drift: {expected_pattern,
+actual_message}` in the progress file. Phase 7.5 audit reports drift as
+an advisory.
+
 Update progress: `steps.{N}.status = "passed"`, `steps.{N}.commit = {hash}`,
-`steps.{N}.completed_at = now`.
+`steps.{N}.completed_at = now`, `steps.{N}.manifest_audit = "pass"`.

 ### Step mode exit

@ -839,6 +927,96 @@ Exit condition check:
 If all pass: `exit_condition_checked: true` in progress file.
 If any fail: record which failed. Include in final report.

+## Phase 7.5 — Manifest audit (independent)
+
+**Runs for all modes except dry-run.** This is the last-line-of-defense
+check: it ignores the executor's own per-step bookkeeping and re-verifies
+session-wide state directly from the filesystem and git log. If the audit
+disagrees with the executor's self-report, the audit wins.
+
+This phase exists because agents can hallucinate completion. Phase 7.5
+produces an independent truth based on objective state, not self-narrative.
+
+**Steps:**
+
+1. **Enumerate expected paths:** aggregate `expected_paths` across every
+   step manifest (and the session_manifest, if present). Deduplicate.
+
+2. **Filesystem check:** for each expected_path, confirm the file exists.
+
+3. **Commit-count check:** count commits since session-start
+   (`git rev-list --count {start_sha}..HEAD`). Expected count =
+   number of steps with `status=passed` (excludes Step 0 if sandbox_preflight).
+
+4. **Commit-message pattern sweep:** walk `git log {start_sha}..HEAD` and
+   confirm each commit matches one of the declared
+   `commit_message_patterns` (in any order — order is advisory).
+
+5. **Bash syntax sweep:** for every `.sh` file in
+   `git diff --name-only {start_sha} HEAD`, run `bash -n`. Collect failures.
+
+6. **Forbidden-path sweep:** for each `scope_forbidden` entry, check
+   `git diff --name-only {start_sha} HEAD -- {path}` is empty.
+
+Compare the audit result against the executor's `progress.status`:
+
+- **Audit pass + progress=completed:** status stays `completed`.
+- **Audit fail + progress=completed:** OVERRIDE to `partial`. The executor
+  believed it was done; the filesystem says otherwise. This is the Wave 1
+  hallucination case — the audit is the defense.
+- **Audit fail + progress=failed/stopped:** status stays as-is; drift is
+  informational.
+
+Record in progress file:
+- `manifest_audit.status = "pass" | "drift"`
+- `manifest_audit.drift_details = [{check, expected, actual}, ...]`
+
+## Phase 7.6 — Recovery dispatch (multi-session parent context only)
+
+**Preconditions:**
+- This is the parent ultraexecute invocation (not a child `--session N`)
+- Phase 7.5 reported `drift`
+- `recovery_depth < 2` (hard cap to prevent infinite loops)
+
+**Skip otherwise.** Recovery in child context or at depth 2+ escalates to
+the user for manual resolution.
+
+**Synthesize a recovery session spec:**
+
+1. Determine the missing step numbers from `manifest_audit.drift_details`
+   (steps whose expected_paths are absent or whose commits are missing).
+2. Read the original session spec. Copy only the missing steps (preserving
+   their Manifest blocks) into a new file:
+   `{output_dir}/session-{N}-recovery-{depth}.md`
+3. Populate `## Recovery Metadata`:
+   - `recovery_of: {original session spec path}`
+   - `recovery_depth: {current depth + 1}`
+   - `missing_steps: [N, M, ...]`
+   - `entry_condition_override: "previous partial session committed at {sha}"`
+   - `parent_progress_file: {path}`
+4. Prepend the synthetic Step 0 pre-flight (same as normal session specs).
+5. Set `recovery_dispatched = true` in parent progress file.
+
+**Invoke the recovery session:**
+
+```bash
+cd "$WORKTREE_PATH" && claude -p "/ultraexecute-local --session {N} {recovery spec path}" \
+  --allowedTools "Read,Write,Edit,Bash,Glob,Grep" \
+  --permission-mode bypassPermissions \
+  > "$LOG_DIR/session-{N}-recovery-{depth}.log" 2>&1
+```
+
+Wait for the recovery session to complete. After it returns, re-run Phase
+7.5 audit one more time. If it still drifts at `recovery_depth=2`:
+
+```
+RECOVERY EXHAUSTED: session {N} drifted after 2 recovery attempts.
+Missing: {steps and details}
+Status: partial (recovery_depth=2, escalated to user)
+```
+
+Do NOT dispatch a third recovery. Report to the user.
+
 ## Phase 8 — Final report

 Always produce a final report.
@ -855,11 +1033,20 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,

 ### Step Results

-| Step | Description | Result | Attempts | Commit |
-|------|-------------|--------|----------|--------|
-| 1 | {desc} | PASS | 1 | abc1234 |
-| 2 | {desc} | FAIL | 3 | — |
-| 3 | {desc} | — | 0 | — |
+| Step | Description | Result | Attempts | Commit | Manifest |
+|------|-------------|--------|----------|--------|----------|
+| 0 | Sandbox pre-flight | PASS | 1 | — | n/a |
+| 1 | {desc} | PASS | 1 | abc1234 | pass |
+| 2 | {desc} | FAIL | 3 | — | — |
+| 3 | {desc} | — | 0 | — | — |
+
+### Manifest Audit (Phase 7.5)
+
+- **Status:** {pass | drift}
+- **Drift details:** {enumerated; empty on pass}
+- **Recovery dispatched:** {true | false}
+- **Recovery depth:** {N}
+- **Legacy plan:** {true | false}

 ### Summary

@ -867,6 +1054,7 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,
 - Skipped: {N}
 - Failed: {N}
 - Not reached: {N}
+- Blocked (sandbox): {N}

 {if all passed + exit condition passed}:
 All steps completed. Exit condition: PASS.
@ -886,6 +1074,14 @@ Attempts: {N}
 To resume: /ultraexecute-local --resume {path}
 ```

+**Result vocabulary (v1.7, strict):**
+- `completed` — all steps passed AND Phase 7.5 manifest audit passed
+- `partial` — steps passed per executor but Phase 7.5 found drift, OR
+  Phase 7.6 recovery incomplete
+- `blocked` — Step 0 sandbox pre-flight exited 77; no real work attempted
+- `failed` — a step failed and On failure was revert/retry (retry cap hit)
+- `stopped` — On failure: escalate triggered
+
 **JSON summary block** (always at the end, machine-parseable):

 ```json
@ -893,14 +1089,21 @@ To resume: /ultraexecute-local --resume {path}
  "ultraexecute_summary": {
    "plan": "{path}",
    "plan_type": "{plan | session-spec}",
-    "result": "{completed | failed | stopped | partial}",
+    "plan_version": "{1.7 | 1.6 | legacy}",
+    "result": "{completed | partial | blocked | failed | stopped}",
    "steps_total": 0,
    "steps_passed": 0,
    "steps_failed": 0,
    "steps_skipped": 0,
    "steps_not_reached": 0,
+    "steps_blocked": 0,
    "failed_at_step": null,
    "exit_condition": "{pass | fail | skipped | n/a}",
+    "manifest_audit": "{pass | drift | n/a}",
+    "drift_details": [],
+    "recovery_dispatched": false,
+    "recovery_depth": 0,
+    "legacy_plan": false,
    "progress_file": "{path}"
  }
 }
@ -995,3 +1198,18 @@ Never let stats failures block the workflow.
    (git hook injection), `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `.env` files,
    shell configs (`~/.zshrc`, `~/.bashrc`, `~/.profile`), or
    `.claude/settings.json` / `.claude/hooks/` (infrastructure self-modification).
+
+17. **Manifest is the completion predicate.** A step may not be marked passed
+    if its manifest does not verify, regardless of the Verify command's exit
+    code. The manifest is the objective contract — Verify is necessary but
+    not sufficient. For v1.7+ plans: a Verify pass with a manifest fail sets
+    step result to `failed` and triggers the On-failure clause. For legacy
+    v1.6 plans: synthesized manifests apply with the same force, but
+    `legacy_plan: true` is logged in progress.
+
+18. **Last-activity rule.** The executor's final tool call before writing
+    Phase 8 must be a manifest check (Phase 7.5 audit), never an arbitrary
+    file review. This prevents the "hallucinated completion" failure mode
+    where a transcript ends on an unrelated Read and the agent self-reports
+    `completed` without verifying. If Phase 7.5 has not run, the executor
+    may not emit `result: completed` under any circumstances.