feat(ultraplan-local): v1.7.0 — self-verifying plan chain

Wave 1 of a 6-session parallel build revealed three failure modes:
(1) hallucinated completion (status=completed after 2/5 steps, last
tool call was an arbitrary file review), (2) fail-late bash (3/6
sessions had push blocked inside sub-agent sandbox after all work
was done), (3) no objective verification (plans were prose).

v1.7 closes all three by making the plan an executable contract.

Per-step YAML manifest (expected_paths, commit_message_pattern,
bash_syntax_check, forbidden_paths, must_contain) is the objective
completion predicate. Plan-critic dimension 10 (Manifest quality)
is a hard gate. Session decomposer propagates manifests verbatim
and emits an obligatory Step 0 pre-flight (git push --dry-run,
exit 77 sentinel) in every session spec.

ultraexecute-local gets Phase 7.5 (independent manifest audit from
git log + filesystem, ignoring agent bookkeeping) and Phase 7.6
(bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17
forbids marking a step passed without manifest verification. Hard
Rule 18 forbids ending on an arbitrary tool call before reporting.

Division of labor is made explicit:
- /ultraresearch-local gathers context (no build decisions)
- /ultraplan-local produces an executable contract (manifests,
  plan-critic gate)
- /ultraexecute-local executes disciplined (does NOT compensate
  for weak plans — escalates)

Code complete. Docs partial (Arbeidsdeling table + manifest section
added to plugin + marketplace READMEs). Verification tests
(10-sequence) pending — see REMEMBER.md.

Backward compat: v1.6 plans without plan_version marker get
legacy mode with synthesized manifests and legacy_plan: true in
progress file. Plan-critic emits advisory, not block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Kjell Tore Guttormsen 2026-04-12 07:38:16 +02:00
commit d1befac35a
11 changed files with 651 additions and 27 deletions

View file

@ -87,10 +87,48 @@ Extract every `### Step N: {description}` heading (in order). For each step, ext
- **Verify** — command to run after implementation
- **On failure** — recovery action (revert/retry/skip/escalate)
- **Checkpoint** — git commit command after success
- **Manifest** — YAML block following Checkpoint (v1.7+)
If a step is missing `On failure`, default to `escalate` and record a parse warning.
If a step is missing `Verify`, record a parse warning.
### Parse the Manifest block (v1.7+)
Read the plan header for a `plan_version` marker. Treat ≥ `1.7` as strict
mode; absence or <1.7 as legacy mode.
**Strict mode (plan_version ≥ 1.7):**
- Every step MUST have a `Manifest:` block with a YAML fenced payload.
- Parse the YAML. Required keys: `expected_paths` (list), `min_file_count`
(integer), `commit_message_pattern` (string), `bash_syntax_check` (list),
`forbidden_paths` (list), `must_contain` (list of `{path, pattern}` dicts).
- Validate each `commit_message_pattern` compiles as a regex (use a tolerant
parser — do not execute untrusted input against shell).
- If any step is missing its Manifest, or YAML is malformed, STOP with:
```
Error: plan_version=1.7 but step {N} has invalid/missing Manifest.
Re-run planning-orchestrator — plan is not executable.
```
**Legacy mode (no plan_version, or < 1.7):**
- Synthesize a minimal manifest per step:
- `expected_paths` ← step's `Files:` list
- `min_file_count``len(expected_paths)`
- `commit_message_pattern` ← regex-escape first 3 words of Checkpoint msg
- `bash_syntax_check` ← auto-detect `.sh` in Files
- `forbidden_paths` ← []
- `must_contain` ← []
- Record `legacy_plan: true` in the progress file.
- Emit warning: `Legacy plan (v1.6 or earlier) — manifests synthesized with
reduced audit precision.`
### Parse Session Manifest (session specs only)
If the file is a session spec (v1.7+), parse the `## Session Manifest` block
into a YAML dict. Preserve `session_manifest.*` fields for Phase 7.5 audit.
If missing and `plan_version ≥ 1.7`, record a parse warning but continue —
the per-step manifests are still available for audit.
### Parse session spec fields (if applicable)
- **Entry condition** from `## Dependencies`
@ -721,7 +759,7 @@ Read the step's `Files:` and `Changes:` fields. Implement exactly as described.
- If `Reuses:` references existing code, read that code first for context
- Only touch files listed in `Files:` — nothing else
#### Sub-step D — Verification
#### Sub-step D1 — Command verification
**Security check (mandatory):** Before running the Verify command, check it against
the executor security denylist. If the command matches ANY of these patterns,
@ -763,7 +801,50 @@ Result: {PASS | FAIL} (exit code {N})
{if FAIL}: Output (first 10 lines): {output}
```
If **PASS**: proceed to Sub-step F (checkpoint).
**Step 0 sentinel:** if the Verify command exits `77` AND the step's manifest
has `sandbox_preflight: true`, do NOT treat as a normal failure. Set step
status = `blocked` and apply `On failure: escalate`. Skip D2 manifest
verification (nothing to verify for a read-only sandbox test). Commit none —
jump straight to Phase 7 with a structured "sandbox-blocked" reason.
If **PASS**: proceed to Sub-step D2 (manifest verification).
#### Sub-step D2 — Manifest verification
After the Verify command passes, verify the step's Manifest block. This is
the objective completion predicate: a step is not passed until its manifest
holds, regardless of Verify exit code.
**Checks to run (in order):**
1. **Expected paths exist:** for each `expected_paths` entry, verify the
file exists in the repo. Count how many exist.
2. **min_file_count satisfied:** count from (1) must be ≥ `min_file_count`.
3. **Forbidden paths untouched:** for each `forbidden_paths` entry, run
`git diff --name-only HEAD~{attempts} HEAD -- {path}` (since this step
began). Any modified forbidden path = fail.
4. **Bash syntax check:** for each entry in `bash_syntax_check` AND any
`.sh` file that appears in `git diff --name-only HEAD~1 HEAD` (safety
net for unlisted scripts), run `bash -n {path}`. Non-zero exit = fail.
5. **must_contain patterns:** for each `{path, pattern}` pair, run
`grep -E "{pattern}" {path}`. No match = fail.
**On manifest failure:**
```
Manifest verification FAIL for step {N}:
- {which check failed with detail}
```
Apply the step's `On failure:` clause (same as Sub-step D1 failure). Increment
attempts. Manifest failures count against the retry cap equally with command
failures.
If all manifest checks pass: proceed to Sub-step F (checkpoint).
Record per-step manifest audit result in progress file:
`steps.{N}.manifest_audit = "pass" | "fail"`,
`steps.{N}.manifest_drift = [{check: reason, ...}]` on fail.
#### Sub-step E — On failure handling
@ -816,8 +897,15 @@ The step's verification already passed — the commit is bookkeeping.
Step {N}: PASS (committed: {hash})
```
**Commit-message-pattern check:** after the commit succeeds, read the HEAD
commit message (`git log -1 --pretty=%s`) and match it against the step's
`commit_message_pattern`. Mismatch does NOT fail the step (verification
already passed) but is recorded as `checkpoint_drift: {expected_pattern,
actual_message}` in the progress file. Phase 7.5 audit reports drift as
an advisory.
Update progress: `steps.{N}.status = "passed"`, `steps.{N}.commit = {hash}`,
`steps.{N}.completed_at = now`.
`steps.{N}.completed_at = now`, `steps.{N}.manifest_audit = "pass"`.
### Step mode exit
@ -839,6 +927,96 @@ Exit condition check:
If all pass: `exit_condition_checked: true` in progress file.
If any fail: record which failed. Include in final report.
## Phase 7.5 — Manifest audit (independent)
**Runs for all modes except dry-run.** This is the last-line-of-defense
check: it ignores the executor's own per-step bookkeeping and re-verifies
session-wide state directly from the filesystem and git log. If the audit
disagrees with the executor's self-report, the audit wins.
This phase exists because agents can hallucinate completion. Phase 7.5
produces an independent truth based on objective state, not self-narrative.
**Steps:**
1. **Enumerate expected paths:** aggregate `expected_paths` across every
step manifest (and the session_manifest, if present). Deduplicate.
2. **Filesystem check:** for each expected_path, confirm the file exists.
3. **Commit-count check:** count commits since session-start
(`git rev-list --count {start_sha}..HEAD`). Expected count =
number of steps with `status=passed` (excludes Step 0 if sandbox_preflight).
4. **Commit-message pattern sweep:** walk `git log {start_sha}..HEAD` and
confirm each commit matches one of the declared
`commit_message_patterns` (in any order — order is advisory).
5. **Bash syntax sweep:** for every `.sh` file in
`git diff --name-only {start_sha} HEAD`, run `bash -n`. Collect failures.
6. **Forbidden-path sweep:** for each `scope_forbidden` entry, check
`git diff --name-only {start_sha} HEAD -- {path}` is empty.
Compare the audit result against the executor's `progress.status`:
- **Audit pass + progress=completed:** status stays `completed`.
- **Audit fail + progress=completed:** OVERRIDE to `partial`. The executor
believed it was done; the filesystem says otherwise. This is the Wave 1
hallucination case — the audit is the defense.
- **Audit fail + progress=failed/stopped:** status stays as-is; drift is
informational.
Record in progress file:
- `manifest_audit.status = "pass" | "drift"`
- `manifest_audit.drift_details = [{check, expected, actual}, ...]`
## Phase 7.6 — Recovery dispatch (multi-session parent context only)
**Preconditions:**
- This is the parent ultraexecute invocation (not a child `--session N`)
- Phase 7.5 reported `drift`
- `recovery_depth < 2` (hard cap to prevent infinite loops)
**Skip otherwise.** Recovery in child context or at depth 2+ escalates to
the user for manual resolution.
**Synthesize a recovery session spec:**
1. Determine the missing step numbers from `manifest_audit.drift_details`
(steps whose expected_paths are absent or whose commits are missing).
2. Read the original session spec. Copy only the missing steps (preserving
their Manifest blocks) into a new file:
`{output_dir}/session-{N}-recovery-{depth}.md`
3. Populate `## Recovery Metadata`:
- `recovery_of: {original session spec path}`
- `recovery_depth: {current depth + 1}`
- `missing_steps: [N, M, ...]`
- `entry_condition_override: "previous partial session committed at {sha}"`
- `parent_progress_file: {path}`
4. Prepend the synthetic Step 0 pre-flight (same as normal session specs).
5. Set `recovery_dispatched = true` in parent progress file.
**Invoke the recovery session:**
```bash
cd "$WORKTREE_PATH" && claude -p "/ultraexecute-local --session {N} {recovery spec path}" \
--allowedTools "Read,Write,Edit,Bash,Glob,Grep" \
--permission-mode bypassPermissions \
> "$LOG_DIR/session-{N}-recovery-{depth}.log" 2>&1
```
Wait for the recovery session to complete. After it returns, re-run Phase
7.5 audit one more time. If it still drifts at `recovery_depth=2`:
```
RECOVERY EXHAUSTED: session {N} drifted after 2 recovery attempts.
Missing: {steps and details}
Status: partial (recovery_depth=2, escalated to user)
```
Do NOT dispatch a third recovery. Report to the user.
## Phase 8 — Final report
Always produce a final report.
@ -855,11 +1033,20 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,
### Step Results
| Step | Description | Result | Attempts | Commit |
|------|-------------|--------|----------|--------|
| 1 | {desc} | PASS | 1 | abc1234 |
| 2 | {desc} | FAIL | 3 | — |
| 3 | {desc} | — | 0 | — |
| Step | Description | Result | Attempts | Commit | Manifest |
|------|-------------|--------|----------|--------|----------|
| 0 | Sandbox pre-flight | PASS | 1 | — | n/a |
| 1 | {desc} | PASS | 1 | abc1234 | pass |
| 2 | {desc} | FAIL | 3 | — | — |
| 3 | {desc} | — | 0 | — | — |
### Manifest Audit (Phase 7.5)
- **Status:** {pass | drift}
- **Drift details:** {enumerated; empty on pass}
- **Recovery dispatched:** {true | false}
- **Recovery depth:** {N}
- **Legacy plan:** {true | false}
### Summary
@ -867,6 +1054,7 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,
- Skipped: {N}
- Failed: {N}
- Not reached: {N}
- Blocked (sandbox): {N}
{if all passed + exit condition passed}:
All steps completed. Exit condition: PASS.
@ -886,6 +1074,14 @@ Attempts: {N}
To resume: /ultraexecute-local --resume {path}
```
**Result vocabulary (v1.7, strict):**
- `completed` — all steps passed AND Phase 7.5 manifest audit passed
- `partial` — steps passed per executor but Phase 7.5 found drift, OR
Phase 7.6 recovery incomplete
- `blocked` — Step 0 sandbox pre-flight exited 77; no real work attempted
- `failed` — a step failed and On failure was revert/retry (retry cap hit)
- `stopped` — On failure: escalate triggered
**JSON summary block** (always at the end, machine-parseable):
```json
@ -893,14 +1089,21 @@ To resume: /ultraexecute-local --resume {path}
"ultraexecute_summary": {
"plan": "{path}",
"plan_type": "{plan | session-spec}",
"result": "{completed | failed | stopped | partial}",
"plan_version": "{1.7 | 1.6 | legacy}",
"result": "{completed | partial | blocked | failed | stopped}",
"steps_total": 0,
"steps_passed": 0,
"steps_failed": 0,
"steps_skipped": 0,
"steps_not_reached": 0,
"steps_blocked": 0,
"failed_at_step": null,
"exit_condition": "{pass | fail | skipped | n/a}",
"manifest_audit": "{pass | drift | n/a}",
"drift_details": [],
"recovery_dispatched": false,
"recovery_depth": 0,
"legacy_plan": false,
"progress_file": "{path}"
}
}
@ -995,3 +1198,18 @@ Never let stats failures block the workflow.
(git hook injection), `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `.env` files,
shell configs (`~/.zshrc`, `~/.bashrc`, `~/.profile`), or
`.claude/settings.json` / `.claude/hooks/` (infrastructure self-modification).
17. **Manifest is the completion predicate.** A step may not be marked passed
if its manifest does not verify, regardless of the Verify command's exit
code. The manifest is the objective contract — Verify is necessary but
not sufficient. For v1.7+ plans: a Verify pass with a manifest fail sets
step result to `failed` and triggers the On-failure clause. For legacy
v1.6 plans: synthesized manifests apply with the same force, but
`legacy_plan: true` is logged in progress.
18. **Last-activity rule.** The executor's final tool call before writing
Phase 8 must be a manifest check (Phase 7.5 audit), never an arbitrary
file review. This prevents the "hallucinated completion" failure mode
where a transcript ends on an unrelated Read and the agent self-reports
`completed` without verifying. If Phase 7.5 has not run, the executor
may not emit `result: completed` under any circumstances.