feat(ultraplan-local): v1.7.0 — self-verifying plan chain

Wave 1 of a 6-session parallel build revealed three failure modes:
(1) hallucinated completion (status=completed after 2/5 steps, last
tool call was an arbitrary file review), (2) fail-late bash (3/6
sessions had push blocked inside sub-agent sandbox after all work
was done), (3) no objective verification (plans were prose).

v1.7 closes all three by making the plan an executable contract.

Per-step YAML manifest (expected_paths, commit_message_pattern,
bash_syntax_check, forbidden_paths, must_contain) is the objective
completion predicate. Plan-critic dimension 10 (Manifest quality)
is a hard gate. Session decomposer propagates manifests verbatim
and emits an obligatory Step 0 pre-flight (git push --dry-run,
exit 77 sentinel) in every session spec.

ultraexecute-local gets Phase 7.5 (independent manifest audit from
git log + filesystem, ignoring agent bookkeeping) and Phase 7.6
(bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17
forbids marking a step passed without manifest verification. Hard
Rule 18 forbids ending on an arbitrary tool call before reporting.

Division of labor is made explicit:
- /ultraresearch-local gathers context (no build decisions)
- /ultraplan-local produces an executable contract (manifests,
  plan-critic gate)
- /ultraexecute-local executes disciplined (does NOT compensate
  for weak plans — escalates)

Code complete. Docs partial (Arbeidsdeling table + manifest section
added to plugin + marketplace READMEs). Verification tests
(10-sequence) pending — see REMEMBER.md.

Backward compat: v1.6 plans without plan_version marker get
legacy mode with synthesized manifests and legacy_plan: true in
progress file. Plan-critic emits advisory, not block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Kjell Tore Guttormsen 2026-04-12 07:38:16 +02:00
commit d1befac35a
11 changed files with 651 additions and 27 deletions

View file

@ -1,7 +1,7 @@
{
"name": "ultraplan-local",
"description": "Deep implementation planning and research with interview, specialized agent swarms, external research, triangulation, adversarial review, session decomposition, and headless execution support.",
"version": "1.6.0",
"version": "1.7.0",
"author": {
"name": "Kjell Tore Guttormsen"
},

View file

@ -4,6 +4,90 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [1.7.0] - 2026-04-12
### The self-verifying plan chain
Wave 1 of a parallel 6-session build revealed three failure modes: (1) a
session reported `status=completed` after only 2/5 steps — last tool call
was an arbitrary file review, not a completion check; (2) 3/6 sessions
had push blocked inside the sub-agent bash sandbox *after* all work was
done; (3) plans and blueprints were prose, so the orchestrator had no
machine-readable way to verify completion. v1.7.0 closes all three by
making the plan itself an executable contract.
### Added
- **Per-step verification manifest** in plan format (`plan_version: 1.7`).
Every step now ends with a YAML `manifest:` block declaring
`expected_paths`, `min_file_count`, `commit_message_pattern`,
`bash_syntax_check`, `forbidden_paths`, `must_contain`. The manifest is
the objective completion predicate — the Verify command is necessary but
not sufficient.
- **Plan-critic dimension 10 — Manifest quality (hard gate).** Missing
or invalid manifest (unparseable regex, path contradiction, missing
block) is a `major` finding. v1.6 plans get a legacy-mode warning
instead of a block.
- **Session Manifest aggregate** in session specs — synthesized by
`session-decomposer` as the union of per-step manifests. Gives
`ultraexecute-local` a single YAML block per session to audit against.
- **Step 0: Sandbox pre-flight** — obligatory first step in every
generated session spec. Runs `git push --dry-run origin HEAD`; exit 77
= sandbox cannot push, session status becomes `blocked` (not `failed`),
no real work attempted. Escape hatch: `ULTRAEXECUTE_SKIP_PREFLIGHT=1`.
- **Launch script pre-flight**`headless-launch-template.md` adds a
`git push --dry-run` check outside the sandbox, before any session
spawns, catching credential issues at the earliest possible point.
- **Phase 7.5 — Manifest audit (independent).** After all steps complete,
`ultraexecute-local` re-verifies expected paths, commit count, commit
message patterns, bash syntax, and forbidden-path untouched-ness from
git log and filesystem. Agent's own bookkeeping is ignored. Disagreement
with progress file → status overridden to `partial`.
- **Phase 7.6 — Recovery dispatch (bounded).** When Phase 7.5 detects
drift in multi-session parent context, synthesize a temp session spec
containing only missing steps and dispatch via existing
`claude -p "/ultraexecute-local --session N"`. `recovery_depth ≤ 2`
hard cap — third drift escalates to user.
- **Hard Rule 17: Manifest is the completion predicate.** A step may
not be marked passed if its manifest does not verify, regardless of
Verify's exit code.
- **Hard Rule 18: Last-activity rule.** Executor's final tool call
before Phase 8 must be a manifest check, never an arbitrary file
review. Prevents hallucinated completion.
### Changed
- **Plan template (`templates/plan-template.md`)** — adds
`plan_version: 1.7` metadata line, `Manifest:` field on every step,
"Manifest — objective completion predicate" section.
- **Plan-critic scoring** rebalanced: Headless readiness 0.15 → 0.10,
Manifest quality 0.05 added. Legacy v1.6 plans skip the Manifest
dimension and keep Headless readiness at 0.15.
- **Planning-orchestrator Phase 5** adds "Manifest generation rules
(REQUIRED for every step)" with mechanical derivation from `Files:`
and Checkpoint. Validates regex compilation and path existence before
handoff to plan-critic.
- **Session-decomposer** parses plan manifests and propagates them
verbatim into session specs. For v1.7+ plans with missing manifests:
abort with pointer to failing step. For legacy v1.6 plans: synthesize
minimal manifests and flag `legacy_synthesis: true`.
- **ultraexecute-local Phase 2** parses manifest YAML. Ugyldig YAML =
abort with pointer to step. v<1.7 plans: synthesize + log
`legacy_plan: true`.
- **ultraexecute-local Phase 6** — sub-step D renamed to D1 "Command
verification"; new D2 "Manifest verification" runs after D1 with 5
checks. F "Checkpoint" adds `checkpoint_drift` logging when HEAD
message doesn't match `commit_message_pattern` (non-fatal).
- **Phase 8 report** — table gets Manifest column; JSON summary adds
`plan_version`, `manifest_audit`, `drift_details`, `recovery_dispatched`,
`recovery_depth`, `legacy_plan`. Result vocabulary strict:
`completed | partial | blocked | failed | stopped`.
- **Division of labor clarified** in README — `/ultraresearch-local`
gathers context (no decisions), `/ultraplan-local` transforms intent
into an executable contract (manifests, plan-critic gate),
`/ultraexecute-local` executes the contract disciplined (does NOT
compensate for weak plans — escalates).
## [1.6.0] - 2026-04-08
### Added

View file

@ -1,6 +1,6 @@
# ultraplan-local — Plan, Research, Execute
![Version](https://img.shields.io/badge/version-1.6.0-blue)
![Version](https://img.shields.io/badge/version-1.7.0-blue)
![License](https://img.shields.io/badge/license-MIT-green)
![Platform](https://img.shields.io/badge/platform-Claude%20Code-purple)
@ -14,6 +14,22 @@ A [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin for deep
Research first, then plan, then execute. Or skip research and plan directly. Or plan and execute in one flow. The plan is the contract between planning and execution. The research brief is the contract between research and planning.
### Division of labor
| Command | Responsibility | Output |
|---|---|---|
| `/ultraresearch-local` | **Gather context** — code state, external docs, community, risk. Makes NO build decisions. | Research brief (markdown) |
| `/ultraplan-local` | **Transform intent into an executable contract** — per-step YAML manifest, regex-validated checkpoints, verifiable paths. Plan-critic is a hard gate. | `plan.md` with Manifest blocks + `plan_version: 1.7` |
| `/ultraexecute-local` | **Execute the contract disciplined** — fresh verification, independent manifest audit, honest reporting. Does NOT compensate for weak plans — escalates. | Progress file + structured report + manifest-audit status |
**Principle:** Execute trusts the plan. If execute has to guess or interpret, the plan is weak and must be revised upstream — not patched downstream.
### Self-verifying plan chain (v1.7)
Every step in a v1.7 plan ends with a YAML `manifest:` block declaring `expected_paths`, `commit_message_pattern`, `bash_syntax_check`, `forbidden_paths`, `must_contain`. This makes the plan the **objective completion predicate**: a step may not be marked passed if its manifest does not verify, regardless of the Verify command's exit code (Hard Rule 17).
After all steps complete, `ultraexecute-local` runs **Phase 7.5 — Manifest audit (independent)**: re-verifies every expected path from git log + filesystem, ignoring the agent's own bookkeeping. Drift → status `partial`, **Phase 7.6** auto-dispatches a bounded recovery session with only the missing steps (`recovery_depth ≤ 2`). Step 0 pre-flight (`git push --dry-run`) runs inside every session sandbox before any real work — exit 77 sentinel catches sandbox push-denial before the agent wastes the whole budget.
No cloud dependency. No GitHub requirement. Works on **Mac, Linux, and Windows**.
## Quick start

View file

@ -113,6 +113,42 @@ Steps missing On failure or Checkpoint clauses are **major** findings
(not blockers — the plan is still valid for interactive use, but it
cannot be decomposed into headless sessions).
### 10. Manifest quality (hard gate)
Manifests are the objective completion predicate. ultraexecute-local uses
them to determine whether a step is actually done — not just whether the
Verify command returned 0. A plan without valid manifests cannot drive
deterministic execution.
Check plans with `plan_version: 1.7` (or later) against these rules:
- Does EVERY step have a `Manifest:` block with YAML content?
- Are `expected_paths` entries all either existing files OR explicitly marked
`(new file)` in the step's Changes prose?
- Is `expected_paths` a subset of `Files:` (no orphan paths)?
- Does `commit_message_pattern` compile as a valid regex? (check with a
mental regex-parse — e.g., unbalanced `(`, `[` is invalid)
- Does the `commit_message_pattern` actually match the literal Checkpoint
commit message declared in the step?
- Are all `bash_syntax_check` entries `.sh` files that appear in
`expected_paths` (not references to external scripts)?
- Do `forbidden_paths` avoid overlap with `expected_paths` (contradiction)?
- Does the step create shell scripts that are NOT listed in
`bash_syntax_check`? (minor finding — suggests incomplete manifest)
**Severity:**
- Missing Manifest block on any step → **major** (same tier as missing On failure)
- Invalid regex in commit_message_pattern → **major**
- Pattern doesn't match declared Checkpoint → **major**
- `expected_paths` references non-existent path not marked new → **major**
- `forbidden_paths` overlaps `expected_paths`**blocker** (contradiction)
- Missing bash_syntax_check for declared `.sh` files → **minor**
**Backward compat:** For plans without `plan_version: 1.7` (legacy), emit
a single advisory note ("Plan is v1.6 legacy format — manifests will be
synthesized by ultraexecute with reduced audit precision") and skip this
dimension's scoring.
## Rating system
Rate each finding:
@ -131,10 +167,15 @@ After reviewing all findings, produce a quantitative score:
| Coverage completeness | 0.20 | Spec-to-steps mapping, no gaps |
| Specification quality | 0.15 | No placeholders, clear criteria |
| Risk & pre-mortem | 0.15 | Failure modes addressed, mitigations realistic |
| Headless readiness | 0.15 | On failure clauses, checkpoints, circuit breakers |
| Headless readiness | 0.10 | On failure clauses, checkpoints, circuit breakers |
| Manifest quality | 0.05 | Every step has a valid, checkable manifest (v1.7+) |
Score each dimension 0100, then compute the weighted total.
**Weighting note (v1.7):** Headless readiness reduced 0.15→0.10, Manifest
quality added at 0.05. Total still 1.00. For legacy v1.6 plans, Manifest
quality is not scored and Headless readiness returns to 0.15.
**Grade thresholds:**
- **A** (90100): APPROVE
- **B** (7589): APPROVE_WITH_NOTES
@ -166,7 +207,8 @@ Score each dimension 0100, then compute the weighted total.
| Coverage completeness | 0.20 | {0100} | {assessment} |
| Specification quality | 0.15 | {0100} | {assessment} |
| Risk & pre-mortem | 0.15 | {0100} | {assessment} |
| Headless readiness | 0.15 | {0100} | {assessment} |
| Headless readiness | 0.10 | {0100} | {assessment} |
| Manifest quality | 0.05 | {0100} | {assessment — omit for legacy v1.6} |
| **Weighted total** | **1.00** | **{score}** | **Grade: {A/B/C/D}** |
## Summary

View file

@ -186,6 +186,50 @@ Write a comprehensive implementation plan including:
- Test Strategy (if test-strategist was used)
- Verification (concrete commands), Estimated Scope
**Plan-version header:** Include `plan_version: 1.7` in the metadata line below
the title. This signals to ultraexecute-local that the plan includes per-step
verification manifests and enables strict audit mode. Plans without this
marker are treated as legacy v1.6 with synthesized minimal manifests.
### Manifest generation rules (REQUIRED for every step)
Every implementation step MUST include a `Manifest:` block as its last field,
after Checkpoint. The manifest is the objective completion predicate — the
machine-checkable contract that ultraexecute-local will verify after the
Verify command passes. A step cannot be marked passed if its manifest does
not verify.
Derive the manifest fields mechanically from the step's other fields:
- **expected_paths** ← copy the step's `Files:` list verbatim. Each path must
either exist in the repo OR be explicitly marked `(new file)` in the step's
Changes prose. Do not list paths that neither exist nor are declared new.
- **min_file_count** ← default to `len(expected_paths)`. Lower only when the
step explicitly allows partial creation (rare).
- **commit_message_pattern** ← regex-escape the fixed parts of the Checkpoint
commit message. Preserve Conventional Commit structure. Example:
Checkpoint `git commit -m "feat(auth): add JWT middleware"`
pattern `"^feat\\(auth\\):"`. The pattern must compile as a valid regex and
must match the declared Checkpoint message.
- **bash_syntax_check** ← auto-include every `.sh` file appearing in
expected_paths. Add other shell scripts the step creates transitively.
- **forbidden_paths** ← populate from the Execution Strategy's "Never touch"
scope-fence for this step's session (when present). Defense-in-depth.
- **must_contain** ← optional. Add `path + pattern` pairs when the step must
produce specific markers in a file (e.g., a new config section, a required
export, a migration boundary).
**Validation before writing plan:**
1. Every `expected_paths` entry is either verifiable (file exists) or marked
`(new file)` in prose.
2. Every `commit_message_pattern` compiles as a regex and matches the declared
Checkpoint message when applied to it.
3. Every `bash_syntax_check` entry has a `.sh` suffix and appears in
`expected_paths`.
4. No `forbidden_paths` overlaps with `expected_paths` (contradiction).
If any validation fails, fix the plan before handing to Phase 6 review.
### Failure recovery (REQUIRED for every step)
Each implementation step MUST include:
@ -237,12 +281,19 @@ Create directories if needed.
Launch two review agents **in parallel**:
- `plan-critic` — find missing steps, wrong ordering, fragile assumptions,
missing error handling, scope creep, underspecified steps
missing error handling, scope creep, underspecified steps, AND manifest
quality (dimension 10: every step has a valid, regex-compilable,
path-verified manifest). Missing or invalid manifest = **major** finding.
- `scope-guardian` — verify plan matches spec requirements, find scope
creep and scope gaps, validate file/function references
After both complete:
- Address all blockers and major issues by revising the plan
- **Manifest quality is a hard gate:** any manifest-related `major` finding
must be fixed before the plan can be handed off. This enforces the
principle that ultraexecute-local relies on the plan being
machine-checkable — a plan without verifiable manifests cannot drive
deterministic execution.
- Add a "Revisions" note at the bottom documenting changes
### Phase 7 — Completion

View file

@ -50,9 +50,23 @@ Extract from the plan:
3. Per-step dependencies (explicit or implicit from step ordering)
4. Per-step verification commands
5. Per-step failure recovery (if present)
6. The overall verification section
7. Context and codebase analysis sections
8. Check for an existing `## Execution Strategy` section
6. **Per-step verification manifest (v1.7+)** — the `Manifest:` YAML block
following Checkpoint. Parse it as YAML. Preserve all fields:
`expected_paths`, `min_file_count`, `commit_message_pattern`,
`bash_syntax_check`, `forbidden_paths`, `must_contain`.
7. The overall verification section
8. Context and codebase analysis sections
9. The `plan_version` marker (if present in the header line)
10. Check for an existing `## Execution Strategy` section
**Manifest handling:**
- If `plan_version: 1.7` or later AND any step is missing a Manifest block:
STOP with error "Plan claims v1.7 but step N lacks Manifest. Re-run
planning-orchestrator." Do not attempt to synthesize.
- If no `plan_version` marker is present: treat as legacy v1.6. Synthesize
minimal manifests from `Files:` (expected_paths) and the Checkpoint commit
message (commit_message_pattern escaped). Mark output session specs with
`legacy_synthesis: true` in their Session Manifest.
**If an Execution Strategy already exists:**
- Log: "Existing Execution Strategy detected — using as primary input."
@ -135,6 +149,59 @@ For each session, write a spec file to the output directory:
5. **Failure handling** — what to do on failure (copied from plan's On failure fields,
or default to "stop and report")
6. **Handoff state** — what this session produces that other sessions need
7. **Per-step Manifest blocks** — copy each plan step's Manifest YAML verbatim
into the corresponding session-spec step. Do NOT edit or summarize.
8. **Session Manifest aggregate** — synthesize a top-level `## Session Manifest`
block aggregating all per-step manifests in the session:
- `expected_paths`: union of all steps' expected_paths (deduplicated)
- `commit_count`: number of implementation steps in this session (excludes Step 0)
- `commit_message_patterns`: list of per-step patterns, in step order
- `bash_syntax_check`: union of all steps' bash_syntax_check
- `scope_touch`: from Scope Fence Touch (already present)
- `scope_forbidden`: from Scope Fence Never Touch + union of step
forbidden_paths
- `plan_version`: from the source plan
- `legacy_synthesis`: true/false based on Step 1's handling
### Step 5.5 — Emit obligatory Step 0 pre-flight
Every generated session spec MUST begin its `## Steps` list with a synthetic
**Step 0: Sandbox pre-flight** that validates the subagent bash sandbox can
reach the remote before any real work is done. This catches the fail-late
push-denial observed in Wave 1 (3/6 sessions all lost their pushes at the
very end).
The Step 0 block to prepend verbatim:
```markdown
### Step 0: Sandbox pre-flight (auto-generated — do not modify)
- **Files:** none (read-only test)
- **Changes:** verify git push permissions are available in this sandbox
- **Verify:**
```
git push --dry-run origin HEAD 2>&1 | tee /tmp/push-dryrun-$$.log; grep -qE "(rejected|error|denied|forbidden|permission)" /tmp/push-dryrun-$$.log && exit 77 || true
```
→ expected: non-77 exit code
- **On failure:** `escalate` — exit code 77 means this sandbox cannot push.
Abort immediately; do not attempt any work. Main orchestrator will
re-spawn with correct permissions.
- **Checkpoint:** none (no file changes)
- **Manifest:**
```yaml
manifest:
expected_paths: []
min_file_count: 0
commit_message_pattern: ""
bash_syntax_check: []
forbidden_paths: []
must_contain: []
sandbox_preflight: true
```
```
Do NOT skip Step 0 for any session. It is the only early-detection mechanism
for sandbox-blocked bash.
### Step 6 — Generate the dependency diagram

View file

@ -87,10 +87,48 @@ Extract every `### Step N: {description}` heading (in order). For each step, ext
- **Verify** — command to run after implementation
- **On failure** — recovery action (revert/retry/skip/escalate)
- **Checkpoint** — git commit command after success
- **Manifest** — YAML block following Checkpoint (v1.7+)
If a step is missing `On failure`, default to `escalate` and record a parse warning.
If a step is missing `Verify`, record a parse warning.
### Parse the Manifest block (v1.7+)
Read the plan header for a `plan_version` marker. Treat ≥ `1.7` as strict
mode; absence or <1.7 as legacy mode.
**Strict mode (plan_version ≥ 1.7):**
- Every step MUST have a `Manifest:` block with a YAML fenced payload.
- Parse the YAML. Required keys: `expected_paths` (list), `min_file_count`
(integer), `commit_message_pattern` (string), `bash_syntax_check` (list),
`forbidden_paths` (list), `must_contain` (list of `{path, pattern}` dicts).
- Validate each `commit_message_pattern` compiles as a regex (use a tolerant
parser — do not execute untrusted input against shell).
- If any step is missing its Manifest, or YAML is malformed, STOP with:
```
Error: plan_version=1.7 but step {N} has invalid/missing Manifest.
Re-run planning-orchestrator — plan is not executable.
```
**Legacy mode (no plan_version, or < 1.7):**
- Synthesize a minimal manifest per step:
- `expected_paths` ← step's `Files:` list
- `min_file_count``len(expected_paths)`
- `commit_message_pattern` ← regex-escape first 3 words of Checkpoint msg
- `bash_syntax_check` ← auto-detect `.sh` in Files
- `forbidden_paths` ← []
- `must_contain` ← []
- Record `legacy_plan: true` in the progress file.
- Emit warning: `Legacy plan (v1.6 or earlier) — manifests synthesized with
reduced audit precision.`
### Parse Session Manifest (session specs only)
If the file is a session spec (v1.7+), parse the `## Session Manifest` block
into a YAML dict. Preserve `session_manifest.*` fields for Phase 7.5 audit.
If missing and `plan_version ≥ 1.7`, record a parse warning but continue —
the per-step manifests are still available for audit.
### Parse session spec fields (if applicable)
- **Entry condition** from `## Dependencies`
@ -721,7 +759,7 @@ Read the step's `Files:` and `Changes:` fields. Implement exactly as described.
- If `Reuses:` references existing code, read that code first for context
- Only touch files listed in `Files:` — nothing else
#### Sub-step D — Verification
#### Sub-step D1 — Command verification
**Security check (mandatory):** Before running the Verify command, check it against
the executor security denylist. If the command matches ANY of these patterns,
@ -763,7 +801,50 @@ Result: {PASS | FAIL} (exit code {N})
{if FAIL}: Output (first 10 lines): {output}
```
If **PASS**: proceed to Sub-step F (checkpoint).
**Step 0 sentinel:** if the Verify command exits `77` AND the step's manifest
has `sandbox_preflight: true`, do NOT treat as a normal failure. Set step
status = `blocked` and apply `On failure: escalate`. Skip D2 manifest
verification (nothing to verify for a read-only sandbox test). Commit none —
jump straight to Phase 7 with a structured "sandbox-blocked" reason.
If **PASS**: proceed to Sub-step D2 (manifest verification).
#### Sub-step D2 — Manifest verification
After the Verify command passes, verify the step's Manifest block. This is
the objective completion predicate: a step is not passed until its manifest
holds, regardless of Verify exit code.
**Checks to run (in order):**
1. **Expected paths exist:** for each `expected_paths` entry, verify the
file exists in the repo. Count how many exist.
2. **min_file_count satisfied:** count from (1) must be ≥ `min_file_count`.
3. **Forbidden paths untouched:** for each `forbidden_paths` entry, run
`git diff --name-only HEAD~{attempts} HEAD -- {path}` (since this step
began). Any modified forbidden path = fail.
4. **Bash syntax check:** for each entry in `bash_syntax_check` AND any
`.sh` file that appears in `git diff --name-only HEAD~1 HEAD` (safety
net for unlisted scripts), run `bash -n {path}`. Non-zero exit = fail.
5. **must_contain patterns:** for each `{path, pattern}` pair, run
`grep -E "{pattern}" {path}`. No match = fail.
**On manifest failure:**
```
Manifest verification FAIL for step {N}:
- {which check failed with detail}
```
Apply the step's `On failure:` clause (same as Sub-step D1 failure). Increment
attempts. Manifest failures count against the retry cap equally with command
failures.
If all manifest checks pass: proceed to Sub-step F (checkpoint).
Record per-step manifest audit result in progress file:
`steps.{N}.manifest_audit = "pass" | "fail"`,
`steps.{N}.manifest_drift = [{check: reason, ...}]` on fail.
#### Sub-step E — On failure handling
@ -816,8 +897,15 @@ The step's verification already passed — the commit is bookkeeping.
Step {N}: PASS (committed: {hash})
```
**Commit-message-pattern check:** after the commit succeeds, read the HEAD
commit message (`git log -1 --pretty=%s`) and match it against the step's
`commit_message_pattern`. Mismatch does NOT fail the step (verification
already passed) but is recorded as `checkpoint_drift: {expected_pattern,
actual_message}` in the progress file. Phase 7.5 audit reports drift as
an advisory.
Update progress: `steps.{N}.status = "passed"`, `steps.{N}.commit = {hash}`,
`steps.{N}.completed_at = now`.
`steps.{N}.completed_at = now`, `steps.{N}.manifest_audit = "pass"`.
### Step mode exit
@ -839,6 +927,96 @@ Exit condition check:
If all pass: `exit_condition_checked: true` in progress file.
If any fail: record which failed. Include in final report.
## Phase 7.5 — Manifest audit (independent)
**Runs for all modes except dry-run.** This is the last-line-of-defense
check: it ignores the executor's own per-step bookkeeping and re-verifies
session-wide state directly from the filesystem and git log. If the audit
disagrees with the executor's self-report, the audit wins.
This phase exists because agents can hallucinate completion. Phase 7.5
produces an independent truth based on objective state, not self-narrative.
**Steps:**
1. **Enumerate expected paths:** aggregate `expected_paths` across every
step manifest (and the session_manifest, if present). Deduplicate.
2. **Filesystem check:** for each expected_path, confirm the file exists.
3. **Commit-count check:** count commits since session-start
(`git rev-list --count {start_sha}..HEAD`). Expected count =
number of steps with `status=passed` (excludes Step 0 if sandbox_preflight).
4. **Commit-message pattern sweep:** walk `git log {start_sha}..HEAD` and
confirm each commit matches one of the declared
`commit_message_patterns` (in any order — order is advisory).
5. **Bash syntax sweep:** for every `.sh` file in
`git diff --name-only {start_sha} HEAD`, run `bash -n`. Collect failures.
6. **Forbidden-path sweep:** for each `scope_forbidden` entry, check
`git diff --name-only {start_sha} HEAD -- {path}` is empty.
Compare the audit result against the executor's `progress.status`:
- **Audit pass + progress=completed:** status stays `completed`.
- **Audit fail + progress=completed:** OVERRIDE to `partial`. The executor
believed it was done; the filesystem says otherwise. This is the Wave 1
hallucination case — the audit is the defense.
- **Audit fail + progress=failed/stopped:** status stays as-is; drift is
informational.
Record in progress file:
- `manifest_audit.status = "pass" | "drift"`
- `manifest_audit.drift_details = [{check, expected, actual}, ...]`
## Phase 7.6 — Recovery dispatch (multi-session parent context only)
**Preconditions:**
- This is the parent ultraexecute invocation (not a child `--session N`)
- Phase 7.5 reported `drift`
- `recovery_depth < 2` (hard cap to prevent infinite loops)
**Skip otherwise.** Recovery in child context or at depth 2+ escalates to
the user for manual resolution.
**Synthesize a recovery session spec:**
1. Determine the missing step numbers from `manifest_audit.drift_details`
(steps whose expected_paths are absent or whose commits are missing).
2. Read the original session spec. Copy only the missing steps (preserving
their Manifest blocks) into a new file:
`{output_dir}/session-{N}-recovery-{depth}.md`
3. Populate `## Recovery Metadata`:
- `recovery_of: {original session spec path}`
- `recovery_depth: {current depth + 1}`
- `missing_steps: [N, M, ...]`
- `entry_condition_override: "previous partial session committed at {sha}"`
- `parent_progress_file: {path}`
4. Prepend the synthetic Step 0 pre-flight (same as normal session specs).
5. Set `recovery_dispatched = true` in parent progress file.
**Invoke the recovery session:**
```bash
cd "$WORKTREE_PATH" && claude -p "/ultraexecute-local --session {N} {recovery spec path}" \
--allowedTools "Read,Write,Edit,Bash,Glob,Grep" \
--permission-mode bypassPermissions \
> "$LOG_DIR/session-{N}-recovery-{depth}.log" 2>&1
```
Wait for the recovery session to complete. After it returns, re-run Phase
7.5 audit one more time. If it still drifts at `recovery_depth=2`:
```
RECOVERY EXHAUSTED: session {N} drifted after 2 recovery attempts.
Missing: {steps and details}
Status: partial (recovery_depth=2, escalated to user)
```
Do NOT dispatch a third recovery. Report to the user.
## Phase 8 — Final report
Always produce a final report.
@ -855,11 +1033,20 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,
### Step Results
| Step | Description | Result | Attempts | Commit |
|------|-------------|--------|----------|--------|
| 1 | {desc} | PASS | 1 | abc1234 |
| 2 | {desc} | FAIL | 3 | — |
| 3 | {desc} | — | 0 | — |
| Step | Description | Result | Attempts | Commit | Manifest |
|------|-------------|--------|----------|--------|----------|
| 0 | Sandbox pre-flight | PASS | 1 | — | n/a |
| 1 | {desc} | PASS | 1 | abc1234 | pass |
| 2 | {desc} | FAIL | 3 | — | — |
| 3 | {desc} | — | 0 | — | — |
### Manifest Audit (Phase 7.5)
- **Status:** {pass | drift}
- **Drift details:** {enumerated; empty on pass}
- **Recovery dispatched:** {true | false}
- **Recovery depth:** {N}
- **Legacy plan:** {true | false}
### Summary
@ -867,6 +1054,7 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,
- Skipped: {N}
- Failed: {N}
- Not reached: {N}
- Blocked (sandbox): {N}
{if all passed + exit condition passed}:
All steps completed. Exit condition: PASS.
@ -886,6 +1074,14 @@ Attempts: {N}
To resume: /ultraexecute-local --resume {path}
```
**Result vocabulary (v1.7, strict):**
- `completed` — all steps passed AND Phase 7.5 manifest audit passed
- `partial` — steps passed per executor but Phase 7.5 found drift, OR
Phase 7.6 recovery incomplete
- `blocked` — Step 0 sandbox pre-flight exited 77; no real work attempted
- `failed` — a step failed and On failure was revert/retry (retry cap hit)
- `stopped` — On failure: escalate triggered
**JSON summary block** (always at the end, machine-parseable):
```json
@ -893,14 +1089,21 @@ To resume: /ultraexecute-local --resume {path}
"ultraexecute_summary": {
"plan": "{path}",
"plan_type": "{plan | session-spec}",
"result": "{completed | failed | stopped | partial}",
"plan_version": "{1.7 | 1.6 | legacy}",
"result": "{completed | partial | blocked | failed | stopped}",
"steps_total": 0,
"steps_passed": 0,
"steps_failed": 0,
"steps_skipped": 0,
"steps_not_reached": 0,
"steps_blocked": 0,
"failed_at_step": null,
"exit_condition": "{pass | fail | skipped | n/a}",
"manifest_audit": "{pass | drift | n/a}",
"drift_details": [],
"recovery_dispatched": false,
"recovery_depth": 0,
"legacy_plan": false,
"progress_file": "{path}"
}
}
@ -995,3 +1198,18 @@ Never let stats failures block the workflow.
(git hook injection), `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `.env` files,
shell configs (`~/.zshrc`, `~/.bashrc`, `~/.profile`), or
`.claude/settings.json` / `.claude/hooks/` (infrastructure self-modification).
17. **Manifest is the completion predicate.** A step may not be marked passed
if its manifest does not verify, regardless of the Verify command's exit
code. The manifest is the objective contract — Verify is necessary but
not sufficient. For v1.7+ plans: a Verify pass with a manifest fail sets
step result to `failed` and triggers the On-failure clause. For legacy
v1.6 plans: synthesized manifests apply with the same force, but
`legacy_plan: true` is logged in progress.
18. **Last-activity rule.** The executor's final tool call before writing
Phase 8 must be a manifest check (Phase 7.5 audit), never an arbitrary
file review. This prevents the "hallucinated completion" failure mode
where a transcript ends on an unrelated Read and the agent self-reports
`completed` without verifying. If Phase 7.5 has not run, the executor
may not emit `result: completed` under any circumstances.

View file

@ -47,6 +47,27 @@ if [ -n "$(git status --porcelain)" ]; then
exit 1
fi
# Pre-flight: verify remote push permissions (catches credential/auth issues
# BEFORE spawning sessions). Sub-agent bash sandbox may have different
# credentials than the launching shell — Step 0 in each session spec handles
# the sandbox-side detection. Set ULTRAEXECUTE_SKIP_PREFLIGHT=1 for offline
# or air-gapped testing.
if [ "${ULTRAEXECUTE_SKIP_PREFLIGHT:-0}" != "1" ]; then
if ! git push --dry-run origin HEAD >/tmp/push-dryrun-launch.log 2>&1; then
echo "ERROR: git push --dry-run failed. Sessions will be unable to push."
cat /tmp/push-dryrun-launch.log
echo ""
echo "Fix remote credentials before running parallel execution, or set"
echo "ULTRAEXECUTE_SKIP_PREFLIGHT=1 to bypass (offline/air-gapped only)."
exit 1
fi
if grep -qE "(rejected|denied|forbidden|permission)" /tmp/push-dryrun-launch.log; then
echo "ERROR: git push --dry-run reports rejection. Sessions will fail at commit time."
cat /tmp/push-dryrun-launch.log
exit 1
fi
fi
echo "=== Ultraplan Headless Execution (Worktree-Isolated) ==="
echo "Plan: {plan_path}"
echo "Sessions: {total_sessions}"

View file

@ -2,7 +2,7 @@
> **Plan quality: {grade}** ({score}/100) — {APPROVE | APPROVE_WITH_NOTES | REVISE | REPLAN}
>
> Generated by ultraplan-local v{version} on {YYYY-MM-DD}
> Generated by ultraplan-local v{version} on {YYYY-MM-DD}`plan_version: 1.7`
## Context
@ -56,6 +56,17 @@ when the project has tests.
- **Verify:** `{exact command}` → expected: `{output}`
- **On failure:** {revert | retry | skip | escalate} — {specific instructions}
- **Checkpoint:** `git commit -m "{conventional commit message}"`
- **Manifest:**
```yaml
manifest:
expected_paths:
- path/to/file.ts
min_file_count: 1
commit_message_pattern: "^feat\\(scope\\):"
bash_syntax_check: []
forbidden_paths: []
must_contain: []
```
### Step 2: {description}
@ -69,10 +80,43 @@ when the project has tests.
- **Verify:** `{exact command}` → expected: `{output}`
- **On failure:** {revert | retry | skip | escalate} — {specific instructions}
- **Checkpoint:** `git commit -m "{conventional commit message}"`
- **Manifest:**
```yaml
manifest:
expected_paths:
- path/to/file.ts
min_file_count: 1
commit_message_pattern: "^feat\\(scope\\):"
bash_syntax_check: []
forbidden_paths: []
must_contain:
- path: path/to/file.ts
pattern: "expected content marker"
```
*For projects without tests: omit "Test first" and keep "Verify" with a
concrete command (e.g., run the app, check output, curl an endpoint).*
### Manifest — objective completion predicate
Every step MUST have a Manifest block. This is the machine-checkable contract
that ultraexecute-local verifies after the Verify command passes. A step is
not considered complete until its manifest verifies — regardless of Verify
command exit code.
- **expected_paths** — files that must exist after this step. Existing files
must be present in repo; new files must be marked `(new file)` in prose.
- **min_file_count** — minimum number of expected_paths that must exist.
Typically equal to `len(expected_paths)`.
- **commit_message_pattern** — regex that MUST match the HEAD commit message
after Checkpoint runs. Use escaped regex syntax (e.g., `\\(scope\\)`).
- **bash_syntax_check** — list of `.sh` files that must pass `bash -n`.
Auto-include any `.sh` in expected_paths.
- **forbidden_paths** — files this step must NOT modify (defense-in-depth
beyond Scope Fence).
- **must_contain** — optional grep assertions: `path` + `pattern` pairs that
must match in created/modified files.
### Failure recovery rules
- **On failure: revert** — undo this step's changes (`git checkout -- {files}`), do NOT proceed
@ -121,7 +165,10 @@ before execution.*
## Verification
End-to-end checks that prove the plan was implemented correctly.
*Per-step manifest verification runs automatically during execution (every
step's Manifest block is objectively checked by ultraexecute-local before the
step is marked passed). This section is for end-to-end integration checks
that cross step boundaries — complete workflows, system-level behavior.*
- [ ] `{exact command}` → expected: `{exact output or behavior}`
- [ ] `{exact command}` → expected: `{exact output or behavior}`
@ -179,7 +226,8 @@ later waves depend on earlier waves completing first.*
| Coverage completeness | 0.20 | {0100} | {spec → steps, no gaps} |
| Specification quality | 0.15 | {0100} | {no placeholders, clear criteria} |
| Risk & pre-mortem | 0.15 | {0100} | {failure modes addressed} |
| Headless readiness | 0.15 | {0100} | {On failure + Checkpoint per step} |
| Headless readiness | 0.10 | {0100} | {On failure + Checkpoint per step} |
| Manifest quality | 0.05 | {0100} | {all steps have valid, checkable manifests} |
| **Weighted total** | **1.00** | **{score}** | **Grade: {A/B/C/D}** |
**Adversarial review:**

View file

@ -20,8 +20,61 @@ the purpose and make judgment calls.}
- **Touch:** {explicit list of files this session may create or modify}
- **Never touch:** {files that belong to other sessions — hard boundary}
## Session Manifest
Machine-readable aggregate of all step manifests in this session. Used by
ultraexecute-local for independent Phase 7.5 audit.
```yaml
session_manifest:
plan_version: "1.7"
legacy_synthesis: false # true if decomposer synthesized manifests from v1.6 plan
expected_paths: # union across all steps (deduplicated)
- {path from step N}
- {path from step M}
commit_count: {N} # number of implementation steps (excludes Step 0)
commit_message_patterns: # in step order; Step 0 omitted
- "^feat\\(scope\\):"
- "^fix\\(scope\\):"
bash_syntax_check: [] # union of step bash_syntax_check
scope_touch: [] # from Scope Fence Touch
scope_forbidden: [] # Never touch + union of step forbidden_paths
```
## Steps
### Step 0: Sandbox pre-flight (auto-generated — do not modify)
- **Files:** none (read-only test)
- **Changes:** verify git push permissions are available in this sandbox
- **Verify:**
```
git push --dry-run origin HEAD 2>&1 | tee /tmp/push-dryrun-$$.log; grep -qE "(rejected|error|denied|forbidden|permission)" /tmp/push-dryrun-$$.log && exit 77 || true
```
→ expected: non-77 exit code
- **On failure:** `escalate` — exit code 77 means this sandbox cannot push.
Abort immediately; do not attempt any work. Main orchestrator will
re-spawn with correct permissions.
- **Checkpoint:** none (no file changes)
- **Manifest:**
```yaml
manifest:
expected_paths: []
min_file_count: 0
commit_message_pattern: ""
bash_syntax_check: []
forbidden_paths: []
must_contain: []
sandbox_preflight: true
```
*Step 0 runs in the same sandbox as all real work. If it exits 77,
ultraexecute-local marks the session `blocked` and does NOT proceed. This
catches the fail-late push-denial mode observed in Wave 1.*
*Escape hatch:* set `ULTRAEXECUTE_SKIP_PREFLIGHT=1` in the environment to
bypass Step 0 (use only for offline/air-gapped testing).
### Step 1: {description}
- **Files:** `{path}`
@ -31,10 +84,21 @@ the purpose and make judgment calls.}
- **Verify:** `{exact command}` → expected: `{output}`
- **On failure:** {revert | retry | skip | escalate} — {specific instructions}
- **Checkpoint:** `git commit -m "{message}"`
- **Manifest:**
```yaml
manifest:
expected_paths:
- {path}
min_file_count: 1
commit_message_pattern: "^feat\\(scope\\):"
bash_syntax_check: []
forbidden_paths: []
must_contain: []
```
### Step 2: {description}
{same structure as Step 1}
{same structure as Step 1, including Manifest block}
## Exit Condition
@ -78,3 +142,14 @@ introduced. This section bridges sessions — it's the "baton" in a relay race.}
- **Steps from plan:** {step N}{step M}
- **Estimated complexity:** {low | medium | high}
- **Model recommendation:** {opus | sonnet} — {rationale}
## Recovery Metadata
*This section is populated only when this session spec was generated by the
ultraexecute-local Phase 7.6 recovery dispatcher. Omit for normal sessions.*
- **Recovery of:** `{original session spec path}`
- **Recovery depth:** {1 | 2}
- **Missing steps (reason for recovery):** {step numbers + drift summary}
- **Entry condition override:** {e.g., "previous partial session committed at {sha}"}
- **Parent progress file:** `{path to .ultraexecute-progress-*.json}`