feat(ultraplan-local): v1.7.0 — self-verifying plan chain
Wave 1 of a 6-session parallel build revealed three failure modes: (1) hallucinated completion (status=completed after 2/5 steps, last tool call was an arbitrary file review), (2) fail-late bash (3/6 sessions had push blocked inside sub-agent sandbox after all work was done), (3) no objective verification (plans were prose). v1.7 closes all three by making the plan an executable contract. Per-step YAML manifest (expected_paths, commit_message_pattern, bash_syntax_check, forbidden_paths, must_contain) is the objective completion predicate. Plan-critic dimension 10 (Manifest quality) is a hard gate. Session decomposer propagates manifests verbatim and emits an obligatory Step 0 pre-flight (git push --dry-run, exit 77 sentinel) in every session spec. ultraexecute-local gets Phase 7.5 (independent manifest audit from git log + filesystem, ignoring agent bookkeeping) and Phase 7.6 (bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17 forbids marking a step passed without manifest verification. Hard Rule 18 forbids ending on an arbitrary tool call before reporting. Division of labor is made explicit: - /ultraresearch-local gathers context (no build decisions) - /ultraplan-local produces an executable contract (manifests, plan-critic gate) - /ultraexecute-local executes disciplined (does NOT compensate for weak plans — escalates) Code complete. Docs partial (Arbeidsdeling table + manifest section added to plugin + marketplace READMEs). Verification tests (10-sequence) pending — see REMEMBER.md. Backward compat: v1.6 plans without plan_version marker get legacy mode with synthesized manifests and legacy_plan: true in progress file. Plan-critic emits advisory, not block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
72f2e8f6c9
commit
d1befac35a
11 changed files with 651 additions and 27 deletions
|
|
@ -87,10 +87,48 @@ Extract every `### Step N: {description}` heading (in order). For each step, ext
|
|||
- **Verify** — command to run after implementation
|
||||
- **On failure** — recovery action (revert/retry/skip/escalate)
|
||||
- **Checkpoint** — git commit command after success
|
||||
- **Manifest** — YAML block following Checkpoint (v1.7+)
|
||||
|
||||
If a step is missing `On failure`, default to `escalate` and record a parse warning.
|
||||
If a step is missing `Verify`, record a parse warning.
|
||||
|
||||
### Parse the Manifest block (v1.7+)
|
||||
|
||||
Read the plan header for a `plan_version` marker. Treat ≥ `1.7` as strict
|
||||
mode; absence or <1.7 as legacy mode.
|
||||
|
||||
**Strict mode (plan_version ≥ 1.7):**
|
||||
- Every step MUST have a `Manifest:` block with a YAML fenced payload.
|
||||
- Parse the YAML. Required keys: `expected_paths` (list), `min_file_count`
|
||||
(integer), `commit_message_pattern` (string), `bash_syntax_check` (list),
|
||||
`forbidden_paths` (list), `must_contain` (list of `{path, pattern}` dicts).
|
||||
- Validate each `commit_message_pattern` compiles as a regex (use a tolerant
|
||||
parser — do not execute untrusted input against shell).
|
||||
- If any step is missing its Manifest, or YAML is malformed, STOP with:
|
||||
```
|
||||
Error: plan_version=1.7 but step {N} has invalid/missing Manifest.
|
||||
Re-run planning-orchestrator — plan is not executable.
|
||||
```
|
||||
|
||||
**Legacy mode (no plan_version, or < 1.7):**
|
||||
- Synthesize a minimal manifest per step:
|
||||
- `expected_paths` ← step's `Files:` list
|
||||
- `min_file_count` ← `len(expected_paths)`
|
||||
- `commit_message_pattern` ← regex-escape first 3 words of Checkpoint msg
|
||||
- `bash_syntax_check` ← auto-detect `.sh` in Files
|
||||
- `forbidden_paths` ← []
|
||||
- `must_contain` ← []
|
||||
- Record `legacy_plan: true` in the progress file.
|
||||
- Emit warning: `Legacy plan (v1.6 or earlier) — manifests synthesized with
|
||||
reduced audit precision.`
|
||||
|
||||
### Parse Session Manifest (session specs only)
|
||||
|
||||
If the file is a session spec (v1.7+), parse the `## Session Manifest` block
|
||||
into a YAML dict. Preserve `session_manifest.*` fields for Phase 7.5 audit.
|
||||
If missing and `plan_version ≥ 1.7`, record a parse warning but continue —
|
||||
the per-step manifests are still available for audit.
|
||||
|
||||
### Parse session spec fields (if applicable)
|
||||
|
||||
- **Entry condition** from `## Dependencies`
|
||||
|
|
@ -721,7 +759,7 @@ Read the step's `Files:` and `Changes:` fields. Implement exactly as described.
|
|||
- If `Reuses:` references existing code, read that code first for context
|
||||
- Only touch files listed in `Files:` — nothing else
|
||||
|
||||
#### Sub-step D — Verification
|
||||
#### Sub-step D1 — Command verification
|
||||
|
||||
**Security check (mandatory):** Before running the Verify command, check it against
|
||||
the executor security denylist. If the command matches ANY of these patterns,
|
||||
|
|
@ -763,7 +801,50 @@ Result: {PASS | FAIL} (exit code {N})
|
|||
{if FAIL}: Output (first 10 lines): {output}
|
||||
```
|
||||
|
||||
If **PASS**: proceed to Sub-step F (checkpoint).
|
||||
**Step 0 sentinel:** if the Verify command exits `77` AND the step's manifest
|
||||
has `sandbox_preflight: true`, do NOT treat as a normal failure. Set step
|
||||
status = `blocked` and apply `On failure: escalate`. Skip D2 manifest
|
||||
verification (nothing to verify for a read-only sandbox test). Commit none —
|
||||
jump straight to Phase 7 with a structured "sandbox-blocked" reason.
|
||||
|
||||
If **PASS**: proceed to Sub-step D2 (manifest verification).
|
||||
|
||||
#### Sub-step D2 — Manifest verification
|
||||
|
||||
After the Verify command passes, verify the step's Manifest block. This is
|
||||
the objective completion predicate: a step is not passed until its manifest
|
||||
holds, regardless of Verify exit code.
|
||||
|
||||
**Checks to run (in order):**
|
||||
|
||||
1. **Expected paths exist:** for each `expected_paths` entry, verify the
|
||||
file exists in the repo. Count how many exist.
|
||||
2. **min_file_count satisfied:** count from (1) must be ≥ `min_file_count`.
|
||||
3. **Forbidden paths untouched:** for each `forbidden_paths` entry, run
|
||||
`git diff --name-only HEAD~{attempts} HEAD -- {path}` (since this step
|
||||
began). Any modified forbidden path = fail.
|
||||
4. **Bash syntax check:** for each entry in `bash_syntax_check` AND any
|
||||
`.sh` file that appears in `git diff --name-only HEAD~1 HEAD` (safety
|
||||
net for unlisted scripts), run `bash -n {path}`. Non-zero exit = fail.
|
||||
5. **must_contain patterns:** for each `{path, pattern}` pair, run
|
||||
`grep -E "{pattern}" {path}`. No match = fail.
|
||||
|
||||
**On manifest failure:**
|
||||
|
||||
```
|
||||
Manifest verification FAIL for step {N}:
|
||||
- {which check failed with detail}
|
||||
```
|
||||
|
||||
Apply the step's `On failure:` clause (same as Sub-step D1 failure). Increment
|
||||
attempts. Manifest failures count against the retry cap equally with command
|
||||
failures.
|
||||
|
||||
If all manifest checks pass: proceed to Sub-step F (checkpoint).
|
||||
|
||||
Record per-step manifest audit result in progress file:
|
||||
`steps.{N}.manifest_audit = "pass" | "fail"`,
|
||||
`steps.{N}.manifest_drift = [{check: reason, ...}]` on fail.
|
||||
|
||||
#### Sub-step E — On failure handling
|
||||
|
||||
|
|
@ -816,8 +897,15 @@ The step's verification already passed — the commit is bookkeeping.
|
|||
Step {N}: PASS (committed: {hash})
|
||||
```
|
||||
|
||||
**Commit-message-pattern check:** after the commit succeeds, read the HEAD
|
||||
commit message (`git log -1 --pretty=%s`) and match it against the step's
|
||||
`commit_message_pattern`. Mismatch does NOT fail the step (verification
|
||||
already passed) but is recorded as `checkpoint_drift: {expected_pattern,
|
||||
actual_message}` in the progress file. Phase 7.5 audit reports drift as
|
||||
an advisory.
|
||||
|
||||
Update progress: `steps.{N}.status = "passed"`, `steps.{N}.commit = {hash}`,
|
||||
`steps.{N}.completed_at = now`.
|
||||
`steps.{N}.completed_at = now`, `steps.{N}.manifest_audit = "pass"`.
|
||||
|
||||
### Step mode exit
|
||||
|
||||
|
|
@ -839,6 +927,96 @@ Exit condition check:
|
|||
If all pass: `exit_condition_checked: true` in progress file.
|
||||
If any fail: record which failed. Include in final report.
|
||||
|
||||
## Phase 7.5 — Manifest audit (independent)
|
||||
|
||||
**Runs for all modes except dry-run.** This is the last-line-of-defense
|
||||
check: it ignores the executor's own per-step bookkeeping and re-verifies
|
||||
session-wide state directly from the filesystem and git log. If the audit
|
||||
disagrees with the executor's self-report, the audit wins.
|
||||
|
||||
This phase exists because agents can hallucinate completion. Phase 7.5
|
||||
produces an independent truth based on objective state, not self-narrative.
|
||||
|
||||
**Steps:**
|
||||
|
||||
1. **Enumerate expected paths:** aggregate `expected_paths` across every
|
||||
step manifest (and the session_manifest, if present). Deduplicate.
|
||||
|
||||
2. **Filesystem check:** for each expected_path, confirm the file exists.
|
||||
|
||||
3. **Commit-count check:** count commits since session-start
|
||||
(`git rev-list --count {start_sha}..HEAD`). Expected count =
|
||||
number of steps with `status=passed` (excludes Step 0 if sandbox_preflight).
|
||||
|
||||
4. **Commit-message pattern sweep:** walk `git log {start_sha}..HEAD` and
|
||||
confirm each commit matches one of the declared
|
||||
`commit_message_patterns` (in any order — order is advisory).
|
||||
|
||||
5. **Bash syntax sweep:** for every `.sh` file in
|
||||
`git diff --name-only {start_sha} HEAD`, run `bash -n`. Collect failures.
|
||||
|
||||
6. **Forbidden-path sweep:** for each `scope_forbidden` entry, check
|
||||
`git diff --name-only {start_sha} HEAD -- {path}` is empty.
|
||||
|
||||
Compare the audit result against the executor's `progress.status`:
|
||||
|
||||
- **Audit pass + progress=completed:** status stays `completed`.
|
||||
- **Audit fail + progress=completed:** OVERRIDE to `partial`. The executor
|
||||
believed it was done; the filesystem says otherwise. This is the Wave 1
|
||||
hallucination case — the audit is the defense.
|
||||
- **Audit fail + progress=failed/stopped:** status stays as-is; drift is
|
||||
informational.
|
||||
|
||||
Record in progress file:
|
||||
- `manifest_audit.status = "pass" | "drift"`
|
||||
- `manifest_audit.drift_details = [{check, expected, actual}, ...]`
|
||||
|
||||
## Phase 7.6 — Recovery dispatch (multi-session parent context only)
|
||||
|
||||
**Preconditions:**
|
||||
- This is the parent ultraexecute invocation (not a child `--session N`)
|
||||
- Phase 7.5 reported `drift`
|
||||
- `recovery_depth < 2` (hard cap to prevent infinite loops)
|
||||
|
||||
**Skip otherwise.** Recovery in child context or at depth 2+ escalates to
|
||||
the user for manual resolution.
|
||||
|
||||
**Synthesize a recovery session spec:**
|
||||
|
||||
1. Determine the missing step numbers from `manifest_audit.drift_details`
|
||||
(steps whose expected_paths are absent or whose commits are missing).
|
||||
2. Read the original session spec. Copy only the missing steps (preserving
|
||||
their Manifest blocks) into a new file:
|
||||
`{output_dir}/session-{N}-recovery-{depth}.md`
|
||||
3. Populate `## Recovery Metadata`:
|
||||
- `recovery_of: {original session spec path}`
|
||||
- `recovery_depth: {current depth + 1}`
|
||||
- `missing_steps: [N, M, ...]`
|
||||
- `entry_condition_override: "previous partial session committed at {sha}"`
|
||||
- `parent_progress_file: {path}`
|
||||
4. Prepend the synthetic Step 0 pre-flight (same as normal session specs).
|
||||
5. Set `recovery_dispatched = true` in parent progress file.
|
||||
|
||||
**Invoke the recovery session:**
|
||||
|
||||
```bash
|
||||
cd "$WORKTREE_PATH" && claude -p "/ultraexecute-local --session {N} {recovery spec path}" \
|
||||
--allowedTools "Read,Write,Edit,Bash,Glob,Grep" \
|
||||
--permission-mode bypassPermissions \
|
||||
> "$LOG_DIR/session-{N}-recovery-{depth}.log" 2>&1
|
||||
```
|
||||
|
||||
Wait for the recovery session to complete. After it returns, re-run Phase
|
||||
7.5 audit one more time. If it still drifts at `recovery_depth=2`:
|
||||
|
||||
```
|
||||
RECOVERY EXHAUSTED: session {N} drifted after 2 recovery attempts.
|
||||
Missing: {steps and details}
|
||||
Status: partial (recovery_depth=2, escalated to user)
|
||||
```
|
||||
|
||||
Do NOT dispatch a third recovery. Report to the user.
|
||||
|
||||
## Phase 8 — Final report
|
||||
|
||||
Always produce a final report.
|
||||
|
|
@ -855,11 +1033,20 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,
|
|||
|
||||
### Step Results
|
||||
|
||||
| Step | Description | Result | Attempts | Commit |
|
||||
|------|-------------|--------|----------|--------|
|
||||
| 1 | {desc} | PASS | 1 | abc1234 |
|
||||
| 2 | {desc} | FAIL | 3 | — |
|
||||
| 3 | {desc} | — | 0 | — |
|
||||
| Step | Description | Result | Attempts | Commit | Manifest |
|
||||
|------|-------------|--------|----------|--------|----------|
|
||||
| 0 | Sandbox pre-flight | PASS | 1 | — | n/a |
|
||||
| 1 | {desc} | PASS | 1 | abc1234 | pass |
|
||||
| 2 | {desc} | FAIL | 3 | — | — |
|
||||
| 3 | {desc} | — | 0 | — | — |
|
||||
|
||||
### Manifest Audit (Phase 7.5)
|
||||
|
||||
- **Status:** {pass | drift}
|
||||
- **Drift details:** {enumerated; empty on pass}
|
||||
- **Recovery dispatched:** {true | false}
|
||||
- **Recovery depth:** {N}
|
||||
- **Legacy plan:** {true | false}
|
||||
|
||||
### Summary
|
||||
|
||||
|
|
@ -867,6 +1054,7 @@ Update progress file: `status` to `completed`/`failed`/`stopped`, `updated_at`,
|
|||
- Skipped: {N}
|
||||
- Failed: {N}
|
||||
- Not reached: {N}
|
||||
- Blocked (sandbox): {N}
|
||||
|
||||
{if all passed + exit condition passed}:
|
||||
All steps completed. Exit condition: PASS.
|
||||
|
|
@ -886,6 +1074,14 @@ Attempts: {N}
|
|||
To resume: /ultraexecute-local --resume {path}
|
||||
```
|
||||
|
||||
**Result vocabulary (v1.7, strict):**
|
||||
- `completed` — all steps passed AND Phase 7.5 manifest audit passed
|
||||
- `partial` — steps passed per executor but Phase 7.5 found drift, OR
|
||||
Phase 7.6 recovery incomplete
|
||||
- `blocked` — Step 0 sandbox pre-flight exited 77; no real work attempted
|
||||
- `failed` — a step failed and On failure was revert/retry (retry cap hit)
|
||||
- `stopped` — On failure: escalate triggered
|
||||
|
||||
**JSON summary block** (always at the end, machine-parseable):
|
||||
|
||||
```json
|
||||
|
|
@ -893,14 +1089,21 @@ To resume: /ultraexecute-local --resume {path}
|
|||
"ultraexecute_summary": {
|
||||
"plan": "{path}",
|
||||
"plan_type": "{plan | session-spec}",
|
||||
"result": "{completed | failed | stopped | partial}",
|
||||
"plan_version": "{1.7 | 1.6 | legacy}",
|
||||
"result": "{completed | partial | blocked | failed | stopped}",
|
||||
"steps_total": 0,
|
||||
"steps_passed": 0,
|
||||
"steps_failed": 0,
|
||||
"steps_skipped": 0,
|
||||
"steps_not_reached": 0,
|
||||
"steps_blocked": 0,
|
||||
"failed_at_step": null,
|
||||
"exit_condition": "{pass | fail | skipped | n/a}",
|
||||
"manifest_audit": "{pass | drift | n/a}",
|
||||
"drift_details": [],
|
||||
"recovery_dispatched": false,
|
||||
"recovery_depth": 0,
|
||||
"legacy_plan": false,
|
||||
"progress_file": "{path}"
|
||||
}
|
||||
}
|
||||
|
|
@ -995,3 +1198,18 @@ Never let stats failures block the workflow.
|
|||
(git hook injection), `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `.env` files,
|
||||
shell configs (`~/.zshrc`, `~/.bashrc`, `~/.profile`), or
|
||||
`.claude/settings.json` / `.claude/hooks/` (infrastructure self-modification).
|
||||
|
||||
17. **Manifest is the completion predicate.** A step may not be marked passed
|
||||
if its manifest does not verify, regardless of the Verify command's exit
|
||||
code. The manifest is the objective contract — Verify is necessary but
|
||||
not sufficient. For v1.7+ plans: a Verify pass with a manifest fail sets
|
||||
step result to `failed` and triggers the On-failure clause. For legacy
|
||||
v1.6 plans: synthesized manifests apply with the same force, but
|
||||
`legacy_plan: true` is logged in progress.
|
||||
|
||||
18. **Last-activity rule.** The executor's final tool call before writing
|
||||
Phase 8 must be a manifest check (Phase 7.5 audit), never an arbitrary
|
||||
file review. This prevents the "hallucinated completion" failure mode
|
||||
where a transcript ends on an unrelated Read and the agent self-reports
|
||||
`completed` without verifying. If Phase 7.5 has not run, the executor
|
||||
may not emit `result: completed` under any circumstances.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue