feat(ultraplan-local): v1.7.0 — self-verifying plan chain
Wave 1 of a 6-session parallel build revealed three failure modes: (1) hallucinated completion (status=completed after 2/5 steps, last tool call was an arbitrary file review), (2) fail-late bash (3/6 sessions had push blocked inside sub-agent sandbox after all work was done), (3) no objective verification (plans were prose). v1.7 closes all three by making the plan an executable contract. Per-step YAML manifest (expected_paths, commit_message_pattern, bash_syntax_check, forbidden_paths, must_contain) is the objective completion predicate. Plan-critic dimension 10 (Manifest quality) is a hard gate. Session decomposer propagates manifests verbatim and emits an obligatory Step 0 pre-flight (git push --dry-run, exit 77 sentinel) in every session spec. ultraexecute-local gets Phase 7.5 (independent manifest audit from git log + filesystem, ignoring agent bookkeeping) and Phase 7.6 (bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17 forbids marking a step passed without manifest verification. Hard Rule 18 forbids ending on an arbitrary tool call before reporting. Division of labor is made explicit: - /ultraresearch-local gathers context (no build decisions) - /ultraplan-local produces an executable contract (manifests, plan-critic gate) - /ultraexecute-local executes disciplined (does NOT compensate for weak plans — escalates) Code complete. Docs partial (Arbeidsdeling table + manifest section added to plugin + marketplace READMEs). Verification tests (10-sequence) pending — see REMEMBER.md. Backward compat: v1.6 plans without plan_version marker get legacy mode with synthesized manifests and legacy_plan: true in progress file. Plan-critic emits advisory, not block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
72f2e8f6c9
commit
d1befac35a
11 changed files with 651 additions and 27 deletions
|
|
@ -113,6 +113,42 @@ Steps missing On failure or Checkpoint clauses are **major** findings
|
|||
(not blockers — the plan is still valid for interactive use, but it
|
||||
cannot be decomposed into headless sessions).
|
||||
|
||||
### 10. Manifest quality (hard gate)
|
||||
|
||||
Manifests are the objective completion predicate. ultraexecute-local uses
|
||||
them to determine whether a step is actually done — not just whether the
|
||||
Verify command returned 0. A plan without valid manifests cannot drive
|
||||
deterministic execution.
|
||||
|
||||
Check plans with `plan_version: 1.7` (or later) against these rules:
|
||||
|
||||
- Does EVERY step have a `Manifest:` block with YAML content?
|
||||
- Are `expected_paths` entries all either existing files OR explicitly marked
|
||||
`(new file)` in the step's Changes prose?
|
||||
- Is `expected_paths` a subset of `Files:` (no orphan paths)?
|
||||
- Does `commit_message_pattern` compile as a valid regex? (check with a
|
||||
mental regex-parse — e.g., unbalanced `(`, `[` is invalid)
|
||||
- Does the `commit_message_pattern` actually match the literal Checkpoint
|
||||
commit message declared in the step?
|
||||
- Are all `bash_syntax_check` entries `.sh` files that appear in
|
||||
`expected_paths` (not references to external scripts)?
|
||||
- Do `forbidden_paths` avoid overlap with `expected_paths` (contradiction)?
|
||||
- Does the step create shell scripts that are NOT listed in
|
||||
`bash_syntax_check`? (minor finding — suggests incomplete manifest)
|
||||
|
||||
**Severity:**
|
||||
- Missing Manifest block on any step → **major** (same tier as missing On failure)
|
||||
- Invalid regex in commit_message_pattern → **major**
|
||||
- Pattern doesn't match declared Checkpoint → **major**
|
||||
- `expected_paths` references non-existent path not marked new → **major**
|
||||
- `forbidden_paths` overlaps `expected_paths` → **blocker** (contradiction)
|
||||
- Missing bash_syntax_check for declared `.sh` files → **minor**
|
||||
|
||||
**Backward compat:** For plans without `plan_version: 1.7` (legacy), emit
|
||||
a single advisory note ("Plan is v1.6 legacy format — manifests will be
|
||||
synthesized by ultraexecute with reduced audit precision") and skip this
|
||||
dimension's scoring.
|
||||
|
||||
## Rating system
|
||||
|
||||
Rate each finding:
|
||||
|
|
@ -131,10 +167,15 @@ After reviewing all findings, produce a quantitative score:
|
|||
| Coverage completeness | 0.20 | Spec-to-steps mapping, no gaps |
|
||||
| Specification quality | 0.15 | No placeholders, clear criteria |
|
||||
| Risk & pre-mortem | 0.15 | Failure modes addressed, mitigations realistic |
|
||||
| Headless readiness | 0.15 | On failure clauses, checkpoints, circuit breakers |
|
||||
| Headless readiness | 0.10 | On failure clauses, checkpoints, circuit breakers |
|
||||
| Manifest quality | 0.05 | Every step has a valid, checkable manifest (v1.7+) |
|
||||
|
||||
Score each dimension 0–100, then compute the weighted total.
|
||||
|
||||
**Weighting note (v1.7):** Headless readiness reduced 0.15→0.10, Manifest
|
||||
quality added at 0.05. Total still 1.00. For legacy v1.6 plans, Manifest
|
||||
quality is not scored and Headless readiness returns to 0.15.
|
||||
|
||||
**Grade thresholds:**
|
||||
- **A** (90–100): APPROVE
|
||||
- **B** (75–89): APPROVE_WITH_NOTES
|
||||
|
|
@ -166,7 +207,8 @@ Score each dimension 0–100, then compute the weighted total.
|
|||
| Coverage completeness | 0.20 | {0–100} | {assessment} |
|
||||
| Specification quality | 0.15 | {0–100} | {assessment} |
|
||||
| Risk & pre-mortem | 0.15 | {0–100} | {assessment} |
|
||||
| Headless readiness | 0.15 | {0–100} | {assessment} |
|
||||
| Headless readiness | 0.10 | {0–100} | {assessment} |
|
||||
| Manifest quality | 0.05 | {0–100} | {assessment — omit for legacy v1.6} |
|
||||
| **Weighted total** | **1.00** | **{score}** | **Grade: {A/B/C/D}** |
|
||||
|
||||
## Summary
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue