feat(ultraplan-local): v1.8.0 — close Opus 4.7 schema-drift gap
Opus 4.7 reads agent instructions more literally than 4.6. The v1.7 planning-orchestrator described the Step+Manifest schema via prose + procedural rules, which 4.6 inferred correctly but 4.7 sometimes rendered as narrative "Fase N" prose — producing plans ultraexecute Phase 2 rejected. First observed 2026-04-17 during llm-security v6.2.0 planning. v1.8.0 closes the gap: - planning-orchestrator Phase 5 embeds a literal copyable Step+Manifest example (JWT middleware) replacing "read the template" prose - Explicit forbidden-format clause: ## Fase N, ### Phase N, ### Stage N, and any non-"### Step N:" heading are denied - Phase 5.5 schema self-check: grep-verify canonical Step count matches Manifest count and narrative heading count is zero, before handing to plan-critic - ultraexecute-local --validate mode: schema-only check that parses steps + manifests, reports READY/FAIL with actionable error hints, no security scan, no execution. Fast sanity check between /ultraplan-local and full execution. Static verification: 17/17 PASS. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
9f893c3858
commit
9ecd66929c
7 changed files with 203 additions and 9 deletions
|
|
@ -1,7 +1,7 @@
|
|||
{
|
||||
"name": "ultraplan-local",
|
||||
"description": "Deep implementation planning and research with interview, specialized agent swarms, external research, triangulation, adversarial review, session decomposition, and headless execution support.",
|
||||
"version": "1.7.0",
|
||||
"version": "1.8.0",
|
||||
"author": {
|
||||
"name": "Kjell Tore Guttormsen"
|
||||
},
|
||||
|
|
|
|||
|
|
@ -4,6 +4,62 @@ All notable changes to this project will be documented in this file.
|
|||
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
|
||||
|
||||
## [1.8.0] - 2026-04-17
|
||||
|
||||
### Opus 4.7 prompt literalism — closing the schema-drift gap
|
||||
|
||||
Opus 4.7 reads agent instructions more literally than 4.6 (per 4.7 system
|
||||
card §6.3.1.1). The v1.7 planning-orchestrator described the Step+Manifest
|
||||
schema via prose + procedural rules ("read the template"), which 4.6
|
||||
inferred correctly but 4.7 sometimes rendered as narrative "Fase N" prose.
|
||||
The result: plans that executed cleanly on 4.6 were rejected by
|
||||
ultraexecute Phase 2 parsing on 4.7 — first observed during v6.2.0 planning
|
||||
for `llm-security`. v1.8.0 closes the gap by replacing prose rules with a
|
||||
literal copyable template, explicit forbidden-format clauses, and a
|
||||
pre-handoff schema self-check.
|
||||
|
||||
### Added
|
||||
|
||||
- **Inline literal Step+Manifest template** in `planning-orchestrator`
|
||||
Phase 5 — a complete, copyable example (JWT middleware step) replaces
|
||||
"read the template" prose. Removes ambiguity about heading format,
|
||||
field order, and manifest YAML structure.
|
||||
- **Forbidden heading-format clause** in Phase 5 — explicit denylist for
|
||||
`## Fase N`, `### Phase N`, `### Stage N`, and other narrative formats
|
||||
the executor cannot parse. Negative constraints land harder on 4.7.
|
||||
- **Phase 5.5 schema self-check** in `planning-orchestrator` — after
|
||||
writing the plan, grep-verify canonical `### Step N:` count matches
|
||||
`manifest:` count, and narrative heading count is zero. Rewrite plan
|
||||
if self-check fails, before handing to plan-critic.
|
||||
- **`--validate` mode** in `ultraexecute-local` — schema-only check that
|
||||
parses steps and manifests, reports `READY | FAIL` with specific
|
||||
error hints, and exits without security scan or execution. Intended
|
||||
as a fast sanity-check between `/ultraplan-local` and full execution:
|
||||
```bash
|
||||
/ultraplan-local "task"
|
||||
/ultraexecute-local --validate <plan>.md # READY or actionable FAIL
|
||||
/ultraexecute-local <plan>.md # full execution
|
||||
```
|
||||
|
||||
### Changed
|
||||
|
||||
- `planning-orchestrator` Phase 5 now embeds the canonical Step template
|
||||
inline (~60 lines of literal example) rather than referring to
|
||||
`templates/plan-template.md`. Template file remains authoritative for
|
||||
cross-referencing but is no longer load-bearing for plan generation.
|
||||
- `ultraexecute-local` Phase 2.3 added as a hard exit point for
|
||||
`--validate` mode; Phase 2.4 security scan explicitly skips this mode.
|
||||
|
||||
### Rationale
|
||||
|
||||
v1.7.0's self-verifying chain assumed the orchestrator reliably produces
|
||||
the v1.7 schema. That held on 4.6. v1.8.0 makes the assumption robust to
|
||||
4.7-style literal interpretation by moving from "describe the format" to
|
||||
"show the exact format and forbid alternatives", plus a self-check loop
|
||||
before human-visible output. Pairs with `--validate` as a user-facing
|
||||
verification step that catches any residual drift before execution side
|
||||
effects begin.
|
||||
|
||||
## [1.7.0] - 2026-04-12
|
||||
|
||||
### The self-verifying plan chain
|
||||
|
|
|
|||
|
|
@ -45,6 +45,7 @@ Flags can be combined: `--local --fg`, `--external --quick`.
|
|||
| _(default)_ | Execute plan — auto-detects Execution Strategy for multi-session |
|
||||
| `--resume` | Resume from last progress checkpoint |
|
||||
| `--dry-run` | Validate plan structure without executing |
|
||||
| `--validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution (v1.8) |
|
||||
| `--step N` | Execute only step N |
|
||||
| `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy |
|
||||
| `--session N` | Execute only session N from plan's Execution Strategy |
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
# ultraplan-local — Plan, Research, Execute
|
||||
|
||||

|
||||

|
||||

|
||||

|
||||
|
||||
|
|
@ -233,6 +233,7 @@ Reads a plan from `/ultraplan-local` and implements it with strict discipline. N
|
|||
| **Default** | `/ultraexecute-local plan.md` | Auto-detects Execution Strategy, parallel if available |
|
||||
| **Resume** | `/ultraexecute-local plan.md --resume` | Resume from last progress checkpoint |
|
||||
| **Dry run** | `/ultraexecute-local plan.md --dry-run` | Validate plan structure + preview sessions and billing |
|
||||
| **Validate** | `/ultraexecute-local plan.md --validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution |
|
||||
| **Single step** | `/ultraexecute-local plan.md --step 3` | Execute only step 3 |
|
||||
| **Foreground** | `/ultraexecute-local plan.md --fg` | Force sequential, ignore Execution Strategy |
|
||||
| **Single session** | `/ultraexecute-local plan.md --session 2` | Execute only session 2 from Execution Strategy |
|
||||
|
|
|
|||
|
|
@ -191,6 +191,62 @@ the title. This signals to ultraexecute-local that the plan includes per-step
|
|||
verification manifests and enables strict audit mode. Plans without this
|
||||
marker are treated as legacy v1.6 with synthesized minimal manifests.
|
||||
|
||||
### Mandatory step format — copy this exactly
|
||||
|
||||
The Implementation Plan section MUST contain numbered steps using the EXACT
|
||||
format shown below. The executor (`ultraexecute-local`) parses plans with
|
||||
strict regex matching. Any deviation breaks parsing and forces the user to
|
||||
re-run planning.
|
||||
|
||||
**FORBIDDEN heading formats** (the executor's parser rejects these):
|
||||
- `## Fase 1`, `### Fase 1` — Norwegian narrative format
|
||||
- `## Phase 1`, `### Phase 1` — narrative phase format
|
||||
- `## Stage 1`, `### Stage 1` — narrative stage format
|
||||
- `### 1.` or `### 1)` — numbered without "Step"
|
||||
- `### Step 1 —` (em-dash instead of colon)
|
||||
- Any heading that doesn't match the regex `^### Step \d+: `
|
||||
|
||||
**REQUIRED heading format:** `### Step N: <description>` (where N is 1, 2, 3, ...
|
||||
and the colon is followed by a single space then the description).
|
||||
|
||||
**REQUIRED step body** — every step MUST include all of these fields, in this
|
||||
order, formatted as bullet points:
|
||||
|
||||
```markdown
|
||||
### Step 1: Add JWT verification middleware
|
||||
|
||||
- **Files:** `src/middleware/jwt.ts`
|
||||
- **Changes:** Create new middleware function `verifyJWT(req, res, next)` that reads `Authorization: Bearer <token>` header, verifies signature with `process.env.JWT_SECRET`, attaches decoded payload to `req.user`, and returns 401 on invalid/missing token. (new file)
|
||||
- **Reuses:** `jsonwebtoken.verify()` (already in package.json), pattern from `src/middleware/cors.ts`
|
||||
- **Test first:**
|
||||
- File: `src/middleware/jwt.test.ts` (new)
|
||||
- Verifies: valid token attaches user; invalid token returns 401; missing header returns 401
|
||||
- Pattern: `src/middleware/cors.test.ts` (follow this style)
|
||||
- **Verify:** `npm test -- jwt.test.ts` → expected: `3 passing`
|
||||
- **On failure:** revert — `git checkout -- src/middleware/jwt.ts src/middleware/jwt.test.ts`
|
||||
- **Checkpoint:** `git commit -m "feat(auth): add JWT verification middleware"`
|
||||
- **Manifest:**
|
||||
```yaml
|
||||
manifest:
|
||||
expected_paths:
|
||||
- src/middleware/jwt.ts
|
||||
- src/middleware/jwt.test.ts
|
||||
min_file_count: 2
|
||||
commit_message_pattern: "^feat\\(auth\\): add JWT verification middleware$"
|
||||
bash_syntax_check: []
|
||||
forbidden_paths:
|
||||
- src/middleware/cors.ts
|
||||
must_contain:
|
||||
- path: src/middleware/jwt.ts
|
||||
pattern: "verifyJWT"
|
||||
```
|
||||
```
|
||||
|
||||
The example above is the canonical shape. Substitute your own file paths,
|
||||
descriptions, and patterns — but preserve the exact heading format, bullet
|
||||
field names, and Manifest YAML structure. Do not invent new field names. Do
|
||||
not skip fields. Do not nest steps under sub-headings.
|
||||
|
||||
### Manifest generation rules (REQUIRED for every step)
|
||||
|
||||
Every implementation step MUST include a `Manifest:` block as its last field,
|
||||
|
|
@ -230,6 +286,32 @@ Derive the manifest fields mechanically from the step's other fields:
|
|||
|
||||
If any validation fails, fix the plan before handing to Phase 6 review.
|
||||
|
||||
### Phase 5.5 — Schema self-check (REQUIRED before Phase 6)
|
||||
|
||||
After writing the plan file, verify the output conforms to the executor's
|
||||
parser BEFORE handing to plan-critic. Use Bash to grep the plan file:
|
||||
|
||||
```bash
|
||||
# Count canonical step headings
|
||||
grep -c '^### Step [0-9]\+: ' "$plan_path"
|
||||
|
||||
# Count manifest blocks
|
||||
grep -c '^ manifest:' "$plan_path"
|
||||
|
||||
# Detect forbidden narrative formats
|
||||
grep -cE '^(##|###) (Fase|Phase|Stage) [0-9]' "$plan_path"
|
||||
```
|
||||
|
||||
**Pass criteria:**
|
||||
- Step count ≥ 1
|
||||
- Manifest count == Step count
|
||||
- Forbidden narrative count == 0
|
||||
|
||||
**If the plan fails schema self-check:** rewrite the Implementation Plan
|
||||
section using the exact literal template shown earlier in Phase 5. Do NOT
|
||||
proceed to Phase 6 with a schema-failing plan — plan-critic cannot repair
|
||||
format drift, only content issues.
|
||||
|
||||
### Failure recovery (REQUIRED for every step)
|
||||
|
||||
Each implementation step MUST include:
|
||||
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
name: ultraexecute-local
|
||||
description: Disciplined plan executor — single-session or multi-session with parallel orchestration, failure recovery, and headless support
|
||||
argument-hint: "[--fg | --resume | --dry-run | --step N | --session N] <plan.md>"
|
||||
argument-hint: "[--fg | --resume | --dry-run | --validate | --step N | --session N] <plan.md>"
|
||||
model: opus
|
||||
allowed-tools: Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion
|
||||
---
|
||||
|
|
@ -21,11 +21,12 @@ Parse `$ARGUMENTS` for mode flags:
|
|||
1. If arguments contain `--fg`: extract the file path. Set **mode = foreground**.
|
||||
2. If arguments contain `--resume`: extract the file path. Set **mode = resume**.
|
||||
3. If arguments contain `--dry-run`: extract the file path. Set **mode = dry-run**.
|
||||
4. If arguments contain `--step N` (N is a positive integer): extract N and the file path.
|
||||
4. If arguments contain `--validate`: extract the file path. Set **mode = validate**.
|
||||
5. If arguments contain `--step N` (N is a positive integer): extract N and the file path.
|
||||
Set **mode = step**, **target-step = N**.
|
||||
5. If arguments contain `--session N` (N is a positive integer): extract N and the file path.
|
||||
6. If arguments contain `--session N` (N is a positive integer): extract N and the file path.
|
||||
Set **mode = session**, **target-session = N**.
|
||||
6. Otherwise: the entire argument string is the file path. Set **mode = execute**.
|
||||
7. Otherwise: the entire argument string is the file path. Set **mode = execute**.
|
||||
|
||||
If no path is provided, output usage and stop:
|
||||
|
||||
|
|
@ -34,6 +35,7 @@ Usage: /ultraexecute-local <plan.md>
|
|||
/ultraexecute-local --fg <plan.md>
|
||||
/ultraexecute-local --resume <plan.md>
|
||||
/ultraexecute-local --dry-run <plan.md>
|
||||
/ultraexecute-local --validate <plan.md>
|
||||
/ultraexecute-local --step N <plan.md>
|
||||
/ultraexecute-local --session N <plan.md>
|
||||
|
||||
|
|
@ -42,6 +44,7 @@ Modes:
|
|||
--fg Force foreground — all steps sequentially, ignore Execution Strategy
|
||||
--resume Resume from last progress checkpoint
|
||||
--dry-run Validate plan and show execution strategy without running
|
||||
--validate Schema-only check — parse steps + manifests, no security scan, no execution
|
||||
--step N Execute only step N (foreground)
|
||||
--session N Execute only session N from the plan's Execution Strategy
|
||||
|
||||
|
|
@ -50,6 +53,7 @@ Examples:
|
|||
/ultraexecute-local --fg .claude/plans/ultraplan-2026-04-06-auth-refactor.md
|
||||
/ultraexecute-local --session 2 .claude/plans/ultraplan-2026-04-06-auth-refactor.md
|
||||
/ultraexecute-local --dry-run .claude/plans/ultraplan-2026-04-06-auth-refactor.md
|
||||
/ultraexecute-local --validate .claude/plans/ultraplan-2026-04-06-auth-refactor.md
|
||||
```
|
||||
|
||||
If the file does not exist, report and stop:
|
||||
|
|
@ -153,9 +157,57 @@ Steps: {N}
|
|||
{if warnings}: Warnings: {list}
|
||||
```
|
||||
|
||||
## Phase 2.3 — Validate-only mode exit (if mode = validate)
|
||||
|
||||
**If mode = validate, stop after Phase 2 parsing** and emit a schema-only
|
||||
report. Do NOT run security scan, do NOT touch progress files, do NOT
|
||||
execute any steps. This gives the user a fast sanity-check of plan
|
||||
schema compliance without side effects.
|
||||
|
||||
If Phase 2 parsing succeeded (no fatal errors, every step has a valid
|
||||
Manifest block in strict mode, or synthesized manifests in legacy mode):
|
||||
|
||||
```
|
||||
=== Schema Validation: READY ===
|
||||
File: {path}
|
||||
Type: {plan | session-spec}
|
||||
plan_version: {1.7 | legacy}
|
||||
Steps: {N}
|
||||
Manifests: {N valid | N synthesized (legacy)}
|
||||
Warnings: {count}
|
||||
{if warnings}: - {each warning on own line}
|
||||
|
||||
Plan is schema-compliant. Safe to run:
|
||||
/ultraexecute-local {path}
|
||||
```
|
||||
|
||||
If Phase 2 parsing failed (unrecognized format, missing Manifest in strict
|
||||
mode, malformed YAML, invalid regex):
|
||||
|
||||
```
|
||||
=== Schema Validation: FAIL ===
|
||||
File: {path}
|
||||
Reason: {specific error from Phase 2}
|
||||
|
||||
{if format not recognized}:
|
||||
Detected heading format: {e.g. "### Fase 1:", "## Phase 1"}
|
||||
Expected: "### Step N: <description>"
|
||||
Fix: re-run /ultraplan-local — planning-orchestrator must emit v1.7 format
|
||||
|
||||
{if missing manifest}:
|
||||
Step {N} has no Manifest block (plan_version=1.7 requires one per step)
|
||||
Fix: re-run /ultraplan-local — planning-orchestrator must include manifest YAML
|
||||
|
||||
{if malformed YAML or invalid regex}:
|
||||
Step {N}: {specific YAML/regex error}
|
||||
Fix: edit the plan manually or re-run /ultraplan-local
|
||||
```
|
||||
|
||||
Exit after emitting the report. Do not continue to Phase 2.4 or later.
|
||||
|
||||
## Phase 2.4 — Pre-execution security scan
|
||||
|
||||
**Runs for all modes except dry-run** (dry-run has its own report format).
|
||||
**Runs for all modes except dry-run and validate** (those modes exit earlier or have their own report format).
|
||||
|
||||
Scan every `Verify:` and `Checkpoint:` command in the parsed plan against the
|
||||
executor security denylist. This catches dangerous commands before execution begins.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue