feat(ultraplan-local): v1.8.0 — close Opus 4.7 schema-drift gap

Opus 4.7 reads agent instructions more literally than 4.6. The v1.7
planning-orchestrator described the Step+Manifest schema via prose +
procedural rules, which 4.6 inferred correctly but 4.7 sometimes
rendered as narrative "Fase N" prose — producing plans ultraexecute
Phase 2 rejected. First observed 2026-04-17 during llm-security v6.2.0
planning.

v1.8.0 closes the gap:

- planning-orchestrator Phase 5 embeds a literal copyable Step+Manifest
  example (JWT middleware) replacing "read the template" prose
- Explicit forbidden-format clause: ## Fase N, ### Phase N, ### Stage N,
  and any non-"### Step N:" heading are denied
- Phase 5.5 schema self-check: grep-verify canonical Step count matches
  Manifest count and narrative heading count is zero, before handing to
  plan-critic
- ultraexecute-local --validate mode: schema-only check that parses
  steps + manifests, reports READY/FAIL with actionable error hints,
  no security scan, no execution. Fast sanity check between
  /ultraplan-local and full execution.

Static verification: 17/17 PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Kjell Tore Guttormsen 2026-04-17 18:01:14 +02:00
commit 9ecd66929c
7 changed files with 203 additions and 9 deletions

View file

@ -61,7 +61,7 @@ Key commands: `/config-audit posture`, `/config-audit feature-gap`, `/config-aud
--- ---
### [Ultra {research | plan | execute} - local](plugins/ultraplan-local/) `v1.7.0` ### [Ultra {research | plan | execute} - local](plugins/ultraplan-local/) `v1.8.0`
Deep research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery. Deep research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery.
@ -69,10 +69,12 @@ Three commands, one pipeline with clear division of labor:
- **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions. - **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions.
- **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Accepts research briefs via `--research`. - **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Accepts research briefs via `--research`.
- **`/ultraexecute-local`** — Execute the contract disciplined. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. - **`/ultraexecute-local`** — Execute the contract disciplined. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity-check between planning and execution.
v1.7 self-verifying chain: a step may not be marked `completed` unless its manifest verifies. The executor's last tool call before reporting must be a manifest check, not an arbitrary file review — preventing hallucinated completion. v1.7 self-verifying chain: a step may not be marked `completed` unless its manifest verifies. The executor's last tool call before reporting must be a manifest check, not an arbitrary file review — preventing hallucinated completion.
v1.8 closes the Opus 4.7 prompt-literalism gap: planning-orchestrator now emits a literal copyable Step+Manifest template (not prose rules), explicitly forbids narrative `Fase/Phase/Stage` headers, and runs a schema self-check before handing off to plan-critic. Prevents the "plan generated, executor rejects" loop observed when 4.7 inferred the v1.7 schema from prose descriptions alone.
Defense-in-depth security: plugin hooks block destructive commands and sensitive path writes, prompt-level denylist works in headless sessions, pre-execution plan scan catches dangerous commands before they run, scoped `--allowedTools` replaces `--dangerously-skip-permissions` in parallel sessions. Defense-in-depth security: plugin hooks block destructive commands and sensitive path writes, prompt-level denylist works in headless sessions, pre-execution plan scan catches dangerous commands before they run, scoped `--allowedTools` replaces `--dangerously-skip-permissions` in parallel sessions.
Modes: default, spec-driven, research-enriched, foreground, quick, decompose, export Modes: default, spec-driven, research-enriched, foreground, quick, decompose, export

View file

@ -1,7 +1,7 @@
{ {
"name": "ultraplan-local", "name": "ultraplan-local",
"description": "Deep implementation planning and research with interview, specialized agent swarms, external research, triangulation, adversarial review, session decomposition, and headless execution support.", "description": "Deep implementation planning and research with interview, specialized agent swarms, external research, triangulation, adversarial review, session decomposition, and headless execution support.",
"version": "1.7.0", "version": "1.8.0",
"author": { "author": {
"name": "Kjell Tore Guttormsen" "name": "Kjell Tore Guttormsen"
}, },

View file

@ -4,6 +4,62 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [1.8.0] - 2026-04-17
### Opus 4.7 prompt literalism — closing the schema-drift gap
Opus 4.7 reads agent instructions more literally than 4.6 (per 4.7 system
card §6.3.1.1). The v1.7 planning-orchestrator described the Step+Manifest
schema via prose + procedural rules ("read the template"), which 4.6
inferred correctly but 4.7 sometimes rendered as narrative "Fase N" prose.
The result: plans that executed cleanly on 4.6 were rejected by
ultraexecute Phase 2 parsing on 4.7 — first observed during v6.2.0 planning
for `llm-security`. v1.8.0 closes the gap by replacing prose rules with a
literal copyable template, explicit forbidden-format clauses, and a
pre-handoff schema self-check.
### Added
- **Inline literal Step+Manifest template** in `planning-orchestrator`
Phase 5 — a complete, copyable example (JWT middleware step) replaces
"read the template" prose. Removes ambiguity about heading format,
field order, and manifest YAML structure.
- **Forbidden heading-format clause** in Phase 5 — explicit denylist for
`## Fase N`, `### Phase N`, `### Stage N`, and other narrative formats
the executor cannot parse. Negative constraints land harder on 4.7.
- **Phase 5.5 schema self-check** in `planning-orchestrator` — after
writing the plan, grep-verify canonical `### Step N:` count matches
`manifest:` count, and narrative heading count is zero. Rewrite plan
if self-check fails, before handing to plan-critic.
- **`--validate` mode** in `ultraexecute-local` — schema-only check that
parses steps and manifests, reports `READY | FAIL` with specific
error hints, and exits without security scan or execution. Intended
as a fast sanity-check between `/ultraplan-local` and full execution:
```bash
/ultraplan-local "task"
/ultraexecute-local --validate <plan>.md # READY or actionable FAIL
/ultraexecute-local <plan>.md # full execution
```
### Changed
- `planning-orchestrator` Phase 5 now embeds the canonical Step template
inline (~60 lines of literal example) rather than referring to
`templates/plan-template.md`. Template file remains authoritative for
cross-referencing but is no longer load-bearing for plan generation.
- `ultraexecute-local` Phase 2.3 added as a hard exit point for
`--validate` mode; Phase 2.4 security scan explicitly skips this mode.
### Rationale
v1.7.0's self-verifying chain assumed the orchestrator reliably produces
the v1.7 schema. That held on 4.6. v1.8.0 makes the assumption robust to
4.7-style literal interpretation by moving from "describe the format" to
"show the exact format and forbid alternatives", plus a self-check loop
before human-visible output. Pairs with `--validate` as a user-facing
verification step that catches any residual drift before execution side
effects begin.
## [1.7.0] - 2026-04-12 ## [1.7.0] - 2026-04-12
### The self-verifying plan chain ### The self-verifying plan chain

View file

@ -45,6 +45,7 @@ Flags can be combined: `--local --fg`, `--external --quick`.
| _(default)_ | Execute plan — auto-detects Execution Strategy for multi-session | | _(default)_ | Execute plan — auto-detects Execution Strategy for multi-session |
| `--resume` | Resume from last progress checkpoint | | `--resume` | Resume from last progress checkpoint |
| `--dry-run` | Validate plan structure without executing | | `--dry-run` | Validate plan structure without executing |
| `--validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution (v1.8) |
| `--step N` | Execute only step N | | `--step N` | Execute only step N |
| `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy | | `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy |
| `--session N` | Execute only session N from plan's Execution Strategy | | `--session N` | Execute only session N from plan's Execution Strategy |

View file

@ -1,6 +1,6 @@
# ultraplan-local — Plan, Research, Execute # ultraplan-local — Plan, Research, Execute
![Version](https://img.shields.io/badge/version-1.7.0-blue) ![Version](https://img.shields.io/badge/version-1.8.0-blue)
![License](https://img.shields.io/badge/license-MIT-green) ![License](https://img.shields.io/badge/license-MIT-green)
![Platform](https://img.shields.io/badge/platform-Claude%20Code-purple) ![Platform](https://img.shields.io/badge/platform-Claude%20Code-purple)
@ -233,6 +233,7 @@ Reads a plan from `/ultraplan-local` and implements it with strict discipline. N
| **Default** | `/ultraexecute-local plan.md` | Auto-detects Execution Strategy, parallel if available | | **Default** | `/ultraexecute-local plan.md` | Auto-detects Execution Strategy, parallel if available |
| **Resume** | `/ultraexecute-local plan.md --resume` | Resume from last progress checkpoint | | **Resume** | `/ultraexecute-local plan.md --resume` | Resume from last progress checkpoint |
| **Dry run** | `/ultraexecute-local plan.md --dry-run` | Validate plan structure + preview sessions and billing | | **Dry run** | `/ultraexecute-local plan.md --dry-run` | Validate plan structure + preview sessions and billing |
| **Validate** | `/ultraexecute-local plan.md --validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution |
| **Single step** | `/ultraexecute-local plan.md --step 3` | Execute only step 3 | | **Single step** | `/ultraexecute-local plan.md --step 3` | Execute only step 3 |
| **Foreground** | `/ultraexecute-local plan.md --fg` | Force sequential, ignore Execution Strategy | | **Foreground** | `/ultraexecute-local plan.md --fg` | Force sequential, ignore Execution Strategy |
| **Single session** | `/ultraexecute-local plan.md --session 2` | Execute only session 2 from Execution Strategy | | **Single session** | `/ultraexecute-local plan.md --session 2` | Execute only session 2 from Execution Strategy |

View file

@ -191,6 +191,62 @@ the title. This signals to ultraexecute-local that the plan includes per-step
verification manifests and enables strict audit mode. Plans without this verification manifests and enables strict audit mode. Plans without this
marker are treated as legacy v1.6 with synthesized minimal manifests. marker are treated as legacy v1.6 with synthesized minimal manifests.
### Mandatory step format — copy this exactly
The Implementation Plan section MUST contain numbered steps using the EXACT
format shown below. The executor (`ultraexecute-local`) parses plans with
strict regex matching. Any deviation breaks parsing and forces the user to
re-run planning.
**FORBIDDEN heading formats** (the executor's parser rejects these):
- `## Fase 1`, `### Fase 1` — Norwegian narrative format
- `## Phase 1`, `### Phase 1` — narrative phase format
- `## Stage 1`, `### Stage 1` — narrative stage format
- `### 1.` or `### 1)` — numbered without "Step"
- `### Step 1 —` (em-dash instead of colon)
- Any heading that doesn't match the regex `^### Step \d+: `
**REQUIRED heading format:** `### Step N: <description>` (where N is 1, 2, 3, ...
and the colon is followed by a single space then the description).
**REQUIRED step body** — every step MUST include all of these fields, in this
order, formatted as bullet points:
```markdown
### Step 1: Add JWT verification middleware
- **Files:** `src/middleware/jwt.ts`
- **Changes:** Create new middleware function `verifyJWT(req, res, next)` that reads `Authorization: Bearer <token>` header, verifies signature with `process.env.JWT_SECRET`, attaches decoded payload to `req.user`, and returns 401 on invalid/missing token. (new file)
- **Reuses:** `jsonwebtoken.verify()` (already in package.json), pattern from `src/middleware/cors.ts`
- **Test first:**
- File: `src/middleware/jwt.test.ts` (new)
- Verifies: valid token attaches user; invalid token returns 401; missing header returns 401
- Pattern: `src/middleware/cors.test.ts` (follow this style)
- **Verify:** `npm test -- jwt.test.ts` → expected: `3 passing`
- **On failure:** revert — `git checkout -- src/middleware/jwt.ts src/middleware/jwt.test.ts`
- **Checkpoint:** `git commit -m "feat(auth): add JWT verification middleware"`
- **Manifest:**
```yaml
manifest:
expected_paths:
- src/middleware/jwt.ts
- src/middleware/jwt.test.ts
min_file_count: 2
commit_message_pattern: "^feat\\(auth\\): add JWT verification middleware$"
bash_syntax_check: []
forbidden_paths:
- src/middleware/cors.ts
must_contain:
- path: src/middleware/jwt.ts
pattern: "verifyJWT"
```
```
The example above is the canonical shape. Substitute your own file paths,
descriptions, and patterns — but preserve the exact heading format, bullet
field names, and Manifest YAML structure. Do not invent new field names. Do
not skip fields. Do not nest steps under sub-headings.
### Manifest generation rules (REQUIRED for every step) ### Manifest generation rules (REQUIRED for every step)
Every implementation step MUST include a `Manifest:` block as its last field, Every implementation step MUST include a `Manifest:` block as its last field,
@ -230,6 +286,32 @@ Derive the manifest fields mechanically from the step's other fields:
If any validation fails, fix the plan before handing to Phase 6 review. If any validation fails, fix the plan before handing to Phase 6 review.
### Phase 5.5 — Schema self-check (REQUIRED before Phase 6)
After writing the plan file, verify the output conforms to the executor's
parser BEFORE handing to plan-critic. Use Bash to grep the plan file:
```bash
# Count canonical step headings
grep -c '^### Step [0-9]\+: ' "$plan_path"
# Count manifest blocks
grep -c '^ manifest:' "$plan_path"
# Detect forbidden narrative formats
grep -cE '^(##|###) (Fase|Phase|Stage) [0-9]' "$plan_path"
```
**Pass criteria:**
- Step count ≥ 1
- Manifest count == Step count
- Forbidden narrative count == 0
**If the plan fails schema self-check:** rewrite the Implementation Plan
section using the exact literal template shown earlier in Phase 5. Do NOT
proceed to Phase 6 with a schema-failing plan — plan-critic cannot repair
format drift, only content issues.
### Failure recovery (REQUIRED for every step) ### Failure recovery (REQUIRED for every step)
Each implementation step MUST include: Each implementation step MUST include:

View file

@ -1,7 +1,7 @@
--- ---
name: ultraexecute-local name: ultraexecute-local
description: Disciplined plan executor — single-session or multi-session with parallel orchestration, failure recovery, and headless support description: Disciplined plan executor — single-session or multi-session with parallel orchestration, failure recovery, and headless support
argument-hint: "[--fg | --resume | --dry-run | --step N | --session N] <plan.md>" argument-hint: "[--fg | --resume | --dry-run | --validate | --step N | --session N] <plan.md>"
model: opus model: opus
allowed-tools: Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion allowed-tools: Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion
--- ---
@ -21,11 +21,12 @@ Parse `$ARGUMENTS` for mode flags:
1. If arguments contain `--fg`: extract the file path. Set **mode = foreground**. 1. If arguments contain `--fg`: extract the file path. Set **mode = foreground**.
2. If arguments contain `--resume`: extract the file path. Set **mode = resume**. 2. If arguments contain `--resume`: extract the file path. Set **mode = resume**.
3. If arguments contain `--dry-run`: extract the file path. Set **mode = dry-run**. 3. If arguments contain `--dry-run`: extract the file path. Set **mode = dry-run**.
4. If arguments contain `--step N` (N is a positive integer): extract N and the file path. 4. If arguments contain `--validate`: extract the file path. Set **mode = validate**.
5. If arguments contain `--step N` (N is a positive integer): extract N and the file path.
Set **mode = step**, **target-step = N**. Set **mode = step**, **target-step = N**.
5. If arguments contain `--session N` (N is a positive integer): extract N and the file path. 6. If arguments contain `--session N` (N is a positive integer): extract N and the file path.
Set **mode = session**, **target-session = N**. Set **mode = session**, **target-session = N**.
6. Otherwise: the entire argument string is the file path. Set **mode = execute**. 7. Otherwise: the entire argument string is the file path. Set **mode = execute**.
If no path is provided, output usage and stop: If no path is provided, output usage and stop:
@ -34,6 +35,7 @@ Usage: /ultraexecute-local <plan.md>
/ultraexecute-local --fg <plan.md> /ultraexecute-local --fg <plan.md>
/ultraexecute-local --resume <plan.md> /ultraexecute-local --resume <plan.md>
/ultraexecute-local --dry-run <plan.md> /ultraexecute-local --dry-run <plan.md>
/ultraexecute-local --validate <plan.md>
/ultraexecute-local --step N <plan.md> /ultraexecute-local --step N <plan.md>
/ultraexecute-local --session N <plan.md> /ultraexecute-local --session N <plan.md>
@ -42,6 +44,7 @@ Modes:
--fg Force foreground — all steps sequentially, ignore Execution Strategy --fg Force foreground — all steps sequentially, ignore Execution Strategy
--resume Resume from last progress checkpoint --resume Resume from last progress checkpoint
--dry-run Validate plan and show execution strategy without running --dry-run Validate plan and show execution strategy without running
--validate Schema-only check — parse steps + manifests, no security scan, no execution
--step N Execute only step N (foreground) --step N Execute only step N (foreground)
--session N Execute only session N from the plan's Execution Strategy --session N Execute only session N from the plan's Execution Strategy
@ -50,6 +53,7 @@ Examples:
/ultraexecute-local --fg .claude/plans/ultraplan-2026-04-06-auth-refactor.md /ultraexecute-local --fg .claude/plans/ultraplan-2026-04-06-auth-refactor.md
/ultraexecute-local --session 2 .claude/plans/ultraplan-2026-04-06-auth-refactor.md /ultraexecute-local --session 2 .claude/plans/ultraplan-2026-04-06-auth-refactor.md
/ultraexecute-local --dry-run .claude/plans/ultraplan-2026-04-06-auth-refactor.md /ultraexecute-local --dry-run .claude/plans/ultraplan-2026-04-06-auth-refactor.md
/ultraexecute-local --validate .claude/plans/ultraplan-2026-04-06-auth-refactor.md
``` ```
If the file does not exist, report and stop: If the file does not exist, report and stop:
@ -153,9 +157,57 @@ Steps: {N}
{if warnings}: Warnings: {list} {if warnings}: Warnings: {list}
``` ```
## Phase 2.3 — Validate-only mode exit (if mode = validate)
**If mode = validate, stop after Phase 2 parsing** and emit a schema-only
report. Do NOT run security scan, do NOT touch progress files, do NOT
execute any steps. This gives the user a fast sanity-check of plan
schema compliance without side effects.
If Phase 2 parsing succeeded (no fatal errors, every step has a valid
Manifest block in strict mode, or synthesized manifests in legacy mode):
```
=== Schema Validation: READY ===
File: {path}
Type: {plan | session-spec}
plan_version: {1.7 | legacy}
Steps: {N}
Manifests: {N valid | N synthesized (legacy)}
Warnings: {count}
{if warnings}: - {each warning on own line}
Plan is schema-compliant. Safe to run:
/ultraexecute-local {path}
```
If Phase 2 parsing failed (unrecognized format, missing Manifest in strict
mode, malformed YAML, invalid regex):
```
=== Schema Validation: FAIL ===
File: {path}
Reason: {specific error from Phase 2}
{if format not recognized}:
Detected heading format: {e.g. "### Fase 1:", "## Phase 1"}
Expected: "### Step N: <description>"
Fix: re-run /ultraplan-local — planning-orchestrator must emit v1.7 format
{if missing manifest}:
Step {N} has no Manifest block (plan_version=1.7 requires one per step)
Fix: re-run /ultraplan-local — planning-orchestrator must include manifest YAML
{if malformed YAML or invalid regex}:
Step {N}: {specific YAML/regex error}
Fix: edit the plan manually or re-run /ultraplan-local
```
Exit after emitting the report. Do not continue to Phase 2.4 or later.
## Phase 2.4 — Pre-execution security scan ## Phase 2.4 — Pre-execution security scan
**Runs for all modes except dry-run** (dry-run has its own report format). **Runs for all modes except dry-run and validate** (those modes exit earlier or have their own report format).
Scan every `Verify:` and `Checkpoint:` command in the parsed plan against the Scan every `Verify:` and `Checkpoint:` command in the parsed plan against the
executor security denylist. This catches dangerous commands before execution begins. executor security denylist. This catches dangerous commands before execution begins.