feat(ultraplan-local): v1.8.0 — close Opus 4.7 schema-drift gap

Opus 4.7 reads agent instructions more literally than 4.6. The v1.7
planning-orchestrator described the Step+Manifest schema via prose +
procedural rules, which 4.6 inferred correctly but 4.7 sometimes
rendered as narrative "Fase N" prose — producing plans ultraexecute
Phase 2 rejected. First observed 2026-04-17 during llm-security v6.2.0
planning.

v1.8.0 closes the gap:

- planning-orchestrator Phase 5 embeds a literal copyable Step+Manifest
  example (JWT middleware) replacing "read the template" prose
- Explicit forbidden-format clause: ## Fase N, ### Phase N, ### Stage N,
  and any non-"### Step N:" heading are denied
- Phase 5.5 schema self-check: grep-verify canonical Step count matches
  Manifest count and narrative heading count is zero, before handing to
  plan-critic
- ultraexecute-local --validate mode: schema-only check that parses
  steps + manifests, reports READY/FAIL with actionable error hints,
  no security scan, no execution. Fast sanity check between
  /ultraplan-local and full execution.

Static verification: 17/17 PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Kjell Tore Guttormsen 2026-04-17 18:01:14 +02:00
commit 9ecd66929c
7 changed files with 203 additions and 9 deletions

View file

@ -61,7 +61,7 @@ Key commands: `/config-audit posture`, `/config-audit feature-gap`, `/config-aud
---
### [Ultra {research | plan | execute} - local](plugins/ultraplan-local/) `v1.7.0`
### [Ultra {research | plan | execute} - local](plugins/ultraplan-local/) `v1.8.0`
Deep research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery.
@ -69,10 +69,12 @@ Three commands, one pipeline with clear division of labor:
- **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions.
- **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Accepts research briefs via `--research`.
- **`/ultraexecute-local`** — Execute the contract disciplined. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work.
- **`/ultraexecute-local`** — Execute the contract disciplined. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity-check between planning and execution.
v1.7 self-verifying chain: a step may not be marked `completed` unless its manifest verifies. The executor's last tool call before reporting must be a manifest check, not an arbitrary file review — preventing hallucinated completion.
v1.8 closes the Opus 4.7 prompt-literalism gap: planning-orchestrator now emits a literal copyable Step+Manifest template (not prose rules), explicitly forbids narrative `Fase/Phase/Stage` headers, and runs a schema self-check before handing off to plan-critic. Prevents the "plan generated, executor rejects" loop observed when 4.7 inferred the v1.7 schema from prose descriptions alone.
Defense-in-depth security: plugin hooks block destructive commands and sensitive path writes, prompt-level denylist works in headless sessions, pre-execution plan scan catches dangerous commands before they run, scoped `--allowedTools` replaces `--dangerously-skip-permissions` in parallel sessions.
Modes: default, spec-driven, research-enriched, foreground, quick, decompose, export

View file

@ -1,7 +1,7 @@
{
"name": "ultraplan-local",
"description": "Deep implementation planning and research with interview, specialized agent swarms, external research, triangulation, adversarial review, session decomposition, and headless execution support.",
"version": "1.7.0",
"version": "1.8.0",
"author": {
"name": "Kjell Tore Guttormsen"
},

View file

@ -4,6 +4,62 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [1.8.0] - 2026-04-17
### Opus 4.7 prompt literalism — closing the schema-drift gap
Opus 4.7 reads agent instructions more literally than 4.6 (per 4.7 system
card §6.3.1.1). The v1.7 planning-orchestrator described the Step+Manifest
schema via prose + procedural rules ("read the template"), which 4.6
inferred correctly but 4.7 sometimes rendered as narrative "Fase N" prose.
The result: plans that executed cleanly on 4.6 were rejected by
ultraexecute Phase 2 parsing on 4.7 — first observed during v6.2.0 planning
for `llm-security`. v1.8.0 closes the gap by replacing prose rules with a
literal copyable template, explicit forbidden-format clauses, and a
pre-handoff schema self-check.
### Added
- **Inline literal Step+Manifest template** in `planning-orchestrator`
Phase 5 — a complete, copyable example (JWT middleware step) replaces
"read the template" prose. Removes ambiguity about heading format,
field order, and manifest YAML structure.
- **Forbidden heading-format clause** in Phase 5 — explicit denylist for
`## Fase N`, `### Phase N`, `### Stage N`, and other narrative formats
the executor cannot parse. Negative constraints land harder on 4.7.
- **Phase 5.5 schema self-check** in `planning-orchestrator` — after
writing the plan, grep-verify canonical `### Step N:` count matches
`manifest:` count, and narrative heading count is zero. Rewrite plan
if self-check fails, before handing to plan-critic.
- **`--validate` mode** in `ultraexecute-local` — schema-only check that
parses steps and manifests, reports `READY | FAIL` with specific
error hints, and exits without security scan or execution. Intended
as a fast sanity-check between `/ultraplan-local` and full execution:
```bash
/ultraplan-local "task"
/ultraexecute-local --validate <plan>.md # READY or actionable FAIL
/ultraexecute-local <plan>.md # full execution
```
### Changed
- `planning-orchestrator` Phase 5 now embeds the canonical Step template
inline (~60 lines of literal example) rather than referring to
`templates/plan-template.md`. Template file remains authoritative for
cross-referencing but is no longer load-bearing for plan generation.
- `ultraexecute-local` Phase 2.3 added as a hard exit point for
`--validate` mode; Phase 2.4 security scan explicitly skips this mode.
### Rationale
v1.7.0's self-verifying chain assumed the orchestrator reliably produces
the v1.7 schema. That held on 4.6. v1.8.0 makes the assumption robust to
4.7-style literal interpretation by moving from "describe the format" to
"show the exact format and forbid alternatives", plus a self-check loop
before human-visible output. Pairs with `--validate` as a user-facing
verification step that catches any residual drift before execution side
effects begin.
## [1.7.0] - 2026-04-12
### The self-verifying plan chain

View file

@ -45,6 +45,7 @@ Flags can be combined: `--local --fg`, `--external --quick`.
| _(default)_ | Execute plan — auto-detects Execution Strategy for multi-session |
| `--resume` | Resume from last progress checkpoint |
| `--dry-run` | Validate plan structure without executing |
| `--validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution (v1.8) |
| `--step N` | Execute only step N |
| `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy |
| `--session N` | Execute only session N from plan's Execution Strategy |

View file

@ -1,6 +1,6 @@
# ultraplan-local — Plan, Research, Execute
![Version](https://img.shields.io/badge/version-1.7.0-blue)
![Version](https://img.shields.io/badge/version-1.8.0-blue)
![License](https://img.shields.io/badge/license-MIT-green)
![Platform](https://img.shields.io/badge/platform-Claude%20Code-purple)
@ -233,6 +233,7 @@ Reads a plan from `/ultraplan-local` and implements it with strict discipline. N
| **Default** | `/ultraexecute-local plan.md` | Auto-detects Execution Strategy, parallel if available |
| **Resume** | `/ultraexecute-local plan.md --resume` | Resume from last progress checkpoint |
| **Dry run** | `/ultraexecute-local plan.md --dry-run` | Validate plan structure + preview sessions and billing |
| **Validate** | `/ultraexecute-local plan.md --validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution |
| **Single step** | `/ultraexecute-local plan.md --step 3` | Execute only step 3 |
| **Foreground** | `/ultraexecute-local plan.md --fg` | Force sequential, ignore Execution Strategy |
| **Single session** | `/ultraexecute-local plan.md --session 2` | Execute only session 2 from Execution Strategy |

View file

@ -191,6 +191,62 @@ the title. This signals to ultraexecute-local that the plan includes per-step
verification manifests and enables strict audit mode. Plans without this
marker are treated as legacy v1.6 with synthesized minimal manifests.
### Mandatory step format — copy this exactly
The Implementation Plan section MUST contain numbered steps using the EXACT
format shown below. The executor (`ultraexecute-local`) parses plans with
strict regex matching. Any deviation breaks parsing and forces the user to
re-run planning.
**FORBIDDEN heading formats** (the executor's parser rejects these):
- `## Fase 1`, `### Fase 1` — Norwegian narrative format
- `## Phase 1`, `### Phase 1` — narrative phase format
- `## Stage 1`, `### Stage 1` — narrative stage format
- `### 1.` or `### 1)` — numbered without "Step"
- `### Step 1 —` (em-dash instead of colon)
- Any heading that doesn't match the regex `^### Step \d+: `
**REQUIRED heading format:** `### Step N: <description>` (where N is 1, 2, 3, ...
and the colon is followed by a single space then the description).
**REQUIRED step body** — every step MUST include all of these fields, in this
order, formatted as bullet points:
```markdown
### Step 1: Add JWT verification middleware
- **Files:** `src/middleware/jwt.ts`
- **Changes:** Create new middleware function `verifyJWT(req, res, next)` that reads `Authorization: Bearer <token>` header, verifies signature with `process.env.JWT_SECRET`, attaches decoded payload to `req.user`, and returns 401 on invalid/missing token. (new file)
- **Reuses:** `jsonwebtoken.verify()` (already in package.json), pattern from `src/middleware/cors.ts`
- **Test first:**
- File: `src/middleware/jwt.test.ts` (new)
- Verifies: valid token attaches user; invalid token returns 401; missing header returns 401
- Pattern: `src/middleware/cors.test.ts` (follow this style)
- **Verify:** `npm test -- jwt.test.ts` → expected: `3 passing`
- **On failure:** revert — `git checkout -- src/middleware/jwt.ts src/middleware/jwt.test.ts`
- **Checkpoint:** `git commit -m "feat(auth): add JWT verification middleware"`
- **Manifest:**
```yaml
manifest:
expected_paths:
- src/middleware/jwt.ts
- src/middleware/jwt.test.ts
min_file_count: 2
commit_message_pattern: "^feat\\(auth\\): add JWT verification middleware$"
bash_syntax_check: []
forbidden_paths:
- src/middleware/cors.ts
must_contain:
- path: src/middleware/jwt.ts
pattern: "verifyJWT"
```
```
The example above is the canonical shape. Substitute your own file paths,
descriptions, and patterns — but preserve the exact heading format, bullet
field names, and Manifest YAML structure. Do not invent new field names. Do
not skip fields. Do not nest steps under sub-headings.
### Manifest generation rules (REQUIRED for every step)
Every implementation step MUST include a `Manifest:` block as its last field,
@ -230,6 +286,32 @@ Derive the manifest fields mechanically from the step's other fields:
If any validation fails, fix the plan before handing to Phase 6 review.
### Phase 5.5 — Schema self-check (REQUIRED before Phase 6)
After writing the plan file, verify the output conforms to the executor's
parser BEFORE handing to plan-critic. Use Bash to grep the plan file:
```bash
# Count canonical step headings
grep -c '^### Step [0-9]\+: ' "$plan_path"
# Count manifest blocks
grep -c '^ manifest:' "$plan_path"
# Detect forbidden narrative formats
grep -cE '^(##|###) (Fase|Phase|Stage) [0-9]' "$plan_path"
```
**Pass criteria:**
- Step count ≥ 1
- Manifest count == Step count
- Forbidden narrative count == 0
**If the plan fails schema self-check:** rewrite the Implementation Plan
section using the exact literal template shown earlier in Phase 5. Do NOT
proceed to Phase 6 with a schema-failing plan — plan-critic cannot repair
format drift, only content issues.
### Failure recovery (REQUIRED for every step)
Each implementation step MUST include:

View file

@ -1,7 +1,7 @@
---
name: ultraexecute-local
description: Disciplined plan executor — single-session or multi-session with parallel orchestration, failure recovery, and headless support
argument-hint: "[--fg | --resume | --dry-run | --step N | --session N] <plan.md>"
argument-hint: "[--fg | --resume | --dry-run | --validate | --step N | --session N] <plan.md>"
model: opus
allowed-tools: Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion
---
@ -21,11 +21,12 @@ Parse `$ARGUMENTS` for mode flags:
1. If arguments contain `--fg`: extract the file path. Set **mode = foreground**.
2. If arguments contain `--resume`: extract the file path. Set **mode = resume**.
3. If arguments contain `--dry-run`: extract the file path. Set **mode = dry-run**.
4. If arguments contain `--step N` (N is a positive integer): extract N and the file path.
4. If arguments contain `--validate`: extract the file path. Set **mode = validate**.
5. If arguments contain `--step N` (N is a positive integer): extract N and the file path.
Set **mode = step**, **target-step = N**.
5. If arguments contain `--session N` (N is a positive integer): extract N and the file path.
6. If arguments contain `--session N` (N is a positive integer): extract N and the file path.
Set **mode = session**, **target-session = N**.
6. Otherwise: the entire argument string is the file path. Set **mode = execute**.
7. Otherwise: the entire argument string is the file path. Set **mode = execute**.
If no path is provided, output usage and stop:
@ -34,6 +35,7 @@ Usage: /ultraexecute-local <plan.md>
/ultraexecute-local --fg <plan.md>
/ultraexecute-local --resume <plan.md>
/ultraexecute-local --dry-run <plan.md>
/ultraexecute-local --validate <plan.md>
/ultraexecute-local --step N <plan.md>
/ultraexecute-local --session N <plan.md>
@ -42,6 +44,7 @@ Modes:
--fg Force foreground — all steps sequentially, ignore Execution Strategy
--resume Resume from last progress checkpoint
--dry-run Validate plan and show execution strategy without running
--validate Schema-only check — parse steps + manifests, no security scan, no execution
--step N Execute only step N (foreground)
--session N Execute only session N from the plan's Execution Strategy
@ -50,6 +53,7 @@ Examples:
/ultraexecute-local --fg .claude/plans/ultraplan-2026-04-06-auth-refactor.md
/ultraexecute-local --session 2 .claude/plans/ultraplan-2026-04-06-auth-refactor.md
/ultraexecute-local --dry-run .claude/plans/ultraplan-2026-04-06-auth-refactor.md
/ultraexecute-local --validate .claude/plans/ultraplan-2026-04-06-auth-refactor.md
```
If the file does not exist, report and stop:
@ -153,9 +157,57 @@ Steps: {N}
{if warnings}: Warnings: {list}
```
## Phase 2.3 — Validate-only mode exit (if mode = validate)
**If mode = validate, stop after Phase 2 parsing** and emit a schema-only
report. Do NOT run security scan, do NOT touch progress files, do NOT
execute any steps. This gives the user a fast sanity-check of plan
schema compliance without side effects.
If Phase 2 parsing succeeded (no fatal errors, every step has a valid
Manifest block in strict mode, or synthesized manifests in legacy mode):
```
=== Schema Validation: READY ===
File: {path}
Type: {plan | session-spec}
plan_version: {1.7 | legacy}
Steps: {N}
Manifests: {N valid | N synthesized (legacy)}
Warnings: {count}
{if warnings}: - {each warning on own line}
Plan is schema-compliant. Safe to run:
/ultraexecute-local {path}
```
If Phase 2 parsing failed (unrecognized format, missing Manifest in strict
mode, malformed YAML, invalid regex):
```
=== Schema Validation: FAIL ===
File: {path}
Reason: {specific error from Phase 2}
{if format not recognized}:
Detected heading format: {e.g. "### Fase 1:", "## Phase 1"}
Expected: "### Step N: <description>"
Fix: re-run /ultraplan-local — planning-orchestrator must emit v1.7 format
{if missing manifest}:
Step {N} has no Manifest block (plan_version=1.7 requires one per step)
Fix: re-run /ultraplan-local — planning-orchestrator must include manifest YAML
{if malformed YAML or invalid regex}:
Step {N}: {specific YAML/regex error}
Fix: edit the plan manually or re-run /ultraplan-local
```
Exit after emitting the report. Do not continue to Phase 2.4 or later.
## Phase 2.4 — Pre-execution security scan
**Runs for all modes except dry-run** (dry-run has its own report format).
**Runs for all modes except dry-run and validate** (those modes exit earlier or have their own report format).
Scan every `Verify:` and `Checkpoint:` command in the parsed plan against the
executor security denylist. This catches dangerous commands before execution begins.