From 9ecd66929c136b959fc5a36fc95b81b010747e3d Mon Sep 17 00:00:00 2001 From: Kjell Tore Guttormsen Date: Fri, 17 Apr 2026 18:01:14 +0200 Subject: [PATCH] =?UTF-8?q?feat(ultraplan-local):=20v1.8.0=20=E2=80=94=20c?= =?UTF-8?q?lose=20Opus=204.7=20schema-drift=20gap?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Opus 4.7 reads agent instructions more literally than 4.6. The v1.7 planning-orchestrator described the Step+Manifest schema via prose + procedural rules, which 4.6 inferred correctly but 4.7 sometimes rendered as narrative "Fase N" prose — producing plans ultraexecute Phase 2 rejected. First observed 2026-04-17 during llm-security v6.2.0 planning. v1.8.0 closes the gap: - planning-orchestrator Phase 5 embeds a literal copyable Step+Manifest example (JWT middleware) replacing "read the template" prose - Explicit forbidden-format clause: ## Fase N, ### Phase N, ### Stage N, and any non-"### Step N:" heading are denied - Phase 5.5 schema self-check: grep-verify canonical Step count matches Manifest count and narrative heading count is zero, before handing to plan-critic - ultraexecute-local --validate mode: schema-only check that parses steps + manifests, reports READY/FAIL with actionable error hints, no security scan, no execution. Fast sanity check between /ultraplan-local and full execution. Static verification: 17/17 PASS. Co-Authored-By: Claude Opus 4.7 --- README.md | 6 +- .../.claude-plugin/plugin.json | 2 +- plugins/ultraplan-local/CHANGELOG.md | 56 +++++++++++++ plugins/ultraplan-local/CLAUDE.md | 1 + plugins/ultraplan-local/README.md | 3 +- .../agents/planning-orchestrator.md | 82 +++++++++++++++++++ .../commands/ultraexecute-local.md | 62 ++++++++++++-- 7 files changed, 203 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 8cdffda..17417a5 100644 --- a/README.md +++ b/README.md @@ -61,7 +61,7 @@ Key commands: `/config-audit posture`, `/config-audit feature-gap`, `/config-aud --- -### [Ultra {research | plan | execute} - local](plugins/ultraplan-local/) `v1.7.0` +### [Ultra {research | plan | execute} - local](plugins/ultraplan-local/) `v1.8.0` Deep research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery. @@ -69,10 +69,12 @@ Three commands, one pipeline with clear division of labor: - **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions. - **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Accepts research briefs via `--research`. -- **`/ultraexecute-local`** — Execute the contract disciplined. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. +- **`/ultraexecute-local`** — Execute the contract disciplined. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity-check between planning and execution. v1.7 self-verifying chain: a step may not be marked `completed` unless its manifest verifies. The executor's last tool call before reporting must be a manifest check, not an arbitrary file review — preventing hallucinated completion. +v1.8 closes the Opus 4.7 prompt-literalism gap: planning-orchestrator now emits a literal copyable Step+Manifest template (not prose rules), explicitly forbids narrative `Fase/Phase/Stage` headers, and runs a schema self-check before handing off to plan-critic. Prevents the "plan generated, executor rejects" loop observed when 4.7 inferred the v1.7 schema from prose descriptions alone. + Defense-in-depth security: plugin hooks block destructive commands and sensitive path writes, prompt-level denylist works in headless sessions, pre-execution plan scan catches dangerous commands before they run, scoped `--allowedTools` replaces `--dangerously-skip-permissions` in parallel sessions. Modes: default, spec-driven, research-enriched, foreground, quick, decompose, export diff --git a/plugins/ultraplan-local/.claude-plugin/plugin.json b/plugins/ultraplan-local/.claude-plugin/plugin.json index 5de84ad..0c61cd3 100644 --- a/plugins/ultraplan-local/.claude-plugin/plugin.json +++ b/plugins/ultraplan-local/.claude-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "ultraplan-local", "description": "Deep implementation planning and research with interview, specialized agent swarms, external research, triangulation, adversarial review, session decomposition, and headless execution support.", - "version": "1.7.0", + "version": "1.8.0", "author": { "name": "Kjell Tore Guttormsen" }, diff --git a/plugins/ultraplan-local/CHANGELOG.md b/plugins/ultraplan-local/CHANGELOG.md index 86f8f95..35668a0 100644 --- a/plugins/ultraplan-local/CHANGELOG.md +++ b/plugins/ultraplan-local/CHANGELOG.md @@ -4,6 +4,62 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). +## [1.8.0] - 2026-04-17 + +### Opus 4.7 prompt literalism — closing the schema-drift gap + +Opus 4.7 reads agent instructions more literally than 4.6 (per 4.7 system +card §6.3.1.1). The v1.7 planning-orchestrator described the Step+Manifest +schema via prose + procedural rules ("read the template"), which 4.6 +inferred correctly but 4.7 sometimes rendered as narrative "Fase N" prose. +The result: plans that executed cleanly on 4.6 were rejected by +ultraexecute Phase 2 parsing on 4.7 — first observed during v6.2.0 planning +for `llm-security`. v1.8.0 closes the gap by replacing prose rules with a +literal copyable template, explicit forbidden-format clauses, and a +pre-handoff schema self-check. + +### Added + +- **Inline literal Step+Manifest template** in `planning-orchestrator` + Phase 5 — a complete, copyable example (JWT middleware step) replaces + "read the template" prose. Removes ambiguity about heading format, + field order, and manifest YAML structure. +- **Forbidden heading-format clause** in Phase 5 — explicit denylist for + `## Fase N`, `### Phase N`, `### Stage N`, and other narrative formats + the executor cannot parse. Negative constraints land harder on 4.7. +- **Phase 5.5 schema self-check** in `planning-orchestrator` — after + writing the plan, grep-verify canonical `### Step N:` count matches + `manifest:` count, and narrative heading count is zero. Rewrite plan + if self-check fails, before handing to plan-critic. +- **`--validate` mode** in `ultraexecute-local` — schema-only check that + parses steps and manifests, reports `READY | FAIL` with specific + error hints, and exits without security scan or execution. Intended + as a fast sanity-check between `/ultraplan-local` and full execution: + ```bash + /ultraplan-local "task" + /ultraexecute-local --validate .md # READY or actionable FAIL + /ultraexecute-local .md # full execution + ``` + +### Changed + +- `planning-orchestrator` Phase 5 now embeds the canonical Step template + inline (~60 lines of literal example) rather than referring to + `templates/plan-template.md`. Template file remains authoritative for + cross-referencing but is no longer load-bearing for plan generation. +- `ultraexecute-local` Phase 2.3 added as a hard exit point for + `--validate` mode; Phase 2.4 security scan explicitly skips this mode. + +### Rationale + +v1.7.0's self-verifying chain assumed the orchestrator reliably produces +the v1.7 schema. That held on 4.6. v1.8.0 makes the assumption robust to +4.7-style literal interpretation by moving from "describe the format" to +"show the exact format and forbid alternatives", plus a self-check loop +before human-visible output. Pairs with `--validate` as a user-facing +verification step that catches any residual drift before execution side +effects begin. + ## [1.7.0] - 2026-04-12 ### The self-verifying plan chain diff --git a/plugins/ultraplan-local/CLAUDE.md b/plugins/ultraplan-local/CLAUDE.md index 99f8894..8658c9c 100644 --- a/plugins/ultraplan-local/CLAUDE.md +++ b/plugins/ultraplan-local/CLAUDE.md @@ -45,6 +45,7 @@ Flags can be combined: `--local --fg`, `--external --quick`. | _(default)_ | Execute plan — auto-detects Execution Strategy for multi-session | | `--resume` | Resume from last progress checkpoint | | `--dry-run` | Validate plan structure without executing | +| `--validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution (v1.8) | | `--step N` | Execute only step N | | `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy | | `--session N` | Execute only session N from plan's Execution Strategy | diff --git a/plugins/ultraplan-local/README.md b/plugins/ultraplan-local/README.md index ad15963..7d800c0 100644 --- a/plugins/ultraplan-local/README.md +++ b/plugins/ultraplan-local/README.md @@ -1,6 +1,6 @@ # ultraplan-local — Plan, Research, Execute -![Version](https://img.shields.io/badge/version-1.7.0-blue) +![Version](https://img.shields.io/badge/version-1.8.0-blue) ![License](https://img.shields.io/badge/license-MIT-green) ![Platform](https://img.shields.io/badge/platform-Claude%20Code-purple) @@ -233,6 +233,7 @@ Reads a plan from `/ultraplan-local` and implements it with strict discipline. N | **Default** | `/ultraexecute-local plan.md` | Auto-detects Execution Strategy, parallel if available | | **Resume** | `/ultraexecute-local plan.md --resume` | Resume from last progress checkpoint | | **Dry run** | `/ultraexecute-local plan.md --dry-run` | Validate plan structure + preview sessions and billing | +| **Validate** | `/ultraexecute-local plan.md --validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution | | **Single step** | `/ultraexecute-local plan.md --step 3` | Execute only step 3 | | **Foreground** | `/ultraexecute-local plan.md --fg` | Force sequential, ignore Execution Strategy | | **Single session** | `/ultraexecute-local plan.md --session 2` | Execute only session 2 from Execution Strategy | diff --git a/plugins/ultraplan-local/agents/planning-orchestrator.md b/plugins/ultraplan-local/agents/planning-orchestrator.md index 27e1e95..597ba73 100644 --- a/plugins/ultraplan-local/agents/planning-orchestrator.md +++ b/plugins/ultraplan-local/agents/planning-orchestrator.md @@ -191,6 +191,62 @@ the title. This signals to ultraexecute-local that the plan includes per-step verification manifests and enables strict audit mode. Plans without this marker are treated as legacy v1.6 with synthesized minimal manifests. +### Mandatory step format — copy this exactly + +The Implementation Plan section MUST contain numbered steps using the EXACT +format shown below. The executor (`ultraexecute-local`) parses plans with +strict regex matching. Any deviation breaks parsing and forces the user to +re-run planning. + +**FORBIDDEN heading formats** (the executor's parser rejects these): +- `## Fase 1`, `### Fase 1` — Norwegian narrative format +- `## Phase 1`, `### Phase 1` — narrative phase format +- `## Stage 1`, `### Stage 1` — narrative stage format +- `### 1.` or `### 1)` — numbered without "Step" +- `### Step 1 —` (em-dash instead of colon) +- Any heading that doesn't match the regex `^### Step \d+: ` + +**REQUIRED heading format:** `### Step N: ` (where N is 1, 2, 3, ... +and the colon is followed by a single space then the description). + +**REQUIRED step body** — every step MUST include all of these fields, in this +order, formatted as bullet points: + +```markdown +### Step 1: Add JWT verification middleware + +- **Files:** `src/middleware/jwt.ts` +- **Changes:** Create new middleware function `verifyJWT(req, res, next)` that reads `Authorization: Bearer ` header, verifies signature with `process.env.JWT_SECRET`, attaches decoded payload to `req.user`, and returns 401 on invalid/missing token. (new file) +- **Reuses:** `jsonwebtoken.verify()` (already in package.json), pattern from `src/middleware/cors.ts` +- **Test first:** + - File: `src/middleware/jwt.test.ts` (new) + - Verifies: valid token attaches user; invalid token returns 401; missing header returns 401 + - Pattern: `src/middleware/cors.test.ts` (follow this style) +- **Verify:** `npm test -- jwt.test.ts` → expected: `3 passing` +- **On failure:** revert — `git checkout -- src/middleware/jwt.ts src/middleware/jwt.test.ts` +- **Checkpoint:** `git commit -m "feat(auth): add JWT verification middleware"` +- **Manifest:** + ```yaml + manifest: + expected_paths: + - src/middleware/jwt.ts + - src/middleware/jwt.test.ts + min_file_count: 2 + commit_message_pattern: "^feat\\(auth\\): add JWT verification middleware$" + bash_syntax_check: [] + forbidden_paths: + - src/middleware/cors.ts + must_contain: + - path: src/middleware/jwt.ts + pattern: "verifyJWT" + ``` +``` + +The example above is the canonical shape. Substitute your own file paths, +descriptions, and patterns — but preserve the exact heading format, bullet +field names, and Manifest YAML structure. Do not invent new field names. Do +not skip fields. Do not nest steps under sub-headings. + ### Manifest generation rules (REQUIRED for every step) Every implementation step MUST include a `Manifest:` block as its last field, @@ -230,6 +286,32 @@ Derive the manifest fields mechanically from the step's other fields: If any validation fails, fix the plan before handing to Phase 6 review. +### Phase 5.5 — Schema self-check (REQUIRED before Phase 6) + +After writing the plan file, verify the output conforms to the executor's +parser BEFORE handing to plan-critic. Use Bash to grep the plan file: + +```bash +# Count canonical step headings +grep -c '^### Step [0-9]\+: ' "$plan_path" + +# Count manifest blocks +grep -c '^ manifest:' "$plan_path" + +# Detect forbidden narrative formats +grep -cE '^(##|###) (Fase|Phase|Stage) [0-9]' "$plan_path" +``` + +**Pass criteria:** +- Step count ≥ 1 +- Manifest count == Step count +- Forbidden narrative count == 0 + +**If the plan fails schema self-check:** rewrite the Implementation Plan +section using the exact literal template shown earlier in Phase 5. Do NOT +proceed to Phase 6 with a schema-failing plan — plan-critic cannot repair +format drift, only content issues. + ### Failure recovery (REQUIRED for every step) Each implementation step MUST include: diff --git a/plugins/ultraplan-local/commands/ultraexecute-local.md b/plugins/ultraplan-local/commands/ultraexecute-local.md index aaeed0b..c403a53 100644 --- a/plugins/ultraplan-local/commands/ultraexecute-local.md +++ b/plugins/ultraplan-local/commands/ultraexecute-local.md @@ -1,7 +1,7 @@ --- name: ultraexecute-local description: Disciplined plan executor — single-session or multi-session with parallel orchestration, failure recovery, and headless support -argument-hint: "[--fg | --resume | --dry-run | --step N | --session N] " +argument-hint: "[--fg | --resume | --dry-run | --validate | --step N | --session N] " model: opus allowed-tools: Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion --- @@ -21,11 +21,12 @@ Parse `$ARGUMENTS` for mode flags: 1. If arguments contain `--fg`: extract the file path. Set **mode = foreground**. 2. If arguments contain `--resume`: extract the file path. Set **mode = resume**. 3. If arguments contain `--dry-run`: extract the file path. Set **mode = dry-run**. -4. If arguments contain `--step N` (N is a positive integer): extract N and the file path. +4. If arguments contain `--validate`: extract the file path. Set **mode = validate**. +5. If arguments contain `--step N` (N is a positive integer): extract N and the file path. Set **mode = step**, **target-step = N**. -5. If arguments contain `--session N` (N is a positive integer): extract N and the file path. +6. If arguments contain `--session N` (N is a positive integer): extract N and the file path. Set **mode = session**, **target-session = N**. -6. Otherwise: the entire argument string is the file path. Set **mode = execute**. +7. Otherwise: the entire argument string is the file path. Set **mode = execute**. If no path is provided, output usage and stop: @@ -34,6 +35,7 @@ Usage: /ultraexecute-local /ultraexecute-local --fg /ultraexecute-local --resume /ultraexecute-local --dry-run + /ultraexecute-local --validate /ultraexecute-local --step N /ultraexecute-local --session N @@ -42,6 +44,7 @@ Modes: --fg Force foreground — all steps sequentially, ignore Execution Strategy --resume Resume from last progress checkpoint --dry-run Validate plan and show execution strategy without running + --validate Schema-only check — parse steps + manifests, no security scan, no execution --step N Execute only step N (foreground) --session N Execute only session N from the plan's Execution Strategy @@ -50,6 +53,7 @@ Examples: /ultraexecute-local --fg .claude/plans/ultraplan-2026-04-06-auth-refactor.md /ultraexecute-local --session 2 .claude/plans/ultraplan-2026-04-06-auth-refactor.md /ultraexecute-local --dry-run .claude/plans/ultraplan-2026-04-06-auth-refactor.md + /ultraexecute-local --validate .claude/plans/ultraplan-2026-04-06-auth-refactor.md ``` If the file does not exist, report and stop: @@ -153,9 +157,57 @@ Steps: {N} {if warnings}: Warnings: {list} ``` +## Phase 2.3 — Validate-only mode exit (if mode = validate) + +**If mode = validate, stop after Phase 2 parsing** and emit a schema-only +report. Do NOT run security scan, do NOT touch progress files, do NOT +execute any steps. This gives the user a fast sanity-check of plan +schema compliance without side effects. + +If Phase 2 parsing succeeded (no fatal errors, every step has a valid +Manifest block in strict mode, or synthesized manifests in legacy mode): + +``` +=== Schema Validation: READY === +File: {path} +Type: {plan | session-spec} +plan_version: {1.7 | legacy} +Steps: {N} +Manifests: {N valid | N synthesized (legacy)} +Warnings: {count} +{if warnings}: - {each warning on own line} + +Plan is schema-compliant. Safe to run: + /ultraexecute-local {path} +``` + +If Phase 2 parsing failed (unrecognized format, missing Manifest in strict +mode, malformed YAML, invalid regex): + +``` +=== Schema Validation: FAIL === +File: {path} +Reason: {specific error from Phase 2} + +{if format not recognized}: + Detected heading format: {e.g. "### Fase 1:", "## Phase 1"} + Expected: "### Step N: " + Fix: re-run /ultraplan-local — planning-orchestrator must emit v1.7 format + +{if missing manifest}: + Step {N} has no Manifest block (plan_version=1.7 requires one per step) + Fix: re-run /ultraplan-local — planning-orchestrator must include manifest YAML + +{if malformed YAML or invalid regex}: + Step {N}: {specific YAML/regex error} + Fix: edit the plan manually or re-run /ultraplan-local +``` + +Exit after emitting the report. Do not continue to Phase 2.4 or later. + ## Phase 2.4 — Pre-execution security scan -**Runs for all modes except dry-run** (dry-run has its own report format). +**Runs for all modes except dry-run and validate** (those modes exit earlier or have their own report format). Scan every `Verify:` and `Checkpoint:` command in the parsed plan against the executor security denylist. This catches dangerous commands before execution begins.