Wave 1 of a 6-session parallel build revealed three failure modes: (1) hallucinated completion (status=completed after 2/5 steps, last tool call was an arbitrary file review), (2) fail-late bash (3/6 sessions had push blocked inside sub-agent sandbox after all work was done), (3) no objective verification (plans were prose). v1.7 closes all three by making the plan an executable contract. Per-step YAML manifest (expected_paths, commit_message_pattern, bash_syntax_check, forbidden_paths, must_contain) is the objective completion predicate. Plan-critic dimension 10 (Manifest quality) is a hard gate. Session decomposer propagates manifests verbatim and emits an obligatory Step 0 pre-flight (git push --dry-run, exit 77 sentinel) in every session spec. ultraexecute-local gets Phase 7.5 (independent manifest audit from git log + filesystem, ignoring agent bookkeeping) and Phase 7.6 (bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17 forbids marking a step passed without manifest verification. Hard Rule 18 forbids ending on an arbitrary tool call before reporting. Division of labor is made explicit: - /ultraresearch-local gathers context (no build decisions) - /ultraplan-local produces an executable contract (manifests, plan-critic gate) - /ultraexecute-local executes disciplined (does NOT compensate for weak plans — escalates) Code complete. Docs partial (Arbeidsdeling table + manifest section added to plugin + marketplace READMEs). Verification tests (10-sequence) pending — see REMEMBER.md. Backward compat: v1.6 plans without plan_version marker get legacy mode with synthesized manifests and legacy_plan: true in progress file. Plan-critic emits advisory, not block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
340 lines
22 KiB
Markdown
340 lines
22 KiB
Markdown
# Changelog
|
||
|
||
All notable changes to this project will be documented in this file.
|
||
|
||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
|
||
|
||
## [1.7.0] - 2026-04-12
|
||
|
||
### The self-verifying plan chain
|
||
|
||
Wave 1 of a parallel 6-session build revealed three failure modes: (1) a
|
||
session reported `status=completed` after only 2/5 steps — last tool call
|
||
was an arbitrary file review, not a completion check; (2) 3/6 sessions
|
||
had push blocked inside the sub-agent bash sandbox *after* all work was
|
||
done; (3) plans and blueprints were prose, so the orchestrator had no
|
||
machine-readable way to verify completion. v1.7.0 closes all three by
|
||
making the plan itself an executable contract.
|
||
|
||
### Added
|
||
|
||
- **Per-step verification manifest** in plan format (`plan_version: 1.7`).
|
||
Every step now ends with a YAML `manifest:` block declaring
|
||
`expected_paths`, `min_file_count`, `commit_message_pattern`,
|
||
`bash_syntax_check`, `forbidden_paths`, `must_contain`. The manifest is
|
||
the objective completion predicate — the Verify command is necessary but
|
||
not sufficient.
|
||
- **Plan-critic dimension 10 — Manifest quality (hard gate).** Missing
|
||
or invalid manifest (unparseable regex, path contradiction, missing
|
||
block) is a `major` finding. v1.6 plans get a legacy-mode warning
|
||
instead of a block.
|
||
- **Session Manifest aggregate** in session specs — synthesized by
|
||
`session-decomposer` as the union of per-step manifests. Gives
|
||
`ultraexecute-local` a single YAML block per session to audit against.
|
||
- **Step 0: Sandbox pre-flight** — obligatory first step in every
|
||
generated session spec. Runs `git push --dry-run origin HEAD`; exit 77
|
||
= sandbox cannot push, session status becomes `blocked` (not `failed`),
|
||
no real work attempted. Escape hatch: `ULTRAEXECUTE_SKIP_PREFLIGHT=1`.
|
||
- **Launch script pre-flight** — `headless-launch-template.md` adds a
|
||
`git push --dry-run` check outside the sandbox, before any session
|
||
spawns, catching credential issues at the earliest possible point.
|
||
- **Phase 7.5 — Manifest audit (independent).** After all steps complete,
|
||
`ultraexecute-local` re-verifies expected paths, commit count, commit
|
||
message patterns, bash syntax, and forbidden-path untouched-ness from
|
||
git log and filesystem. Agent's own bookkeeping is ignored. Disagreement
|
||
with progress file → status overridden to `partial`.
|
||
- **Phase 7.6 — Recovery dispatch (bounded).** When Phase 7.5 detects
|
||
drift in multi-session parent context, synthesize a temp session spec
|
||
containing only missing steps and dispatch via existing
|
||
`claude -p "/ultraexecute-local --session N"`. `recovery_depth ≤ 2`
|
||
hard cap — third drift escalates to user.
|
||
- **Hard Rule 17: Manifest is the completion predicate.** A step may
|
||
not be marked passed if its manifest does not verify, regardless of
|
||
Verify's exit code.
|
||
- **Hard Rule 18: Last-activity rule.** Executor's final tool call
|
||
before Phase 8 must be a manifest check, never an arbitrary file
|
||
review. Prevents hallucinated completion.
|
||
|
||
### Changed
|
||
|
||
- **Plan template (`templates/plan-template.md`)** — adds
|
||
`plan_version: 1.7` metadata line, `Manifest:` field on every step,
|
||
"Manifest — objective completion predicate" section.
|
||
- **Plan-critic scoring** rebalanced: Headless readiness 0.15 → 0.10,
|
||
Manifest quality 0.05 added. Legacy v1.6 plans skip the Manifest
|
||
dimension and keep Headless readiness at 0.15.
|
||
- **Planning-orchestrator Phase 5** adds "Manifest generation rules
|
||
(REQUIRED for every step)" with mechanical derivation from `Files:`
|
||
and Checkpoint. Validates regex compilation and path existence before
|
||
handoff to plan-critic.
|
||
- **Session-decomposer** parses plan manifests and propagates them
|
||
verbatim into session specs. For v1.7+ plans with missing manifests:
|
||
abort with pointer to failing step. For legacy v1.6 plans: synthesize
|
||
minimal manifests and flag `legacy_synthesis: true`.
|
||
- **ultraexecute-local Phase 2** parses manifest YAML. Ugyldig YAML =
|
||
abort with pointer to step. v<1.7 plans: synthesize + log
|
||
`legacy_plan: true`.
|
||
- **ultraexecute-local Phase 6** — sub-step D renamed to D1 "Command
|
||
verification"; new D2 "Manifest verification" runs after D1 with 5
|
||
checks. F "Checkpoint" adds `checkpoint_drift` logging when HEAD
|
||
message doesn't match `commit_message_pattern` (non-fatal).
|
||
- **Phase 8 report** — table gets Manifest column; JSON summary adds
|
||
`plan_version`, `manifest_audit`, `drift_details`, `recovery_dispatched`,
|
||
`recovery_depth`, `legacy_plan`. Result vocabulary strict:
|
||
`completed | partial | blocked | failed | stopped`.
|
||
- **Division of labor clarified** in README — `/ultraresearch-local`
|
||
gathers context (no decisions), `/ultraplan-local` transforms intent
|
||
into an executable contract (manifests, plan-critic gate),
|
||
`/ultraexecute-local` executes the contract disciplined (does NOT
|
||
compensate for weak plans — escalates).
|
||
|
||
## [1.6.0] - 2026-04-08
|
||
|
||
### Added
|
||
|
||
- **`/ultraresearch-local` command** — deep research combining local codebase analysis
|
||
with external knowledge. Produces structured research briefs with triangulation,
|
||
confidence ratings, and source quality assessment. Supports modes: default (background),
|
||
`--quick` (inline), `--local` (codebase only), `--external` (web only), `--fg` (foreground).
|
||
- **6 new agents** for the research pipeline:
|
||
- `research-orchestrator` (opus) — runs full research pipeline as background task
|
||
- `docs-researcher` (sonnet) — official documentation via Tavily, WebSearch, Microsoft Learn
|
||
- `community-researcher` (sonnet) — real-world experience from issues, blogs, discussions
|
||
- `security-researcher` (sonnet) — CVEs, audit history, supply chain risks
|
||
- `contrarian-researcher` (sonnet) — counter-evidence and overlooked alternatives
|
||
- `gemini-bridge` (sonnet) — independent second opinion via Gemini Deep Research MCP
|
||
- **Research brief template** (`templates/research-brief-template.md`) — structured format
|
||
with dimensions, confidence ratings, triangulation, and source quality assessment.
|
||
- **`--research` flag for `/ultraplan-local`** — accepts up to 3 research brief paths.
|
||
Enriches the interview (focuses on decisions, not facts) and injects brief context into
|
||
exploration agents. Research-scout skips already-covered technologies.
|
||
- **Research-aware planning orchestrator** — `planning-orchestrator.md` now accepts research
|
||
briefs, injects summaries into sub-agent prompts, and cross-references brief findings
|
||
during synthesis.
|
||
- **Research settings** in `settings.json` — configurable Gemini bridge (enabled/timeout),
|
||
interview depth, dimension limits, and stats tracking.
|
||
|
||
### Changed
|
||
|
||
- Plugin description and keywords updated to reflect research capabilities.
|
||
- CLAUDE.md expanded with ultraresearch command, modes, agents, architecture, and state.
|
||
|
||
## [1.5.0] - 2026-04-07
|
||
|
||
### Fixed
|
||
|
||
- **CRITICAL: Parallel session data loss** — Phase 2.6 ran parallel `claude -p` sessions
|
||
in the same working directory, causing git race conditions and repository corruption.
|
||
Each parallel session now runs in its own git worktree with isolated branch, index,
|
||
and working files. Branches are merged back sequentially after each wave completes.
|
||
|
||
### Added
|
||
|
||
- **Phase 2.55 (Pre-flight safety checks)** — validates clean working tree, committed
|
||
plan file, no scope fence overlaps between parallel sessions, and no stale worktrees
|
||
before launching parallel execution.
|
||
- **Git worktree isolation** for all parallel sessions — one branch per session
|
||
(`ultraplan/{slug}/session-{N}`), merged with `--no-ff` after wave completion.
|
||
- **Merge conflict detection** — if merging a session branch produces conflicts, the merge
|
||
is aborted and conflicting files are reported. No silent data loss.
|
||
- **Unconditional worktree cleanup** — worktrees and session branches are always removed,
|
||
even on failure. Manual cleanup commands are reported if automated cleanup fails.
|
||
- **Hard rules 11-13** — worktree isolation mandatory, cleanup unconditional, merge
|
||
sequentially with conflict abort.
|
||
- **Session-scoped progress file naming** — `--session N` uses
|
||
`.ultraexecute-progress-{slug}-session-{N}.json` to prevent merge conflicts.
|
||
|
||
### Changed
|
||
|
||
- Headless launch template uses git worktrees with `cleanup_worktrees` trap on EXIT,
|
||
clean-tree pre-flight check, and sequential merge after each wave.
|
||
- Phase 2.6 rewritten with 5-step worktree lifecycle: create → launch → wait → merge → cleanup.
|
||
|
||
## [1.4.0] - 2026-04-06
|
||
|
||
### Renamed
|
||
|
||
- **`/ultraexecute` → `/ultraexecute-local`** — renamed for namespace consistency with `/ultraplan-local` and future-proofing against potential Anthropic naming. File: `commands/ultraexecute.md` → `commands/ultraexecute-local.md`. Note: `ultraexecute_summary` JSON key and `ultraexecute-stats.jsonl` filename are unchanged for backward compatibility.
|
||
|
||
### Added
|
||
|
||
- **`convention-scanner` agent** (sonnet) — dedicated agent for discovering coding conventions: naming, directory layout, import style, error handling, test patterns, git commit style, documentation patterns. Replaces inline Explore agent prompt for medium+ codebases.
|
||
- **Success Criteria section** in spec template — falsifiable "definition of done" conditions that the spec-reviewer validates and ultraexecute-local uses for verification.
|
||
- **Dry-run multi-session preview** — `--dry-run` now shows session groupings, wave structure, billing status, and `claude -p` commands when plan has an Execution Strategy.
|
||
- **External verification rule** in headless launch template — wave verification must run commands independently, never parse session logs as proof.
|
||
- **Billing preamble** in headless launch template — `unset ANTHROPIC_API_KEY` prevents accidental API billing.
|
||
- **Phase mapping comment** in planning-orchestrator — documents how orchestrator phases 1-7 map to command phases 4-10.
|
||
|
||
### Fixed
|
||
|
||
- **`git add -A` in escalation** — replaced with targeted staging of only files from completed steps. Prevents staging secrets, binaries, or unrelated work.
|
||
- **False `background: true` claim** — command documentation incorrectly stated the orchestrator has `background: true` in its frontmatter. Corrected to explain `run_in_background` on the Agent tool.
|
||
|
||
### Changed
|
||
|
||
- Execution Strategy reconciliation in session-decomposer — respects existing `## Execution Strategy` as input instead of re-analyzing from scratch. Warns on file-overlap conflicts.
|
||
- Headless launch template uses `--dangerously-skip-permissions` instead of `--allowedTools` for more robust headless execution.
|
||
- Session-decomposer updated with `--dangerously-skip-permissions` and `unset ANTHROPIC_API_KEY` for generated scripts.
|
||
- Convention Scanner references in command and orchestrator updated to use dedicated plugin agent.
|
||
- ROADMAP.md translated from Norwegian to English.
|
||
- plugin.json: added homepage, repository, license, keywords. Version bumped to 1.4.0.
|
||
- README badge updated to v1.4.0.
|
||
|
||
## [1.3.0] - 2026-04-06
|
||
|
||
### Added
|
||
|
||
- **Session-aware parallel execution** — `/ultraexecute` auto-detects `## Execution Strategy` in plans and orchestrates multi-session parallel execution via `claude -p`. No manual `bash launch.sh` required.
|
||
- **`--fg` flag** — force foreground sequential execution, ignoring Execution Strategy
|
||
- **`--session N` flag** — execute only session N from the plan's Execution Strategy (used by child processes)
|
||
- **Phase 2.5 (Execution strategy decision)** — determines single-session vs multi-session mode
|
||
- **Phase 2.6 (Multi-session orchestration)** — launches parallel `claude -p` sessions per wave, waits for completion, aggregates results
|
||
- **Execution Strategy in plan template** — new `## Execution Strategy` section with sessions, waves, scope fences, and execution order. Generated by planning-orchestrator for plans with > 5 steps.
|
||
- **Execution Strategy generation in planning-orchestrator** — Phase 5 analyzes step file-overlap to build dependency graph, groups connected components into sessions of 3–5 steps, and organizes sessions into parallel waves.
|
||
|
||
### Changed
|
||
|
||
- planning-orchestrator Phase 5 extended with Execution Strategy generation logic
|
||
- ultraplan-local Phase 8 now lists Execution Strategy as 10th required plan section
|
||
- Plan template includes `## Execution Strategy` section template with grouping rules
|
||
- CLAUDE.md updated with new ultraexecute modes and architecture
|
||
- plugin.json version bumped to 1.3.0
|
||
|
||
## [1.2.0] - 2026-04-06
|
||
|
||
### Added
|
||
|
||
- **`/ultraexecute` command** — disciplined plan executor with 9-phase workflow. Reads an ultraplan or session spec, executes steps sequentially with strict failure recovery, tracks progress for resume, and reports results in machine-parseable JSON.
|
||
- 4 modes: default (execute), `--resume` (continue from checkpoint), `--dry-run` (validate without executing), `--step N` (single step)
|
||
- Per-step protocol: implement → verify → on-failure handling → checkpoint
|
||
- Failure recovery from plan's On failure clauses (revert/retry/skip/escalate)
|
||
- 3-attempt retry cap per step (initial + 2 retries)
|
||
- Progress file (`.ultraexecute-progress-{slug}.json`) for crash recovery and resume
|
||
- Entry/exit condition checking for session specs
|
||
- Scope fence enforcement for session specs (never-touch file protection)
|
||
- JSON summary block in output for headless log parsing
|
||
- Stats tracking to `ultraexecute-stats.jsonl`
|
||
|
||
### Changed
|
||
|
||
- CLAUDE.md restructured with two commands table (plan + execute)
|
||
- plugin.json version bumped to 1.2.0
|
||
|
||
## [1.1.0] - 2026-04-06
|
||
|
||
### Added
|
||
|
||
- **`--decompose` mode** — splits an existing plan into self-contained headless sessions. Analyzes step dependencies, groups steps into sessions of 3–5 steps each, identifies parallel execution waves, and generates session specs + dependency graph + launch script.
|
||
- **`--export headless` format** — shortcut for `--decompose`. Produces the same session decomposition output.
|
||
- **session-decomposer agent** (sonnet) — dedicated agent for plan decomposition. Parses step dependencies, builds dependency graph, groups steps into sessions, generates session specs with scope fences and failure handling.
|
||
- **Session spec template** (`templates/session-spec-template.md`) — defines the format for individual session specs: context, scope fence, steps, entry/exit conditions, failure handling, handoff state.
|
||
- **Headless launch template** (`templates/headless-launch-template.md`) — template for generating bash launch scripts that execute sessions in parallel waves using `claude -p`.
|
||
- **Failure recovery per step** — plan template now includes `On failure:` (revert/retry/skip/escalate) and `Checkpoint:` (git commit) fields for every implementation step.
|
||
- **Headless readiness dimension** in plan-critic — new 9th review dimension checking for On failure clauses, Checkpoint fields, and circuit breakers. Weighted at 0.15 in the quality score.
|
||
|
||
### Changed
|
||
|
||
- Plan-critic scoring rebalanced: 6 dimensions (was 5), weights adjusted to accommodate headless readiness
|
||
- Plan template step format extended with On failure and Checkpoint fields
|
||
- Planning-orchestrator Phase 5 updated with failure recovery generation requirements
|
||
- CLAUDE.md updated with new agent, modes, and state paths
|
||
|
||
## [1.0.0] - 2026-04-06
|
||
|
||
### Added
|
||
|
||
- **`--quick` mode** — skips exploration agent swarm. Runs interview → lightweight Glob/Grep scan → planning → adversarial review. For when the developer knows the codebase and needs structure, not cartography.
|
||
- **`--export` mode** — generates shareable output from an existing plan file. Three formats: `pr` (PR description), `issue` (issue comment), `markdown` (clean plan without internal metadata).
|
||
- **task-finder three-tier categorization** — findings categorized as Must-change (must be modified), Must-respect (contract that must not break), or Reference (context/reuse). Replaces flat file list.
|
||
- **Adaptive interview depth** — interview adapts to answer quality. Detailed answers trigger fewer, more targeted questions. Short/uncertain answers trigger simpler questions with offered alternatives.
|
||
- **Complete `plugin.json` metadata** — author, homepage, repository, license, keywords added.
|
||
- **README badges** — version, license, and platform badges.
|
||
- **Known limitations section in README** — IaC projects (Terraform, Helm, Pulumi, CDK) get reduced value from exploration agents.
|
||
- **Forgejo issue templates** — bug report and feature request YAML templates.
|
||
- **CONTRIBUTING.md** — rewritten for honest solo-project model.
|
||
|
||
### Changed
|
||
|
||
- plugin.json version bumped to 1.0.0
|
||
- Command header updated to Ultraplan Local v1.0
|
||
- Orchestrator accepts `mode: quick` in prompt for lightweight scanning path
|
||
|
||
## [0.4.0] - 2026-04-06
|
||
|
||
### Added
|
||
|
||
- **3 new agents** for information-complete planning:
|
||
- `task-finder` — dedicated agent for finding task-relevant files, functions, types, and reuse candidates. Replaces inline Explore agent.
|
||
- `git-historian` — analyzes git log, blame, active branches, code ownership, and hot files for planning context.
|
||
- `spec-reviewer` — reviews spec quality (completeness, consistency, testability, scope clarity) before exploration begins. New Phase 1b/4b.
|
||
- **Plan scoring** — plan-critic produces a quantitative quality score (0–100) across 5 weighted dimensions with letter grades (A–D) and verdicts (APPROVE/REVISE/REPLAN).
|
||
- **No-placeholder rule** — plan-critic flags TBD, TODO, vague instructions, and underspecified steps as unconditional blockers. 3+ blockers = REPLAN regardless of score.
|
||
- **`[ASSUMPTION]` marking** — planning-orchestrator marks all unverifiable claims and warns when >3 assumptions exist.
|
||
|
||
### Changed
|
||
|
||
- **All agents run for all codebase sizes.** Small codebases get the same 6 core agents as large ones. Agent turns scale down for small codebases instead of dropping agents entirely.
|
||
- Phase 4b (spec review) added before exploration in both command and orchestrator.
|
||
- Orchestrator Phase 2 agent table expanded: 6 always + 1 conditional + 1 medium-only.
|
||
- Plan-critic review checklist expanded with no-placeholder checks (section 7) and scoring output.
|
||
- Orchestrator rules updated with assumption-marking and no-placeholder requirements.
|
||
|
||
## [0.3.0] - 2026-04-05
|
||
|
||
### Added
|
||
|
||
- **planning-orchestrator agent** — dedicated background agent (`background: true`) that handles Phases 4–10 autonomously. Replaces generic background agent spawning with a purpose-built orchestrator running on Opus with `maxTurns: 50`.
|
||
- **`effort` and `maxTurns` on all agents** — fine-grained cost and depth control:
|
||
- Exploration agents: `effort: medium`, `maxTurns: 15–20`
|
||
- Review agents (plan-critic, scope-guardian): `effort: high`, `maxTurns: 10`
|
||
- Research-scout: `effort: medium`, `maxTurns: 10`
|
||
- **Plugin `settings.json`** — default configuration for mode, research, agent counts, interview limits, and team settings. Users can override in their own settings.
|
||
- **Worktree isolation for Agent Teams** — team members use `isolation: "worktree"` to prevent file conflicts during parallel implementation
|
||
- **Session tracking** (Phase 12) — writes JSONL records to `${CLAUDE_PLUGIN_DATA}/ultraplan-stats.jsonl` with task metadata, agent counts, review verdicts, and outcomes
|
||
|
||
### Changed
|
||
|
||
- Phase 3 now launches the `planning-orchestrator` agent instead of a generic background agent
|
||
- Agent Team implementation uses worktree isolation by default
|
||
|
||
## [0.2.0] - 2026-04-05
|
||
|
||
### Added
|
||
|
||
- **Interview phase** — iterative requirements gathering with AskUserQuestion before exploration. Produces a spec file that feeds into planning.
|
||
- **7 specialized agents** in `agents/` directory:
|
||
- `architecture-mapper` — deep architecture analysis, anti-patterns, smell detection
|
||
- `dependency-tracer` — import-chain following, data-flow analysis, side-effect catching
|
||
- `test-strategist` — test strategy design based on existing patterns
|
||
- `risk-assessor` — threat modeling, edge cases, failure modes
|
||
- `plan-critic` — dedicated adversarial reviewer with hardcoded critical perspective
|
||
- `scope-guardian` — scope creep and scope gap detection
|
||
- `research-scout` — external research via WebSearch/Tavily for unfamiliar technologies
|
||
- **External research capability** — research-scout agent searches documentation, known issues, and best practices when the task involves external/unfamiliar technology
|
||
- **Background mode** — default mode runs interview in foreground, then plans in background. User is notified when done.
|
||
- **Spec-driven mode** (`--spec`) — skip interview, provide a pre-written spec file, plan entirely in background
|
||
- **Foreground mode** (`--fg`) — all phases in foreground, blocks session (v0.1.0 behavior)
|
||
- **Agent Team support** — when plan has 3+ independent steps, offers parallel implementation via Agent Teams
|
||
- **Spec template** in `templates/spec-template.md`
|
||
- **Research Sources section** in plan template for citing external research
|
||
- **Dual adversarial review** — plan-critic and scope-guardian run in parallel
|
||
|
||
### Changed
|
||
|
||
- Exploration agents replaced with named specialized agents from `agents/` directory
|
||
- Agent count scales with codebase: 3 (small), 5 (medium), 7 (large)
|
||
- Plan template extended with Research Sources and external tech fields
|
||
- Handoff phase supports "execute with team" option
|
||
- Command workflow expanded from 9 to 11 phases
|
||
|
||
## [0.1.0] - 2026-04-05
|
||
|
||
### Added
|
||
|
||
- Initial release
|
||
- `/ultraplan` slash command with 6-phase workflow
|
||
- Parallel Sonnet exploration (3 agents: architecture, task-relevant, conventions)
|
||
- Opus-driven plan generation from structured template
|
||
- Plan refinement loop with execute/save handoff
|
||
- Plan template with context, analysis, steps, alternatives, risks, verification
|
||
- Cross-platform support (Mac, Linux, Windows) — pure markdown, no scripts
|