ktg-plugin-marketplace/plugins/ultraplan-local/CHANGELOG.md
Kjell Tore Guttormsen d1befac35a feat(ultraplan-local): v1.7.0 — self-verifying plan chain
Wave 1 of a 6-session parallel build revealed three failure modes:
(1) hallucinated completion (status=completed after 2/5 steps, last
tool call was an arbitrary file review), (2) fail-late bash (3/6
sessions had push blocked inside sub-agent sandbox after all work
was done), (3) no objective verification (plans were prose).

v1.7 closes all three by making the plan an executable contract.

Per-step YAML manifest (expected_paths, commit_message_pattern,
bash_syntax_check, forbidden_paths, must_contain) is the objective
completion predicate. Plan-critic dimension 10 (Manifest quality)
is a hard gate. Session decomposer propagates manifests verbatim
and emits an obligatory Step 0 pre-flight (git push --dry-run,
exit 77 sentinel) in every session spec.

ultraexecute-local gets Phase 7.5 (independent manifest audit from
git log + filesystem, ignoring agent bookkeeping) and Phase 7.6
(bounded recovery dispatch, recovery_depth ≤ 2). Hard Rule 17
forbids marking a step passed without manifest verification. Hard
Rule 18 forbids ending on an arbitrary tool call before reporting.

Division of labor is made explicit:
- /ultraresearch-local gathers context (no build decisions)
- /ultraplan-local produces an executable contract (manifests,
  plan-critic gate)
- /ultraexecute-local executes disciplined (does NOT compensate
  for weak plans — escalates)

Code complete. Docs partial (Arbeidsdeling table + manifest section
added to plugin + marketplace READMEs). Verification tests
(10-sequence) pending — see REMEMBER.md.

Backward compat: v1.6 plans without plan_version marker get
legacy mode with synthesized manifests and legacy_plan: true in
progress file. Plan-critic emits advisory, not block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 07:38:16 +02:00

340 lines
22 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [1.7.0] - 2026-04-12
### The self-verifying plan chain
Wave 1 of a parallel 6-session build revealed three failure modes: (1) a
session reported `status=completed` after only 2/5 steps — last tool call
was an arbitrary file review, not a completion check; (2) 3/6 sessions
had push blocked inside the sub-agent bash sandbox *after* all work was
done; (3) plans and blueprints were prose, so the orchestrator had no
machine-readable way to verify completion. v1.7.0 closes all three by
making the plan itself an executable contract.
### Added
- **Per-step verification manifest** in plan format (`plan_version: 1.7`).
Every step now ends with a YAML `manifest:` block declaring
`expected_paths`, `min_file_count`, `commit_message_pattern`,
`bash_syntax_check`, `forbidden_paths`, `must_contain`. The manifest is
the objective completion predicate — the Verify command is necessary but
not sufficient.
- **Plan-critic dimension 10 — Manifest quality (hard gate).** Missing
or invalid manifest (unparseable regex, path contradiction, missing
block) is a `major` finding. v1.6 plans get a legacy-mode warning
instead of a block.
- **Session Manifest aggregate** in session specs — synthesized by
`session-decomposer` as the union of per-step manifests. Gives
`ultraexecute-local` a single YAML block per session to audit against.
- **Step 0: Sandbox pre-flight** — obligatory first step in every
generated session spec. Runs `git push --dry-run origin HEAD`; exit 77
= sandbox cannot push, session status becomes `blocked` (not `failed`),
no real work attempted. Escape hatch: `ULTRAEXECUTE_SKIP_PREFLIGHT=1`.
- **Launch script pre-flight** — `headless-launch-template.md` adds a
`git push --dry-run` check outside the sandbox, before any session
spawns, catching credential issues at the earliest possible point.
- **Phase 7.5 — Manifest audit (independent).** After all steps complete,
`ultraexecute-local` re-verifies expected paths, commit count, commit
message patterns, bash syntax, and forbidden-path untouched-ness from
git log and filesystem. Agent's own bookkeeping is ignored. Disagreement
with progress file → status overridden to `partial`.
- **Phase 7.6 — Recovery dispatch (bounded).** When Phase 7.5 detects
drift in multi-session parent context, synthesize a temp session spec
containing only missing steps and dispatch via existing
`claude -p "/ultraexecute-local --session N"`. `recovery_depth ≤ 2`
hard cap — third drift escalates to user.
- **Hard Rule 17: Manifest is the completion predicate.** A step may
not be marked passed if its manifest does not verify, regardless of
Verify's exit code.
- **Hard Rule 18: Last-activity rule.** Executor's final tool call
before Phase 8 must be a manifest check, never an arbitrary file
review. Prevents hallucinated completion.
### Changed
- **Plan template (`templates/plan-template.md`)** — adds
`plan_version: 1.7` metadata line, `Manifest:` field on every step,
"Manifest — objective completion predicate" section.
- **Plan-critic scoring** rebalanced: Headless readiness 0.15 → 0.10,
Manifest quality 0.05 added. Legacy v1.6 plans skip the Manifest
dimension and keep Headless readiness at 0.15.
- **Planning-orchestrator Phase 5** adds "Manifest generation rules
(REQUIRED for every step)" with mechanical derivation from `Files:`
and Checkpoint. Validates regex compilation and path existence before
handoff to plan-critic.
- **Session-decomposer** parses plan manifests and propagates them
verbatim into session specs. For v1.7+ plans with missing manifests:
abort with pointer to failing step. For legacy v1.6 plans: synthesize
minimal manifests and flag `legacy_synthesis: true`.
- **ultraexecute-local Phase 2** parses manifest YAML. Ugyldig YAML =
abort with pointer to step. v<1.7 plans: synthesize + log
`legacy_plan: true`.
- **ultraexecute-local Phase 6** — sub-step D renamed to D1 "Command
verification"; new D2 "Manifest verification" runs after D1 with 5
checks. F "Checkpoint" adds `checkpoint_drift` logging when HEAD
message doesn't match `commit_message_pattern` (non-fatal).
- **Phase 8 report** — table gets Manifest column; JSON summary adds
`plan_version`, `manifest_audit`, `drift_details`, `recovery_dispatched`,
`recovery_depth`, `legacy_plan`. Result vocabulary strict:
`completed | partial | blocked | failed | stopped`.
- **Division of labor clarified** in README — `/ultraresearch-local`
gathers context (no decisions), `/ultraplan-local` transforms intent
into an executable contract (manifests, plan-critic gate),
`/ultraexecute-local` executes the contract disciplined (does NOT
compensate for weak plans — escalates).
## [1.6.0] - 2026-04-08
### Added
- **`/ultraresearch-local` command** — deep research combining local codebase analysis
with external knowledge. Produces structured research briefs with triangulation,
confidence ratings, and source quality assessment. Supports modes: default (background),
`--quick` (inline), `--local` (codebase only), `--external` (web only), `--fg` (foreground).
- **6 new agents** for the research pipeline:
- `research-orchestrator` (opus) — runs full research pipeline as background task
- `docs-researcher` (sonnet) — official documentation via Tavily, WebSearch, Microsoft Learn
- `community-researcher` (sonnet) — real-world experience from issues, blogs, discussions
- `security-researcher` (sonnet) — CVEs, audit history, supply chain risks
- `contrarian-researcher` (sonnet) — counter-evidence and overlooked alternatives
- `gemini-bridge` (sonnet) — independent second opinion via Gemini Deep Research MCP
- **Research brief template** (`templates/research-brief-template.md`) — structured format
with dimensions, confidence ratings, triangulation, and source quality assessment.
- **`--research` flag for `/ultraplan-local`** — accepts up to 3 research brief paths.
Enriches the interview (focuses on decisions, not facts) and injects brief context into
exploration agents. Research-scout skips already-covered technologies.
- **Research-aware planning orchestrator** — `planning-orchestrator.md` now accepts research
briefs, injects summaries into sub-agent prompts, and cross-references brief findings
during synthesis.
- **Research settings** in `settings.json` — configurable Gemini bridge (enabled/timeout),
interview depth, dimension limits, and stats tracking.
### Changed
- Plugin description and keywords updated to reflect research capabilities.
- CLAUDE.md expanded with ultraresearch command, modes, agents, architecture, and state.
## [1.5.0] - 2026-04-07
### Fixed
- **CRITICAL: Parallel session data loss** — Phase 2.6 ran parallel `claude -p` sessions
in the same working directory, causing git race conditions and repository corruption.
Each parallel session now runs in its own git worktree with isolated branch, index,
and working files. Branches are merged back sequentially after each wave completes.
### Added
- **Phase 2.55 (Pre-flight safety checks)** — validates clean working tree, committed
plan file, no scope fence overlaps between parallel sessions, and no stale worktrees
before launching parallel execution.
- **Git worktree isolation** for all parallel sessions — one branch per session
(`ultraplan/{slug}/session-{N}`), merged with `--no-ff` after wave completion.
- **Merge conflict detection** — if merging a session branch produces conflicts, the merge
is aborted and conflicting files are reported. No silent data loss.
- **Unconditional worktree cleanup** — worktrees and session branches are always removed,
even on failure. Manual cleanup commands are reported if automated cleanup fails.
- **Hard rules 11-13** — worktree isolation mandatory, cleanup unconditional, merge
sequentially with conflict abort.
- **Session-scoped progress file naming** — `--session N` uses
`.ultraexecute-progress-{slug}-session-{N}.json` to prevent merge conflicts.
### Changed
- Headless launch template uses git worktrees with `cleanup_worktrees` trap on EXIT,
clean-tree pre-flight check, and sequential merge after each wave.
- Phase 2.6 rewritten with 5-step worktree lifecycle: create → launch → wait → merge → cleanup.
## [1.4.0] - 2026-04-06
### Renamed
- **`/ultraexecute``/ultraexecute-local`** — renamed for namespace consistency with `/ultraplan-local` and future-proofing against potential Anthropic naming. File: `commands/ultraexecute.md``commands/ultraexecute-local.md`. Note: `ultraexecute_summary` JSON key and `ultraexecute-stats.jsonl` filename are unchanged for backward compatibility.
### Added
- **`convention-scanner` agent** (sonnet) — dedicated agent for discovering coding conventions: naming, directory layout, import style, error handling, test patterns, git commit style, documentation patterns. Replaces inline Explore agent prompt for medium+ codebases.
- **Success Criteria section** in spec template — falsifiable "definition of done" conditions that the spec-reviewer validates and ultraexecute-local uses for verification.
- **Dry-run multi-session preview** — `--dry-run` now shows session groupings, wave structure, billing status, and `claude -p` commands when plan has an Execution Strategy.
- **External verification rule** in headless launch template — wave verification must run commands independently, never parse session logs as proof.
- **Billing preamble** in headless launch template — `unset ANTHROPIC_API_KEY` prevents accidental API billing.
- **Phase mapping comment** in planning-orchestrator — documents how orchestrator phases 1-7 map to command phases 4-10.
### Fixed
- **`git add -A` in escalation** — replaced with targeted staging of only files from completed steps. Prevents staging secrets, binaries, or unrelated work.
- **False `background: true` claim** — command documentation incorrectly stated the orchestrator has `background: true` in its frontmatter. Corrected to explain `run_in_background` on the Agent tool.
### Changed
- Execution Strategy reconciliation in session-decomposer — respects existing `## Execution Strategy` as input instead of re-analyzing from scratch. Warns on file-overlap conflicts.
- Headless launch template uses `--dangerously-skip-permissions` instead of `--allowedTools` for more robust headless execution.
- Session-decomposer updated with `--dangerously-skip-permissions` and `unset ANTHROPIC_API_KEY` for generated scripts.
- Convention Scanner references in command and orchestrator updated to use dedicated plugin agent.
- ROADMAP.md translated from Norwegian to English.
- plugin.json: added homepage, repository, license, keywords. Version bumped to 1.4.0.
- README badge updated to v1.4.0.
## [1.3.0] - 2026-04-06
### Added
- **Session-aware parallel execution** — `/ultraexecute` auto-detects `## Execution Strategy` in plans and orchestrates multi-session parallel execution via `claude -p`. No manual `bash launch.sh` required.
- **`--fg` flag** — force foreground sequential execution, ignoring Execution Strategy
- **`--session N` flag** — execute only session N from the plan's Execution Strategy (used by child processes)
- **Phase 2.5 (Execution strategy decision)** — determines single-session vs multi-session mode
- **Phase 2.6 (Multi-session orchestration)** — launches parallel `claude -p` sessions per wave, waits for completion, aggregates results
- **Execution Strategy in plan template** — new `## Execution Strategy` section with sessions, waves, scope fences, and execution order. Generated by planning-orchestrator for plans with > 5 steps.
- **Execution Strategy generation in planning-orchestrator** — Phase 5 analyzes step file-overlap to build dependency graph, groups connected components into sessions of 35 steps, and organizes sessions into parallel waves.
### Changed
- planning-orchestrator Phase 5 extended with Execution Strategy generation logic
- ultraplan-local Phase 8 now lists Execution Strategy as 10th required plan section
- Plan template includes `## Execution Strategy` section template with grouping rules
- CLAUDE.md updated with new ultraexecute modes and architecture
- plugin.json version bumped to 1.3.0
## [1.2.0] - 2026-04-06
### Added
- **`/ultraexecute` command** — disciplined plan executor with 9-phase workflow. Reads an ultraplan or session spec, executes steps sequentially with strict failure recovery, tracks progress for resume, and reports results in machine-parseable JSON.
- 4 modes: default (execute), `--resume` (continue from checkpoint), `--dry-run` (validate without executing), `--step N` (single step)
- Per-step protocol: implement → verify → on-failure handling → checkpoint
- Failure recovery from plan's On failure clauses (revert/retry/skip/escalate)
- 3-attempt retry cap per step (initial + 2 retries)
- Progress file (`.ultraexecute-progress-{slug}.json`) for crash recovery and resume
- Entry/exit condition checking for session specs
- Scope fence enforcement for session specs (never-touch file protection)
- JSON summary block in output for headless log parsing
- Stats tracking to `ultraexecute-stats.jsonl`
### Changed
- CLAUDE.md restructured with two commands table (plan + execute)
- plugin.json version bumped to 1.2.0
## [1.1.0] - 2026-04-06
### Added
- **`--decompose` mode** — splits an existing plan into self-contained headless sessions. Analyzes step dependencies, groups steps into sessions of 35 steps each, identifies parallel execution waves, and generates session specs + dependency graph + launch script.
- **`--export headless` format** — shortcut for `--decompose`. Produces the same session decomposition output.
- **session-decomposer agent** (sonnet) — dedicated agent for plan decomposition. Parses step dependencies, builds dependency graph, groups steps into sessions, generates session specs with scope fences and failure handling.
- **Session spec template** (`templates/session-spec-template.md`) — defines the format for individual session specs: context, scope fence, steps, entry/exit conditions, failure handling, handoff state.
- **Headless launch template** (`templates/headless-launch-template.md`) — template for generating bash launch scripts that execute sessions in parallel waves using `claude -p`.
- **Failure recovery per step** — plan template now includes `On failure:` (revert/retry/skip/escalate) and `Checkpoint:` (git commit) fields for every implementation step.
- **Headless readiness dimension** in plan-critic — new 9th review dimension checking for On failure clauses, Checkpoint fields, and circuit breakers. Weighted at 0.15 in the quality score.
### Changed
- Plan-critic scoring rebalanced: 6 dimensions (was 5), weights adjusted to accommodate headless readiness
- Plan template step format extended with On failure and Checkpoint fields
- Planning-orchestrator Phase 5 updated with failure recovery generation requirements
- CLAUDE.md updated with new agent, modes, and state paths
## [1.0.0] - 2026-04-06
### Added
- **`--quick` mode** — skips exploration agent swarm. Runs interview → lightweight Glob/Grep scan → planning → adversarial review. For when the developer knows the codebase and needs structure, not cartography.
- **`--export` mode** — generates shareable output from an existing plan file. Three formats: `pr` (PR description), `issue` (issue comment), `markdown` (clean plan without internal metadata).
- **task-finder three-tier categorization** — findings categorized as Must-change (must be modified), Must-respect (contract that must not break), or Reference (context/reuse). Replaces flat file list.
- **Adaptive interview depth** — interview adapts to answer quality. Detailed answers trigger fewer, more targeted questions. Short/uncertain answers trigger simpler questions with offered alternatives.
- **Complete `plugin.json` metadata** — author, homepage, repository, license, keywords added.
- **README badges** — version, license, and platform badges.
- **Known limitations section in README** — IaC projects (Terraform, Helm, Pulumi, CDK) get reduced value from exploration agents.
- **Forgejo issue templates** — bug report and feature request YAML templates.
- **CONTRIBUTING.md** — rewritten for honest solo-project model.
### Changed
- plugin.json version bumped to 1.0.0
- Command header updated to Ultraplan Local v1.0
- Orchestrator accepts `mode: quick` in prompt for lightweight scanning path
## [0.4.0] - 2026-04-06
### Added
- **3 new agents** for information-complete planning:
- `task-finder` — dedicated agent for finding task-relevant files, functions, types, and reuse candidates. Replaces inline Explore agent.
- `git-historian` — analyzes git log, blame, active branches, code ownership, and hot files for planning context.
- `spec-reviewer` — reviews spec quality (completeness, consistency, testability, scope clarity) before exploration begins. New Phase 1b/4b.
- **Plan scoring** — plan-critic produces a quantitative quality score (0100) across 5 weighted dimensions with letter grades (AD) and verdicts (APPROVE/REVISE/REPLAN).
- **No-placeholder rule** — plan-critic flags TBD, TODO, vague instructions, and underspecified steps as unconditional blockers. 3+ blockers = REPLAN regardless of score.
- **`[ASSUMPTION]` marking** — planning-orchestrator marks all unverifiable claims and warns when >3 assumptions exist.
### Changed
- **All agents run for all codebase sizes.** Small codebases get the same 6 core agents as large ones. Agent turns scale down for small codebases instead of dropping agents entirely.
- Phase 4b (spec review) added before exploration in both command and orchestrator.
- Orchestrator Phase 2 agent table expanded: 6 always + 1 conditional + 1 medium-only.
- Plan-critic review checklist expanded with no-placeholder checks (section 7) and scoring output.
- Orchestrator rules updated with assumption-marking and no-placeholder requirements.
## [0.3.0] - 2026-04-05
### Added
- **planning-orchestrator agent** — dedicated background agent (`background: true`) that handles Phases 410 autonomously. Replaces generic background agent spawning with a purpose-built orchestrator running on Opus with `maxTurns: 50`.
- **`effort` and `maxTurns` on all agents** — fine-grained cost and depth control:
- Exploration agents: `effort: medium`, `maxTurns: 1520`
- Review agents (plan-critic, scope-guardian): `effort: high`, `maxTurns: 10`
- Research-scout: `effort: medium`, `maxTurns: 10`
- **Plugin `settings.json`** — default configuration for mode, research, agent counts, interview limits, and team settings. Users can override in their own settings.
- **Worktree isolation for Agent Teams** — team members use `isolation: "worktree"` to prevent file conflicts during parallel implementation
- **Session tracking** (Phase 12) — writes JSONL records to `${CLAUDE_PLUGIN_DATA}/ultraplan-stats.jsonl` with task metadata, agent counts, review verdicts, and outcomes
### Changed
- Phase 3 now launches the `planning-orchestrator` agent instead of a generic background agent
- Agent Team implementation uses worktree isolation by default
## [0.2.0] - 2026-04-05
### Added
- **Interview phase** — iterative requirements gathering with AskUserQuestion before exploration. Produces a spec file that feeds into planning.
- **7 specialized agents** in `agents/` directory:
- `architecture-mapper` — deep architecture analysis, anti-patterns, smell detection
- `dependency-tracer` — import-chain following, data-flow analysis, side-effect catching
- `test-strategist` — test strategy design based on existing patterns
- `risk-assessor` — threat modeling, edge cases, failure modes
- `plan-critic` — dedicated adversarial reviewer with hardcoded critical perspective
- `scope-guardian` — scope creep and scope gap detection
- `research-scout` — external research via WebSearch/Tavily for unfamiliar technologies
- **External research capability** — research-scout agent searches documentation, known issues, and best practices when the task involves external/unfamiliar technology
- **Background mode** — default mode runs interview in foreground, then plans in background. User is notified when done.
- **Spec-driven mode** (`--spec`) — skip interview, provide a pre-written spec file, plan entirely in background
- **Foreground mode** (`--fg`) — all phases in foreground, blocks session (v0.1.0 behavior)
- **Agent Team support** — when plan has 3+ independent steps, offers parallel implementation via Agent Teams
- **Spec template** in `templates/spec-template.md`
- **Research Sources section** in plan template for citing external research
- **Dual adversarial review** — plan-critic and scope-guardian run in parallel
### Changed
- Exploration agents replaced with named specialized agents from `agents/` directory
- Agent count scales with codebase: 3 (small), 5 (medium), 7 (large)
- Plan template extended with Research Sources and external tech fields
- Handoff phase supports "execute with team" option
- Command workflow expanded from 9 to 11 phases
## [0.1.0] - 2026-04-05
### Added
- Initial release
- `/ultraplan` slash command with 6-phase workflow
- Parallel Sonnet exploration (3 agents: architecture, task-relevant, conventions)
- Opus-driven plan generation from structured template
- Plan refinement loop with execute/save handoff
- Plan template with context, analysis, steps, alternatives, risks, verification
- Cross-platform support (Mac, Linux, Windows) — pure markdown, no scripts