ktg-plugin-marketplace/plugins/voyage/CHANGELOG.md

1822 lines
113 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## v5.1.0 — 2026-05-13 — Per-phase effort + model dialog
Additive. No breaking changes. Forward-compat with all v5.0.x briefs.
### Why
The voyage pipeline runs a single profile-tier setting for every task. For
typo fixes and small bugfixes the full `brief → research → plan → execute
→ review` ceremony is over-engineered; for risky migrations the same
profile-tier is too thin. v5.1 hands ceremony-level back to the operator
per phase in the same dialog that produces the brief — without removing
the disciplined defaults that protect high-stakes work. Independent of
v4.1's profile system: composition happens at the command level (brief
signal wins per-phase, profile fills gaps). No `/trekflow`, no helper
module, no per-command effort dictionary — composition is documented
prose in each downstream command.
### Added
- **`/trekbrief` Phase 3.5** — between Phase 3 completeness exit and
Phase 4 draft, 4 tier-coupled `AskUserQuestion` calls commit an effort
level (`low | standard | high`) and an optional `model` (`sonnet |
opus`) per downstream phase (`research`, `plan`, `execute`, `review`).
Tier mapping: `low → {effort: low, model: sonnet}`, `standard →
{effort: standard}` (model omitted; composition falls through to
profile), `high → {effort: high, model: opus}`. Force-stop pattern
(Phase 4f verbatim) records `phase_signals_partial: true` instead.
`--quick` skips Phase 3.5 entirely and auto-writes
`phase_signals_partial: true`.
- **`brief-validator` extension** — 6 new issue codes:
`BRIEF_INVALID_PHASE_SIGNALS`, `BRIEF_INVALID_PHASE_SIGNAL_PHASE`,
`BRIEF_INVALID_EFFORT`, `BRIEF_INVALID_MODEL`,
`BRIEF_SIGNALS_MUTUALLY_EXCLUSIVE`, `BRIEF_V51_MISSING_SIGNALS` +
exported `PHASE_SIGNAL_PHASES` + `EFFORT_LEVELS` constants. The
`BASE_ALLOWED_MODELS` const in `lib/validators/profile-validator.mjs`
was promoted to `export const` so the brief validator can re-use it.
- **HANDOVER-CONTRACTS amendments** — Handover 1 gets 5 inserts:
versioning row → `2.1`, two new schema-table rows (`phase_signals`,
`phase_signals_partial`), v5.1 sequencing-gate validation row,
versioning-paragraph expansion explaining the version-conditional
gate, 6 new failure-mode bullets.
- **Template bump** — `templates/trekbrief-template.md` → `brief_version
2.1` with a default `phase_signals:` block (4 phases × `effort:
standard`, model omitted) and a commented `phase_signals_partial:
true` line showing the force-stop alternative.
- **Composition rule (v5.1)** — new `## Composition rule (v5.1)`
sub-section in each of `commands/{trekplan,trekresearch,trekexecute,
trekreview}.md`. Documents `effort_for_phase = brief.phase_signals[
<phase>]?.effort ?? 'standard'` and `model_for_phase =
brief.phase_signals[<phase>]?.model ?? profile.phase_models[<phase>]`.
Per command: `effort == low` activates that command's existing
`--quick`-equivalent code-path (`/trekplan` skips Phase 5 agent swarm,
`/trekresearch` inline research, `/trekreview` correctness-only,
`/trekexecute` `--gates open` + sequential-only).
- **Sequencing-gate surface** in 4 downstream commands — when
`brief-validator.mjs` returns `BRIEF_V51_MISSING_SIGNALS` in `errors`,
halt with a one-line user-readable message pointing back to
`/trekbrief`. Enforcement is validator-only.
- **5 new minimal command test files** under `tests/commands/` —
`trekbrief.test.mjs` (3 cases), `trekplan.test.mjs` /
`trekresearch.test.mjs` / `trekreview.test.mjs` (2 cases each),
`trekexecute.test.mjs` (2 cases). Pattern D (read .md, assert prose
patterns). Verifies sequencing-gate surface + low-effort prose.
- **5 new doc-consistency pins** — template `brief_version 2.1` +
`phase_signals:` block, HANDOVER schema rows, voyage CLAUDE.md +
README.md mention `phase_signals`.
- **2 new fixtures** — `tests/fixtures/brief-with-phase-signals.md` +
`brief-without-phase-signals.md` (backward-compat).
### Changed
- `brief_version` bumped `2.0 → 2.1`. The bump exists because v2.1
activates the **version-conditional sequencing gate** — the only check
in the brief validator that triggers on `brief_version` rather than
field-presence. The forward-compat policy still applies to the field
itself (unknown frontmatter keys flow through).
### Notes
- Test count grows by ≥ 17 new cases minimum: 6 brief-validator + 11
command-test minimums. Realistic delta is ~25 new cases (Step 6 adds 5
doc-consistency pins on top). Target ≥ 533 pass at Step 10 verify.
- `MIGRATION.md` was deliberately NOT created — v5.1 is an additive
minor (brief_version 2.0 → 2.1, not major). v5.4 may promote
`phase_signals` from optional to required (breaking change → 3.0).
- High-effort behaviors for `/trekplan` / `/trekresearch` /
`/trekreview` are deferred to v5.1.1 per brief Non-Goal ("No complete
per-phase effort dictionary"). v5.1 locks only the low-effort floor.
- `phase_signals_present` stats emission is also deferred to v5.1.1
(opt-in observability per Research 03 Q5).
## v5.0.3 — 2026-05-13 — Annotation UX matches the claude-code-100x reference
**No new breaking changes beyond v5.0.0.** Forks consuming v5.0.2's
annotation HTML keep working — the file path and entry point are
unchanged. The internal localStorage key bumps from `voyage-annotate:` to
`voyage-annotate:v2:` to avoid mixing the v5.0.2 shape (line-click,
freeform notes) with the v5.0.3 shape (intent-tagged annotations).
### Why
v5.0.2 shipped a too-simple annotation surface: click a line, write a
freeform note, save. The operator pointed at the existing
`claude-code-100x/build-site.js` annotation system as the actual
reference — pencil-toggle mode, text-selection capture, three intent
categories (**Fiks** / **Endre** / **Spørsmål**), a popover form at the
cursor, structured markdown export with intent labels. v5.0.3 brings
`scripts/annotate.mjs` up to that pattern.
This reference had been mentioned by the operator early in the
conversation; the iteration through v5.0.0 / v5.0.1 / v5.0.2 reflects me
reading past it and trying alternatives instead of just matching it. The
loss is real and is documented here in plain terms so future maintainers
don't repeat it.
### Changed
- **`scripts/annotate.mjs`** — rewritten to match the
`claude-code-100x/build-site.js` UX:
- **Article rendering** — markdown is rendered to proper HTML elements
(`<h1>`/`<p>`/`<ul>`/`<li>`/`<table>`/`<blockquote>`/`<pre>`), not as
line-numbered raw lines. Document reads as a normal article.
- **Annotatable elements** — every heading, paragraph, list item, table
cell, blockquote, and code block gets a stable `data-anchor-id`.
- **Pencil-toggle button** in the topbar — annotation mode default ON.
Toggle OFF to read normally and follow links.
- **Click any annotatable element** (in mode) → opens a form popover
at the cursor with: section context (auto-detected from nearest
h1/h2), anchored snippet (the exact selected substring via
`window.getSelection()` if any text is highlighted, else the
element's text content up to 200 chars), three intent buttons
(**Fiks** / **Endre** / **Spørsmål**), comment textarea, Cancel +
Save. Save is disabled until an intent is picked.
- **Sidebar panel** — collapsed by default; "Show annotations" button
in the topbar opens it. Annotations grouped by section, sorted by
document order. Each card shows the intent badge (colored by
category), the anchored snippet, the operator comment, and a delete
button. Click a card to scroll the article to that element + flash
highlight.
- **Copy Prompt** — structured markdown:
`### N. [Intent] Section: <section>` + `Quote: «<snippet>»` +
`Comment: <text>`. Copies to clipboard.
- **Clear all** — wipes every annotation for the current artifact
(after confirm).
- **Persistence** — `localStorage` key `voyage-annotate:v2:<abs path>`.
Refresh/close/reopen the same HTML keeps every annotation.
- **Toast feedback** for save / copy / clear.
- **`tests/scripts/annotate.test.mjs`** — refreshed for the v5.0.3 shape:
pins the three intent buttons (`data-intent="fiks"` / `"endre"` /
`"spørsmål"`), form popover, selection capture, section auto-detect,
`voyage-annotate:v2:` storage key prefix, `data-anchor-id` coverage,
Copy Prompt + Clear all affordances, and the markdown renderer's
heading / list / table / blockquote / code-fence output. 12 tests
(up from 10), all passing.
### Notes
- The producing commands (`/trekbrief` Step 4g, `/trekplan` Phase 10,
`/trekreview` Phase 8) call `scripts/annotate.mjs` the same way as in
v5.0.2 — no change to their wiring beyond the build-output now being
the v5.0.3 interactive surface.
- `npm test`: 518 tests, 516 pass, 0 fail, 2 skipped (up from 516 — 2
new annotate tests for hostile-content escape + renderMarkdown table/
blockquote coverage).
- Reference: `~/repos/claude-code-100x/claude-code-100x/build-site.js`
lines 14312255 (annotation UI section).
- Version bump 5.0.2 → 5.0.3 in `.claude-plugin/plugin.json`,
`package.json`, `package-lock.json`, plugin `README.md` badge.
## v5.0.2 — 2026-05-13 — Operator-driven annotation HTML (the actual fix)
**No new breaking changes beyond v5.0.0.** Forks that consumed the v5.0.1
`/playground document-critique` invocation from the producing commands'
final report should switch to opening the `.html` that `scripts/annotate.mjs`
now produces directly.
### Why
v5.0.0 added a read-only `scripts/render-artifact.mjs` HTML render that
didn't afford annotation. v5.0.1 deleted that and pointed operators at
`/playground document-critique` instead — but the `document-critique`
template pre-generates **Claude's** suggestions and asks the operator to
approve/reject them. The operator asked for the opposite: a surface where
**they** select content and write **their own** notes, then ship those
notes back to Claude. v5.0.1 still missed the actual ask.
v5.0.2 ships `scripts/annotate.mjs` — a small, focused, zero-dependency
Node script that takes any artifact `.md` and writes a self-contained
HTML next to it. The HTML renders the document with line numbers, lets
the operator click any line to attach their own note, keeps a sidebar of
all notes (editable + deletable, persisted in `localStorage` per artifact
path so refresh doesn't lose work), and exposes a "Copy Prompt" button
that gathers every note into one structured prompt. The operator copies
that prompt and pastes it back into Claude; Claude revises the `.md`
freehand from the notes. **One file → one HTML → click + write notes →
copy prompt → paste back.** No Claude-generated suggestions in the loop.
The operator drives every annotation.
This is the v4.2/v4.3 *concept* (operator-driven annotation) without the
broken v4.2/v4.3 UX, without the 388 KB SPA, without `/trekrevise`,
without anchor parsers + Handover 8 + the JSON batch round-trip. ~430
lines of self-contained `.mjs`. Zero npm deps. Deterministic.
### Added
- **`scripts/annotate.mjs`** — operator-annotation HTML generator. Takes `<artifact.md>`, writes `<artifact>.html` (or `--out <file>`). Self-contained, design-system-aligned (light + dark + print), zero external network, deterministic. CLI: `node scripts/annotate.mjs <artifact.md> [--out <file.html>]`. Also `npm run annotate -- <artifact.md>`.
- **`tests/scripts/annotate.test.mjs`** (10 tests) — self-contained HTML shape, no external `<link>`/`<script src>`, inline script parses, source content + path embedded, HTML escaping in title + body (XSS surface), determinism, default output path, arg parsing, and the operator-driven affordances (Click any line, Your annotations sidebar, Copy Prompt, Clear all, localStorage).
- **`npm run annotate`** convenience script.
### Changed
- **`commands/trekbrief.md` Step 4g, `commands/trekplan.md` Phase 10, `commands/trekreview.md` Phase 8** — each now runs `scripts/annotate.mjs` after the artifact is final and prints the resulting `file://<abs path>` link with explicit "Click any line to add YOUR OWN note" instructions. The v5.0.1 `/playground build a document-critique playground for …` line is removed from all three.
- **`tests/lib/doc-consistency.test.mjs`** — replaced the v5.0.1 `/playground` pins with v5.0.2 pins: `scripts/annotate.mjs` exists; producing commands invoke it; producing commands no longer print the v5.0.1 `/playground document-critique` line; producing commands signal operator-driven annotation in their prose; CHANGELOG has a v5.0.2 entry.
- **Plugin `CLAUDE.md` + `README.md` + root `CLAUDE.md` + root `README.md` + `.claude-plugin/marketplace.json`** — voyage description updated from "v5.0.1 /playground invocation" to "v5.0.2 operator-annotation HTML (`scripts/annotate.mjs`)".
### Notes
- `/playground` is unchanged — the official `claude-plugins-official` `playground` skill is great for the Claude-leads, operator-reacts flow; it just wasn't the right tool for operator-leads, Claude-reacts.
- `npm test`: 516 tests, 514 pass, 0 fail, 2 skipped (up from 503 — 10 new `annotate.test.mjs` tests + 3 net new doc-consistency pins).
- Version bump 5.0.1 → 5.0.2 in `.claude-plugin/plugin.json`, `package.json`, `package-lock.json`, plugin `README.md` badge.
## v5.0.1 — 2026-05-13 — Drop the standalone HTML render; print a literal /playground invocation
**No new breaking changes beyond v5.0.0.** Forks that consumed
`scripts/render-artifact.mjs` directly (or invoked `npm run render`) must
remove that integration. Nothing else moves.
### Why
v5.0.0 had `/trekbrief`, `/trekplan`, and `/trekreview` each finish by
*both* rendering a read-only `{artifact}.html` view (via the new
`scripts/render-artifact.mjs`) *and* printing a vague instruction to "run
the `/playground` plugin (`document-critique` template) on the `.md` and
paste the prompt back". In practice the operator saw two HTMLs in their
project dir, no annotation UI on the rendered `.html`, and had to guess
the right `/playground` invocation. The read-only `.html` added confusion
without affording annotation — it duplicated work the `/playground`
HTML already does (formatted document on the left, annotations on the
right, Copy Prompt button at the bottom).
v5.0.1 deletes the redundant render and makes the printed `/playground`
invocation literal and copy-paste-ready. One paste from the operator
launches the `playground` skill, which loads its `document-critique`
template, reads the `.md`, builds the interactive HTML, opens it. Mark
suggestions, click Copy Prompt, paste back. Done.
### Removed
- **`scripts/render-artifact.mjs`** — the v5.0.0 standalone Markdown→HTML renderer (~280 lines, zero deps). Redundant with `/playground`'s HTML.
- **`tests/scripts/render-artifact.test.mjs`** (and the now-empty `tests/scripts/` dir).
- **`npm run render`** script alias in `package.json`.
- All references to `render-artifact.mjs`, `brief.html`, `plan.html`, `review.html` in `CLAUDE.md` (plugin + root), `README.md` (plugin + root), `.claude-plugin/marketplace.json`, and the three command files' final-output blocks.
### Changed
- **`commands/trekbrief.md` Step 4g (Finalize), `commands/trekplan.md` Phase 10 (Present and refine), `commands/trekreview.md` Phase 9 (Present summary)** — each now ends by printing a single boxed block with the literal text `/playground build a document-critique playground for {abs_path}` and a one-paragraph explanation of the paste-mark-copy-paste loop. The literal string is pinned by `tests/lib/doc-consistency.test.mjs` so it cannot soften back into "run the `/playground` plugin" without a test failure.
- **`tests/lib/doc-consistency.test.mjs`** — replaced the v5.0.0 `render-artifact.mjs exists` + `producing commands reference render-artifact.mjs` pins with v5.0.1 pins: `render-artifact.mjs` *no longer* exists; producing commands include the literal `/playground build a document-critique playground for` invocation; producing commands no longer reference `render-artifact.mjs`; `package.json scripts.render` is gone; CHANGELOG has both v5.0.0 and v5.0.1 entries.
- **Plugin `CLAUDE.md`** — "Render-and-link (v5.0.0)" paragraph rewritten to "Post-command annotation invocation (v5.0.1)" explaining the literal-paste contract; project-directory contract no longer lists `.html` siblings; "State" section's project-root inventory no longer lists `.html` files.
- **Plugin `README.md`** — "Rendered artifacts & annotation (v5.0.0)" section rewritten to "Reviewing and annotating artifacts (v5.0.1)" with a worked example of the printed output and a "What v5.0.1 changed from v5.0.0" sub-note. Top-of-README one-liner + bottom "Known limitations" note updated.
- **Root `CLAUDE.md`** + **root `README.md`** + **`.claude-plugin/marketplace.json`** — voyage description updated to v5.0.1 + the one-paste invocation model.
### Notes
- `/playground` is the `playground` skill from `claude-plugins-official`. It must be installed in the operator's environment for the printed command to work. If it isn't, the same effect is achievable by pasting the `.md` content into Claude with "review this and suggest changes" — manual freehand revision.
- `npm test`: 503 tests, 501 pass, 0 fail, 2 skipped (down from 509 — 8 `render-artifact.test.mjs` tests removed; the doc-consistency pins were updated to v5.0.1 contracts, net +2 tests).
- Version bump 5.0.0 → 5.0.1 in `.claude-plugin/plugin.json`, `package.json`, `package-lock.json`, plugin `README.md` badge.
## v5.0.0 — 2026-05-12 — Remove the bespoke playground; render artifacts to HTML + link
**Breaking.** `/trekrevise` is removed. The `playground/` directory, Handover 8
(annotation → revision), and all of its supporting `lib/` modules and tests are
gone. Forks that depended on `/trekrevise`, the playground HTML, `lib/parsers/anchor-parser.mjs`,
`lib/parsers/annotation-digest.mjs`, `lib/util/markdown-write.mjs`, or
`lib/util/revision-guard.mjs` must migrate to the official `/playground` plugin
(`document-critique` / `diff-review` templates).
### Why
v4.2/v4.3 built a ~388 KB bespoke playground SPA — vendored markdown-it +
highlight.js + DOMPurify, a design-system copy, a dashboard, drill-down, custom
annotation gestures, an anchor parser, and a Playwright e2e suite — to let
operators annotate brief/plan/review artifacts in a browser and fold those
annotations back via `/trekrevise` (Handover 8). A 2026-05-12 browser
walkthrough found it borderline unusable (annotation broken in the drill-down,
dashboard didn't take over, no in-context anchor markers, wrong anchor
derivation, broken `guide-panel` chrome). And it duplicated work the official
`/playground` plugin already does well: `document-critique` and `diff-review`
templates produce clean, self-contained single-file HTML for exactly this. The
NIH cost was real; the lasting value of v4.2/v4.3 is this cautionary record.
Lesson: reach for existing capabilities before building bespoke ones, and
walk the UI in a browser before shipping it.
### Removed
- **`plugins/voyage/playground/`** — the whole directory: `voyage-playground.html`, its `README.md`, vendored `lib/` (markdown-it / highlight.js / DOMPurify / manifests), and the vendored `playground-design-system/` copy (the canonical `shared/playground-design-system/` is untouched — other plugins still use it).
- **`/trekrevise`** — `commands/trekrevise.md` and the `trekrevise` block in `settings.json`.
- **Handover 8 (annotation → revision)** — deleted from `docs/HANDOVER-CONTRACTS.md`; back to seven handovers.
- **`lib/parsers/anchor-parser.mjs`**, **`lib/parsers/annotation-digest.mjs`**, **`lib/util/markdown-write.mjs`** (`readAndUpdate`), **`lib/util/revision-guard.mjs`**.
- **`scripts/vendor-playground-libs.mjs`**, **`playwright.config.mjs`**, **`tests/e2e/`** (a11y + network specs + snapshots), **`tests/playground/`**, **`tests/fixtures/playground/`**, **`tests/fixtures/screenshot-project/`**, **`tests/fixtures/annotation/`**.
- **Tests for the removed modules** — `tests/parsers/anchor-parser.test.mjs`, `tests/parsers/annotation-digest.test.mjs`, `tests/integration/annotation-roundtrip.test.mjs`, `tests/integration/annotation-block-boundary.test.mjs`, `tests/integration/annotation-export-schema.test.mjs`, `tests/integration/schema-rollback.test.mjs`, `tests/lib/revision-guard.test.mjs`, `tests/lib/markdown-write.test.mjs`, `tests/lib/source-annotations.test.mjs`, `tests/validators/{brief,plan,review}-validator-annotation-fields.test.mjs`, and the old `tests/scripts/render-artifact.test.mjs` (a fresh one ships in this release).
- **`docs/annotation-quickstart.md`**, **`docs/sc1-checklist-verification.md`**, **`docs/screenshots/`**.
- **`commands/trekplan.md` Phase 9 `plan_critic`-injection block** (and its `agents/planning-orchestrator.md` mirror) — it `import`ed the now-deleted `lib/util/markdown-write.mjs`. The `plan_critic` frontmatter field is no longer written.
- **`devDependencies` (`@axe-core/playwright`, `@playwright/test`)** and the `test:e2e` script in `package.json`; `package-lock.json` synced (no runtime deps — it's near-empty).
- The annotation-frontmatter HTML-comment preambles in `templates/{trekbrief,plan,trekreview}-template.md` (the validators still tolerate unknown frontmatter keys; nothing emits `revision:` / `source_annotations:` / `annotation_digest:` anymore).
### Added
- **`scripts/render-artifact.mjs`** — a small (~280 lines), zero-dependency Node renderer. Reads a `.md` artifact, folds frontmatter into a `<details>` block, renders a hand-rolled markdown subset (ATX headings, ordered/unordered/nested lists, fenced code blocks, inline code, bold/italic, links, blockquotes, GitHub-style tables, horizontal rules), and emits a self-contained HTML file with an inlined, design-system-aligned stylesheet (light + dark + print). **Zero external network, zero build step, deterministic** (byte-identical on re-run). CLI: `node scripts/render-artifact.mjs <artifact.md> [--out <file.html>]`; also `npm run render`.
- **Render-and-link step** at the end of `/trekbrief`, `/trekplan`, and `/trekreview` — each renders its just-written `.md` to `{project_dir}/{artifact}.html` and prints the `file://` link plus a one-liner: to annotate, run the `/playground` plugin (`document-critique`) on the `.md` and paste the generated prompt back; Claude revises the artifact freehand.
- **`tests/scripts/render-artifact.test.mjs`** (fresh, 8 tests) — self-contained-document shape, frontmatter `<details>` folding, title derivation, headings/code-fences/lists/tables/blockquotes rendering, HTML escaping, determinism, default output path, arg parsing.
- **`tests/lib/doc-consistency.test.mjs` v5.0.0 pins** — `playground/` gone, `commands/trekrevise.md` gone, Handover 8 gone, `render-artifact.mjs` exists, producing commands reference `render-artifact.mjs` and `/playground`, CHANGELOG has a v5.0.0 entry and retains the v4.2.0 entry, and no source file (outside CHANGELOG) references `trekrevise`.
### Notes
- Resolves the three v4.3.1-deferred findings as moot: `87069b35` and `c6c64a58` targeted `playground/voyage-playground.html` (deleted); `4cc3bfc9` targeted the Phase 9 `readAndUpdate` block in `commands/trekplan.md` (deleted).
- Command count: seven → six (`/trekbrief`, `/trekresearch`, `/trekplan`, `/trekexecute`, `/trekreview`, `/trekcontinue`; plus the `/trekendsession` helper).
- `npm test` is the fork-readiness gate; `npm run test:e2e` is gone (no Playwright).
## v4.3.0 — 2026-05-10 — Playground rebuild: dashboard-centric + visual parity + anchor-rendering matures
**Additive. No breaking changes. Forward-compat with every brief / plan / review / playground export written before v4.3.**
The v4.2 playground proved the annotation loop. v4.3 rebuilds it from the ground up against the design-system reference (`plugins/llm-security/playground/llm-security-playground.html`) and brings anchor-rendering to maturity. Dashboard-centric layout, file://-loader (webkitdirectory + drag-drop + URL-parameter), block-boundary anchor placement, screenshots-spor convention, and a hardened test pyramid (Groups A-D).
### Added
- **Dashboard-centric layout** — `fleet-grid` of `fleet-tile` per artifact (brief / plan / research / review) with status badges; click drills into artifact-detail surface. Two-overflate-modell matches voyage's domain (one plugin, no onboarding) — alternative 4-overflate-router rejected per Alternatives Considered.
- **File://-loader with three entry points** — `webkitdirectory` directory-picker (Chromium primary, Firefox 150+ secondary with Windows-bug guard), drag-drop with `webkitGetAsEntry` recursive walk, and URL-parameter `?project=/abs/path` ergonomic shortcut. Path-traversal + symlink/dotfile filter (`isProjectPathSafe`) blocks `..` / `node_modules/` / `dist/` / `build/` / hidden-paths with `aria-live` announces.
- **Anchor-rendering matures** — block-boundary placement with code-fence/table/list-item fallback (Prettier #18066 workaround); browser-side `parseAnchor` mirrors Node-side `lib/parsers/anchor-parser.mjs` regex; numbered-badge gutter + yellow-tint highlight replaces v4.2's pencil-icon; hidden-by-default sidebar-rail with ordered list, filter, and jumplist count; J/K keyboard navigation + Esc dismiss + aria-live announces; two-opacity pattern (active 100% / inactive 40% / resolved 30% strikethrough).
- **A11Y panel built from DS-primitives** (Wave 5 Step 22) — `guide-panel--info` container with `key-stats` severity-grid + `findings__items` ordered list; `wireA11yToggle` couples topbar-button to panel; `window.__voyage.scheduleRender({ a11yViolations })` hook lets Playwright spec populate severities.
- **Screenshots-spor convention** (Wave 5 Step 23) — `window.__voyage` namespace with `navigate(target)`, `scheduleRender({ a11yViolations })`, and `getProjectArtifacts()` methods; `docs/screenshots/README.md` documents manual + Playwright-driven screenshot procedure (defers inline gallery to v4.4).
- **DOMPurify ≥ 3.1.1 vendored** (Wave 5 Step 24) — `playground/lib/dompurify.min.js` (22 KB UMD bundle); `sanitizeAnnotation` wraps user-comment markdown rendering; total bundle 388 KB / 460 KB HALT-gate (72 KB margin). HTML-comment indirect prompt-injection mitigation (Wave 5 Step 25) gates `stripUnsafeComments` via `parseAnchor` allowlist before `md.render`.
- **Voyage-scope DS-tokens** (Wave 0 Step 1) — `--color-scope-voyage` + `badge--scope-voyage` added additively to `shared/playground-design-system/dist/base.css` and re-synced into the playground vendor copy. Other plugins (architect / okr / llm-security / config-audit) re-sync on next playground touch (additive-only change).
- **Test pyramid Groups A-D** (Wave 7) — Group A 17 static-HTML grep tests (SC1 10-element checklist + SC3 webkitdirectory/drag-drop + SC6 export markers + SC7 tag-level no-CDN), Group B 9 structure tests (DS-tokens + theme-toggle + sidebar-tab + keyboard pattern), Group C 7 export-bundle schema + `annotation_digest` SHA-256 validity tests (SC-GAP-3), Group D 4 Playwright e2e tests (light/dark axe-core + pixel-diff baseline + zero-external-network authoritative gate).
- **Playwright + @axe-core/playwright devDeps** + `playwright.config.mjs` — `testDir: 'tests/e2e'`, `snapshotPathTemplate: '{testDir}/snapshots/{arg}{ext}'`, `npm run test:e2e` script. Chromium browser binary installed locally.
- **`tests/fixtures/playground/`** — `v43-export-bundle.json` (canonical export shape) + `v43-plan-pre-annotate.md` (revision: 0 seed with ANN-0001/ANN-0002 anchors).
- **`tests/e2e/snapshots/a11y-baseline.json`** — WCAG-violations delta-baseline (`aria-hidden-focus`, `color-contrast` per `.key-stat--critical__label`); a11y spec compares current against baseline, fails only on NEW or GROWN violations. Actual fix deferred to v4.4 (HTML FROZEN in Sesjon 6).
- **`docs/sc1-checklist-verification.md`** — per-element pass/fail with evidence (Group A test references + manual side-by-side); 8/10 PASS literal + 2/10 PASS-redef (Element 4 onboarding-grid → fleet-grid, Element 6 screenshots-spor → hooks + docs convention) per scope-guardian Assumptions 21+22.
- **`playground/README.md`** — v4.3 architecture, three entry-point usage, `.claude/projects/` discoverability, annotation flow, `window.__voyage` hooks, known limitations (FF150-Win, no FSA, baseline'd WCAG), bundle-size breakdown, test-suite overview.
### Changed
- **Test count: 672 → 711 pass / 0 fail / 2 skipped** (713 total — Groups A/B/C node-test additions from the rebuild and the Sesjon 1318 re-review remediation; +5 Playwright e2e specs run via `npm run test:e2e`).
- **Theme bootstrap IIFE** (Wave 2 Step 6) sets `data-theme="dark"` as default; respects `localStorage('voyage-theme')` → `prefers-color-scheme: dark` matchMedia → fallback `dark`. Theme-toggle button in topbar (Wave 2 Step 7) persists user choice.
- **Page-shell pattern** (Wave 2 Step 9) — `page__eyebrow` + `page__title` + `page__lede` + `page__meta` matches DS reference.
- **Annotation export filename** — `annotated-{target}.md` where target = brief|plan|review|artifact (derived from frontmatter).
### Fixed
- **Browser-side anchor regex** synced with Node-side allowlist (Wave 4 Step 16) — `VOYAGE_ANCHOR_RE` / `VOYAGE_ANCHOR_ID_RE` mirror `lib/parsers/anchor-parser.mjs:20-25`; `parseAnchor` validates ID + intent + line-number per attribute regex.
- **Hard-coded WIP token strings** (Wave 1 Step 4) — voyage tokens normalized to canonical `--color-scope-voyage` + `--ds-color-*` tokens; literal pixel font-sizes replaced with DS scale.
- **Plan-determinism test reference path** (Wave 0 Step 3) — `tests/synthetic/plan-determinism.test.mjs:115` updated to `plan-run-C.md` (alphabetic convention matching A/B).
### Re-review remediation (Sesjon 1318)
After the rebuild, an independent `/trekreview` (Sesjon 13) flagged 11 findings (5 BLOCKER + 5 MAJOR + 1 MINOR). Waves 13 of remediation closed **all 11** with delivered code + tests — notably reinstating an **inline screenshots gallery** (`renderScreenshotGallery`, finding `31d28f65`; supersedes the original "deferred to v4.4" item below), wrapping `renderArtifact`'s `bodyHtml` in `DOMPurify.sanitize` (`1d3591d4`), converting the SC2 a11y spec to absolute zero-violation mode and removing `a11y-baseline.json` (`09132940`), documenting the Phase 9 `plan_critic` frontmatter injection (`906f155d`), and asserting the `.fleet-grid` 4-column CSS parity (`99707f51`). A Sesjon 18 re-review then found 3 **new** findings introduced by the remediation code itself — deferred to v4.3.1 (see *Known issues* below).
### Deferred to v4.4
- WCAG-violations fix (HTML FROZEN in Sesjon 6 per Wave 7 verification-only scope; superseded for the a11y-spec by the absolute zero-violation conversion in the Sesjon 1318 remediation).
- File System Access API (FSA) write-back (currently `Blob`-download only).
- `<project>/design/`-folder traversal.
- `IndexedDB` primary persistence (localStorage stays primary for v4.3).
- Hybrid claude-design-skill → canvas → frontend-design workflow (research/02 deferred to v4.4+).
### Known issues — deferred to v4.3.1 → all resolved in v5.0.0 (moot)
The Sesjon 18 re-review surfaced 3 findings in code the Sesjon 1318 remediation introduced. They were deferred to a planned v4.3.1 patch; **v5.0.0 makes all three moot** by removing the playground entirely:
- **`87069b35` (SECURITY_INJECTION, defense-in-depth)** — `renderScreenshotGallery()` interpolated `screenshots[].dataUrl` raw into an `<img src>`. **Moot in v5.0.0** — `playground/voyage-playground.html` is deleted.
- **`4cc3bfc9` (PLAN_EXECUTE_DRIFT)** — `commands/trekplan.md` Phase 9 used a backtick template literal as an ES `import` specifier (`SyntaxError`). **Moot in v5.0.0** — the Phase 9 `plan_critic`-injection-via-`readAndUpdate` block is deleted.
- **`c6c64a58` (MISSING_TEST)** — no test covered the gallery `dataUrl` injection path. **Moot in v5.0.0** — the gallery and its host file are deleted.
### Notes
- Brief, research (4 briefs), plan, and execute (6 sessions) all produced from the v4.3 pipeline itself. SC11 pipeline-self-eat gate continues to hold.
- Path A/B/C decision (cache-first / sequential `--no-ff` waves / hybrid identical-tool) unchanged from v3.4.0 — Path B remains in production.
- Plan quality score 86/100 Grade A APPROVE_WITH_NOTES (adversarial review Phase 9, 22 revisions documented in plan.md Revisions table). Sesjon 1318 remediation plans reviewed independently; Sesjon 18 re-review verdict BLOCK with 3 findings (now v4.3.1).
## v4.2.0 — 2026-05-09 — Annotation pipeline + first voyage playground
**Additive. No breaking changes. Forward-compat with every brief / plan / review written before v4.2.**
Voyage's first interactive playground. The `/trekrevise` command and the
annotation pipeline (Handover 8) close the operator-feedback loop: render an
artifact in the browser, anchor comments at block boundaries, export a batch,
paste back as `/trekrevise --apply` for in-place revision with audit trail.
### Added
- **`/trekrevise` command** (`commands/trekrevise.md`) — seventh pipeline
command. Phase 1 parse/validate, Phase 2 read source + rollback hygiene,
Phase 3 parse anchors + validate placement, Phase 4 compute revision
diff + digest, Phase 5 atomic apply, Phase 6 post-write integrity check,
Phase 7 optional review-gate, Phase 8 stats + report. Accepts
`--profile`, `--gates`, `--reason`, `--from-file`, `--target`.
- **Handover 8 (annotation → revision)** documented in
`docs/HANDOVER-CONTRACTS.md` (~100 lines). Producer/consumer contract,
schema for the four additive frontmatter fields (`revision`,
`source_annotations`, `annotation_digest`, `revision_reason`), block-level
anchor format, lifecycle, single-iteration MVP rationale.
- **`playground/voyage-playground.html`** — single self-contained HTML file
with vendored `markdown-it` v14.1+ + `highlight.js` (no `marked` per Issue
#3515). Three annotation creation gestures (drag-select, hover-anchor,
click-anchor), modal form with intent taxonomy (change/add/remove/
clarify/risk), sidebar with critique-card-list, export flow with
one-click clipboard copy, A11Y baseline (keyboard nav, ARIA roles,
focus traps).
- **`playground/lib/`** vendored markdown-it + highlight.js + front-matter
plugin, with `VENDOR-MANIFEST.json` recording version + license + sha.
Locked at minimum versions per research-03.
- **`scripts/render-artifact.mjs`** server-side render CLI — emits the same
HTML the playground produces. Used by the SC11 pipeline-self-eat gate
(rendering `.claude/projects/*/brief.md` + `plan.md` exits 0 with
non-empty output).
- **`scripts/vendor-playground-libs.mjs`** vendor-refresh helper — fetches
pinned versions and updates the manifest.
- **`lib/parsers/anchor-parser.mjs`** — pure-function anchor parser:
`parseAnchors`, `addAnchors`, `stripAnchors`, `validateAnchorPlacement`.
Block-boundary discipline (no in-list, no mid-paragraph, no line-start
collisions). Round-trip tested with byte-identical fixture
(`tests/fixtures/annotation/annotation-example.md`).
- **`lib/parsers/annotation-digest.mjs`** — pure-function deterministic
SHA-256 over canonical-sorted `source_annotations`. First 16 hex chars.
Idempotent: same batch → same digest.
- **`lib/util/markdown-write.mjs`** — `serializeFrontmatter` +
`atomicWriteMarkdown` (tmp-file + rename, same crash-safety as
`atomic-write.mjs`).
- **`lib/util/revision-guard.mjs`** — atomic-write rollback guard for
`/trekrevise` Phase 6 round-trip integrity check (plan-critic M4).
Restores byte-identical pre-write file from `*.local.bak` on failure.
- **`docs/annotation-quickstart.md`** — operator-facing 7-step
walkthrough, references `tests/fixtures/annotation/annotation-example.md`
for hands-on verification.
- **`tests/fixtures/annotation/`** — 5 fixtures covering `brief`, `plan`,
`plan-large`, `review`, and the canonical `annotation-example.md`.
- **9 new test files** + extension to `tests/lib/doc-consistency.test.mjs`:
- `tests/lib/markdown-write.test.mjs` (round-trip)
- `tests/parsers/anchor-parser.test.mjs` (parse/add/strip/validate)
- `tests/parsers/annotation-digest.test.mjs` (determinism)
- `tests/validators/{brief,plan,review}-validator-annotation-fields.test.mjs`
(forward-compat: validators tolerate the four optional fields)
- `tests/integration/annotation-roundtrip.test.mjs` (SC2/SC3/SC7
byte-identical + scale)
- `tests/integration/schema-rollback.test.mjs` (SC5b validator-FAIL
rollback)
- `tests/integration/source-annotations.test.mjs` (frontmatter
audit trail)
### Changed
- **Forward-compat policy** documented in `lib/validators/{brief,plan,
review}-validator.mjs` headers — validators silently accept the four
optional annotation fields without bumping `*_version`.
- **`tests/lib/doc-consistency.test.mjs`** extended with ~16 new pins:
Handover 8 section, templates' annotation-field comment blocks,
`playground/` directory, `marked`-ban, `scripts/render-artifact.mjs`,
`lib/util/revision-guard.mjs`, `parseAnchors` round-trip on the example
fixture, `/trekrevise` in CLAUDE.md / plugin README / marketplace root
README, `## v4.2.0` in CHANGELOG, `docs/annotation-quickstart.md`
existence + ≤7 numbered steps + `annotation-example.md` literal
reference.
- **`PIPELINE_COMMANDS` constant** in doc-consistency tests — was 6
(brief/research/plan/execute/review/continue), now 7 (+ revise). Pin
test renamed from "all six pipeline commands" to "all seven pipeline
commands".
- **`settings.json` known-scopes allowlist** — `'trekrevise'` added to
the recognized top-level scope list (was added in Wave 2 Step 6 to
keep tests green pre-Step 12).
- **Templates** (`templates/{plan,trekbrief,trekreview}-template.md`)
prefixed with comment-only documentation block of the four optional
annotation fields. No `*_version` change; existing templates still
parse and validate identically.
### Notes
- **`marked` is banned** from `playground/*` (risk-assessor H1 +
doc-consistency pin). Issue #3515 corrupts content silently after
HTML comments in lists — voyage anchors after step-list would be
invisible. `markdown-it` v14.1+ is the locked renderer.
- **Single-iteration MVP** per research-05. Each operator batch produces
one `revision:` increment. Multi-iteration loops (revise → re-review
→ revise again) are deferred indefinitely; the brief's SC4 wording is
single-revision.
- **No `*_version` bump** for v4.2 — the four new fields are additive.
Brief/plan/review artifacts written before v4.2 validate as
`revision: 0` without migration.
- **Atomic-write contract** for `/trekrevise` — single
`writeFileSync`+`renameSync` (research-05 Option B). Option C
(`atomicWriteJson + Edit-tool` split-write) was inadmissible per
risk-assessor C2: no crash-safe window between frontmatter patch and
body Edit.
- **608 tests pass** (606 → 0 fail → 2 skipped Docker) — baseline
before v4.2 was 490; this release adds ~118 tests.
## v4.1.0 — 2026-05-09
**Additive. No breaking changes. Forward-compat with all v4.0.0 plans.**
Two new feature surfaces: model profiles (`--profile <name>`) and
opt-in OpenTelemetry / Prometheus export at session-end.
### Added
- **Model profile system** (`lib/profiles/{economy,balanced,premium}.yaml`,
`lib/profiles/resolver.mjs`, `lib/validators/profile-validator.mjs`).
Three built-in tiers + custom-yaml support. Lookup order: explicit
`--profile` flag → plan frontmatter `profile:` → `VOYAGE_PROFILE`
env-var → `balanced` default. See `docs/profiles.md` for the decision
tree, custom authoring, and cost estimation disclaimer.
- **`--profile <name>` flag** documented in all 6 pipeline commands
(`commands/trek{brief,research,plan,execute,review,continue}.md`).
- **5 additive profile fields** in JSONL stats (`profile`,
`profile_source`, `phase_models`, `model_used`,
`phase_models_resolved`) for cost-attribution and drift detection.
- **OpenTelemetry / Prometheus export** at session-end via new Stop
hook (`hooks/scripts/otel-export.mjs`, wired in `hooks/hooks.json`).
Strict opt-in via `VOYAGE_EXPORT_MODE`:
- `textfile` — Prometheus exposition format → node-exporter textfile
- `otlp` — OTLP/JSON POST → otel-collector
- `off` (default) — no work done
- **Local Docker Compose stack** at `examples/observability/`
(Prometheus 3.0.1 + node-exporter 1.10.2 + Grafana 11.4.0 + OTel
Collector 0.115.0, all version-pinned per research/01).
- **Operator documentation** at `docs/observability.md` (151 lines:
env-var matrix, security mitigations, limitations, CVE-pinned
minimum versions).
- **Cross-tier Jaccard smoke test**
(`tests/integration/profile-jaccard-smoke.test.mjs`) with parked-
synthetic fixtures and 0.55 conservative starting threshold per
research/02. Empirical recalibration deferred to v4.2.
- **MANIFEST_PROFILE_DRIFT warning** in `plan-validator --strict`
when plan frontmatter `profile:` differs from any step manifest's
`profile_used`. Plan remains valid (warning, not error).
### Changed
- **`lib/parsers/arg-parser.mjs` FLAG_SCHEMA** — additively adds
`--profile <name>` to all 6 commands. Existing flags unchanged.
- **`lib/parsers/manifest-yaml.mjs` schema** — additively adds new
`OPTIONAL_STRING_KEYS` collection with `profile_used` as the first
member. Forward-compat: legacy plans without `profile_used` parse
unchanged; new plans with it round-trip cleanly.
### Fixed
- **Doc-consistency coverage** now spans all 6 pipeline commands
(was 5 — `/trekcontinue` was missing per HIGH risk-assessor).
- **Plan-validator strict mode** detects profile-drift between plan-
level frontmatter and step-level manifests (brief Assumptions
block 7).
### Notes
- Forward-compat: every v4.0.0 plan validates `valid: true` under
v4.1.0's `plan-validator --strict`. No migration needed.
- Tests: 361 baseline → 484 pass + 2 skipped (Docker-dependent).
- Path A/B/C decision (v3.4.0) is unchanged. Path C remains closed.
- v4.2 deferred items: ROUGE-L scoring, char-4gram MinHash, empirical
Jaccard re-calibration, `balanced.external_research_enabled`
operator-override.
## v4.0.0 — 2026-05-05
**Breaking. Rebrand. No migration path.**
The `ultraplan-local` plugin is renamed to **Voyage**. All seven commands
are renamed from `/ultra*-local` to `/trek*`:
- `/ultrabrief-local` → `/trekbrief`
- `/ultraresearch-local` → `/trekresearch`
- `/ultraplan-local` → `/trekplan`
- `/ultraexecute-local` → `/trekexecute`
- `/ultrareview-local` → `/trekreview`
- `/ultracontinue-local` → `/trekcontinue`
- `/ultraplan-end-session-local` → `/trekendsession`
Internal renames: FLAG_SCHEMA keys, stats filenames, env vars, branch
namespace, type discriminators, hook stderr prefixes, settings.json scope
keys all updated to `trek*` form. Plugin identity is now `voyage`.
No backward compatibility. Old artifacts on user disks are orphaned.
Fork-and-own users re-fork from main if they want to follow upstream;
there is no migration helper.
See `TRADEMARKS.md` for the new trademark hygiene notice clarifying
independence from Anthropic.
Out-of-scope handoff (operator follow-up): plugin directory rename
`plugins/ultraplan-local/` → `plugins/voyage/`, marketplace-root
coordination across `../../README.md`, `../../CLAUDE.md`, and
`../../.claude-plugin/marketplace.json`.
## [3.4.1] - 2026-05-04 — `/ultracontinue-local` hot-fix + ultra-cc-architect doc-rydding
Forward-patch on top of v3.4.0. Fixes four bugs in the multi-session
resumption path discovered post-3.3.0 ship, plus a doc-rydding sweep
that generalizes references to the `ultra-cc-architect` plugin (no
longer publicly distributed). Non-breaking: the `architecture/overview.md`
filesystem contract (Handover 3) is preserved for any compatible
producer.
### Fixed
- **Bug 1 — `/ultracontinue-local` Phase 0 + auto-discovery.**
`commands/ultracontinue-local.md` no longer prints a usage block on
bare invocation; it enters auto-discovery (sorts active state files
numerically by `Date.parse(updated_at)`, not lexicographically — fixes
cross-timezone wrong-order). `--help` / `-h` is the only flag that
surfaces the usage block. (Steps 13)
- **Bug 2 — Phase 2 placeholder leakage.** `{state-file-path}`,
`{project-dir}`, `{path-a}` and similar template tokens are gone from
`commands/ultracontinue-local.md` Phase 2 prose. The command now reads
the resolved path inline rather than asking the model to substitute.
(Steps 45)
- **Bug 3 — frontmatter consistency for `NEXT-SESSION-PROMPT.local.md`.**
Producers (`/ultraexecute-local` Phase 8, `/ultraplan-end-session-local`)
now write `produced_by:` + `produced_at:` (ISO-8601) frontmatter on the
prompt file. `/ultracontinue-local` Phase 1.5 cross-checks `produced_at`
against the sibling state file's `updated_at` and refuses to proceed on
inconsistency (`NEXT_SESSION_PROMPT_INCONSISTENT`). Files without
frontmatter remain tolerated (warning, not error) for backwards
compatibility. State-anchored staleness check is the primary signal;
24h wall-clock is a soft warning only. (Steps 68)
- **Bug 4 — `--cleanup` for completed projects.** `/ultracontinue-local
--cleanup <project-dir>` (dry-run by default) and
`/ultracontinue-local --cleanup --confirm <project-dir>` (deletes the
state file + sibling prompt file). Refuses non-completed status with
no force flag. Idempotent — safe to re-run after partial cleanup.
(Steps 910)
- **ESM/CJS regression in `commands/ultraplan-end-session-local.md`.**
Phase 3 now invokes `atomicWriteJson` via `node --input-type=module`
rather than the broken `require()` call. State writes from the helper
command are no longer silently lost on Node 18+. (Collateral in Step 7)
- **`plugin.json` description drift.** "Five-command" → "Six-command";
trailing `ultra-cc-architect` sentence removed. (Step 13)
### Added
- `lib/util/cleanup.mjs` — refuse-unless-completed gate + dry-run +
confirm-aware deleter for state file and sibling prompt file.
- `lib/validators/next-session-prompt-validator.mjs` — frontmatter
validator + state-anchored staleness check + CLI shim.
- `tests/commands/ultracontinue.test.mjs` — new test directory; covers
SC-1 (auto-discovery sort + usage block), SC-2 (.md diagnostic),
SC-3 (path-guard ALLOW + no placeholder), SC-4 (consistency), SC-5
(cleanup wiring).
- `tests/validators/next-session-prompt-validator.test.mjs` — unit
coverage for frontmatter consistency: matching, mismatch, wall-clock,
no-frontmatter, missing-field.
- `tests/lib/cleanup.test.mjs` — cleanup util tests.
- `--cleanup` + `--confirm` flags on `/ultracontinue-local`.
### Changed
- `docs/HANDOVER-CONTRACTS.md` Handover 7 gains a § Lifecycle subsection
documenting producer/consumer arbeidsdeling, the stale-file principle
(operator-invoked `--cleanup`, no auto-cleanup, no force flag), the
`produced_by:` + `produced_at:` frontmatter contract, and idempotency.
- `CLAUDE.md`, `README.md`, `SECURITY.md` — generalize references to
`ultra-cc-architect` plugin to "opt-in upstream architect plugin (not
bundled)". `CHANGELOG.md` historical references are preserved with a
2026-05-04 banner stating the plugin is no longer publicly distributed.
### Tests
- 322 (v3.3.0 baseline) → 358 (post-Wave 4) → 361 (Step 11 doc-pins) →
~363 at release. New tests: arg-parser FLAG_SCHEMA extensions,
ultracontinue auto-discovery + diagnostic + path-guard +
consistency + cleanup wiring, next-session-prompt-validator
frontmatter checks, cleanup util, and 3 new doc-consistency pins
for Handover 7 § Lifecycle + next-session-prompt-validator CLI shim.
## [3.4.0] - 2026-05-04 — Ultra-pipeline speedup
Hardenings + scaffolding to support an autonomy chain from brief approval
through main-merge with parallel-wave execution. Non-breaking: all existing
flags continue to work; `--gates` is opt-in autonomy control. Plan-step
delivery covered by 18 plan-v2 steps split across three Waves.
### Added — autonomy + parallelism
- `lib/util/autonomy-gate.mjs` — autonomy-gate state machine (idle → approved → executing → merge-pending → main-merged) for the brief-to-merge chain (Step 4)
- `lib/review/plan-review-dedup.mjs` — deterministic plan-review dedup helpers reused by Phase 9 inline dedup (Step 5)
- `lib/stats/event-emit.mjs` — single-source stats event emitter for autonomy-gate transitions and main-merge-gate (Step 6 + 12)
- `--gates {open|closed|adaptive}` flag on all four pipeline commands (Step 11). Controls how many autonomy checkpoints surface to the operator. Default `adaptive`.
- `hooks/scripts/post-compact-flush.mjs` — PostCompact hook re-injects session-state after context compaction so multi-session work survives a compaction boundary (Step 13)
### Added — hardenings + defense-in-depth
- `commands/ultraplan-local.md` Phase 8 schema-drift defense — post-generation `plan-validator --strict` run blocks plans containing narrative `Fase`/`Phase` headers or missing Manifest YAML; addresses the documented Opus-4.7 plan/list-emission drift (Step 7, builds on v1.8.0 fix in commit `9ecd669`)
- `commands/ultraplan-local.md` Phase 9 parallelization with inline dedup — single-message multi-Agent dispatch + plan-review-dedup helper (Step 8)
- `commands/ultraexecute-local.md` Phase 2.6 wave-executor — 11 sub-hardenings for plugin-in-monorepo + gitignored-state topology: GIT_OPTIONAL_LOCKS, --max-turns, --max-budget-usd, scoped --allowedTools, --append-system-prompt-file, SHARED_CONTEXT_FILE, SAFETY_PREAMBLE, deferred git push origin, push-before-cleanup ordering, GH #36071 awareness (Step 9)
- `templates/headless-launch-template.md` — mirrors Phase 2.6 hardenings so the headless launcher is consistent with Phase 2.6 prose (Step 10)
- `commands/ultraplan-local.md` + `agents/planning-orchestrator.md` Phase 8 main-merge gate — emit stats event for main-merge gate decisions (Step 12)
### Added — quality infrastructure
- `tests/lib/agent-frontmatter.test.mjs` — pins agent frontmatter invariants (Step 1-test)
- `tests/synthetic/plan-run-A.md` + `plan-run-B.md` + `plan-determinism.test.mjs` — SC7 plan-determinism floor (Jaccard ≥ 0.833) (Step 15)
- `tests/synthetic/review-run-A.md` + `review-run-B.md` + `review-determinism.test.mjs` — SC7 review-determinism floor (Jaccard ≥ 0.833) (Step 15)
- `tests/hooks/path-guard.test.mjs` — pins pre-write-executor BLOCK rules; regression detection if BLOCK_RULES are silently weakened (Step 18)
- `tests/hooks/bash-guard.test.mjs` — pins pre-bash-executor BLOCK rules (Step 18)
- `examples/01-add-verbose-flag/perf-measure.local.sh` (gitignored) — operator-driven SC1 wall-time measurement harness (Step 16)
- `examples/01-add-verbose-flag/perf-baseline.local.md` (gitignored) — SC1 gate template + measurement protocol (Step 16)
### Changed
- `lib/parsers/manifest-yaml.mjs` — schema extended additively to recognize `skip_commit_check` and `memory_write` flags (forward-compat: unknown keys ignored). Used by Step 14 and other steps that legitimately don't produce a git commit (Step 4)
- `.gitignore` — generalized `*.local.md` + `*.local.json` rules into a single `*.local.*` glob, covering `.local.sh` and any future `.local.<ext>` (Step 16)
- `package.json` — `version` 3.3.0 → 3.4.0
- `.claude-plugin/plugin.json` — `version` 3.3.0 → 3.4.0
- `README.md` + `CLAUDE.md` — version refs + `--gates` flag docs + Wave 1+2+3 architecture notes
- Marketplace root `README.md` — ultraplan-local version line bumped 3.3.0 → 3.4.0
### Memory updates (Step 14)
- `~/.claude/projects/-Users-ktg--claude-plugins-marketplaces-ktg-plugin-marketplace/memory/project_ultraplan_opus47_gap.md` — rewritten per Path B (memory truth gate). New body:
(a) historical context referencing commit `9ecd669` (v1.8.0 fix, 2026-04-17),
(b) empirical evidence (5 organic plans 2026-04-18 → 2026-05-01 all clean per research/04 D1),
(c) defense-in-depth added in Step 7 (`commands/ultraplan-local.md` Phase 8),
(d) residual risk surface for plugins NOT using ultraplan-local prompt arch.
- `~/.claude/projects/.../memory/MEMORY.md` — one-liner updated to flag mitigation status. The drift mechanism stays a known model-level risk; the plugin-level mitigation is documented.
These memory edits are recorded here because the files live outside the
repo and have no git checkpoint of their own.
### Architecture decision (Path B not Path C)
Brief offered three architectural options for the speedup work:
- Path A — cache-first (drop --allowedTools per child) — REJECTED. Inverts security model; plugin hooks don't fire reliably in `claude -p` (research/06 GH #36071).
- Path B — sequential `--no-ff` parallel waves with manifest-driven failure recovery — CHOSEN. v3.4.0 ships this.
- Path C — hybrid (cache-warm sentinel + identical-tool parallel) — DEFERRED to v3.5.0 contingent on cache-telemetry data harvested by Step 6's stats events. Requires unverified Q3 (CLAUDE_CODE_FORK_SUBAGENT cache-prefix preservation at 150-250K context).
### Tests
- 265 baseline (post-Wave-2) → ~290+ green after Wave 3 (Steps 15 + 18 add ~25 tests; doc-consistency may add a couple more pin tests)
### Open Questions for v1.1
- Real-LLM determinism measurement — synthetic floor (0.833) is anchored; the foreign-repo run against `examples/01-add-verbose-flag/` for SC1+SC4+SC5 measurement is a separate post-ship session.
- Cache-telemetry harvest — Step 6 stats events emit shapes useful for Path C decision; harvesting + analysis is v1.1 work.
- Marketplace-level autonomy-gate — currently per-plugin scope. Marketplace-wide policy + dashboards are out of scope for this solo project.
## [3.3.0] - 2026-05-01
Adds `/ultracontinue-local` — a zero-friction multi-session resumption command —
plus the contracted **Handover 7 (.session-state.local.json)** that any
session-end mechanism may write. Non-breaking: existing brief / research / plan /
execute / review pipelines are untouched. The state file is the contract;
`/ultracontinue` only reads.
### Added
- `/ultracontinue-local` command — read `.session-state.local.json` and immediately resume the next session in a multi-session ultraplan project (model: opus)
- `/ultraplan-end-session-local` helper — informal multi-session flows write the state file via this command (model: sonnet)
- `lib/validators/session-state-validator.mjs` — schema-v1 validator + CLI shim for `.session-state.local.json` (forward-compat: unknown top-level keys ignored)
- `lib/util/atomic-write.mjs` — `atomicWriteJson(path, obj)` extracted from `pre-compact-flush.mjs` for reuse
- `tests/fixtures/session-state/{valid-in-progress,malformed}.json` — synthetic fixtures
- Handover 7 (.session-state.local.json) — full schema documented in HANDOVER-CONTRACTS.md
- Doc-consistency pins: `Handover 7`, `session-state-validator` CLI shim, `/ultracontinue-local` in CLAUDE.md commands table
- ~22 new tests (163 baseline → 185 green; atomic-write + session-state-validator + doc-consistency pin extensions)
### Changed
- `commands/ultraexecute-local.md` — Phase 8 (canonical end-of-session), Phase 2.55 (pre-flight stop), Phase 4 (entry-condition stop) now write `.session-state.local.json` as a sibling to `progress.json` (Handover 7 producer)
- `hooks/scripts/pre-compact-flush.mjs` — also refreshes `.session-state.local.json` before context compaction (monotonic; never advances state to a non-resumable status)
- `.gitignore` — covers `*.local.json` (state files never enter git)
### Tests
- 163 baseline → 185 green (+22 new)
### Open Questions for v1.1
- graceful-handoff convergence: graceful-handoff v2.2 may dual-write `.session-state.local.json` so a single chat-end mechanism feeds `/ultracontinue`. v3.3.0 ships the contract; convergence is additive.
- `/continue` builtin Claude Code collision (unverified) — `/ultracontinue` prefix mitigates risk; monitor release notes.
## [3.2.0] - 2026-05-01
Adds `/ultrareview-local` as the fifth and final command in the ultra-suite —
independent post-hoc review of delivered code against the brief — and the
contracted **Handover 6 (review → plan)** feedback loop. Non-breaking: existing
brief/research/plan/execute pipelines are untouched.
### Added
- `/ultrareview-local` command — fifth and final command in ultra-suite (independent post-hoc review of delivered code against brief)
- `lib/validators/review-validator.mjs` — schema validator for new `type: ultrareview` artifact
- `lib/review/rule-catalogue.mjs` — 12 canonical rule keys (version-pinned)
- `lib/parsers/finding-id.mjs` — stable SHA1 finding-ID computation
- `lib/parsers/jaccard.mjs` — Jaccard similarity for determinism testing
- 4 new agents: review-orchestrator (opus inline ref), brief-conformance-reviewer, code-correctness-reviewer, review-coordinator (Judge Agent pattern)
- Handover 6 (review → plan) — contracted feedback loop documented in HANDOVER-CONTRACTS.md
- ~43 new tests (109 → ~152 baseline; rule-catalogue + finding-id + jaccard + review-validator + brief-validator extension + arg-parser extension + project-discovery extension + 4 doc-consistency pins + 3 review-determinism integration + 3 source-findings SC3(b) integration)
### Changed
- `brief-validator.mjs` accepts `type: ultrareview` in addition to `ultrabrief` (Handover 6 enabler)
- `/ultraplan-local` consumes `type: ultrareview` brief — extracts findings as plan goals + populates `source_findings` array in plan frontmatter
- `lib/parsers/project-discovery.mjs` discovers `review.md` and supports `'review'` phase
- `lib/parsers/arg-parser.mjs` adds `ultrareview` command to FLAG_SCHEMA
- `hooks/scripts/session-title.mjs` adds `/ultrareview-local` to COMMANDS map
- Severity vocabulary aligned with llm-security: `Critical | High | Medium | Low | Info` (rule-catalogue retains the 4-tier brief vocabulary; 5-tier migration is a v1.1 candidate)
### Tests
- 109 baseline → ~152 green (+43 new)
### Open Questions for v1.1
- Migrate 4-tier severity (BLOCKER/MAJOR/MINOR/SUGGESTION) → llm-security 5-tier (Critical/High/Medium/Low/Info) — research/02 advised but deferred to keep brief contract intact.
- Real-LLM determinism measurement (current SC4 test uses synthetic fixtures).
- SC2 end-to-end false-positive integration test (currently agent-prompt-level only).
## [3.1.0] - 2026-05-01
Quality program: pure quality lift, no scope creep. Built around the
fork-er onramp — anyone cloning this plugin should get value out of
the box. 109 zero-dep tests gate readiness.
### Added — testing & validation infrastructure (Spor 0+1)
- `package.json` with `node:test` runner and zero npm dependencies (`engines.node>=18`)
- `lib/util/{result,frontmatter}.mjs` — Result-shape helpers and a hand-rolled YAML subset parser supporting list-of-dicts (the form used by `must_contain` in real plans)
- `lib/parsers/{plan-schema,manifest-yaml,project-discovery,arg-parser,bash-normalize}.mjs` — primitive parsers
- `lib/validators/{brief,research,plan,progress}-validator.mjs` and `lib/validators/architecture-discovery.mjs` — wrappers over parsers, each with a CLI shim (`if (import.meta.url === ...)`) so they can be invoked from Bash via `node ${CLAUDE_PLUGIN_ROOT}/lib/validators/X.mjs`
- `tests/lib/*` + `tests/validators/*` — 109 tests
- `tests/lib/doc-consistency.test.mjs` — pins agent-table count, command-table coverage, plan_version invariant, and settings.json scope cleanliness; first-red surfaces drift between docs and code
### Changed — wiring (Spor 1)
| Site | Was | Is |
|------|-----|-----|
| `/ultrabrief-local` Phase 4g | implicit trust | `node lib/validators/brief-validator.mjs --json` |
| `/ultraplan-local` Phase 1 | inline frontmatter parse + `test -f` | brief-validator `--soft` + research-validator `--dir` + architecture-discovery |
| `agents/planning-orchestrator.md` Phase 5.5 | three `grep -cE` calls | single `node lib/validators/plan-validator.mjs --strict --json` |
| `/ultraexecute-local` Phase 2.3 (`--validate`) | inline regex | `plan-validator --strict` + `progress-validator` |
Validators default to `strict: false`. Only `--validate` mode and Phase 5.5 use `--strict`. Existing in-flight projects under `.claude/projects/` continue to work.
### Added — handover contracts (Spor 2)
- `docs/HANDOVER-CONTRACTS.md` (~310 lines) — single source of truth for the 5 pipeline handovers (brief→research, research→plan, architecture→plan EXTERNAL, plan→execute, progress.json resume). Per-handover sections: Producer / Consumer / Path conventions / Frontmatter schema / Body invariants / Validation strategy / Versioning / Failure modes. Architecture handover is explicitly an external contract — drift-WARN never drift-FAIL.
### Added — PreCompact resume integrity (Spor 2, P0 fix)
- `hooks/scripts/pre-compact-flush.mjs` — PreCompact-event hook (CC v2.1.105+) that reconciles `progress.json` with git history before context compaction. Atomic write (tmp + rename), monotonic only (current_step never decreases), fail-open (always exit 0). Closes the documented progress.json drift bug from `docs/ultraexecute-v2-observations-from-config-audit-v4.md` — `--resume` now works after long conversations.
- `hooks/hooks.json` registers the PreCompact entry.
### Added — examples (Spor 3)
- `examples/01-add-verbose-flag/` — calibrated end-to-end pipeline demo for a small realistic task (add `--verbose` to a CLI parser). Includes `brief.md`, `research/01-cli-parser-conventions.md`, `plan.md` (7 steps with full manifest YAML), and `progress.json`. All four artifacts pass their respective validators. Hand-calibrated, not LLM-generated, so the example stays reviewable.
- `examples/REGENERATED.md` per example documents calibration date and regeneration triggers.
- `examples/README.md` walks fork-ers through what to study first.
### Changed — plan-critic semantic rubric (Spor 3, P0 fix)
`agents/plan-critic.md` rule #7 split into two parts:
1. **Literal blockers** (exact-string): `TBD`, `TODO`, `FIXME`, `XXX` always fire.
2. **Semantic rubric** (instruction-shaped): 8 deferred-decision tests covering vague modifiers, imperatives without targets, forward references without expansion, volume/quality without spec, edge-cases delegated, production-readiness delegated, path mismatch, and over-stuffed steps.
Calibrated against 5-phrase corpus the v3.0 exact-string blacklist missed: "implement as needed", "wire it up", "make it production-ready", "add tests where appropriate", "handle edge cases" — all five now flagged with rule citations.
### Added — CC v2.1.x feature adoption (Spor 3)
- **F8** — `MCP_CONNECTION_NONBLOCKING=true` documented under "Headless multi-session tuning" in README (CC v2.1.89+) for parallel `claude -p` sessions
- **F9** — `hooks/scripts/session-title.mjs` (UserPromptSubmit, CC v2.1.94+) sets `ultra:<command>:<slug>` titles for ultra invocations
- **F3** — `hooks/scripts/post-bash-stats.mjs` (PostToolUse, CC v2.1.97+) appends `duration_ms` per Bash call to `${CLAUDE_PLUGIN_DATA}/ultraexecute-stats.jsonl`
- **F12** — `disableSkillShellExecution: true` recommended in README "Security hardening" + `SECURITY.md` (CC v2.1.91+) for fork-ers handling untrusted plans
**Deferred:** F2 (hook `if`-field scoping). The plan called for scoping pre-bash-executor / pre-write-executor to ultraexecute sessions to "reduce false-positives in brief/plan" — but brief/plan don't issue Bash commands that match the destructive denylist, so the rationale doesn't hold. Universal protection wins.
### Added — security & extension docs (Spor 3)
- `SECURITY.md` — Forgejo private-issue reporting, supported = current minor only, scope (4 hooks + denylist), hardening recommendations
- `docs/architect-bridge-test.md` — manual smoke checklist for the ultraplan ↔ ultra-cc-architect bridge (1 paragraph, intentionally not CI)
- `README.md` — new "Extending the plugin" section: how to add an agent, switch the planning model, disable external research, find the data contract, disable the architect bridge
### Removed — vestigial config (Spor 0)
- `settings.json` `exploration` and `agentTeam` blocks (read by zero code; verified via grep before deletion)
- `docs/ultra-suite-brief_2.md` archived as `_archive-` (was paste from another plugin's work)
### Tests
109 tests, all green. `npm test` is the fork-readiness gate.
### Hook table (after v3.1.0)
| Hook | Event | CC version | Purpose |
|------|-------|-----------|---------|
| pre-bash-executor.mjs | PreToolUse(Bash) | 2.0+ | Block destructive shell commands |
| pre-write-executor.mjs | PreToolUse(Write) | 2.0+ | Block writes to sensitive paths |
| session-title.mjs | UserPromptSubmit | 2.1.94+ | Set `ultra:<cmd>:<slug>` titles |
| post-bash-stats.mjs | PostToolUse(Bash) | 2.1.97+ | Log Bash `duration_ms` to JSONL |
| pre-compact-flush.mjs | PreCompact | 2.1.105+ | Reconcile progress.json with git history |
### Files changed
This release is plugin-internal — no breaking changes to artifact formats or CLI surface. Forkers should `npm test` after pulling to confirm readiness.
## [3.0.0] - 2026-04-30
_Note (2026-05-04): the ultra-cc-architect plugin is no longer publicly distributed. The architecture/overview.md filesystem contract remains supported but no public producer ships._
### Architect extracted to its own plugin
The `/ultra-cc-architect-local` and `/ultra-skill-author-local` commands, all
seven of their agents, the `cc-architect-catalog` skill, the `ngram-overlap.mjs`
script, and the skill-factory test fixtures moved out of `ultraplan-local` and
into the new `ultra-cc-architect` plugin (v0.1.0).
### Why
`ultraplan-local` had drifted into containing two distinct domains:
1. A **universal planning pipeline** (brief → research → plan → execute) that
is technology-agnostic and works for any implementation task.
2. A **Claude-Code-specific architecture phase** that only makes sense when
building features for Claude Code itself.
Keeping them in one plugin caused three problems:
- Users who wanted only the planning pipeline had to clone an unfinished
CC-feature catalog and seven architect/skill-author agents they would
never invoke.
- The architect catalog (~11 seed skills) and the planning pipeline lived on
different release cadences. Architect work blocked pipeline development
and vice-versa.
- New users saw six commands when only four belonged to the core flow.
The architect was already marked `optional, v2.2` and was fully decoupled at
the code level — only one filesystem touchpoint remained: `/ultraplan-local`
auto-discovers `architecture/overview.md` if present, and gracefully handles
its absence. The split is therefore non-breaking for the planning flow.
### What moved to `ultra-cc-architect`
- **Commands:** `/ultra-cc-architect-local`, `/ultra-skill-author-local`
- **Agents:** `architect-orchestrator`, `feature-matcher`, `gap-identifier`,
`architecture-critic`, `skill-author-orchestrator`, `concept-extractor`,
`skill-drafter`, `ip-hygiene-checker`
- **Skills:** `skills/cc-architect-catalog/` (13 files)
- **Scripts:** `scripts/ngram-overlap.mjs`, `scripts/ngram-overlap.test.mjs`
- **Test fixtures:** `tests/fixtures/skill-factory/`,
`tests/fixtures/skill-drafter/`
All moves used `git mv`, so history follows the files into the new plugin.
### What stayed unchanged in ultraplan-local
- `/ultraplan-local` Phase 1 still auto-discovers `architecture/overview.md`.
The discovery is filesystem-based, not plugin-based — installing both
plugins gives you the full pipeline (brief → research → architect → plan
→ execute).
- `agents/planning-orchestrator.md` retains its architecture-note
cross-reference.
- All other commands (`/ultrabrief-local`, `/ultraresearch-local`,
`/ultraexecute-local`) are untouched.
### Migration
If you only used `/ultrabrief-local`, `/ultraresearch-local`,
`/ultraplan-local`, and `/ultraexecute-local`: no action needed. Update the
plugin and continue.
If you used `/ultra-cc-architect-local` or `/ultra-skill-author-local`:
install the new plugin alongside this one. In `~/.claude/settings.json`:
```json
{
"enabledPlugins": {
"ultraplan-local@ktg-plugin-marketplace": true,
"ultra-cc-architect@ktg-plugin-marketplace": true
}
}
```
Custom seed skills you added to `cc-architect-catalog/` follow with the
catalog. Use `git log --follow` if you need to track them in the new
location.
### plugin.json changes
- `version`: `2.4.0` → `3.0.0`
- `description`: now describes a four-command pipeline; CC-feature matching
is described as living in the separate `ultra-cc-architect` plugin
- `keywords`: removed `cc-architecture`
## [2.4.0] - 2026-04-19
### Breaking change — background mode removed
Default mode for `/ultraplan-local`, `/ultraresearch-local`,
`/ultra-cc-architect-local`, and auto-mode in `/ultrabrief-local` is now
foreground. The command blocks the session until the brief/plan is ready.
### Why
Background mode promised a swarm of specialized agents (docs-researcher,
community-researcher, architecture-mapper, plan-critic, feature-matcher,
gap-identifier, …) spawned by a background orchestrator. It did not work:
the Claude Code harness does not expose the Agent tool to sub-agents
(including ones run with `run_in_background: true`). The orchestrator
silently degraded to inline reasoning in a single Opus context, without
WebSearch, Tavily, WebFetch, or Gemini. Briefs were tagged "high confidence"
but were built on guesses from training data.
Source: github.com/anthropics/claude-code/issues/19077
Empirically confirmed in a plugin investigation on 2026-04-19: a probe
sub-agent reported its exposed tools as `Bash, Edit, Glob, Grep, Read,
Skill, ToolSearch, Write`. The Agent tool was not present, active or
deferred. Foreground execution from the main context spawns sub-agents
correctly (docs-researcher with Tavily intact).
### What changes
- Default execution: foreground for all four commands.
- `--fg` flag is preserved as a no-op alias (backward compatibility).
- Orchestrator agent files (`agents/research-orchestrator.md`,
`agents/planning-orchestrator.md`, `agents/architect-orchestrator.md`)
are redefined from "background executor" to "inline reference" — the
command markdown itself runs the phases in the main context where the
Agent tool is available.
- `ultrabrief-local` auto-mode now runs research sequentially inline
instead of parallel background. The ANTHROPIC_API_KEY billing check
is removed (irrelevant for foreground runs on subscription).
- `skills/cc-architect-catalog/background-agents-reference.md` has been
corrected: Shape A (orchestrator handoff) is now documented as an
anti-pattern, nested spawn from a sub-agent is NOT SUPPORTED.
### For headless/long-running runs
Use `claude -p` in a separate terminal window. Each headless session has
its own main context with the Agent tool, so the swarm works correctly.
### Migration
No code changes required for users. Scripts or documentation that assume
background execution must be updated — the commands now block until done.
## [2.3.2] - 2026-04-18
### Fixed — skill-drafter slug-collision hint
`skill-drafter` now checks for an existing file at
`{catalog_root}/<slug>.md` before writing its draft to `.drafts/`.
When a collision is detected, the agent prepends a warning block to
its confirmation output showing the overwrite risk and a suggested
qualified slug derived from the concept handle. The draft is still
written to `.drafts/<slug>.md` — the check is a hint, not a block.
**Why.** v2.3.0 dogfood surfaced the risk (logged as
`post_dogfood_findings[0]` in that run's `progress.json`): when the
drafter produced `.drafts/hooks-pattern.md` with an existing approved
`hooks-pattern.md` seed present at the catalog root, the pipeline
gave no signal that manual `mv` during promotion would silently
overwrite the seed. The v2.3.1 qualified-slug convention gave us the
mechanism to resolve collisions, but `skill-drafter` still didn't
surface them at the right moment — before promotion, not after.
**Changes.**
- `agents/skill-drafter.md` — new Step 2 "Check for slug collision at
the catalog root" between slug computation (Step 1) and reading the
source (Step 3). Subsequent workflow steps renumbered 3→7. New
"Suggesting a qualifier" guidance derives a kebab-case qualifier
from the `concept` field (or source basename as fallback). Output
format gains a `Collision:` field (`none | approved | pending |
auto-merged | soft`) and an optional warning block when the
collision is non-none. New Hard Rule "Slug-collision pre-write
check".
- `tests/fixtures/skill-drafter/slug-collision-expected.md` — new
reference fixture documenting the expected confirmation-output
shape across four scenarios (no collision, approved collision,
soft pending collision, collision with no good qualifier).
Skill-drafter is prompt-driven and not auto-tested; the fixture
anchors the shape for human verification and downstream parsers.
**Non-breaking.** No changes to `.drafts/` layout, frontmatter
contract, tool scope, or filename regex. Existing pipelines see an
extra field (`Collision:`) and an optional warning block — both
purely additive. No version-gated changes in
`skill-author-orchestrator` or `ip-hygiene-checker`.
## [2.3.1] - 2026-04-18
### Added — Qualified slug convention for cc-architect-catalog
Catalog files now follow `<cc_feature>[-<qualifier>]-<layer>.md`. The
unqualified slug (e.g., `hooks-pattern.md`) remains the canonical
baseline for a `(feature, layer)` pair. Qualified slugs (e.g.,
`hooks-observability-pattern.md`) cover specific sub-patterns without
displacing the baseline.
**Why.** v2.3.0 dogfood surfaced a design gap: the skill-factory
produced a draft `hooks-pattern.md` from a specialized source (progressive-
alert observability) that collided with the existing generic `hooks-pattern.md`
seed. Promoting would have replaced the general pattern with a narrow
one; discarding would have lost real catalog growth. Qualified slugs
resolve this by letting one feature host multiple named patterns at
different abstraction levels.
**Changes.**
- `skills/cc-architect-catalog/SKILL.md` — slug convention section added;
coverage table gains "qualified patterns" column; matcher logic
documented for N patterns per feature; modification rules cover
qualified-vs-canonical choice and slug-collision handling.
- `agents/feature-matcher.md` — catalog map is now
`cc_feature → {layer → [skills]}`; new "Selecting among multiple
patterns per feature" section (baseline by default, qualified when
justified, multiple when non-overlapping, never purely cosmetic);
`supporting_skill` accepts one-or-more skill names.
- `agents/gap-identifier.md` — adds `pattern_count[cc_feature]` signal
to the catalog coverage audit.
- `agents/architecture-critic.md` — adds supporting-skill verification:
every cited skill name must exist in the catalog; blocker severity.
- First qualified skill: `hooks-observability-pattern.md` (promoted from
`.drafts/`, sourced from `ai-psychosis/README.md`, ngram-overlap 0.01,
review_status approved).
**Non-breaking.** Existing unqualified slugs keep working. No changes to
`cc_feature` taxonomy. Hallucination gate unchanged (still validates
against `cc_feature` values, not slugs).
## [2.3.0] - 2026-04-18
### Added — Skill-factory Fase 1 MVP (`/ultra-skill-author-local`)
Manual one-skill-at-a-time generator for the `cc-architect-catalog`.
Channel 2 of the skill-factory strategy: a curated local source enters,
one draft skill exits in `skills/cc-architect-catalog/.drafts/`, with
n-gram containment scored against the source and stamped into the
draft frontmatter (or the draft is deleted when overlap is too high).
**Why now.** `/ultra-cc-architect-local` (v2.2.0) enforces a
hallucination gate that only permits feature proposals backed by the
catalog. With 10 seed skills covering 8 features × 2 layers, the
`feature-matcher` rarely finds a match and silently produces empty
proposals. Fase 1 unblocks catalog growth without spinning up
automation: one source, one draft, manual review, manual `mv` for
promotion.
**Pipeline.** Sequential, no retry, no parallelism:
```
/ultra-skill-author-local <source>
→ concept-extractor (sonnet, JSON output, gap-class C/D + cc_feature gate)
→ skill-drafter (sonnet, .drafts/<slug>.md with 9-field frontmatter)
→ ip-hygiene-checker (sonnet, runs scripts/ngram-overlap.mjs)
verdict accepted/needs-review → stamp ngram_overlap_score
verdict rejected → rm draft (no preservation)
```
**IP-hygiene utility.** Pure Node stdlib. Word-5-gram containment
similarity (asymmetric draft⊆source) plus longest-consecutive-shingle-
run secondary signal. Verdict bands: accepted (<0.15 AND <8),
needs-review (mid), rejected (≥0.35 OR ≥15). Short-text fallback to
n=4 when min(words) <500. CLI emits JSON.
**Calibration fixtures.** Three source/draft pairs in
`tests/fixtures/skill-factory/` pin the verdict bands against
representative prose: accepted (containment 0.014), needs-review
(0.211), rejected (0.676). Re-verify any threshold change against
these fixtures.
**New files:**
- `commands/ultra-skill-author-local.md`
- `agents/skill-author-orchestrator.md` (opus)
- `agents/concept-extractor.md` (sonnet)
- `agents/skill-drafter.md` (sonnet)
- `agents/ip-hygiene-checker.md` (sonnet)
- `scripts/ngram-overlap.mjs` + `scripts/ngram-overlap.test.mjs`
- `skills/cc-architect-catalog/.drafts/.gitkeep`
- `tests/fixtures/skill-factory/{source,draft}-{accepted,needs-review,rejected}.md`
- `tests/fixtures/skill-factory/README.md`
**Non-goals (explicit, Fase 1):**
- No automation, cron, or watcher
- No CC changelog diffing or auto-research
- No batch processing or review command
- No decision-layer skills (cross-feature comparison is Fase 2+)
- No URL or remote sources — local files only
- Manual `mv` from `.drafts/` to catalog root is the promotion mechanism
**New stats file:**
`${CLAUDE_PLUGIN_DATA}/ultra-skill-author-local-stats.jsonl`.
## [2.2.0] - 2026-04-18
### Added — `/ultra-cc-architect-local` optional pipeline step
New optional command that sits between `/ultraresearch-local` and
`/ultraplan-local`. Reads the task brief plus any research briefs, matches the
task against available Claude Code features (Hooks, Subagents, Skills, Output
Styles, MCP, Plan Mode, Worktrees, Background Agents), and produces an
**architecture note** with brief-anchored rationale and an explicit coverage-
gap section.
Pipeline position (5-steg):
```
/ultrabrief-local → /ultraresearch-local → /ultra-cc-architect-local (optional)
→ /ultraplan-local → /ultraexecute-local
```
The architecture note is designed as *priors* for planning, not mandates —
`/ultraplan-local` still runs its own exploration and may override proposals
with evidence.
**New files:**
- `commands/ultra-cc-architect-local.md` — command entry point (7 faser:
parse → background → read inputs → feature matching → synthesize → review
→ present).
- `agents/architect-orchestrator.md` (opus) — background orchestrator.
- `agents/feature-matcher.md` (sonnet) — matches CC features against brief +
research, with brief-anchored rationale per feature and a documented fallback
minimum list when the catalog is empty.
- `agents/gap-identifier.md` (sonnet) — emits issue-ready gap drafts (no
auto-posting; no auto-generation).
- `agents/architecture-critic.md` (sonnet) — adversarial review with a
hallucination gate (features outside catalog + fallback list → blocker).
- `skills/cc-architect-catalog/SKILL.md` — manifest + frontmatter contract.
- 10 seed skills in `skills/cc-architect-catalog/`:
`hooks-reference`, `hooks-pattern`, `subagents-reference`, `subagents-pattern`,
`skills-reference`, `output-styles-reference`, `mcp-reference`,
`plan-mode-reference`, `worktrees-reference`, `background-agents-reference`.
All handwritten (no third-party content copied, per brief §4.4).
**New flags:**
- `--project <dir>` (required) — reads `{dir}/brief.md` + `{dir}/research/*.md`,
writes to `{dir}/architecture/`.
- `--fg` — foreground execution.
- `--quick` — skip adversarial review.
- `--no-gaps` — do not write `gaps.md` (gap-section remains inside
`overview.md`).
**New stats file:** `${CLAUDE_PLUGIN_DATA}/ultra-cc-architect-local-stats.jsonl`.
### Changed — `/ultraplan-local` auto-discovers architecture notes
Brief §5 of the `ultra-cc-architect` design said "no changes to existing
commands". This release makes one permitted, documented exception:
`/ultraplan-local` now auto-discovers `{project_dir}/architecture/overview.md`
when running in project mode. The file is additive context — missing file
produces no error, and behavior when the file is absent is identical to
v2.1.0. Non-breaking.
**Minimal edits:**
- `commands/ultraplan-local.md` — Phase 1 `--project` branch sets
`has_architecture_note` / `architecture_note_path` when the file exists.
Phase 3 launch prompt passes `Architecture note: {path or "none"}` to the
orchestrator.
- `agents/planning-orchestrator.md` — Input section documents the new
optional field. Phase 4 synthesis cross-references proposed features with
exploration findings and carries the note's Coverage gaps into Risks and
Mitigations when relevant.
### Non-goals (explicit)
- **Skill-factory is not part of this release.** Cataloging, n-gram
computation, auto-generation from CC changelog, concept extraction from
reference repos — all deferred to a separate development process. Together
with v2.2.0's architect command, that process will eventually land as v3.0.
- No `/ultra-auto` chaining.
- No auto-creation of Forgejo/GitHub issues from gap drafts.
- No changes to `/ultrabrief-local`, `/ultraresearch-local`, or
`/ultraexecute-local`.
## [2.1.0] - 2026-04-18
### Changed — Dynamic, quality-gated interview in `/ultrabrief-local`
The Phase 3 interview is no longer a hardcoded Q1Q8 list with a numeric
cap (34 questions in `--quick`, 58 in default). It is now a
**section-driven completeness loop**: the command maintains per-section
state (Intent, Goal, Success Criteria, Research Plan, and five optional
sections), picks the next question from the section with the weakest
signal, and keeps probing until all four required sections meet an
initial-signal gate. Quality drives the loop, not a counter.
Phase 4 adds a **draft → brief-reviewer → revise** loop. The brief is
drafted in memory, written to `brief.md.draft`, reviewed by the
`brief-reviewer` agent as a stop-gate, and only renamed to `brief.md`
after all five dimensions pass (`completeness/consistency/testability/
scope_clarity ≥ 4` and `research_plan == 5`). If the gate fails, a
targeted follow-up is generated from the weakest dimension's detail
field and the draft is re-reviewed. The loop is capped at 3 review
iterations to bound cost; exhaustion writes the brief with
`brief_quality: partial` and an explicit `## Brief Quality` section.
Force-stop path: when the user says "stop" during Phase 4, the current
review findings are surfaced with per-dimension scores before asking
whether to continue or accept a partial brief. No silent exits.
Not breaking. The `/ultrabrief-local [--quick] <task>` interface is
unchanged from the outside; only internals change. `--quick` now means
"start compact, escalate if gates fail" rather than "max 4 questions".
### Added
- **JSON output from `brief-reviewer`** — the agent now emits a final
fenced `json` block with per-dimension `score` (15) and `detail`
arrays (`gaps`, `issues`, `weak_criteria`, `unclear_sections`,
`invalid_topics`) alongside the existing prose report. The JSON block
is mandatory; empty arrays and `score: 5` are required when a
dimension passes cleanly. `planning-orchestrator` continues to use
the prose verdict unchanged.
- **`brief_quality` frontmatter field** on task briefs — `complete`
(default) when the Phase 4 gate passed, or `partial` when the
iteration cap was hit or the user force-stopped with known issues.
`planning-orchestrator` can inspect this to decide how heavily to
weight brief sections as assumptions.
- **`review_iterations` and `brief_quality` in ultrabrief-stats** —
recorded per run for telemetry.
### Changed
- Hard rule added: `/ultrabrief-local` never writes `brief.md` while the
review gate is pending. The draft lives in `brief.md.draft` until the
loop terminates.
- Hard rule added: no hard cap on Phase 3 questions; the brief-review
gate is the only loop bound (3-iteration cap) and is in Phase 4.
## [2.0.0] - 2026-04-18
### Breaking — Four-command pipeline with dedicated brief step
v2.0.0 introduces `/ultrabrief-local` as a first-class step in the pipeline.
The interview previously embedded inside `/ultraplan-local` has been extracted
into a dedicated command that produces a structured **task brief** — the
contract between user intent and planning. `/ultraplan-local` now requires
a brief as input and no longer conducts interviews.
All artifacts converge into one **project directory**:
`.claude/projects/{YYYY-MM-DD}-{slug}/` contains `brief.md`, `research/NN-*.md`,
`plan.md`, `sessions/`, and `progress.json`. `--project <dir>` works across
`/ultraresearch-local`, `/ultraplan-local`, and `/ultraexecute-local`.
See [MIGRATION.md](MIGRATION.md) for v1 → v2 upgrade guide.
### Breaking changes
- **`/ultraplan-local` requires `--brief <path>` or `--project <dir>`.** Running
without either exits with an error and a pointer to `/ultrabrief-local`.
- **`/ultraplan-local --spec <path>` is removed.** Convert specs to briefs by
adding `## Intent` and `## Research Plan` sections (see MIGRATION.md).
- **Interview inside `/ultraplan-local` is removed.** Planning no longer asks
questions — all intent must be captured in the brief upstream.
- **`spec-reviewer` agent renamed to `brief-reviewer`** with a new 5th dimension
(Research Plan validity). Old spec-reviewer file is deleted.
### Added
- **`/ultrabrief-local` command** — interactive interview (3-8 questions with
adaptive depth) that produces a task brief with explicit research plan.
Features: project directory creation, research topic identification with
copy-paste-ready `/ultraresearch-local` commands, optional auto-orchestration
(Claude runs research + plan in foreground), and stats tracking.
- **`templates/ultrabrief-template.md`** — canonical task brief format with
`## Intent`, `## Goal`, `## Non-Goals`, `## Constraints / Preferences / NFRs`,
`## Success Criteria`, `## Research Plan` (N topics with research_question,
scope, confidence, cost), and `## Open Questions / Assumptions`.
- **`brief-reviewer` agent** — renamed from spec-reviewer with a new
5th dimension: Research Plan validity (each topic has a valid
research question ending in `?`, has `Required for plan steps:` and
`Confidence needed:`, and research files exist when `auto_research: true`).
- **`--project <dir>` flag** on `/ultraresearch-local`, `/ultraplan-local`,
and `/ultraexecute-local`. Single directory holds the full pipeline's
artifacts. `/ultraresearch-local --project` auto-increments
`{dir}/research/NN-slug.md`.
- **Two-kinds-of-briefs terminology** documented across README and CLAUDE.md:
"task brief" (from `/ultrabrief-local`) vs "research brief" (from
`/ultraresearch-local`). Prefix used consistently in agent prompts and docs.
- **MIGRATION.md** — step-by-step guide for upgrading v1 projects to v2.
### Changed
- **`planning-orchestrator`** now accepts `Brief file:` input instead of
`Spec file:`. Intent→Context mapping: brief's `## Intent` + `## Goal` feed
the plan's Context section directly (structured, no inference needed).
Phase 1b now uses `brief-reviewer` instead of `spec-reviewer`. With
`Project dir:` in input, writes plan to `{dir}/plan.md`.
- **`/ultraresearch-local`** supports `--project <dir>` with auto-indexed
output path (`{dir}/research/NN-slug.md`, where NN is the next available
two-digit index).
- **`/ultraexecute-local`** supports `--project <dir>`. Reads `{dir}/plan.md`,
writes progress to `{dir}/progress.json`.
- **plugin.json** description rewritten to reflect four-command pipeline.
### Removed
- `/ultraplan-local --spec <path>` flag. Spec files are not a valid input for
v2.0+. Convert to brief via `/ultrabrief-local` or manual conversion (see
MIGRATION.md).
- Interview Phase in `/ultraplan-local` (was Phase 2). Use `/ultrabrief-local`
to conduct the interview upstream.
- `agents/spec-reviewer.md` file. Replaced by `agents/brief-reviewer.md`.
### Rationale
The v1.x interview inside `/ultraplan-local` conflated two concerns: capturing
intent and producing an executable plan. Briefs and plans have different
lifecycles — a brief should be reviewable and editable before any research or
planning starts, because every downstream decision traces back to it. Extracting
the brief into its own command makes the pipeline more honest: the brief is the
source of truth for *what we want*, research briefs are sources of truth for
*what we learned*, and the plan is the contract for *how we'll do it*. Separating
these makes each artifact reviewable on its own terms and enables deterministic
re-planning from the same brief when research reveals new constraints.
The explicit `## Research Plan` section in briefs closes a common gap: plans
were implicitly assuming knowledge that neither the user nor Claude had verified.
Research topics are now declared upfront, scoped, and traceable back to plan
decisions.
## [1.8.0] - 2026-04-17
### Opus 4.7 prompt literalism — closing the schema-drift gap
Opus 4.7 reads agent instructions more literally than 4.6 (per 4.7 system
card §6.3.1.1). The v1.7 planning-orchestrator described the Step+Manifest
schema via prose + procedural rules ("read the template"), which 4.6
inferred correctly but 4.7 sometimes rendered as narrative "Fase N" prose.
The result: plans that executed cleanly on 4.6 were rejected by
ultraexecute Phase 2 parsing on 4.7 — first observed during v6.2.0 planning
for `llm-security`. v1.8.0 closes the gap by replacing prose rules with a
literal copyable template, explicit forbidden-format clauses, and a
pre-handoff schema self-check.
### Added
- **Inline literal Step+Manifest template** in `planning-orchestrator`
Phase 5 — a complete, copyable example (JWT middleware step) replaces
"read the template" prose. Removes ambiguity about heading format,
field order, and manifest YAML structure.
- **Forbidden heading-format clause** in Phase 5 — explicit denylist for
`## Fase N`, `### Phase N`, `### Stage N`, and other narrative formats
the executor cannot parse. Negative constraints land harder on 4.7.
- **Phase 5.5 schema self-check** in `planning-orchestrator` — after
writing the plan, grep-verify canonical `### Step N:` count matches
`manifest:` count, and narrative heading count is zero. Rewrite plan
if self-check fails, before handing to plan-critic.
- **`--validate` mode** in `ultraexecute-local` — schema-only check that
parses steps and manifests, reports `READY | FAIL` with specific
error hints, and exits without security scan or execution. Intended
as a fast sanity-check between `/ultraplan-local` and full execution:
```bash
/ultraplan-local "task"
/ultraexecute-local --validate <plan>.md # READY or actionable FAIL
/ultraexecute-local <plan>.md # full execution
```
### Changed
- `planning-orchestrator` Phase 5 now embeds the canonical Step template
inline (~60 lines of literal example) rather than referring to
`templates/plan-template.md`. Template file remains authoritative for
cross-referencing but is no longer load-bearing for plan generation.
- `ultraexecute-local` Phase 2.3 added as a hard exit point for
`--validate` mode; Phase 2.4 security scan explicitly skips this mode.
### Rationale
v1.7.0's self-verifying chain assumed the orchestrator reliably produces
the v1.7 schema. That held on 4.6. v1.8.0 makes the assumption robust to
4.7-style literal interpretation by moving from "describe the format" to
"show the exact format and forbid alternatives", plus a self-check loop
before human-visible output. Pairs with `--validate` as a user-facing
verification step that catches any residual drift before execution side
effects begin.
## [1.7.0] - 2026-04-12
### The self-verifying plan chain
Wave 1 of a parallel 6-session build revealed three failure modes: (1) a
session reported `status=completed` after only 2/5 steps — last tool call
was an arbitrary file review, not a completion check; (2) 3/6 sessions
had push blocked inside the sub-agent bash sandbox *after* all work was
done; (3) plans and blueprints were prose, so the orchestrator had no
machine-readable way to verify completion. v1.7.0 closes all three by
making the plan itself an executable contract.
### Added
- **Per-step verification manifest** in plan format (`plan_version: 1.7`).
Every step now ends with a YAML `manifest:` block declaring
`expected_paths`, `min_file_count`, `commit_message_pattern`,
`bash_syntax_check`, `forbidden_paths`, `must_contain`. The manifest is
the objective completion predicate — the Verify command is necessary but
not sufficient.
- **Plan-critic dimension 10 — Manifest quality (hard gate).** Missing
or invalid manifest (unparseable regex, path contradiction, missing
block) is a `major` finding. v1.6 plans get a legacy-mode warning
instead of a block.
- **Session Manifest aggregate** in session specs — synthesized by
`session-decomposer` as the union of per-step manifests. Gives
`ultraexecute-local` a single YAML block per session to audit against.
- **Step 0: Sandbox pre-flight** — obligatory first step in every
generated session spec. Runs `git push --dry-run origin HEAD`; exit 77
= sandbox cannot push, session status becomes `blocked` (not `failed`),
no real work attempted. Escape hatch: `ULTRAEXECUTE_SKIP_PREFLIGHT=1`.
- **Launch script pre-flight** — `headless-launch-template.md` adds a
`git push --dry-run` check outside the sandbox, before any session
spawns, catching credential issues at the earliest possible point.
- **Phase 7.5 — Manifest audit (independent).** After all steps complete,
`ultraexecute-local` re-verifies expected paths, commit count, commit
message patterns, bash syntax, and forbidden-path untouched-ness from
git log and filesystem. Agent's own bookkeeping is ignored. Disagreement
with progress file → status overridden to `partial`.
- **Phase 7.6 — Recovery dispatch (bounded).** When Phase 7.5 detects
drift in multi-session parent context, synthesize a temp session spec
containing only missing steps and dispatch via existing
`claude -p "/ultraexecute-local --session N"`. `recovery_depth ≤ 2`
hard cap — third drift escalates to user.
- **Hard Rule 17: Manifest is the completion predicate.** A step may
not be marked passed if its manifest does not verify, regardless of
Verify's exit code.
- **Hard Rule 18: Last-activity rule.** Executor's final tool call
before Phase 8 must be a manifest check, never an arbitrary file
review. Prevents hallucinated completion.
### Changed
- **Plan template (`templates/plan-template.md`)** — adds
`plan_version: 1.7` metadata line, `Manifest:` field on every step,
"Manifest — objective completion predicate" section.
- **Plan-critic scoring** rebalanced: Headless readiness 0.15 → 0.10,
Manifest quality 0.05 added. Legacy v1.6 plans skip the Manifest
dimension and keep Headless readiness at 0.15.
- **Planning-orchestrator Phase 5** adds "Manifest generation rules
(REQUIRED for every step)" with mechanical derivation from `Files:`
and Checkpoint. Validates regex compilation and path existence before
handoff to plan-critic.
- **Session-decomposer** parses plan manifests and propagates them
verbatim into session specs. For v1.7+ plans with missing manifests:
abort with pointer to failing step. For legacy v1.6 plans: synthesize
minimal manifests and flag `legacy_synthesis: true`.
- **ultraexecute-local Phase 2** parses manifest YAML. Ugyldig YAML =
abort with pointer to step. v<1.7 plans: synthesize + log
`legacy_plan: true`.
- **ultraexecute-local Phase 6** — sub-step D renamed to D1 "Command
verification"; new D2 "Manifest verification" runs after D1 with 5
checks. F "Checkpoint" adds `checkpoint_drift` logging when HEAD
message doesn't match `commit_message_pattern` (non-fatal).
- **Phase 8 report** — table gets Manifest column; JSON summary adds
`plan_version`, `manifest_audit`, `drift_details`, `recovery_dispatched`,
`recovery_depth`, `legacy_plan`. Result vocabulary strict:
`completed | partial | blocked | failed | stopped`.
- **Division of labor clarified** in README — `/ultraresearch-local`
gathers context (no decisions), `/ultraplan-local` transforms intent
into an executable contract (manifests, plan-critic gate),
`/ultraexecute-local` executes the contract disciplined (does NOT
compensate for weak plans — escalates).
## [1.6.0] - 2026-04-08
### Added
- **`/ultraresearch-local` command** — deep research combining local codebase analysis
with external knowledge. Produces structured research briefs with triangulation,
confidence ratings, and source quality assessment. Supports modes: default (background),
`--quick` (inline), `--local` (codebase only), `--external` (web only), `--fg` (foreground).
- **6 new agents** for the research pipeline:
- `research-orchestrator` (opus) — runs full research pipeline as background task
- `docs-researcher` (sonnet) — official documentation via Tavily, WebSearch, Microsoft Learn
- `community-researcher` (sonnet) — real-world experience from issues, blogs, discussions
- `security-researcher` (sonnet) — CVEs, audit history, supply chain risks
- `contrarian-researcher` (sonnet) — counter-evidence and overlooked alternatives
- `gemini-bridge` (sonnet) — independent second opinion via Gemini Deep Research MCP
- **Research brief template** (`templates/research-brief-template.md`) — structured format
with dimensions, confidence ratings, triangulation, and source quality assessment.
- **`--research` flag for `/ultraplan-local`** — accepts up to 3 research brief paths.
Enriches the interview (focuses on decisions, not facts) and injects brief context into
exploration agents. Research-scout skips already-covered technologies.
- **Research-aware planning orchestrator** — `planning-orchestrator.md` now accepts research
briefs, injects summaries into sub-agent prompts, and cross-references brief findings
during synthesis.
- **Research settings** in `settings.json` — configurable Gemini bridge (enabled/timeout),
interview depth, dimension limits, and stats tracking.
### Changed
- Plugin description and keywords updated to reflect research capabilities.
- CLAUDE.md expanded with ultraresearch command, modes, agents, architecture, and state.
## [1.5.0] - 2026-04-07
### Fixed
- **CRITICAL: Parallel session data loss** — Phase 2.6 ran parallel `claude -p` sessions
in the same working directory, causing git race conditions and repository corruption.
Each parallel session now runs in its own git worktree with isolated branch, index,
and working files. Branches are merged back sequentially after each wave completes.
### Added
- **Phase 2.55 (Pre-flight safety checks)** — validates clean working tree, committed
plan file, no scope fence overlaps between parallel sessions, and no stale worktrees
before launching parallel execution.
- **Git worktree isolation** for all parallel sessions — one branch per session
(`ultraplan/{slug}/session-{N}`), merged with `--no-ff` after wave completion.
- **Merge conflict detection** — if merging a session branch produces conflicts, the merge
is aborted and conflicting files are reported. No silent data loss.
- **Unconditional worktree cleanup** — worktrees and session branches are always removed,
even on failure. Manual cleanup commands are reported if automated cleanup fails.
- **Hard rules 11-13** — worktree isolation mandatory, cleanup unconditional, merge
sequentially with conflict abort.
- **Session-scoped progress file naming** — `--session N` uses
`.ultraexecute-progress-{slug}-session-{N}.json` to prevent merge conflicts.
### Changed
- Headless launch template uses git worktrees with `cleanup_worktrees` trap on EXIT,
clean-tree pre-flight check, and sequential merge after each wave.
- Phase 2.6 rewritten with 5-step worktree lifecycle: create → launch → wait → merge → cleanup.
## [1.4.0] - 2026-04-06
### Renamed
- **`/ultraexecute` → `/ultraexecute-local`** — renamed for namespace consistency with `/ultraplan-local` and future-proofing against potential Anthropic naming. File: `commands/ultraexecute.md` → `commands/ultraexecute-local.md`. Note: `ultraexecute_summary` JSON key and `ultraexecute-stats.jsonl` filename are unchanged for backward compatibility.
### Added
- **`convention-scanner` agent** (sonnet) — dedicated agent for discovering coding conventions: naming, directory layout, import style, error handling, test patterns, git commit style, documentation patterns. Replaces inline Explore agent prompt for medium+ codebases.
- **Success Criteria section** in spec template — falsifiable "definition of done" conditions that the spec-reviewer validates and ultraexecute-local uses for verification.
- **Dry-run multi-session preview** — `--dry-run` now shows session groupings, wave structure, billing status, and `claude -p` commands when plan has an Execution Strategy.
- **External verification rule** in headless launch template — wave verification must run commands independently, never parse session logs as proof.
- **Billing preamble** in headless launch template — `unset ANTHROPIC_API_KEY` prevents accidental API billing.
- **Phase mapping comment** in planning-orchestrator — documents how orchestrator phases 1-7 map to command phases 4-10.
### Fixed
- **`git add -A` in escalation** — replaced with targeted staging of only files from completed steps. Prevents staging secrets, binaries, or unrelated work.
- **False `background: true` claim** — command documentation incorrectly stated the orchestrator has `background: true` in its frontmatter. Corrected to explain `run_in_background` on the Agent tool.
### Changed
- Execution Strategy reconciliation in session-decomposer — respects existing `## Execution Strategy` as input instead of re-analyzing from scratch. Warns on file-overlap conflicts.
- Headless launch template uses `--dangerously-skip-permissions` instead of `--allowedTools` for more robust headless execution.
- Session-decomposer updated with `--dangerously-skip-permissions` and `unset ANTHROPIC_API_KEY` for generated scripts.
- Convention Scanner references in command and orchestrator updated to use dedicated plugin agent.
- ROADMAP.md translated from Norwegian to English.
- plugin.json: added homepage, repository, license, keywords. Version bumped to 1.4.0.
- README badge updated to v1.4.0.
## [1.3.0] - 2026-04-06
### Added
- **Session-aware parallel execution** — `/ultraexecute` auto-detects `## Execution Strategy` in plans and orchestrates multi-session parallel execution via `claude -p`. No manual `bash launch.sh` required.
- **`--fg` flag** — force foreground sequential execution, ignoring Execution Strategy
- **`--session N` flag** — execute only session N from the plan's Execution Strategy (used by child processes)
- **Phase 2.5 (Execution strategy decision)** — determines single-session vs multi-session mode
- **Phase 2.6 (Multi-session orchestration)** — launches parallel `claude -p` sessions per wave, waits for completion, aggregates results
- **Execution Strategy in plan template** — new `## Execution Strategy` section with sessions, waves, scope fences, and execution order. Generated by planning-orchestrator for plans with > 5 steps.
- **Execution Strategy generation in planning-orchestrator** — Phase 5 analyzes step file-overlap to build dependency graph, groups connected components into sessions of 35 steps, and organizes sessions into parallel waves.
### Changed
- planning-orchestrator Phase 5 extended with Execution Strategy generation logic
- ultraplan-local Phase 8 now lists Execution Strategy as 10th required plan section
- Plan template includes `## Execution Strategy` section template with grouping rules
- CLAUDE.md updated with new ultraexecute modes and architecture
- plugin.json version bumped to 1.3.0
## [1.2.0] - 2026-04-06
### Added
- **`/ultraexecute` command** — disciplined plan executor with 9-phase workflow. Reads an ultraplan or session spec, executes steps sequentially with strict failure recovery, tracks progress for resume, and reports results in machine-parseable JSON.
- 4 modes: default (execute), `--resume` (continue from checkpoint), `--dry-run` (validate without executing), `--step N` (single step)
- Per-step protocol: implement → verify → on-failure handling → checkpoint
- Failure recovery from plan's On failure clauses (revert/retry/skip/escalate)
- 3-attempt retry cap per step (initial + 2 retries)
- Progress file (`.ultraexecute-progress-{slug}.json`) for crash recovery and resume
- Entry/exit condition checking for session specs
- Scope fence enforcement for session specs (never-touch file protection)
- JSON summary block in output for headless log parsing
- Stats tracking to `ultraexecute-stats.jsonl`
### Changed
- CLAUDE.md restructured with two commands table (plan + execute)
- plugin.json version bumped to 1.2.0
## [1.1.0] - 2026-04-06
### Added
- **`--decompose` mode** — splits an existing plan into self-contained headless sessions. Analyzes step dependencies, groups steps into sessions of 35 steps each, identifies parallel execution waves, and generates session specs + dependency graph + launch script.
- **`--export headless` format** — shortcut for `--decompose`. Produces the same session decomposition output.
- **session-decomposer agent** (sonnet) — dedicated agent for plan decomposition. Parses step dependencies, builds dependency graph, groups steps into sessions, generates session specs with scope fences and failure handling.
- **Session spec template** (`templates/session-spec-template.md`) — defines the format for individual session specs: context, scope fence, steps, entry/exit conditions, failure handling, handoff state.
- **Headless launch template** (`templates/headless-launch-template.md`) — template for generating bash launch scripts that execute sessions in parallel waves using `claude -p`.
- **Failure recovery per step** — plan template now includes `On failure:` (revert/retry/skip/escalate) and `Checkpoint:` (git commit) fields for every implementation step.
- **Headless readiness dimension** in plan-critic — new 9th review dimension checking for On failure clauses, Checkpoint fields, and circuit breakers. Weighted at 0.15 in the quality score.
### Changed
- Plan-critic scoring rebalanced: 6 dimensions (was 5), weights adjusted to accommodate headless readiness
- Plan template step format extended with On failure and Checkpoint fields
- Planning-orchestrator Phase 5 updated with failure recovery generation requirements
- CLAUDE.md updated with new agent, modes, and state paths
## [1.0.0] - 2026-04-06
### Added
- **`--quick` mode** — skips exploration agent swarm. Runs interview → lightweight Glob/Grep scan → planning → adversarial review. For when the developer knows the codebase and needs structure, not cartography.
- **`--export` mode** — generates shareable output from an existing plan file. Three formats: `pr` (PR description), `issue` (issue comment), `markdown` (clean plan without internal metadata).
- **task-finder three-tier categorization** — findings categorized as Must-change (must be modified), Must-respect (contract that must not break), or Reference (context/reuse). Replaces flat file list.
- **Adaptive interview depth** — interview adapts to answer quality. Detailed answers trigger fewer, more targeted questions. Short/uncertain answers trigger simpler questions with offered alternatives.
- **Complete `plugin.json` metadata** — author, homepage, repository, license, keywords added.
- **README badges** — version, license, and platform badges.
- **Known limitations section in README** — IaC projects (Terraform, Helm, Pulumi, CDK) get reduced value from exploration agents.
- **Forgejo issue templates** — bug report and feature request YAML templates.
- **CONTRIBUTING.md** — rewritten for honest solo-project model.
### Changed
- plugin.json version bumped to 1.0.0
- Command header updated to Ultraplan Local v1.0
- Orchestrator accepts `mode: quick` in prompt for lightweight scanning path
## [0.4.0] - 2026-04-06
### Added
- **3 new agents** for information-complete planning:
- `task-finder` — dedicated agent for finding task-relevant files, functions, types, and reuse candidates. Replaces inline Explore agent.
- `git-historian` — analyzes git log, blame, active branches, code ownership, and hot files for planning context.
- `spec-reviewer` — reviews spec quality (completeness, consistency, testability, scope clarity) before exploration begins. New Phase 1b/4b.
- **Plan scoring** — plan-critic produces a quantitative quality score (0100) across 5 weighted dimensions with letter grades (AD) and verdicts (APPROVE/REVISE/REPLAN).
- **No-placeholder rule** — plan-critic flags TBD, TODO, vague instructions, and underspecified steps as unconditional blockers. 3+ blockers = REPLAN regardless of score.
- **`[ASSUMPTION]` marking** — planning-orchestrator marks all unverifiable claims and warns when >3 assumptions exist.
### Changed
- **All agents run for all codebase sizes.** Small codebases get the same 6 core agents as large ones. Agent turns scale down for small codebases instead of dropping agents entirely.
- Phase 4b (spec review) added before exploration in both command and orchestrator.
- Orchestrator Phase 2 agent table expanded: 6 always + 1 conditional + 1 medium-only.
- Plan-critic review checklist expanded with no-placeholder checks (section 7) and scoring output.
- Orchestrator rules updated with assumption-marking and no-placeholder requirements.
## [0.3.0] - 2026-04-05
### Added
- **planning-orchestrator agent** — dedicated background agent (`background: true`) that handles Phases 410 autonomously. Replaces generic background agent spawning with a purpose-built orchestrator running on Opus with `maxTurns: 50`.
- **`effort` and `maxTurns` on all agents** — fine-grained cost and depth control:
- Exploration agents: `effort: medium`, `maxTurns: 1520`
- Review agents (plan-critic, scope-guardian): `effort: high`, `maxTurns: 10`
- Research-scout: `effort: medium`, `maxTurns: 10`
- **Plugin `settings.json`** — default configuration for mode, research, agent counts, interview limits, and team settings. Users can override in their own settings.
- **Worktree isolation for Agent Teams** — team members use `isolation: "worktree"` to prevent file conflicts during parallel implementation
- **Session tracking** (Phase 12) — writes JSONL records to `${CLAUDE_PLUGIN_DATA}/ultraplan-stats.jsonl` with task metadata, agent counts, review verdicts, and outcomes
### Changed
- Phase 3 now launches the `planning-orchestrator` agent instead of a generic background agent
- Agent Team implementation uses worktree isolation by default
## [0.2.0] - 2026-04-05
### Added
- **Interview phase** — iterative requirements gathering with AskUserQuestion before exploration. Produces a spec file that feeds into planning.
- **7 specialized agents** in `agents/` directory:
- `architecture-mapper` — deep architecture analysis, anti-patterns, smell detection
- `dependency-tracer` — import-chain following, data-flow analysis, side-effect catching
- `test-strategist` — test strategy design based on existing patterns
- `risk-assessor` — threat modeling, edge cases, failure modes
- `plan-critic` — dedicated adversarial reviewer with hardcoded critical perspective
- `scope-guardian` — scope creep and scope gap detection
- `research-scout` — external research via WebSearch/Tavily for unfamiliar technologies
- **External research capability** — research-scout agent searches documentation, known issues, and best practices when the task involves external/unfamiliar technology
- **Background mode** — default mode runs interview in foreground, then plans in background. User is notified when done.
- **Spec-driven mode** (`--spec`) — skip interview, provide a pre-written spec file, plan entirely in background
- **Foreground mode** (`--fg`) — all phases in foreground, blocks session (v0.1.0 behavior)
- **Agent Team support** — when plan has 3+ independent steps, offers parallel implementation via Agent Teams
- **Spec template** in `templates/spec-template.md`
- **Research Sources section** in plan template for citing external research
- **Dual adversarial review** — plan-critic and scope-guardian run in parallel
### Changed
- Exploration agents replaced with named specialized agents from `agents/` directory
- Agent count scales with codebase: 3 (small), 5 (medium), 7 (large)
- Plan template extended with Research Sources and external tech fields
- Handoff phase supports "execute with team" option
- Command workflow expanded from 9 to 11 phases
## [0.1.0] - 2026-04-05
### Added
- Initial release
- `/ultraplan` slash command with 6-phase workflow
- Parallel Sonnet exploration (3 agents: architecture, task-relevant, conventions)
- Opus-driven plan generation from structured template
- Plan refinement loop with execute/save handoff
- Plan template with context, analysis, steps, alternatives, risks, verification
- Cross-platform support (Mac, Linux, Windows) — pure markdown, no scripts