feat(ultraplan-local): v2.1.0 — dynamic quality-gated interview

Replace hardcoded Q1-Q8 in /ultrabrief-local with a section-driven
completeness loop (Phase 3) and a draft/review/revise loop with
brief-reviewer as stop-gate (Phase 4). Quality drives the interview,
not a question counter.

brief-reviewer now emits a machine-readable JSON block with per-dimension
scores (1-5) and detail arrays alongside the existing prose report;
planning-orchestrator continues to consume the prose verdict unchanged.

Phase 4 gate: all dimensions >= 4 AND research_plan = 5. On fail, a
targeted follow-up is generated from the weakest dimension's detail
field and the draft is re-reviewed. Max 3 review iterations bound cost;
exhaustion writes brief.md with brief_quality: partial and an explicit
Brief Quality section. Force-stop surfaces per-dimension findings before
the user chooses continue or partial.
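As an illustrative sketch only (this command is prompt-driven, so nothing below is shipped code; the dimension and detail-array names mirror brief-reviewer's JSON block, everything else is invented), the gate and follow-up targeting amount to:

```python
# Hypothetical sketch of the Phase 4 gate and follow-up targeting.
# Dimension names and detail keys mirror brief-reviewer's JSON block.
DETAIL_KEYS = {
    "completeness": "gaps",
    "consistency": "issues",
    "testability": "weak_criteria",
    "scope_clarity": "unclear_sections",
    "research_plan": "invalid_topics",
}

def gate_passes(scores):
    # Gate: all dimensions >= 4 AND research_plan == 5.
    return (all(scores[d]["score"] >= 4 for d in DETAIL_KEYS)
            and scores["research_plan"]["score"] == 5)

def weakest_dimension(scores):
    # Lowest-scoring dimension; its detail array seeds the targeted follow-up.
    dim = min(DETAIL_KEYS, key=lambda d: scores[d]["score"])
    return dim, scores[dim][DETAIL_KEYS[dim]]
```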

Not breaking. /ultrabrief-local [--quick] <task> interface unchanged.
--quick now means compact start with escalation, not a max-N cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-04-18 09:43:43 +02:00
commit 1634197853
8 changed files with 501 additions and 120 deletions

View file

@@ -12,7 +12,7 @@ plugins/
llm-security/ v6.0.0 — Security scanning, auditing, threat modeling
ms-ai-architect/ v1.8.0 — Microsoft AI architecture (Cosmo Skyberg persona)
okr/ v1.0.0 — OKR guidance for Norwegian public sector
-ultraplan-local/ v2.0.0 — Brief, research, plan, execute (four-command pipeline)
+ultraplan-local/ v2.1.0 — Brief, research, plan, execute (four-command pipeline)
```
Each plugin is self-contained with its own CLAUDE.md, README, hooks, agents, and commands.

View file

@@ -61,20 +61,20 @@ Key commands: `/config-audit posture`, `/config-audit feature-gap`, `/config-aud
---
-### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.0.0`
+### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.1.0`
Deep requirements gathering, research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery.
Four commands, one pipeline with a clear division of labor:
-- **`/ultrabrief-local`** — Capture intent. Interactive interview (3-8 adaptive questions) produces a task brief with explicit research plan. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive.
+- **`/ultrabrief-local`** — Capture intent. Dynamic, quality-gated interview: a section-driven completeness loop (Phase 3) followed by a `brief-reviewer` stop-gate (Phase 4, max 3 review iterations). Required sections must reach an initial-signal gate AND pass review across completeness, consistency, testability, scope clarity, and research-plan validity before `brief.md` is written. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive.
- **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions.
- **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Requires a task brief as input (`--brief` or `--project`).
- **`/ultraexecute-local`** — Execute the contract with discipline. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity-check between planning and execution.
All artifacts land in one project directory: `.claude/projects/{YYYY-MM-DD}-{slug}/` contains `brief.md`, `research/NN-*.md`, `plan.md`, `sessions/`, and `progress.json`. `--project <dir>` works across `/ultraresearch-local`, `/ultraplan-local`, and `/ultraexecute-local`.
-v2.0 extracts interview from planning: briefs are now reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. Breaking change: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`.
+v2.1 (non-breaking) replaces the hardcoded Q1-Q8 interview with a dynamic, quality-gated loop; `brief-reviewer` now emits machine-readable per-dimension JSON scores so `/ultrabrief-local` can use it as an internal stop-gate. v2.0 extracts the interview from planning: briefs are reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. v2.0 is breaking: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`.
v1.7 self-verifying chain (preserved): a step may not be marked `completed` unless its manifest verifies. v1.8 Opus 4.7 literalism fixes (preserved): literal Step+Manifest template, forbidden narrative headers, schema self-check.

View file

@@ -1,7 +1,7 @@
{
  "name": "ultraplan-local",
  "description": "Four-command context-engineering pipeline (brief → research → plan → execute) with project folders, specialized agent swarms, external research triangulation, adversarial review, session decomposition, and headless execution.",
-  "version": "2.0.0",
+  "version": "2.1.0",
  "author": {
    "name": "Kjell Tore Guttormsen"
  },

View file

@@ -4,6 +4,61 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [2.1.0] - 2026-04-18
### Changed — Dynamic, quality-gated interview in `/ultrabrief-local`
The Phase 3 interview is no longer a hardcoded Q1-Q8 list with a numeric
cap (3-4 questions in `--quick`, 5-8 in default). It is now a
**section-driven completeness loop**: the command maintains per-section
state (Intent, Goal, Success Criteria, Research Plan, and five optional
sections), picks the next question from the section with the weakest
signal, and keeps probing until all four required sections meet an
initial-signal gate. Quality drives the loop, not a counter.
Phase 4 adds a **draft → brief-reviewer → revise** loop. The brief is
drafted in memory, written to `brief.md.draft`, reviewed by the
`brief-reviewer` agent as a stop-gate, and only renamed to `brief.md`
after all five dimensions pass (`completeness/consistency/testability/scope_clarity ≥ 4` and `research_plan == 5`). If the gate fails, a
targeted follow-up is generated from the weakest dimension's detail
field and the draft is re-reviewed. The loop is capped at 3 review
iterations to bound cost; exhaustion writes the brief with
`brief_quality: partial` and an explicit `## Brief Quality` section.
Force-stop path: when the user says "stop" during Phase 4, the current
review findings are surfaced with per-dimension scores before asking
whether to continue or accept a partial brief. No silent exits.
Not breaking. The `/ultrabrief-local [--quick] <task>` interface is
unchanged from the outside; only internals change. `--quick` now means
"start compact, escalate if gates fail" rather than "max 4 questions".
### Added
- **JSON output from `brief-reviewer`** — the agent now emits a final
fenced `json` block with per-dimension `score` (1-5) and `detail`
arrays (`gaps`, `issues`, `weak_criteria`, `unclear_sections`,
`invalid_topics`) alongside the existing prose report. The JSON block
is mandatory; empty arrays and `score: 5` are required when a
dimension passes cleanly. `planning-orchestrator` continues to use
the prose verdict unchanged.
- **`brief_quality` frontmatter field** on task briefs — `complete`
(default) when the Phase 4 gate passed, or `partial` when the
iteration cap was hit or the user force-stopped with known issues.
`planning-orchestrator` can inspect this to decide how heavily to
weight brief sections as assumptions.
- **`review_iterations` and `brief_quality` in ultrabrief-stats** —
recorded per run for telemetry.
### Changed
- Hard rule added: `/ultrabrief-local` never writes `brief.md` while the
review gate is pending. The draft lives in `brief.md.draft` until the
loop terminates.
- Hard rule added: no hard cap on Phase 3 questions; the brief-review
gate is the only loop bound (3-iteration cap) and is in Phase 4.
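The draft-file rule above can be sketched as follows (illustrative only: the function name is invented, and the literal `brief_quality: complete` frontmatter value is an assumption about the draft's contents):

```python
import os

def finalize_brief(project_dir, draft_text, gate_passed, failing=None):
    """Sketch only: `brief.md` never exists while the review gate is pending."""
    draft_path = os.path.join(project_dir, "brief.md.draft")
    final_path = os.path.join(project_dir, "brief.md")
    if not gate_passed:
        # Cap exhausted or force-stop: mark the brief partial and say why.
        notes = "\n".join("- %s: score %d (below gate)" % (dim, score)
                          for dim, score in (failing or {}).items())
        draft_text = draft_text.replace("brief_quality: complete",
                                        "brief_quality: partial")
        draft_text += "\n## Brief Quality\n" + notes + "\n"
    with open(draft_path, "w") as f:
        f.write(draft_text)
    os.replace(draft_path, final_path)  # draft becomes brief.md only at loop termination
    return final_path
```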
## [2.0.0] - 2026-04-18
### Breaking — Four-command pipeline with dedicated brief step

View file

@@ -17,10 +17,10 @@ Deep implementation planning and research with an explicit brief step, specializ
| Flag | Behavior |
|------|----------|
-| _(default)_ | Full interview (5-8 questions) → brief.md with research plan |
+| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
-| `--quick` | Short interview (3-4 questions) → brief.md with research plan |
+| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
-Always interactive. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
+Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
### /ultraresearch-local modes
@@ -88,7 +88,7 @@ Flags combine: `--project <dir> --local --fg`, `--external --quick`.
## Architecture
-**Brief:** 7-phase workflow: Parse mode → Create project dir → Interactive interview (adaptive depth) → Identify research topics → Write `brief.md` → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready.
+**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready.
**Research:** 8-phase workflow: Parse mode → Interview → Background transition → Parallel research (5 local + 4 external + 1 bridge) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`.

View file

@@ -84,17 +84,18 @@ Or opt into auto-mode in `/ultrabrief-local` — it will launch research and pla
## `/ultrabrief-local` — Brief
-Interactive requirements-gathering command. Runs a focused interview (3-8 questions depending on mode and adaptive depth) and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline.
+Interactive requirements-gathering command. Runs a **dynamic, quality-gated interview** and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline.
-### How it works
+### How it works (v2.1 — quality-gated)
-1. **Parse mode** — `default` (5-8 questions) or `--quick` (3-4 questions)
+1. **Parse mode** — `default` (dynamic; probes until quality gates pass) or `--quick` (starts compact, still escalates if gates fail)
2. **Create project directory** — `.claude/projects/{YYYY-MM-DD}-{slug}/` with `research/` subdirectory
-3. **Interview** — one question at a time via `AskUserQuestion`, starting with Intent → Goal → Success Criteria, then optional Non-Goals, Constraints, Preferences, NFRs, Prior Attempts
+3. **Phase 3 — Completeness loop** — a section-driven loop (not a fixed question list) picks the next question from the section with the weakest signal (Intent → Goal → Success Criteria → Research Plan first, then optional sections). Required sections must reach an initial-signal gate before Phase 3 exits. No hard cap on question count.
-4. **Identify research topics** — probe for unfamiliar tech, version upgrades, security decisions, architectural choices
+4. **Identify research topics** — inline during Phase 3; probe for unfamiliar tech, version upgrades, security-sensitive decisions, architectural choices. Topics get a research question, scope, confidence, and cost.
-5. **Write brief** — `{project_dir}/brief.md` with all sections and a copy-paste-ready research plan
+5. **Phase 4 — Draft, review, revise** — draft the brief in memory, write to `brief.md.draft`, launch `brief-reviewer` as a stop-gate. The reviewer scores five dimensions (completeness, consistency, testability, scope clarity, research plan) 1-5 and returns machine-readable JSON. Gate passes when all ≥ 4 and research plan = 5. On fail, a targeted follow-up is generated from the weakest dimension's detail field and the draft is re-reviewed. Max 3 review iterations.
-6. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground)
+6. **Finalize** — rename draft to `brief.md` on pass, or write with `brief_quality: partial` + a `## Brief Quality` section if the cap is hit or the user force-stops.
-7. **Stats tracking** — append to `ultrabrief-stats.jsonl`
+7. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground)
+8. **Stats tracking** — append to `ultrabrief-stats.jsonl` (includes `review_iterations` and `brief_quality`)
Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`
@@ -102,11 +103,17 @@ Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`
| Mode | Usage | Behavior |
|------|-------|----------|
-| **Default** | `/ultrabrief-local <task>` | Full interview (5-8 questions) |
+| **Default** | `/ultrabrief-local <task>` | Dynamic interview until quality gates pass. No question cap. |
-| **Quick** | `/ultrabrief-local --quick <task>` | Short interview (3-4 questions) |
+| **Quick** | `/ultrabrief-local --quick <task>` | Starts compact (optional sections get at most one probe), still escalates on weak required sections or failed review gate. |
`/ultrabrief-local` is **always interactive**. There is no foreground/background mode — the interview requires user input.
### Force-stop
If you say "stop" or "enough" during Phase 4, the current review findings are surfaced with per-dimension scores and you choose:
- **Answer one more follow-up** — the loop continues.
- **Stop now (accept partial brief)** — the brief is finalized with `brief_quality: partial` and a `## Brief Quality` section listing the failing dimensions. Downstream planning will treat these as reduced-confidence areas.
### What the brief contains
- **Intent** — why this matters, motivation, user need (load-bearing) - **Intent** — why this matters, motivation, user need (load-bearing)

View file

@@ -150,13 +150,35 @@ Flag as **research-plan invalid** if:
## Rating
-Rate each dimension:
+Rate each dimension on two parallel scales:
**Verbal rating** (used in the prose report and the summary table):
- **Pass** — adequate for planning
- **Weak** — has issues but exploration can proceed with noted risks
- **Fail** — must be addressed before exploration (wastes tokens otherwise)
**Numeric score 1-5** (used in the machine-readable JSON block):
- **5** — no issues; section is strong
- **4** — minor issues that do not block exploration (maps to Pass)
- **3** — weak but usable; assumptions should be carried (maps to Weak)
- **2** — serious gap; exploration risks wasted work (maps to Fail)
- **1** — section is effectively missing or contradictory (maps to Fail)
Use both. The verbal rating drives the human-readable verdict. The numeric
score drives callers (such as `/ultrabrief-local` Phase 4) that use the
review as a quality gate and need per-dimension granularity.
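For illustration only (a hypothetical helper, not part of the agent's output contract), the numeric-to-verbal mapping can be sketched as:

```python
def verbal_rating(score):
    """Hypothetical helper mapping the numeric score to the verbal rating."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("score must be an integer 1-5")
    if score >= 4:
        return "Pass"   # 5 = strong, 4 = minor non-blocking issues
    if score == 3:
        return "Weak"   # usable; carry assumptions
    return "Fail"       # 2 = serious gap, 1 = missing or contradictory
```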
## Output format
Produce **two artifacts in this order**:
1. A prose report (for humans and for `planning-orchestrator` Phase 1b).
2. A final fenced `json` block with per-dimension numeric scores (for callers
that gate on machine-readable output, such as `/ultrabrief-local` Phase 4).
The JSON block MUST be the last fenced block in your output so parsers can
find it by reading the last `json` code fence.
```
## Brief Review
@@ -187,7 +209,42 @@ information that would strengthen the brief. List only if actionable.}
- **{PROCEED}** — brief is adequate for exploration
- **{PROCEED_WITH_RISKS}** — brief has weaknesses; note them as assumptions in the plan
- **{REVISE}** — brief needs fixes before exploration (list what to fix)
### Machine-readable scores
```json
{
"completeness": { "score": 1-5, "gaps": [ "{short gap description}", ... ] },
"consistency": { "score": 1-5, "issues": [ "{short issue description}", ... ] },
"testability": { "score": 1-5, "weak_criteria": [ "{quoted weak criterion}", ... ] },
"scope_clarity": { "score": 1-5, "unclear_sections": [ "{section name}", ... ] },
"research_plan": {
"score": 1-5,
"invalid_topics": [
{ "topic": "{topic title}", "issue": "{what is missing or wrong}" }
]
},
"verdict": "PROCEED | PROCEED_WITH_RISKS | REVISE"
}
```
```
### JSON output rules
- The JSON block is mandatory. Emit it even when everything passes — use
empty arrays and `"score": 5` in that case.
- Every dimension key must be present. Do not omit dimensions.
- `score` is an integer 1-5. Use the mapping in the Rating section.
- Array fields must be strings (or objects in the case of `invalid_topics`)
that are short, concrete, and actionable — never sentences spanning lines.
- `verdict` must match the verbal verdict in the prose section. If the JSON
verdict disagrees with the prose, the caller will fall back to the prose
verdict — but the mismatch is a bug in your output.
- Do not include trailing commas, comments, or non-JSON text inside the
fence. The block must parse with a strict JSON parser.
- If a dimension's score is 4 or 5, its detail array may be `[]`. A score of
3 or below SHOULD populate the detail array so callers can generate
targeted follow-up questions.
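The extraction rule above ("read the last `json` code fence, parse strictly, require every key") can be sketched as follows — a hedged illustration, not part of the agent spec; the function name is invented:

```python
import json
import re

REQUIRED_KEYS = ["completeness", "consistency", "testability",
                 "scope_clarity", "research_plan", "verdict"]

def parse_review(output):
    """Extract and strictly parse the last fenced json block in reviewer output."""
    blocks = re.findall(r"```json\s*\n(.*?)\n```", output, re.DOTALL)
    if not blocks:
        raise ValueError("reviewer output contains no fenced json block")
    scores = json.loads(blocks[-1])  # trailing commas or comments fail here
    missing = [k for k in REQUIRED_KEYS if k not in scores]
    if missing:
        raise ValueError("JSON block is missing keys: %s" % missing)
    return scores
```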
## Rules

View file

@@ -6,7 +6,7 @@ model: opus
allowed-tools: Agent, Read, Glob, Grep, Write, Edit, Bash, AskUserQuestion
---
-# Ultrabrief Local v2.0
+# Ultrabrief Local v2.1
Interactive requirements-gathering command. Produces a **task brief** — a
structured markdown file that declares intent, goal, constraints, and an
@@ -33,11 +33,14 @@ foreground if the user opts in.
Parse `$ARGUMENTS`:
-1. If arguments start with `--quick`: set **mode = quick**. Interview is
-   shorter (3-4 questions instead of 5-8). Strip the flag; remainder is
-   the task description.
+1. If arguments start with `--quick`: set **mode = quick**. The interview
+   starts more compactly (fewer opening probes per section) but still
+   escalates automatically if quality gates fail. There is no hard cap on
+   question count — quality drives the loop, not a counter. Strip the flag;
+   remainder is the task description.
-2. Otherwise: **mode = default**. Full interview (5-8 questions).
+2. Otherwise: **mode = default**. Interview probes each section until the
+   completeness gate (Phase 3) and brief-review gate (Phase 4) both pass.
If no task description is provided, output usage and stop:
@@ -46,8 +49,8 @@ Usage: /ultrabrief-local <task description>
/ultrabrief-local --quick <task description>
Modes:
-default Full interview (5-8 questions) → brief with research plan
+default Dynamic interview until quality gates pass — brief with research plan
---quick Short interview (3-4 questions) → brief with research plan
+--quick Compact start; still escalates on weak sections — brief with research plan
Examples:
/ultrabrief-local Add user authentication with JWT tokens
@@ -86,74 +89,144 @@ If the directory already exists and is non-empty, warn and ask:
Use `AskUserQuestion` with three options. If "pick new slug", ask for a
new slug and restart Phase 2.
## Phase 3 — Interview ## Phase 3 — Completeness loop
Use `AskUserQuestion` throughout. **Ask one question at a time.** Never Phase 3 is a **section-driven completeness loop**. Instead of a numbered
dump all questions at once. Follow up based on answers. question list, maintain an internal state of brief sections and keep asking
until every required section has substantive content. Quality drives the
loop — there is no hard cap on question count.
### Interview flow Use `AskUserQuestion` for every question. **Ask one question at a time.**
Never dump multiple questions.
**Question 1 (always) — Intent:** ### Internal state
> "What is the motivation for this task? Why does it matter? What happens
> if we don't do it? (The plan will use this to justify every implementation
> decision.)"
**Question 2 (always) — Goal:** Track this structure in memory as the loop runs:
> "What does success look like concretely? Describe the end state in
> 1-3 sentences — specific enough to disagree with."
**Question 3 (always) — Success criteria:** ```
> "How will we verify it's done? List 2-4 specific, testable conditions state = {
> (commands to run, observations, metrics). Avoid 'it works'." intent: { content: "", probes: 0 }, # required
goal: { content: "", probes: 0 }, # required
success_criteria: { content: [], probes: 0 }, # required
research_plan: { topics: [], probes: 0 }, # required
non_goals: { content: [], probes: 0 }, # optional
constraints: { content: [], probes: 0 }, # optional
preferences: { content: [], probes: 0 }, # optional
nfrs: { content: [], probes: 0 }, # optional
prior_attempts: { content: "", probes: 0 }, # optional
question_history: [] # list of questions asked
}
```
**Question 4 (usually) — Non-goals:** `content` is raw user answers merged; `probes` is how many times this
> "What is explicitly NOT in scope? (Prevents scope-guardian flagging gaps section has been asked; `question_history` prevents re-asking the same
> for things we deliberately don't do.)" variant twice.
**Question 5 (conditional) — Constraints:** ### Required sections (initial-signal gate)
> "Are there technical, time, or resource constraints? (Dependencies,
> compatibility, deadlines, budget.)"
>
> Skip if the user already mentioned constraints.
**Question 6 (conditional) — Preferences:** Four sections MUST have substantive content before exiting Phase 3:
> "Preferences on libraries, patterns, or architectural style?"
>
> Skip for small tasks or when constraints already imply them.
**Question 7 (conditional) — Non-functional requirements:** 1. **Intent** — full sentence or paragraph (not a single word or phrase)
> "Performance, security, accessibility, or scalability targets? (Quantified 2. **Goal** — full sentence or paragraph
> where possible.)" 3. **Success Criteria** — at least one concrete, testable item
> 4. **Research Plan** — either ≥ 1 topic probed, OR the user has explicitly
> Skip if not applicable. confirmed "no external research needed"
**Question 8 (conditional) — Prior attempts:** "Substantive" means: non-empty, not a trivial one-word reply, not
> "Has this been tried before? What worked or failed?" "I don't know" without a recorded assumption. The strict falsifiability
> check happens in Phase 4 (brief-review gate); Phase 3 is just the
> Skip if the task is clearly fresh. initial-signal bar.
### Adaptive depth Optional sections (Non-Goals, Constraints, Preferences, NFRs, Prior
Attempts) do not gate exit. If they remain empty after the required
sections pass, they will be recorded as "Not discussed — no constraints
assumed" in Phase 4's draft.
After each answer: ### Question bank (per section)
- **Detailed technical answer (2+ sentences, domain vocabulary):** skip Pick the next question from the section's bank based on `content` and
obvious follow-ups the user already covered. Aim for 4-5 total questions. `probes`. Wording must stay conversational — only the *selection* is
- **Short or uncertain answer ("I don't know", "not sure", vague):** offer section-driven, not the tone.
alternatives instead of open questions. Record uncertainty as an
open assumption.
- **"Skip" / "just make it" / "proceed":** stop interviewing. Write a
minimal brief from the task description and answers so far. Mark
uncovered sections as "Not discussed — no constraints assumed."
### Quick mode **Intent** (required):
- _Anchor_ (probes=0, content empty): "Why are we doing this? What is the
motivation, the user need, or the strategic context behind the task?"
- _Follow-up_ (probes≥1, content present but shallow): "What happens if
we do nothing? Who is affected?"
- _Sharpen_ (user mentioned a symptom): "You mentioned {X}. Is {X} the
symptom or the underlying cause?"
If **mode = quick**, ask only Questions 1, 2, 3, and 4. Maximum 4 questions. **Goal** (required):
Skip the rest. - _Anchor_: "Describe the end state in 13 sentences — specific enough to
disagree with."
- _Follow-up_: "How would you recognize this is done when looking at
the UI / API / codebase?"
### Research topic identification (CRITICAL) **Success Criteria** (required):
- _Anchor_: "How do we verify it is actually done? List 24 concrete,
testable conditions — commands to run, observations, or metrics."
- _Sharpen_ (criterion is vague): "'{quoted criterion}' is subjective.
Which command, observation, or metric would prove this is met?"
- _Quantify_ (performance/quality claim): "You mentioned it should be
{fast/reliable/secure}. What number or threshold counts as success?"
As the interview progresses, identify topics that will need research for **Research Plan** (required, strictest):
the plan to be high-confidence. Watch for: - _Anchor_ (no topics yet): "Are there technologies, libraries, or
decisions in this task you do not have solid current knowledge of?
Examples might be library choice, a protocol, or a security pattern."
- _Per-topic sharpen_ (topic exists but incomplete): "For topic
'{title}': which parts of the plan depend on the answer? What
confidence level do you need — high, medium, or low?"
- _Scope question_: "Is '{topic}' answerable from the existing codebase,
from external docs, or both?"
- _Confirm none_ (user refuses all topics): "Confirming: no external
research needed — you already know everything the plan will depend on?"
**Non-Goals** (optional):
- _Anchor_: "What is explicitly NOT in scope? This prevents scope-guardian
from flagging gaps for things we deliberately don't do."
**Constraints** (optional):
- _Anchor_: "Technical, time, or resource constraints the plan must
respect? Dependencies, compatibility, deadlines, or budget."
- _Sharpen_: "You mentioned {deadline / budget / compatibility}. Is it
hard or guidance?"
**Preferences** (optional):
- _Anchor_: "Preferences for libraries, patterns, or architectural style?"
**NFRs** (optional):
- _Anchor_: "Performance, security, accessibility, or scalability targets?
Quantified wherever possible."
**Prior Attempts** (optional):
- _Anchor_: "Has this been attempted before? What worked or failed?"
### Selection rule
On each loop iteration:
1. Compute the next section to probe:
- If any required section is below the initial-signal gate → pick the
weakest required section in this priority order:
Intent → Goal → Success Criteria → Research Plan.
- Else if an optional section is clearly missing and likely material
to scope (heuristic: the task description hints at constraints or
NFRs) → probe it at most once.
- Else: exit Phase 3.
2. Within the chosen section, pick the question variant:
- If `probes == 0` and content is empty → _Anchor_.
- If content exists but is shallow → _Follow-up_ or _Sharpen_.
- If the section is Research Plan and topics exist → iterate per-topic
sharpen across incomplete topics.
3. Ensure the exact question is NOT already in `question_history`. If it
is, pick the next variant or skip to the next weakest section.
4. Ask via `AskUserQuestion`. Append question to history. Increment probes.
5. Record the answer into `content`. Never overwrite — merge.
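The selection rule above can be sketched in Python. The `sections` state shape and all helper names here are illustrative assumptions, not a schema the command prescribes:

```python
# Illustrative sketch of the Phase 3 selection rule. The state shape
# (gate_met, probes, content, likely_material) is assumed, not prescribed.
REQUIRED = ["intent", "goal", "success_criteria", "research_plan"]

def next_action(sections):
    """One loop iteration: pick the next section to probe, or exit Phase 3."""
    # 1. Weakest required section below the initial-signal gate, in priority order.
    for name in REQUIRED:
        if not sections[name]["gate_met"]:
            return ("probe", name)
    # 2. At most one probe for an optional section that looks material to scope.
    for name, s in sections.items():
        if not s["required"] and s["probes"] == 0 and s.get("likely_material"):
            return ("probe", name)
    # 3. Otherwise exit Phase 3.
    return ("exit", None)

def pick_variant(section):
    """Anchor on first contact; sharpen once content exists but is shallow.
    Deduplication against question_history happens when the concrete
    question text is chosen (step 3 above)."""
    if section["probes"] == 0 and not section["content"]:
        return "anchor"
    return "sharpen"
```

The two-function split mirrors steps 1 and 2 of the rule: section choice first, question variant second.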
### Research topic identification
As the user answers Intent, Goal, or Success Criteria, listen for:
- **Unfamiliar technologies** — libraries, frameworks, protocols not
clearly present in the codebase
- **Unknown integrations** — third-party APIs, external services
- **Compliance / legal** — GDPR, accessibility, industry regulations
When you hear one, add a *candidate* topic to `research_plan.topics` with
only a title and why-it-matters. Probe it on the next Research Plan
iteration using the per-topic sharpen question to fill in:

- Research question (must end in `?`)
- Required for plan steps
- Scope (local / external / both)
- Confidence needed (high / medium / low)
- Estimated cost (quick / standard / deep)

If the user says "I know this" to a candidate topic, remove it from the
list. Trust the user. If no topics emerge after probing, the user confirms
"no external research needed" → `research_plan` gate passes with 0 topics.
### Quick mode adjustments

If **mode = quick**:
- For optional sections, cap probes at 1 each. Do not revisit optional
sections during Phase 3.
- Required sections still have no probe cap — quality gate still applies.
- Prefer _Anchor_ variants over _Sharpen_ on the first pass.
### Force-stop path
If the user says "skip", "stop asking", "just proceed", "enough", or
similar, break the loop immediately:
- Mark any required sections still below the initial-signal gate as
`{ incomplete_forced_stop: true }` in state.
- Proceed to Phase 4 with a note that the brief will carry a reduced
confidence flag.
### Exit condition
Exit Phase 3 when:
- All four required sections meet the initial-signal gate, OR
- The user has force-stopped.
Report:
```
Phase 3 complete: {N} questions asked across {M} sections.
Proceeding to draft and review.
```
## Phase 4 — Draft, review, and revise
Phase 4 runs a **draft → brief-reviewer → revise** loop. The draft is
not written to disk until the brief-review quality gate passes (or the
iteration cap is hit). This ensures the brief that reaches `/ultraplan-local`
has already survived a critical review.
Read the brief template first:
`@${CLAUDE_PLUGIN_ROOT}/templates/ultrabrief-template.md`
### Loop bound

**Maximum 3 review iterations.** This bounds cost in the worst case while
leaving room for two rounds of targeted follow-ups.

### Iteration step-by-step

**Step 4a — Draft in memory**

Build the brief text from Phase 3 state by filling the template:

- **Frontmatter:** populate `task`, `slug`, `project_dir`, `research_topics`
(count of topics), `research_status: pending`, `auto_research: false`
(will update in Phase 5 if user opts in), `interview_turns` (total
questions asked across Phase 3 + Phase 4), `source: interview`.
- **Intent:** expand the user's motivation into 3-5 sentences. Load-bearing.
- **Goal:** concrete end state.
- **Non-Goals:** from state, or "- None explicitly stated" bullet if empty.
- **Constraints / Preferences / NFRs:** from state, or "Not discussed — no
constraints assumed" note if empty.
- **Success Criteria:** falsifiable commands/observations from state.
- **Research Plan:** one `### Topic N: {title}` section per topic with the
full structure from the template. If 0 topics: write the "No external
research needed — user confirmed solid knowledge of all plan
dependencies" note.
- **Open Questions / Assumptions:** from any `"I don't know"` answers
recorded during Phase 3, plus implicit gaps.
- **Prior Attempts:** from state, or "None — fresh task."
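The frontmatter block from the list above could be assembled like this. The `state` keys are hypothetical stand-ins for whatever Phase 3 recorded:

```python
def build_frontmatter(state):
    """Sketch of the Step 4a frontmatter; `state` keys are hypothetical."""
    lines = [
        "---",
        f"task: {state['task']}",
        f"slug: {state['slug']}",
        f"project_dir: {state['project_dir']}",
        f"research_topics: {len(state['topics'])}",  # count of topics
        "research_status: pending",
        "auto_research: false",  # may be updated in Phase 5 if the user opts in
        f"interview_turns: {state['interview_turns']}",
        "source: interview",
        "---",
    ]
    return "\n".join(lines)
```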
**Step 4b — Write draft to disk**
Write the draft to `{PROJECT_DIR}/brief.md.draft` (not `brief.md` — the
final file is only written after the gate passes).
**Step 4c — Launch brief-reviewer**
Launch the `brief-reviewer` agent (foreground, blocking) with the prompt:
> "Review this task brief for quality: `{PROJECT_DIR}/brief.md.draft`.
> Check completeness, consistency, testability, scope clarity, and
> research-plan validity. Report findings, verdict, and the required
> machine-readable JSON block."
**Step 4d — Parse JSON scores**
Parse the agent's output. Locate the **last** fenced ```json``` block.
Extract per-dimension scores:
```
review = {
completeness: { score, gaps },
consistency: { score, issues },
testability: { score, weak_criteria },
scope_clarity: { score, unclear_sections },
research_plan: { score, invalid_topics },
verdict: "PROCEED | PROCEED_WITH_RISKS | REVISE"
}
```
**JSON fallback:** if the JSON block is missing, invalid, or a dimension
is missing, treat all dimensions as `score: 3` and set the `verdict` from
the prose verdict if present, otherwise `PROCEED_WITH_RISKS`. Emit an
internal note that the reviewer output was degraded. This ensures the
loop never deadlocks on a parser error.
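A minimal sketch of the Step 4d parse-with-fallback, assuming the reviewer output is plain text containing fenced blocks. Prose-verdict extraction is omitted here; the fallback goes straight to `PROCEED_WITH_RISKS`:

```python
import json
import re

DIMS = ["completeness", "consistency", "testability",
        "scope_clarity", "research_plan"]

def parse_review(output: str) -> dict:
    """Parse the LAST fenced json block; degrade gracefully per the
    fallback rule so the loop never deadlocks on a parser error."""
    blocks = re.findall(r"```json\s*(.*?)```", output, re.DOTALL)
    try:
        review = json.loads(blocks[-1])
        if any(d not in review for d in DIMS):
            raise ValueError("missing dimension")
    except (IndexError, ValueError):
        # Fallback: neutral scores, conservative verdict. A fuller
        # implementation would first try the prose verdict if present.
        review = {d: {"score": 3, "detail": []} for d in DIMS}
        review["verdict"] = "PROCEED_WITH_RISKS"
        review["degraded"] = True  # internal note: reviewer output degraded
    return review
```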
**Step 4e — Gate evaluation**
The gate **passes** when all of the following are true:
- `completeness.score ≥ 4`
- `consistency.score ≥ 4`
- `testability.score ≥ 4`
- `scope_clarity.score ≥ 4`
- `research_plan.score == 5`
(Research Plan requires a perfect score because its format is checked
mechanically: ends in `?`, `Required for plan steps` filled, scope is
one of `local | external | both`, confidence is `high | medium | low`.
Anything less means at least one topic is malformed and planning will
stumble.)
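The pass condition is a direct conjunction and can be written down exactly:

```python
def gate_passes(review: dict) -> bool:
    """Step 4e gate: every dimension >= 4, research_plan exactly 5."""
    return (review["completeness"]["score"] >= 4
            and review["consistency"]["score"] >= 4
            and review["testability"]["score"] >= 4
            and review["scope_clarity"]["score"] >= 4
            and review["research_plan"]["score"] == 5)
```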
**If gate passes:**
1. Move `brief.md.draft` → `brief.md` (atomic rename).
2. If atomic rename is not possible on the OS, write `brief.md` fresh
and delete the draft file.
3. Break the loop and proceed to Step 4g.
**If gate fails AND iteration count < 3:**
1. Identify the weakest dimension (lowest score; tie broken by priority:
research_plan > testability > completeness > consistency > scope_clarity).
2. Generate a targeted follow-up question from the dimension's detail
field (gaps / issues / weak_criteria / unclear_sections / invalid_topics).
Example generators:
- `completeness.gaps: ["Non-Goals empty, unclear if deliberate"]`
→ "You did not specify anything out-of-scope. Is that deliberate, or
are there things we should explicitly exclude?"
- `testability.weak_criteria: ["'system should be fast'"]`
→ "'System should be fast' is not falsifiable. Which metric or
threshold proves this is met — e.g., p95 < 200ms, or throughput
≥ X requests/sec?"
- `research_plan.invalid_topics: [{"topic":"JWT","issue":"Required for plan steps empty"}]`
→ "For research topic 'JWT': which plan steps depend on the answer?
Give one or two concrete kinds of step (e.g., 'library selection',
'threat model', 'migration strategy')."
3. Ask via `AskUserQuestion`. Record the answer into Phase 3 state.
4. Return to Step 4a with incremented iteration count. The reviewer sees
an updated draft, so you MUST re-read the brief and regenerate the
review each iteration — do not reuse stale scores.
5. When launching the reviewer on iteration 2 or 3, include prior
questions in the prompt so it does not produce circular follow-ups:
> "Questions already asked during this interview: {list from
> question_history}. Focus on issues that remain after those answers —
> do not re-raise gaps that have already been addressed."
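The weakest-dimension selection in step 1, including the tie-break, can be sketched as a single ordering:

```python
# Tie-break priority from step 1; earlier entries win a score tie.
PRIORITY = ["research_plan", "testability", "completeness",
            "consistency", "scope_clarity"]

def weakest_dimension(review: dict) -> str:
    """Lowest score wins; ties break by position in PRIORITY."""
    return min(PRIORITY, key=lambda d: (review[d]["score"], PRIORITY.index(d)))
```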
**If gate fails AND iteration count == 3 (loop exhausted):**
1. Move `brief.md.draft` → `brief.md`.
2. Add `brief_quality: partial` to the frontmatter (edit the file
post-rename — insert the key above the closing `---`).
3. Add a `## Brief Quality` section near the end with the failing
dimensions and their `detail` arrays from the final review, formatted:
```
## Brief Quality
Review loop exhausted after 3 iterations. The following dimensions
did not reach the pass threshold:
- **Research Plan (score 2/5):** Topic 'JWT library' missing
Required-for-plan-steps field.
- **Testability (score 3/5):** Success criterion "works correctly"
is not falsifiable.
Downstream planning will treat these as reduced-confidence areas.
```
4. Break the loop and proceed to Step 4g.
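The frontmatter edit in step 2 (insert the key above the closing `---`) can be sketched as a post-rename text patch; the function name is illustrative:

```python
def mark_partial(brief_text: str) -> str:
    """Insert `brief_quality: partial` just above the closing `---`
    of the YAML frontmatter (the second `---` line)."""
    lines = brief_text.splitlines()
    fences = [i for i, line in enumerate(lines) if line.strip() == "---"]
    lines.insert(fences[1], "brief_quality: partial")
    return "\n".join(lines) + "\n"
```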
### Step 4f — Force-stop handling
If during any `AskUserQuestion` in Step 4e the user says "stop", "skip",
"enough", "just write it", or similar, do NOT exit the loop immediately.
Instead, surface the current review findings in plain text:
```
Brief-reviewer would flag these issues:
- Research Plan (score 2/5): Topic 'JWT library choice' missing Required-for-plan-steps field.
- Testability (score 3/5): Success criterion "works correctly" is not falsifiable.
Continue anyway? The plan will have lower confidence in these areas.
```
Then ask via `AskUserQuestion`:
| Option | Action |
|--------|--------|
| **Answer one more follow-up** | Return to Step 4e with the current weakest-dimension question. |
| **Stop now (accept partial brief)** | Finalize brief with `brief_quality: partial` and the `## Brief Quality` section (same path as iteration-cap exhaustion). Break loop. |
The force-stop path is distinct from a silent iteration cap: the user
sees exactly which dimensions are weak and makes an informed choice.
### Step 4g — Finalize
After the loop exits (pass, cap, or force-stop), ensure:
- `brief.md` exists at `{PROJECT_DIR}/brief.md`.
- `brief.md.draft` no longer exists.
- If the loop ended without a clean pass, frontmatter contains
`brief_quality: partial` and a `## Brief Quality` section exists.
- If the loop ended with a clean pass, `brief_quality` is either
absent or set to `complete`.
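The checklist above amounts to a small invariant that could be verified like this (function name and return convention are assumptions):

```python
from pathlib import Path

def finalize_ok(project_dir: str, clean_pass: bool) -> bool:
    """Step 4g invariants: brief.md exists, the draft is gone, and the
    quality marker matches how the loop ended."""
    brief = Path(project_dir) / "brief.md"
    draft = Path(project_dir) / "brief.md.draft"
    if not brief.exists() or draft.exists():
        return False
    has_partial = "brief_quality: partial" in brief.read_text()
    # Clean pass: no partial marker. Cap/force-stop exit: marker present.
    return (not has_partial) if clean_pass else has_partial
```

This is what guideline 9 below relies on: a caller that sees `brief.md` can trust Phase 4 finished.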
Populate the "How to continue" footer with the actual project path and
topic questions.
Report:
```
Brief written: {PROJECT_DIR}/brief.md
Review iterations: {1..3}
Final quality: {complete | partial}
Research topics identified: {N}
```
"slug": "{slug}",
"mode": "{default | quick}",
"interview_turns": {N},
"review_iterations": {1..3},
"brief_quality": "{complete | partial}",
"research_topics": {N},
"auto_research": {true | false},
"auto_result": "{completed | cancelled | failed | manual}",
7. **Auto mode blocks foreground.** If the user opts into auto, this
session waits for research + planning to complete. Document this in
the opt-in question.
8. **Quality gates, not question counts.** Phase 3 and Phase 4 are
quality-gated loops; do not enforce a hard cap on interview questions.
The brief-review gate (Phase 4) caps at 3 review iterations to bound
cost, but Phase 3 has no cap — required-section content drives exit.
9. **Never write `brief.md` while the review gate is still pending.**
Draft lives in `brief.md.draft` until the loop terminates. A caller
that sees `brief.md` must be able to trust that Phase 4 finished.
10. **Privacy:** never log prompt text, secrets, or credentials.