feat(ultraplan-local): v2.1.0 — dynamic quality-gated interview

Replace hardcoded Q1-Q8 in /ultrabrief-local with a section-driven
completeness loop (Phase 3) and a draft/review/revise loop with
brief-reviewer as stop-gate (Phase 4). Quality drives the interview,
not a question counter.

brief-reviewer now emits a machine-readable JSON block with per-dimension
scores (1-5) and detail arrays alongside the existing prose report;
planning-orchestrator continues to consume the prose verdict unchanged.

Phase 4 gate: all dimensions >= 4 AND research_plan = 5. On fail, a
targeted follow-up is generated from the weakest dimension's detail
field and the draft is re-reviewed. Max 3 review iterations bound cost;
exhaustion writes brief.md with brief_quality: partial and an explicit
Brief Quality section. Force-stop surfaces per-dimension findings before
the user chooses continue or partial.
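As an illustrative sketch only (this command is prompt-driven, so nothing below is shipped code; the dimension and detail-array names mirror brief-reviewer's JSON block, everything else is invented), the gate and follow-up targeting amount to:

```python
# Hypothetical sketch of the Phase 4 gate and follow-up targeting.
# Dimension names and detail keys mirror brief-reviewer's JSON block.
DETAIL_KEYS = {
    "completeness": "gaps",
    "consistency": "issues",
    "testability": "weak_criteria",
    "scope_clarity": "unclear_sections",
    "research_plan": "invalid_topics",
}

def gate_passes(scores):
    # Gate: all dimensions >= 4 AND research_plan == 5.
    return (all(scores[d]["score"] >= 4 for d in DETAIL_KEYS)
            and scores["research_plan"]["score"] == 5)

def weakest_dimension(scores):
    # Lowest-scoring dimension; its detail array seeds the targeted follow-up.
    dim = min(DETAIL_KEYS, key=lambda d: scores[d]["score"])
    return dim, scores[dim][DETAIL_KEYS[dim]]
```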

Not breaking. /ultrabrief-local [--quick] <task> interface unchanged.
--quick now means compact start with escalation, not a max-N cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-04-18 09:43:43 +02:00
commit 1634197853
8 changed files with 501 additions and 120 deletions

View file

@@ -12,7 +12,7 @@ plugins/
llm-security/ v6.0.0 — Security scanning, auditing, threat modeling
ms-ai-architect/ v1.8.0 — Microsoft AI architecture (Cosmo Skyberg persona)
okr/ v1.0.0 — OKR guidance for Norwegian public sector
-ultraplan-local/ v2.0.0 — Brief, research, plan, execute (four-command pipeline)
+ultraplan-local/ v2.1.0 — Brief, research, plan, execute (four-command pipeline)
```
Each plugin is self-contained with its own CLAUDE.md, README, hooks, agents, and commands.

View file

@@ -61,20 +61,20 @@ Key commands: `/config-audit posture`, `/config-audit feature-gap`, `/config-aud
---
-### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.0.0`
+### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.1.0`
Deep requirements gathering, research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery.
Four commands, one pipeline with a clear division of labor:
-- **`/ultrabrief-local`** — Capture intent. Interactive interview (3-8 adaptive questions) produces a task brief with explicit research plan. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive.
+- **`/ultrabrief-local`** — Capture intent. Dynamic, quality-gated interview: a section-driven completeness loop (Phase 3) followed by a `brief-reviewer` stop-gate (Phase 4, max 3 review iterations). Required sections must reach an initial-signal gate AND pass review across completeness, consistency, testability, scope clarity, and research-plan validity before `brief.md` is written. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive.
- **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions.
- **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Requires a task brief as input (`--brief` or `--project`).
- **`/ultraexecute-local`** — Execute the contract with discipline. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity-check between planning and execution.
All artifacts land in one project directory: `.claude/projects/{YYYY-MM-DD}-{slug}/` contains `brief.md`, `research/NN-*.md`, `plan.md`, `sessions/`, and `progress.json`. `--project <dir>` works across `/ultraresearch-local`, `/ultraplan-local`, and `/ultraexecute-local`.
-v2.0 extracts interview from planning: briefs are now reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. Breaking change: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`.
+v2.1 (non-breaking) replaces the hardcoded Q1-Q8 interview with a dynamic, quality-gated loop; `brief-reviewer` now emits machine-readable per-dimension JSON scores so `/ultrabrief-local` can use it as an internal stop-gate. v2.0 extracts the interview from planning: briefs are reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. v2.0 is breaking: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`.
v1.7 self-verifying chain (preserved): a step may not be marked `completed` unless its manifest verifies. v1.8 Opus 4.7 literalism fixes (preserved): literal Step+Manifest template, forbidden narrative headers, schema self-check.

View file

@@ -1,7 +1,7 @@
{
  "name": "ultraplan-local",
  "description": "Four-command context-engineering pipeline (brief → research → plan → execute) with project folders, specialized agent swarms, external research triangulation, adversarial review, session decomposition, and headless execution.",
-  "version": "2.0.0",
+  "version": "2.1.0",
  "author": {
    "name": "Kjell Tore Guttormsen"
  },

View file

@@ -4,6 +4,61 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [2.1.0] - 2026-04-18
### Changed — Dynamic, quality-gated interview in `/ultrabrief-local`
The Phase 3 interview is no longer a hardcoded Q1-Q8 list with a numeric
cap (3-4 questions in `--quick`, 5-8 in default). It is now a
**section-driven completeness loop**: the command maintains per-section
state (Intent, Goal, Success Criteria, Research Plan, and five optional
sections), picks the next question from the section with the weakest
signal, and keeps probing until all four required sections meet an
initial-signal gate. Quality drives the loop, not a counter.
Phase 4 adds a **draft → brief-reviewer → revise** loop. The brief is
drafted in memory, written to `brief.md.draft`, reviewed by the
`brief-reviewer` agent as a stop-gate, and only renamed to `brief.md`
after all five dimensions pass (`completeness/consistency/testability/scope_clarity ≥ 4` and `research_plan == 5`). If the gate fails, a
targeted follow-up is generated from the weakest dimension's detail
field and the draft is re-reviewed. The loop is capped at 3 review
iterations to bound cost; exhaustion writes the brief with
`brief_quality: partial` and an explicit `## Brief Quality` section.
Force-stop path: when the user says "stop" during Phase 4, the current
review findings are surfaced with per-dimension scores before asking
whether to continue or accept a partial brief. No silent exits.
Not breaking. The `/ultrabrief-local [--quick] <task>` interface is
unchanged from the outside; only internals change. `--quick` now means
"start compact, escalate if gates fail" rather than "max 4 questions".
### Added
- **JSON output from `brief-reviewer`** — the agent now emits a final
fenced `json` block with per-dimension `score` (1-5) and `detail`
arrays (`gaps`, `issues`, `weak_criteria`, `unclear_sections`,
`invalid_topics`) alongside the existing prose report. The JSON block
is mandatory; empty arrays and `score: 5` are required when a
dimension passes cleanly. `planning-orchestrator` continues to use
the prose verdict unchanged.
- **`brief_quality` frontmatter field** on task briefs — `complete`
(default) when the Phase 4 gate passed, or `partial` when the
iteration cap was hit or the user force-stopped with known issues.
`planning-orchestrator` can inspect this to decide how heavily to
weight brief sections as assumptions.
- **`review_iterations` and `brief_quality` in ultrabrief-stats** —
recorded per run for telemetry.
### Changed
- Hard rule added: `/ultrabrief-local` never writes `brief.md` while the
review gate is pending. The draft lives in `brief.md.draft` until the
loop terminates.
- Hard rule added: no hard cap on Phase 3 questions; the brief-review
gate is the only loop bound (3-iteration cap) and is in Phase 4.
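The draft-file rule above can be sketched as follows (illustrative only: the function name is invented, and the literal `brief_quality: complete` frontmatter value is an assumption about the draft's contents):

```python
import os

def finalize_brief(project_dir, draft_text, gate_passed, failing=None):
    """Sketch only: `brief.md` never exists while the review gate is pending."""
    draft_path = os.path.join(project_dir, "brief.md.draft")
    final_path = os.path.join(project_dir, "brief.md")
    if not gate_passed:
        # Cap exhausted or force-stop: mark the brief partial and say why.
        notes = "\n".join("- %s: score %d (below gate)" % (dim, score)
                          for dim, score in (failing or {}).items())
        draft_text = draft_text.replace("brief_quality: complete",
                                        "brief_quality: partial")
        draft_text += "\n## Brief Quality\n" + notes + "\n"
    with open(draft_path, "w") as f:
        f.write(draft_text)
    os.replace(draft_path, final_path)  # draft becomes brief.md only at loop termination
    return final_path
```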
## [2.0.0] - 2026-04-18
### Breaking — Four-command pipeline with dedicated brief step

View file

@@ -17,10 +17,10 @@ Deep implementation planning and research with an explicit brief step, specializ
| Flag | Behavior |
|------|----------|
-| _(default)_ | Full interview (5-8 questions) → brief.md with research plan |
+| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
-| `--quick` | Short interview (3-4 questions) → brief.md with research plan |
+| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
-Always interactive. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
+Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
### /ultraresearch-local modes
@@ -88,7 +88,7 @@ Flags combine: `--project <dir> --local --fg`, `--external --quick`.
## Architecture
-**Brief:** 7-phase workflow: Parse mode → Create project dir → Interactive interview (adaptive depth) → Identify research topics → Write `brief.md` → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready.
+**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready.
**Research:** 8-phase workflow: Parse mode → Interview → Background transition → Parallel research (5 local + 4 external + 1 bridge) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`.

View file

@@ -84,17 +84,18 @@ Or opt into auto-mode in `/ultrabrief-local` — it will launch research and pla
## `/ultrabrief-local` — Brief
-Interactive requirements-gathering command. Runs a focused interview (3-8 questions depending on mode and adaptive depth) and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline.
+Interactive requirements-gathering command. Runs a **dynamic, quality-gated interview** and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline.
-### How it works
+### How it works (v2.1 — quality-gated)
-1. **Parse mode** — `default` (5-8 questions) or `--quick` (3-4 questions)
+1. **Parse mode** — `default` (dynamic; probes until quality gates pass) or `--quick` (starts compact, still escalates if gates fail)
2. **Create project directory** — `.claude/projects/{YYYY-MM-DD}-{slug}/` with `research/` subdirectory
-3. **Interview** — one question at a time via `AskUserQuestion`, starting with Intent → Goal → Success Criteria, then optional Non-Goals, Constraints, Preferences, NFRs, Prior Attempts
+3. **Phase 3 — Completeness loop** — a section-driven loop (not a fixed question list) picks the next question from the section with the weakest signal (Intent → Goal → Success Criteria → Research Plan first, then optional sections). Required sections must reach an initial-signal gate before Phase 3 exits. No hard cap on question count.
-4. **Identify research topics** — probe for unfamiliar tech, version upgrades, security decisions, architectural choices
+4. **Identify research topics** — inline during Phase 3; probe for unfamiliar tech, version upgrades, security-sensitive decisions, architectural choices. Topics get a research question, scope, confidence, and cost.
-5. **Write brief** — `{project_dir}/brief.md` with all sections and a copy-paste-ready research plan
+5. **Phase 4 — Draft, review, revise** — draft the brief in memory, write to `brief.md.draft`, launch `brief-reviewer` as a stop-gate. The reviewer scores five dimensions (completeness, consistency, testability, scope clarity, research plan) 1-5 and returns machine-readable JSON. Gate passes when all ≥ 4 and research plan = 5. On fail, a targeted follow-up is generated from the weakest dimension's detail field and the draft is re-reviewed. Max 3 review iterations.
-6. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground)
+6. **Finalize** — rename draft to `brief.md` on pass, or write with `brief_quality: partial` + a `## Brief Quality` section if the cap is hit or the user force-stops.
-7. **Stats tracking** — append to `ultrabrief-stats.jsonl`
+7. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground)
+8. **Stats tracking** — append to `ultrabrief-stats.jsonl` (includes `review_iterations` and `brief_quality`)
Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`
@@ -102,11 +103,17 @@ Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`
| Mode | Usage | Behavior |
|------|-------|----------|
-| **Default** | `/ultrabrief-local <task>` | Full interview (5-8 questions) |
+| **Default** | `/ultrabrief-local <task>` | Dynamic interview until quality gates pass. No question cap. |
-| **Quick** | `/ultrabrief-local --quick <task>` | Short interview (3-4 questions) |
+| **Quick** | `/ultrabrief-local --quick <task>` | Starts compact (optional sections get at most one probe), still escalates on weak required sections or failed review gate. |
`/ultrabrief-local` is **always interactive**. There is no foreground/background mode — the interview requires user input.
### Force-stop
If you say "stop" or "enough" during Phase 4, the current review findings are surfaced with per-dimension scores and you choose:
- **Answer one more follow-up** — the loop continues.
- **Stop now (accept partial brief)** — the brief is finalized with `brief_quality: partial` and a `## Brief Quality` section listing the failing dimensions. Downstream planning will treat these as reduced-confidence areas.
### What the brief contains
- **Intent** — why this matters, motivation, user need (load-bearing) - **Intent** — why this matters, motivation, user need (load-bearing)

View file

@@ -150,13 +150,35 @@ Flag as **research-plan invalid** if:
## Rating
-Rate each dimension:
+Rate each dimension on two parallel scales:
**Verbal rating** (used in the prose report and the summary table):
- **Pass** — adequate for planning
- **Weak** — has issues but exploration can proceed with noted risks
- **Fail** — must be addressed before exploration (wastes tokens otherwise)
**Numeric score 1-5** (used in the machine-readable JSON block):
- **5** — no issues; section is strong
- **4** — minor issues that do not block exploration (maps to Pass)
- **3** — weak but usable; assumptions should be carried (maps to Weak)
- **2** — serious gap; exploration risks wasted work (maps to Fail)
- **1** — section is effectively missing or contradictory (maps to Fail)
Use both. The verbal rating drives the human-readable verdict. The numeric
score drives callers (such as `/ultrabrief-local` Phase 4) that use the
review as a quality gate and need per-dimension granularity.
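For illustration only (a hypothetical helper, not part of the agent's output contract), the numeric-to-verbal mapping can be sketched as:

```python
def verbal_rating(score):
    """Hypothetical helper mapping the numeric score to the verbal rating."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("score must be an integer 1-5")
    if score >= 4:
        return "Pass"   # 5 = strong, 4 = minor non-blocking issues
    if score == 3:
        return "Weak"   # usable; carry assumptions
    return "Fail"       # 2 = serious gap, 1 = missing or contradictory
```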
## Output format
Produce **two artifacts in this order**:
1. A prose report (for humans and for `planning-orchestrator` Phase 1b).
2. A final fenced `json` block with per-dimension numeric scores (for callers
that gate on machine-readable output, such as `/ultrabrief-local` Phase 4).
The JSON block MUST be the last fenced block in your output so parsers can
find it by reading the last `json` code fence.
```
## Brief Review
@@ -187,7 +209,42 @@ information that would strengthen the brief. List only if actionable.}
- **{PROCEED}** — brief is adequate for exploration
- **{PROCEED_WITH_RISKS}** — brief has weaknesses; note them as assumptions in the plan
- **{REVISE}** — brief needs fixes before exploration (list what to fix)
### Machine-readable scores
```json
{
"completeness": { "score": 1-5, "gaps": [ "{short gap description}", ... ] },
"consistency": { "score": 1-5, "issues": [ "{short issue description}", ... ] },
"testability": { "score": 1-5, "weak_criteria": [ "{quoted weak criterion}", ... ] },
"scope_clarity": { "score": 1-5, "unclear_sections": [ "{section name}", ... ] },
"research_plan": {
"score": 1-5,
"invalid_topics": [
{ "topic": "{topic title}", "issue": "{what is missing or wrong}" }
]
},
"verdict": "PROCEED | PROCEED_WITH_RISKS | REVISE"
}
```
```
### JSON output rules
- The JSON block is mandatory. Emit it even when everything passes — use
empty arrays and `"score": 5` in that case.
- Every dimension key must be present. Do not omit dimensions.
- `score` is an integer 1-5. Use the mapping in the Rating section.
- Array fields must be strings (or objects in the case of `invalid_topics`)
that are short, concrete, and actionable — never sentences spanning lines.
- `verdict` must match the verbal verdict in the prose section. If the JSON
verdict disagrees with the prose, the caller will fall back to the prose
verdict — but the mismatch is a bug in your output.
- Do not include trailing commas, comments, or non-JSON text inside the
fence. The block must parse with a strict JSON parser.
- If a dimension's score is 4 or 5, its detail array may be `[]`. A score of
3 or below SHOULD populate the detail array so callers can generate
targeted follow-up questions.
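The extraction rule above ("read the last `json` code fence, parse strictly, require every key") can be sketched as follows — a hedged illustration, not part of the agent spec; the function name is invented:

```python
import json
import re

REQUIRED_KEYS = ["completeness", "consistency", "testability",
                 "scope_clarity", "research_plan", "verdict"]

def parse_review(output):
    """Extract and strictly parse the last fenced json block in reviewer output."""
    blocks = re.findall(r"```json\s*\n(.*?)\n```", output, re.DOTALL)
    if not blocks:
        raise ValueError("reviewer output contains no fenced json block")
    scores = json.loads(blocks[-1])  # trailing commas or comments fail here
    missing = [k for k in REQUIRED_KEYS if k not in scores]
    if missing:
        raise ValueError("JSON block is missing keys: %s" % missing)
    return scores
```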
## Rules

View file

@@ -6,7 +6,7 @@ model: opus
allowed-tools: Agent, Read, Glob, Grep, Write, Edit, Bash, AskUserQuestion
---
-# Ultrabrief Local v2.0
+# Ultrabrief Local v2.1
Interactive requirements-gathering command. Produces a **task brief** — a
structured markdown file that declares intent, goal, constraints, and an
@@ -33,11 +33,14 @@ foreground if the user opts in.
Parse `$ARGUMENTS`:
-1. If arguments start with `--quick`: set **mode = quick**. Interview is
-   shorter (3-4 questions instead of 5-8). Strip the flag; remainder is
-   the task description.
+1. If arguments start with `--quick`: set **mode = quick**. The interview
+   starts more compactly (fewer opening probes per section) but still
+   escalates automatically if quality gates fail. There is no hard cap on
+   question count — quality drives the loop, not a counter. Strip the flag;
+   remainder is the task description.
-2. Otherwise: **mode = default**. Full interview (5-8 questions).
+2. Otherwise: **mode = default**. Interview probes each section until the
+   completeness gate (Phase 3) and brief-review gate (Phase 4) both pass.
If no task description is provided, output usage and stop:
@@ -46,8 +49,8 @@ Usage: /ultrabrief-local <task description>
/ultrabrief-local --quick <task description>
Modes:
-default Full interview (5-8 questions) → brief with research plan
+default Dynamic interview until quality gates pass — brief with research plan
---quick Short interview (3-4 questions) → brief with research plan
+--quick Compact start; still escalates on weak sections — brief with research plan
Examples:
/ultrabrief-local Add user authentication with JWT tokens
@@ -86,74 +89,144 @@ If the directory already exists and is non-empty, warn and ask:
Use `AskUserQuestion` with three options. If "pick new slug", ask for a
new slug and restart Phase 2.
## Phase 3 — Interview ## Phase 3 — Completeness loop
Use `AskUserQuestion` throughout. **Ask one question at a time.** Never Phase 3 is a **section-driven completeness loop**. Instead of a numbered
dump all questions at once. Follow up based on answers. question list, maintain an internal state of brief sections and keep asking
until every required section has substantive content. Quality drives the
loop — there is no hard cap on question count.
### Interview flow Use `AskUserQuestion` for every question. **Ask one question at a time.**
Never dump multiple questions.
**Question 1 (always) — Intent:** ### Internal state
> "What is the motivation for this task? Why does it matter? What happens
> if we don't do it? (The plan will use this to justify every implementation
> decision.)"
**Question 2 (always) — Goal:** Track this structure in memory as the loop runs:
> "What does success look like concretely? Describe the end state in
> 1-3 sentences — specific enough to disagree with."
**Question 3 (always) — Success criteria:** ```
> "How will we verify it's done? List 2-4 specific, testable conditions state = {
> (commands to run, observations, metrics). Avoid 'it works'." intent: { content: "", probes: 0 }, # required
goal: { content: "", probes: 0 }, # required
success_criteria: { content: [], probes: 0 }, # required
research_plan: { topics: [], probes: 0 }, # required
non_goals: { content: [], probes: 0 }, # optional
constraints: { content: [], probes: 0 }, # optional
preferences: { content: [], probes: 0 }, # optional
nfrs: { content: [], probes: 0 }, # optional
prior_attempts: { content: "", probes: 0 }, # optional
question_history: [] # list of questions asked
}
```
**Question 4 (usually) — Non-goals:** `content` is raw user answers merged; `probes` is how many times this
> "What is explicitly NOT in scope? (Prevents scope-guardian flagging gaps section has been asked; `question_history` prevents re-asking the same
> for things we deliberately don't do.)" variant twice.
**Question 5 (conditional) — Constraints:** ### Required sections (initial-signal gate)
> "Are there technical, time, or resource constraints? (Dependencies,
> compatibility, deadlines, budget.)"
>
> Skip if the user already mentioned constraints.
**Question 6 (conditional) — Preferences:** Four sections MUST have substantive content before exiting Phase 3:
> "Preferences on libraries, patterns, or architectural style?"
>
> Skip for small tasks or when constraints already imply them.
**Question 7 (conditional) — Non-functional requirements:** 1. **Intent** — full sentence or paragraph (not a single word or phrase)
> "Performance, security, accessibility, or scalability targets? (Quantified 2. **Goal** — full sentence or paragraph
> where possible.)" 3. **Success Criteria** — at least one concrete, testable item
> 4. **Research Plan** — either ≥ 1 topic probed, OR the user has explicitly
> Skip if not applicable. confirmed "no external research needed"
**Question 8 (conditional) — Prior attempts:** "Substantive" means: non-empty, not a trivial one-word reply, not
> "Has this been tried before? What worked or failed?" "I don't know" without a recorded assumption. The strict falsifiability
> check happens in Phase 4 (brief-review gate); Phase 3 is just the
> Skip if the task is clearly fresh. initial-signal bar.
### Adaptive depth Optional sections (Non-Goals, Constraints, Preferences, NFRs, Prior
Attempts) do not gate exit. If they remain empty after the required
sections pass, they will be recorded as "Not discussed — no constraints
assumed" in Phase 4's draft.
After each answer: ### Question bank (per section)
- **Detailed technical answer (2+ sentences, domain vocabulary):** skip Pick the next question from the section's bank based on `content` and
obvious follow-ups the user already covered. Aim for 4-5 total questions. `probes`. Wording must stay conversational — only the *selection* is
- **Short or uncertain answer ("I don't know", "not sure", vague):** offer section-driven, not the tone.
alternatives instead of open questions. Record uncertainty as an
open assumption.
- **"Skip" / "just make it" / "proceed":** stop interviewing. Write a
minimal brief from the task description and answers so far. Mark
uncovered sections as "Not discussed — no constraints assumed."
### Quick mode **Intent** (required):
- _Anchor_ (probes=0, content empty): "Why are we doing this? What is the
motivation, the user need, or the strategic context behind the task?"
- _Follow-up_ (probes≥1, content present but shallow): "What happens if
we do nothing? Who is affected?"
- _Sharpen_ (user mentioned a symptom): "You mentioned {X}. Is {X} the
symptom or the underlying cause?"
If **mode = quick**, ask only Questions 1, 2, 3, and 4. Maximum 4 questions. **Goal** (required):
Skip the rest. - _Anchor_: "Describe the end state in 13 sentences — specific enough to
disagree with."
- _Follow-up_: "How would you recognize this is done when looking at
the UI / API / codebase?"
### Research topic identification (CRITICAL) **Success Criteria** (required):
- _Anchor_: "How do we verify it is actually done? List 24 concrete,
testable conditions — commands to run, observations, or metrics."
- _Sharpen_ (criterion is vague): "'{quoted criterion}' is subjective.
Which command, observation, or metric would prove this is met?"
- _Quantify_ (performance/quality claim): "You mentioned it should be
{fast/reliable/secure}. What number or threshold counts as success?"
As the interview progresses, identify topics that will need research for **Research Plan** (required, strictest):
the plan to be high-confidence. Watch for: - _Anchor_ (no topics yet): "Are there technologies, libraries, or
decisions in this task you do not have solid current knowledge of?
Examples might be library choice, a protocol, or a security pattern."
- _Per-topic sharpen_ (topic exists but incomplete): "For topic
'{title}': which parts of the plan depend on the answer? What
confidence level do you need — high, medium, or low?"
- _Scope question_: "Is '{topic}' answerable from the existing codebase,
from external docs, or both?"
- _Confirm none_ (user refuses all topics): "Confirming: no external
research needed — you already know everything the plan will depend on?"
**Non-Goals** (optional):
- _Anchor_: "What is explicitly NOT in scope? This prevents scope-guardian
from flagging gaps for things we deliberately don't do."
**Constraints** (optional):
- _Anchor_: "Technical, time, or resource constraints the plan must
respect? Dependencies, compatibility, deadlines, or budget."
- _Sharpen_: "You mentioned {deadline / budget / compatibility}. Is it
hard or guidance?"
**Preferences** (optional):
- _Anchor_: "Preferences for libraries, patterns, or architectural style?"
**NFRs** (optional):
- _Anchor_: "Performance, security, accessibility, or scalability targets?
Quantified wherever possible."
**Prior Attempts** (optional):
- _Anchor_: "Has this been attempted before? What worked or failed?"
### Selection rule
On each loop iteration:
1. Compute the next section to probe:
- If any required section is below the initial-signal gate → pick the
weakest required section in this priority order:
Intent → Goal → Success Criteria → Research Plan.
- Else if an optional section is clearly missing and likely material
to scope (heuristic: the task description hints at constraints or
NFRs) → probe it at most once.
- Else: exit Phase 3.
2. Within the chosen section, pick the question variant:
- If `probes == 0` and content is empty → _Anchor_.
- If content exists but is shallow → _Follow-up_ or _Sharpen_.
- If the section is Research Plan and topics exist → iterate per-topic
sharpen across incomplete topics.
3. Ensure the exact question is NOT already in `question_history`. If it
is, pick the next variant or skip to the next weakest section.
4. Ask via `AskUserQuestion`. Append question to history. Increment probes.
5. Record the answer into `content`. Never overwrite — merge.
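The selection rule above can be sketched in Python. The `sections` state shape and all helper names here are illustrative assumptions, not a schema the command prescribes:

```python
# Illustrative sketch of the Phase 3 selection rule. The state shape
# (gate_met, probes, content, likely_material) is assumed, not prescribed.
REQUIRED = ["intent", "goal", "success_criteria", "research_plan"]

def next_action(sections):
    """One loop iteration: pick the next section to probe, or exit Phase 3."""
    # 1. Weakest required section below the initial-signal gate, in priority order.
    for name in REQUIRED:
        if not sections[name]["gate_met"]:
            return ("probe", name)
    # 2. At most one probe for an optional section that looks material to scope.
    for name, s in sections.items():
        if not s["required"] and s["probes"] == 0 and s.get("likely_material"):
            return ("probe", name)
    # 3. Otherwise exit Phase 3.
    return ("exit", None)

def pick_variant(section):
    """Anchor on first contact; sharpen once content exists but is shallow.
    Deduplication against question_history happens when the concrete
    question text is chosen (step 3 above)."""
    if section["probes"] == 0 and not section["content"]:
        return "anchor"
    return "sharpen"
```

The two-function split mirrors steps 1 and 2 of the rule: section choice first, question variant second.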
### Research topic identification
As the user answers Intent, Goal, or Success Criteria, listen for:
- **Unfamiliar technologies** — libraries, frameworks, protocols not
clearly present in the codebase
- **Unknown integrations** — third-party APIs, external services
- **Compliance / legal** — GDPR, accessibility, industry regulations
When you hear one, add a *candidate* topic to `research_plan.topics` with
only a title and why-it-matters. Probe it on the next Research Plan
iteration using the per-topic sharpen question to fill in:

- Research question (must end in `?`)
- Required for plan steps
- Scope (local / external / both)
- Confidence needed (high / medium / low)
- Estimated cost (quick / standard / deep)

If the user says "I know this" to a candidate topic, remove it from the
list. Trust the user. If no topics emerge after probing, the user confirms
"no external research needed" → `research_plan` gate passes with 0 topics.
### Quick mode adjustments

If **mode = quick**:
- For optional sections, cap probes at 1 each. Do not revisit optional
sections during Phase 3.
- Required sections still have no probe cap — quality gate still applies.
- Prefer _Anchor_ variants over _Sharpen_ on the first pass.
### Force-stop path
If the user says "skip", "stop asking", "just proceed", "enough", or
similar, break the loop immediately:
- Mark any required sections still below the initial-signal gate as
`{ incomplete_forced_stop: true }` in state.
- Proceed to Phase 4 with a note that the brief will carry a reduced
confidence flag.
### Exit condition
Exit Phase 3 when:
- All four required sections meet the initial-signal gate, OR
- The user has force-stopped.
Report:
```
Phase 3 complete: {N} questions asked across {M} sections.
Proceeding to draft and review.
```
## Phase 4 — Draft, review, and revise
Phase 4 runs a **draft → brief-reviewer → revise** loop. The draft is
not written to disk until the brief-review quality gate passes (or the
iteration cap is hit). This ensures the brief that reaches `/ultraplan-local`
has already survived a critical review.
Read the brief template first:
`@${CLAUDE_PLUGIN_ROOT}/templates/ultrabrief-template.md`
### Loop bound

**Maximum 3 review iterations.** This bounds cost in the worst case while
leaving room for two rounds of targeted follow-ups.

### Iteration step-by-step

**Step 4a — Draft in memory**

Build the brief text from Phase 3 state by filling the template:

- **Frontmatter:** populate `task`, `slug`, `project_dir`, `research_topics`
(count of topics), `research_status: pending`, `auto_research: false`
(will update in Phase 5 if user opts in), `interview_turns` (total
questions asked across Phase 3 + Phase 4), `source: interview`.
- **Intent:** expand the user's motivation into 3-5 sentences. Load-bearing.
- **Goal:** concrete end state.
- **Non-Goals:** from state, or "- None explicitly stated" bullet if empty.
- **Constraints / Preferences / NFRs:** from state, or "Not discussed — no
constraints assumed" note if empty.
- **Success Criteria:** falsifiable commands/observations from state.
- **Research Plan:** one `### Topic N: {title}` section per topic with the
full structure from the template. If 0 topics: write the "No external
research needed — user confirmed solid knowledge of all plan
dependencies" note.
- **Open Questions / Assumptions:** from any `"I don't know"` answers
recorded during Phase 3, plus implicit gaps.
- **Prior Attempts:** from state, or "None — fresh task."
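The frontmatter block from the list above could be assembled like this. The `state` keys are hypothetical stand-ins for whatever Phase 3 recorded:

```python
def build_frontmatter(state):
    """Sketch of the Step 4a frontmatter; `state` keys are hypothetical."""
    lines = [
        "---",
        f"task: {state['task']}",
        f"slug: {state['slug']}",
        f"project_dir: {state['project_dir']}",
        f"research_topics: {len(state['topics'])}",  # count of topics
        "research_status: pending",
        "auto_research: false",  # may be updated in Phase 5 if the user opts in
        f"interview_turns: {state['interview_turns']}",
        "source: interview",
        "---",
    ]
    return "\n".join(lines)
```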
**Step 4b — Write draft to disk**
Write the draft to `{PROJECT_DIR}/brief.md.draft` (not `brief.md` — the
final file is only written after the gate passes).
**Step 4c — Launch brief-reviewer**
Launch the `brief-reviewer` agent (foreground, blocking) with the prompt:
> "Review this task brief for quality: `{PROJECT_DIR}/brief.md.draft`.
> Check completeness, consistency, testability, scope clarity, and
> research-plan validity. Report findings, verdict, and the required
> machine-readable JSON block."
**Step 4d — Parse JSON scores**
Parse the agent's output. Locate the **last** fenced ```json``` block.
Extract per-dimension scores:
```
review = {
completeness: { score, gaps },
consistency: { score, issues },
testability: { score, weak_criteria },
scope_clarity: { score, unclear_sections },
research_plan: { score, invalid_topics },
verdict: "PROCEED | PROCEED_WITH_RISKS | REVISE"
}
```
**JSON fallback:** if the JSON block is missing, invalid, or a dimension
is missing, treat all dimensions as `score: 3` and set the `verdict` from
the prose verdict if present, otherwise `PROCEED_WITH_RISKS`. Emit an
internal note that the reviewer output was degraded. This ensures the
loop never deadlocks on a parser error.
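A minimal sketch of the Step 4d parse-with-fallback, assuming the reviewer output is plain text containing fenced blocks. Prose-verdict extraction is omitted here; the fallback goes straight to `PROCEED_WITH_RISKS`:

```python
import json
import re

DIMS = ["completeness", "consistency", "testability",
        "scope_clarity", "research_plan"]

def parse_review(output: str) -> dict:
    """Parse the LAST fenced json block; degrade gracefully per the
    fallback rule so the loop never deadlocks on a parser error."""
    blocks = re.findall(r"```json\s*(.*?)```", output, re.DOTALL)
    try:
        review = json.loads(blocks[-1])
        if any(d not in review for d in DIMS):
            raise ValueError("missing dimension")
    except (IndexError, ValueError):
        # Fallback: neutral scores, conservative verdict. A fuller
        # implementation would first try the prose verdict if present.
        review = {d: {"score": 3, "detail": []} for d in DIMS}
        review["verdict"] = "PROCEED_WITH_RISKS"
        review["degraded"] = True  # internal note: reviewer output degraded
    return review
```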
**Step 4e — Gate evaluation**
The gate **passes** when all of the following are true:
- `completeness.score ≥ 4`
- `consistency.score ≥ 4`
- `testability.score ≥ 4`
- `scope_clarity.score ≥ 4`
- `research_plan.score == 5`
(Research Plan requires a perfect score because its format is checked
mechanically: ends in `?`, `Required for plan steps` filled, scope is
one of `local | external | both`, confidence is `high | medium | low`.
Anything less means at least one topic is malformed and planning will
stumble.)
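The pass condition is a direct conjunction and can be written down exactly:

```python
def gate_passes(review: dict) -> bool:
    """Step 4e gate: every dimension >= 4, research_plan exactly 5."""
    return (review["completeness"]["score"] >= 4
            and review["consistency"]["score"] >= 4
            and review["testability"]["score"] >= 4
            and review["scope_clarity"]["score"] >= 4
            and review["research_plan"]["score"] == 5)
```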
**If gate passes:**
1. Move `brief.md.draft` → `brief.md` (atomic rename).
2. If atomic rename is not possible on the OS, write `brief.md` fresh
and delete the draft file.
3. Break the loop and proceed to Step 4g.
**If gate fails AND iteration count < 3:**
1. Identify the weakest dimension (lowest score; tie broken by priority:
research_plan > testability > completeness > consistency > scope_clarity).
2. Generate a targeted follow-up question from the dimension's detail
field (gaps / issues / weak_criteria / unclear_sections / invalid_topics).
Example generators:
- `completeness.gaps: ["Non-Goals empty, unclear if deliberate"]`
→ "You did not specify anything out-of-scope. Is that deliberate, or
are there things we should explicitly exclude?"
- `testability.weak_criteria: ["'system should be fast'"]`
→ "'System should be fast' is not falsifiable. Which metric or
threshold proves this is met — e.g., p95 < 200ms, or throughput
≥ X requests/sec?"
- `research_plan.invalid_topics: [{"topic":"JWT","issue":"Required for plan steps empty"}]`
→ "For research topic 'JWT': which plan steps depend on the answer?
Give one or two concrete kinds of step (e.g., 'library selection',
'threat model', 'migration strategy')."
3. Ask via `AskUserQuestion`. Record the answer into Phase 3 state.
4. Return to Step 4a with incremented iteration count. The reviewer sees
an updated draft, so you MUST re-read the brief and regenerate the
review each iteration — do not reuse stale scores.
5. When launching the reviewer on iteration 2 or 3, include prior
questions in the prompt so it does not produce circular follow-ups:
> "Questions already asked during this interview: {list from
> question_history}. Focus on issues that remain after those answers —
> do not re-raise gaps that have already been addressed."
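The weakest-dimension selection in step 1, including the tie-break, can be sketched as a single ordering:

```python
# Tie-break priority from step 1; earlier entries win a score tie.
PRIORITY = ["research_plan", "testability", "completeness",
            "consistency", "scope_clarity"]

def weakest_dimension(review: dict) -> str:
    """Lowest score wins; ties break by position in PRIORITY."""
    return min(PRIORITY, key=lambda d: (review[d]["score"], PRIORITY.index(d)))
```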
**If gate fails AND iteration count == 3 (loop exhausted):**
1. Move `brief.md.draft` → `brief.md`.
2. Add `brief_quality: partial` to the frontmatter (edit the file
post-rename — insert the key above the closing `---`).
3. Add a `## Brief Quality` section near the end with the failing
dimensions and their `detail` arrays from the final review, formatted:
```
## Brief Quality
Review loop exhausted after 3 iterations. The following dimensions
did not reach the pass threshold:
- **Research Plan (score 2/5):** Topic 'JWT library' missing
Required-for-plan-steps field.
- **Testability (score 3/5):** Success criterion "works correctly"
is not falsifiable.
Downstream planning will treat these as reduced-confidence areas.
```
4. Break the loop and proceed to Step 4g.
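The frontmatter edit in step 2 (insert the key above the closing `---`) can be sketched as a post-rename text patch; the function name is illustrative:

```python
def mark_partial(brief_text: str) -> str:
    """Insert `brief_quality: partial` just above the closing `---`
    of the YAML frontmatter (the second `---` line)."""
    lines = brief_text.splitlines()
    fences = [i for i, line in enumerate(lines) if line.strip() == "---"]
    lines.insert(fences[1], "brief_quality: partial")
    return "\n".join(lines) + "\n"
```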
### Step 4f — Force-stop handling
If during any `AskUserQuestion` in Step 4e the user says "stop", "skip",
"enough", "just write it", or similar, do NOT exit the loop immediately.
Instead, surface the current review findings in plain text:
```
Brief-reviewer would flag these issues:
- Research Plan (score 2/5): Topic 'JWT library choice' missing Required-for-plan-steps field.
- Testability (score 3/5): Success criterion "works correctly" is not falsifiable.
Continue anyway? The plan will have lower confidence in these areas.
```
Then ask via `AskUserQuestion`:
| Option | Action |
|--------|--------|
| **Answer one more follow-up** | Return to Step 4e with the current weakest-dimension question. |
| **Stop now (accept partial brief)** | Finalize brief with `brief_quality: partial` and the `## Brief Quality` section (same path as iteration-cap exhaustion). Break loop. |
The force-stop path is distinct from a silent iteration cap: the user
sees exactly which dimensions are weak and makes an informed choice.
### Step 4g — Finalize
After the loop exits (pass, cap, or force-stop), ensure:
- `brief.md` exists at `{PROJECT_DIR}/brief.md`.
- `brief.md.draft` no longer exists.
- If the loop ended without a clean pass, frontmatter contains
`brief_quality: partial` and a `## Brief Quality` section exists.
- If the loop ended with a clean pass, `brief_quality` is either
absent or set to `complete`.
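The checklist above amounts to a small invariant that could be verified like this (function name and return convention are assumptions):

```python
from pathlib import Path

def finalize_ok(project_dir: str, clean_pass: bool) -> bool:
    """Step 4g invariants: brief.md exists, the draft is gone, and the
    quality marker matches how the loop ended."""
    brief = Path(project_dir) / "brief.md"
    draft = Path(project_dir) / "brief.md.draft"
    if not brief.exists() or draft.exists():
        return False
    has_partial = "brief_quality: partial" in brief.read_text()
    # Clean pass: no partial marker. Cap/force-stop exit: marker present.
    return (not has_partial) if clean_pass else has_partial
```

This is what guideline 9 below relies on: a caller that sees `brief.md` can trust Phase 4 finished.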
Populate the "How to continue" footer with the actual project path and
topic questions.
Report:
```
Brief written: {PROJECT_DIR}/brief.md
Review iterations: {1..3}
Final quality: {complete | partial}
Research topics identified: {N}
```
"slug": "{slug}",
"mode": "{default | quick}",
"interview_turns": {N},
"review_iterations": {1..3},
"brief_quality": "{complete | partial}",
"research_topics": {N},
"auto_research": {true | false},
"auto_result": "{completed | cancelled | failed | manual}",
7. **Auto mode blocks foreground.** If the user opts into auto, this
session waits for research + planning to complete. Document this in
the opt-in question.
8. **Quality gates, not question counts.** Phase 3 and Phase 4 are
quality-gated loops; do not enforce a hard cap on interview questions.
The brief-review gate (Phase 4) caps at 3 review iterations to bound
cost, but Phase 3 has no cap — required-section content drives exit.
9. **Never write `brief.md` while the review gate is still pending.**
Draft lives in `brief.md.draft` until the loop terminates. A caller
that sees `brief.md` must be able to trust that Phase 4 finished.
10. **Privacy:** never log prompt text, secrets, or credentials.