diff --git a/CLAUDE.md b/CLAUDE.md index cf7f5e9..61ae88c 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -12,7 +12,7 @@ plugins/ llm-security/ v6.0.0 — Security scanning, auditing, threat modeling ms-ai-architect/ v1.8.0 — Microsoft AI architecture (Cosmo Skyberg persona) okr/ v1.0.0 — OKR guidance for Norwegian public sector - ultraplan-local/ v2.0.0 — Brief, research, plan, execute (four-command pipeline) + ultraplan-local/ v2.1.0 — Brief, research, plan, execute (four-command pipeline) ``` Hvert plugin er selvstendig med egen CLAUDE.md, README, hooks, agents og commands. diff --git a/README.md b/README.md index 6673266..c2ee9d0 100644 --- a/README.md +++ b/README.md @@ -61,20 +61,20 @@ Key commands: `/config-audit posture`, `/config-audit feature-gap`, `/config-aud --- -### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.0.0` +### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.1.0` Deep requirements gathering, research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery. Four commands, one pipeline with clear division of labor: -- **`/ultrabrief-local`** — Capture intent. Interactive interview (3-8 adaptive questions) produces a task brief with explicit research plan. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive. +- **`/ultrabrief-local`** — Capture intent. Dynamic, quality-gated interview: a section-driven completeness loop (Phase 3) followed by a `brief-reviewer` stop-gate (Phase 4, max 3 review iterations). Required sections must reach an initial-signal gate AND pass review across completeness, consistency, testability, scope clarity, and research-plan validity before `brief.md` is written. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. 
Optional auto-orchestration runs research + planning in foreground. Always interactive. - **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions. - **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Requires a task brief as input (`--brief` or `--project`). - **`/ultraexecute-local`** — Execute the contract with discipline. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity-check between planning and execution. All artifacts land in one project directory: `.claude/projects/{YYYY-MM-DD}-{slug}/` contains `brief.md`, `research/NN-*.md`, `plan.md`, `sessions/`, and `progress.json`. `--project ` works across `/ultraresearch-local`, `/ultraplan-local`, and `/ultraexecute-local`. -v2.0 extracts interview from planning: briefs are now reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. Breaking change: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`. +v2.1 (non-breaking) replaces the hardcoded Q1–Q8 interview with a dynamic, quality-gated loop; `brief-reviewer` now emits machine-readable per-dimension JSON scores so `/ultrabrief-local` can use it as an internal stop-gate. 
v2.0 extracts the interview from planning: briefs are reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. v2.0 is breaking: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`. v1.7 self-verifying chain (preserved): a step may not be marked `completed` unless its manifest verifies. v1.8 Opus 4.7 literalism fixes (preserved): literal Step+Manifest template, forbidden narrative headers, schema self-check. diff --git a/plugins/ultraplan-local/.claude-plugin/plugin.json b/plugins/ultraplan-local/.claude-plugin/plugin.json index 60b3f37..259205f 100644 --- a/plugins/ultraplan-local/.claude-plugin/plugin.json +++ b/plugins/ultraplan-local/.claude-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "ultraplan-local", "description": "Four-command context-engineering pipeline (brief → research → plan → execute) with project folders, specialized agent swarms, external research triangulation, adversarial review, session decomposition, and headless execution.", - "version": "2.0.0", + "version": "2.1.0", "author": { "name": "Kjell Tore Guttormsen" }, diff --git a/plugins/ultraplan-local/CHANGELOG.md b/plugins/ultraplan-local/CHANGELOG.md index e151ba3..789202f 100644 --- a/plugins/ultraplan-local/CHANGELOG.md +++ b/plugins/ultraplan-local/CHANGELOG.md @@ -4,6 +4,61 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). +## [2.1.0] - 2026-04-18 + +### Changed — Dynamic, quality-gated interview in `/ultrabrief-local` + +The Phase 3 interview is no longer a hardcoded Q1–Q8 list with a numeric +cap (3–4 questions in `--quick`, 5–8 in default). 
It is now a +**section-driven completeness loop**: the command maintains per-section +state (Intent, Goal, Success Criteria, Research Plan, and five optional +sections), picks the next question from the section with the weakest +signal, and keeps probing until all four required sections meet an +initial-signal gate. Quality drives the loop, not a counter. + +Phase 4 adds a **draft → brief-reviewer → revise** loop. The brief is +drafted in memory, written to `brief.md.draft`, reviewed by the +`brief-reviewer` agent as a stop-gate, and only renamed to `brief.md` +after all five dimensions pass (`completeness/consistency/testability/ +scope_clarity ≥ 4` and `research_plan == 5`). If the gate fails, a +targeted follow-up is generated from the weakest dimension's detail +field and the draft is re-reviewed. The loop is capped at 3 review +iterations to bound cost; exhaustion writes the brief with +`brief_quality: partial` and an explicit `## Brief Quality` section. + +Force-stop path: when the user says "stop" during Phase 4, the current +review findings are surfaced with per-dimension scores before asking +whether to continue or accept a partial brief. No silent exits. + +Not breaking. The `/ultrabrief-local [--quick] ` interface is +unchanged from the outside; only internals change. `--quick` now means +"start compact, escalate if gates fail" rather than "max 4 questions". + +### Added + +- **JSON output from `brief-reviewer`** — the agent now emits a final + fenced `json` block with per-dimension `score` (1–5) and `detail` + arrays (`gaps`, `issues`, `weak_criteria`, `unclear_sections`, + `invalid_topics`) alongside the existing prose report. The JSON block + is mandatory; empty arrays and `score: 5` are required when a + dimension passes cleanly. `planning-orchestrator` continues to use + the prose verdict unchanged. 
+- **`brief_quality` frontmatter field** on task briefs — `complete` + (default) when the Phase 4 gate passed, or `partial` when the + iteration cap was hit or the user force-stopped with known issues. + `planning-orchestrator` can inspect this to decide how heavily to + weight brief sections as assumptions. +- **`review_iterations` and `brief_quality` in ultrabrief-stats** — + recorded per run for telemetry. + +### Changed + +- Hard rule added: `/ultrabrief-local` never writes `brief.md` while the + review gate is pending. The draft lives in `brief.md.draft` until the + loop terminates. +- Hard rule added: no hard cap on Phase 3 questions; the brief-review + gate is the only loop bound (3-iteration cap) and is in Phase 4. + ## [2.0.0] - 2026-04-18 ### Breaking — Four-command pipeline with dedicated brief step diff --git a/plugins/ultraplan-local/CLAUDE.md b/plugins/ultraplan-local/CLAUDE.md index c627c77..f20d581 100644 --- a/plugins/ultraplan-local/CLAUDE.md +++ b/plugins/ultraplan-local/CLAUDE.md @@ -17,10 +17,10 @@ Deep implementation planning and research with an explicit brief step, specializ | Flag | Behavior | |------|----------| -| _(default)_ | Full interview (5-8 questions) → brief.md with research plan | -| `--quick` | Short interview (3-4 questions) → brief.md with research plan | +| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan | +| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan | -Always interactive. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground). +Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground). 
### /ultraresearch-local modes @@ -88,7 +88,7 @@ Flags combine: `--project --local --fg`, `--external --quick`. ## Architecture -**Brief:** 7-phase workflow: Parse mode → Create project dir → Interactive interview (adaptive depth) → Identify research topics → Write `brief.md` → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready. +**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready. **Research:** 8-phase workflow: Parse mode → Interview → Background transition → Parallel research (5 local + 4 external + 1 bridge) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`. diff --git a/plugins/ultraplan-local/README.md b/plugins/ultraplan-local/README.md index 3de33e0..bb4c4e6 100644 --- a/plugins/ultraplan-local/README.md +++ b/plugins/ultraplan-local/README.md @@ -84,17 +84,18 @@ Or opt into auto-mode in `/ultrabrief-local` — it will launch research and pla ## `/ultrabrief-local` — Brief -Interactive requirements-gathering command. Runs a focused interview (3-8 questions depending on mode and adaptive depth) and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline. +Interactive requirements-gathering command. Runs a **dynamic, quality-gated interview** and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline. -### How it works +### How it works (v2.1 — quality-gated) -1. **Parse mode** — `default` (5-8 questions) or `--quick` (3-4 questions) +1. 
**Parse mode** — `default` (dynamic; probes until quality gates pass) or `--quick` (starts compact, still escalates if gates fail) 2. **Create project directory** — `.claude/projects/{YYYY-MM-DD}-{slug}/` with `research/` subdirectory -3. **Interview** — one question at a time via `AskUserQuestion`, starting with Intent → Goal → Success Criteria, then optional Non-Goals, Constraints, Preferences, NFRs, Prior Attempts -4. **Identify research topics** — probe for unfamiliar tech, version upgrades, security decisions, architectural choices -5. **Write brief** — `{project_dir}/brief.md` with all sections and a copy-paste-ready research plan -6. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground) -7. **Stats tracking** — append to `ultrabrief-stats.jsonl` +3. **Phase 3 — Completeness loop** — a section-driven loop (not a fixed question list) picks the next question from the section with the weakest signal (Intent → Goal → Success Criteria → Research Plan first, then optional sections). Required sections must reach an initial-signal gate before Phase 3 exits. No hard cap on question count. +4. **Identify research topics** — inline during Phase 3; probe for unfamiliar tech, version upgrades, security-sensitive decisions, architectural choices. Topics get a research question, scope, confidence, and cost. +5. **Phase 4 — Draft, review, revise** — draft the brief in memory, write to `brief.md.draft`, launch `brief-reviewer` as a stop-gate. The reviewer scores five dimensions (completeness, consistency, testability, scope clarity, research plan) 1–5 and returns machine-readable JSON. Gate passes when all ≥ 4 and research plan = 5. On fail, a targeted follow-up is generated from the weakest dimension's detail field and the draft is re-reviewed. Max 3 review iterations. +6. 
**Finalize** — rename draft to `brief.md` on pass, or write with `brief_quality: partial` + a `## Brief Quality` section if the cap is hit or the user force-stops. +7. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground) +8. **Stats tracking** — append to `ultrabrief-stats.jsonl` (includes `review_iterations` and `brief_quality`) Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md` @@ -102,11 +103,17 @@ Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md` | Mode | Usage | Behavior | |------|-------|----------| -| **Default** | `/ultrabrief-local ` | Full interview (5-8 questions) | -| **Quick** | `/ultrabrief-local --quick ` | Short interview (3-4 questions) | +| **Default** | `/ultrabrief-local ` | Dynamic interview until quality gates pass. No question cap. | +| **Quick** | `/ultrabrief-local --quick ` | Starts compact (optional sections get at most one probe), still escalates on weak required sections or failed review gate. | `/ultrabrief-local` is **always interactive**. There is no foreground/background mode — the interview requires user input. +### Force-stop + +If you say "stop" or "enough" during Phase 4, the current review findings are surfaced with per-dimension scores and you choose: +- **Answer one more follow-up** — the loop continues. +- **Stop now (accept partial brief)** — the brief is finalized with `brief_quality: partial` and a `## Brief Quality` section listing the failing dimensions. Downstream planning will treat these as reduced-confidence areas. 
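A downstream consumer such as `planning-orchestrator` can branch on the `brief_quality` frontmatter flag described above. A minimal sketch, assuming plain `key: value` lines between `---` fences; this simplified parser is illustrative, not the plugin's actual implementation:

```python
def brief_quality(brief_text: str) -> str:
    """Return the brief_quality frontmatter value, defaulting to 'complete'.

    Scans only the frontmatter block delimited by '---' fences.
    """
    lines = brief_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "complete"  # no frontmatter: assume a fully gated brief
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, _, value = line.partition(":")
        if key.strip() == "brief_quality":
            return value.strip()
    return "complete"
```

A `partial` result would tell the planner to treat the failing sections as reduced-confidence assumptions rather than firm requirements.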
+ ### What the brief contains - **Intent** — why this matters, motivation, user need (load-bearing) diff --git a/plugins/ultraplan-local/agents/brief-reviewer.md b/plugins/ultraplan-local/agents/brief-reviewer.md index 2a77a70..5878bd5 100644 --- a/plugins/ultraplan-local/agents/brief-reviewer.md +++ b/plugins/ultraplan-local/agents/brief-reviewer.md @@ -150,13 +150,35 @@ Flag as **research-plan invalid** if: ## Rating -Rate each dimension: +Rate each dimension on two parallel scales: + +**Verbal rating** (used in the prose report and the summary table): - **Pass** — adequate for planning - **Weak** — has issues but exploration can proceed with noted risks - **Fail** — must be addressed before exploration (wastes tokens otherwise) +**Numeric score 1–5** (used in the machine-readable JSON block): +- **5** — no issues; section is strong +- **4** — minor issues that do not block exploration (maps to Pass) +- **3** — weak but usable; assumptions should be carried (maps to Weak) +- **2** — serious gap; exploration risks wasted work (maps to Fail) +- **1** — section is effectively missing or contradictory (maps to Fail) + +Use both. The verbal rating drives the human-readable verdict. The numeric +score drives callers (such as `/ultrabrief-local` Phase 4) that use the +review as a quality gate and need per-dimension granularity. + ## Output format +Produce **two artifacts in this order**: + +1. A prose report (for humans and for `planning-orchestrator` Phase 1b). +2. A final fenced `json` block with per-dimension numeric scores (for callers + that gate on machine-readable output, such as `/ultrabrief-local` Phase 4). + +The JSON block MUST be the last fenced block in your output so parsers can +find it by reading the last `json` code fence. + ``` ## Brief Review @@ -187,7 +209,42 @@ information that would strengthen the brief. 
List only if actionable.} - **{PROCEED}** — brief is adequate for exploration - **{PROCEED_WITH_RISKS}** — brief has weaknesses; note them as assumptions in the plan - **{REVISE}** — brief needs fixes before exploration (list what to fix) + +### Machine-readable scores + +```json +{ + "completeness": { "score": 1-5, "gaps": [ "{short gap description}", ... ] }, + "consistency": { "score": 1-5, "issues": [ "{short issue description}", ... ] }, + "testability": { "score": 1-5, "weak_criteria": [ "{quoted weak criterion}", ... ] }, + "scope_clarity": { "score": 1-5, "unclear_sections":[ "{section name}", ... ] }, + "research_plan": { + "score": 1-5, + "invalid_topics": [ + { "topic": "{topic title}", "issue": "{what is missing or wrong}" } + ] + }, + "verdict": "PROCEED | PROCEED_WITH_RISKS | REVISE" +} ``` +``` + +### JSON output rules + +- The JSON block is mandatory. Emit it even when everything passes — use + empty arrays and `"score": 5` in that case. +- Every dimension key must be present. Do not omit dimensions. +- `score` is an integer 1–5. Use the mapping in the Rating section. +- Array fields must be strings (or objects in the case of `invalid_topics`) + that are short, concrete, and actionable — never sentences spanning lines. +- `verdict` must match the verbal verdict in the prose section. If the JSON + verdict disagrees with the prose, the caller will fall back to the prose + verdict — but the mismatch is a bug in your output. +- Do not include trailing commas, comments, or non-JSON text inside the + fence. The block must parse with a strict JSON parser. +- If a dimension's score is 4 or 5, its detail array may be `[]`. A score of + 3 or below SHOULD populate the detail array so callers can generate + targeted follow-up questions. 
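A caller that gates on this output can recover the scores by reading the last `json` fence, per the rules above. A minimal parsing sketch, assuming the agent output is available as a single string; the regex-based fence extraction and the helper name are illustrative:

```python
import json
import re

REQUIRED_DIMENSIONS = (
    "completeness", "consistency", "testability", "scope_clarity", "research_plan",
)

def parse_review_scores(agent_output: str):
    """Extract and validate the last fenced json block from a brief-reviewer report.

    Returns the parsed dict, or None when the block is missing or malformed
    (the caller then falls back to the prose verdict).
    """
    fences = re.findall(r"```json\s*(.*?)```", agent_output, re.DOTALL)
    if not fences:
        return None
    try:
        review = json.loads(fences[-1])  # last fence wins, per the output contract
    except json.JSONDecodeError:
        return None
    if not isinstance(review, dict):
        return None
    for dim in REQUIRED_DIMENSIONS:
        score = review.get(dim, {}).get("score") if isinstance(review.get(dim), dict) else None
        if not isinstance(score, int) or not 1 <= score <= 5:
            return None  # every dimension must carry an integer score 1-5
    return review
```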
## Rules diff --git a/plugins/ultraplan-local/commands/ultrabrief-local.md b/plugins/ultraplan-local/commands/ultrabrief-local.md index 543507a..c950bc1 100644 --- a/plugins/ultraplan-local/commands/ultrabrief-local.md +++ b/plugins/ultraplan-local/commands/ultrabrief-local.md @@ -6,7 +6,7 @@ model: opus allowed-tools: Agent, Read, Glob, Grep, Write, Edit, Bash, AskUserQuestion --- -# Ultrabrief Local v2.0 +# Ultrabrief Local v2.1 Interactive requirements-gathering command. Produces a **task brief** — a structured markdown file that declares intent, goal, constraints, and an @@ -33,11 +33,14 @@ foreground if the user opts in. Parse `$ARGUMENTS`: -1. If arguments start with `--quick`: set **mode = quick**. Interview is - shorter (3-4 questions instead of 5-8). Strip the flag; remainder is - the task description. +1. If arguments start with `--quick`: set **mode = quick**. The interview + starts more compactly (fewer opening probes per section) but still + escalates automatically if quality gates fail. There is no hard cap on + question count — quality drives the loop, not a counter. Strip the flag; + remainder is the task description. -2. Otherwise: **mode = default**. Full interview (5-8 questions). +2. Otherwise: **mode = default**. Interview probes each section until the + completeness gate (Phase 3) and brief-review gate (Phase 4) both pass. If no task description is provided, output usage and stop: @@ -46,8 +49,8 @@ Usage: /ultrabrief-local /ultrabrief-local --quick Modes: - default Full interview (5-8 questions) → brief with research plan - --quick Short interview (3-4 questions) → brief with research plan + default Dynamic interview until quality gates pass — brief with research plan + --quick Compact start; still escalates on weak sections — brief with research plan Examples: /ultrabrief-local Add user authentication with JWT tokens @@ -86,74 +89,144 @@ If the directory already exists and is non-empty, warn and ask: Use `AskUserQuestion` with three options. 
If "pick new slug", ask for a new slug and restart Phase 2. -## Phase 3 — Interview +## Phase 3 — Completeness loop -Use `AskUserQuestion` throughout. **Ask one question at a time.** Never -dump all questions at once. Follow up based on answers. +Phase 3 is a **section-driven completeness loop**. Instead of a numbered +question list, maintain an internal state of brief sections and keep asking +until every required section has substantive content. Quality drives the +loop — there is no hard cap on question count. -### Interview flow +Use `AskUserQuestion` for every question. **Ask one question at a time.** +Never dump multiple questions. -**Question 1 (always) — Intent:** -> "What is the motivation for this task? Why does it matter? What happens -> if we don't do it? (The plan will use this to justify every implementation -> decision.)" +### Internal state -**Question 2 (always) — Goal:** -> "What does success look like concretely? Describe the end state in -> 1-3 sentences — specific enough to disagree with." +Track this structure in memory as the loop runs: -**Question 3 (always) — Success criteria:** -> "How will we verify it's done? List 2-4 specific, testable conditions -> (commands to run, observations, metrics). Avoid 'it works'." +``` +state = { + intent: { content: "", probes: 0 }, # required + goal: { content: "", probes: 0 }, # required + success_criteria: { content: [], probes: 0 }, # required + research_plan: { topics: [], probes: 0 }, # required + non_goals: { content: [], probes: 0 }, # optional + constraints: { content: [], probes: 0 }, # optional + preferences: { content: [], probes: 0 }, # optional + nfrs: { content: [], probes: 0 }, # optional + prior_attempts: { content: "", probes: 0 }, # optional + question_history: [] # list of questions asked +} +``` -**Question 4 (usually) — Non-goals:** -> "What is explicitly NOT in scope? 
(Prevents scope-guardian flagging gaps -> for things we deliberately don't do.)" +`content` is raw user answers merged; `probes` is how many times this +section has been asked; `question_history` prevents re-asking the same +variant twice. -**Question 5 (conditional) — Constraints:** -> "Are there technical, time, or resource constraints? (Dependencies, -> compatibility, deadlines, budget.)" -> -> Skip if the user already mentioned constraints. +### Required sections (initial-signal gate) -**Question 6 (conditional) — Preferences:** -> "Preferences on libraries, patterns, or architectural style?" -> -> Skip for small tasks or when constraints already imply them. +Four sections MUST have substantive content before exiting Phase 3: -**Question 7 (conditional) — Non-functional requirements:** -> "Performance, security, accessibility, or scalability targets? (Quantified -> where possible.)" -> -> Skip if not applicable. +1. **Intent** — full sentence or paragraph (not a single word or phrase) +2. **Goal** — full sentence or paragraph +3. **Success Criteria** — at least one concrete, testable item +4. **Research Plan** — either ≥ 1 topic probed, OR the user has explicitly + confirmed "no external research needed" -**Question 8 (conditional) — Prior attempts:** -> "Has this been tried before? What worked or failed?" -> -> Skip if the task is clearly fresh. +"Substantive" means: non-empty, not a trivial one-word reply, not +"I don't know" without a recorded assumption. The strict falsifiability +check happens in Phase 4 (brief-review gate); Phase 3 is just the +initial-signal bar. -### Adaptive depth +Optional sections (Non-Goals, Constraints, Preferences, NFRs, Prior +Attempts) do not gate exit. If they remain empty after the required +sections pass, they will be recorded as "Not discussed — no constraints +assumed" in Phase 4's draft. 
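The initial-signal gate described above can be sketched as a predicate over the interview state. The "substantive" heuristics and the `confirmed_none` flag below are illustrative assumptions, since the command tracks this in prompt state rather than code:

```python
def substantive(text: str) -> bool:
    """Initial-signal heuristic: non-empty, more than one word, not a bare don't-know."""
    words = text.strip().split()
    if len(words) <= 1:
        return False
    return text.strip().lower() not in {"i don't know", "not sure", "no idea"}

def required_gate_passes(state: dict) -> bool:
    """True when all four required sections meet the initial-signal gate."""
    research = state["research_plan"]
    # Research Plan passes with >= 1 topic OR an explicit "no research needed".
    research_ok = bool(research["topics"]) or research.get("confirmed_none", False)
    return (
        substantive(state["intent"]["content"])
        and substantive(state["goal"]["content"])
        and len(state["success_criteria"]["content"]) >= 1
        and research_ok
    )
```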
-After each answer: +### Question bank (per section) -- **Detailed technical answer (2+ sentences, domain vocabulary):** skip - obvious follow-ups the user already covered. Aim for 4-5 total questions. -- **Short or uncertain answer ("I don't know", "not sure", vague):** offer - alternatives instead of open questions. Record uncertainty as an - open assumption. -- **"Skip" / "just make it" / "proceed":** stop interviewing. Write a - minimal brief from the task description and answers so far. Mark - uncovered sections as "Not discussed — no constraints assumed." +Pick the next question from the section's bank based on `content` and +`probes`. Wording must stay conversational — only the *selection* is +section-driven, not the tone. -### Quick mode +**Intent** (required): +- _Anchor_ (probes=0, content empty): "Why are we doing this? What is the + motivation, the user need, or the strategic context behind the task?" +- _Follow-up_ (probes≥1, content present but shallow): "What happens if + we do nothing? Who is affected?" +- _Sharpen_ (user mentioned a symptom): "You mentioned {X}. Is {X} the + symptom or the underlying cause?" -If **mode = quick**, ask only Questions 1, 2, 3, and 4. Maximum 4 questions. -Skip the rest. +**Goal** (required): +- _Anchor_: "Describe the end state in 1–3 sentences — specific enough to + disagree with." +- _Follow-up_: "How would you recognize this is done when looking at + the UI / API / codebase?" -### Research topic identification (CRITICAL) +**Success Criteria** (required): +- _Anchor_: "How do we verify it is actually done? List 2–4 concrete, + testable conditions — commands to run, observations, or metrics." +- _Sharpen_ (criterion is vague): "'{quoted criterion}' is subjective. + Which command, observation, or metric would prove this is met?" +- _Quantify_ (performance/quality claim): "You mentioned it should be + {fast/reliable/secure}. What number or threshold counts as success?" 
-As the interview progresses, identify topics that will need research for -the plan to be high-confidence. Watch for: +**Research Plan** (required, strictest): +- _Anchor_ (no topics yet): "Are there technologies, libraries, or + decisions in this task you do not have solid current knowledge of? + Examples might be library choice, a protocol, or a security pattern." +- _Per-topic sharpen_ (topic exists but incomplete): "For topic + '{title}': which parts of the plan depend on the answer? What + confidence level do you need — high, medium, or low?" +- _Scope question_: "Is '{topic}' answerable from the existing codebase, + from external docs, or both?" +- _Confirm none_ (user refuses all topics): "Confirming: no external + research needed — you already know everything the plan will depend on?" + +**Non-Goals** (optional): +- _Anchor_: "What is explicitly NOT in scope? This prevents scope-guardian + from flagging gaps for things we deliberately don't do." + +**Constraints** (optional): +- _Anchor_: "Technical, time, or resource constraints the plan must + respect? Dependencies, compatibility, deadlines, or budget." +- _Sharpen_: "You mentioned {deadline / budget / compatibility}. Is it + hard or guidance?" + +**Preferences** (optional): +- _Anchor_: "Preferences for libraries, patterns, or architectural style?" + +**NFRs** (optional): +- _Anchor_: "Performance, security, accessibility, or scalability targets? + Quantified wherever possible." + +**Prior Attempts** (optional): +- _Anchor_: "Has this been attempted before? What worked or failed?" + +### Selection rule + +On each loop iteration: + +1. Compute the next section to probe: + - If any required section is below the initial-signal gate → pick the + weakest required section in this priority order: + Intent → Goal → Success Criteria → Research Plan. 
+ - Else if an optional section is clearly missing and likely material + to scope (heuristic: the task description hints at constraints or + NFRs) → probe it at most once. + - Else: exit Phase 3. +2. Within the chosen section, pick the question variant: + - If `probes == 0` and content is empty → _Anchor_. + - If content exists but is shallow → _Follow-up_ or _Sharpen_. + - If the section is Research Plan and topics exist → iterate per-topic + sharpen across incomplete topics. +3. Ensure the exact question is NOT already in `question_history`. If it + is, pick the next variant or skip to the next weakest section. +4. Ask via `AskUserQuestion`. Append question to history. Increment probes. +5. Record the answer into `content`. Never overwrite — merge. + +### Research topic identification + +As the user answers Intent, Goal, or Success Criteria, listen for: - **Unfamiliar technologies** — libraries, frameworks, protocols not clearly present in the codebase @@ -163,48 +236,226 @@ the plan to be high-confidence. Watch for: - **Unknown integrations** — third-party APIs, external services - **Compliance / legal** — GDPR, accessibility, industry regulations -For each potential topic, probe briefly: -> "Do you already know {topic}, or should I plan a research step for it?" - -Record: -- Topic title (short) -- Why it matters for the plan -- Exact research question (phrased for `/ultraresearch-local`) -- Suggested scope (local / external / both) +When you hear one, add a *candidate* topic to `research_plan.topics` with +only a title and why-it-matters. Probe it on the next Research Plan +iteration using the per-topic sharpen question to fill in: +- Research question (must end in `?`) +- Required for plan steps +- Scope (local / external / both) - Confidence needed (high / medium / low) - Estimated cost (quick / standard / deep) -If the user says "I know this" — do not add it as a topic. Trust the user. 
+If the user says "I know this" to a candidate topic, remove it from the +list. Trust the user. If no topics emerge after probing, the user confirms +"no external research needed" → `research_plan` gate passes with 0 topics. -If no topics emerge: `research_topics = 0` is valid. Not every task needs -external research. +### Quick mode adjustments -## Phase 4 — Write the brief +If **mode = quick**: +- For optional sections, cap probes at 1 each. Do not revisit optional + sections during Phase 3. +- Required sections still have no probe cap — quality gate still applies. +- Prefer _Anchor_ variants over _Sharpen_ on the first pass. -Read the brief template: +### Force-stop path + +If the user says "skip", "stop asking", "just proceed", "enough", or +similar, break the loop immediately: +- Mark any required sections still below the initial-signal gate as + `{ incomplete_forced_stop: true }` in state. +- Proceed to Phase 4 with a note that the brief will carry a reduced + confidence flag. + +### Exit condition + +Exit Phase 3 when: +- All four required sections meet the initial-signal gate, OR +- The user has force-stopped. + +Report: +``` +Phase 3 complete: {N} questions asked across {M} sections. +Proceeding to draft and review. +``` + +## Phase 4 — Draft, review, and revise + +Phase 4 runs a **draft → brief-reviewer → revise** loop. The draft is +not written to disk until the brief-review quality gate passes (or the +iteration cap is hit). This ensures the brief that reaches `/ultraplan-local` +has already survived a critical review. + +Read the brief template first: `@${CLAUDE_PLUGIN_ROOT}/templates/ultrabrief-template.md` -Write the brief to: `{PROJECT_DIR}/brief.md`. +### Loop bound -Fill in every section based on interview answers: +**Maximum 3 review iterations.** This bounds cost in the worst case while +leaving room for two rounds of targeted follow-ups. 
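Under that bound, the Phase 4 control flow reduces to a small loop. A sketch with stand-in callables for the prompt-driven steps (drafting, reviewing, asking follow-ups); none of these helper names are real plugin APIs:

```python
MAX_REVIEW_ITERATIONS = 3

def phase4_loop(draft_brief, review_brief, ask_followup, revise_brief, gate_passes):
    """Draft -> review -> revise, bounded at 3 review iterations.

    Returns (brief_text, brief_quality): 'complete' when the gate passed,
    'partial' when the iteration cap was exhausted.
    """
    brief = draft_brief()                     # Step 4a: draft in memory
    for iteration in range(1, MAX_REVIEW_ITERATIONS + 1):
        review = review_brief(brief)          # Steps 4b-4d: write draft, run reviewer
        if gate_passes(review):               # Step 4e: gate evaluation
            return brief, "complete"          # rename brief.md.draft -> brief.md
        if iteration < MAX_REVIEW_ITERATIONS:
            answer = ask_followup(review)     # targeted question from weakest dimension
            brief = revise_brief(brief, answer)
    return brief, "partial"                   # cap hit: brief_quality: partial
```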
-- **Frontmatter:** populate `task`, `slug`, `project_dir`, `research_topics`, - `research_status: pending`, `auto_research: false` (will update in Phase 5 - if user opts in), `interview_turns`, `source: interview`. -- **Intent:** expand the user's motivation into 3-5 sentences. This is - load-bearing — be explicit. +### Iteration step-by-step + +**Step 4a — Draft in memory** + +Build the brief text from Phase 3 state by filling the template: + +- **Frontmatter:** populate `task`, `slug`, `project_dir`, `research_topics` + (count of topics), `research_status: pending`, `auto_research: false` + (will update in Phase 5 if user opts in), `interview_turns` (total + questions asked across Phase 3 + Phase 4), `source: interview`. +- **Intent:** expand the user's motivation into 3–5 sentences. Load-bearing. - **Goal:** concrete end state. -- **Non-Goals:** from user's answer, or empty list with note. -- **Constraints / Preferences / NFRs:** from user's answers. Mark - "Not discussed — no constraints assumed" if not covered. -- **Success Criteria:** falsifiable commands/observations. Reject vague - criteria from the user — ask for a concrete version if needed. -- **Research Plan:** one section per identified topic, with the full - structure from the template. If 0 topics, write the "No external - research needed" note. -- **Open Questions / Assumptions:** from "I don't know" answers and - implicit gaps. -- **Prior Attempts:** from user's answer, or "None — fresh task." +- **Non-Goals:** from state, or "- None explicitly stated" bullet if empty. +- **Constraints / Preferences / NFRs:** from state, or "Not discussed — no + constraints assumed" note if empty. +- **Success Criteria:** falsifiable commands/observations from state. +- **Research Plan:** one `### Topic N: {title}` section per topic with the + full structure from the template. If 0 topics: write the "No external + research needed — user confirmed solid knowledge of all plan + dependencies" note. 
+- **Open Questions / Assumptions:** from any `"I don't know"` answers
+  recorded during Phase 3, plus implicit gaps.
+- **Prior Attempts:** from state, or "None — fresh task."
+
+**Step 4b — Write draft to disk**
+
+Write the draft to `{PROJECT_DIR}/brief.md.draft` (not `brief.md` — the
+final file is only written after the gate passes).
+
+**Step 4c — Launch brief-reviewer**
+
+Launch the `brief-reviewer` agent (foreground, blocking) with the prompt:
+
+> "Review this task brief for quality: `{PROJECT_DIR}/brief.md.draft`.
+> Check completeness, consistency, testability, scope clarity, and
+> research-plan validity. Report findings, verdict, and the required
+> machine-readable JSON block."
+
+**Step 4d — Parse JSON scores**
+
+Parse the agent's output. Locate the **last** fenced `json` code block.
+Extract per-dimension scores:
+
+```
+review = {
+  completeness: { score, gaps },
+  consistency: { score, issues },
+  testability: { score, weak_criteria },
+  scope_clarity: { score, unclear_sections },
+  research_plan: { score, invalid_topics },
+  verdict: "PROCEED | PROCEED_WITH_RISKS | REVISE"
+}
+```
+
+**JSON fallback:** if the JSON block is missing or invalid, or a dimension
+is absent, treat all dimensions as `score: 3` and set the `verdict` from
+the prose verdict if present, otherwise `PROCEED_WITH_RISKS`. Emit an
+internal note that the reviewer output was degraded. This ensures the
+loop never deadlocks on a parser error.
+
+**Step 4e — Gate evaluation**
+
+The gate **passes** when all of the following are true:
+
+- `completeness.score ≥ 4`
+- `consistency.score ≥ 4`
+- `testability.score ≥ 4`
+- `scope_clarity.score ≥ 4`
+- `research_plan.score == 5`
+
+(Research Plan requires a perfect score because its format is checked
+mechanically: ends in `?`, `Required for plan steps` filled, scope is
+one of `local | external | both`, confidence is `high | medium | low`.
+Anything less means at least one topic is malformed and planning will
+stumble.)
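Steps 4d–4e can be sketched as follows (a hedged illustration in Python; the regex, the `DIMENSIONS` tuple, and the fallback shape are assumptions of this sketch, not the `brief-reviewer` agent's contract):

```python
import json
import re

# Assumed dimension names, mirroring the review structure above.
DIMENSIONS = ("completeness", "consistency", "testability",
              "scope_clarity", "research_plan")

def parse_review(agent_output):
    """Step 4d: take the LAST fenced json block; degrade gracefully."""
    blocks = re.findall(r"```json\s*(.*?)```", agent_output, re.DOTALL)
    try:
        review = json.loads(blocks[-1])  # IndexError if no block found
        if not isinstance(review, dict) or not all(d in review for d in DIMENSIONS):
            raise ValueError("dimension missing")
        return review, False
    except (IndexError, ValueError):  # JSONDecodeError is a ValueError
        # Fallback: neutral scores so the loop never deadlocks.
        fallback = {d: {"score": 3} for d in DIMENSIONS}
        fallback["verdict"] = "PROCEED_WITH_RISKS"
        return fallback, True  # True = reviewer output was degraded

def gate_passes(review):
    """Step 4e: first four dimensions at >= 4, research_plan exactly 5."""
    return (all(review[d]["score"] >= 4 for d in DIMENSIONS[:4])
            and review["research_plan"]["score"] == 5)
```

The fallback scores of 3 deliberately fail the gate, so degraded reviewer output triggers a follow-up iteration rather than silently passing.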
+
+**If gate passes:**
+1. Move `brief.md.draft` → `brief.md` (atomic rename).
+2. If an atomic rename is not possible on the OS, write `brief.md`
+   fresh and then delete the draft file.
+3. Break the loop and proceed to Step 4g.
+
+**If gate fails AND iteration count < 3:**
+1. Identify the weakest dimension (lowest score; tie broken by priority:
+   research_plan > testability > completeness > consistency > scope_clarity).
+2. Generate a targeted follow-up question from the dimension's detail
+   field (gaps / issues / weak_criteria / unclear_sections / invalid_topics).
+   Example generators:
+   - `completeness.gaps: ["Non-Goals empty, unclear if deliberate"]`
+     → "You did not specify anything out-of-scope. Is that deliberate, or
+     are there things we should explicitly exclude?"
+   - `testability.weak_criteria: ["'system should be fast'"]`
+     → "'System should be fast' is not falsifiable. Which metric or
+     threshold proves this is met — e.g., p95 < 200ms, or throughput
+     ≥ X requests/sec?"
+   - `research_plan.invalid_topics: [{"topic":"JWT","issue":"Required for plan steps empty"}]`
+     → "For research topic 'JWT': which plan steps depend on the answer?
+     Give one or two concrete kinds of step (e.g., 'library selection',
+     'threat model', 'migration strategy')."
+3. Ask via `AskUserQuestion`. Record the answer into Phase 3 state.
+4. Return to Step 4a with the iteration count incremented. The reviewer
+   must see the updated draft, so re-read the brief and regenerate the
+   review each iteration — never reuse stale scores.
+5. When launching the reviewer on iteration 2 or 3, include prior
+   questions in the prompt so it does not produce circular follow-ups:
+   > "Questions already asked during this interview: {list from
+   > question_history}. Focus on issues that remain after those answers —
+   > do not re-raise gaps that have already been addressed."
+
+**If gate fails AND iteration count == 3 (loop exhausted):**
+1. Move `brief.md.draft` → `brief.md`.
+2. Add `brief_quality: partial` to the frontmatter (edit the file
+   post-rename — insert the key above the closing `---`).
+3. Add a `## Brief Quality` section near the end with the failing
+   dimensions and their `detail` arrays from the final review, formatted:
+   ```
+   ## Brief Quality
+
+   Review loop exhausted after 3 iterations. The following dimensions
+   did not reach the pass threshold:
+
+   - **Research Plan (score 2/5):** Topic 'JWT library' missing
+     Required-for-plan-steps field.
+   - **Testability (score 3/5):** Success criterion "works correctly"
+     is not falsifiable.
+
+   Downstream planning will treat these as reduced-confidence areas.
+   ```
+4. Break the loop and proceed to Step 4g.
+
+### Step 4f — Force-stop handling
+
+If during any `AskUserQuestion` in Step 4e the user says "stop", "skip",
+"enough", "just write it", or similar, do NOT exit the loop immediately.
+Instead, surface the current review findings in plain text:
+
+```
+Brief-reviewer would flag these issues:
+- Research Plan (score 2/5): Topic 'JWT library choice' missing Required-for-plan-steps field.
+- Testability (score 3/5): Success criterion "works correctly" is not falsifiable.
+
+Continue anyway? The plan will have lower confidence in these areas.
+```
+
+Then ask via `AskUserQuestion`:
+
+| Option | Action |
+|--------|--------|
+| **Answer one more follow-up** | Return to Step 4e with the current weakest-dimension question. |
+| **Stop now (accept partial brief)** | Finalize brief with `brief_quality: partial` and the `## Brief Quality` section (same path as iteration-cap exhaustion). Break loop. |
+
+The force-stop path is distinct from a silent iteration cap: the user
+sees exactly which dimensions are weak and makes an informed choice.
+
+### Step 4g — Finalize
+
+After the loop exits (pass, cap, or force-stop), ensure:
+- `brief.md` exists at `{PROJECT_DIR}/brief.md`.
+- `brief.md.draft` no longer exists.
+- If the loop ended without a clean pass, frontmatter contains + `brief_quality: partial` and a `## Brief Quality` section exists. +- If the loop ended with a clean pass, `brief_quality` is either + absent or set to `complete`. Populate the "How to continue" footer with the actual project path and topic questions. @@ -212,6 +463,8 @@ topic questions. Report: ``` Brief written: {PROJECT_DIR}/brief.md +Review iterations: {1..3} +Final quality: {complete | partial} Research topics identified: {N} ``` @@ -387,6 +640,8 @@ Append one record to `${CLAUDE_PLUGIN_DATA}/ultrabrief-stats.jsonl`: "slug": "{slug}", "mode": "{default | quick}", "interview_turns": {N}, + "review_iterations": {1..3}, + "brief_quality": "{complete | partial}", "research_topics": {N}, "auto_research": {true | false}, "auto_result": "{completed | cancelled | failed | manual}", @@ -417,4 +672,11 @@ Never let stats failures block the workflow. 7. **Auto mode blocks foreground.** If the user opts into auto, this session waits for research + planning to complete. Document this in the opt-in question. -8. **Privacy:** never log prompt text, secrets, or credentials. +8. **Quality gates, not question counts.** Phase 3 and Phase 4 are + quality-gated loops; do not enforce a hard cap on interview questions. + The brief-review gate (Phase 4) caps at 3 review iterations to bound + cost, but Phase 3 has no cap — required-section content drives exit. +9. **Never write `brief.md` while the review gate is still pending.** + Draft lives in `brief.md.draft` until the loop terminates. A caller + that sees `brief.md` must be able to trust that Phase 4 finished. +10. **Privacy:** never log prompt text, secrets, or credentials.
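The best-effort stats append can be sketched like this (illustrative Python; the `append_stats` helper and its directory argument are hypothetical — the real path is `${CLAUDE_PLUGIN_DATA}/ultrabrief-stats.jsonl`):

```python
import json
import os

def append_stats(data_dir, record):
    """Append one JSONL stats record; never raise into the workflow.

    `record` carries the fields listed in the stats schema, including
    the new `review_iterations` and `brief_quality` keys.
    """
    try:
        path = os.path.join(data_dir, "ultrabrief-stats.jsonl")
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return True
    except (OSError, TypeError, ValueError):
        # Unwritable path or unserializable record: swallow the error,
        # because stats failures must never block the workflow.
        return False
```

Per principle 10, the record should never include prompt text, secrets, or credentials — only the schema fields above.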