feat(ultraplan-local): v2.1.0 — dynamic quality-gated interview
Replace hardcoded Q1-Q8 in /ultrabrief-local with a section-driven completeness loop (Phase 3) and a draft/review/revise loop with brief-reviewer as stop-gate (Phase 4). Quality drives the interview, not a question counter.

brief-reviewer now emits a machine-readable JSON block with per-dimension scores (1-5) and detail arrays alongside the existing prose report; planning-orchestrator continues to consume the prose verdict unchanged. Phase 4 gate: all dimensions >= 4 AND research_plan = 5. On fail, a targeted follow-up is generated from the weakest dimension's detail field and the draft is re-reviewed. Max 3 review iterations bound cost; exhaustion writes brief.md with brief_quality: partial and an explicit Brief Quality section. Force-stop surfaces per-dimension findings before the user chooses continue or partial.

Not breaking. The /ultrabrief-local [--quick] <task> interface is unchanged. --quick now means compact start with escalation, not a max-N cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
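The Phase 4 gate described in this commit message reduces to a small predicate over the reviewer's per-dimension scores. A minimal sketch, assuming scores arrive as a plain dict — the function and constant names here are illustrative, not taken from the plugin source:

```python
# Hypothetical sketch of the Phase 4 quality gate: every dimension must
# score >= 4, and research_plan must additionally be a perfect 5, before
# the draft is promoted to brief.md.
REQUIRED_MIN = 4
DIMENSIONS = ("completeness", "consistency", "testability",
              "scope_clarity", "research_plan")

def gate_passes(scores: dict) -> bool:
    """scores maps each dimension name to an integer 1-5."""
    if any(scores[d] < REQUIRED_MIN for d in DIMENSIONS):
        return False
    return scores["research_plan"] == 5

print(gate_passes({"completeness": 4, "consistency": 5, "testability": 4,
                   "scope_clarity": 4, "research_plan": 5}))  # True
print(gate_passes({"completeness": 5, "consistency": 5, "testability": 5,
                   "scope_clarity": 5, "research_plan": 4}))  # False
```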
parent 2bc405e14a
commit 1634197853
8 changed files with 501 additions and 120 deletions
@@ -12,7 +12,7 @@ plugins/
 llm-security/       v6.0.0 — Security scanning, auditing, threat modeling
 ms-ai-architect/    v1.8.0 — Microsoft AI architecture (Cosmo Skyberg persona)
 okr/                v1.0.0 — OKR guidance for Norwegian public sector
-ultraplan-local/    v2.0.0 — Brief, research, plan, execute (four-command pipeline)
+ultraplan-local/    v2.1.0 — Brief, research, plan, execute (four-command pipeline)
 ```

 Each plugin is self-contained with its own CLAUDE.md, README, hooks, agents and commands.
@@ -61,20 +61,20 @@ Key commands: `/config-audit posture`, `/config-audit feature-gap`, `/config-aud

 ---

-### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.0.0`
+### [Ultra {brief | research | plan | execute} - local](plugins/ultraplan-local/) `v2.1.0`

 Deep requirements gathering, research, implementation planning, and self-verifying execution with specialized agent swarms, adversarial review, and failure recovery.

 Four commands, one pipeline with clear division of labor:

-- **`/ultrabrief-local`** — Capture intent. Interactive interview (3-8 adaptive questions) produces a task brief with explicit research plan. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive.
+- **`/ultrabrief-local`** — Capture intent. Dynamic, quality-gated interview: a section-driven completeness loop (Phase 3) followed by a `brief-reviewer` stop-gate (Phase 4, max 3 review iterations). Required sections must reach an initial-signal gate AND pass review across completeness, consistency, testability, scope clarity, and research-plan validity before `brief.md` is written. Identifies research topics with copy-paste-ready `/ultraresearch-local` commands. Optional auto-orchestration runs research + planning in foreground. Always interactive.
 - **`/ultraresearch-local`** — Gather context. Deep multi-source research with triangulation: 5 local agents + 4 external agents + Gemini bridge, producing structured briefs with confidence ratings. Makes no build decisions.
 - **`/ultraplan-local`** — Transform intent into an executable contract. Per-step YAML manifests (`expected_paths`, `commit_message_pattern`, `bash_syntax_check`). Plan-critic is a hard gate on manifest quality. Requires a task brief as input (`--brief` or `--project`).
 - **`/ultraexecute-local`** — Execute the contract with discipline. Manifest-based verification, independent Phase 7.5 audit from git log + filesystem (ignores agent bookkeeping), Phase 7.6 bounded recovery dispatch for missing steps. Step 0 pre-flight catches sandbox push-denial before any work. `--validate` mode offers a fast schema-only sanity check between planning and execution.

 All artifacts land in one project directory: `.claude/projects/{YYYY-MM-DD}-{slug}/` contains `brief.md`, `research/NN-*.md`, `plan.md`, `sessions/`, and `progress.json`. `--project <dir>` works across `/ultraresearch-local`, `/ultraplan-local`, and `/ultraexecute-local`.

-v2.0 extracts interview from planning: briefs are now reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. Breaking change: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`.
+v2.1 (non-breaking) replaces the hardcoded Q1–Q8 interview with a dynamic, quality-gated loop; `brief-reviewer` now emits machine-readable per-dimension JSON scores so `/ultrabrief-local` can use it as an internal stop-gate. v2.0 extracts the interview from planning: briefs are reviewable artifacts with explicit intent, research plan, and falsifiable success criteria — each a contract that downstream agents (`brief-reviewer`, `plan-critic`, `scope-guardian`) can validate independently. v2.0 is breaking: `/ultraplan-local` no longer runs an interview and requires `--brief` or `--project`. See `plugins/ultraplan-local/MIGRATION.md`.

 v1.7 self-verifying chain (preserved): a step may not be marked `completed` unless its manifest verifies. v1.8 Opus 4.7 literalism fixes (preserved): literal Step+Manifest template, forbidden narrative headers, schema self-check.
@@ -1,7 +1,7 @@
 {
   "name": "ultraplan-local",
   "description": "Four-command context-engineering pipeline (brief → research → plan → execute) with project folders, specialized agent swarms, external research triangulation, adversarial review, session decomposition, and headless execution.",
-  "version": "2.0.0",
+  "version": "2.1.0",
   "author": {
     "name": "Kjell Tore Guttormsen"
   },
@@ -4,6 +4,61 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

+## [2.1.0] - 2026-04-18
+
+### Changed — Dynamic, quality-gated interview in `/ultrabrief-local`
+
+The Phase 3 interview is no longer a hardcoded Q1–Q8 list with a numeric
+cap (3–4 questions in `--quick`, 5–8 in default). It is now a
+**section-driven completeness loop**: the command maintains per-section
+state (Intent, Goal, Success Criteria, Research Plan, and five optional
+sections), picks the next question from the section with the weakest
+signal, and keeps probing until all four required sections meet an
+initial-signal gate. Quality drives the loop, not a counter.
+
+Phase 4 adds a **draft → brief-reviewer → revise** loop. The brief is
+drafted in memory, written to `brief.md.draft`, reviewed by the
+`brief-reviewer` agent as a stop-gate, and only renamed to `brief.md`
+after all five dimensions pass (`completeness/consistency/testability/
+scope_clarity ≥ 4` and `research_plan == 5`). If the gate fails, a
+targeted follow-up is generated from the weakest dimension's detail
+field and the draft is re-reviewed. The loop is capped at 3 review
+iterations to bound cost; exhaustion writes the brief with
+`brief_quality: partial` and an explicit `## Brief Quality` section.
+
+Force-stop path: when the user says "stop" during Phase 4, the current
+review findings are surfaced with per-dimension scores before asking
+whether to continue or accept a partial brief. No silent exits.
+
+Not breaking. The `/ultrabrief-local [--quick] <task>` interface is
+unchanged from the outside; only internals change. `--quick` now means
+"start compact, escalate if gates fail" rather than "max 4 questions".
+
+### Added
+
+- **JSON output from `brief-reviewer`** — the agent now emits a final
+  fenced `json` block with per-dimension `score` (1–5) and `detail`
+  arrays (`gaps`, `issues`, `weak_criteria`, `unclear_sections`,
+  `invalid_topics`) alongside the existing prose report. The JSON block
+  is mandatory; empty arrays and `score: 5` are required when a
+  dimension passes cleanly. `planning-orchestrator` continues to use
+  the prose verdict unchanged.
+- **`brief_quality` frontmatter field** on task briefs — `complete`
+  (default) when the Phase 4 gate passed, or `partial` when the
+  iteration cap was hit or the user force-stopped with known issues.
+  `planning-orchestrator` can inspect this to decide how heavily to
+  weight brief sections as assumptions.
+- **`review_iterations` and `brief_quality` in ultrabrief-stats** —
+  recorded per run for telemetry.
+
+### Changed
+
+- Hard rule added: `/ultrabrief-local` never writes `brief.md` while the
+  review gate is pending. The draft lives in `brief.md.draft` until the
+  loop terminates.
+- Hard rule added: no hard cap on Phase 3 questions; the brief-review
+  gate is the only loop bound (3-iteration cap) and is in Phase 4.

 ## [2.0.0] - 2026-04-18

 ### Breaking — Four-command pipeline with dedicated brief step
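The changelog entry above says a failed gate generates a follow-up "from the weakest dimension's detail field". A minimal sketch of that selection, assuming the reviewer JSON shape the changelog describes — the helper name is hypothetical:

```python
import json

# Hypothetical sketch: pick the weakest dimension from the reviewer's
# JSON block and surface its detail array as follow-up material.
DETAIL_KEYS = {
    "completeness": "gaps",
    "consistency": "issues",
    "testability": "weak_criteria",
    "scope_clarity": "unclear_sections",
    "research_plan": "invalid_topics",
}

def weakest_dimension(review: dict) -> tuple:
    """Return (dimension, details) for the lowest-scoring dimension."""
    dim = min(DETAIL_KEYS, key=lambda d: review[d]["score"])
    return dim, review[dim][DETAIL_KEYS[dim]]

review = json.loads("""
{
  "completeness":  { "score": 4, "gaps": [] },
  "consistency":   { "score": 5, "issues": [] },
  "testability":   { "score": 2, "weak_criteria": ["'it works'"] },
  "scope_clarity": { "score": 4, "unclear_sections": [] },
  "research_plan": { "score": 5, "invalid_topics": [] },
  "verdict": "REVISE"
}
""")
dim, details = weakest_dimension(review)
print(dim, details)  # testability ["'it works'"]
```

The detail strings would then seed one targeted follow-up question rather than a generic re-ask.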
@@ -17,10 +17,10 @@ Deep implementation planning and research with an explicit brief step, specializ

 | Flag | Behavior |
 |------|----------|
-| _(default)_ | Full interview (5-8 questions) → brief.md with research plan |
+| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
-| `--quick` | Short interview (3-4 questions) → brief.md with research plan |
+| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |

-Always interactive. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
+Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).

 ### /ultraresearch-local modes
@@ -88,7 +88,7 @@ Flags combine: `--project <dir> --local --fg`, `--external --quick`.

 ## Architecture

-**Brief:** 7-phase workflow: Parse mode → Create project dir → Interactive interview (adaptive depth) → Identify research topics → Write `brief.md` → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready.
+**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode blocks foreground until plan is ready.

 **Research:** 8-phase workflow: Parse mode → Interview → Background transition → Parallel research (5 local + 4 external + 1 bridge) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`.
@@ -84,17 +84,18 @@ Or opt into auto-mode in `/ultrabrief-local` — it will launch research and pla

 ## `/ultrabrief-local` — Brief

-Interactive requirements-gathering command. Runs a focused interview (3-8 questions depending on mode and adaptive depth) and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline.
+Interactive requirements-gathering command. Runs a **dynamic, quality-gated interview** and produces a **task brief** with an explicit research plan. Optionally orchestrates the rest of the pipeline.

-### How it works
+### How it works (v2.1 — quality-gated)

-1. **Parse mode** — `default` (5-8 questions) or `--quick` (3-4 questions)
+1. **Parse mode** — `default` (dynamic; probes until quality gates pass) or `--quick` (starts compact, still escalates if gates fail)
 2. **Create project directory** — `.claude/projects/{YYYY-MM-DD}-{slug}/` with `research/` subdirectory
-3. **Interview** — one question at a time via `AskUserQuestion`, starting with Intent → Goal → Success Criteria, then optional Non-Goals, Constraints, Preferences, NFRs, Prior Attempts
+3. **Phase 3 — Completeness loop** — a section-driven loop (not a fixed question list) picks the next question from the section with the weakest signal (Intent → Goal → Success Criteria → Research Plan first, then optional sections). Required sections must reach an initial-signal gate before Phase 3 exits. No hard cap on question count.
-4. **Identify research topics** — probe for unfamiliar tech, version upgrades, security decisions, architectural choices
+4. **Identify research topics** — inline during Phase 3; probe for unfamiliar tech, version upgrades, security-sensitive decisions, architectural choices. Topics get a research question, scope, confidence, and cost.
-5. **Write brief** — `{project_dir}/brief.md` with all sections and a copy-paste-ready research plan
+5. **Phase 4 — Draft, review, revise** — draft the brief in memory, write to `brief.md.draft`, launch `brief-reviewer` as a stop-gate. The reviewer scores five dimensions (completeness, consistency, testability, scope clarity, research plan) 1–5 and returns machine-readable JSON. Gate passes when all ≥ 4 and research plan = 5. On fail, a targeted follow-up is generated from the weakest dimension's detail field and the draft is re-reviewed. Max 3 review iterations.
-6. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground)
+6. **Finalize** — rename draft to `brief.md` on pass, or write with `brief_quality: partial` + a `## Brief Quality` section if the cap is hit or the user force-stops.
-7. **Stats tracking** — append to `ultrabrief-stats.jsonl`
+7. **Auto-orchestration opt-in** — user chooses manual (default) or auto (Claude-managed research + plan in foreground)
+8. **Stats tracking** — append to `ultrabrief-stats.jsonl` (includes `review_iterations` and `brief_quality`)

 Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`
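Step 8's stats record is a one-line JSONL append. A sketch, assuming only the two fields the README names (`review_iterations`, `brief_quality`); any other field names and the helper itself are illustrative:

```python
import json
import os
import tempfile

def append_stats(path: str, review_iterations: int, brief_quality: str) -> None:
    """Append one run's telemetry as a single JSON line (JSONL convention)."""
    record = {
        "review_iterations": review_iterations,  # named in the README
        "brief_quality": brief_quality,          # "complete" or "partial"
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Demo against a throwaway file rather than a real project directory.
path = os.path.join(tempfile.mkdtemp(), "ultrabrief-stats.jsonl")
append_stats(path, review_iterations=2, brief_quality="complete")
with open(path, encoding="utf-8") as f:
    print(f.read().strip())
```

Append-only JSONL keeps each run independently parseable, which is why a failed later run cannot corrupt earlier telemetry.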
@@ -102,11 +103,17 @@ Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`

 | Mode | Usage | Behavior |
 |------|-------|----------|
-| **Default** | `/ultrabrief-local <task>` | Full interview (5-8 questions) |
+| **Default** | `/ultrabrief-local <task>` | Dynamic interview until quality gates pass. No question cap. |
-| **Quick** | `/ultrabrief-local --quick <task>` | Short interview (3-4 questions) |
+| **Quick** | `/ultrabrief-local --quick <task>` | Starts compact (optional sections get at most one probe), still escalates on weak required sections or failed review gate. |

 `/ultrabrief-local` is **always interactive**. There is no foreground/background mode — the interview requires user input.

+### Force-stop
+
+If you say "stop" or "enough" during Phase 4, the current review findings are surfaced with per-dimension scores and you choose:
+
+- **Answer one more follow-up** — the loop continues.
+- **Stop now (accept partial brief)** — the brief is finalized with `brief_quality: partial` and a `## Brief Quality` section listing the failing dimensions. Downstream planning will treat these as reduced-confidence areas.

 ### What the brief contains

 - **Intent** — why this matters, motivation, user need (load-bearing)
@@ -150,13 +150,35 @@ Flag as **research-plan invalid** if:

 ## Rating

-Rate each dimension:
+Rate each dimension on two parallel scales:

+**Verbal rating** (used in the prose report and the summary table):
 - **Pass** — adequate for planning
 - **Weak** — has issues but exploration can proceed with noted risks
 - **Fail** — must be addressed before exploration (wastes tokens otherwise)

+**Numeric score 1–5** (used in the machine-readable JSON block):
+- **5** — no issues; section is strong
+- **4** — minor issues that do not block exploration (maps to Pass)
+- **3** — weak but usable; assumptions should be carried (maps to Weak)
+- **2** — serious gap; exploration risks wasted work (maps to Fail)
+- **1** — section is effectively missing or contradictory (maps to Fail)
+
+Use both. The verbal rating drives the human-readable verdict. The numeric
+score drives callers (such as `/ultrabrief-local` Phase 4) that use the
+review as a quality gate and need per-dimension granularity.

 ## Output format

+Produce **two artifacts in this order**:
+
+1. A prose report (for humans and for `planning-orchestrator` Phase 1b).
+2. A final fenced `json` block with per-dimension numeric scores (for callers
+   that gate on machine-readable output, such as `/ultrabrief-local` Phase 4).
+
+The JSON block MUST be the last fenced block in your output so parsers can
+find it by reading the last `json` code fence.
+
 ```
 ## Brief Review
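The "last fenced `json` block" rule above implies a caller-side extractor. A sketch of one possible implementation (a regex scan; this is not the plugin's actual parser — the fence string is built programmatically so the example can itself sit inside a code block):

```python
import json
import re

# Build the three-backtick fence programmatically to avoid a literal
# fence closing this example's own code block.
FENCE = "`" * 3
PATTERN = re.compile(FENCE + r"json\s*\n(.*?)\n" + FENCE, re.DOTALL)

def last_json_block(text: str) -> dict:
    """Parse the LAST fenced json block in reviewer output, per the rule."""
    blocks = PATTERN.findall(text)
    if not blocks:
        raise ValueError("no fenced json block found")
    return json.loads(blocks[-1])

output = ("## Brief Review\nProse verdict: PROCEED.\n\n"
          + FENCE + "json\n"
          + '{"verdict": "PROCEED", "completeness": {"score": 5, "gaps": []}}'
          + "\n" + FENCE + "\n")
print(last_json_block(output)["verdict"])  # PROCEED
```

Taking the last match is what makes the "must be the last fenced block" requirement load-bearing: any earlier example fences in the prose report are ignored.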
@@ -187,7 +209,42 @@ information that would strengthen the brief. List only if actionable.}

 - **{PROCEED}** — brief is adequate for exploration
 - **{PROCEED_WITH_RISKS}** — brief has weaknesses; note them as assumptions in the plan
 - **{REVISE}** — brief needs fixes before exploration (list what to fix)

+### Machine-readable scores
+
+```json
+{
+  "completeness": { "score": 1-5, "gaps": [ "{short gap description}", ... ] },
+  "consistency": { "score": 1-5, "issues": [ "{short issue description}", ... ] },
+  "testability": { "score": 1-5, "weak_criteria": [ "{quoted weak criterion}", ... ] },
+  "scope_clarity": { "score": 1-5, "unclear_sections": [ "{section name}", ... ] },
+  "research_plan": {
+    "score": 1-5,
+    "invalid_topics": [
+      { "topic": "{topic title}", "issue": "{what is missing or wrong}" }
+    ]
+  },
+  "verdict": "PROCEED | PROCEED_WITH_RISKS | REVISE"
+}
 ```
+```
+
+### JSON output rules
+
+- The JSON block is mandatory. Emit it even when everything passes — use
+  empty arrays and `"score": 5` in that case.
+- Every dimension key must be present. Do not omit dimensions.
+- `score` is an integer 1–5. Use the mapping in the Rating section.
+- Array fields must be strings (or objects in the case of `invalid_topics`)
+  that are short, concrete, and actionable — never sentences spanning lines.
+- `verdict` must match the verbal verdict in the prose section. If the JSON
+  verdict disagrees with the prose, the caller will fall back to the prose
+  verdict — but the mismatch is a bug in your output.
+- Do not include trailing commas, comments, or non-JSON text inside the
+  fence. The block must parse with a strict JSON parser.
+- If a dimension's score is 4 or 5, its detail array may be `[]`. A score of
+  3 or below SHOULD populate the detail array so callers can generate
+  targeted follow-up questions.

 ## Rules
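The JSON output rules above are mechanically checkable on the caller side. A sketch of that validation, assuming the schema from the hunk above — the function name and error messages are illustrative:

```python
import json

# Detail-array key per dimension, as named in the reviewer schema.
DETAIL_KEYS = {
    "completeness": "gaps",
    "consistency": "issues",
    "testability": "weak_criteria",
    "scope_clarity": "unclear_sections",
    "research_plan": "invalid_topics",
}
VERDICTS = {"PROCEED", "PROCEED_WITH_RISKS", "REVISE"}

def validate_review(block: str) -> dict:
    """Strict-parse the JSON block and enforce the output rules."""
    data = json.loads(block)  # strict parser: rejects trailing commas, comments
    for dim, key in DETAIL_KEYS.items():
        entry = data[dim]  # KeyError if a dimension is omitted
        score = entry["score"]
        if not (isinstance(score, int) and 1 <= score <= 5):
            raise ValueError(f"{dim}: score must be an integer 1-5")
        entry.setdefault(key, [])  # detail array may be [] on a clean pass
    if data["verdict"] not in VERDICTS:
        raise ValueError("unknown verdict")
    return data

clean = validate_review(
    '{"completeness": {"score": 5, "gaps": []},'
    ' "consistency": {"score": 5, "issues": []},'
    ' "testability": {"score": 5, "weak_criteria": []},'
    ' "scope_clarity": {"score": 5, "unclear_sections": []},'
    ' "research_plan": {"score": 5, "invalid_topics": []},'
    ' "verdict": "PROCEED"}'
)
print(clean["verdict"])  # PROCEED
```

Cross-checking the JSON verdict against the prose verdict (the fallback rule) would sit on top of this, in whatever code already extracts the prose verdict.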
@@ -6,7 +6,7 @@ model: opus
 allowed-tools: Agent, Read, Glob, Grep, Write, Edit, Bash, AskUserQuestion
 ---

-# Ultrabrief Local v2.0
+# Ultrabrief Local v2.1

 Interactive requirements-gathering command. Produces a **task brief** — a
 structured markdown file that declares intent, goal, constraints, and an
@@ -33,11 +33,14 @@ foreground if the user opts in.

 Parse `$ARGUMENTS`:

-1. If arguments start with `--quick`: set **mode = quick**. Interview is
-   shorter (3-4 questions instead of 5-8). Strip the flag; remainder is
-   the task description.
+1. If arguments start with `--quick`: set **mode = quick**. The interview
+   starts more compactly (fewer opening probes per section) but still
+   escalates automatically if quality gates fail. There is no hard cap on
+   question count — quality drives the loop, not a counter. Strip the flag;
+   remainder is the task description.

-2. Otherwise: **mode = default**. Full interview (5-8 questions).
+2. Otherwise: **mode = default**. Interview probes each section until the
+   completeness gate (Phase 3) and brief-review gate (Phase 4) both pass.

 If no task description is provided, output usage and stop:
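The mode parsing described in the hunk above reduces to a flag strip. A minimal sketch, assuming `$ARGUMENTS` arrives as one string — the function name is hypothetical:

```python
def parse_arguments(arguments: str) -> tuple:
    """Return (mode, task): mode is 'quick' or 'default', task is the rest."""
    text = arguments.strip()
    if text.startswith("--quick"):
        # Strip the flag; the remainder is the task description.
        return "quick", text[len("--quick"):].strip()
    return "default", text

print(parse_arguments("--quick Add JWT auth"))  # ('quick', 'Add JWT auth')
print(parse_arguments("Add JWT auth"))          # ('default', 'Add JWT auth')
```

An empty remaining task would then trigger the usage-and-stop branch the command defines.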
@@ -46,8 +49,8 @@ Usage: /ultrabrief-local <task description>
        /ultrabrief-local --quick <task description>

 Modes:
-  default    Full interview (5-8 questions) → brief with research plan
-  --quick    Short interview (3-4 questions) → brief with research plan
+  default    Dynamic interview until quality gates pass — brief with research plan
+  --quick    Compact start; still escalates on weak sections — brief with research plan

 Examples:
   /ultrabrief-local Add user authentication with JWT tokens
@ -86,74 +89,144 @@ If the directory already exists and is non-empty, warn and ask:
|
||||||
Use `AskUserQuestion` with three options. If "pick new slug", ask for a
|
Use `AskUserQuestion` with three options. If "pick new slug", ask for a
|
||||||
new slug and restart Phase 2.
|
new slug and restart Phase 2.
|
||||||
|
|
||||||
## Phase 3 — Interview
|
## Phase 3 — Completeness loop
|
||||||
|
|
||||||
Use `AskUserQuestion` throughout. **Ask one question at a time.** Never
|
Phase 3 is a **section-driven completeness loop**. Instead of a numbered
|
||||||
dump all questions at once. Follow up based on answers.
|
question list, maintain an internal state of brief sections and keep asking
|
||||||
|
until every required section has substantive content. Quality drives the
|
||||||
|
loop — there is no hard cap on question count.
|
||||||
|
|
||||||
### Interview flow
|
Use `AskUserQuestion` for every question. **Ask one question at a time.**
|
||||||
|
Never dump multiple questions.
|
||||||
|
|
||||||
**Question 1 (always) — Intent:**
|
### Internal state
|
||||||
> "What is the motivation for this task? Why does it matter? What happens
|
|
||||||
> if we don't do it? (The plan will use this to justify every implementation
|
|
||||||
> decision.)"
|
|
||||||
|
|
||||||
**Question 2 (always) — Goal:**
|
Track this structure in memory as the loop runs:
|
||||||
> "What does success look like concretely? Describe the end state in
|
|
||||||
> 1-3 sentences — specific enough to disagree with."
|
|
||||||
|
|
||||||
**Question 3 (always) — Success criteria:**
|
```
|
||||||
> "How will we verify it's done? List 2-4 specific, testable conditions
|
state = {
|
||||||
> (commands to run, observations, metrics). Avoid 'it works'."
|
intent: { content: "", probes: 0 }, # required
|
||||||
|
goal: { content: "", probes: 0 }, # required
|
||||||
|
success_criteria: { content: [], probes: 0 }, # required
|
||||||
|
research_plan: { topics: [], probes: 0 }, # required
|
||||||
|
non_goals: { content: [], probes: 0 }, # optional
|
||||||
|
constraints: { content: [], probes: 0 }, # optional
|
||||||
|
preferences: { content: [], probes: 0 }, # optional
|
||||||
|
nfrs: { content: [], probes: 0 }, # optional
|
||||||
|
prior_attempts: { content: "", probes: 0 }, # optional
|
||||||
|
question_history: [] # list of questions asked
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
**Question 4 (usually) — Non-goals:**
|
`content` is raw user answers merged; `probes` is how many times this
|
||||||
> "What is explicitly NOT in scope? (Prevents scope-guardian flagging gaps
|
section has been asked; `question_history` prevents re-asking the same
|
||||||
> for things we deliberately don't do.)"
|
variant twice.
|
||||||
|
|
||||||
**Question 5 (conditional) — Constraints:**
|
### Required sections (initial-signal gate)
|
||||||
> "Are there technical, time, or resource constraints? (Dependencies,
|
|
||||||
> compatibility, deadlines, budget.)"
|
|
||||||
>
|
|
||||||
> Skip if the user already mentioned constraints.
|
|
||||||
|
|
||||||
**Question 6 (conditional) — Preferences:**
|
Four sections MUST have substantive content before exiting Phase 3:
|
||||||
> "Preferences on libraries, patterns, or architectural style?"
|
|
||||||
>
|
|
||||||
> Skip for small tasks or when constraints already imply them.
|
|
||||||
|
|
||||||
**Question 7 (conditional) — Non-functional requirements:**
|
1. **Intent** — full sentence or paragraph (not a single word or phrase)
|
||||||
> "Performance, security, accessibility, or scalability targets? (Quantified
|
2. **Goal** — full sentence or paragraph
|
||||||
> where possible.)"
|
3. **Success Criteria** — at least one concrete, testable item
|
||||||
>
|
4. **Research Plan** — either ≥ 1 topic probed, OR the user has explicitly
|
||||||
> Skip if not applicable.
|
confirmed "no external research needed"
|
||||||
|
|
||||||
**Question 8 (conditional) — Prior attempts:**
|
"Substantive" means: non-empty, not a trivial one-word reply, not
|
||||||
> "Has this been tried before? What worked or failed?"
|
"I don't know" without a recorded assumption. The strict falsifiability
|
||||||
>
|
check happens in Phase 4 (brief-review gate); Phase 3 is just the
|
||||||
> Skip if the task is clearly fresh.
|
initial-signal bar.
|
||||||
|
|
||||||
### Adaptive depth
|
Optional sections (Non-Goals, Constraints, Preferences, NFRs, Prior
|
||||||
|
Attempts) do not gate exit. If they remain empty after the required
|
||||||
|
sections pass, they will be recorded as "Not discussed — no constraints
|
||||||
|
assumed" in Phase 4's draft.
|
||||||
|
|
||||||
### Question bank (per section)

Pick the next question from the section's bank based on `content` and
`probes`. Wording must stay conversational — only the *selection* is
section-driven, not the tone.
**Intent** (required):
- _Anchor_ (probes=0, content empty): "Why are we doing this? What is the
  motivation, the user need, or the strategic context behind the task?"
- _Follow-up_ (probes≥1, content present but shallow): "What happens if
  we do nothing? Who is affected?"
- _Sharpen_ (user mentioned a symptom): "You mentioned {X}. Is {X} the
  symptom or the underlying cause?"

**Goal** (required):
- _Anchor_: "Describe the end state in 1–3 sentences — specific enough to
  disagree with."
- _Follow-up_: "How would you recognize this is done when looking at
  the UI / API / codebase?"

**Success Criteria** (required):
- _Anchor_: "How do we verify it is actually done? List 2–4 concrete,
  testable conditions — commands to run, observations, or metrics."
- _Sharpen_ (criterion is vague): "'{quoted criterion}' is subjective.
  Which command, observation, or metric would prove this is met?"
- _Quantify_ (performance/quality claim): "You mentioned it should be
  {fast/reliable/secure}. What number or threshold counts as success?"

**Research Plan** (required, strictest):
- _Anchor_ (no topics yet): "Are there technologies, libraries, or
  decisions in this task you do not have solid current knowledge of?
  Examples might be library choice, a protocol, or a security pattern."
- _Per-topic sharpen_ (topic exists but incomplete): "For topic
  '{title}': which parts of the plan depend on the answer? What
  confidence level do you need — high, medium, or low?"
- _Scope question_: "Is '{topic}' answerable from the existing codebase,
  from external docs, or both?"
- _Confirm none_ (user refuses all topics): "Confirming: no external
  research needed — you already know everything the plan will depend on?"
**Non-Goals** (optional):
- _Anchor_: "What is explicitly NOT in scope? This prevents scope-guardian
  from flagging gaps for things we deliberately don't do."

**Constraints** (optional):
- _Anchor_: "Technical, time, or resource constraints the plan must
  respect? Dependencies, compatibility, deadlines, or budget."
- _Sharpen_: "You mentioned {deadline / budget / compatibility}. Is it
  hard or guidance?"

**Preferences** (optional):
- _Anchor_: "Preferences for libraries, patterns, or architectural style?"

**NFRs** (optional):
- _Anchor_: "Performance, security, accessibility, or scalability targets?
  Quantified wherever possible."

**Prior Attempts** (optional):
- _Anchor_: "Has this been attempted before? What worked or failed?"

### Selection rule

On each loop iteration:
1. Compute the next section to probe:
   - If any required section is below the initial-signal gate → pick the
     weakest required section in this priority order:
     Intent → Goal → Success Criteria → Research Plan.
   - Else if an optional section is clearly missing and likely material
     to scope (heuristic: the task description hints at constraints or
     NFRs) → probe it at most once.
   - Else: exit Phase 3.
2. Within the chosen section, pick the question variant:
   - If `probes == 0` and content is empty → _Anchor_.
   - If content exists but is shallow → _Follow-up_ or _Sharpen_.
   - If the section is Research Plan and topics exist → iterate per-topic
     sharpen across incomplete topics.
3. Ensure the exact question is NOT already in `question_history`. If it
   is, pick the next variant or skip to the next weakest section.
4. Ask via `AskUserQuestion`. Append question to history. Increment probes.
5. Record the answer into `content`. Never overwrite — merge.
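The selection rule above can be sketched as a pure function over the interview state. Treat this as an illustration under stated assumptions: `gate_met` and `material_hint` are bookkeeping flags the sketch invents; the spec only mandates the behavior, not these field names.

```python
PRIORITY = ["intent", "goal", "success_criteria", "research_plan"]

def next_question(state):
    """Return (section, variant) for the next probe, or None to exit Phase 3."""
    # 1. Weakest required section first, in fixed priority order.
    for name in PRIORITY:
        section = state["sections"][name]
        if section.get("gate_met"):
            continue
        if section.get("probes", 0) == 0 and not section.get("content"):
            return name, "anchor"
        if name == "research_plan" and section.get("topics"):
            return name, "per_topic_sharpen"
        return name, "follow_up_or_sharpen"
    # 2. Probe a clearly material optional section at most once.
    for name, section in state.get("optional", {}).items():
        if section.get("material_hint") and section.get("probes", 0) == 0:
            return name, "anchor"
    return None  # 3. All gates met: exit Phase 3.
```

The `question_history` dedup check (step 3) and the `AskUserQuestion` call (step 4) would wrap this function; they are omitted to keep the sketch focused on selection.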
### Research topic identification

As the user answers Intent, Goal, or Success Criteria, listen for:

- **Unfamiliar technologies** — libraries, frameworks, protocols not
  clearly present in the codebase
- **Unknown integrations** — third-party APIs, external services
- **Compliance / legal** — GDPR, accessibility, industry regulations
When you hear one, add a *candidate* topic to `research_plan.topics` with
only a title and why-it-matters. Probe it on the next Research Plan
iteration using the per-topic sharpen question to fill in:

- Research question (must end in `?`)
- Required for plan steps
- Scope (local / external / both)
- Confidence needed (high / medium / low)
- Estimated cost (quick / standard / deep)

If the user says "I know this" to a candidate topic, remove it from the
list. Trust the user. If no topics emerge after probing, the user confirms
"no external research needed" → the `research_plan` gate passes with 0 topics.
### Quick mode adjustments

If **mode = quick**:

- For optional sections, cap probes at 1 each. Do not revisit optional
  sections during Phase 3.
- Required sections still have no probe cap — the quality gate still applies.
- Prefer _Anchor_ variants over _Sharpen_ on the first pass.
### Force-stop path

If the user says "skip", "stop asking", "just proceed", "enough", or
similar, break the loop immediately:

- Mark any required sections still below the initial-signal gate as
  `{ incomplete_forced_stop: true }` in state.
- Proceed to Phase 4 with a note that the brief will carry a reduced
  confidence flag.
### Exit condition

Exit Phase 3 when:

- All four required sections meet the initial-signal gate, OR
- The user has force-stopped.

Report:

```
Phase 3 complete: {N} questions asked across {M} sections.
Proceeding to draft and review.
```
## Phase 4 — Draft, review, and revise

Phase 4 runs a **draft → brief-reviewer → revise** loop. The draft is
not written to disk until the brief-review quality gate passes (or the
iteration cap is hit). This ensures the brief that reaches `/ultraplan-local`
has already survived a critical review.

Read the brief template first:
`@${CLAUDE_PLUGIN_ROOT}/templates/ultrabrief-template.md`
### Loop bound

**Maximum 3 review iterations.** This bounds cost in the worst case while
leaving room for two rounds of targeted follow-ups.
### Iteration step-by-step

**Step 4a — Draft in memory**
Build the brief text from Phase 3 state by filling the template:

- **Frontmatter:** populate `task`, `slug`, `project_dir`, `research_topics`
  (count of topics), `research_status: pending`, `auto_research: false`
  (will update in Phase 5 if the user opts in), `interview_turns` (total
  questions asked across Phase 3 + Phase 4), `source: interview`.
- **Intent:** expand the user's motivation into 3–5 sentences. Load-bearing.
- **Goal:** concrete end state.
- **Non-Goals:** from state, or a "- None explicitly stated" bullet if empty.
- **Constraints / Preferences / NFRs:** from state, or a "Not discussed — no
  constraints assumed" note if empty.
- **Success Criteria:** falsifiable commands/observations from state.
- **Research Plan:** one `### Topic N: {title}` section per topic with the
  full structure from the template. If 0 topics: write the "No external
  research needed — user confirmed solid knowledge of all plan
  dependencies" note.
- **Open Questions / Assumptions:** from any `"I don't know"` answers
  recorded during Phase 3, plus implicit gaps.
- **Prior Attempts:** from state, or "None — fresh task."
**Step 4b — Write draft to disk**

Write the draft to `{PROJECT_DIR}/brief.md.draft` (not `brief.md` — the
final file is only written after the gate passes).
**Step 4c — Launch brief-reviewer**

Launch the `brief-reviewer` agent (foreground, blocking) with the prompt:

> "Review this task brief for quality: `{PROJECT_DIR}/brief.md.draft`.
> Check completeness, consistency, testability, scope clarity, and
> research-plan validity. Report findings, verdict, and the required
> machine-readable JSON block."
**Step 4d — Parse JSON scores**

Parse the agent's output. Locate the **last** fenced `json` code block and
extract the per-dimension scores:

```
review = {
  completeness:  { score, gaps },
  consistency:   { score, issues },
  testability:   { score, weak_criteria },
  scope_clarity: { score, unclear_sections },
  research_plan: { score, invalid_topics },
  verdict: "PROCEED | PROCEED_WITH_RISKS | REVISE"
}
```

**JSON fallback:** if the JSON block is missing, invalid, or a dimension
is missing, treat all dimensions as `score: 3` and set the `verdict` from
the prose verdict if present, otherwise `PROCEED_WITH_RISKS`. Emit an
internal note that the reviewer output was degraded. This ensures the
loop never deadlocks on a parser error.
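One way to implement Step 4d, including the fallback, might look like the sketch below. The function name and the `(review, degraded)` return shape are assumptions for illustration.

```python
import json
import re

DIMENSIONS = {"completeness", "consistency", "testability",
              "scope_clarity", "research_plan"}

def parse_review(output, prose_verdict=None):
    """Extract the last fenced json block from reviewer output.

    Returns (review, degraded). On any parse failure, degrade to
    score-3 dimensions so the loop never deadlocks.
    """
    blocks = re.findall(r"```json\s*\n(.*?)```", output, flags=re.DOTALL)
    try:
        review = json.loads(blocks[-1])        # last block wins
        if not DIMENSIONS <= review.keys():    # a dimension is missing
            raise ValueError("dimension missing")
        return review, False
    except (IndexError, ValueError, AttributeError):
        degraded = {dim: {"score": 3} for dim in DIMENSIONS}
        degraded["verdict"] = prose_verdict or "PROCEED_WITH_RISKS"
        return degraded, True
```

Taking the *last* block matters because the reviewer's prose report may itself quote earlier JSON fragments.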
**Step 4e — Gate evaluation**

The gate **passes** when all of the following are true:

- `completeness.score ≥ 4`
- `consistency.score ≥ 4`
- `testability.score ≥ 4`
- `scope_clarity.score ≥ 4`
- `research_plan.score == 5`

(Research Plan requires a perfect score because its format is checked
mechanically: the research question ends in `?`, `Required for plan steps`
is filled, scope is one of `local | external | both`, and confidence is
`high | medium | low`. Anything less means at least one topic is malformed
and planning will stumble.)
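The gate check is a direct translation of the thresholds above (assuming the parsed `review` dict from Step 4d):

```python
def gate_passes(review):
    """Phase 4 gate: every prose dimension >= 4 AND a perfect Research Plan."""
    return (review["completeness"]["score"] >= 4
            and review["consistency"]["score"] >= 4
            and review["testability"]["score"] >= 4
            and review["scope_clarity"]["score"] >= 4
            and review["research_plan"]["score"] == 5)
```

Note the degraded-output fallback (all scores 3) can never pass this gate, which is the intended behavior: a broken review should trigger a retry, not a silent pass.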
**If the gate passes:**

1. Move `brief.md.draft` → `brief.md` (atomic rename).
2. If an atomic rename is not possible on the OS, write `brief.md` fresh,
   then delete the draft file.
3. Break the loop and proceed to Step 4g.
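The promote-or-fallback step might be sketched as follows; the function name is illustrative. `os.replace` is atomic on both POSIX and Windows when source and destination live on the same filesystem.

```python
import os

def promote_draft(project_dir):
    """Promote brief.md.draft to brief.md, atomically where possible."""
    draft = os.path.join(project_dir, "brief.md.draft")
    final = os.path.join(project_dir, "brief.md")
    try:
        os.replace(draft, final)      # atomic rename on POSIX and Windows
    except OSError:
        with open(draft, encoding="utf-8") as src, \
             open(final, "w", encoding="utf-8") as dst:
            dst.write(src.read())     # fallback: write brief.md fresh
        os.remove(draft)              # the draft must not outlive Phase 4
```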
**If the gate fails AND the iteration count < 3:**

1. Identify the weakest dimension (lowest score; ties broken by priority:
   research_plan > testability > completeness > consistency > scope_clarity).
2. Generate a targeted follow-up question from the dimension's detail
   field (gaps / issues / weak_criteria / unclear_sections / invalid_topics).
   Example generators:
   - `completeness.gaps: ["Non-Goals empty, unclear if deliberate"]`
     → "You did not specify anything out-of-scope. Is that deliberate, or
     are there things we should explicitly exclude?"
   - `testability.weak_criteria: ["'system should be fast'"]`
     → "'System should be fast' is not falsifiable. Which metric or
     threshold proves this is met — e.g., p95 < 200ms, or throughput
     ≥ X requests/sec?"
   - `research_plan.invalid_topics: [{"topic":"JWT","issue":"Required for plan steps empty"}]`
     → "For research topic 'JWT': which plan steps depend on the answer?
     Give one or two concrete kinds of step (e.g., 'library selection',
     'threat model', 'migration strategy')."
3. Ask via `AskUserQuestion`. Record the answer into Phase 3 state.
4. Return to Step 4a with an incremented iteration count. The reviewer sees
   an updated draft, so you MUST re-read the brief and regenerate the
   review each iteration — do not reuse stale scores.
5. When launching the reviewer on iteration 2 or 3, include prior
   questions in the prompt so it does not produce circular follow-ups:
   > "Questions already asked during this interview: {list from
   > question_history}. Focus on issues that remain after those answers —
   > do not re-raise gaps that have already been addressed."
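The weakest-dimension pick with its tie-break can be expressed in a few lines (the `TIE_PRIORITY` list encodes the priority order from step 1):

```python
TIE_PRIORITY = ["research_plan", "testability", "completeness",
                "consistency", "scope_clarity"]

def weakest_dimension(review):
    """Lowest score wins; ties break toward the front of TIE_PRIORITY."""
    dims = [d for d in TIE_PRIORITY if d in review]
    return min(dims, key=lambda d: (review[d]["score"], TIE_PRIORITY.index(d)))
```

Sorting on the `(score, priority_index)` tuple keeps the tie-break deterministic without a second pass.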
**If the gate fails AND the iteration count == 3 (loop exhausted):**

1. Move `brief.md.draft` → `brief.md`.
2. Add `brief_quality: partial` to the frontmatter (edit the file
   post-rename — insert the key above the closing `---`).
3. Add a `## Brief Quality` section near the end with the failing
   dimensions and their `detail` arrays from the final review, formatted:

   ```
   ## Brief Quality

   Review loop exhausted after 3 iterations. The following dimensions
   did not reach the pass threshold:

   - **Research Plan (score 2/5):** Topic 'JWT library' missing
     Required-for-plan-steps field.
   - **Testability (score 3/5):** Success criterion "works correctly"
     is not falsifiable.

   Downstream planning will treat these as reduced-confidence areas.
   ```

4. Break the loop and proceed to Step 4g.
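The frontmatter edit in step 2 can be sketched as a line-level insertion above the closing `---`. This assumes a well-formed `---`-delimited frontmatter block at the top of the file; the function name is illustrative.

```python
def mark_partial(brief_text):
    """Insert `brief_quality: partial` above the frontmatter's closing ---."""
    lines = brief_text.splitlines(keepends=True)
    closers = [i for i, line in enumerate(lines) if line.strip() == "---"]
    if len(closers) < 2 or "brief_quality:" in brief_text:
        return brief_text  # no frontmatter, or key already present
    lines.insert(closers[1], "brief_quality: partial\n")
    return "".join(lines)
```

The early return makes the edit idempotent, so re-running the exhaustion path cannot duplicate the key.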
### Step 4f — Force-stop handling

If during any `AskUserQuestion` in Step 4e the user says "stop", "skip",
"enough", "just write it", or similar, do NOT exit the loop immediately.
Instead, surface the current review findings in plain text:

```
Brief-reviewer would flag these issues:
- Research Plan (score 2/5): Topic 'JWT library choice' missing Required-for-plan-steps field.
- Testability (score 3/5): Success criterion "works correctly" is not falsifiable.

Continue anyway? The plan will have lower confidence in these areas.
```

Then ask via `AskUserQuestion`:

| Option | Action |
|--------|--------|
| **Answer one more follow-up** | Return to Step 4e with the current weakest-dimension question. |
| **Stop now (accept partial brief)** | Finalize the brief with `brief_quality: partial` and the `## Brief Quality` section (same path as iteration-cap exhaustion). Break the loop. |

The force-stop path is distinct from a silent iteration cap: the user
sees exactly which dimensions are weak and makes an informed choice.
### Step 4g — Finalize

After the loop exits (pass, cap, or force-stop), ensure:

- `brief.md` exists at `{PROJECT_DIR}/brief.md`.
- `brief.md.draft` no longer exists.
- If the loop ended without a clean pass, the frontmatter contains
  `brief_quality: partial` and a `## Brief Quality` section exists.
- If the loop ended with a clean pass, `brief_quality` is either
  absent or set to `complete`.

Populate the "How to continue" footer with the actual project path and
topic questions.

Report:
```
Brief written: {PROJECT_DIR}/brief.md
Review iterations: {1..3}
Final quality: {complete | partial}
Research topics identified: {N}
```
Append one record to `${CLAUDE_PLUGIN_DATA}/ultrabrief-stats.jsonl`:

"slug": "{slug}",
"mode": "{default | quick}",
"interview_turns": {N},
"review_iterations": {1..3},
"brief_quality": "{complete | partial}",
"research_topics": {N},
"auto_research": {true | false},
"auto_result": "{completed | cancelled | failed | manual}",
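The append itself is best-effort by design; a minimal sketch (the function name is an assumption):

```python
import json

def append_stats(path, record):
    """Best-effort JSONL append: a stats failure must never block the workflow."""
    try:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
    except OSError:
        pass  # swallow: stats are observability, not correctness
```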
Never let stats failures block the workflow.

7. **Auto mode blocks foreground.** If the user opts into auto, this
   session waits for research + planning to complete. Document this in
   the opt-in question.
8. **Quality gates, not question counts.** Phase 3 and Phase 4 are
   quality-gated loops; do not enforce a hard cap on interview questions.
   The brief-review gate (Phase 4) caps at 3 review iterations to bound
   cost, but Phase 3 has no cap — required-section content drives exit.
9. **Never write `brief.md` while the review gate is still pending.**
   The draft lives in `brief.md.draft` until the loop terminates. A caller
   that sees `brief.md` must be able to trust that Phase 4 finished.
10. **Privacy:** never log prompt text, secrets, or credentials.