feat(ultraplan-local): v2.1.0 — dynamic quality-gated interview

Replace hardcoded Q1-Q8 in /ultrabrief-local with a section-driven completeness loop (Phase 3) and a draft/review/revise loop with brief-reviewer as stop-gate (Phase 4). Quality drives the interview, not a question counter. brief-reviewer now emits a machine-readable JSON block with per-dimension scores (1-5) and detail arrays alongside the existing prose report; planning-orchestrator continues to consume the prose verdict unchanged. Phase 4 gate: all dimensions >= 4 AND research_plan = 5. On fail, a targeted follow-up is generated from the weakest dimension's detail field and the draft is re-reviewed. Max 3 review iterations bound cost; exhaustion writes brief.md with brief_quality: partial and an explicit Brief Quality section. Force-stop surfaces per-dimension findings before the user chooses continue or partial. Not breaking. /ultrabrief-local [--quick] <task> interface unchanged. --quick now means compact start with escalation, not a max-N cap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 09:43:43 +02:00 · 2026-04-18 09:43:43 +02:00 · 1634197853
commit 1634197853
parent 2bc405e14a
8 changed files with 501 additions and 120 deletions
--- a/plugins/ultraplan-local/agents/brief-reviewer.md
+++ b/plugins/ultraplan-local/agents/brief-reviewer.md
@ -150,13 +150,35 @@ Flag as **research-plan invalid** if:

 ## Rating

-Rate each dimension:
+Rate each dimension on two parallel scales:
+
+**Verbal rating** (used in the prose report and the summary table):
 - **Pass** — adequate for planning
 - **Weak** — has issues but exploration can proceed with noted risks
 - **Fail** — must be addressed before exploration (wastes tokens otherwise)

+**Numeric score 1–5** (used in the machine-readable JSON block):
+- **5** — no issues; section is strong
+- **4** — minor issues that do not block exploration (maps to Pass)
+- **3** — weak but usable; assumptions should be carried (maps to Weak)
+- **2** — serious gap; exploration risks wasted work (maps to Fail)
+- **1** — section is effectively missing or contradictory (maps to Fail)
+
+Use both. The verbal rating drives the human-readable verdict. The numeric
+score drives callers (such as `/ultrabrief-local` Phase 4) that use the
+review as a quality gate and need per-dimension granularity.
+
 ## Output format

+Produce **two artifacts in this order**:
+
+1. A prose report (for humans and for `planning-orchestrator` Phase 1b).
+2. A final fenced `json` block with per-dimension numeric scores (for callers
+   that gate on machine-readable output, such as `/ultrabrief-local` Phase 4).
+
+The JSON block MUST be the last fenced block in your output so parsers can
+find it by reading the last `json` code fence.
+
 ```
 ## Brief Review

@ -187,7 +209,42 @@ information that would strengthen the brief. List only if actionable.}
 - **{PROCEED}** — brief is adequate for exploration
 - **{PROCEED_WITH_RISKS}** — brief has weaknesses; note them as assumptions in the plan
 - **{REVISE}** — brief needs fixes before exploration (list what to fix)
+
+### Machine-readable scores
+
+```json
+{
+  "completeness":   { "score": 1-5, "gaps":            [ "{short gap description}", ... ] },
+  "consistency":    { "score": 1-5, "issues":          [ "{short issue description}", ... ] },
+  "testability":    { "score": 1-5, "weak_criteria":   [ "{quoted weak criterion}", ... ] },
+  "scope_clarity":  { "score": 1-5, "unclear_sections":[ "{section name}", ... ] },
+  "research_plan":  {
+    "score": 1-5,
+    "invalid_topics": [
+      { "topic": "{topic title}", "issue": "{what is missing or wrong}" }
+    ]
+  },
+  "verdict": "PROCEED | PROCEED_WITH_RISKS | REVISE"
+}
 ```
+```
+
+### JSON output rules
+
+- The JSON block is mandatory. Emit it even when everything passes — use
+  empty arrays and `"score": 5` in that case.
+- Every dimension key must be present. Do not omit dimensions.
+- `score` is an integer 1–5. Use the mapping in the Rating section.
+- Array fields must be strings (or objects in the case of `invalid_topics`)
+  that are short, concrete, and actionable — never sentences spanning lines.
+- `verdict` must match the verbal verdict in the prose section. If the JSON
+  verdict disagrees with the prose, the caller will fall back to the prose
+  verdict — but the mismatch is a bug in your output.
+- Do not include trailing commas, comments, or non-JSON text inside the
+  fence. The block must parse with a strict JSON parser.
+- If a dimension's score is 4 or 5, its detail array may be `[]`. A score of
+  3 or below SHOULD populate the detail array so callers can generate
+  targeted follow-up questions.

 ## Rules