feat(ultraplan-local): v2.1.0 — dynamic quality-gated interview
Replace hardcoded Q1-Q8 in /ultrabrief-local with a section-driven completeness loop (Phase 3) and a draft/review/revise loop with brief-reviewer as stop-gate (Phase 4). Quality drives the interview, not a question counter. brief-reviewer now emits a machine-readable JSON block with per-dimension scores (1-5) and detail arrays alongside the existing prose report; planning-orchestrator continues to consume the prose verdict unchanged. Phase 4 gate: all dimensions >= 4 AND research_plan = 5. On fail, a targeted follow-up is generated from the weakest dimension's detail field and the draft is re-reviewed. Max 3 review iterations bound cost; exhaustion writes brief.md with brief_quality: partial and an explicit Brief Quality section. Force-stop surfaces per-dimension findings before the user chooses continue or partial. Not breaking. /ultrabrief-local [--quick] <task> interface unchanged. --quick now means compact start with escalation, not a max-N cap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
2bc405e14a
commit
1634197853
8 changed files with 501 additions and 120 deletions
|
|
@ -150,13 +150,35 @@ Flag as **research-plan invalid** if:
|
|||
|
||||
## Rating
|
||||
|
||||
Rate each dimension:
|
||||
Rate each dimension on two parallel scales:
|
||||
|
||||
**Verbal rating** (used in the prose report and the summary table):
|
||||
- **Pass** — adequate for planning
|
||||
- **Weak** — has issues but exploration can proceed with noted risks
|
||||
- **Fail** — must be addressed before exploration (wastes tokens otherwise)
|
||||
|
||||
**Numeric score 1–5** (used in the machine-readable JSON block):
|
||||
- **5** — no issues; section is strong
|
||||
- **4** — minor issues that do not block exploration (maps to Pass)
|
||||
- **3** — weak but usable; assumptions should be carried (maps to Weak)
|
||||
- **2** — serious gap; exploration risks wasted work (maps to Fail)
|
||||
- **1** — section is effectively missing or contradictory (maps to Fail)
|
||||
|
||||
Use both. The verbal rating drives the human-readable verdict. The numeric
|
||||
score drives callers (such as `/ultrabrief-local` Phase 4) that use the
|
||||
review as a quality gate and need per-dimension granularity.
|
||||
|
||||
## Output format
|
||||
|
||||
Produce **two artifacts in this order**:
|
||||
|
||||
1. A prose report (for humans and for `planning-orchestrator` Phase 1b).
|
||||
2. A final fenced `json` block with per-dimension numeric scores (for callers
|
||||
that gate on machine-readable output, such as `/ultrabrief-local` Phase 4).
|
||||
|
||||
The JSON block MUST be the last fenced block in your output so parsers can
|
||||
find it by reading the last `json` code fence.
|
||||
|
||||
```
|
||||
## Brief Review
|
||||
|
||||
|
|
@ -187,7 +209,42 @@ information that would strengthen the brief. List only if actionable.}
|
|||
- **{PROCEED}** — brief is adequate for exploration
|
||||
- **{PROCEED_WITH_RISKS}** — brief has weaknesses; note them as assumptions in the plan
|
||||
- **{REVISE}** — brief needs fixes before exploration (list what to fix)
|
||||
|
||||
### Machine-readable scores
|
||||
|
||||
```json
|
||||
{
|
||||
"completeness": { "score": 1-5, "gaps": [ "{short gap description}", ... ] },
|
||||
"consistency": { "score": 1-5, "issues": [ "{short issue description}", ... ] },
|
||||
"testability": { "score": 1-5, "weak_criteria": [ "{quoted weak criterion}", ... ] },
|
||||
"scope_clarity": { "score": 1-5, "unclear_sections":[ "{section name}", ... ] },
|
||||
"research_plan": {
|
||||
"score": 1-5,
|
||||
"invalid_topics": [
|
||||
{ "topic": "{topic title}", "issue": "{what is missing or wrong}" }
|
||||
]
|
||||
},
|
||||
"verdict": "PROCEED | PROCEED_WITH_RISKS | REVISE"
|
||||
}
|
||||
```
|
||||
```
|
||||
|
||||
### JSON output rules
|
||||
|
||||
- The JSON block is mandatory. Emit it even when everything passes — use
|
||||
empty arrays and `"score": 5` in that case.
|
||||
- Every dimension key must be present. Do not omit dimensions.
|
||||
- `score` is an integer 1–5. Use the mapping in the Rating section.
|
||||
- Array fields must be strings (or objects in the case of `invalid_topics`)
|
||||
that are short, concrete, and actionable — never sentences spanning lines.
|
||||
- `verdict` must match the verbal verdict in the prose section. If the JSON
|
||||
verdict disagrees with the prose, the caller will fall back to the prose
|
||||
verdict — but the mismatch is a bug in your output.
|
||||
- Do not include trailing commas, comments, or non-JSON text inside the
|
||||
fence. The block must parse with a strict JSON parser.
|
||||
- If a dimension's score is 4 or 5, its detail array may be `[]`. A score of
|
||||
3 or below SHOULD populate the detail array so callers can generate
|
||||
targeted follow-up questions.
|
||||
|
||||
## Rules
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue