Kjell Tore Guttormsen b4e58e3fc2 feat(ultraplan-local): add commands/ultrareview-local.md

2026-05-01 16:56:47 +02:00

13 KiB

Raw Blame History

name	description	argument-hint	model	allowed-tools
ultrareview-local	Independent post-hoc review of delivered code against the brief. Produces review.md with severity-tagged findings (BLOCKER/MAJOR/MINOR/SUGGESTION) per Handover 6 (review → plan).	--project <dir> [--since <ref>] [--quick] [--validate] [--dry-run]	opus	Agent, Read, Glob, Grep, Write, Edit, Bash, AskUserQuestion

Ultrareview Local v1.0

Independent post-hoc review of code delivered by /ultraexecute-local against the contract in brief.md. Produces review.md — a structured artifact with severity-tagged findings that /ultraplan-local --brief review.md can consume as plan input (Handover 6).

Pipeline position:

/ultrabrief-local     →  brief.md
/ultraresearch-local  →  research/*.md
/ultraplan-local      →  plan.md
/ultraexecute-local   →  progress.json (+ commits)
/ultrareview-local    →  review.md            (this command)

The review is independent: each reviewer runs without cross-feeding, and the coordinator applies BOUNDED operations only. Synthesis-level inference across files is forbidden in v1.0 (Judge Agent pattern).

See agents/review-orchestrator.md for the canonical workflow this command executes inline.

Phase 1 — Parse mode and validate input

Parse $ARGUMENTS via the shared arg-parser:

node ${CLAUDE_PLUGIN_ROOT}/lib/parsers/arg-parser.mjs --command ultrareview "$@"

The parser recognizes these flags (see lib/parsers/arg-parser.mjs FLAG_SCHEMA ultrareview entry):

Flag	Type	Purpose
`--project <dir>`	valued	Required. Path to ultraplan-local project folder containing `brief.md`.
`--since <ref>`	valued	Optional. Override "before" SHA for the diff. Validated via `git rev-parse --verify`.
`--quick`	boolean	Skip the brief-conformance pass; run only the code-correctness reviewer; skip the coordinator's reasonableness filter.
`--validate`	boolean	Schema-only check on existing `{project_dir}/review.md`. No LLM calls.
`--dry-run`	boolean	Print the discovered scope and triage map. Skip writes.
`--fg`	boolean	No-op alias (foreground is default).

Resolution:

If --project is missing, print usage and stop:

Error: --project <dir> is required.
Usage: /ultrareview-local --project <dir> [--since <ref>] [--quick] [--validate] [--dry-run]

Trim trailing slash from {dir}. Set:
- project_dir = {dir}
- brief_path = {dir}/brief.md
- review_path = {dir}/review.md

If {dir} does not exist or {dir}/brief.md is missing:

Error: project directory not initialized. Run /ultrabrief-local first.
Missing: {dir}/brief.md

Set mode:

validate if --validate is set (overrides everything else; skip to Phase 8.5).
dry-run if --dry-run is set.
quick if --quick is set.
default otherwise.

Phase 2 — Validate brief

Run the brief validator in soft mode — the brief is upstream context, not something this command produces, so partial grades are acceptable as long as the file is parseable:

node ${CLAUDE_PLUGIN_ROOT}/lib/validators/brief-validator.mjs --soft --json "{brief_path}"

Read the JSON output. If valid: false AND any error has code BRIEF_MISSING_REQUIRED_FIELD or FRONTMATTER_PARSE_ERROR: stop and ask the user to re-run /ultrabrief-local. Other soft errors become warnings in the review's Executive Summary.

Read the brief frontmatter. Capture for review.md:

task → review frontmatter task
slug → review frontmatter slug
project_dir → review frontmatter project_dir (defaults to the CLI --project value when missing)

Phase 3 — Discover scope SHA range

Determine the "before" SHA that bounds the review:

--since <ref> override — if set, validate via:
```
git rev-parse --verify "$since_ref"
```
On failure: print Error: --since ref is not a valid git revision: {ref} and stop. Set before_sha = $(git rev-parse --verify "$since_ref").
Preferred path — read {project_dir}/progress.json if it exists. Extract session_start_sha. Validate it via git rev-parse --verify. Set before_sha = session_start_sha.
Fallback — no progress.json. Use the brief's mtime to find the most recent commit at or before the brief was written:
```
brief_mtime=$(stat -f %m "{brief_path}")  # macOS; on Linux use stat -c %Y
before_sha=$(git log --until="@$brief_mtime" -n 1 --format=%H)
```
Emit a clear warning that gets surfaced in the review's Executive Summary: "scope_sha_start unavailable — falling back to brief mtime ({timestamp}). Coverage may include unrelated commits."

Compute the "after" SHA: after_sha=$(git rev-parse HEAD).

Capture working-tree changes (uncommitted at review time):

git diff --name-only "$before_sha".."$after_sha"
git diff --name-only HEAD       # uncommitted (annotated [uncommitted])

The combined file list is the review scope. Note that the [uncommitted] annotation is a brief-level contract — the brief's Assumptions section declares this is allowed; the review surfaces it explicitly in the Coverage table.

If the file count is 0, write a one-line review.md noting "No diff between {before_sha} and {after_sha}; nothing to review." Verdict: ALLOW. Skip Phases 4–7. Continue to Phase 8 (validate + stats).

Phase 4 — Triage gate (deterministic path-pattern classifier)

The triage gate is deterministic — no LLM judgment. It classifies every file from Phase 3 into a treatment bucket:

Treatment	When
`skip`	Matches `.lock`, `.svg`, `dist/`, `build/`, `node_modules/**`, OR the file's first 3 lines contain a generated-file marker (`@generated`, `Code generated by`, `DO NOT EDIT`).
`deep-review`	Matches `auth/`, `crypto/`, `/security/`, `hooks/**`.
`summary-only`	Default treatment for everything else.

Hard refuse-with-suggestion gates — use AskUserQuestion:

if (reviewed_files_count > 100) → ask user
if (estimated_diff_tokens > 100000) → ask user

Token estimation: wc -c "$diff_file" / 4 (rough proxy). Use AskUserQuestion with the prompt:

The diff under review is large ({N} files / ~{T} tokens). Continue with the full scope, narrow with --since <closer-ref>, or stop?

Options:

Continue — proceed at this scope.
Narrow — print suggested git log --oneline {before}..HEAD so the user can pick a closer ref, then stop.
Stop — cancel.

Record the treatment for every file. Files marked skip MUST appear in the Coverage section of review.md — never silently drop them. Silent drops are COVERAGE_SILENT_SKIP (MAJOR) per the rule catalogue.

If mode == dry-run: print the triage map and exit.

Phase 5 — Launch parallel reviewers

Launch two reviewer agents in parallel via the Agent tool — one message, multiple tool calls.

Reviewers run independently. Do NOT pre-feed findings between them.

Agent	Mode-gated	Purpose
`brief-conformance-reviewer`	Skipped in `quick`	Trace each Success Criterion + Non-Goal to delivered code. Emits findings tagged with rule_keys from the conformance/scope categories.
`code-correctness-reviewer`	Always runs	7-dimension code review. Emits findings tagged with rule_keys from the correctness/security/maintenance/tests categories.

Each reviewer prompt includes:

Diff context — the unified diff from Phase 3, truncated per file for files marked summary-only.
Triage map — full file list with treatments. Reviewers must respect skip decisions.
Brief path — {brief_path} (read on demand; do not inline).
Rule catalogue — reference to lib/review/rule-catalogue.mjs.

Collect each reviewer's trailing JSON block (last fenced json block in their output). Parse with JSON.parse(). On parse error, ask the agent to re-emit the JSON only.

In quick mode, launch only code-correctness-reviewer. The Executive Summary will note the brief-conformance pass was skipped.

Phase 6 — Coordinator dedup + verdict

Launch review-coordinator (Agent tool) with the merged findings array from Phase 5 plus the triage map, brief metadata, and SHA range.

The coordinator runs the 4-pass process documented in agents/review-coordinator.md:

Dedup by (file, line, rule_key) triplet.
HubSpot Judge filters — Succinctness, Accuracy, Actionability.
Cloudflare reasonableness — drop speculative or catalogue-violating findings (skipped in quick mode).
Verdict — BLOCK / WARN / ALLOW per the threshold table.

The coordinator's output is the full review.md content — frontmatter + body sections + trailing JSON block. Do NOT re-run the reviewers based on the coordinator's output.

Phase 7 — Write review.md

Write the coordinator's output verbatim to:

{project_dir}/review.md

Create parent directories if they do not exist. Atomic write pattern: write to a temp file, then rename. The frontmatter findings: field must use block-style YAML (one ID per line, - prefix). The parser at lib/util/frontmatter.mjs does not support flow-style arrays.

If mode == dry-run: skip the write; print the would-be path and the first 60 lines of the rendered output.

Phase 8 — Validate output + stats

Run the strict validator:

node ${CLAUDE_PLUGIN_ROOT}/lib/validators/review-validator.mjs --json "{review_path}"

If validation fails:

For repairable errors (missing required body section, malformed finding-ID, REVIEW_VERSION_FORMAT warning): repair in place — re-emit the missing section, recompute the finding-ID, fix the version string. Re-validate.
For unrepairable errors (REVIEW_WRONG_TYPE, malformed frontmatter): stop and ask the user to re-run; do not silently produce an invalid review.md.

Append a stats line to ${CLAUDE_PLUGIN_DATA}/ultrareview-stats.jsonl (create the file if it does not exist):

{"ts":"{ISO-8601}","slug":"{slug}","verdict":"BLOCK|WARN|ALLOW","counts":{"BLOCKER":N,"MAJOR":N,"MINOR":N,"SUGGESTION":N},"reviewed_files_count":N,"mode":"default|quick|validate|dry-run","duration_ms":N}

If ${CLAUDE_PLUGIN_DATA} is unset or not writable, skip stats silently. Never let stats failures block the main workflow.

Phase 8.5 — Validate-only mode (`--validate`)

When mode == validate:

Skip Phases 3–7 entirely.
Run the strict validator on {project_dir}/review.md.
Print a one-line PASS/FAIL summary plus the JSON output on FAIL.
Exit 0 on PASS, 1 on FAIL. Never write to disk. Never call any agent.

Phase 9 — Present summary

After the write succeeds, print:

## Ultrareview Complete

**Task:** {task}
**Mode:** {default | quick | dry-run}
**Brief:** {brief_path}
**Project:** {project_dir}
**Review:** {review_path}
**Scope:** {before_sha}..{after_sha} ({reviewed_files_count} files)
**Verdict:** {BLOCK | WARN | ALLOW}

### Counts
- BLOCKER: {N}
- MAJOR: {N}
- MINOR: {N}
- SUGGESTION: {N}

### Top findings
- [{severity}] {title} ({file}:{line})
  ...
{up to 5 highest-severity findings}

You can:
- Read the full review at {review_path}
- Feed BLOCKER + MAJOR findings into a follow-up plan:
    /ultraplan-local --brief {review_path}
- Re-run with `--quick` for a faster correctness-only pass
- Re-run with `--since <ref>` to narrow scope

Per Handover 6, BLOCKER and MAJOR findings are consumed by /ultraplan-local --brief review.md to produce a remediation plan. The review's frontmatter findings: list and the trailing JSON block are the contract for that handover (see docs/HANDOVER-CONTRACTS.md).

Hard rules

Brief is the contract. Every finding in the review traces to a brief section via brief_ref, except SCOPE_CREEP_BUILT (which traces to "no anchor"). Conformance is the conformance reviewer's job — code-correctness findings carry generic anchors like "NFR — code correctness".
Independent reviewers. Do NOT cross-feed findings between brief-conformance-reviewer and code-correctness-reviewer. The coordinator is the only place where outputs combine.
Bounded coordination. Synthesis-level inference across files is forbidden in v1.0. The coordinator dedups, filters, and computes the verdict — nothing more.
Triage map respected. Files marked skip MUST appear in the Coverage section. Silent drops are COVERAGE_SILENT_SKIP (MAJOR).
Block-style YAML for findings list. The frontmatter parser does not support flow-style arrays. findings: [a, b] is broken; use findings:\n - a\n - b.
Refuse-with-suggestion above 100 files / 100K tokens. Never run blind on a giant diff. Use AskUserQuestion to surface the gate.
Cost. Sonnet for all sub-agents (reviewers + coordinator). Opus only runs in the main /ultrareview-local command thread.
Privacy. Never log secrets, tokens, or credentials in review.md. Findings citing files with secret-like content must redact the secret in the detail field.
Honesty. If the diff is trivially small or all-skip, say so. Do not pad findings to make the review look thorough.
No production code. This command never runs production code, never writes to anything outside {project_dir} and ${CLAUDE_PLUGIN_DATA}.

13 KiB Raw Blame History Unescape Escape