ktg-plugin-marketplace/plugins/voyage/commands/trekreview.md

418 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
name: trekreview
description: |
Independent post-hoc review of delivered code against the brief. Produces
review.md with severity-tagged findings (BLOCKER/MAJOR/MINOR/SUGGESTION)
per Handover 6 (review → plan).
argument-hint: "--project <dir> [--since <ref>] [--quick] [--validate] [--dry-run]"
model: opus
allowed-tools: Agent, Read, Glob, Grep, Write, Edit, Bash, AskUserQuestion
---
# Ultrareview Local v1.0
Independent post-hoc review of code delivered by `/trekexecute`
against the contract in `brief.md`. Produces `review.md` — a structured
artifact with severity-tagged findings that `/trekplan --brief
review.md` can consume as plan input (Handover 6).
Pipeline position:
```
/trekbrief → brief.md
/trekresearch → research/*.md
/trekplan → plan.md
/trekexecute → progress.json (+ commits)
/trekreview → review.md (this command)
```
The review is **independent**: each reviewer runs without cross-feeding,
and the coordinator applies BOUNDED operations only. Synthesis-level
inference across files is forbidden in v1.0 (Judge Agent pattern).
See `agents/review-orchestrator.md` for the canonical workflow this
command executes inline.
## Phase 1 — Parse mode and validate input
Parse `$ARGUMENTS` via the shared arg-parser:
```bash
node ${CLAUDE_PLUGIN_ROOT}/lib/parsers/arg-parser.mjs --command trekreview "$@"
```
The parser recognizes these flags (see `lib/parsers/arg-parser.mjs`
FLAG_SCHEMA `trekreview` entry):
| Flag | Type | Purpose |
|------|------|---------|
| `--project <dir>` | valued | Required. Path to trekplan project folder containing `brief.md`. |
| `--since <ref>` | valued | Optional. Override "before" SHA for the diff. Validated via `git rev-parse --verify`. |
| `--quick` | boolean | Skip the brief-conformance pass; run only the code-correctness reviewer; skip the coordinator's reasonableness filter. |
| `--validate` | boolean | Schema-only check on existing `{project_dir}/review.md`. No LLM calls. |
| `--dry-run` | boolean | Print the discovered scope and triage map. Skip writes. |
| `--fg` | boolean | No-op alias (foreground is default). |
Resolution:
1. If `--project` is missing, print usage and stop:
```
Error: --project <dir> is required.
Usage: /trekreview --project <dir> [--since <ref>] [--quick] [--validate] [--dry-run]
```
2. Trim trailing slash from `{dir}`. Set:
- `project_dir = {dir}`
- `brief_path = {dir}/brief.md`
- `review_path = {dir}/review.md`
3. If `{dir}` does not exist or `{dir}/brief.md` is missing:
```
Error: project directory not initialized. Run /trekbrief first.
Missing: {dir}/brief.md
```
Set `mode`:
- `validate` if `--validate` is set (overrides everything else; skip to Phase 8.5).
- `dry-run` if `--dry-run` is set.
- `quick` if `--quick` is set.
- `default` otherwise.
## Phase 2 — Validate brief
Run the brief validator in soft mode — the brief is upstream context, not
something this command produces, so partial grades are acceptable as long
as the file is parseable:
```bash
node ${CLAUDE_PLUGIN_ROOT}/lib/validators/brief-validator.mjs --soft --json "{brief_path}"
```
Read the JSON output. If `valid: false` AND any error has code
`BRIEF_MISSING_REQUIRED_FIELD` or `FRONTMATTER_PARSE_ERROR`: stop and
ask the user to re-run `/trekbrief`. Other soft errors become
warnings in the review's Executive Summary.
Read the brief frontmatter. Capture for review.md:
- `task` → review frontmatter `task`
- `slug` → review frontmatter `slug`
- `project_dir` → review frontmatter `project_dir` (defaults to the
CLI `--project` value when missing)
## Phase 3 — Discover scope SHA range
Determine the "before" SHA that bounds the review:
1. **`--since <ref>` override** — if set, validate via:
```bash
git rev-parse --verify "$since_ref"
```
On failure: print `Error: --since ref is not a valid git revision: {ref}` and stop.
Set `before_sha = $(git rev-parse --verify "$since_ref")`.
2. **Preferred path** — read `{project_dir}/progress.json` if it exists.
Extract `session_start_sha`. Validate it via `git rev-parse --verify`.
Set `before_sha = session_start_sha`.
3. **Fallback** — no `progress.json`. Use the brief's mtime to find the
most recent commit at or before the brief was written:
```bash
brief_mtime=$(stat -f %m "{brief_path}") # macOS; on Linux use stat -c %Y
before_sha=$(git log --until="@$brief_mtime" -n 1 --format=%H)
```
Emit a clear warning that gets surfaced in the review's Executive
Summary: "scope_sha_start unavailable — falling back to brief mtime
({timestamp}). Coverage may include unrelated commits."
Compute the "after" SHA: `after_sha=$(git rev-parse HEAD)`.
Capture working-tree changes (uncommitted at review time):
```bash
git diff --name-only "$before_sha".."$after_sha"
git diff --name-only HEAD # uncommitted (annotated [uncommitted])
```
The combined file list is the review scope. Note that the
`[uncommitted]` annotation is a **brief-level contract** — the brief's
Assumptions section declares this is allowed; the review surfaces it
explicitly in the Coverage table.
If the file count is `0`, write a one-line review.md noting "No diff
between {before_sha} and {after_sha}; nothing to review." Verdict: ALLOW.
Skip Phases 47. Continue to Phase 8 (validate + stats).
## Phase 4 — Triage gate (deterministic path-pattern classifier)
The triage gate is **deterministic** — no LLM judgment. It classifies
every file from Phase 3 into a treatment bucket:
| Treatment | When |
|-----------|------|
| `skip` | Matches `*.lock`, `*.svg`, `dist/**`, `build/**`, `node_modules/**`, OR the file's first 3 lines contain a generated-file marker (`@generated`, `Code generated by`, `DO NOT EDIT`). |
| `deep-review` | Matches `auth/**`, `crypto/**`, `**/security/**`, `hooks/**`. |
| `summary-only` | Default treatment for everything else. |
Hard refuse-with-suggestion gates — use `AskUserQuestion`:
```
if (reviewed_files_count > 100) → ask user
if (estimated_diff_tokens > 100000) → ask user
```
Token estimation: `wc -c "$diff_file" / 4` (rough proxy). Use
`AskUserQuestion` with the prompt:
> The diff under review is large (`{N}` files / `~{T}` tokens). Continue
> with the full scope, narrow with `--since <closer-ref>`, or stop?
Options:
1. **Continue** — proceed at this scope.
2. **Narrow** — print suggested `git log --oneline {before}..HEAD` so the
user can pick a closer ref, then stop.
3. **Stop** — cancel.
Record the treatment for every file. Files marked `skip` MUST appear in
the Coverage section of `review.md` — never silently drop them. Silent
drops are `COVERAGE_SILENT_SKIP` (MAJOR) per the rule catalogue.
If `mode == dry-run`: print the triage map and exit.
## Phase 5 — Launch parallel reviewers
Launch two reviewer agents **in parallel** via the Agent tool — one
message, multiple tool calls.
Reviewers run independently. Do NOT pre-feed findings between them.
| Agent | Mode-gated | Purpose |
|-------|------------|---------|
| `brief-conformance-reviewer` | Skipped in `quick` | Trace each Success Criterion + Non-Goal to delivered code. Emits findings tagged with rule_keys from the conformance/scope categories. |
| `code-correctness-reviewer` | Always runs | 7-dimension code review. Emits findings tagged with rule_keys from the correctness/security/maintenance/tests categories. |
Each reviewer prompt includes:
- **Diff context** — the unified diff from Phase 3, truncated per file
for files marked `summary-only`.
- **Triage map** — full file list with treatments. Reviewers must
respect `skip` decisions.
- **Brief path** — `{brief_path}` (read on demand; do not inline).
- **Rule catalogue** — reference to `lib/review/rule-catalogue.mjs`.
Collect each reviewer's trailing JSON block (last fenced `json` block in
their output). Parse with `JSON.parse()`. On parse error, ask the agent
to re-emit the JSON only.
In `quick` mode, launch only `code-correctness-reviewer`. The Executive
Summary will note the brief-conformance pass was skipped.
## Phase 6 — Coordinator dedup + verdict
Launch `review-coordinator` (Agent tool) with the merged findings array
from Phase 5 plus the triage map, brief metadata, and SHA range.
The coordinator runs the 4-pass process documented in
`agents/review-coordinator.md`:
1. **Dedup** by `(file, line, rule_key)` triplet.
2. **HubSpot Judge filters** — Succinctness, Accuracy, Actionability.
3. **Cloudflare reasonableness** — drop speculative or catalogue-violating
findings (skipped in `quick` mode).
4. **Verdict** — BLOCK / WARN / ALLOW per the threshold table.
The coordinator's output is the full review.md content — frontmatter +
body sections + trailing JSON block. Do NOT re-run the reviewers based
on the coordinator's output.
## Phase 7 — Write review.md
Write the coordinator's output verbatim to:
```
{project_dir}/review.md
```
Create parent directories if they do not exist. Atomic write pattern:
write to a temp file, then rename. The frontmatter `findings:` field
must use **block-style YAML** (one ID per line, ` - ` prefix). The
parser at `lib/util/frontmatter.mjs` does not support flow-style arrays.
If `mode == dry-run`: skip the write; print the would-be path and the
first 60 lines of the rendered output.
## Phase 8 — Validate output + stats
Run the strict validator:
```bash
node ${CLAUDE_PLUGIN_ROOT}/lib/validators/review-validator.mjs --json "{review_path}"
```
If validation fails:
- For repairable errors (missing required body section, malformed
finding-ID, REVIEW_VERSION_FORMAT warning): repair in place — re-emit
the missing section, recompute the finding-ID, fix the version
string. Re-validate.
- For unrepairable errors (REVIEW_WRONG_TYPE, malformed frontmatter):
stop and ask the user to re-run; do not silently produce an invalid
review.md.
Append a stats line to `${CLAUDE_PLUGIN_DATA}/trekreview-stats.jsonl`
(create the file if it does not exist):
```json
{"ts":"{ISO-8601}","slug":"{slug}","verdict":"BLOCK|WARN|ALLOW","counts":{"BLOCKER":N,"MAJOR":N,"MINOR":N,"SUGGESTION":N},"reviewed_files_count":N,"mode":"default|quick|validate|dry-run","duration_ms":N}
```
If `${CLAUDE_PLUGIN_DATA}` is unset or not writable, skip stats silently.
Never let stats failures block the main workflow.
**Build the operator-annotation HTML.** After stats land, run:
```bash
ANNOT_HTML=$(node ${CLAUDE_PLUGIN_ROOT}/scripts/annotate.mjs "{review_path}" 2>&1)
```
`stdout` is the absolute path to the `.html` on success. The HTML renders
`review.md` with line numbers, lets the operator click any line to attach
their own note (not Claude-generated suggestions — the operator drives
every annotation), keeps a sidebar of all notes, persists state in
localStorage, and exposes a "Copy Prompt" button. If `annotate.mjs`
exits non-zero, surface a one-line warning and continue — the annotation
HTML is a convenience, not a gate.
## Phase 8.5 — Validate-only mode (`--validate`)
When `mode == validate`:
1. Skip Phases 37 entirely.
2. Run the strict validator on `{project_dir}/review.md`.
3. Print a one-line PASS/FAIL summary plus the JSON output on FAIL.
4. Exit 0 on PASS, 1 on FAIL. Never write to disk. Never call any agent.
## Phase 9 — Present summary
After the write succeeds, print:
```
## Ultrareview Complete
**Task:** {task}
**Mode:** {default | quick | dry-run}
**Brief:** {brief_path}
**Project:** {project_dir}
**Review:** {review_path}
**Annotation HTML:** file://{$ANNOT_HTML}
**Scope:** {before_sha}..{after_sha} ({reviewed_files_count} files)
**Verdict:** {BLOCK | WARN | ALLOW}
### Counts
- BLOCKER: {N}
- MAJOR: {N}
- MINOR: {N}
- SUGGESTION: {N}
### Top findings
- [{severity}] {title} ({file}:{line})
...
{up to 5 highest-severity findings}
────────────────────────────────────────────────────────────────────
To review and annotate the review, open it in a browser:
open file://{$ANNOT_HTML}
Click any line to add YOUR OWN note. The sidebar collects every note,
the "Copy Prompt" button gathers them into one structured prompt.
Paste that prompt back into this chat and Claude revises review.md
from your notes. Annotations persist in your browser if you close
the tab and reopen the same file.
────────────────────────────────────────────────────────────────────
You can also:
- Feed BLOCKER + MAJOR findings into a follow-up plan:
/trekplan --brief {review_path}
- Re-run with `--quick` for a faster correctness-only pass
- Re-run with `--since <ref>` to narrow scope
```
Per **Handover 6**, BLOCKER and MAJOR findings are consumed by
`/trekplan --brief review.md` to produce a remediation plan. The
review's frontmatter `findings:` list and the trailing JSON block are
the contract for that handover (see `docs/HANDOVER-CONTRACTS.md`).
## Profile (v4.1)
Accepts `--profile <name>` where `<name>` is `economy`, `balanced`, `premium`,
or a custom profile under `voyage-profiles/`. Default: `premium`.
Resolution order (per `lib/profiles/resolver.mjs`):
1. `--profile` flag (source: `flag`)
2. `VOYAGE_PROFILE` env-var (source: `env`)
3. `premium` default (source: `default`)
The selected profile drives `phase_models.review` — `economy` uses sonnet
for the brief-conformance + code-correctness reviewers; `balanced` and
`premium` use opus (review benefits from deeper reasoning).
Examples:
```
/trekreview --profile balanced --project .claude/projects/2026-05-09-add-auth
VOYAGE_PROFILE=premium /trekreview --project ...
```
Stats records emit `profile` and `profile_source`.
## Composition rule (v5.1)
Independent of the profile system. When `brief.md` carries
`phase_signals` (brief_version ≥ 2.1), each downstream phase resolves
effort + model as:
```
effort_for_phase = brief.phase_signals[<phase>]?.effort ?? 'standard'
model_for_phase = brief.phase_signals[<phase>]?.model ?? profile.phase_models[<phase>]
```
The brief signal wins per-phase when present; the profile fills any
gaps. There is no helper module — composition is documented prose in
each downstream command.
For `/trekreview` specifically: `effort == 'low'` activates the existing
`--quick`-equivalent code-path (skip the brief-conformance reviewer; run
correctness-only). `effort == 'standard'` (or absent) → no change.
High-effort behavior is deferred to v5.1.1 per brief Non-Goal.
### Sequencing gate surface
Phase 1 already calls `brief-validator.mjs --soft` against `{brief_path}`.
If the validator returns `BRIEF_V51_MISSING_SIGNALS` in `errors`
(brief_version ≥ 2.1 without `phase_signals` or `phase_signals_partial:
true`), halt with: `Brief is brief_version 2.1 but does not carry
phase_signals — re-run /trekbrief to commit them (Phase 3.5).`
Enforcement is validator-only; this surface just makes the friendly hint
readable.
## Hard rules
- **Brief is the contract.** Every finding in the review traces to a
brief section via `brief_ref`, except `SCOPE_CREEP_BUILT` (which
traces to "no anchor"). Conformance is the conformance reviewer's
job — code-correctness findings carry generic anchors like
`"NFR — code correctness"`.
- **Independent reviewers.** Do NOT cross-feed findings between
brief-conformance-reviewer and code-correctness-reviewer. The
coordinator is the only place where outputs combine.
- **Bounded coordination.** Synthesis-level inference across files is
forbidden in v1.0. The coordinator dedups, filters, and computes the
verdict — nothing more.
- **Triage map respected.** Files marked `skip` MUST appear in the
Coverage section. Silent drops are `COVERAGE_SILENT_SKIP` (MAJOR).
- **Block-style YAML for findings list.** The frontmatter parser does
not support flow-style arrays. `findings: [a, b]` is broken; use
`findings:\n - a\n - b`.
- **Refuse-with-suggestion above 100 files / 100K tokens.** Never run
blind on a giant diff. Use AskUserQuestion to surface the gate.
- **Cost.** Sonnet for all sub-agents (reviewers + coordinator). Opus
only runs in the main /trekreview command thread.
- **Privacy.** Never log secrets, tokens, or credentials in review.md.
Findings citing files with secret-like content must redact the secret
in the `detail` field.
- **Honesty.** If the diff is trivially small or all-skip, say so. Do
not pad findings to make the review look thorough.
- **No production code.** This command never runs production code, never
writes to anything outside `{project_dir}` and `${CLAUDE_PLUGIN_DATA}`.