Kjell Tore Guttormsen 8ea692bc60 chore(voyage): release v5.0.2 — operator-driven annotation HTML (scripts/annotate.mjs)

v5.0.0 added a read-only HTML render. v5.0.1 deleted that and pointed at
/playground document-critique, which pre-generates Claude's suggestions
and asks the operator to approve/reject them. The operator asked for the
opposite — a surface where THEY drive every annotation. v5.0.2 lands it.

scripts/annotate.mjs (~430 lines, zero deps) takes any artifact .md and
writes a self-contained HTML next to it. The HTML renders the document
with line numbers, lets the operator click any line to add their own
note (inline textarea, save with Cmd+Enter or button), keeps a sidebar
of all notes (editable + deletable + persisted in localStorage per
artifact path), and exposes Copy Prompt to gather every note into one
structured prompt. Operator copies, pastes back, Claude revises the .md.

The three producing commands now run annotate.mjs at their last step and
print the file:// link with explicit "Click any line to add YOUR OWN note"
instructions. The v5.0.1 /playground document-critique line is gone.

npm test green: 516 tests, 514 pass, 0 fail, 2 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-13 14:04:28 +02:00

15 KiB


| name | description | argument-hint | model | allowed-tools |
| --- | --- | --- | --- | --- |
| trekreview | Independent post-hoc review of delivered code against the brief. Produces review.md with severity-tagged findings (BLOCKER/MAJOR/MINOR/SUGGESTION) per Handover 6 (review → plan). | `--project <dir> [--since <ref>] [--quick] [--validate] [--dry-run]` | opus | Agent, Read, Glob, Grep, Write, Edit, Bash, AskUserQuestion |

Ultrareview Local v1.0

Independent post-hoc review of code delivered by /trekexecute against the contract in brief.md. Produces review.md — a structured artifact with severity-tagged findings that /trekplan --brief review.md can consume as plan input (Handover 6).

Pipeline position:

/trekbrief     →  brief.md
/trekresearch  →  research/*.md
/trekplan      →  plan.md
/trekexecute   →  progress.json (+ commits)
/trekreview    →  review.md            (this command)

The review is independent: each reviewer runs without cross-feeding, and the coordinator applies BOUNDED operations only. Synthesis-level inference across files is forbidden in v1.0 (Judge Agent pattern).

See agents/review-orchestrator.md for the canonical workflow this command executes inline.

Phase 1 — Parse mode and validate input

Parse $ARGUMENTS via the shared arg-parser:

node ${CLAUDE_PLUGIN_ROOT}/lib/parsers/arg-parser.mjs --command trekreview "$@"

The parser recognizes these flags (see lib/parsers/arg-parser.mjs FLAG_SCHEMA trekreview entry):

| Flag | Type | Purpose |
| --- | --- | --- |
| `--project <dir>` | valued | Required. Path to trekplan project folder containing `brief.md`. |
| `--since <ref>` | valued | Optional. Override "before" SHA for the diff. Validated via `git rev-parse --verify`. |
| `--quick` | boolean | Skip the brief-conformance pass; run only the code-correctness reviewer; skip the coordinator's reasonableness filter. |
| `--validate` | boolean | Schema-only check on existing `{project_dir}/review.md`. No LLM calls. |
| `--dry-run` | boolean | Print the discovered scope and triage map. Skip writes. |
| `--fg` | boolean | No-op alias (foreground is default). |

Resolution:

If --project is missing, print usage and stop:

Error: --project <dir> is required.
Usage: /trekreview --project <dir> [--since <ref>] [--quick] [--validate] [--dry-run]

Trim trailing slash from {dir}. Set:
- project_dir = {dir}
- brief_path = {dir}/brief.md
- review_path = {dir}/review.md

If {dir} does not exist or {dir}/brief.md is missing:

Error: project directory not initialized. Run /trekbrief first.
Missing: {dir}/brief.md

Set mode:

validate if --validate is set (overrides everything else; skip to Phase 8.5).
dry-run if --dry-run is set.
quick if --quick is set.
default otherwise.
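
The resolution steps above can be sketched as a small pure function. This is only an illustration: the function name and the shape of the `flags` object are hypothetical and do not come from lib/parsers/arg-parser.mjs, and the precedence of dry-run over quick when both are set is an assumption based on the order in which the modes are listed.

```javascript
// Hypothetical helper: derive paths and mode from already-parsed flags.
// Assumes flags looks like { project, validate, dryRun, quick }.
function resolveInvocation(flags) {
  if (!flags.project) {
    return { error: "Error: --project <dir> is required." };
  }

  // Trim trailing slashes (assumes the project dir is not the filesystem root).
  const projectDir = flags.project.replace(/\/+$/, "");

  // validate overrides everything else; the dry-run/quick order is assumed.
  let mode = "default";
  if (flags.validate) mode = "validate";
  else if (flags.dryRun) mode = "dry-run";
  else if (flags.quick) mode = "quick";

  return {
    project_dir: projectDir,
    brief_path: `${projectDir}/brief.md`,
    review_path: `${projectDir}/review.md`,
    mode,
  };
}
```

The existence check for `{dir}` and `{dir}/brief.md` is left out so the sketch stays free of filesystem access.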

Phase 2 — Validate brief

Run the brief validator in soft mode — the brief is upstream context, not something this command produces, so partial grades are acceptable as long as the file is parseable:

node ${CLAUDE_PLUGIN_ROOT}/lib/validators/brief-validator.mjs --soft --json "{brief_path}"

Read the JSON output. If valid: false AND any error has code BRIEF_MISSING_REQUIRED_FIELD or FRONTMATTER_PARSE_ERROR: stop and ask the user to re-run /trekbrief. Other soft errors become warnings in the review's Executive Summary.
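
A minimal sketch of that decision, assuming the validator's JSON has a `valid` boolean and an `errors` array whose entries carry `code` and `message`. Only `valid` and the two error codes are named in this document; the array name, the `message` field, and the function name are assumptions.

```javascript
// Codes that make the brief unusable as review input (from the text above).
const FATAL_BRIEF_CODES = new Set([
  "BRIEF_MISSING_REQUIRED_FIELD",
  "FRONTMATTER_PARSE_ERROR",
]);

// Hypothetical helper: decide whether to stop or carry soft errors forward.
function interpretBriefValidation(result) {
  const errors = Array.isArray(result.errors) ? result.errors : [];
  const fatal =
    result.valid === false && errors.some((e) => FATAL_BRIEF_CODES.has(e.code));

  if (fatal) {
    return { action: "stop", warnings: [] };
  }
  // Remaining soft errors become Executive Summary warnings.
  return {
    action: "continue",
    warnings: errors.map((e) => `${e.code}: ${e.message}`),
  };
}
```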

Read the brief frontmatter. Capture for review.md:

task → review frontmatter task
slug → review frontmatter slug
project_dir → review frontmatter project_dir (defaults to the CLI --project value when missing)

Phase 3 — Discover scope SHA range

Determine the "before" SHA that bounds the review:

--since <ref> override — if set, validate via:
```
git rev-parse --verify "$since_ref"
```
On failure, print Error: --since ref is not a valid git revision: {ref} and stop. On success, set before_sha = $(git rev-parse --verify "$since_ref").
Preferred path — read {project_dir}/progress.json if it exists. Extract session_start_sha. Validate it via git rev-parse --verify. Set before_sha = session_start_sha.
Fallback (no progress.json): use the brief's mtime to find the most recent commit made at or before the time the brief was written:
```
brief_mtime=$(stat -f %m "{brief_path}")  # macOS; on Linux use stat -c %Y
before_sha=$(git log --until="@$brief_mtime" -n 1 --format=%H)
```
Emit a clear warning that gets surfaced in the review's Executive Summary: "scope_sha_start unavailable — falling back to brief mtime ({timestamp}). Coverage may include unrelated commits."

Compute the "after" SHA: after_sha=$(git rev-parse HEAD).
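
The three-way choice of the "before" SHA can be sketched as below. The `git` parameter is a hypothetical adapter (not part of the plugin) whose `verify(ref)` would wrap `git rev-parse --verify` and return a SHA or `null`, and whose `lastCommitUntil(epoch)` would wrap the `git log --until` call shown above. What happens when `session_start_sha` fails verification is not specified here; this sketch assumes it falls through to the mtime fallback.

```javascript
// Hypothetical helper: pick before_sha in the documented priority order.
function pickBeforeSha({ sinceRef, progress, briefMtime }, git) {
  // 1. Explicit --since override: invalid refs are a hard error.
  if (sinceRef) {
    const sha = git.verify(sinceRef);
    if (!sha) {
      return { error: `Error: --since ref is not a valid git revision: ${sinceRef}` };
    }
    return { before_sha: sha, source: "since" };
  }

  // 2. Preferred path: session_start_sha from progress.json.
  if (progress && progress.session_start_sha) {
    const sha = git.verify(progress.session_start_sha);
    if (sha) return { before_sha: sha, source: "progress" };
    // Assumption: an unverifiable SHA falls through to the fallback.
  }

  // 3. Fallback: last commit at or before the brief's mtime, with a warning.
  return {
    before_sha: git.lastCommitUntil(briefMtime),
    source: "brief-mtime",
    warning: `falling back to brief mtime (${briefMtime}); coverage may include unrelated commits`,
  };
}
```

Injecting the adapter keeps the ordering logic testable without a real repository.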

Capture working-tree changes (uncommitted at review time):

git diff --name-only "$before_sha".."$after_sha"
git diff --name-only HEAD       # uncommitted (annotated [uncommitted])

The combined file list is the review scope. Note that the [uncommitted] annotation is a brief-level contract — the brief's Assumptions section declares this is allowed; the review surfaces it explicitly in the Coverage table.

If the file count is 0, write a one-line review.md noting "No diff between {before_sha} and {after_sha}; nothing to review." Verdict: ALLOW. Skip Phases 4–7. Continue to Phase 8 (validate + stats).

Phase 4 — Triage gate (deterministic path-pattern classifier)

The triage gate is deterministic — no LLM judgment. It classifies every file from Phase 3 into a treatment bucket:

| Treatment | When |
| --- | --- |
| `skip` | Matches `.lock`, `.svg`, `dist/`, `build/`, `node_modules/**`, OR the file's first 3 lines contain a generated-file marker (`@generated`, `Code generated by`, `DO NOT EDIT`). |
| `deep-review` | Matches `auth/`, `crypto/`, `/security/`, `hooks/**`. |
| `summary-only` | Default treatment for everything else. |
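
A deterministic classifier along these lines might look like the sketch below. The exact matching semantics are not spelled out in this document, so the sketch assumes that directory patterns match a path segment at any depth, that extension patterns match the end of the path, and that `skip` wins over `deep-review` when both apply. The function name is hypothetical.

```javascript
const GENERATED_MARKERS = ["@generated", "Code generated by", "DO NOT EDIT"];
const SKIP_DIRS = new Set(["dist", "build", "node_modules"]);
const DEEP_DIRS = new Set(["auth", "crypto", "security", "hooks"]);

// Hypothetical helper: map one file to skip / deep-review / summary-only.
// `content` is the file text; only its first 3 lines are inspected.
function classifyFile(path, content = "") {
  const dirs = path.split("/").slice(0, -1); // directory segments only

  const skipByPath =
    path.endsWith(".lock") ||
    path.endsWith(".svg") ||
    dirs.some((d) => SKIP_DIRS.has(d));

  const head = content.split("\n").slice(0, 3).join("\n");
  const generated = GENERATED_MARKERS.some((marker) => head.includes(marker));

  if (skipByPath || generated) return "skip"; // assumed to take precedence
  if (dirs.some((d) => DEEP_DIRS.has(d))) return "deep-review";
  return "summary-only";
}
```

Because the function depends only on the path and the first three lines, the same input always yields the same bucket, which is the property the gate relies on.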

Hard refuse-with-suggestion gates — use AskUserQuestion:

if (reviewed_files_count > 100) → ask user
if (estimated_diff_tokens > 100000) → ask user

Token estimation: the diff's byte count divided by 4, for example $(( $(wc -c < "$diff_file") / 4 )), as a rough proxy. Use AskUserQuestion with the prompt:

The diff under review is large ({N} files / ~{T} tokens). Continue with the full scope, narrow with --since <closer-ref>, or stop?

Options:

Continue — proceed at this scope.
Narrow — print suggested git log --oneline {before}..HEAD so the user can pick a closer ref, then stop.
Stop — cancel.
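
The two thresholds and the bytes-to-tokens proxy can be expressed as a short check. The function and constant names are hypothetical; the limits and the divide-by-4 estimate come from the text above.

```javascript
const MAX_REVIEWED_FILES = 100;
const MAX_DIFF_TOKENS = 100000;

// Rough proxy from the text above: one token per 4 bytes of diff.
function estimateTokens(diffBytes) {
  return Math.floor(diffBytes / 4);
}

// Hypothetical helper: returns whether the refuse-with-suggestion gate fires.
function checkSizeGate(reviewedFilesCount, diffBytes) {
  const tokens = estimateTokens(diffBytes);
  const mustAsk =
    reviewedFilesCount > MAX_REVIEWED_FILES || tokens > MAX_DIFF_TOKENS;
  return { mustAsk, tokens };
}
```

Both comparisons are strict, so a diff of exactly 100 files or 100000 estimated tokens passes without a prompt.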

Record the treatment for every file. Files marked skip MUST appear in the Coverage section of review.md — never silently drop them. Silent drops are COVERAGE_SILENT_SKIP (MAJOR) per the rule catalogue.

If mode == dry-run: print the triage map and exit.

Phase 5 — Launch parallel reviewers

Launch two reviewer agents in parallel via the Agent tool — one message, multiple tool calls.

Reviewers run independently. Do NOT pre-feed findings between them.

| Agent | Mode-gated | Purpose |
| --- | --- | --- |
| `brief-conformance-reviewer` | Skipped in `quick` | Trace each Success Criterion + Non-Goal to delivered code. Emits findings tagged with rule_keys from the conformance/scope categories. |
| `code-correctness-reviewer` | Always runs | 7-dimension code review. Emits findings tagged with rule_keys from the correctness/security/maintenance/tests categories. |

Each reviewer prompt includes:

Diff context — the unified diff from Phase 3, truncated per file for files marked summary-only.
Triage map — full file list with treatments. Reviewers must respect skip decisions.
Brief path — {brief_path} (read on demand; do not inline).
Rule catalogue — reference to lib/review/rule-catalogue.mjs.

Collect each reviewer's trailing JSON block (last fenced json block in their output). Parse with JSON.parse(). On parse error, ask the agent to re-emit the JSON only.
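
One possible way to pull out the last fenced json block is sketched below. The function name and the returned `{ ok, value, reason }` shape are hypothetical; the fence string is built at runtime only so that this example does not itself contain a literal fence.

```javascript
const FENCE = "`".repeat(3);
const JSON_BLOCK = new RegExp(
  FENCE + "json[ \\t]*\\r?\\n([\\s\\S]*?)\\r?\\n" + FENCE,
  "g"
);

// Hypothetical helper: parse the last fenced json block in a reviewer's output.
function extractTrailingJson(output) {
  let last = null;
  for (const match of output.matchAll(JSON_BLOCK)) {
    last = match[1]; // keep overwriting so the final block wins
  }
  if (last === null) {
    return { ok: false, reason: "no fenced json block found" };
  }
  try {
    return { ok: true, value: JSON.parse(last) };
  } catch (err) {
    // Caller would ask the agent to re-emit the JSON only.
    return { ok: false, reason: err.message };
  }
}
```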

In quick mode, launch only code-correctness-reviewer. The Executive Summary will note the brief-conformance pass was skipped.

Phase 6 — Coordinator dedup + verdict

Launch review-coordinator (Agent tool) with the merged findings array from Phase 5 plus the triage map, brief metadata, and SHA range.

The coordinator runs the 4-pass process documented in agents/review-coordinator.md:

1. Dedup by (file, line, rule_key) triplet.
2. HubSpot Judge filters: Succinctness, Accuracy, Actionability.
3. Cloudflare reasonableness: drop speculative or catalogue-violating findings (skipped in quick mode).
4. Verdict: BLOCK / WARN / ALLOW per the threshold table.
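
The dedup pass is the most mechanical of the four and can be sketched as follows. Which duplicate survives is not stated here; this sketch assumes the first occurrence is kept. The function name is hypothetical, and the canonical behavior lives in agents/review-coordinator.md.

```javascript
// Hypothetical helper: drop findings that share (file, line, rule_key).
function dedupFindings(findings) {
  const seen = new Map();
  for (const finding of findings) {
    // JSON-encode the triplet so "a:1" + "b" cannot collide with "a" + "1:b".
    const key = JSON.stringify([finding.file, finding.line, finding.rule_key]);
    if (!seen.has(key)) {
      seen.set(key, finding); // assumption: first occurrence wins
    }
  }
  return [...seen.values()];
}
```

Findings on the same line with different rule keys are distinct under this key and both survive.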

The coordinator's output is the full review.md content — frontmatter + body sections + trailing JSON block. Do NOT re-run the reviewers based on the coordinator's output.

Phase 7 — Write review.md

Write the coordinator's output verbatim to:

{project_dir}/review.md

Create parent directories if they do not exist. Atomic write pattern: write to a temp file, then rename. The frontmatter findings: field must use block-style YAML (one ID per line, - prefix). The parser at lib/util/frontmatter.mjs does not support flow-style arrays.
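
To make the block-style requirement concrete, the sketch below renders a findings list in block style and detects the unsupported flow style. Both function names are hypothetical, the two-space indent is an assumption, and how lib/util/frontmatter.mjs treats an empty list is not covered here.

```javascript
// Hypothetical helper: emit the findings field in block style, one ID per line.
function renderFindingsField(ids, indent = "  ") {
  return ["findings:", ...ids.map((id) => `${indent}- ${id}`)].join("\n");
}

// Hypothetical helper: flag a flow-style line such as "findings: [a, b]".
function isFlowStyleFindings(line) {
  return /^findings:\s*\[/.test(line);
}
```

A pre-write check with `isFlowStyleFindings` would catch the broken form before the strict validator runs.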

If mode == dry-run: skip the write; print the would-be path and the first 60 lines of the rendered output.

Phase 8 — Validate output + stats

Run the strict validator:

node ${CLAUDE_PLUGIN_ROOT}/lib/validators/review-validator.mjs --json "{review_path}"

If validation fails:

For repairable errors (missing required body section, malformed finding-ID, REVIEW_VERSION_FORMAT warning): repair in place — re-emit the missing section, recompute the finding-ID, fix the version string. Re-validate.
For unrepairable errors (REVIEW_WRONG_TYPE, malformed frontmatter): stop and ask the user to re-run; do not silently produce an invalid review.md.

Append a stats line to ${CLAUDE_PLUGIN_DATA}/trekreview-stats.jsonl (create the file if it does not exist):

{"ts":"{ISO-8601}","slug":"{slug}","verdict":"BLOCK|WARN|ALLOW","counts":{"BLOCKER":N,"MAJOR":N,"MINOR":N,"SUGGESTION":N},"reviewed_files_count":N,"mode":"default|quick|validate|dry-run","duration_ms":N}

If ${CLAUDE_PLUGIN_DATA} is unset or not writable, skip stats silently. Never let stats failures block the main workflow.
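
A stats record in that shape could be assembled like this. The function name and its camelCase parameters are hypothetical; the emitted keys follow the JSON template above. Appending to the file and swallowing write failures are left to the caller.

```javascript
// Hypothetical helper: build one JSONL line for trekreview-stats.jsonl.
function buildStatsLine(
  { slug, verdict, counts, reviewedFilesCount, mode, durationMs },
  now = new Date()
) {
  const record = {
    ts: now.toISOString(),
    slug,
    verdict,
    counts: {
      BLOCKER: counts.BLOCKER ?? 0,
      MAJOR: counts.MAJOR ?? 0,
      MINOR: counts.MINOR ?? 0,
      SUGGESTION: counts.SUGGESTION ?? 0,
    },
    reviewed_files_count: reviewedFilesCount,
    mode,
    duration_ms: durationMs,
  };
  return JSON.stringify(record) + "\n"; // one record per line
}
```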

Build the operator-annotation HTML. After stats land, run:

# Capture stdout only, so stderr output cannot end up inside the path.
ANNOT_HTML=$(node "${CLAUDE_PLUGIN_ROOT}/scripts/annotate.mjs" "{review_path}")

stdout is the absolute path to the .html on success. The HTML renders review.md with line numbers, lets the operator click any line to attach their own note (not Claude-generated suggestions — the operator drives every annotation), keeps a sidebar of all notes, persists state in localStorage, and exposes a "Copy Prompt" button. If annotate.mjs exits non-zero, surface a one-line warning and continue — the annotation HTML is a convenience, not a gate.

Phase 8.5 — Validate-only mode (`--validate`)

When mode == validate:

Skip Phases 3–7 entirely.
Run the strict validator on {project_dir}/review.md.
Print a one-line PASS/FAIL summary plus the JSON output on FAIL.
Exit 0 on PASS, 1 on FAIL. Never write to disk. Never call any agent.

Phase 9 — Present summary

After the write succeeds, print:

## Ultrareview Complete

**Task:** {task}
**Mode:** {default | quick | dry-run}
**Brief:** {brief_path}
**Project:** {project_dir}
**Review:** {review_path}
**Annotation HTML:** file://{$ANNOT_HTML}
**Scope:** {before_sha}..{after_sha} ({reviewed_files_count} files)
**Verdict:** {BLOCK | WARN | ALLOW}

### Counts
- BLOCKER: {N}
- MAJOR: {N}
- MINOR: {N}
- SUGGESTION: {N}

### Top findings
- [{severity}] {title} ({file}:{line})
  ...
{up to 5 highest-severity findings}

────────────────────────────────────────────────────────────────────
To review and annotate the review, open it in a browser:

    open file://{$ANNOT_HTML}

Click any line to add YOUR OWN note. The sidebar collects every note,
the "Copy Prompt" button gathers them into one structured prompt.
Paste that prompt back into this chat and Claude revises review.md
from your notes. Annotations persist in your browser if you close
the tab and reopen the same file.
────────────────────────────────────────────────────────────────────

You can also:
- Feed BLOCKER + MAJOR findings into a follow-up plan:
    /trekplan --brief {review_path}
- Re-run with `--quick` for a faster correctness-only pass
- Re-run with `--since <ref>` to narrow scope

Per Handover 6, BLOCKER and MAJOR findings are consumed by /trekplan --brief review.md to produce a remediation plan. The review's frontmatter findings: list and the trailing JSON block are the contract for that handover (see docs/HANDOVER-CONTRACTS.md).
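
Selecting the "Top findings" for the summary and the subset that feeds the follow-up plan can be sketched as below. The function names are hypothetical, and a `severity` field on each finding is assumed; the severity order and the limit of 5 come from this document.

```javascript
const SEVERITY_ORDER = ["BLOCKER", "MAJOR", "MINOR", "SUGGESTION"];

// Hypothetical helper: up to `limit` findings, highest severity first.
// Array.prototype.sort is stable, so ties keep their original order.
function topFindings(findings, limit = 5) {
  return [...findings]
    .sort(
      (a, b) =>
        SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity)
    )
    .slice(0, limit);
}

// Hypothetical helper: the findings that /trekplan consumes per Handover 6.
function handoverFindings(findings) {
  return findings.filter(
    (f) => f.severity === "BLOCKER" || f.severity === "MAJOR"
  );
}
```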

Profile (v4.1)

Accepts --profile <name> where <name> is economy, balanced, premium, or a custom profile under voyage-profiles/. Default: premium.

Resolution order (per lib/profiles/resolver.mjs):

1. --profile flag (source: flag)
2. VOYAGE_PROFILE env-var (source: env)
3. premium default (source: default)

The selected profile drives phase_models.review — economy uses sonnet for the brief-conformance + code-correctness reviewers; balanced and premium use opus (review benefits from deeper reasoning).

Examples:

/trekreview --profile balanced --project .claude/projects/2026-05-09-add-auth
VOYAGE_PROFILE=premium /trekreview --project ...

Stats records emit profile and profile_source.
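
The resolution order and the reviewer model mapping can be sketched as follows. This is not the code in lib/profiles/resolver.mjs; the function names are hypothetical, and custom profiles under voyage-profiles/ are not modelled, so the lookup returns `null` for them.

```javascript
// Hypothetical helper: flag beats env-var, env-var beats the premium default.
function resolveProfile({ flag, env }) {
  if (flag) return { profile: flag, profile_source: "flag" };
  if (env) return { profile: env, profile_source: "env" };
  return { profile: "premium", profile_source: "default" };
}

// Reviewer model per built-in profile, as described above.
const REVIEW_MODELS = new Map([
  ["economy", "sonnet"],
  ["balanced", "opus"],
  ["premium", "opus"],
]);

// Hypothetical helper: null means "custom profile, look it up elsewhere".
function reviewerModel(profile) {
  return REVIEW_MODELS.get(profile) ?? null;
}
```

The returned `profile` and `profile_source` values match the two fields that stats records emit.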

Hard rules

Brief is the contract. Every finding in the review traces to a brief section via brief_ref, except SCOPE_CREEP_BUILT (which traces to "no anchor"). Conformance is the conformance reviewer's job — code-correctness findings carry generic anchors like "NFR — code correctness".
Independent reviewers. Do NOT cross-feed findings between brief-conformance-reviewer and code-correctness-reviewer. The coordinator is the only place where outputs combine.
Bounded coordination. Synthesis-level inference across files is forbidden in v1.0. The coordinator dedups, filters, and computes the verdict — nothing more.
Triage map respected. Files marked skip MUST appear in the Coverage section. Silent drops are COVERAGE_SILENT_SKIP (MAJOR).
Block-style YAML for findings list. The frontmatter parser does not support flow-style arrays. findings: [a, b] is broken; use findings:\n - a\n - b.
Refuse-with-suggestion above 100 files / 100K tokens. Never run blind on a giant diff. Use AskUserQuestion to surface the gate.
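Cost. Under the economy profile, all sub-agents (reviewers + coordinator) run on sonnet and opus runs only in the main /trekreview command thread. Under balanced and premium, the two reviewers run on opus instead (see Profile).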
Privacy. Never log secrets, tokens, or credentials in review.md. Findings citing files with secret-like content must redact the secret in the detail field.
Honesty. If the diff is trivially small or all-skip, say so. Do not pad findings to make the review look thorough.
No production code. This command never runs production code, never writes to anything outside {project_dir} and ${CLAUDE_PLUGIN_DATA}.
