Kjell Tore Guttormsen f2f8246e01 docs(voyage): document v4.1 profiles + observability + doc-consistency-pinning

Step 22 of v4.1 — write top-level docs for the v4.1 feature surface.

Files updated:
  CLAUDE.md       — Commands tables: add --profile to all 6 modes
                    + new ## Profile system + ## Observability sections
  README.md       — per-command Modes tables: add --profile row
                    + new top-level ## Profile system + ## Observability
                    + cross-link from ## Cost profile
  CHANGELOG.md    — new "## v4.1.0 — 2026-05-09" entry per Keep-a-Changelog 1.1.0
                    (Added / Changed / Fixed / Notes)
  docs/profiles.md — NEW: 168-line decision tree, lookup precedence,
                    custom-profile authoring, drift detection,
                    cost-estimation table with disclaimer
  tests/lib/doc-consistency.test.mjs — extend with 5 new pinning tests:
                    CLAUDE.md --profile + phase_models canonical name,
                    README.md --profile coverage (≥ 6 mentions),
                    CHANGELOG.md v4.1.0 entry, docs/profiles.md substantive

ROADMAP.md is gitignored per marketplace policy (session files); the local
edit for the v4.1 DONE marker was applied but not committed.

Plan-critic Blocker 2 split is honored: Step 21 pinned commands-only;
Step 22 writes the docs and pins them. doc-consistency.test.mjs is
green AFTER Step 22 (would have failed if Step 22 ran in same wave).

Tests: 487 pass + 2 skipped (Docker not installed).

2026-05-09 10:09:44 +02:00


Profile system — voyage v4.1

This document describes the model profile system: built-in tiers, lookup precedence, custom-profile authoring, drift detection, and cost estimation (with disclaimer).

Built-in profiles

Three pre-defined tiers ship with v4.1, located at lib/profiles/{economy,balanced,premium}.yaml.

| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---|---|---|---|---|---|---|---|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default: opus where reasoning depth pays off (plan synthesis + adversarial review) |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |

balanced is the v4.1 default. It puts opus on the two phases where quality matters most (plan synthesis and review) and sonnet everywhere else, a cost/quality trade-off aimed at solo developers and small teams.

economy is strictly experimental in v4.1. The cross-tier Jaccard floor (0.55) is grounded in parked-synthetic fixtures, not empirical runs (Step 17 calibration was deferred — see tests/synthetic/profile-jaccard-calibration.md). If you observe economy-plan quality regressions, fall back to balanced.
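For readers unfamiliar with the metric, here is a minimal sketch of a Jaccard comparison against the 0.55 floor. It assumes plans are compared as sets of lowercase word tokens; this document does not say what the real gate tokenizes, and the function names `tokenSet`, `jaccard`, and `passesFloor` are hypothetical.

```javascript
// Illustrative sketch only, not Voyage's implementation.
// ASSUMPTION: plans are compared as sets of lowercase word tokens.
function tokenSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9_]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const setA = tokenSet(a);
  const setB = tokenSet(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty plans are identical
  let shared = 0;
  for (const token of setA) {
    if (setB.has(token)) shared += 1;
  }
  return shared / (setA.size + setB.size - shared);
}

const JACCARD_FLOOR = 0.55; // the cross-tier floor quoted in this document

// True when the two plans are at least as similar as the floor requires.
function passesFloor(economyPlan, premiumPlan) {
  return jaccard(economyPlan, premiumPlan) >= JACCARD_FLOOR;
}
```

Under this assumption, two plans that share half of their combined vocabulary score 0.5 and fall below the floor.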

Decision tree

Are you uncertain whether the brief is correctly framed?
  └── Yes → premium (opus on brief + plan + review)
  └── No  → continue
            ↓
Is the change small (≤ 5 steps in the plan)?
  └── Yes → economy (sonnet everywhere)
  └── No  → balanced (opus on plan + review)

Special cases:
  - Critical-infrastructure plan          → premium
  - Migration with rollback risk          → premium
  - Research-heavy task (≥ 4 dimensions)  → balanced (research-stage benefits)
  - Bug fix with clear reproducer         → economy
  - Documentation-only PR                 → economy
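The tree above can be encoded as a small helper, for example to pick a tier in a wrapper script. This is a sketch under assumptions: the function name, the argument shape, and the special-case keys are hypothetical, and checking special cases before the two questions is an assumed ordering that the tree itself does not specify.

```javascript
// Illustrative sketch only. Special-case keys are hypothetical labels
// for the five special cases listed in the decision tree.
const SPECIAL_CASES = new Map([
  ["critical-infrastructure", "premium"],
  ["migration-with-rollback-risk", "premium"],
  ["research-heavy", "balanced"],
  ["bug-fix-with-reproducer", "economy"],
  ["documentation-only", "economy"],
]);

function chooseProfile({ briefUncertain = false, planSteps, specialCase = null }) {
  // ASSUMPTION: a matching special case wins over the general questions.
  if (SPECIAL_CASES.has(specialCase)) return SPECIAL_CASES.get(specialCase);
  // Question 1: is the brief possibly framed wrongly?
  if (briefUncertain) return "premium";
  // Question 2: is the change small (at most 5 plan steps)?
  return planSteps <= 5 ? "economy" : "balanced";
}
```

For instance, a confident brief with a 12-step plan resolves to balanced, while the same plan flagged as a migration with rollback risk resolves to premium.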

Lookup order

Voyage resolves the profile in this priority order:

1. Explicit --profile <name> flag, passed to the command
2. Plan-file frontmatter profile:, used when resuming via /trekexecute --resume or /trekcontinue
3. VOYAGE_PROFILE environment variable, useful for headless CI
4. Default balanced, the final fallback
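The priority order above can be sketched as a single function. This is an illustration, not Voyage's resolver: the name `resolveProfile` and its argument shape are hypothetical, and the source labels reuse the profile_source values this document lists (flag / plan_frontmatter / env / default).

```javascript
// Illustrative sketch only. Each check falls through to the next
// lower-priority source when the value is missing or empty.
function resolveProfile({ flag, planFrontmatter, env = {} } = {}) {
  if (flag) {
    return { profile: flag, profile_source: "flag" };
  }
  if (planFrontmatter?.profile) {
    return { profile: planFrontmatter.profile, profile_source: "plan_frontmatter" };
  }
  if (env.VOYAGE_PROFILE) {
    return { profile: env.VOYAGE_PROFILE, profile_source: "env" };
  }
  return { profile: "balanced", profile_source: "default" };
}
```

A caller could pass `process.env` as `env`; an explicit flag still wins even when VOYAGE_PROFILE is set.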

The resolved value is recorded in two places:

- Plan-file frontmatter: profile: <name> and phase_models: [...]
- Stats stream ${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl: the profile, profile_source, phase_models, model_used, and phase_models_resolved fields

profile_source distinguishes how the profile was resolved (flag / plan_frontmatter / env / default), so dashboards can surface unexpected env-var inheritance in CI.
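As a rough illustration of how a dashboard or script might surface env-var inheritance, the sketch below tallies profile_source values from the text of a stats file. It assumes one JSON object per line with a `profile_source` field, as described above; the function name `tallyProfileSources` is hypothetical, and reading the file from disk is left to the caller.

```javascript
// Illustrative sketch only. Takes the raw text of a *-stats.jsonl file
// and counts records per profile_source value.
function tallyProfileSources(jsonlText) {
  const counts = { flag: 0, plan_frontmatter: 0, env: 0, default: 0 };
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const record = JSON.parse(line);
    // Records without a recognised profile_source are ignored.
    if (Object.hasOwn(counts, record.profile_source)) {
      counts[record.profile_source] += 1;
    }
  }
  return counts;
}
```

A high `env` count on runs that were expected to pass an explicit flag is the kind of signal this field is meant to expose.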

Custom profiles

Drop a YAML file at lib/profiles/<name>.yaml to define a new tier. The validator (lib/validators/profile-validator.mjs) enforces:

- Every phase_models[].phase must be a known phase enum: brief / research / plan / execute / review / continue
- Every phase_models[].model must match ^(opus|sonnet)(\b|-).* or one of the canonical short names
- All six phases must be present (no partial profiles)
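The three rules can be approximated as follows. This is a sketch, not the contents of lib/validators/profile-validator.mjs: it takes an already-parsed profile object, the function name and error strings are hypothetical, and because the list of canonical short names is not given here, only the regex is checked.

```javascript
// Illustrative sketch only, operating on a parsed profile object.
const PHASES = ["brief", "research", "plan", "execute", "review", "continue"];
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/; // pattern quoted in this document

function validateProfile(profile) {
  const errors = [];
  const entries = Array.isArray(profile?.phase_models) ? profile.phase_models : [];
  const seen = new Set();
  for (const entry of entries) {
    const { phase, model } = entry ?? {};
    // Rule 1: the phase must be one of the six known phases.
    if (PHASES.includes(phase)) seen.add(phase);
    else errors.push(`unknown phase: ${phase}`);
    // Rule 2: the model must match the pattern.
    if (typeof model !== "string" || !MODEL_PATTERN.test(model)) {
      errors.push(`invalid model for ${phase}: ${model}`);
    }
  }
  // Rule 3: no partial profiles.
  for (const phase of PHASES) {
    if (!seen.has(phase)) errors.push(`missing phase: ${phase}`);
  }
  return { valid: errors.length === 0, errors };
}
```

Collecting every error rather than stopping at the first is a design choice of this sketch; it lets an author fix a new profile in one pass.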

A custom profile whose name matches a built-in overrides that built-in (lookup is alphabetical, with <custom> taking precedence). The one exception is balanced: the default tier is locked to prevent accidental override of headless CI behaviour, so it may NOT be redefined. Use a different name instead and reference it via --profile <new-name> or VOYAGE_PROFILE=<new-name>.

Example custom profile

# lib/profiles/critical.yaml — opus everywhere except continue
phase_models:
  - phase: brief
    model: opus
  - phase: research
    model: opus
  - phase: plan
    model: opus
  - phase: execute
    model: opus
  - phase: review
    model: opus
  - phase: continue
    model: sonnet

Validate with: node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml

Drift detection

In --strict mode, plan-validator.mjs emits a MANIFEST_PROFILE_DRIFT warning when the plan-level profile: differs from any step manifest's profile_used. The warning is a signal, not a failure — the plan remains valid: true. This catches:

- Manual edits where an operator changed a single step's profile
- Resume from a partial run where the previous session used a different tier
- Copy-paste errors when stitching plan fragments

To suppress the warning intentionally (e.g. when a critical step genuinely needs a higher tier), document the override in the step's prose and re-run with --soft to validate without strict-mode warnings.
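The drift check described in this section amounts to comparing one plan-level value against each step. The sketch below assumes a hypothetical in-memory plan shape (`plan.profile`, `plan.steps[].id`, `plan.steps[].manifest.profile_used`); the real plan-validator.mjs parses plan files, and its output format is not shown in this document.

```javascript
// Illustrative sketch only. Emits one warning per step whose
// manifest profile differs from the plan-level profile.
function detectProfileDrift(plan) {
  const warnings = [];
  for (const step of plan.steps ?? []) {
    const used = step.manifest?.profile_used;
    if (used !== undefined && used !== plan.profile) {
      warnings.push({
        code: "MANIFEST_PROFILE_DRIFT",
        step: step.id,
        expected: plan.profile,
        actual: used,
      });
    }
  }
  // Drift is a signal, not a failure: the plan stays valid.
  return { valid: true, warnings };
}
```

Steps without a recorded profile_used are skipped in this sketch rather than flagged, which is an assumption about how missing manifests should be treated.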

Cost estimation

Disclaimer: the table below is an estimate, not a contractual SLA. Real cost depends on context size, agent-swarm cardinality, tool-use density, and the Claude Code billing schedule. Treat these as rough order-of-magnitude figures.

| Profile | Brief | Research | Plan | Execute | Review | Total |
|---|---|---|---|---|---|---|
| `economy` | $0.10–0.50 | $0.50–2.00 | $0.50–2.00 | $1.00–5.00 | $0.20–1.00 | $2–10 |
| `balanced` | $0.10–0.50 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | $3–14 |
| `premium` | $0.50–2.00 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | $4–15 |

Numbers are per full pipeline run (brief + research + plan + execute + review) on a moderate-complexity task. They scale roughly linearly with the size of the resulting plan (10 steps ≈ baseline; 30 steps ≈ 3× the execute column).
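The scaling rule can be turned into a quick back-of-the-envelope calculation. The sketch below uses the balanced row of the table and assumes that only the execute column scales with step count while the other phases stay at their table values; that reading of the rule, and the names `BALANCED_RANGES` and `estimateRun`, are assumptions of this example.

```javascript
// Illustrative sketch only. Ranges are [low, high] in USD,
// copied from the balanced row of the table above.
const BALANCED_RANGES = {
  brief: [0.10, 0.50],
  research: [0.50, 2.00],
  plan: [1.00, 4.00],
  execute: [1.00, 5.00],
  review: [0.50, 2.00],
};
const BASELINE_STEPS = 10; // "10 steps ≈ baseline"

function estimateRun(phaseRanges, planSteps) {
  const factor = planSteps / BASELINE_STEPS;
  let low = 0;
  let high = 0;
  for (const [phase, [lo, hi]] of Object.entries(phaseRanges)) {
    // ASSUMPTION: only the execute phase scales with plan size.
    const scale = phase === "execute" ? factor : 1;
    low += lo * scale;
    high += hi * scale;
  }
  // Round to cents to hide floating-point noise.
  const round2 = (x) => Math.round(x * 100) / 100;
  return { low: round2(low), high: round2(high) };
}
```

Calling `estimateRun(BALANCED_RANGES, 30)` triples the execute range before summing, so the result is an order-of-magnitude guide in the same spirit as the table, not a quote.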

Per-profile actuals are emitted to JSONL stats — pipe them through the OTel export (docs/observability.md) to get real cost-attribution graphs in Grafana. Replace the table above with your own measured numbers after ≥ 3 runs of each profile.

Deferred to v4.2

- balanced.external_research_enabled operator override: v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in flag to enable external research agents in the balanced tier without forcing premium.
- Empirical Jaccard re-calibration: parked-synthetic fixtures in v4.1 use a conservative starting threshold of 0.55. v4.2 plans an empirical re-run with a $60–120 LLM budget to derive a calibrated threshold from real economy-vs-premium plan pairs.
- ROUGE-L + char-4gram MinHash as primary/secondary cross-tier gates, per research/02 Recommendation #7. Jaccard remains the gate in v4.1; v4.2 may layer ROUGE-L on top.
