Step 22 of v4.1 — write top-level docs for the v4.1 feature surface.
Files updated:
CLAUDE.md — Commands tables: add --profile to all 6 modes
+ new ## Profile system + ## Observability sections
README.md — per-command Modes tables: add --profile row
+ new top-level ## Profile system + ## Observability
+ cross-link from ## Cost profile
CHANGELOG.md — new "## v4.1.0 — 2026-05-09" entry per Keep-a-Changelog 1.1.0
(Added / Changed / Fixed / Notes)
docs/profiles.md — NEW (168 lines): decision tree, lookup precedence,
custom-profile authoring, drift detection,
cost-estimation table with disclaimer
tests/lib/doc-consistency.test.mjs — extend with 5 new pinning tests:
CLAUDE.md --profile + phase_models canonical name,
README.md --profile coverage (≥ 6 mentions),
CHANGELOG.md v4.1.0 entry, docs/profiles.md substantive
ROADMAP.md is gitignored per marketplace policy (session files); the local
edit adding the v4.1 DONE marker is applied but not committed.
Plan-critic Blocker 2 split is honored: Step 21 pinned commands-only;
Step 22 writes the docs and pins them. doc-consistency.test.mjs is
green AFTER Step 22 (it would have failed had Step 22 run in the same wave).
Tests: 487 pass + 2 skipped (Docker not installed).
Profile system — voyage v4.1
This document describes the model profile system: built-in tiers, lookup precedence, custom-profile authoring, drift detection, and cost estimation (with disclaimer).
Built-in profiles
Three pre-defined tiers ship with v4.1, located at
lib/profiles/{economy,balanced,premium}.yaml.
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---|---|---|---|---|---|---|---|
| economy | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
| balanced (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default: opus where reasoning depth pays off (plan synthesis + adversarial review) |
| premium | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |
balanced is the v4.1 default. It puts opus on the two phases where
quality matters most (plan synthesis and review) and sonnet everywhere
else. The intent is a cost/quality trade-off that suits solo developers
and small teams.
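As a minimal sketch, the tier table can be held as plain data and queried per phase. The object layout and the modelFor / opusPhases helpers below are hypothetical. The real definitions live in lib/profiles/*.yaml and may be loaded differently.

```javascript
// Hypothetical in-memory copy of the built-in tier table (not the plugin's actual loader).
const PHASES = ["brief", "research", "plan", "execute", "review", "continue"];

const BUILT_IN_PROFILES = {
  economy:  { brief: "sonnet", research: "sonnet", plan: "sonnet", execute: "sonnet", review: "sonnet", continue: "sonnet" },
  balanced: { brief: "sonnet", research: "sonnet", plan: "opus",   execute: "sonnet", review: "opus",   continue: "sonnet" },
  premium:  { brief: "opus",   research: "sonnet", plan: "opus",   execute: "sonnet", review: "opus",   continue: "sonnet" },
};

// Return the model a profile assigns to a phase; throw on unknown input.
function modelFor(profileName, phase) {
  const profile = BUILT_IN_PROFILES[profileName];
  if (!profile) throw new Error(`unknown profile: ${profileName}`);
  if (!PHASES.includes(phase)) throw new Error(`unknown phase: ${phase}`);
  return profile[phase];
}

// List the phases, in pipeline order, on which a profile uses opus.
function opusPhases(profileName) {
  return PHASES.filter((phase) => modelFor(profileName, phase) === "opus");
}
```

For example, opusPhases("balanced") yields ["plan", "review"]. That matches the two opus phases described above.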
economy is experimental in v4.1. The cross-tier Jaccard
floor (0.55) is grounded in parked-synthetic fixtures, not empirical
runs (Step 17 calibration was deferred — see
tests/synthetic/profile-jaccard-calibration.md). If you observe
economy-plan quality regressions, fall back to balanced.
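A cross-tier gate compares the plans that two tiers produce for the same brief. The sketch below makes an assumption: the comparison is a token-set Jaccard index over lowercased word tokens, checked against the 0.55 floor. The tokenization and the function names are hypothetical; the test suite's actual comparison may differ.

```javascript
// Hypothetical cross-tier similarity gate: Jaccard index over word-token sets.
// Tokenization (lowercase, split on non-alphanumerics) is an assumption for illustration.
const JACCARD_FLOOR = 0.55;

function tokenSet(text) {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// |A ∩ B| / |A ∪ B|; two empty token sets are treated as identical.
function jaccard(a, b) {
  const setA = tokenSet(a);
  const setB = tokenSet(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  for (const token of setA) {
    if (setB.has(token)) intersection += 1;
  }
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

// True when the economy plan stays close enough to the premium plan.
function passesCrossTierGate(economyPlan, premiumPlan, floor = JACCARD_FLOOR) {
  return jaccard(economyPlan, premiumPlan) >= floor;
}
```

In this sketch, "a b c d" and "a b c e" share 3 of 5 distinct tokens (0.6), so they pass. "a b c d" and "a b e f" (2 of 6) fail.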
Decision tree
Are you uncertain whether the brief is correctly framed?
└── Yes → premium (opus on brief + plan + review)
└── No → continue
↓
Is the change small (≤ 5 steps in the plan)?
└── Yes → economy (sonnet everywhere)
└── No → balanced (opus on plan + review)
Special cases:
- Critical-infrastructure plan → premium
- Migration with rollback risk → premium
- Research-heavy task (≥ 4 dimensions) → balanced (research-stage benefits)
- Bug fix with clear reproducer → economy
- Documentation-only PR → economy
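The tree and special cases above can be sketched as a single function. This is hypothetical: the field names on task are invented for illustration. The document also does not say whether special cases override the generic tree; this sketch assumes they do.

```javascript
// Hypothetical encoding of the profile decision tree; not a plugin API.
// Assumption: special cases are checked before the generic tree.
function chooseProfile(task) {
  // Special cases.
  if (task.criticalInfrastructure || task.migrationWithRollbackRisk) return "premium";
  if (task.researchDimensions >= 4) return "balanced";
  if (task.bugFixWithReproducer || task.docsOnly) return "economy";

  // Generic tree: uncertain brief framing first, then plan size.
  if (task.briefUncertain) return "premium";
  return task.planSteps <= 5 ? "economy" : "balanced";
}
```

For example, chooseProfile({ planSteps: 12 }) returns "balanced", while adding docsOnly: true to the same input returns "economy".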
Lookup order
Voyage resolves the profile in this priority order:
- Explicit --profile <name> flag passed to the command
- Plan-file frontmatter profile: field (when resuming via /trekexecute --resume or /trekcontinue)
- VOYAGE_PROFILE environment variable (useful for headless CI)
- Default balanced (final fallback)
The resolved value is recorded in two places:
- Plan-file frontmatter: profile: <name> and phase_models: [...]
- Stats stream ${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl: the profile, profile_source, phase_models, model_used, and phase_models_resolved fields
profile_source distinguishes how the profile was resolved (flag /
plan_frontmatter / env / default), so dashboards can surface
unexpected env-var inheritance in CI.
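The precedence and the profile_source values can be captured in a few lines. This is a sketch under assumptions: the input shape and the resolveProfile name are hypothetical. Only the ordering and the four source labels come from this document.

```javascript
// Hypothetical resolver mirroring the documented precedence:
// --profile flag > plan frontmatter > VOYAGE_PROFILE env var > default "balanced".
function resolveProfile({ flag, planFrontmatter, env = {} } = {}) {
  if (flag) return { profile: flag, profile_source: "flag" };
  if (planFrontmatter && planFrontmatter.profile) {
    return { profile: planFrontmatter.profile, profile_source: "plan_frontmatter" };
  }
  if (env.VOYAGE_PROFILE) return { profile: env.VOYAGE_PROFILE, profile_source: "env" };
  return { profile: "balanced", profile_source: "default" };
}
```

In a CI job one might call resolveProfile({ env: process.env }). If the result reports profile_source: "env" where a default was expected, the job has inherited VOYAGE_PROFILE from its environment.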
Custom profiles
Drop a YAML file at lib/profiles/<name>.yaml to define a new tier.
The validator (lib/validators/profile-validator.mjs) enforces:
- Every phase_models[].phase must be a known phase: brief, research, plan, execute, review, or continue
- Every phase_models[].model must match ^(opus|sonnet)(\b|-).* or be one of the canonical short names
- All six phases must be present (no partial profiles)
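The three rules can be re-implemented as a minimal sketch. This is not the code in lib/validators/profile-validator.mjs. The validateProfile name and its return shape are hypothetical. The "canonical short names" alternative is left out because this document does not list them.

```javascript
// Hypothetical re-implementation of the documented profile rules.
const KNOWN_PHASES = ["brief", "research", "plan", "execute", "review", "continue"];
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/;

// `profile` is a parsed YAML object: { phase_models: [{ phase, model }, ...] }.
function validateProfile(profile) {
  const errors = [];
  const entries = Array.isArray(profile?.phase_models) ? profile.phase_models : [];
  for (const entry of entries) {
    const { phase, model } = entry ?? {};
    // Rule 1: phase must be one of the six known phases.
    if (!KNOWN_PHASES.includes(phase)) errors.push(`unknown phase: ${phase}`);
    // Rule 2: model must match the opus/sonnet pattern.
    if (typeof model !== "string" || !MODEL_PATTERN.test(model)) {
      errors.push(`invalid model for ${phase}: ${model}`);
    }
  }
  // Rule 3: every phase must be present (no partial profiles).
  const present = new Set(entries.map((e) => e?.phase));
  for (const phase of KNOWN_PHASES) {
    if (!present.has(phase)) errors.push(`missing phase: ${phase}`);
  }
  return { valid: errors.length === 0, errors };
}
```

Note that the pattern accepts "opus" and "sonnet-4-5" but rejects "opusx", because \b requires a word boundary right after the family name.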
Custom profiles override built-ins of the same name (lookup is
alphabetical, with custom definitions taking precedence). The one
exception is balanced: you may NOT redefine it, because the default tier
is locked to prevent accidental override of headless CI behaviour. Use a
different name instead and select it via --profile <new-name> or
VOYAGE_PROFILE=<new-name>.
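The override rule, including the locked default, can be sketched as a merge of two name-to-profile maps. This is hypothetical. mergeProfiles and LOCKED_PROFILES are invented names, and throwing on a balanced redefinition is an assumption. The real tool might reject it in some other way.

```javascript
// Hypothetical merge: custom profiles win on name clashes, except locked ones.
const LOCKED_PROFILES = new Set(["balanced"]);

function mergeProfiles(builtIns, customs) {
  const merged = { ...builtIns };
  // Iterate custom names alphabetically so any error is deterministic.
  for (const name of Object.keys(customs).sort()) {
    if (LOCKED_PROFILES.has(name)) {
      throw new Error(`profile "${name}" is locked; choose a different name`);
    }
    merged[name] = customs[name];
  }
  return merged;
}
```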
Example custom profile
# lib/profiles/critical.yaml: opus everywhere except continue
phase_models:
  - phase: brief
    model: opus
  - phase: research
    model: opus
  - phase: plan
    model: opus
  - phase: execute
    model: opus
  - phase: review
    model: opus
  - phase: continue
    model: sonnet
Validate with: node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml
Drift detection
In --strict mode, plan-validator.mjs emits a MANIFEST_PROFILE_DRIFT
warning when the plan-level profile: differs from any step manifest's
profile_used. The warning is a signal, not a failure — the plan
remains valid: true. This catches:
- Manual edits where an operator changed a single step's profile
- Resume from a partial run where the previous session used a different tier
- Copy-paste errors when stitching plan fragments
To suppress the warning intentionally (e.g. when a critical step
genuinely needs a higher tier), document the override in the step's
prose and re-run with --soft to validate without strict-mode warnings.
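The drift rule can be sketched as a comparison between the plan-level profile and each step's profile_used. The plan object shape, the step id field and the checkProfileDrift name are assumptions. The warning code and the "valid stays true" behaviour come from the text above.

```javascript
// Hypothetical drift check: warnings are only emitted in strict mode,
// and they never make the plan invalid.
function checkProfileDrift(plan, { strict = false } = {}) {
  const warnings = [];
  if (strict) {
    for (const step of plan.steps ?? []) {
      const used = step.manifest?.profile_used;
      if (used !== undefined && used !== plan.profile) {
        warnings.push({
          code: "MANIFEST_PROFILE_DRIFT",
          step: step.id,
          expected: plan.profile,
          actual: used,
        });
      }
    }
  }
  return { valid: true, warnings };
}
```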
Cost estimation
Disclaimer: the table below is an estimate, not a contractual SLA. Real cost depends on context size, agent-swarm cardinality, tool-use density, and the Claude Code billing schedule. Treat these figures as rough orders of magnitude.
| Profile | Brief | Research | Plan | Execute | Review | Total |
|---|---|---|---|---|---|---|
| economy | $0.10–0.50 | $0.50–2.00 | $0.50–2.00 | $1.00–5.00 | $0.20–1.00 | $2–10 |
| balanced | $0.10–0.50 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | $3–14 |
| premium | $0.50–2.00 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | $4–15 |
Numbers are per full pipeline run (brief + research + plan + execute + review) on a moderate-complexity task. They scale roughly linearly with the size of the resulting plan, mostly through the execute column (10 steps ≈ baseline; 30 steps ≈ 3× the execute column).
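The scaling rule turns into simple arithmetic. This minimal sketch stores the table in cents to avoid floating-point drift. It assumes that only the execute column scales with plan size, with 10 steps as the baseline. estimateCents is a hypothetical helper, not part of voyage.

```javascript
// Hypothetical estimator built from the cost table, in cents ([low, high] per phase).
// Assumption: only the execute column scales with plan size; 10 steps = baseline.
const COST_TABLE_CENTS = {
  economy:  { brief: [10, 50],  research: [50, 200], plan: [50, 200],  execute: [100, 500], review: [20, 100] },
  balanced: { brief: [10, 50],  research: [50, 200], plan: [100, 400], execute: [100, 500], review: [50, 200] },
  premium:  { brief: [50, 200], research: [50, 200], plan: [100, 400], execute: [100, 500], review: [50, 200] },
};
const BASELINE_STEPS = 10;

function estimateCents(profile, planSteps = BASELINE_STEPS) {
  const row = COST_TABLE_CENTS[profile];
  if (!row) throw new Error(`unknown profile: ${profile}`);
  const scale = planSteps / BASELINE_STEPS;
  let low = 0;
  let high = 0;
  for (const [phase, [lo, hi]] of Object.entries(row)) {
    const factor = phase === "execute" ? scale : 1;
    low += lo * factor;
    high += hi * factor;
  }
  return { low, high };
}
```

For balanced, estimateCents("balanced") gives { low: 310, high: 1350 }, which the table's Total column rounds to $3–14. A 30-step plan gives { low: 510, high: 2350 }, roughly $5.10–23.50.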
Per-profile actuals are emitted to JSONL stats — pipe them through the
OTel export (docs/observability.md) to get real cost-attribution
graphs in Grafana. Replace the table above with your own measured
numbers after ≥ 3 runs of each profile.
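Before replacing the table, you can check which profiles have enough measurements. This minimal sketch counts records per profile in a trek*-stats.jsonl dump. It assumes one JSON object per line and one record per run. If the stream writes several records per run, count distinct run identifiers instead; no such field is documented here. Both helper names are hypothetical.

```javascript
// Hypothetical helpers over a JSONL stats dump (one JSON object per line).
// Assumption: each record with a `profile` field represents one run.
function runsPerProfile(jsonlText) {
  const counts = {};
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines, including a trailing newline
    const record = JSON.parse(line);
    if (!record.profile) continue;
    counts[record.profile] = (counts[record.profile] ?? 0) + 1;
  }
  return counts;
}

// Profiles with at least `minRuns` records, sorted by name.
function profilesReadyToReplaceEstimates(jsonlText, minRuns = 3) {
  return Object.entries(runsPerProfile(jsonlText))
    .filter(([, n]) => n >= minRuns)
    .map(([profile]) => profile)
    .sort();
}
```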
Deferred to v4.2
- balanced.external_research_enabled operator override: v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in flag to enable external research agents in the balanced tier without forcing premium.
- Empirical Jaccard re-calibration: the parked-synthetic fixtures in v4.1 use a conservative starting threshold of 0.55. v4.2 plans an empirical re-run with a $60–120 LLM budget to derive a calibrated threshold from real economy-vs-premium plan pairs.
- ROUGE-L + char-4gram MinHash as primary/secondary cross-tier gates per research/02 Recommendation #7. Jaccard remains the gate in v4.1; v4.2 may layer ROUGE-L on top.
See also
- README.md § Profile system: top-level overview
- CLAUDE.md § Profile system: internal reference
- docs/observability.md: JSONL → OTel pipeline
- tests/synthetic/profile-jaccard-calibration.md: calibration status and threshold rationale
- lib/profiles/: built-in profile YAMLs
- lib/validators/profile-validator.mjs: schema validator with CLI shim