# Profile system: voyage v4.1

This document describes the model profile system: built-in tiers, lookup precedence, custom-profile authoring, drift detection, and cost estimation (with disclaimer).

## Built-in profiles

Three pre-defined tiers ship with v4.1, located at `lib/profiles/{economy,balanced,premium}.yaml`.

| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default: opus where reasoning depth pays off (plan synthesis + adversarial review) |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |

`balanced` is the v4.1 default. It puts opus on the two phases where quality matters most (Plan synthesis + Review) and sonnet everywhere else. This lands the cost/quality trade-off that solo-developers and small teams actually want.

`economy` is *strictly experimental* in v4.1. The cross-tier Jaccard floor (0.55) is grounded in parked-synthetic fixtures, not empirical runs (Step 17 calibration was deferred; see `tests/synthetic/profile-jaccard-calibration.md`). If you observe economy-plan quality regressions, fall back to `balanced`.

## Decision tree

```
Are you uncertain whether the brief is correctly framed?
  └── Yes → premium (opus on brief + plan + review)
  └── No → continue ↓

Is the change small (≤ 5 steps in the plan)?
  └── Yes → economy (sonnet everywhere)
  └── No → balanced (opus on plan + review)

Special cases:
  - Critical-infrastructure plan → premium
  - Migration with rollback risk → premium
  - Research-heavy task (≥ 4 dimensions) → balanced (research-stage benefits)
  - Bug fix with clear reproducer → economy
  - Documentation-only PR → economy
```

## Lookup order

Voyage resolves the profile in this priority order:

1. **Explicit `--profile ` flag**: passed to the command
2. **Plan-file frontmatter `profile:`**: when resuming via `/trekexecute --resume` or `/trekcontinue`
3. **`VOYAGE_PROFILE` environment variable**: useful for headless CI
4. **Default `balanced`**: final fallback

The resolved value is recorded in two places:

- Plan-file frontmatter `profile: ` and `phase_models: [...]`
- Stats stream `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`: `profile`, `profile_source`, `phase_models`, `model_used`, `phase_models_resolved` fields

`profile_source` distinguishes how the profile was resolved (`flag` / `plan_frontmatter` / `env` / `default`), so dashboards can surface unexpected env-var inheritance in CI.

## Custom profiles

Drop a YAML file at `lib/profiles/.yaml` to define a new tier. The validator (`lib/validators/profile-validator.mjs`) enforces:

- Every `phase_models[].phase` must be a known phase enum: `brief` / `research` / `plan` / `execute` / `review` / `continue`
- Every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or one of the canonical short names
- All six phases must be present (no partial profiles)

Custom profiles override built-ins of the same name (lookup is alphabetical with `` taking precedence). You may NOT redefine `balanced` (the default tier is locked to prevent accidental override of headless CI behaviour); use a different name and reference it via `--profile ` or `VOYAGE_PROFILE=`.
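The three validator rules listed in this section can be sketched in a few lines of JavaScript. This is a minimal illustration, not the source of `profile-validator.mjs`: the names `validateProfile`, `KNOWN_PHASES`, and `MODEL_PATTERN` are hypothetical, the input is assumed to be an already-parsed profile object, and the "canonical short names" are assumed to be just `opus` and `sonnet`, since the text does not list them.

```javascript
// Hypothetical sketch of the documented rules; not the real validator.
const KNOWN_PHASES = ["brief", "research", "plan", "execute", "review", "continue"];

// Pattern quoted in the documentation. Assumption: the canonical short
// names are "opus" and "sonnet", which this pattern already accepts.
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/;

function validateProfile(profile) {
  const errors = [];
  const entries = Array.isArray(profile?.phase_models) ? profile.phase_models : [];
  const seen = new Set();

  for (const entry of entries) {
    // Rule 1: every phase must be one of the six known phases.
    if (!KNOWN_PHASES.includes(entry.phase)) {
      errors.push(`unknown phase: ${entry.phase}`);
    } else {
      seen.add(entry.phase);
    }
    // Rule 2: every model must match the documented pattern.
    if (typeof entry.model !== "string" || !MODEL_PATTERN.test(entry.model)) {
      errors.push(`invalid model for ${entry.phase}: ${entry.model}`);
    }
  }

  // Rule 3: all six phases must be present (no partial profiles).
  for (const phase of KNOWN_PHASES) {
    if (!seen.has(phase)) errors.push(`missing phase: ${phase}`);
  }

  return { valid: errors.length === 0, errors };
}
```

Collecting every violation before returning, instead of stopping at the first, lets an operator fix a hand-written profile in one pass. The sketch does not cover the `balanced` lock or name-based lookup, which depend on file names and are outside what a parsed object can show.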
### Example custom profile

```yaml
# lib/profiles/critical.yaml (opus everywhere except continue)
phase_models:
  - phase: brief
    model: opus
  - phase: research
    model: opus
  - phase: plan
    model: opus
  - phase: execute
    model: opus
  - phase: review
    model: opus
  - phase: continue
    model: sonnet
```

Validate with: `node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml`

## Drift detection

In `--strict` mode, `plan-validator.mjs` emits a `MANIFEST_PROFILE_DRIFT` warning when the plan-level `profile:` differs from any step manifest's `profile_used`. The warning is a *signal*, not a failure: the plan remains `valid: true`. This catches:

- Manual edits where an operator changed a single step's profile
- Resume from a partial run where the previous session used a different tier
- Copy-paste errors when stitching plan fragments

To suppress the warning intentionally (e.g. when a critical step genuinely needs a higher tier), document the override in the step's prose and re-run with `--soft` to validate without strict-mode warnings.

## Cost estimation

> **Disclaimer:** the table below is an *estimate*, not a contractual
> SLA. Real cost depends on context size, agent-swarm cardinality,
> tool-use density, and Claude Code billing schedule. Treat these as
> rough order-of-magnitude figures.

| Profile | Brief | Research | Plan | Execute | Review | Total |
|---------|-------|----------|------|---------|--------|-------|
| `economy` | $0.10–0.50 | $0.50–2.00 | $0.50–2.00 | $1.00–5.00 | $0.20–1.00 | **$2–10** |
| `balanced` | $0.10–0.50 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$3–14** |
| `premium` | $0.50–2.00 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$4–15** |

Numbers are per *full pipeline run* (brief + research + plan + execute + review) on a moderate-complexity task. They scale roughly linearly with the size of the resulting plan (10 steps ≈ baseline; 30 steps ≈ 3× the execute column).
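The scaling rule can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not part of voyage: `COST_TABLE`, `BASELINE_STEPS`, and `estimateRunCost` are hypothetical names, the ranges are copied from the table in this section, and only the Execute column is assumed to scale with plan size, which is one reading of "30 steps ≈ 3× the execute column".

```javascript
// Low/high USD ranges per phase, copied from the cost table.
const COST_TABLE = {
  economy:  { brief: [0.10, 0.50], research: [0.50, 2.00], plan: [0.50, 2.00], execute: [1.00, 5.00], review: [0.20, 1.00] },
  balanced: { brief: [0.10, 0.50], research: [0.50, 2.00], plan: [1.00, 4.00], execute: [1.00, 5.00], review: [0.50, 2.00] },
  premium:  { brief: [0.50, 2.00], research: [0.50, 2.00], plan: [1.00, 4.00], execute: [1.00, 5.00], review: [0.50, 2.00] },
};

// The table is described as a 10-step baseline.
const BASELINE_STEPS = 10;

// Hypothetical helper: rough low/high estimate for one full pipeline run.
// Assumption: only the execute phase scales linearly with plan size.
function estimateRunCost(profile, planSteps) {
  const phases = COST_TABLE[profile];
  if (!phases) throw new Error(`unknown profile: ${profile}`);

  const scale = planSteps / BASELINE_STEPS;
  let low = 0;
  let high = 0;
  for (const [phase, [lo, hi]] of Object.entries(phases)) {
    const factor = phase === "execute" ? scale : 1;
    low += lo * factor;
    high += hi * factor;
  }
  // Round to cents to hide floating-point noise.
  return { low: Number(low.toFixed(2)), high: Number(high.toFixed(2)) };
}

console.log(estimateRunCost("balanced", 10)); // { low: 3.1, high: 13.5 }
console.log(estimateRunCost("balanced", 30)); // { low: 5.1, high: 23.5 }
```

The 10-step result matches the rounded **$3–14** total in the table. Like the table, the output is an order-of-magnitude figure and not a prediction.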
Per-profile actuals are emitted to JSONL stats. Pipe them through the OTel export (`docs/observability.md`) to get real cost-attribution graphs in Grafana. Replace the table above with your own measured numbers after ≥ 3 runs of each profile.

## Deferred to v4.2

- **`balanced.external_research_enabled` operator-override**: v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in flag to enable external research agents in the balanced tier without forcing premium.
- **Empirical Jaccard re-calibration**: parked-synthetic fixtures in v4.1 use a 0.55 conservative starting threshold. v4.2 plans an empirical re-run with $60-120 LLM budget to derive a calibrated threshold from real economy-vs-premium plan pairs.
- **ROUGE-L + char-4gram MinHash** as primary/secondary cross-tier gates per research/02 Recommendation #7. Jaccard remains the gate in v4.1; v4.2 may layer ROUGE-L on top.

## See also

- [`README.md` § Profile system](../README.md): top-level overview
- [`CLAUDE.md` § Profile system](../CLAUDE.md): internal reference
- [`docs/observability.md`](observability.md): JSONL → OTel pipeline
- [`tests/synthetic/profile-jaccard-calibration.md`](../tests/synthetic/profile-jaccard-calibration.md): calibration status and threshold rationale
- [`lib/profiles/`](../lib/profiles/): built-in profile YAMLs
- [`lib/validators/profile-validator.mjs`](../lib/validators/profile-validator.mjs): schema validator with CLI shim