# Profile system — voyage v4.1

This document describes the model profile system: built-in tiers,
lookup precedence, custom-profile authoring, drift detection, and
cost estimation (with disclaimer).

## Built-in profiles

Three pre-defined tiers ship with v4.1, located at
`lib/profiles/{economy,balanced,premium}.yaml`.

| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off (plan synthesis + adversarial review) |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |

`balanced` is the v4.1 default. It puts opus on the two phases where
quality matters most (Plan synthesis and Review) and sonnet everywhere
else, a cost/quality trade-off aimed at solo developers and small
teams.

`economy` is *strictly experimental* in v4.1. The cross-tier Jaccard
floor (0.55) is grounded in parked-synthetic fixtures, not empirical
runs (Step 17 calibration was deferred — see
`tests/synthetic/profile-jaccard-calibration.md`). If you observe
economy-plan quality regressions, fall back to `balanced`.
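To give a feel for what the 0.55 floor measures, here is a minimal Jaccard sketch: the size of the intersection divided by the size of the union of two sets. It assumes each plan is reduced to a set of lowercase word tokens. The tokenizer, the function names, and the sample strings are hypothetical and are not taken from the calibration fixtures.

```javascript
// Hypothetical sketch of a cross-tier Jaccard check; not voyage's harness.
// Assumption: each plan is reduced to a set of lowercase word tokens.
const JACCARD_FLOOR = 0.55; // v4.1 conservative starting threshold

function tokenSet(text) {
  // Split on runs of characters that are not ASCII letters, digits, or "_".
  return new Set(text.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean));
}

function jaccard(a, b) {
  // |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const token of a) {
    if (b.has(token)) intersection += 1;
  }
  return intersection / (a.size + b.size - intersection);
}

function passesFloor(economyPlan, premiumPlan, floor = JACCARD_FLOOR) {
  const score = jaccard(tokenSet(economyPlan), tokenSet(premiumPlan));
  return { score, pass: score >= floor };
}

const result = passesFloor(
  "add retry wrapper to fetch client and update tests",
  "add retry wrapper to fetch client, update tests and docs",
);
console.log(result.score.toFixed(2), result.pass); // 0.90 true
```

Word-level sets are the simplest choice. Comparing step titles or file paths instead would change what a score of 0.55 means, which is part of why the threshold still needs empirical calibration.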

## Decision tree

```
Are you uncertain whether the brief is correctly framed?
  ├── Yes → premium (opus on brief + plan + review)
  └── No  → continue
            ↓
Is the change small (≤ 5 steps in the plan)?
  ├── Yes → economy (sonnet everywhere)
  └── No  → balanced (opus on plan + review)

Special cases:
  - Critical-infrastructure plan          → premium
  - Migration with rollback risk          → premium
  - Research-heavy task (≥ 4 dimensions)  → balanced (research-stage benefits)
  - Bug fix with clear reproducer         → economy
  - Documentation-only PR                 → economy
```
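The same tree can be written as a small selection function, which can be handy when scripting CI defaults. This is a hypothetical helper, not part of voyage, and the task field names are invented. The tree does not say how the special cases interact with the two main questions, so the sketch assumes an order: premium triggers first, then brief uncertainty, then the remaining special cases, then plan size.

```javascript
// Hypothetical helper mirroring the decision tree above; not a voyage API.
// Assumed precedence: premium triggers, brief uncertainty, other special
// cases, then plan size.
function chooseProfile(task) {
  // Special cases that always justify premium.
  if (task.criticalInfrastructure || task.migrationWithRollbackRisk) {
    return "premium";
  }
  // Question 1: is the brief's framing uncertain?
  if (task.briefUncertain) return "premium";
  // Remaining special cases.
  if ((task.researchDimensions ?? 0) >= 4) return "balanced";
  if (task.bugFixWithReproducer || task.docsOnly) return "economy";
  // Question 2: small change (at most 5 plan steps)? Unknown size falls to balanced.
  return task.expectedSteps <= 5 ? "economy" : "balanced";
}

console.log(chooseProfile({ expectedSteps: 12 })); // balanced
console.log(chooseProfile({ docsOnly: true }));    // economy
```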

## Lookup order

Voyage resolves the profile in this priority order:

1. **Explicit `--profile <name>` flag** — passed to the command
2. **Plan-file frontmatter `profile:`** — when resuming via
   `/trekexecute --resume` or `/trekcontinue`
3. **`VOYAGE_PROFILE` environment variable** — useful for headless CI
4. **Default `balanced`** — final fallback
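A minimal sketch of that precedence, assuming the flag, the plan frontmatter, and the environment have already been parsed into plain values. The function name `resolveProfile` and its argument shape are hypothetical; it returns the profile together with a `profile_source` label using the same values the stats stream records.

```javascript
// Hypothetical sketch of the lookup precedence; not voyage's source.
// planFrontmatter is assumed to be passed only when resuming a plan.
// Assumption: empty or whitespace-only strings count as unset.
function resolveProfile({ flag, planFrontmatter, env = {} } = {}) {
  const candidates = [
    { source: "flag", value: flag },
    { source: "plan_frontmatter", value: planFrontmatter?.profile },
    { source: "env", value: env.VOYAGE_PROFILE },
  ];
  // The first non-empty candidate wins, in priority order.
  for (const { source, value } of candidates) {
    if (typeof value === "string" && value.trim() !== "") {
      return { profile: value.trim(), profile_source: source };
    }
  }
  return { profile: "balanced", profile_source: "default" };
}

// Headless CI with only the environment variable set:
console.log(resolveProfile({ env: { VOYAGE_PROFILE: "economy" } }));
// An explicit flag beats both frontmatter and environment:
console.log(
  resolveProfile({
    flag: "premium",
    planFrontmatter: { profile: "economy" },
    env: { VOYAGE_PROFILE: "economy" },
  }),
);
```

In a real wrapper you would pass `env: process.env`; taking `env` as a parameter keeps the sketch testable without touching the process environment.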

The resolved value is recorded in two places:

- Plan-file frontmatter `profile: <name>` and `phase_models: [...]`
- Stats stream `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` —
  `profile`, `profile_source`, `phase_models`, `model_used`,
  `phase_models_resolved` fields

`profile_source` distinguishes how the profile was resolved (`flag` /
`plan_frontmatter` / `env` / `default`), so dashboards can surface
unexpected env-var inheritance in CI.
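As an illustration of that dashboard use case, the sketch below scans stats lines and collects the profiles that were inherited from the environment. It reads only the `profile` and `profile_source` fields listed above and assumes nothing else about the record layout; the function name and the sample lines are made up.

```javascript
// Hypothetical sketch: find stats records whose profile was inherited
// from VOYAGE_PROFILE rather than chosen explicitly.
function envInheritedProfiles(jsonlText) {
  const hits = [];
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // tolerate a truncated or malformed line
    }
    if (record?.profile_source === "env") hits.push(record.profile);
  }
  return hits;
}

// Made-up sample lines for illustration only.
const sample = [
  '{"profile":"balanced","profile_source":"default"}',
  '{"profile":"economy","profile_source":"env"}',
  '{"profile":"premium","profile_source":"flag"}',
].join("\n");

console.log(envInheritedProfiles(sample)); // [ 'economy' ]
```

Against the real stream you would read each file first, for example with Node's `fs.readFileSync(path, "utf8")`, after expanding `${CLAUDE_PLUGIN_DATA}` and the `trek*` glob yourself.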

## Custom profiles

Drop a YAML file at `lib/profiles/<name>.yaml` to define a new tier.
The validator (`lib/validators/profile-validator.mjs`) enforces:

- Every `phase_models[].phase` must be a known phase enum:
  `brief` / `research` / `plan` / `execute` / `review` / `continue`
- Every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or
  one of the canonical short names
- All six phases must be present (no partial profiles)
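To make the three rules concrete, here is a sketch that checks an already-parsed profile object. It is not the source of `profile-validator.mjs`: YAML parsing is omitted, the error messages are invented, and since the canonical short names are not listed here, only the regex is applied (plain `opus` and `sonnet` already satisfy it).

```javascript
// Hypothetical sketch of the custom-profile rules; not profile-validator.mjs.
// Input: a profile already parsed from YAML into a plain object.
const PHASES = ["brief", "research", "plan", "execute", "review", "continue"];
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/;

function validateProfile(profile) {
  const errors = [];
  const entries = Array.isArray(profile?.phase_models) ? profile.phase_models : [];
  const seen = new Set();

  for (const entry of entries) {
    // Rule 1: phase must be one of the known phase enum values.
    if (!PHASES.includes(entry?.phase)) {
      errors.push(`unknown phase: ${entry?.phase}`);
    } else {
      seen.add(entry.phase);
    }
    // Rule 2: model must look like an opus or sonnet identifier.
    if (typeof entry?.model !== "string" || !MODEL_PATTERN.test(entry.model)) {
      errors.push(`invalid model for ${entry?.phase}: ${entry?.model}`);
    }
  }
  // Rule 3: all six phases must be present (no partial profiles).
  for (const phase of PHASES) {
    if (!seen.has(phase)) errors.push(`missing phase: ${phase}`);
  }
  return { valid: errors.length === 0, errors };
}

const partial = { phase_models: [{ phase: "plan", model: "opus" }] };
console.log(validateProfile(partial).errors.length); // 5 (five phases missing)
```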

Custom profiles override built-ins of the same name (lookup is
alphabetical, with the custom profile taking precedence), with one
exception: you may NOT redefine `balanced`. The default tier is locked
to prevent accidental override of headless CI behaviour; use a
different name and reference it via `--profile <new-name>` or
`VOYAGE_PROFILE=<new-name>`.

### Example custom profile

```yaml
# lib/profiles/critical.yaml — opus everywhere except continue
phase_models:
  - phase: brief
    model: opus
  - phase: research
    model: opus
  - phase: plan
    model: opus
  - phase: execute
    model: opus
  - phase: review
    model: opus
  - phase: continue
    model: sonnet
```

Validate with: `node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml`

## Drift detection

In `--strict` mode, `plan-validator.mjs` emits a `MANIFEST_PROFILE_DRIFT`
warning when the plan-level `profile:` differs from any step manifest's
`profile_used`. The warning is a *signal*, not a failure — the plan
remains `valid: true`. This catches:

- Manual edits where an operator changed a single step's profile
- Resume from a partial run where the previous session used a different
  tier
- Copy-paste errors when stitching plan fragments

To suppress the warning for a deliberate override (e.g. when a critical
step genuinely needs a higher tier), document the override in the step's
prose and re-run with `--soft`, which validates without strict-mode warnings.
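The drift check amounts to comparing one plan-level value against every step manifest. A minimal sketch, assuming the plan has been parsed into a `profile` string plus a hypothetical `steps` array whose items carry an `id` and a `manifest.profile_used` field; that object shape is illustrative, not `plan-validator.mjs`'s internals, and `strict: false` merely stands in for `--soft`.

```javascript
// Hypothetical sketch of MANIFEST_PROFILE_DRIFT; not plan-validator.mjs.
// Drift is a warning only: this check never sets `valid` to false.
// Assumption: steps without a profile_used field are not flagged.
function checkProfileDrift(plan, { strict = true } = {}) {
  const warnings = [];
  if (strict) {
    for (const step of plan.steps ?? []) {
      const used = step.manifest?.profile_used;
      if (used !== undefined && used !== plan.profile) {
        warnings.push({
          code: "MANIFEST_PROFILE_DRIFT",
          step: step.id,
          expected: plan.profile,
          actual: used,
        });
      }
    }
  }
  return { valid: true, warnings };
}

const plan = {
  profile: "balanced",
  steps: [
    { id: "step-1", manifest: { profile_used: "balanced" } },
    { id: "step-2", manifest: { profile_used: "premium" } }, // deliberate override
  ],
};

console.log(checkProfileDrift(plan).warnings.length);                    // 1
console.log(checkProfileDrift(plan, { strict: false }).warnings.length); // 0
```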

## Cost estimation

> **Disclaimer:** the table below is an *estimate*, not a contractual
> SLA. Real cost depends on context size, agent-swarm cardinality,
> tool-use density, and Claude Code billing schedule. Treat these
> figures as rough orders of magnitude.

| Profile | Brief | Research | Plan | Execute | Review | Total |
|---------|-------|----------|------|---------|--------|-------|
| `economy` | $0.10–0.50 | $0.50–2.00 | $0.50–2.00 | $1.00–5.00 | $0.20–1.00 | **$2–10** |
| `balanced` | $0.10–0.50 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$3–14** |
| `premium` | $0.50–2.00 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$4–15** |

Numbers are per *full pipeline run* (brief + research + plan +
execute + review) on a moderate-complexity task. They scale roughly
linearly with the size of the resulting plan: treat 10 steps as the
baseline, and expect a 30-step plan to cost about 3× the execute column.
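The table arithmetic can be reproduced with a short helper. This sketch sums the per-phase low and high bounds for one profile and, as an assumption, scales only the execute phase linearly with plan size (10 steps = 1×). The function name and data layout are hypothetical, and the output carries every caveat of the table. For `balanced` at 10 steps it gives 3.1 to 13.5, which the table rounds to $3–14.

```javascript
// Hypothetical helper; per-phase ranges (USD) copied from the table above.
const PHASE_COSTS = {
  economy:  { brief: [0.1, 0.5], research: [0.5, 2], plan: [0.5, 2], execute: [1, 5], review: [0.2, 1] },
  balanced: { brief: [0.1, 0.5], research: [0.5, 2], plan: [1, 4],   execute: [1, 5], review: [0.5, 2] },
  premium:  { brief: [0.5, 2],   research: [0.5, 2], plan: [1, 4],   execute: [1, 5], review: [0.5, 2] },
};

// Assumption: only the execute phase scales with step count (10 steps = 1x).
function estimateCost(profile, planSteps = 10) {
  const phases = PHASE_COSTS[profile];
  if (!phases) throw new Error(`unknown profile: ${profile}`);
  const scale = planSteps / 10;
  let low = 0;
  let high = 0;
  for (const [phase, [lo, hi]] of Object.entries(phases)) {
    const factor = phase === "execute" ? scale : 1;
    low += lo * factor;
    high += hi * factor;
  }
  // Round to cents to hide floating-point noise.
  return { low: Math.round(low * 100) / 100, high: Math.round(high * 100) / 100 };
}

console.log(estimateCost("balanced"));     // { low: 3.1, high: 13.5 }
console.log(estimateCost("balanced", 30)); // { low: 5.1, high: 23.5 }
```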

Per-profile actuals are emitted to JSONL stats — pipe them through the
OTel export (`docs/observability.md`) to get real cost-attribution
graphs in Grafana. Replace the table above with your own measured
numbers after ≥ 3 runs of each profile.

## Deferred to v4.2

- **`balanced.external_research_enabled` operator-override** —
  v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in
  flag to enable external research agents in the balanced tier
  without forcing premium.
- **Empirical Jaccard re-calibration** — parked-synthetic fixtures
  in v4.1 use a 0.55 conservative starting threshold. v4.2 plans an
  empirical re-run with a $60–120 LLM budget to derive a calibrated
  threshold from real economy-vs-premium plan pairs.
- **ROUGE-L + char-4gram MinHash** as primary/secondary cross-tier
  gates per research/02 Recommendation #7. Jaccard remains the gate
  in v4.1; v4.2 may layer ROUGE-L on top.

## See also

- [`README.md` § Profile system](../README.md) — top-level overview
- [`CLAUDE.md` § Profile system](../CLAUDE.md) — internal reference
- [`docs/observability.md`](observability.md) — JSONL → OTel pipeline
- [`tests/synthetic/profile-jaccard-calibration.md`](../tests/synthetic/profile-jaccard-calibration.md)
  — calibration status and threshold rationale
- [`lib/profiles/`](../lib/profiles/) — built-in profile YAMLs
- [`lib/validators/profile-validator.mjs`](../lib/validators/profile-validator.mjs)
  — schema validator with CLI shim