ktg-plugin-marketplace/plugins/voyage/docs/profiles.md
Kjell Tore Guttormsen f2f8246e01 docs(voyage): document v4.1 profiles + observability + doc-consistency-pinning
Step 22 of v4.1 — write top-level docs for the v4.1 feature surface.

Files updated:
  CLAUDE.md       — Commands tables: add --profile to all 6 modes
                    + new ## Profile system + ## Observability sections
  README.md       — per-command Modes tables: add --profile row
                    + new top-level ## Profile system + ## Observability
                    + cross-link from ## Cost profile
  CHANGELOG.md    — new "## v4.1.0 — 2026-05-09" entry per Keep-a-Changelog 1.1.0
                    (Added / Changed / Fixed / Notes)
  docs/profiles.md — NEW: 168-line decision tree, lookup precedence,
                    custom-profile authoring, drift detection,
                    cost-estimation table with disclaimer
  tests/lib/doc-consistency.test.mjs — extend with 5 new pinning tests:
                    CLAUDE.md --profile + phase_models canonical name,
                    README.md --profile coverage (≥ 6 mentions),
                    CHANGELOG.md v4.1.0 entry, docs/profiles.md substantive

ROADMAP.md is gitignored per marketplace policy (session files); a local
edit was applied for the v4.1 DONE marker but not committed.

Plan-critic Blocker 2 split is honored: Step 21 pinned commands-only;
Step 22 writes the docs and pins them. doc-consistency.test.mjs is
green AFTER Step 22 (would have failed if Step 22 ran in same wave).

Tests: 487 pass + 2 skipped (Docker not installed).
2026-05-09 10:09:44 +02:00


# Profile system — voyage v4.1
This document describes the model profile system: built-in tiers,
lookup precedence, custom-profile authoring, drift detection, and
cost estimation (with disclaimer).
## Built-in profiles
Three pre-defined tiers ship with v4.1, located at
`lib/profiles/{economy,balanced,premium}.yaml`.
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off (plan synthesis + adversarial review) |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |

`balanced` is the v4.1 default. It puts opus on the two phases where
quality matters most (Plan synthesis and Review) and sonnet everywhere
else, a cost/quality trade-off aimed at solo developers and small teams.

`economy` is *strictly experimental* in v4.1. The cross-tier Jaccard
floor (0.55) is grounded in parked-synthetic fixtures, not empirical
runs (Step 17 calibration was deferred — see
`tests/synthetic/profile-jaccard-calibration.md`). If you observe
economy-plan quality regressions, fall back to `balanced`.
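
For orientation, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below shows a floor check at 0.55 under stated assumptions: this document does not say what voyage actually compares, so the word-token sets and the names `tokenize`, `jaccard`, and `passesCrossTierFloor` are hypothetical and only illustrate the idea.

```javascript
// Hypothetical sketch: Jaccard similarity with a 0.55 floor.
// Comparing lowercase word tokens is an assumption, not voyage's method.
const JACCARD_FLOOR = 0.55;

// Split text into a set of lowercase word tokens.
function tokenize(text) {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// |A ∩ B| / |A ∪ B| for two Sets.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1; // two empty sets are identical
  let intersection = 0;
  for (const item of a) {
    if (b.has(item)) intersection += 1;
  }
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// True when the two plan texts are at least `floor` similar.
function passesCrossTierFloor(economyPlan, premiumPlan, floor = JACCARD_FLOOR) {
  return jaccard(tokenize(economyPlan), tokenize(premiumPlan)) >= floor;
}
```

A pair of plans that shares five of six distinct tokens scores about 0.83 and passes; two plans with no tokens in common score 0 and fail.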
## Decision tree
```
Are you uncertain whether the brief is correctly framed?
├── Yes → premium (opus on brief + plan + review)
└── No  → continue

Is the change small (≤ 5 steps in the plan)?
├── Yes → economy (sonnet everywhere)
└── No  → balanced (opus on plan + review)

Special cases:
- Critical-infrastructure plan → premium
- Migration with rollback risk → premium
- Research-heavy task (≥ 4 dimensions) → balanced (research-stage benefits)
- Bug fix with clear reproducer → economy
- Documentation-only PR → economy
```
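
The tree above can be written as a small function. This is a sketch only: `suggestProfile`, its option names, and the special-case keys are hypothetical and not part of voyage. It also assumes that a matching special case overrides the two general questions, which the tree does not state explicitly.

```javascript
// Hypothetical helper mirroring the decision tree; not part of voyage.
const SPECIAL_CASES = new Map([
  ["critical-infrastructure", "premium"],
  ["migration-with-rollback-risk", "premium"],
  ["research-heavy", "balanced"],
  ["bug-fix-with-reproducer", "economy"],
  ["docs-only", "economy"],
]);

function suggestProfile({ briefUncertain = false, planSteps, specialCase } = {}) {
  // Assumption: a matching special case wins over the general questions.
  if (SPECIAL_CASES.has(specialCase)) {
    return SPECIAL_CASES.get(specialCase);
  }
  // Question 1: is the brief possibly mis-framed?
  if (briefUncertain) return "premium";
  // Question 2: is the change small (at most 5 plan steps)?
  if (Number.isInteger(planSteps) && planSteps <= 5) return "economy";
  // Unknown or larger plan size falls through to the default tier.
  return "balanced";
}
```

For example, `suggestProfile({ planSteps: 3 })` returns `"economy"`, while `suggestProfile({ briefUncertain: true, planSteps: 3 })` returns `"premium"`.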
## Lookup order
Voyage resolves the profile in this priority order:
1. **Explicit `--profile <name>` flag**: passed to the command
2. **Plan-file frontmatter `profile:`**: when resuming via
   `/trekexecute --resume` or `/trekcontinue`
3. **`VOYAGE_PROFILE` environment variable**: useful for headless CI
4. **Default `balanced`**: final fallback

The resolved value is recorded in two places:

- Plan-file frontmatter: `profile: <name>` and `phase_models: [...]`
- Stats stream `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`: the
  `profile`, `profile_source`, `phase_models`, `model_used`, and
  `phase_models_resolved` fields

`profile_source` distinguishes how the profile was resolved (`flag` /
`plan_frontmatter` / `env` / `default`), so dashboards can surface
unexpected env-var inheritance in CI.
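
The precedence above amounts to a first-match chain. The following is a minimal sketch, not voyage's actual resolver: the function name `resolveProfile` and its input shape are assumptions, while the `profile_source` values are the four listed above.

```javascript
// Hypothetical resolver illustrating the documented lookup order.
// Input shape ({ flag, planFrontmatter, env }) is an assumption.
function resolveProfile({ flag, planFrontmatter, env = {} } = {}) {
  // 1. Explicit --profile flag
  if (flag) {
    return { profile: flag, profile_source: "flag" };
  }
  // 2. Plan-file frontmatter (resume paths)
  if (planFrontmatter && planFrontmatter.profile) {
    return { profile: planFrontmatter.profile, profile_source: "plan_frontmatter" };
  }
  // 3. VOYAGE_PROFILE environment variable
  if (env.VOYAGE_PROFILE) {
    return { profile: env.VOYAGE_PROFILE, profile_source: "env" };
  }
  // 4. Final fallback
  return { profile: "balanced", profile_source: "default" };
}
```

In a real process, a caller might pass `env: process.env`. Taking the environment as a parameter keeps the function easy to test and makes any env-var inheritance explicit.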
## Custom profiles
Drop a YAML file at `lib/profiles/<name>.yaml` to define a new tier.
The validator (`lib/validators/profile-validator.mjs`) enforces:
- Every `phase_models[].phase` must be a known phase enum:
  `brief` / `research` / `plan` / `execute` / `review` / `continue`
- Every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or
  one of the canonical short names
- All six phases must be present (no partial profiles)

Custom profiles override built-ins of the same name (lookup is
alphabetical with `<custom>` taking precedence). You may NOT redefine
`balanced` (the default tier is locked to prevent accidental override
of headless CI behaviour); use a different name and reference it via
`--profile <new-name>` or `VOYAGE_PROFILE=<new-name>`.
### Example custom profile
```yaml
# lib/profiles/critical.yaml: opus everywhere except continue
phase_models:
  - phase: brief
    model: opus
  - phase: research
    model: opus
  - phase: plan
    model: opus
  - phase: execute
    model: opus
  - phase: review
    model: opus
  - phase: continue
    model: sonnet
```
Validate with: `node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml`
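
To make the three validator rules concrete, here is a rough sketch of the checks applied to an already-parsed profile object. It is not the code in `profile-validator.mjs`: the function name `validateProfile` and the `{ valid, errors }` result shape are assumptions, and the list of canonical short names is not reproduced in this document, so only the regex is checked.

```javascript
// Hypothetical re-implementation of the documented rules, for illustration.
const PHASES = ["brief", "research", "plan", "execute", "review", "continue"];
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/;

function validateProfile(profile) {
  const errors = [];
  const entries = Array.isArray(profile?.phase_models) ? profile.phase_models : [];
  const seen = new Set();

  for (const entry of entries) {
    const phase = entry?.phase;
    const model = entry?.model;
    // Rule 1: phase must be one of the six known phases.
    if (PHASES.includes(phase)) {
      seen.add(phase);
    } else {
      errors.push(`unknown phase: ${phase}`);
    }
    // Rule 2: model must match the documented pattern.
    if (typeof model !== "string" || !MODEL_PATTERN.test(model)) {
      errors.push(`invalid model for ${phase}: ${model}`);
    }
  }

  // Rule 3: no partial profiles, so every phase must appear.
  for (const phase of PHASES) {
    if (!seen.has(phase)) errors.push(`missing phase: ${phase}`);
  }

  return { valid: errors.length === 0, errors };
}
```

Under these assumptions, the `critical` profile above passes, while a profile that omits `continue` or names a model such as `haiku` is rejected.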
## Drift detection
In `--strict` mode, `plan-validator.mjs` emits a `MANIFEST_PROFILE_DRIFT`
warning when the plan-level `profile:` differs from any step manifest's
`profile_used`. The warning is a *signal*, not a failure — the plan
remains `valid: true`. This catches:
- Manual edits where an operator changed a single step's profile
- Resume from a partial run where the previous session used a different
  tier
- Copy-paste errors when stitching plan fragments

To suppress the warning intentionally (e.g. when a critical step
genuinely needs a higher tier), document the override in the step's
prose and re-run with `--soft` to validate without strict-mode warnings.
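
The drift check described above can be pictured as a comparison between the plan-level profile and each step manifest. This sketch assumes a parsed plan shaped like `{ profile, steps: [{ id, manifest: { profile_used } }] }`; that shape, the function name `detectProfileDrift`, and the warning fields other than the `MANIFEST_PROFILE_DRIFT` code are hypothetical.

```javascript
// Hypothetical drift check; the plan object shape is an assumption.
function detectProfileDrift(plan) {
  const warnings = [];
  for (const step of plan.steps ?? []) {
    const used = step.manifest?.profile_used;
    // Only steps that record a profile can drift from the plan-level value.
    if (used !== undefined && used !== plan.profile) {
      warnings.push({
        code: "MANIFEST_PROFILE_DRIFT",
        step: step.id,
        expected: plan.profile,
        actual: used,
      });
    }
  }
  // Drift is a signal, not a failure: validity is unaffected.
  return { valid: true, warnings };
}
```

A plan with `profile: balanced` and one step whose manifest records `profile_used: premium` yields a single warning while remaining `valid: true`.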
## Cost estimation
> **Disclaimer:** the table below is an *estimate*, not a contractual
> SLA. Real cost depends on context size, agent-swarm cardinality,
> tool-use density, and the Claude Code billing schedule. Treat these as
> rough order-of-magnitude figures.

| Profile | Brief | Research | Plan | Execute | Review | Total |
|---------|-------|----------|------|---------|--------|-------|
| `economy` | $0.10-0.50 | $0.50-2.00 | $0.50-2.00 | $1.00-5.00 | $0.20-1.00 | **$2-10** |
| `balanced` | $0.10-0.50 | $0.50-2.00 | $1.00-4.00 | $1.00-5.00 | $0.50-2.00 | **$3-14** |
| `premium` | $0.50-2.00 | $0.50-2.00 | $1.00-4.00 | $1.00-5.00 | $0.50-2.00 | **$4-15** |

Figures are per *full pipeline run* (brief + research + plan +
execute + review) on a moderate-complexity task with a plan of about 10
steps. The Execute column scales roughly linearly with plan size, so a
30-step plan costs about 3× the Execute figures shown.

Per-profile actuals are emitted to JSONL stats — pipe them through the
OTel export (`docs/observability.md`) to get real cost-attribution
graphs in Grafana. Replace the table above with your own measured
numbers after ≥ 3 runs of each profile.
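
As a worked example of the arithmetic, the sketch below sums the per-phase ranges from the table and scales only the Execute column by plan size relative to the 10-step baseline. The function name `estimateCost` and the data layout are hypothetical; the figures are the rough estimates from the table, stored in cents to avoid floating-point drift, and carry the same disclaimer.

```javascript
// Per-phase [low, high] ranges in cents, copied from the estimate table.
const COST_CENTS = {
  economy:  { brief: [10, 50],  research: [50, 200], plan: [50, 200],  execute: [100, 500], review: [20, 100] },
  balanced: { brief: [10, 50],  research: [50, 200], plan: [100, 400], execute: [100, 500], review: [50, 200] },
  premium:  { brief: [50, 200], research: [50, 200], plan: [100, 400], execute: [100, 500], review: [50, 200] },
};
const BASELINE_STEPS = 10;

// Hypothetical estimator: only the execute phase scales with plan size.
function estimateCost(profile, planSteps = BASELINE_STEPS) {
  const phases = COST_CENTS[profile];
  if (!phases) throw new Error(`unknown profile: ${profile}`);
  const executeScale = planSteps / BASELINE_STEPS;
  let low = 0;
  let high = 0;
  for (const [phase, [lo, hi]] of Object.entries(phases)) {
    const scale = phase === "execute" ? executeScale : 1;
    low += lo * scale;
    high += hi * scale;
  }
  return { lowUsd: low / 100, highUsd: high / 100 };
}
```

At the baseline, `estimateCost("economy")` gives $2.30 to $10.50, which the table rounds to $2-10. A 30-step `balanced` run comes out at $5.10 to $23.50 because the Execute range triples.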
## Deferred to v4.2
- **`balanced.external_research_enabled` operator-override**:
  v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in
  flag to enable external research agents in the balanced tier
  without forcing premium.
- **Empirical Jaccard re-calibration**: parked-synthetic fixtures
  in v4.1 use a conservative starting threshold of 0.55. v4.2 plans an
  empirical re-run with a $60-120 LLM budget to derive a calibrated
  threshold from real economy-vs-premium plan pairs.
- **ROUGE-L + char-4gram MinHash** as primary/secondary cross-tier
  gates per research/02 Recommendation #7. Jaccard remains the gate
  in v4.1; v4.2 may layer ROUGE-L on top.
## See also
- [`README.md` § Profile system](../README.md): top-level overview
- [`CLAUDE.md` § Profile system](../CLAUDE.md): internal reference
- [`docs/observability.md`](observability.md): JSONL → OTel pipeline
- [`tests/synthetic/profile-jaccard-calibration.md`](../tests/synthetic/profile-jaccard-calibration.md):
  calibration status and threshold rationale
- [`lib/profiles/`](../lib/profiles/): built-in profile YAMLs
- [`lib/validators/profile-validator.mjs`](../lib/validators/profile-validator.mjs):
  schema validator with CLI shim