docs(voyage): document v4.1 profiles + observability + doc-consistency-pinning
Step 22 of v4.1 — write top-level docs for the v4.1 feature surface.
Files updated:
  CLAUDE.md — Commands tables: add --profile to all 6 modes
    + new ## Profile system + ## Observability sections
  README.md — per-command Modes tables: add --profile row
    + new top-level ## Profile system + ## Observability
    + cross-link from ## Cost profile
  CHANGELOG.md — new "## v4.1.0 — 2026-05-09" entry per Keep-a-Changelog 1.1.0
    (Added / Changed / Fixed / Notes)
  docs/profiles.md — NEW (169 lines): decision tree, lookup precedence,
    custom-profile authoring, drift detection,
    cost-estimation table with disclaimer
  tests/lib/doc-consistency.test.mjs — extend with 5 new pinning tests:
    CLAUDE.md --profile + phase_models canonical name,
    README.md --profile coverage (≥ 6 mentions),
    CHANGELOG.md v4.1.0 entry, docs/profiles.md substantive
ROADMAP.md is gitignored per marketplace policy (session files); the v4.1
DONE marker was applied locally and not committed.
Plan-critic Blocker 2 split is honored: Step 21 pinned the commands only;
Step 22 writes the docs and then pins them. doc-consistency.test.mjs is
green AFTER Step 22 (it would have failed if Step 22 had run in the same
wave as Step 21).
Tests: 487 pass + 2 skipped (Docker not installed).
parent e440ca858c
commit f2f8246e01
5 changed files with 394 additions and 0 deletions
plugins/voyage/docs/profiles.md (new file, 169 lines)

# Profile system — voyage v4.1

This document describes the model profile system: built-in tiers,
lookup precedence, custom-profile authoring, drift detection, and
cost estimation (with disclaimer).

## Built-in profiles

Three pre-defined tiers ship with v4.1, located at
`lib/profiles/{economy,balanced,premium}.yaml`.

| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default: opus where reasoning depth pays off (plan synthesis + adversarial review) |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |

`balanced` is the v4.1 default. It puts opus on the two phases where
quality matters most (plan synthesis and review) and sonnet everywhere
else, which is the cost/quality trade-off most solo developers and
small teams want.

`economy` is *strictly experimental* in v4.1. The cross-tier Jaccard
floor (0.55) is grounded in parked-synthetic fixtures, not empirical
runs (Step 17 calibration was deferred; see
`tests/synthetic/profile-jaccard-calibration.md`). If you observe
economy-plan quality regressions, fall back to `balanced`.
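
To make the 0.55 floor concrete, here is a minimal sketch of token-set Jaccard similarity between two plan texts. It assumes plans are compared as sets of lowercase word tokens; the real cross-tier gate may tokenize or select text differently, and `tokenSet`, `jaccard`, and `passesCrossTierFloor` are hypothetical names, not voyage exports.

```javascript
// Hypothetical sketch of a token-set Jaccard gate between two plans.
// Assumption: plans are compared as sets of lowercase word tokens.
const JACCARD_FLOOR = 0.55; // conservative v4.1 starting threshold

function tokenSet(text) {
  // Split on anything that is not a letter, digit, underscore or hyphen.
  return new Set(text.toLowerCase().split(/[^a-z0-9_-]+/).filter(Boolean));
}

function jaccard(a, b) {
  const setA = tokenSet(a);
  const setB = tokenSet(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty plans are identical
  let intersection = 0;
  for (const token of setA) {
    if (setB.has(token)) intersection += 1;
  }
  const union = setA.size + setB.size - intersection;
  return intersection / union; // |A ∩ B| / |A ∪ B|
}

function passesCrossTierFloor(economyPlan, referencePlan, floor = JACCARD_FLOOR) {
  return jaccard(economyPlan, referencePlan) >= floor;
}

// 4 shared tokens out of 6 distinct tokens: similarity ≈ 0.667, above the floor.
console.log(passesCrossTierFloor("add profile flag to execute", "add profile flag to review"));
```

A set-based metric ignores word order and repetition, which is one reason the document lists ROUGE-L as a possible later addition.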

## Decision tree

```
Are you uncertain whether the brief is correctly framed?
├── Yes → premium (opus on brief + plan + review)
└── No  → continue
          ↓
Is the change small (≤ 5 steps in the plan)?
├── Yes → economy (sonnet everywhere)
└── No  → balanced (opus on plan + review)

Special cases:
- Critical-infrastructure plan → premium
- Migration with rollback risk → premium
- Research-heavy task (≥ 4 dimensions) → balanced (research-stage benefits)
- Bug fix with clear reproducer → economy
- Documentation-only PR → economy
```
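
The same tree can be written as a small function. This is an illustrative sketch only: the input fields (`briefUncertain`, `planSteps`, `specialCase`) and the special-case keys are hypothetical, and it assumes a matching special case takes priority over the main tree, which the diagram does not state explicitly.

```javascript
// Hypothetical encoding of the profile decision tree; not a voyage API.
// Assumption: a matching special case overrides the main tree.
const SPECIAL_CASES = {
  "critical-infrastructure": "premium",
  "migration-with-rollback-risk": "premium",
  "research-heavy": "balanced", // >= 4 research dimensions
  "bug-fix-with-reproducer": "economy",
  "docs-only": "economy",
};

function chooseProfile({ briefUncertain = false, planSteps, specialCase } = {}) {
  if (specialCase !== undefined) {
    const profile = SPECIAL_CASES[specialCase];
    if (profile === undefined) throw new Error(`unknown special case: ${specialCase}`);
    return profile;
  }
  if (briefUncertain) return "premium"; // opus on brief + plan + review
  if (planSteps <= 5) return "economy"; // small change: sonnet everywhere
  return "balanced"; // opus on plan + review
}

console.log(chooseProfile({ planSteps: 12 })); // balanced
```

An unknown `planSteps` (undefined) fails the `<= 5` check and falls through to `balanced`, matching the tier voyage itself uses as its final fallback.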

## Lookup order

Voyage resolves the profile in this priority order:

1. **Explicit `--profile <name>` flag** — passed to the command
2. **Plan-file frontmatter `profile:`** — when resuming via
   `/trekexecute --resume` or `/trekcontinue`
3. **`VOYAGE_PROFILE` environment variable** — useful for headless CI
4. **Default `balanced`** — final fallback
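
A minimal sketch of this precedence chain, assuming the caller has already parsed the CLI flag and the plan frontmatter into plain values. `resolveProfile` and its argument shape are hypothetical; only the four-step order and the `profile_source` labels come from this document.

```javascript
// Hypothetical sketch of the documented profile lookup order.
// Inputs are pre-parsed values; undefined or "" means "not provided".
function resolveProfile({ flag, planFrontmatter, env = {} } = {}) {
  const candidates = [
    ["flag", flag], // 1. explicit --profile <name>
    ["plan_frontmatter", planFrontmatter?.profile], // 2. resuming from a plan file
    ["env", env.VOYAGE_PROFILE], // 3. environment variable (headless CI)
  ];
  for (const [source, value] of candidates) {
    if (typeof value === "string" && value.trim() !== "") {
      return { profile: value.trim(), profile_source: source };
    }
  }
  return { profile: "balanced", profile_source: "default" }; // 4. final fallback
}

console.log(resolveProfile({ env: { VOYAGE_PROFILE: "premium" } }));
```

In real use you would pass `process.env` as `env`; returning the source alongside the name is what lets the stats stream record `profile_source`.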

The resolved value is recorded in two places:

- Plan-file frontmatter `profile: <name>` and `phase_models: [...]`
- Stats stream `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`, in the
  `profile`, `profile_source`, `phase_models`, `model_used`, and
  `phase_models_resolved` fields

`profile_source` records how the profile was resolved (`flag` /
`plan_frontmatter` / `env` / `default`), so dashboards can surface
unexpected env-var inheritance in CI.
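
For example, a quick audit for env-var inheritance could tally stats records by `profile_source`. This sketch assumes only that each JSONL line is a JSON object carrying the documented `profile` and `profile_source` fields; `countBySource` is a hypothetical helper, and reading the actual `trek*-stats.jsonl` files is left to the caller.

```javascript
// Hypothetical sketch: tally stats records by profile_source so that runs
// resolved via the VOYAGE_PROFILE env var ("env") stand out.
function countBySource(jsonlText) {
  const counts = {};
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const record = JSON.parse(line);
    const source = record.profile_source ?? "unknown";
    counts[source] = (counts[source] ?? 0) + 1;
  }
  return counts;
}

// Illustrative input using only the documented fields.
const sample = [
  JSON.stringify({ profile: "balanced", profile_source: "default" }),
  JSON.stringify({ profile: "premium", profile_source: "env" }),
  JSON.stringify({ profile: "premium", profile_source: "env" }),
].join("\n");

console.log(countBySource(sample)); // { default: 1, env: 2 }
```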

## Custom profiles

Drop a YAML file at `lib/profiles/<name>.yaml` to define a new tier.
The validator (`lib/validators/profile-validator.mjs`) enforces:

- Every `phase_models[].phase` must be a known phase enum:
  `brief` / `research` / `plan` / `execute` / `review` / `continue`
- Every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or
  be one of the canonical short names
- All six phases must be present (no partial profiles)
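
The three rules can be approximated in a few lines. This is a hypothetical sketch, not the logic of `profile-validator.mjs`: it assumes the YAML has already been parsed into a plain object, and it assumes the canonical short names are exactly `opus` and `sonnet`.

```javascript
// Hypothetical approximation of the documented profile rules; operates on an
// already-parsed profile object (YAML parsing is out of scope here).
const PHASES = ["brief", "research", "plan", "execute", "review", "continue"];
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/; // pattern quoted in the docs
const SHORT_NAMES = new Set(["opus", "sonnet"]); // assumption: canonical short names

function validateProfile(profile) {
  const errors = [];
  const entries = Array.isArray(profile?.phase_models) ? profile.phase_models : [];
  if (entries.length === 0) errors.push("phase_models must be a non-empty list");

  const seen = new Set();
  for (const entry of entries) {
    const phase = entry?.phase;
    const model = entry?.model;
    // Rule 1: phase must be one of the known enum values.
    if (!PHASES.includes(phase)) errors.push(`unknown phase: ${phase}`);
    else seen.add(phase);
    // Rule 2: model must match the pattern or be a short name.
    if (typeof model !== "string" || !(MODEL_PATTERN.test(model) || SHORT_NAMES.has(model))) {
      errors.push(`invalid model for ${phase}: ${model}`);
    }
  }
  // Rule 3: no partial profiles; every phase must appear.
  for (const phase of PHASES) {
    if (!seen.has(phase)) errors.push(`missing phase: ${phase}`);
  }
  return { valid: errors.length === 0, errors };
}
```

Collecting every error instead of stopping at the first one mirrors how a `--json` validator report is typically consumed: the author fixes all problems in one pass.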

Custom profiles override built-ins of the same name (lookup is
alphabetical, with `<custom>` taking precedence). You may NOT redefine
`balanced`: the default tier is locked to prevent accidental override
of headless CI behaviour. Use a different name instead and reference it
via `--profile <new-name>` or `VOYAGE_PROFILE=<new-name>`.

### Example custom profile

```yaml
# lib/profiles/critical.yaml — opus everywhere except continue
phase_models:
  - phase: brief
    model: opus
  - phase: research
    model: opus
  - phase: plan
    model: opus
  - phase: execute
    model: opus
  - phase: review
    model: opus
  - phase: continue
    model: sonnet
```

Validate with: `node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml`

## Drift detection

In `--strict` mode, `plan-validator.mjs` emits a `MANIFEST_PROFILE_DRIFT`
warning when the plan-level `profile:` differs from any step manifest's
`profile_used`. The warning is a *signal*, not a failure: the plan
remains `valid: true`. This catches:

- Manual edits where an operator changed a single step's profile
- Resuming from a partial run where the previous session used a different
  tier
- Copy-paste errors when stitching plan fragments

To suppress the warning intentionally (e.g. when a critical step
genuinely needs a higher tier), document the override in the step's
prose and re-run with `--soft` to validate without strict-mode warnings.
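
The comparison behind the warning is simple. In this hypothetical sketch the plan is assumed to be parsed into `{ profile, steps: [{ manifest: { profile_used } }] }`; that object shape and `checkProfileDrift` are illustrative, while the warning code and the `valid: true` behaviour come from the text above.

```javascript
// Hypothetical sketch of the MANIFEST_PROFILE_DRIFT check.
// Drift only produces warnings; the result always stays valid: true.
// Assumption: steps without a recorded profile_used are skipped.
function checkProfileDrift(plan, { strict = true } = {}) {
  const warnings = [];
  if (strict) {
    (plan.steps ?? []).forEach((step, index) => {
      const used = step.manifest?.profile_used;
      if (used !== undefined && used !== plan.profile) {
        warnings.push({
          code: "MANIFEST_PROFILE_DRIFT",
          step: index + 1, // 1-based step number
          message: `plan profile "${plan.profile}" differs from step profile_used "${used}"`,
        });
      }
    });
  }
  return { valid: true, warnings };
}
```

Passing `{ strict: false }` plays the role of `--soft`: the same plan validates with no drift warnings.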

## Cost estimation

> **Disclaimer:** the table below is an *estimate*, not a contractual
> SLA. Real cost depends on context size, agent-swarm cardinality,
> tool-use density, and Claude Code billing schedule. Treat these as
> rough order-of-magnitude figures.

| Profile | Brief | Research | Plan | Execute | Review | Total |
|---------|-------|----------|------|---------|--------|-------|
| `economy` | $0.10–0.50 | $0.50–2.00 | $0.50–2.00 | $1.00–5.00 | $0.20–1.00 | **$2–10** |
| `balanced` | $0.10–0.50 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$3–14** |
| `premium` | $0.50–2.00 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$4–15** |

Numbers are per *full pipeline run* (brief + research + plan +
execute + review) on a moderate-complexity task. The execute column
scales roughly linearly with the number of steps in the resulting plan
(10 steps ≈ baseline; 30 steps ≈ 3× the execute column).
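
As a worked example of that rule of thumb, the sketch below encodes the table's ranges and scales only the execute column by `planSteps / 10`. The function name `estimateRunCost` and the exact scaling rule are assumptions read off the paragraph above, not part of voyage; because it sums the raw ranges, its totals differ slightly from the rounded Total column.

```javascript
// Hypothetical estimator built from the cost table above (USD, [low, high]).
// Assumption: only the execute column scales, linearly, with plan size,
// using 10 steps as the baseline.
const COST_TABLE = {
  economy:  { brief: [0.1, 0.5], research: [0.5, 2], plan: [0.5, 2], execute: [1, 5], review: [0.2, 1] },
  balanced: { brief: [0.1, 0.5], research: [0.5, 2], plan: [1, 4],   execute: [1, 5], review: [0.5, 2] },
  premium:  { brief: [0.5, 2],   research: [0.5, 2], plan: [1, 4],   execute: [1, 5], review: [0.5, 2] },
};

function estimateRunCost(profile, planSteps = 10) {
  const phases = COST_TABLE[profile];
  if (!phases) throw new Error(`no cost data for profile: ${profile}`);
  const executeScale = planSteps / 10;
  let low = 0;
  let high = 0;
  for (const [phase, [min, max]] of Object.entries(phases)) {
    const factor = phase === "execute" ? executeScale : 1;
    low += min * factor;
    high += max * factor;
  }
  // Round to cents to hide floating-point noise.
  return { low: Math.round(low * 100) / 100, high: Math.round(high * 100) / 100 };
}

console.log(estimateRunCost("balanced", 30)); // { low: 5.1, high: 23.5 }
```

For `balanced` at the 10-step baseline this gives $3.10–13.50, in line with the table's **$3–14**; at 30 steps the tripled execute column pushes the range to $5.10–23.50.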

Per-profile actuals are emitted to the JSONL stats stream; pipe them
through the OTel export (`docs/observability.md`) to get real
cost-attribution graphs in Grafana. Replace the table above with your
own measured numbers after ≥ 3 runs of each profile.

## Deferred to v4.2

- **`balanced.external_research_enabled` operator-override** —
  v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in
  flag to enable external research agents in the balanced tier
  without forcing premium.
- **Empirical Jaccard re-calibration** — parked-synthetic fixtures
  in v4.1 use a conservative 0.55 starting threshold. v4.2 plans an
  empirical re-run with a $60–120 LLM budget to derive a calibrated
  threshold from real economy-vs-premium plan pairs.
- **ROUGE-L + char-4gram MinHash** as primary/secondary cross-tier
  gates per research/02 Recommendation #7. Jaccard remains the gate
  in v4.1; v4.2 may layer ROUGE-L on top.

## See also

- [`README.md` § Profile system](../README.md) — top-level overview
- [`CLAUDE.md` § Profile system](../CLAUDE.md) — internal reference
- [`docs/observability.md`](observability.md) — JSONL → OTel pipeline
- [`tests/synthetic/profile-jaccard-calibration.md`](../tests/synthetic/profile-jaccard-calibration.md)
  — calibration status and threshold rationale
- [`lib/profiles/`](../lib/profiles/) — built-in profile YAMLs
- [`lib/validators/profile-validator.mjs`](../lib/validators/profile-validator.mjs)
  — schema validator with CLI shim