docs(voyage): document v4.1 profiles + observability + doc-consistency-pinning

Step 22 of v4.1 — write top-level docs for the v4.1 feature surface.

Files updated:
  CLAUDE.md       — Commands tables: add --profile to all 6 modes
                    + new ## Profile system + ## Observability sections
  README.md       — per-command Modes tables: add --profile row
                    + new top-level ## Profile system + ## Observability
                    + cross-link from ## Cost profile
  CHANGELOG.md    — new "## v4.1.0 — 2026-05-09" entry per Keep-a-Changelog 1.1.0
                    (Added / Changed / Fixed / Notes)
  docs/profiles.md — NEW: 168-line decision tree, lookup precedence,
                    custom-profile authoring, drift detection,
                    cost-estimation table with disclaimer
  tests/lib/doc-consistency.test.mjs — extend with 5 new pinning tests:
                    CLAUDE.md --profile + phase_models canonical name,
                    README.md --profile coverage (≥ 6 mentions),
                    CHANGELOG.md v4.1.0 entry, docs/profiles.md substantive

ROADMAP.md is gitignored per marketplace policy (sesjonsfiler) — local
edit applied for v4.1 DONE marker, not committed.

Plan-critic Blocker 2 split is honored: Step 21 pinned commands-only;
Step 22 writes the docs and pins them. doc-consistency.test.mjs is
green AFTER Step 22 (would have failed if Step 22 ran in same wave).

Tests: 487 pass + 2 skipped (Docker not installed).
This commit is contained in:
Kjell Tore Guttormsen 2026-05-09 10:09:44 +02:00
commit f2f8246e01
5 changed files with 394 additions and 0 deletions

View file

@ -4,6 +4,73 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## v4.1.0 — 2026-05-09
**Additive. No breaking changes. Forward-compat with all v4.0.0 plans.**
Two new feature surfaces: model profiles (`--profile <name>`) and
opt-in OpenTelemetry / Prometheus export at session-end.
### Added
- **Model profile system** (`lib/profiles/{economy,balanced,premium}.yaml`,
`lib/profiles/resolver.mjs`, `lib/validators/profile-validator.mjs`).
Three built-in tiers + custom-yaml support. Lookup order: explicit
`--profile` flag → plan frontmatter `profile:``VOYAGE_PROFILE`
env-var → `balanced` default. See `docs/profiles.md` for the decision
tree, custom authoring, and cost estimation disclaimer.
- **`--profile <name>` flag** documented in all 6 pipeline commands
(`commands/trek{brief,research,plan,execute,review,continue}.md`).
- **5 additive profile fields** in JSONL stats (`profile`,
`profile_source`, `phase_models`, `model_used`,
`phase_models_resolved`) for cost-attribution and drift detection.
- **OpenTelemetry / Prometheus export** at session-end via new Stop
hook (`hooks/scripts/otel-export.mjs`, wired in `hooks/hooks.json`).
Strict opt-in via `VOYAGE_EXPORT_MODE`:
- `textfile` — Prometheus exposition format → node-exporter textfile
- `otlp` — OTLP/JSON POST → otel-collector
- `off` (default) — no work done
- **Local Docker Compose stack** at `examples/observability/`
(Prometheus 3.0.1 + node-exporter 1.10.2 + Grafana 11.4.0 + OTel
Collector 0.115.0, all version-pinned per research/01).
- **Operator documentation** at `docs/observability.md` (151 lines:
env-var matrix, security mitigations, limitations, CVE-pinned
minimum versions).
- **Cross-tier Jaccard smoke test**
(`tests/integration/profile-jaccard-smoke.test.mjs`) with parked-
synthetic fixtures and 0.55 conservative starting threshold per
research/02. Empirical recalibration deferred to v4.2.
- **MANIFEST_PROFILE_DRIFT warning** in `plan-validator --strict`
when plan frontmatter `profile:` differs from any step manifest's
`profile_used`. Plan remains valid (warning, not error).
### Changed
- **`lib/parsers/arg-parser.mjs` FLAG_SCHEMA** — additively adds
`--profile <name>` to all 6 commands. Existing flags unchanged.
- **`lib/parsers/manifest-yaml.mjs` schema** — additively adds new
`OPTIONAL_STRING_KEYS` collection with `profile_used` as the first
member. Forward-compat: legacy plans without `profile_used` parse
unchanged; new plans with it round-trip cleanly.
### Fixed
- **Doc-consistency coverage** now spans all 6 pipeline commands
(was 5 — `/trekcontinue` was missing per HIGH risk-assessor).
- **Plan-validator strict mode** detects profile-drift between plan-
level frontmatter and step-level manifests (brief Assumptions
block 7).
### Notes
- Forward-compat: every v4.0.0 plan validates `valid: true` under
v4.1.0's `plan-validator --strict`. No migration needed.
- Tests: 361 baseline → 484 pass + 2 skipped (Docker-dependent).
- Path A/B/C decision (v3.4.0) is unchanged. Path C remains closed.
- v4.2 deferred items: ROUGE-L scoring, char-4gram MinHash, empirical
Jaccard re-calibration, `balanced.external_research_enabled`
operator-override.
## v4.0.0 — 2026-05-05 ## v4.0.0 — 2026-05-05
**Breaking. Rebrand. No migration path.** **Breaking. Rebrand. No migration path.**

View file

@ -25,6 +25,7 @@ Voyage — a contract-driven Claude Code pipeline: brief, research, plan, execut
| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan | | _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan | | `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` | | `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile: `economy` / `balanced` / `premium` / `<custom>`. Sets `phase_models` for the brief phase. See `## Profile system` below. |
Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground). Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
@ -39,6 +40,7 @@ Always interactive. Phase 3 is a section-driven completeness loop (no hard cap o
| `--external` | Only external research agents (skip codebase analysis) | | `--external` | Only external research agents (skip codebase analysis) |
| `--fg` | No-op alias (foreground is default since v2.4.0) | | `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` | | `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the research phase. See `## Profile system` below. |
Flags combine: `--project <dir> --local`, `--external --quick`. Flags combine: `--project <dir> --local`, `--external --quick`.
@ -54,6 +56,7 @@ Flags combine: `--project <dir> --local`, `--external --quick`.
| `--export <pr\|issue\|markdown\|headless> <plan>` | Generate shareable output from existing plan | | `--export <pr\|issue\|markdown\|headless> <plan>` | Generate shareable output from existing plan |
| `--decompose <plan>` | Split plan into self-contained headless sessions | | `--decompose <plan>` | Split plan into self-contained headless sessions |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` | | `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the plan phase (and others, since plan emits `profile:` to plan.md frontmatter). See `## Profile system` below. |
**Breaking change (v2.0):** one of `--brief` or `--project` is required. There is no interview inside `/trekplan`. The `--spec` flag has been removed — use `/trekbrief` to produce a brief instead. **Breaking change (v2.0):** one of `--brief` or `--project` is required. There is no interview inside `/trekplan`. The `--spec` flag has been removed — use `/trekbrief` to produce a brief instead.
@ -72,6 +75,7 @@ If `{project_dir}/architecture/overview.md` exists (typically produced by an opt
| `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy | | `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy |
| `--session N` | Execute only session N from plan's Execution Strategy | | `--session N` | Execute only session N from plan's Execution Strategy |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` | | `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the execute phase. Inherited from plan.md frontmatter `profile:` if present. See `## Profile system` below. |
### /trekreview modes ### /trekreview modes
@ -84,6 +88,15 @@ If `{project_dir}/architecture/overview.md` exists (typically produced by an opt
| `--validate` | Schema-only check on existing `{dir}/review.md`. No LLM calls | | `--validate` | Schema-only check on existing `{dir}/review.md`. No LLM calls |
| `--dry-run` | Print discovered scope + triage map; skip writes | | `--dry-run` | Print discovered scope + triage map; skip writes |
| `--fg` | No-op alias (foreground is default) | | `--fg` | No-op alias (foreground is default) |
| `--profile <name>` | (v4.1.0) Model profile for the review phase. See `## Profile system` below. |
### /trekcontinue modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Auto-discover active project's `.session-state.local.json` and resume |
| `<project-dir>` | Resume the next session of an explicit project directory |
| `--profile <name>` | (v4.1.0) Model profile for the resumed session. Inherited from the previous session's plan.md frontmatter when absent. See `## Profile system` below. |
The triage gate is deterministic — path-pattern classifier produces `{file → deep-review|summary-only|skip}`. Hard refuse-with-suggestion above 100 files / 100K diff tokens. The triage gate is deterministic — path-pattern classifier produces `{file → deep-review|summary-only|skip}`. Hard refuse-with-suggestion above 100 files / 100K diff tokens.
@ -168,6 +181,43 @@ Three architectural options were considered for the speedup work:
A revived Path C (post-v2.2.xxx) would require: (1) re-architecting tool-list to be identical across all wave children, (2) cache-telemetry analysis confirming the new fork-cache behaviour holds, (3) prompt-level deny re-enablement to compensate for tool scoping rollback. A revived Path C (post-v2.2.xxx) would require: (1) re-architecting tool-list to be identical across all wave children, (2) cache-telemetry analysis confirming the new fork-cache behaviour holds, (3) prompt-level deny re-enablement to compensate for tool scoping rollback.
## Profile system (`--profile`, v4.1.0)
Three built-in model profiles plus operator-defined `<custom>.yaml`. Each profile pins `phase_models` for the six pipeline phases (`brief`, `research`, `plan`, `execute`, `review`, `continue`). Profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` for cost-attribution.
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |
### Lookup order
1. Explicit `--profile <name>` flag passed to the command
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
3. `VOYAGE_PROFILE` environment variable
4. Default `balanced`
### Custom profiles
Create `lib/profiles/<custom>.yaml` to define a new tier. The validator (`lib/validators/profile-validator.mjs`) enforces: every `phase_models[].phase` must be a known phase enum; every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or one of the canonical short names. Custom profiles override built-ins of the same name (lookup is alphabetical with `<custom>` taking precedence).
Drift between plan-frontmatter `profile:` and step-manifest `profile_used:` emits a `MANIFEST_PROFILE_DRIFT` warning from `plan-validator --strict` (Step 20). Plan remains valid; the warning surfaces accidental tier-mismatch.
## Observability (Stop hook, v4.1.0)
The `Stop` hook in `hooks/hooks.json` runs `hooks/scripts/otel-export.mjs` at session-end. The hook is **opt-in** — when `VOYAGE_EXPORT_MODE` is unset or `off`, no work is done.
| Mode | Output | Endpoint env-var |
|------|--------|------------------|
| `off` (default) | _(no export)_ | — |
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
Endpoint validation: `VOYAGE_OTEL_ALLOW_PRIVATE=1` is required to send to loopback or RFC1918 destinations (CWE-918 SSRF mitigation). Allowlist `lib/exporters/field-allowlist.mjs` redacts records before export (CWE-212). Path validation (`lib/exporters/path-validator.mjs`) rejects symlink + traversal (CWE-22).
Local Docker Compose stack: `examples/observability/`. Operator docs: `docs/observability.md`. Both pin minimum versions per CVE history (`prom/prometheus:v3.0.1`, `grafana/grafana:11.4.0`, `otel/opentelemetry-collector-contrib:0.115.0`).
## Architecture ## Architecture
**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode runs research + plan inline in the main context (v2.4.0). **Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode runs research + plan inline in the main context (v2.4.0).

View file

@ -147,6 +147,7 @@ Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`
|------|-------|----------| |------|-------|----------|
| **Default** | `/trekbrief <task>` | Dynamic interview until quality gates pass. No question cap. | | **Default** | `/trekbrief <task>` | Dynamic interview until quality gates pass. No question cap. |
| **Quick** | `/trekbrief --quick <task>` | Starts compact (optional sections get at most one probe), still escalates on weak required sections or failed review gate. | | **Quick** | `/trekbrief --quick <task>` | Starts compact (optional sections get at most one probe), still escalates on weak required sections or failed review gate. |
| **Profile** | `/trekbrief --profile <name> <task>` | (v4.1.0) Pin model profile for the brief phase: `economy` / `balanced` / `premium` / `<custom>`. See [Profile system](#profile-system-v410) below. |
`/trekbrief` is **always interactive**. There is no foreground/background mode — the interview requires user input. `/trekbrief` is **always interactive**. There is no foreground/background mode — the interview requires user input.
@ -189,6 +190,7 @@ Output:
| **Local** | `/trekresearch --local <question>` | Only codebase analysis agents (skip external + Gemini) | | **Local** | `/trekresearch --local <question>` | Only codebase analysis agents (skip external + Gemini) |
| **External** | `/trekresearch --external <question>` | Only external research agents (skip codebase analysis) | | **External** | `/trekresearch --external <question>` | Only external research agents (skip codebase analysis) |
| **Foreground** | `/trekresearch --fg <question>` | No-op alias (foreground is default since v2.4.0) | | **Foreground** | `/trekresearch --fg <question>` | No-op alias (foreground is default since v2.4.0) |
| **Profile** | `/trekresearch --profile <name> <question>` | (v4.1.0) Pin model profile for the research phase. See [Profile system](#profile-system-v410). |
Flags combine: `--project <dir> --external`. Flags combine: `--project <dir> --external`.
@ -217,6 +219,7 @@ Output:
| **Quick** | `/trekplan --project <dir> --quick` | No agent swarm, lightweight scan only | | **Quick** | `/trekplan --project <dir> --quick` | No agent swarm, lightweight scan only |
| **Decompose** | `/trekplan --decompose plan.md` | Split plan into headless session specs | | **Decompose** | `/trekplan --decompose plan.md` | Split plan into headless session specs |
| **Export** | `/trekplan --export pr plan.md` | PR description, issue comment, or clean markdown | | **Export** | `/trekplan --export pr plan.md` | PR description, issue comment, or clean markdown |
| **Profile** | `/trekplan --profile <name> --project <dir>` | (v4.1.0) Pin model profile; emitted as `profile:` in plan.md frontmatter. See [Profile system](#profile-system-v410). |
`--brief` or `--project` is **required**. `/trekplan` with no brief exits with an error and a pointer to `/trekbrief`. `--brief` or `--project` is **required**. `/trekplan` with no brief exits with an error and a pointer to `/trekbrief`.
@ -264,6 +267,7 @@ Per step: apply Changes exactly as written → run Verify (exit code is truth)
| **Single step** | `/trekexecute --project <dir> --step 3` | Execute only step 3 | | **Single step** | `/trekexecute --project <dir> --step 3` | Execute only step 3 |
| **Foreground** | `/trekexecute --project <dir> --fg` | Force sequential, ignore Execution Strategy | | **Foreground** | `/trekexecute --project <dir> --fg` | Force sequential, ignore Execution Strategy |
| **Single session** | `/trekexecute --project <dir> --session 2` | Execute only session 2 from Execution Strategy | | **Single session** | `/trekexecute --project <dir> --session 2` | Execute only session 2 from Execution Strategy |
| **Profile** | `/trekexecute --profile <name> --project <dir>` | (v4.1.0) Pin model profile for execution. Plan-frontmatter `profile:` is honored when this flag is omitted. See [Profile system](#profile-system-v410). |
### Session-aware parallel execution (worktree-isolated) ### Session-aware parallel execution (worktree-isolated)
@ -400,6 +404,7 @@ the iteration loop without ad-hoc conventions.
| **Quick** | `/trekreview --project <dir> --quick` | Skip brief-conformance reviewer; skip coordinator's reasonableness filter — fast correctness-only pass | | **Quick** | `/trekreview --project <dir> --quick` | Skip brief-conformance reviewer; skip coordinator's reasonableness filter — fast correctness-only pass |
| **Validate** | `/trekreview --project <dir> --validate` | Schema-only check on existing `review.md`. No LLM calls | | **Validate** | `/trekreview --project <dir> --validate` | Schema-only check on existing `review.md`. No LLM calls |
| **Dry run** | `/trekreview --project <dir> --dry-run` | Print discovered scope + triage map; skip writes | | **Dry run** | `/trekreview --project <dir> --dry-run` | Print discovered scope + triage map; skip writes |
| **Profile** | `/trekreview --profile <name> --project <dir>` | (v4.1.0) Pin model profile for the review phase. See [Profile system](#profile-system-v410). |
### What review produces ### What review produces
@ -452,6 +457,7 @@ schema and producer/consumer contract.
| **Default** | `/trekcontinue` | Auto-discover `.session-state.local.json` under cwd, validate, narrate, and begin executing the next session | | **Default** | `/trekcontinue` | Auto-discover `.session-state.local.json` under cwd, validate, narrate, and begin executing the next session |
| **Explicit** | `/trekcontinue <project-dir>` | Use the named project directory; helpful when several active projects coexist under cwd | | **Explicit** | `/trekcontinue <project-dir>` | Use the named project directory; helpful when several active projects coexist under cwd |
| **Help** | `/trekcontinue --help` | Print usage block and the schema-v1 reference | | **Help** | `/trekcontinue --help` | Print usage block and the schema-v1 reference |
| **Profile** | `/trekcontinue --profile <name> [<project-dir>]` | (v4.1.0) Pin model profile for the resumed session. Plan-frontmatter `profile:` from the previous session is honored when this flag is omitted. See [Profile system](#profile-system-v410). |
### Schema v1 — `.session-state.local.json` ### Schema v1 — `.session-state.local.json`
@ -645,10 +651,43 @@ Or enable directly in `~/.claude/settings.json`:
An optional architect step between research and plan was previously available via a separate plugin; that architect plugin is no longer publicly distributed. The `architecture/overview.md` filesystem slot remains supported by `/trekplan` for any compatible producer. An optional architect step between research and plan was previously available via a separate plugin; that architect plugin is no longer publicly distributed. The `architecture/overview.md` filesystem slot remains supported by `/trekplan` for any compatible producer.
## Profile system (v4.1.0)
Three built-in model profiles plus operator-defined `<custom>.yaml` (drop in `lib/profiles/`). Each profile pins `phase_models` for the six pipeline phases. The active profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to JSONL stats for cost-attribution.
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |
Lookup order:
1. Explicit `--profile <name>` flag passed to the command
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
3. `VOYAGE_PROFILE` environment variable
4. Default `balanced`
See [`docs/profiles.md`](docs/profiles.md) for the decision tree, custom-profile authoring, and cost estimation disclaimer (the per-profile cost numbers are *anslag*, not contractual SLAs).
## Observability (v4.1.0)
Stop-hook OTel/Prometheus export — opt-in via `VOYAGE_EXPORT_MODE`:
| Mode | Output | Endpoint env-var |
|------|--------|------------------|
| `off` (default) | _(no export)_ | — |
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
Default JSONL stats stream (`${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`) is unchanged — OTel is additive. Local Docker Compose stack: [`examples/observability/`](examples/observability/). Operator docs: [`docs/observability.md`](docs/observability.md). Both pin minimum versions per CVE history.
## Cost profile ## Cost profile
Opus runs the orchestrators (one per command) and the executor (one per plan session). Sonnet runs the exploration and review swarms (510 agents per command, with effort/turn limits). The pipeline front-loads cheap Sonnet work so Opus only does synthesis and execution. Typical total: comparable to a long single Claude Code session — the per-command cost is published in `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` if you want exact numbers. Opus runs the orchestrators (one per command) and the executor (one per plan session). Sonnet runs the exploration and review swarms (510 agents per command, with effort/turn limits). The pipeline front-loads cheap Sonnet work so Opus only does synthesis and execution. Typical total: comparable to a long single Claude Code session — the per-command cost is published in `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` if you want exact numbers.
For per-profile cost estimates, see [`docs/profiles.md`](docs/profiles.md).
## Requirements ## Requirements
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (CLI, desktop app, or web app) - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (CLI, desktop app, or web app)

View file

@ -0,0 +1,169 @@
# Profile system — voyage v4.1
This document describes the model profile system: built-in tiers,
lookup precedence, custom-profile authoring, drift detection, and
cost estimation (with disclaimer).
## Built-in profiles
Three pre-defined tiers ship with v4.1, located at
`lib/profiles/{economy,balanced,premium}.yaml`.
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off (plan synthesis + adversarial review) |
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |
`balanced` is the v4.1 default. It puts opus on the two phases where
quality matters most (Plan synthesis + Review) and sonnet everywhere
else. This lands the cost/quality trade-off that solo-developers and
small teams actually want.
`economy` is *strictly experimental* in v4.1. The cross-tier Jaccard
floor (0.55) is grounded in parked-synthetic fixtures, not empirical
runs (Step 17 calibration was deferred — see
`tests/synthetic/profile-jaccard-calibration.md`). If you observe
economy-plan quality regressions, fall back to `balanced`.
## Decision tree
```
Are you uncertain whether the brief is correctly framed?
└── Yes → premium (opus on brief + plan + review)
└── No → continue
Is the change small (≤ 5 steps in the plan)?
└── Yes → economy (sonnet everywhere)
└── No → balanced (opus on plan + review)
Special cases:
- Critical-infrastructure plan → premium
- Migration with rollback risk → premium
- Research-heavy task (≥ 4 dimensions) → balanced (research-stage benefits)
- Bug fix with clear reproducer → economy
- Documentation-only PR → economy
```
## Lookup order
Voyage resolves the profile in this priority order:
1. **Explicit `--profile <name>` flag** — passed to the command
2. **Plan-file frontmatter `profile:`** — when resuming via
`/trekexecute --resume` or `/trekcontinue`
3. **`VOYAGE_PROFILE` environment variable** — useful for headless CI
4. **Default `balanced`** — final fallback
The resolved value is recorded in two places:
- Plan-file frontmatter `profile: <name>` and `phase_models: [...]`
- Stats stream `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`
`profile`, `profile_source`, `phase_models`, `model_used`,
`phase_models_resolved` fields
`profile_source` distinguishes how the profile was resolved (`flag` /
`plan_frontmatter` / `env` / `default`), so dashboards can surface
unexpected env-var inheritance in CI.
## Custom profiles
Drop a YAML file at `lib/profiles/<name>.yaml` to define a new tier.
The validator (`lib/validators/profile-validator.mjs`) enforces:
- Every `phase_models[].phase` must be a known phase enum:
`brief` / `research` / `plan` / `execute` / `review` / `continue`
- Every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or
one of the canonical short names
- All six phases must be present (no partial profiles)
Custom profiles override built-ins of the same name (lookup is
alphabetical with `<custom>` taking precedence). You may NOT redefine
`balanced` (the default tier is locked to prevent accidental override
of headless CI behaviour); use a different name and reference it via
`--profile <new-name>` or `VOYAGE_PROFILE=<new-name>`.
### Example custom profile
```yaml
# lib/profiles/critical.yaml — opus everywhere except continue
phase_models:
- phase: brief
model: opus
- phase: research
model: opus
- phase: plan
model: opus
- phase: execute
model: opus
- phase: review
model: opus
- phase: continue
model: sonnet
```
Validate with: `node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml`
## Drift detection
In `--strict` mode, `plan-validator.mjs` emits a `MANIFEST_PROFILE_DRIFT`
warning when the plan-level `profile:` differs from any step manifest's
`profile_used`. The warning is a *signal*, not a failure — the plan
remains `valid: true`. This catches:
- Manual edits where an operator changed a single step's profile
- Resume from a partial run where the previous session used a different
tier
- Copy-paste errors when stitching plan fragments
To suppress the warning intentionally (e.g. when a critical step
genuinely needs a higher tier), document the override in the step's
prose and re-run with `--soft` to validate without strict-mode warnings.
## Cost estimation
> **Disclaimer:** the table below is an *anslag*, not a contractual
> SLA. Real cost depends on context size, agent-swarm cardinality,
> tool-use density, and Claude Code billing schedule. Treat these as
> rough order-of-magnitude.
| Profile | Brief | Research | Plan | Execute | Review | Total |
|---------|-------|----------|------|---------|--------|-------|
| `economy` | $0.100.50 | $0.502.00 | $0.502.00 | $1.005.00 | $0.201.00 | **$210** |
| `balanced` | $0.100.50 | $0.502.00 | $1.004.00 | $1.005.00 | $0.502.00 | **$314** |
| `premium` | $0.502.00 | $0.502.00 | $1.004.00 | $1.005.00 | $0.502.00 | **$415** |
Numbers are per *full pipeline run* (brief + research + plan +
execute + review) on a moderate-complexity task. Numbers scale roughly
linearly with the size of the resulting plan (10 steps ≈ baseline; 30
steps ≈ 3× the execute column).
Per-profile actuals are emitted to JSONL stats — pipe them through the
OTel export (`docs/observability.md`) to get real cost-attribution
graphs in Grafana. Replace the table above with your own measured
numbers after ≥ 3 runs of each profile.
## Deferred to v4.2
- **`balanced.external_research_enabled` operator-override** —
v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in
flag to enable external research agents in the balanced tier
without forcing premium.
- **Empirical Jaccard re-calibration** — parked-synthetic fixtures
in v4.1 use a 0.55 conservative starting threshold. v4.2 plans an
empirical re-run with $60-120 LLM budget to derive a calibrated
threshold from real economy-vs-premium plan pairs.
- **ROUGE-L + char-4gram MinHash** as primary/secondary cross-tier
gates per research/02 Recommendation #7. Jaccard remains the gate
in v4.1; v4.2 may layer ROUGE-L on top.
## See also
- [`README.md` § Profile system](../README.md) — top-level overview
- [`CLAUDE.md` § Profile system](../CLAUDE.md) — internal reference
- [`docs/observability.md`](observability.md) — JSONL → OTel pipeline
- [`tests/synthetic/profile-jaccard-calibration.md`](../tests/synthetic/profile-jaccard-calibration.md)
— calibration status and threshold rationale
- [`lib/profiles/`](../lib/profiles/) — built-in profile YAMLs
- [`lib/validators/profile-validator.mjs`](../lib/validators/profile-validator.mjs)
— schema validator with CLI shim

View file

@ -301,6 +301,75 @@ test('at least one pipeline command-file references `phase_models` canonical nam
); );
}); });
// --- v4.1 Step 22 — post-write CLAUDE.md / README.md pinning ---
//
// Plan-critic Blocker 2 fix: Step 21 only pinned commands/*.md (which
// are written in Step 7 / Wave 3). Step 22 writes the top-level docs
// and extends pinning here so doc-consistency stays green AFTER Step 22.
test('CLAUDE.md documents --profile flag', () => {
const md = read('CLAUDE.md');
assert.match(
md,
/--profile\b/,
'CLAUDE.md must document the --profile flag (v4.1 SC #20)',
);
});
test('CLAUDE.md uses canonical name `phase_models`', () => {
const md = read('CLAUDE.md');
assert.match(
md,
/phase_models/,
'CLAUDE.md must use canonical name "phase_models" (v4.1 SC #20)',
);
for (const bad of ['model_per_phase', 'phase_to_model', 'profile_phase_models']) {
assert.ok(
!md.includes(bad),
`CLAUDE.md must NOT use legacy alias "${bad}"`,
);
}
});
test('README.md documents --profile flag for all 6 commands', () => {
// SG1: README flag-table coverage is gating for SC #20. README is the
// primary discovery surface for new users.
const md = read('README.md');
// Top-level Profile system section is required so the flag is
// discoverable independent of per-command tables.
assert.match(md, /## Profile system/, 'README.md missing top-level "## Profile system" section');
// Every per-command Modes table must include --profile (count of
// --profile occurrences should be ≥ 6 — one per command + Profile
// system section).
const profileMentions = (md.match(/--profile\b/g) || []).length;
assert.ok(
profileMentions >= 6,
`README.md must mention --profile ≥ 6 times (one per command + section), got ${profileMentions}`,
);
});
test('CHANGELOG.md has v4.1.0 entry', () => {
const cl = read('CHANGELOG.md');
assert.match(
cl,
/## v4\.1\.0\b/,
'CHANGELOG.md must include "## v4.1.0" entry per Keep-a-Changelog 1.1.0',
);
});
test('docs/profiles.md exists and documents Custom.yaml authoring', () => {
const dp = read('docs/profiles.md');
assert.ok(dp.length > 1000, 'docs/profiles.md must be substantive (> 1000 chars)');
// Must document custom-profile authoring (Step 22 manifest must_contain
// pattern: "Custom.yaml" — case-insensitive match handled here as
// /[Cc]ustom[. ]/ to allow either "custom.yaml" or "Custom profile" prose).
assert.match(
dp,
/[Cc]ustom\.yaml|[Cc]ustom profile|<custom>\.yaml/,
'docs/profiles.md must document custom profile authoring',
);
});
test('commands/trekplan.md Phase 8 seals Opus-4.7 schema-drift defense', () => { test('commands/trekplan.md Phase 8 seals Opus-4.7 schema-drift defense', () => {
const cmd = read('commands/trekplan.md'); const cmd = read('commands/trekplan.md');
// Locate Phase 8 section // Locate Phase 8 section