docs(voyage): document v4.1 profiles + observability + doc-consistency-pinning
Step 22 of v4.1 — write top-level docs for the v4.1 feature surface.
Files updated:
CLAUDE.md — Commands tables: add --profile to all 6 modes
+ new ## Profile system + ## Observability sections
README.md — per-command Modes tables: add --profile row
+ new top-level ## Profile system + ## Observability
+ cross-link from ## Cost profile
CHANGELOG.md — new "## v4.1.0 — 2026-05-09" entry per Keep-a-Changelog 1.1.0
(Added / Changed / Fixed / Notes)
docs/profiles.md — NEW: 168-line decision tree, lookup precedence,
custom-profile authoring, drift detection,
cost-estimation table with disclaimer
tests/lib/doc-consistency.test.mjs — extend with 5 new pinning tests:
CLAUDE.md --profile + phase_models canonical name,
README.md --profile coverage (≥ 6 mentions),
CHANGELOG.md v4.1.0 entry, docs/profiles.md substantive
ROADMAP.md is gitignored per marketplace policy (sesjonsfiler) — local
edit applied for v4.1 DONE marker, not committed.
Plan-critic Blocker 2 split is honored: Step 21 pinned commands-only;
Step 22 writes the docs and pins them. doc-consistency.test.mjs is
green AFTER Step 22 (would have failed if Step 22 ran in same wave).
Tests: 487 pass + 2 skipped (Docker not installed).
This commit is contained in:
parent
e440ca858c
commit
f2f8246e01
5 changed files with 394 additions and 0 deletions
|
|
@ -4,6 +4,73 @@ All notable changes to this project will be documented in this file.
|
|||
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
|
||||
|
||||
## v4.1.0 — 2026-05-09
|
||||
|
||||
**Additive. No breaking changes. Forward-compat with all v4.0.0 plans.**
|
||||
|
||||
Two new feature surfaces: model profiles (`--profile <name>`) and
|
||||
opt-in OpenTelemetry / Prometheus export at session-end.
|
||||
|
||||
### Added
|
||||
|
||||
- **Model profile system** (`lib/profiles/{economy,balanced,premium}.yaml`,
|
||||
`lib/profiles/resolver.mjs`, `lib/validators/profile-validator.mjs`).
|
||||
Three built-in tiers + custom-yaml support. Lookup order: explicit
|
||||
`--profile` flag → plan frontmatter `profile:` → `VOYAGE_PROFILE`
|
||||
env-var → `balanced` default. See `docs/profiles.md` for the decision
|
||||
tree, custom authoring, and cost estimation disclaimer.
|
||||
- **`--profile <name>` flag** documented in all 6 pipeline commands
|
||||
(`commands/trek{brief,research,plan,execute,review,continue}.md`).
|
||||
- **5 additive profile fields** in JSONL stats (`profile`,
|
||||
`profile_source`, `phase_models`, `model_used`,
|
||||
`phase_models_resolved`) for cost-attribution and drift detection.
|
||||
- **OpenTelemetry / Prometheus export** at session-end via new Stop
|
||||
hook (`hooks/scripts/otel-export.mjs`, wired in `hooks/hooks.json`).
|
||||
Strict opt-in via `VOYAGE_EXPORT_MODE`:
|
||||
- `textfile` — Prometheus exposition format → node-exporter textfile
|
||||
- `otlp` — OTLP/JSON POST → otel-collector
|
||||
- `off` (default) — no work done
|
||||
- **Local Docker Compose stack** at `examples/observability/`
|
||||
(Prometheus 3.0.1 + node-exporter 1.10.2 + Grafana 11.4.0 + OTel
|
||||
Collector 0.115.0, all version-pinned per research/01).
|
||||
- **Operator documentation** at `docs/observability.md` (151 lines:
|
||||
env-var matrix, security mitigations, limitations, CVE-pinned
|
||||
minimum versions).
|
||||
- **Cross-tier Jaccard smoke test**
|
||||
(`tests/integration/profile-jaccard-smoke.test.mjs`) with parked-
|
||||
synthetic fixtures and 0.55 conservative starting threshold per
|
||||
research/02. Empirical recalibration deferred to v4.2.
|
||||
- **MANIFEST_PROFILE_DRIFT warning** in `plan-validator --strict`
|
||||
when plan frontmatter `profile:` differs from any step manifest's
|
||||
`profile_used`. Plan remains valid (warning, not error).
|
||||
|
||||
### Changed
|
||||
|
||||
- **`lib/parsers/arg-parser.mjs` FLAG_SCHEMA** — additively adds
|
||||
`--profile <name>` to all 6 commands. Existing flags unchanged.
|
||||
- **`lib/parsers/manifest-yaml.mjs` schema** — additively adds new
|
||||
`OPTIONAL_STRING_KEYS` collection with `profile_used` as the first
|
||||
member. Forward-compat: legacy plans without `profile_used` parse
|
||||
unchanged; new plans with it round-trip cleanly.
|
||||
|
||||
### Fixed
|
||||
|
||||
- **Doc-consistency coverage** now spans all 6 pipeline commands
|
||||
(was 5 — `/trekcontinue` was missing per HIGH risk-assessor).
|
||||
- **Plan-validator strict mode** detects profile-drift between plan-
|
||||
level frontmatter and step-level manifests (brief Assumptions
|
||||
block 7).
|
||||
|
||||
### Notes
|
||||
|
||||
- Forward-compat: every v4.0.0 plan validates `valid: true` under
|
||||
v4.1.0's `plan-validator --strict`. No migration needed.
|
||||
- Tests: 361 baseline → 484 pass + 2 skipped (Docker-dependent).
|
||||
- Path A/B/C decision (v3.4.0) is unchanged. Path C remains closed.
|
||||
- v4.2 deferred items: ROUGE-L scoring, char-4gram MinHash, empirical
|
||||
Jaccard re-calibration, `balanced.external_research_enabled`
|
||||
operator-override.
|
||||
|
||||
## v4.0.0 — 2026-05-05
|
||||
|
||||
**Breaking. Rebrand. No migration path.**
|
||||
|
|
|
|||
|
|
@ -25,6 +25,7 @@ Voyage — a contract-driven Claude Code pipeline: brief, research, plan, execut
|
|||
| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
|
||||
| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
|
||||
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
|
||||
| `--profile <name>` | (v4.1.0) Model profile: `economy` / `balanced` / `premium` / `<custom>`. Sets `phase_models` for the brief phase. See `## Profile system` below. |
|
||||
|
||||
Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
|
||||
|
||||
|
|
@ -39,6 +40,7 @@ Always interactive. Phase 3 is a section-driven completeness loop (no hard cap o
|
|||
| `--external` | Only external research agents (skip codebase analysis) |
|
||||
| `--fg` | No-op alias (foreground is default since v2.4.0) |
|
||||
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
|
||||
| `--profile <name>` | (v4.1.0) Model profile for the research phase. See `## Profile system` below. |
|
||||
|
||||
Flags combine: `--project <dir> --local`, `--external --quick`.
|
||||
|
||||
|
|
@ -54,6 +56,7 @@ Flags combine: `--project <dir> --local`, `--external --quick`.
|
|||
| `--export <pr\|issue\|markdown\|headless> <plan>` | Generate shareable output from existing plan |
|
||||
| `--decompose <plan>` | Split plan into self-contained headless sessions |
|
||||
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
|
||||
| `--profile <name>` | (v4.1.0) Model profile for the plan phase (and others, since plan emits `profile:` to plan.md frontmatter). See `## Profile system` below. |
|
||||
|
||||
**Breaking change (v2.0):** one of `--brief` or `--project` is required. There is no interview inside `/trekplan`. The `--spec` flag has been removed — use `/trekbrief` to produce a brief instead.
|
||||
|
||||
|
|
@ -72,6 +75,7 @@ If `{project_dir}/architecture/overview.md` exists (typically produced by an opt
|
|||
| `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy |
|
||||
| `--session N` | Execute only session N from plan's Execution Strategy |
|
||||
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
|
||||
| `--profile <name>` | (v4.1.0) Model profile for the execute phase. Inherited from plan.md frontmatter `profile:` if present. See `## Profile system` below. |
|
||||
|
||||
### /trekreview modes
|
||||
|
||||
|
|
@ -84,6 +88,15 @@ If `{project_dir}/architecture/overview.md` exists (typically produced by an opt
|
|||
| `--validate` | Schema-only check on existing `{dir}/review.md`. No LLM calls |
|
||||
| `--dry-run` | Print discovered scope + triage map; skip writes |
|
||||
| `--fg` | No-op alias (foreground is default) |
|
||||
| `--profile <name>` | (v4.1.0) Model profile for the review phase. See `## Profile system` below. |
|
||||
|
||||
### /trekcontinue modes
|
||||
|
||||
| Flag | Behavior |
|
||||
|------|----------|
|
||||
| _(default)_ | Auto-discover active project's `.session-state.local.json` and resume |
|
||||
| `<project-dir>` | Resume the next session of an explicit project directory |
|
||||
| `--profile <name>` | (v4.1.0) Model profile for the resumed session. Inherited from the previous session's plan.md frontmatter when absent. See `## Profile system` below. |
|
||||
|
||||
The triage gate is deterministic — path-pattern classifier produces `{file → deep-review|summary-only|skip}`. Hard refuse-with-suggestion above 100 files / 100K diff tokens.
|
||||
|
||||
|
|
@ -168,6 +181,43 @@ Three architectural options were considered for the speedup work:
|
|||
|
||||
A revived Path C (post-v2.2.xxx) would require: (1) re-architecting tool-list to be identical across all wave children, (2) cache-telemetry analysis confirming the new fork-cache behaviour holds, (3) prompt-level deny re-enablement to compensate for tool scoping rollback.
|
||||
|
||||
## Profile system (`--profile`, v4.1.0)
|
||||
|
||||
Three built-in model profiles plus operator-defined `<custom>.yaml`. Each profile pins `phase_models` for the six pipeline phases (`brief`, `research`, `plan`, `execute`, `review`, `continue`). Profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` for cost-attribution.
|
||||
|
||||
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|
||||
|---------|-------|----------|------|---------|--------|----------|----------|
|
||||
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks |
|
||||
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off |
|
||||
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |
|
||||
|
||||
### Lookup order
|
||||
|
||||
1. Explicit `--profile <name>` flag passed to the command
|
||||
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
|
||||
3. `VOYAGE_PROFILE` environment variable
|
||||
4. Default `balanced`
|
||||
|
||||
### Custom profiles
|
||||
|
||||
Create `lib/profiles/<custom>.yaml` to define a new tier. The validator (`lib/validators/profile-validator.mjs`) enforces: every `phase_models[].phase` must be a known phase enum; every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or one of the canonical short names. Custom profiles override built-ins of the same name (lookup is alphabetical with `<custom>` taking precedence).
|
||||
|
||||
Drift between plan-frontmatter `profile:` and step-manifest `profile_used:` emits a `MANIFEST_PROFILE_DRIFT` warning from `plan-validator --strict` (Step 20). Plan remains valid; the warning surfaces accidental tier-mismatch.
|
||||
|
||||
## Observability (Stop hook, v4.1.0)
|
||||
|
||||
The `Stop` hook in `hooks/hooks.json` runs `hooks/scripts/otel-export.mjs` at session-end. The hook is **opt-in** — when `VOYAGE_EXPORT_MODE` is unset or `off`, no work is done.
|
||||
|
||||
| Mode | Output | Endpoint env-var |
|
||||
|------|--------|------------------|
|
||||
| `off` (default) | _(no export)_ | — |
|
||||
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
|
||||
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
|
||||
|
||||
Endpoint validation: `VOYAGE_OTEL_ALLOW_PRIVATE=1` is required to send to loopback or RFC1918 destinations (CWE-918 SSRF mitigation). Allowlist `lib/exporters/field-allowlist.mjs` redacts records before export (CWE-212). Path validation (`lib/exporters/path-validator.mjs`) rejects symlink + traversal (CWE-22).
|
||||
|
||||
Local Docker Compose stack: `examples/observability/`. Operator docs: `docs/observability.md`. Both pin minimum versions per CVE history (`prom/prometheus:v3.0.1`, `grafana/grafana:11.4.0`, `otel/opentelemetry-collector-contrib:0.115.0`).
|
||||
|
||||
## Architecture
|
||||
|
||||
**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode runs research + plan inline in the main context (v2.4.0).
|
||||
|
|
|
|||
|
|
@ -147,6 +147,7 @@ Output: `.claude/projects/{YYYY-MM-DD}-{slug}/brief.md`
|
|||
|------|-------|----------|
|
||||
| **Default** | `/trekbrief <task>` | Dynamic interview until quality gates pass. No question cap. |
|
||||
| **Quick** | `/trekbrief --quick <task>` | Starts compact (optional sections get at most one probe), still escalates on weak required sections or failed review gate. |
|
||||
| **Profile** | `/trekbrief --profile <name> <task>` | (v4.1.0) Pin model profile for the brief phase: `economy` / `balanced` / `premium` / `<custom>`. See [Profile system](#profile-system-v410) below. |
|
||||
|
||||
`/trekbrief` is **always interactive**. There is no foreground/background mode — the interview requires user input.
|
||||
|
||||
|
|
@ -189,6 +190,7 @@ Output:
|
|||
| **Local** | `/trekresearch --local <question>` | Only codebase analysis agents (skip external + Gemini) |
|
||||
| **External** | `/trekresearch --external <question>` | Only external research agents (skip codebase analysis) |
|
||||
| **Foreground** | `/trekresearch --fg <question>` | No-op alias (foreground is default since v2.4.0) |
|
||||
| **Profile** | `/trekresearch --profile <name> <question>` | (v4.1.0) Pin model profile for the research phase. See [Profile system](#profile-system-v410). |
|
||||
|
||||
Flags combine: `--project <dir> --external`.
|
||||
|
||||
|
|
@ -217,6 +219,7 @@ Output:
|
|||
| **Quick** | `/trekplan --project <dir> --quick` | No agent swarm, lightweight scan only |
|
||||
| **Decompose** | `/trekplan --decompose plan.md` | Split plan into headless session specs |
|
||||
| **Export** | `/trekplan --export pr plan.md` | PR description, issue comment, or clean markdown |
|
||||
| **Profile** | `/trekplan --profile <name> --project <dir>` | (v4.1.0) Pin model profile; emitted as `profile:` in plan.md frontmatter. See [Profile system](#profile-system-v410). |
|
||||
|
||||
`--brief` or `--project` is **required**. `/trekplan` with no brief exits with an error and a pointer to `/trekbrief`.
|
||||
|
||||
|
|
@ -264,6 +267,7 @@ Per step: apply Changes exactly as written → run Verify (exit code is truth)
|
|||
| **Single step** | `/trekexecute --project <dir> --step 3` | Execute only step 3 |
|
||||
| **Foreground** | `/trekexecute --project <dir> --fg` | Force sequential, ignore Execution Strategy |
|
||||
| **Single session** | `/trekexecute --project <dir> --session 2` | Execute only session 2 from Execution Strategy |
|
||||
| **Profile** | `/trekexecute --profile <name> --project <dir>` | (v4.1.0) Pin model profile for execution. Plan-frontmatter `profile:` is honored when this flag is omitted. See [Profile system](#profile-system-v410). |
|
||||
|
||||
### Session-aware parallel execution (worktree-isolated)
|
||||
|
||||
|
|
@ -400,6 +404,7 @@ the iteration loop without ad-hoc conventions.
|
|||
| **Quick** | `/trekreview --project <dir> --quick` | Skip brief-conformance reviewer; skip coordinator's reasonableness filter — fast correctness-only pass |
|
||||
| **Validate** | `/trekreview --project <dir> --validate` | Schema-only check on existing `review.md`. No LLM calls |
|
||||
| **Dry run** | `/trekreview --project <dir> --dry-run` | Print discovered scope + triage map; skip writes |
|
||||
| **Profile** | `/trekreview --profile <name> --project <dir>` | (v4.1.0) Pin model profile for the review phase. See [Profile system](#profile-system-v410). |
|
||||
|
||||
### What review produces
|
||||
|
||||
|
|
@ -452,6 +457,7 @@ schema and producer/consumer contract.
|
|||
| **Default** | `/trekcontinue` | Auto-discover `.session-state.local.json` under cwd, validate, narrate, and begin executing the next session |
|
||||
| **Explicit** | `/trekcontinue <project-dir>` | Use the named project directory; helpful when several active projects coexist under cwd |
|
||||
| **Help** | `/trekcontinue --help` | Print usage block and the schema-v1 reference |
|
||||
| **Profile** | `/trekcontinue --profile <name> [<project-dir>]` | (v4.1.0) Pin model profile for the resumed session. Plan-frontmatter `profile:` from the previous session is honored when this flag is omitted. See [Profile system](#profile-system-v410). |
|
||||
|
||||
### Schema v1 — `.session-state.local.json`
|
||||
|
||||
|
|
@ -645,10 +651,43 @@ Or enable directly in `~/.claude/settings.json`:
|
|||
|
||||
An optional architect step between research and plan was previously available via a separate plugin; that architect plugin is no longer publicly distributed. The `architecture/overview.md` filesystem slot remains supported by `/trekplan` for any compatible producer.
|
||||
|
||||
## Profile system (v4.1.0)
|
||||
|
||||
Three built-in model profiles plus operator-defined `<custom>.yaml` (drop in `lib/profiles/`). Each profile pins `phase_models` for the six pipeline phases. The active profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to JSONL stats for cost-attribution.
|
||||
|
||||
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|
||||
|---------|-------|----------|------|---------|--------|----------|----------|
|
||||
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks |
|
||||
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off |
|
||||
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |
|
||||
|
||||
Lookup order:
|
||||
|
||||
1. Explicit `--profile <name>` flag passed to the command
|
||||
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
|
||||
3. `VOYAGE_PROFILE` environment variable
|
||||
4. Default `balanced`
|
||||
|
||||
See [`docs/profiles.md`](docs/profiles.md) for the decision tree, custom-profile authoring, and cost estimation disclaimer (the per-profile cost numbers are *anslag*, not contractual SLAs).
|
||||
|
||||
## Observability (v4.1.0)
|
||||
|
||||
Stop-hook OTel/Prometheus export — opt-in via `VOYAGE_EXPORT_MODE`:
|
||||
|
||||
| Mode | Output | Endpoint env-var |
|
||||
|------|--------|------------------|
|
||||
| `off` (default) | _(no export)_ | — |
|
||||
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
|
||||
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
|
||||
|
||||
Default JSONL stats stream (`${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`) is unchanged — OTel is additive. Local Docker Compose stack: [`examples/observability/`](examples/observability/). Operator docs: [`docs/observability.md`](docs/observability.md). Both pin minimum versions per CVE history.
|
||||
|
||||
## Cost profile
|
||||
|
||||
Opus runs the orchestrators (one per command) and the executor (one per plan session). Sonnet runs the exploration and review swarms (5–10 agents per command, with effort/turn limits). The pipeline front-loads cheap Sonnet work so Opus only does synthesis and execution. Typical total: comparable to a long single Claude Code session — the per-command cost is published in `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` if you want exact numbers.
|
||||
|
||||
For per-profile cost estimates, see [`docs/profiles.md`](docs/profiles.md).
|
||||
|
||||
## Requirements
|
||||
|
||||
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (CLI, desktop app, or web app)
|
||||
|
|
|
|||
169
plugins/voyage/docs/profiles.md
Normal file
169
plugins/voyage/docs/profiles.md
Normal file
|
|
@ -0,0 +1,169 @@
|
|||
# Profile system — voyage v4.1
|
||||
|
||||
This document describes the model profile system: built-in tiers,
|
||||
lookup precedence, custom-profile authoring, drift detection, and
|
||||
cost estimation (with disclaimer).
|
||||
|
||||
## Built-in profiles
|
||||
|
||||
Three pre-defined tiers ship with v4.1, located at
|
||||
`lib/profiles/{economy,balanced,premium}.yaml`.
|
||||
|
||||
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|
||||
|---------|-------|----------|------|---------|--------|----------|----------|
|
||||
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; small-scope tasks where you have high confidence the brief is right |
|
||||
| `balanced` (default) | sonnet | sonnet | opus | sonnet | opus | sonnet | Default — opus where reasoning depth pays off (plan synthesis + adversarial review) |
|
||||
| `premium` | opus | sonnet | opus | sonnet | opus | sonnet | Critical-path planning + review when budget allows |
|
||||
|
||||
`balanced` is the v4.1 default. It puts opus on the two phases where
|
||||
quality matters most (Plan synthesis + Review) and sonnet everywhere
|
||||
else. This lands the cost/quality trade-off that solo-developers and
|
||||
small teams actually want.
|
||||
|
||||
`economy` is *strictly experimental* in v4.1. The cross-tier Jaccard
|
||||
floor (0.55) is grounded in parked-synthetic fixtures, not empirical
|
||||
runs (Step 17 calibration was deferred — see
|
||||
`tests/synthetic/profile-jaccard-calibration.md`). If you observe
|
||||
economy-plan quality regressions, fall back to `balanced`.
|
||||
|
||||
## Decision tree
|
||||
|
||||
```
|
||||
Are you uncertain whether the brief is correctly framed?
|
||||
└── Yes → premium (opus on brief + plan + review)
|
||||
└── No → continue
|
||||
↓
|
||||
Is the change small (≤ 5 steps in the plan)?
|
||||
└── Yes → economy (sonnet everywhere)
|
||||
└── No → balanced (opus on plan + review)
|
||||
|
||||
Special cases:
|
||||
- Critical-infrastructure plan → premium
|
||||
- Migration with rollback risk → premium
|
||||
- Research-heavy task (≥ 4 dimensions) → balanced (research-stage benefits)
|
||||
- Bug fix with clear reproducer → economy
|
||||
- Documentation-only PR → economy
|
||||
```
|
||||
|
||||
## Lookup order
|
||||
|
||||
Voyage resolves the profile in this priority order:
|
||||
|
||||
1. **Explicit `--profile <name>` flag** — passed to the command
|
||||
2. **Plan-file frontmatter `profile:`** — when resuming via
|
||||
`/trekexecute --resume` or `/trekcontinue`
|
||||
3. **`VOYAGE_PROFILE` environment variable** — useful for headless CI
|
||||
4. **Default `balanced`** — final fallback
|
||||
|
||||
The resolved value is recorded in two places:
|
||||
|
||||
- Plan-file frontmatter `profile: <name>` and `phase_models: [...]`
|
||||
- Stats stream `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` —
|
||||
`profile`, `profile_source`, `phase_models`, `model_used`,
|
||||
`phase_models_resolved` fields
|
||||
|
||||
`profile_source` distinguishes how the profile was resolved (`flag` /
|
||||
`plan_frontmatter` / `env` / `default`), so dashboards can surface
|
||||
unexpected env-var inheritance in CI.
|
||||
|
||||
## Custom profiles
|
||||
|
||||
Drop a YAML file at `lib/profiles/<name>.yaml` to define a new tier.
|
||||
The validator (`lib/validators/profile-validator.mjs`) enforces:
|
||||
|
||||
- Every `phase_models[].phase` must be a known phase enum:
|
||||
`brief` / `research` / `plan` / `execute` / `review` / `continue`
|
||||
- Every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or
|
||||
one of the canonical short names
|
||||
- All six phases must be present (no partial profiles)
|
||||
|
||||
Custom profiles override built-ins of the same name (lookup is
|
||||
alphabetical with `<custom>` taking precedence). You may NOT redefine
|
||||
`balanced` (the default tier is locked to prevent accidental override
|
||||
of headless CI behaviour); use a different name and reference it via
|
||||
`--profile <new-name>` or `VOYAGE_PROFILE=<new-name>`.
|
||||
|
||||
### Example custom profile
|
||||
|
||||
```yaml
|
||||
# lib/profiles/critical.yaml — opus everywhere except continue
|
||||
phase_models:
|
||||
- phase: brief
|
||||
model: opus
|
||||
- phase: research
|
||||
model: opus
|
||||
- phase: plan
|
||||
model: opus
|
||||
- phase: execute
|
||||
model: opus
|
||||
- phase: review
|
||||
model: opus
|
||||
- phase: continue
|
||||
model: sonnet
|
||||
```
|
||||
|
||||
Validate with: `node lib/validators/profile-validator.mjs --json lib/profiles/critical.yaml`
|
||||
|
||||
## Drift detection
|
||||
|
||||
In `--strict` mode, `plan-validator.mjs` emits a `MANIFEST_PROFILE_DRIFT`
|
||||
warning when the plan-level `profile:` differs from any step manifest's
|
||||
`profile_used`. The warning is a *signal*, not a failure — the plan
|
||||
remains `valid: true`. This catches:
|
||||
|
||||
- Manual edits where an operator changed a single step's profile
|
||||
- Resume from a partial run where the previous session used a different
|
||||
tier
|
||||
- Copy-paste errors when stitching plan fragments
|
||||
|
||||
To suppress the warning intentionally (e.g. when a critical step
|
||||
genuinely needs a higher tier), document the override in the step's
|
||||
prose and re-run with `--soft` to validate without strict-mode warnings.
|
||||
|
||||
## Cost estimation
|
||||
|
||||
> **Disclaimer:** the table below is an *anslag*, not a contractual
|
||||
> SLA. Real cost depends on context size, agent-swarm cardinality,
|
||||
> tool-use density, and Claude Code billing schedule. Treat these as
|
||||
> rough order-of-magnitude.
|
||||
|
||||
| Profile | Brief | Research | Plan | Execute | Review | Total |
|
||||
|---------|-------|----------|------|---------|--------|-------|
|
||||
| `economy` | $0.10–0.50 | $0.50–2.00 | $0.50–2.00 | $1.00–5.00 | $0.20–1.00 | **$2–10** |
|
||||
| `balanced` | $0.10–0.50 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$3–14** |
|
||||
| `premium` | $0.50–2.00 | $0.50–2.00 | $1.00–4.00 | $1.00–5.00 | $0.50–2.00 | **$4–15** |
|
||||
|
||||
Numbers are per *full pipeline run* (brief + research + plan +
|
||||
execute + review) on a moderate-complexity task. Numbers scale roughly
|
||||
linearly with the size of the resulting plan (10 steps ≈ baseline; 30
|
||||
steps ≈ 3× the execute column).
|
||||
|
||||
Per-profile actuals are emitted to JSONL stats — pipe them through the
|
||||
OTel export (`docs/observability.md`) to get real cost-attribution
|
||||
graphs in Grafana. Replace the table above with your own measured
|
||||
numbers after ≥ 3 runs of each profile.
|
||||
|
||||
## Deferred to v4.2
|
||||
|
||||
- **`balanced.external_research_enabled` operator-override** —
|
||||
v4.1 omits this per scope-guardian SG2. v4.2 may add an opt-in
|
||||
flag to enable external research agents in the balanced tier
|
||||
without forcing premium.
|
||||
- **Empirical Jaccard re-calibration** — parked-synthetic fixtures
|
||||
in v4.1 use a 0.55 conservative starting threshold. v4.2 plans an
|
||||
empirical re-run with $60-120 LLM budget to derive a calibrated
|
||||
threshold from real economy-vs-premium plan pairs.
|
||||
- **ROUGE-L + char-4gram MinHash** as primary/secondary cross-tier
|
||||
gates per research/02 Recommendation #7. Jaccard remains the gate
|
||||
in v4.1; v4.2 may layer ROUGE-L on top.
|
||||
|
||||
## See also
|
||||
|
||||
- [`README.md` § Profile system](../README.md) — top-level overview
|
||||
- [`CLAUDE.md` § Profile system](../CLAUDE.md) — internal reference
|
||||
- [`docs/observability.md`](observability.md) — JSONL → OTel pipeline
|
||||
- [`tests/synthetic/profile-jaccard-calibration.md`](../tests/synthetic/profile-jaccard-calibration.md)
|
||||
— calibration status and threshold rationale
|
||||
- [`lib/profiles/`](../lib/profiles/) — built-in profile YAMLs
|
||||
- [`lib/validators/profile-validator.mjs`](../lib/validators/profile-validator.mjs)
|
||||
— schema validator with CLI shim
|
||||
|
|
@ -301,6 +301,75 @@ test('at least one pipeline command-file references `phase_models` canonical nam
|
|||
);
|
||||
});
|
||||
|
||||
// --- v4.1 Step 22 — post-write CLAUDE.md / README.md pinning ---
|
||||
//
|
||||
// Plan-critic Blocker 2 fix: Step 21 only pinned commands/*.md (which
|
||||
// are written in Step 7 / Wave 3). Step 22 writes the top-level docs
|
||||
// and extends pinning here so doc-consistency stays green AFTER Step 22.
|
||||
|
||||
test('CLAUDE.md documents --profile flag', () => {
|
||||
const md = read('CLAUDE.md');
|
||||
assert.match(
|
||||
md,
|
||||
/--profile\b/,
|
||||
'CLAUDE.md must document the --profile flag (v4.1 SC #20)',
|
||||
);
|
||||
});
|
||||
|
||||
test('CLAUDE.md uses canonical name `phase_models`', () => {
|
||||
const md = read('CLAUDE.md');
|
||||
assert.match(
|
||||
md,
|
||||
/phase_models/,
|
||||
'CLAUDE.md must use canonical name "phase_models" (v4.1 SC #20)',
|
||||
);
|
||||
for (const bad of ['model_per_phase', 'phase_to_model', 'profile_phase_models']) {
|
||||
assert.ok(
|
||||
!md.includes(bad),
|
||||
`CLAUDE.md must NOT use legacy alias "${bad}"`,
|
||||
);
|
||||
}
|
||||
});
|
||||
|
||||
test('README.md documents --profile flag for all 6 commands', () => {
|
||||
// SG1: README flag-table coverage is gating for SC #20. README is the
|
||||
// primary discovery surface for new users.
|
||||
const md = read('README.md');
|
||||
// Top-level Profile system section is required so the flag is
|
||||
// discoverable independent of per-command tables.
|
||||
assert.match(md, /## Profile system/, 'README.md missing top-level "## Profile system" section');
|
||||
// Every per-command Modes table must include --profile (count of
|
||||
// --profile occurrences should be ≥ 6 — one per command + Profile
|
||||
// system section).
|
||||
const profileMentions = (md.match(/--profile\b/g) || []).length;
|
||||
assert.ok(
|
||||
profileMentions >= 6,
|
||||
`README.md must mention --profile ≥ 6 times (one per command + section), got ${profileMentions}`,
|
||||
);
|
||||
});
|
||||
|
||||
test('CHANGELOG.md has v4.1.0 entry', () => {
|
||||
const cl = read('CHANGELOG.md');
|
||||
assert.match(
|
||||
cl,
|
||||
/## v4\.1\.0\b/,
|
||||
'CHANGELOG.md must include "## v4.1.0" entry per Keep-a-Changelog 1.1.0',
|
||||
);
|
||||
});
|
||||
|
||||
test('docs/profiles.md exists and documents Custom.yaml authoring', () => {
|
||||
const dp = read('docs/profiles.md');
|
||||
assert.ok(dp.length > 1000, 'docs/profiles.md must be substantive (> 1000 chars)');
|
||||
// Must document custom-profile authoring (Step 22 manifest must_contain
|
||||
// pattern: "Custom.yaml" — case-insensitive match handled here as
|
||||
// /[Cc]ustom[. ]/ to allow either "custom.yaml" or "Custom profile" prose).
|
||||
assert.match(
|
||||
dp,
|
||||
/[Cc]ustom\.yaml|[Cc]ustom profile|<custom>\.yaml/,
|
||||
'docs/profiles.md must document custom profile authoring',
|
||||
);
|
||||
});
|
||||
|
||||
test('commands/trekplan.md Phase 8 seals Opus-4.7 schema-drift defense', () => {
|
||||
const cmd = read('commands/trekplan.md');
|
||||
// Locate Phase 8 section
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue