test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin
Step 17 of v4.1: escalate handler invoked. The live LLM budget ($60-120 for
4 plan runs of /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.

Per the Step 17 escalate fallback (and the fallback strategy in
NEXT-SESSION-PROMPT.local.md): document the economy plan as parked, use
balanced as the low-threshold profile, and defer empirical calibration to v4.2.
Files:
tests/synthetic/profile-plan-run-economy-1.md — 30 steps, parked-synthetic
tests/synthetic/profile-plan-run-economy-2.md — 30 steps, parked-synthetic
tests/synthetic/profile-plan-run-premium-1.md — 40 steps, parked-synthetic
tests/synthetic/profile-plan-run-premium-2.md — 40 steps, parked-synthetic
tests/synthetic/profile-jaccard-calibration.md — threshold 0.55 pinned per
research/02 conservative starting value
The replacement procedure is documented in the "How to replace" section of
profile-jaccard-calibration.md. Trigger conditions for an empirical re-run:
1. Cross-tier smoke test (Step 18) flips red on a real run
2. LLM budget approved for v4.2
3. New profile tier added
parent 8bbe60c2f5
commit 90425073b2
5 changed files with 386 additions and 0 deletions
61
plugins/voyage/tests/synthetic/profile-plan-run-economy-2.md
Normal file
@@ -0,0 +1,61 @@
---
type: trekplan-synthetic
plan_version: "1.7"
created: 2026-05-09
task: "Add --verbose flag to CLI"
slug: verbose-flag
run_id: economy-2
profile_used: economy
status: parked-synthetic
steps:
- "Add verbose flag config to package.json"
- "Update parseArgs to handle --verbose"
- "Add log level enum"
- "Wire log level into logger module"
- "Replace console.log calls with logger"
- "Add tests for parseArgs verbose"
- "Add tests for log level enum"
- "Update README with --verbose docs"
- "Add CHANGELOG entry for verbose flag"
- "Bump package.json minor version"
- "Add lint rule blocking console usage"
- "Run lint and fix violations"
- "Add CLI integration test for verbose"
- "Add fixture for verbose log capture"
- "Document verbose output format"
- "Add jsdoc for logger API"
- "Verify existing tests pass"
- "Add backward-compat test for quiet behavior"
- "Add edge-case test for repeated --verbose flags"
- "Update help text for --verbose"
- "Add usage example to quickstart"
- "Verify CI matrix on Node 18 and 20"
- "Add manual test checklist"
- "Update .gitignore for log dumps"
- "Add cleanup logic for stale logs"
- "Verify exit code on verbose error"
- "Add stderr routing for warnings"
- "Update troubleshooting guide"
- "Verify version sync across docs"
- "Add timestamp prefix to verbose lines"
---

# Synthetic plan run economy-2: Add --verbose flag to CLI (PARKED)

Companion fixture to `profile-plan-run-economy-1.md`. Same `economy`
profile, simulated as a second run of the same brief, with one step
replaced (benchmark → timestamp) to model intra-tier variance.

See `profile-plan-run-economy-1.md` for full parked-synthetic rationale.

## Intra-tier Jaccard

Economy-1 vs economy-2 share 29/30 step titles (one differs); union = 31.
Jaccard = 29/31 ≈ 0.935, well above any reasonable cross-tier floor.
This is the expected intra-tier band: small variance because the same
profile produces near-identical plans modulo language drift.

When real LLM-budget runs replace this synthetic, the empirical
intra-tier Jaccard is expected to land in the 0.85–0.95 band per
research/02. Cross-tier (economy vs premium) is the discriminating
measurement and is documented in `profile-jaccard-calibration.md`.
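
The intra-tier arithmetic above can be reproduced with a short sketch. This is an illustration only, not the calibration code used by the voyage tests: the `jaccard` helper, the stand-in shared titles, and the economy-1 benchmark title are all hypothetical, and step titles are assumed to be compared as exact strings.

```python
def jaccard(a, b):
    """Jaccard similarity of two step-title lists, compared as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty plans are treated as identical (an assumption).
        return 1.0
    return len(set_a & set_b) / len(union)


# 29 stand-in titles shared by both runs (hypothetical placeholders).
shared = [f"shared step {i}" for i in range(29)]

# Each run has 30 steps and differs from the other in exactly one title.
economy_1 = shared + ["Add benchmark step"]  # hypothetical title
economy_2 = shared + ["Add timestamp prefix to verbose lines"]

score = jaccard(economy_1, economy_2)
print(round(score, 3))  # 29 / 31, prints 0.935

# Expected intra-tier band quoted above from research/02.
in_band = 0.85 <= score <= 0.95
print(in_band)  # prints True
```

With two 30-step plans sharing 29 titles, the union is 30 + 30 - 29 = 31, which is where the 29/31 figure comes from.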