test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin

Step 17 of v4.1: escalate-handler invoked. Live LLM-budget ($60-120 for 4 plan-runs of /trekplan --profile {economy,premium} on examples/01-add-verbose-flag/brief.md) was not authorized for the v4.1-execute-4b session.

Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md fallback-strategy): document economy-Plan as parked, use balanced as low-threshold profile, defer empirical calibration to v4.2.

Files:
- tests/synthetic/profile-plan-run-economy-1.md: 30 steps, parked-synthetic
- tests/synthetic/profile-plan-run-economy-2.md: 30 steps, parked-synthetic
- tests/synthetic/profile-plan-run-premium-1.md: 40 steps, parked-synthetic
- tests/synthetic/profile-plan-run-premium-2.md: 40 steps, parked-synthetic
- tests/synthetic/profile-jaccard-calibration.md: threshold 0.55 pinned per research/02 conservative starting value

Replacement procedure documented in calibration.md "How to replace" section.

Trigger conditions for empirical re-run:
1. Cross-tier smoke-test (Step 18) flips red on a real run
2. v4.2 LLM-budget approval
3. New profile tier added
2026-05-09 09:54:45 +02:00
commit 90425073b2
parent 8bbe60c2f5
5 changed files with 386 additions and 0 deletions
--- a/plugins/voyage/tests/synthetic/profile-plan-run-economy-1.md
+++ b/plugins/voyage/tests/synthetic/profile-plan-run-economy-1.md
@@ -0,0 +1,78 @@
+---
+type: trekplan-synthetic
+plan_version: "1.7"
+created: 2026-05-09
+task: "Add --verbose flag to CLI"
+slug: verbose-flag
+run_id: economy-1
+profile_used: economy
+status: parked-synthetic
+steps:
+  - "Add verbose flag config to package.json"
+  - "Update parseArgs to handle --verbose"
+  - "Add log level enum"
+  - "Wire log level into logger module"
+  - "Replace console.log calls with logger"
+  - "Add tests for parseArgs verbose"
+  - "Add tests for log level enum"
+  - "Update README with --verbose docs"
+  - "Add CHANGELOG entry for verbose flag"
+  - "Bump package.json minor version"
+  - "Add lint rule blocking console usage"
+  - "Run lint and fix violations"
+  - "Add CLI integration test for verbose"
+  - "Add fixture for verbose log capture"
+  - "Document verbose output format"
+  - "Add jsdoc for logger API"
+  - "Verify existing tests pass"
+  - "Add backward-compat test for quiet behavior"
+  - "Add edge-case test for repeated --verbose flags"
+  - "Update help text for --verbose"
+  - "Add usage example to quickstart"
+  - "Verify CI matrix on Node 18 and 20"
+  - "Add manual test checklist"
+  - "Update .gitignore for log dumps"
+  - "Add cleanup logic for stale logs"
+  - "Verify exit code on verbose error"
+  - "Add stderr routing for warnings"
+  - "Update troubleshooting guide"
+  - "Verify version sync across docs"
+  - "Add benchmark for verbose emission"
+---
+
+# Synthetic plan run economy-1 — Add --verbose flag to CLI (PARKED)
+
+This fixture is a SYNTHETIC PLACEHOLDER for empirical Jaccard calibration
+that requires live LLM-budget ($60-120 for 4 plan-runs). Marked
+`status: parked-synthetic` per the Step 17 escalate-handler in plan.md.
+
+## Why parked
+
+Per NEXT-SESSION-PROMPT.local.md fallback: "If the Step 17 LLM budget
+blocks: document the `economy` Plan as `parked` in the calibration file
+and continue with Steps 18-19 using `balanced` as the low-threshold profile."
+
+The session running v4.1-execute-4b did not have authorization for live
+LLM invocation against `/trekplan --profile economy --brief
+examples/01-add-verbose-flag/brief.md`. Synthetic fixtures here represent
+the *shape* of what such a run would produce — fewer total steps (30 vs
+40 in baseline plan-run-A), larger / coarser-grained steps that omit
+sub-verification and benchmark items.
+
+## How this fixture is consumed
+
+`tests/integration/profile-jaccard-smoke.test.mjs` (Step 18) reads the
+`steps` array from the frontmatter and pairs it with the corresponding
+`premium` fixtures to compute cross-tier Jaccard.
+
+When real LLM budget is approved (deferred to v4.2), regenerate this
+fixture by running the actual command and overwriting the frontmatter
+`steps` array. Update `status: parked-synthetic` → `status: empirical`.
+
+## Step-shape rationale
+
+Economy profile uses sonnet for all phases (per
+`lib/profiles/economy.yaml`). Empirical observation from research/02:
+sonnet plans tend toward larger steps, fewer verification entries, and
+fewer edge-case branches than opus plans. The 30 entries here capture the
+typical gist and omit ~10 of the finer-grained items present in opus runs.
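
The fixture's "How this fixture is consumed" section describes reading the frontmatter `steps` array and computing a cross-tier Jaccard score. A minimal sketch of that idea follows. It is not taken from `profile-jaccard-smoke.test.mjs`: the helper names `readSteps`, `normalise`, and `jaccard` are hypothetical, and comparing whole step titles as set elements is an assumption (the real test may tokenise or normalise differently).

```javascript
// Hypothetical helpers for illustration; not the code in the Step 18 test.

// Pull the quoted entries of the `steps:` list out of a fixture's frontmatter.
// Assumes the shape used by these fixtures: a `steps:` key followed by
// indented `- "…"` lines, all inside the leading `---` … `---` block.
function readSteps(markdown) {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return [];
  const steps = [];
  let inSteps = false;
  for (const line of match[1].split("\n")) {
    if (/^steps:\s*$/.test(line)) { inSteps = true; continue; }
    if (!inSteps) continue;
    const item = line.match(/^\s+-\s+"(.*)"\s*$/);
    if (item) steps.push(item[1]);
    else if (/^\S/.test(line)) inSteps = false; // next top-level key ends the list
  }
  return steps;
}

// Normalise so case and whitespace differences do not count as a mismatch.
const normalise = (s) => s.trim().toLowerCase().replace(/\s+/g, " ");

// Jaccard similarity |A ∩ B| / |A ∪ B| over sets of normalised step titles.
function jaccard(stepsA, stepsB) {
  const a = new Set(stepsA.map(normalise));
  const b = new Set(stepsB.map(normalise));
  if (a.size === 0 && b.size === 0) return 1; // convention: two empty sets match
  let shared = 0;
  for (const s of a) if (b.has(s)) shared += 1;
  return shared / (a.size + b.size - shared);
}

// Toy inputs, shortened from the fixture above; the premium list is made up.
const economy = readSteps(`---
run_id: economy-1
steps:
  - "Update parseArgs to handle --verbose"
  - "Add log level enum"
  - "Update README with --verbose docs"
---
body`);
const premium = [
  "Update parseArgs to handle --verbose",
  "Add log level enum",
  "Add tests for log level enum",
  "Add benchmark for verbose emission",
];

console.log(jaccard(economy, premium)); // 2 shared / 5 in the union = 0.4
```

How the resulting score is compared against the pinned 0.55 threshold is left out on purpose, since the direction of that check is defined by the calibration file and the smoke test rather than by this sketch.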