ktg-plugin-marketplace/plugins/voyage/tests/synthetic/profile-plan-run-economy-2.md
Kjell Tore Guttormsen 90425073b2 test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin
Step 17 of v4.1 — escalate-handler invoked. Live LLM-budget ($60-120 for
4 plan-runs of /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.

Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md
fallback-strategy): document economy-Plan as parked, use balanced as
low-threshold profile, defer empirical calibration to v4.2.

Files:
  tests/synthetic/profile-plan-run-economy-1.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-economy-2.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-1.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-2.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-jaccard-calibration.md  — threshold 0.55 pinned per
                                                    research/02 conservative starting value

Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
  1. Cross-tier smoke-test (Step 18) flips red on a real run
  2. v4.2 LLM-budget approval
  3. New profile tier added
2026-05-09 09:54:45 +02:00


---
type: trekplan-synthetic
plan_version: "1.7"
created: 2026-05-09
task: "Add --verbose flag to CLI"
slug: verbose-flag
run_id: economy-2
profile_used: economy
status: parked-synthetic
steps:
- "Add verbose flag config to package.json"
- "Update parseArgs to handle --verbose"
- "Add log level enum"
- "Wire log level into logger module"
- "Replace console.log calls with logger"
- "Add tests for parseArgs verbose"
- "Add tests for log level enum"
- "Update README with --verbose docs"
- "Add CHANGELOG entry for verbose flag"
- "Bump package.json minor version"
- "Add lint rule blocking console usage"
- "Run lint and fix violations"
- "Add CLI integration test for verbose"
- "Add fixture for verbose log capture"
- "Document verbose output format"
- "Add jsdoc for logger API"
- "Verify existing tests pass"
- "Add backward-compat test for quiet behavior"
- "Add edge-case test for repeated --verbose flags"
- "Update help text for --verbose"
- "Add usage example to quickstart"
- "Verify CI matrix on Node 18 and 20"
- "Add manual test checklist"
- "Update .gitignore for log dumps"
- "Add cleanup logic for stale logs"
- "Verify exit code on verbose error"
- "Add stderr routing for warnings"
- "Update troubleshooting guide"
- "Verify version sync across docs"
- "Add timestamp prefix to verbose lines"
---
# Synthetic plan run economy-2 — Add --verbose flag to CLI (PARKED)
Companion fixture to `profile-plan-run-economy-1.md`. Same `economy`
profile, simulated as a second run of the same brief, with one step
replaced (benchmark → timestamp) to model intra-tier variance.
See `profile-plan-run-economy-1.md` for full parked-synthetic rationale.
## Intra-tier Jaccard
Economy-1 and economy-2 share 29 of their 30 step titles; each has one title the other lacks, so the union is 31.
Jaccard = 29/31 ≈ 0.935 — well above any reasonable cross-tier floor.
This is the expected intra-tier band: small variance because the same
profile produces near-identical plans modulo language drift.
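
The arithmetic above can be reproduced with a short Python sketch. It assumes step titles are compared as exact strings and treated as sets; the real calibration may normalize titles differently. The `jaccard` helper and the placeholder titles are illustrative, not taken from the plugin.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of step titles, compared as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty plans count as identical
    return len(set_a & set_b) / len(union)


# Hypothetical stand-ins: 29 shared titles plus one differing title per run.
shared = [f"step {i}" for i in range(29)]
economy_1 = shared + ["benchmark step"]  # placeholder for the replaced step
economy_2 = shared + ["Add timestamp prefix to verbose lines"]

score = jaccard(economy_1, economy_2)
print(f"{score:.3f}")  # 29 shared / 31 in union -> 0.935
```

Because both runs have the same length and differ in exactly one title, the intersection shrinks by one and the union grows by one, which is where 29/31 comes from.
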
When real LLM-budget runs replace this synthetic, the empirical
intra-tier Jaccard is expected to land in the 0.85 to 0.95 band per
research/02. Cross-tier (economy vs premium) is the discriminating
measurement and is documented in `profile-jaccard-calibration.md`.
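
To compare real fixtures, the `steps:` list first has to be read out of the front matter. The following is a minimal stdlib-only sketch, assuming the front matter keeps the flat layout used in this fixture (a `steps:` key followed by double-quoted `- "..."` items). `read_steps` is a hypothetical helper, not part of the plugin, and a real YAML parser would be more robust.

```python
import json


def read_steps(markdown_text):
    """Return the `steps:` list from a leading `---` front-matter block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter at the top of the file
    steps, in_steps = [], False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter: stop before the body
            break
        if stripped == "steps:":
            in_steps = True
        elif in_steps and stripped.startswith("- "):
            # Assumption: items are simple double-quoted strings,
            # which json.loads can decode.
            steps.append(json.loads(stripped[2:]))
        elif in_steps:
            in_steps = False  # any other key ends the list
    return steps


# Abbreviated sample in the same shape as this fixture.
sample = '''---
run_id: economy-2
steps:
- "Update parseArgs to handle --verbose"
- "Add log level enum"
---
# body
- "not a step"
'''
print(read_steps(sample))
```

The parser stops at the closing `---`, so list items in the Markdown body are never mistaken for plan steps.
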