ktg-plugin-marketplace/plugins/voyage/tests/synthetic/profile-plan-run-economy-2.md
Kjell Tore Guttormsen 90425073b2 test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin
Step 17 of v4.1: escalate-handler invoked. The live LLM budget ($60-120 for
4 plan runs of /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.

Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md
fallback-strategy): document economy-Plan as parked, use balanced as
low-threshold profile, defer empirical calibration to v4.2.

Files:
  tests/synthetic/profile-plan-run-economy-1.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-economy-2.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-1.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-2.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-jaccard-calibration.md  — threshold 0.55 pinned per
                                                    research/02 conservative starting value

Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
  1. Cross-tier smoke-test (Step 18) flips red on a real run
  2. v4.2 LLM-budget approval
  3. New profile tier added
2026-05-09 09:54:45 +02:00

type: trekplan-synthetic
plan_version: 1.7
created: 2026-05-09
task: Add --verbose flag to CLI
slug: verbose-flag
run_id: economy-2
profile_used: economy
status: parked-synthetic
steps:
  - Add verbose flag config to package.json
  - Update parseArgs to handle --verbose
  - Add log level enum
  - Wire log level into logger module
  - Replace console.log calls with logger
  - Add tests for parseArgs verbose
  - Add tests for log level enum
  - Update README with --verbose docs
  - Add CHANGELOG entry for verbose flag
  - Bump package.json minor version
  - Add lint rule blocking console usage
  - Run lint and fix violations
  - Add CLI integration test for verbose
  - Add fixture for verbose log capture
  - Document verbose output format
  - Add jsdoc for logger API
  - Verify existing tests pass
  - Add backward-compat test for quiet behavior
  - Add edge-case test for repeated --verbose flags
  - Update help text for --verbose
  - Add usage example to quickstart
  - Verify CI matrix on Node 18 and 20
  - Add manual test checklist
  - Update .gitignore for log dumps
  - Add cleanup logic for stale logs
  - Verify exit code on verbose error
  - Add stderr routing for warnings
  - Update troubleshooting guide
  - Verify version sync across docs
  - Add timestamp prefix to verbose lines

Synthetic plan run economy-2 — Add --verbose flag to CLI (PARKED)

Companion fixture to profile-plan-run-economy-1.md. Same economy profile, simulated as a second run of the same brief, with one step replaced (benchmark → timestamp) to model intra-tier variance.

See profile-plan-run-economy-1.md for full parked-synthetic rationale.

Intra-tier Jaccard

Economy-1 and economy-2 share 29 of their 30 step titles (one title differs in each run), so the intersection is 29 and the union is 31. Jaccard = 29/31 ≈ 0.935, well above any reasonable cross-tier floor. This is the expected intra-tier band: variance is small because the same profile produces near-identical plans, modulo language drift.
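The arithmetic above can be reproduced with a minimal sketch, assuming step titles are compared as exact strings in a set. The `jaccard` helper, the placeholder shared titles, and the benchmark step title are hypothetical stand-ins, since the economy-1 step list is not reproduced here and the actual comparison logic may normalize titles differently.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of step titles, compared as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty plans count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Placeholder titles: 29 steps shared by both runs.
shared = [f"shared step {i}" for i in range(1, 30)]

# Each run adds one step the other lacks (benchmark vs timestamp).
economy_1 = shared + ["Add benchmark step (hypothetical title)"]
economy_2 = shared + ["Add timestamp prefix to verbose lines"]

score = jaccard(economy_1, economy_2)
print(f"{score:.3f}")  # 0.935
```

Exact string matching is the strictest choice: any rewording of a title counts as a different step, so real runs with language drift would likely score lower than this synthetic pair.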

When real LLM-budget runs replace this synthetic fixture, the empirical intra-tier Jaccard is expected to land in the 0.85-0.95 band per research/02. The cross-tier comparison (economy vs premium) is the discriminating measurement and is documented in profile-jaccard-calibration.md.