ktg-plugin-marketplace/plugins/voyage/tests/synthetic/profile-plan-run-premium-2.md
Kjell Tore Guttormsen 90425073b2 test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin
Step 17 of v4.1: escalate-handler invoked. The live LLM budget ($60-120 for
4 plan-runs of /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.

Per the Step 17 escalate-fallback (and the NEXT-SESSION-PROMPT.local.md
fallback strategy): document the economy plan as parked, use balanced as the
low-threshold profile, and defer empirical calibration to v4.2.

Files:
  tests/synthetic/profile-plan-run-economy-1.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-economy-2.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-1.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-2.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-jaccard-calibration.md  — threshold 0.55 pinned per
                                                    research/02 conservative starting value

Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
  1. Cross-tier smoke-test (Step 18) flips red on a real run
  2. v4.2 LLM-budget approval
  3. New profile tier added
2026-05-09 09:54:45 +02:00

3.1 KiB

---
type: trekplan-synthetic
plan_version: 1.7
created: 2026-05-09
task: Add --verbose flag to CLI
slug: verbose-flag
run_id: premium-2
profile_used: premium
status: parked-synthetic
steps:
  - Add config entry for verbose flag in package.json
  - Define types for verbose mode in types.ts
  - Update parseArgs to recognize --verbose flag
  - Pass verbose context through main entry point
  - Add log level enum (silent, normal, verbose)
  - Wire log level into logger module
  - Replace console.log with logger.info in handler.ts
  - Add tests for parseArgs --verbose recognition
  - Add tests for log level enum mapping
  - Update README with --verbose flag documentation
  - Add CHANGELOG entry for verbose flag
  - Bump package.json minor version
  - Add lint rule blocking direct console usage
  - Run lint and fix new violations
  - Add CLI integration test for --verbose end-to-end
  - Add fixture file for verbose log capture
  - Document verbose output format in docs/cli.md
  - Add jsdoc for new logger API
  - Verify all existing tests pass with verbose disabled
  - Add backward-compat test for legacy quiet behavior
  - Add edge-case test for repeated --verbose flags
  - Add edge-case test for --verbose with --silent collision
  - Update help text to list --verbose flag
  - Add usage example to docs/quickstart.md
  - Verify CI matrix runs on Node 18 and 20
  - Add npm script for verbose mode debugging
  - Run security audit on logger dependency tree
  - Verify no PII leaks in verbose log output
  - Add manual test checklist to CONTRIBUTING.md
  - Update .gitignore for verbose log dump files
  - Add cleanup logic for stale verbose logs
  - Add unit test for cleanup logic
  - Verify exit code on verbose mode error
  - Add stderr routing for warnings in verbose
  - Add timestamp prefix in verbose log lines
  - Add test for timestamp format
  - Update troubleshooting guide with verbose flag
  - Verify version sync across all docs
  - Add benchmark for verbose log capture overhead
  - Document overhead methodology in PERF.md
---
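Because `status` marks these fixtures as placeholders, a test could flag which ones still await replacement by real runs. A minimal sketch, assuming the frontmatter has already been parsed into a plain object (YAML loading is omitted). `isParkedPlaceholder` and `EXPECTED_STEPS` are hypothetical names, not part of the plugin; the per-profile step counts come from the commit's file list.

```javascript
// Expected step counts per profile, taken from the commit's file list.
const EXPECTED_STEPS = { economy: 30, premium: 40 };

// Hypothetical helper: true while a fixture is still a parked-synthetic
// placeholder whose step count matches its profile.
// Assumes the YAML frontmatter is already parsed into a plain object.
function isParkedPlaceholder(meta) {
  const expected = EXPECTED_STEPS[meta.profile_used]; // undefined for unknown profiles
  return (
    meta.status === "parked-synthetic" &&
    Array.isArray(meta.steps) &&
    meta.steps.length === expected
  );
}

const premium2Meta = {
  run_id: "premium-2",
  profile_used: "premium",
  status: "parked-synthetic",
  steps: Array.from({ length: 40 }, (_, i) => `step ${i + 1}`), // stand-in titles
};

console.log(isParkedPlaceholder(premium2Meta)); // true
```

Once a real run replaces the fixture, its `status` would change and the helper returns `false`.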

Synthetic plan run premium-2 — Add --verbose flag to CLI (PARKED)

Companion to profile-plan-run-premium-1.md. Same premium profile, simulated as a second run whose two final steps differ from premium-1 (emission cost → capture overhead, benchmark methodology → overhead methodology) to model intra-tier variance.

Intra-tier Jaccard

Premium-1 and premium-2 share 38 of their 40 step titles; union = 40 + 40 - 38 = 42. Jaccard = 38/42 ≈ 0.905, which comfortably clears the existing baseline plan-run-A vs plan-run-B floor (≥ 0.833 in plan-determinism.test.mjs).
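The intra-tier figure can be reproduced with a plain set-based Jaccard over step titles. A minimal sketch: `jaccard` is a hypothetical helper (not necessarily how plan-determinism.test.mjs computes similarity), titles are compared as exact strings, and the 38 shared titles plus the two run-specific final titles are stand-ins rather than the fixtures' real text.

```javascript
// Jaccard similarity over step titles: |A ∩ B| / |A ∪ B|.
// Titles are compared as exact strings; no normalisation is applied.
function jaccard(titlesA, titlesB) {
  const a = new Set(titlesA);
  const b = new Set(titlesB);
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union; // treat two empty plans as identical
}

// Stand-in titles: 38 common steps plus two run-specific final steps each.
const common = Array.from({ length: 38 }, (_, i) => `step ${i + 1}`);
const premium1 = [...common, "emission cost benchmark", "benchmark methodology doc"];
const premium2 = [...common, "capture overhead benchmark", "overhead methodology doc"];

const j = jaccard(premium1, premium2);
console.log(j.toFixed(3)); // 0.905
console.log(j >= 0.833); // true
```

Computing the union as |A| + |B| - |A ∩ B| avoids building a third set.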

Cross-tier Jaccard rationale

Pairing premium fixtures (40 steps) against economy fixtures (30 steps) yields ~30 shared titles (after string normalisation), with a union of ~40. Because the shared count cannot exceed the 30 economy titles, cross-tier Jaccard in this synthetic tops out at 30/40 = 0.75; that is a best case, not a conservative figure. The calibration file pins a lower floor (0.55) per research/02 to absorb empirical variance once real runs replace these fixtures. See profile-jaccard-calibration.md for the threshold derivation.
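The ceiling and the 0.55 floor can be made concrete. With title sets of 40 and 30 sharing s titles, Jaccard is s / (70 - s), so the floor holds whenever at least 25 economy titles reappear in the premium plan (25/45 ≈ 0.556, while 24/46 ≈ 0.522). The sketch assumes a hypothetical normalisation (trim, lowercase, collapse whitespace) because this file does not say which rule the fixtures use; `normalise`, `jaccardNormalised`, `maxJaccard`, and `minSharedForFloor` are illustrative names.

```javascript
// Hypothetical normalisation: trim, lowercase, collapse runs of whitespace.
// The rule the real fixtures use is not specified in this file.
function normalise(title) {
  return title.trim().toLowerCase().replace(/\s+/g, " ");
}

// Jaccard similarity over normalised step titles.
function jaccardNormalised(titlesA, titlesB) {
  const a = new Set(titlesA.map(normalise));
  const b = new Set(titlesB.map(normalise));
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union;
}

// Ceiling for sets of sizes m and n: reached when the smaller set is a subset.
function maxJaccard(m, n) {
  return Math.min(m, n) / Math.max(m, n);
}

// Smallest shared-title count s with s / (m + n - s) >= floor,
// or null if even full containment cannot reach the floor.
function minSharedForFloor(m, n, floor) {
  const s = Math.ceil((floor * (m + n)) / (1 + floor));
  return s <= Math.min(m, n) ? s : null;
}

const CROSS_TIER_FLOOR = 0.55; // pinned in profile-jaccard-calibration.md

console.log(maxJaccard(40, 30)); // 0.75
console.log(minSharedForFloor(40, 30, CROSS_TIER_FLOOR)); // 25
console.log(jaccardNormalised(["Bump  package.json minor version"], ["bump package.json minor version "])); // 1
```

The closed form comes from rearranging s / (m + n - s) ≥ f into s ≥ f(m + n) / (1 + f), which shows how much cross-tier drift the pinned floor tolerates before a real run would flip the smoke test red.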