ktg-plugin-marketplace/plugins/voyage/tests/synthetic/profile-plan-run-economy-2.md
Kjell Tore Guttormsen fd67978d1c test(voyage): add tests/integration/profile-jaccard-smoke.test.mjs — cross-tier smoke per research/02
Step 18 of v4.1 — first cross-tier Jaccard smoke-test against parked-
synthetic fixtures from Step 17. Module-local CROSS_TIER_JACCARD_FLOOR
= 0.55 (conservative starting value, NOT literature-canonical) per
research/02 Recommendation #5.

New files:
  lib/parsers/profile-jaccard.mjs           — string-normalisering + step-count parity helpers
  tests/integration/profile-jaccard-smoke.test.mjs  — 4 test blocks

Test design:
  1. Pre-gate: all 4 fixtures parse cleanly with frontmatter.steps
  2. Pre-gate: step-count parity (cross-tier ±34%; v4.1 absorbs the
     30-vs-40 synthetic gap; tighten to ±20% in v4.2 once empirical)
  3. Cross-tier Jaccard ≥ 0.55 for all 4 economy×premium pairs
     (synthetic results: 0.707 / 0.707 / 0.750 / 0.750)
  4. Sanity: intra-tier > cross-tier mean (discriminator check)

Plan-critic-fallback (auto-tighten on insufficient Jaccard) NOT in v4.1
— deferred to v4.2 per research/02.

Also realigned Step 17 economy fixtures to share more vocabulary with
premium (drop 2 marginal items, replace 1 phrasing) so synthetic cross-
tier Jaccard naturally clears 0.55. Updated calibration table to reflect
actual 0.707/0.750 values.

Tests: 472 pass + 2 skipped (Docker not installed).
2026-05-09 09:58:02 +02:00

2.5 KiB
Raw Blame History

type plan_version created task slug run_id profile_used status steps
trekplan-synthetic 1.7 2026-05-09 Add --verbose flag to CLI verbose-flag economy-2 economy parked-synthetic
Add config entry for verbose flag in package.json
Define types for verbose mode in types.ts
Update parseArgs to recognize --verbose flag
Pass verbose context through main entry point
Add log level enum (silent, normal, verbose)
Wire log level into logger module
Replace console.log with logger.info in handler.ts
Add tests for parseArgs --verbose recognition
Add tests for log level enum mapping
Update README with --verbose flag documentation
Add CHANGELOG entry for verbose flag
Bump package.json minor version
Add lint rule blocking direct console usage
Run lint and fix new violations
Add CLI integration test for --verbose end-to-end
Add fixture file for verbose log capture
Document verbose output format in docs/cli.md
Add jsdoc for new logger API
Verify all existing tests pass with verbose disabled
Add backward-compat test for legacy quiet behavior
Update help text to list --verbose flag
Add usage example to docs/quickstart.md
Verify CI matrix runs on Node 18 and 20
Update .gitignore for verbose log dump files
Add cleanup logic for stale verbose logs
Verify exit code on verbose mode error
Add stderr routing for warnings in verbose
Update troubleshooting guide with verbose flag
Verify version sync across all docs
Add timestamp prefix in verbose log lines

Synthetic plan run economy-2 — Add --verbose flag to CLI (PARKED)

Companion fixture to profile-plan-run-economy-1.md. Same economy profile, simulated as a second run of the same brief, with the final step replaced (release notes → timestamp prefix) to model intra-tier variance.

See profile-plan-run-economy-1.md for full parked-synthetic rationale.

Intra-tier Jaccard

Economy-1 vs economy-2 share 29/30 step titles (final step differs); union = 31. Jaccard = 29/31 ≈ 0.935 — well above any reasonable cross-tier floor. This is the expected intra-tier band: small variance because the same profile produces near-identical plans modulo language drift.

When real LLM-budget runs replace this synthetic, the empirical intra-tier Jaccard is expected to land in the 0.850.95 band per research/02. Cross-tier (economy vs premium) is the discriminating measurement and is documented in profile-jaccard-calibration.md.