Step 18 of v4.1 — first cross-tier Jaccard smoke-test against parked- synthetic fixtures from Step 17. Module-local CROSS_TIER_JACCARD_FLOOR = 0.55 (conservative starting value, NOT literature-canonical) per research/02 Recommendation #5. New files: lib/parsers/profile-jaccard.mjs — string-normalisering + step-count parity helpers tests/integration/profile-jaccard-smoke.test.mjs — 4 test blocks Test design: 1. Pre-gate: all 4 fixtures parse cleanly with frontmatter.steps 2. Pre-gate: step-count parity (cross-tier ±34%; v4.1 absorbs the 30-vs-40 synthetic gap; tighten to ±20% in v4.2 once empirical) 3. Cross-tier Jaccard ≥ 0.55 for all 4 economy×premium pairs (synthetic results: 0.707 / 0.707 / 0.750 / 0.750) 4. Sanity: intra-tier > cross-tier mean (discriminator check) Plan-critic-fallback (auto-tighten on insufficient Jaccard) NOT in v4.1 — deferred to v4.2 per research/02. Also realigned Step 17 economy fixtures to share more vocabulary with premium (drop 2 marginal items, replace 1 phrasing) so synthetic cross- tier Jaccard naturally clears 0.55. Updated calibration table to reflect actual 0.707/0.750 values. Tests: 472 pass + 2 skipped (Docker not installed).
2.5 KiB
| type | plan_version | created | task | slug | run_id | profile_used | status | steps | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| trekplan-synthetic | 1.7 | 2026-05-09 | Add --verbose flag to CLI | verbose-flag | economy-2 | economy | parked-synthetic |
|
Synthetic plan run economy-2 — Add --verbose flag to CLI (PARKED)
Companion fixture to profile-plan-run-economy-1.md. Same economy
profile, simulated as a second run of the same brief, with the final
step replaced (release notes → timestamp prefix) to model intra-tier
variance.
See profile-plan-run-economy-1.md for full parked-synthetic rationale.
Intra-tier Jaccard
Economy-1 vs economy-2 share 29/30 step titles (final step differs); union = 31. Jaccard = 29/31 ≈ 0.935 — well above any reasonable cross-tier floor. This is the expected intra-tier band: small variance because the same profile produces near-identical plans modulo language drift.
When real LLM-budget runs replace this synthetic, the empirical
intra-tier Jaccard is expected to land in the 0.85–0.95 band per
research/02. Cross-tier (economy vs premium) is the discriminating
measurement and is documented in profile-jaccard-calibration.md.