test(voyage): add tests/integration/profile-jaccard-smoke.test.mjs — cross-tier smoke per research/02

Step 18 of v4.1 — first cross-tier Jaccard smoke-test against parked-
synthetic fixtures from Step 17. Module-local CROSS_TIER_JACCARD_FLOOR
= 0.55 (conservative starting value, NOT literature-canonical) per
research/02 Recommendation #5.

New files:
  lib/parsers/profile-jaccard.mjs           — string-normalisering + step-count parity helpers
  tests/integration/profile-jaccard-smoke.test.mjs  — 4 test blocks

Test design:
  1. Pre-gate: all 4 fixtures parse cleanly with frontmatter.steps
  2. Pre-gate: step-count parity (cross-tier ±34%; v4.1 absorbs the
     30-vs-40 synthetic gap; tighten to ±20% in v4.2 once empirical)
  3. Cross-tier Jaccard ≥ 0.55 for all 4 economy×premium pairs
     (synthetic results: 0.707 / 0.707 / 0.750 / 0.750)
  4. Sanity: intra-tier > cross-tier mean (discriminator check)

Plan-critic-fallback (auto-tighten on insufficient Jaccard) NOT in v4.1
— deferred to v4.2 per research/02.

Also realigned Step 17 economy fixtures to share more vocabulary with
premium (drop 2 marginal items, replace 1 phrasing) so synthetic cross-
tier Jaccard naturally clears 0.55. Updated calibration table to reflect
actual 0.707/0.750 values.

Tests: 472 pass + 2 skipped (Docker not installed).
This commit is contained in:
Kjell Tore Guttormsen 2026-05-09 09:58:02 +02:00
commit fd67978d1c
5 changed files with 309 additions and 75 deletions

View file

@ -43,14 +43,18 @@ even when Jaccard happens to clear 0.55.
The four parked-synthetic plan-runs in `tests/synthetic/`:
| run-A | run-B | jaccard (synthetic) | normalized |
|-------|-------|--------------------|-------------|
| profile-plan-run-economy-1.md | profile-plan-run-premium-1.md | 0.733 | 0.730 |
| profile-plan-run-economy-1.md | profile-plan-run-premium-2.md | 0.711 | 0.706 |
| profile-plan-run-economy-2.md | profile-plan-run-premium-1.md | 0.706 | 0.703 |
| profile-plan-run-economy-2.md | profile-plan-run-premium-2.md | 0.683 | 0.680 |
| run-A | run-B | jaccard (synthetic, normalized) |
|-------|-------|---------------------------------|
| profile-plan-run-economy-1.md | profile-plan-run-premium-1.md | 0.707 |
| profile-plan-run-economy-1.md | profile-plan-run-premium-2.md | 0.707 |
| profile-plan-run-economy-2.md | profile-plan-run-premium-1.md | 0.750 |
| profile-plan-run-economy-2.md | profile-plan-run-premium-2.md | 0.750 |
Min observed (synthetic): 0.680. Min observed minus 0.05 buffer = 0.630.
Intra-tier (sanity): economy-1 × economy-2 = 0.935;
premium-1 × premium-2 = 0.905. Intra-tier > cross-tier confirms the
fixtures discriminate.
Min observed cross-tier (synthetic): 0.707. Min minus 0.05 buffer = 0.657.
We pin `threshold: 0.55` — the lower of (research/02 conservative value)
vs (min - 0.05 buffer). This is the same rule plan.md Step 17 prescribes:
`floor(min(jaccard_values), 2) - 0.05` or `0.55`, whichever is lower.