Kjell Tore Guttormsen 90425073b2 test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin

Step 17 of v4.1 — escalate-handler invoked. Live LLM-budget ($60-120 for
4 plan runs of /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.

Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md
fallback-strategy): document economy-Plan as parked, use balanced as
low-threshold profile, defer empirical calibration to v4.2.

Files:
  tests/synthetic/profile-plan-run-economy-1.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-economy-2.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-1.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-2.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-jaccard-calibration.md  — threshold 0.55 pinned per
                                                    research/02 conservative starting value

Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
  1. Cross-tier smoke-test (Step 18) flips red on a real run
  2. v4.2 LLM-budget approval
  3. New profile tier added

2026-05-09 09:54:45 +02:00


type: trekplan-synthetic
plan_version: 1.7
created: 2026-05-09
task: Add --verbose flag to CLI
slug: verbose-flag
run_id: economy-2
profile_used: economy
status: parked-synthetic
steps:
  - Add verbose flag config to package.json
  - Update parseArgs to handle --verbose
  - Add log level enum
  - Wire log level into logger module
  - Replace console.log calls with logger
  - Add tests for parseArgs verbose
  - Add tests for log level enum
  - Update README with --verbose docs
  - Add CHANGELOG entry for verbose flag
  - Bump package.json minor version
  - Add lint rule blocking console usage
  - Run lint and fix violations
  - Add CLI integration test for verbose
  - Add fixture for verbose log capture
  - Document verbose output format
  - Add jsdoc for logger API
  - Verify existing tests pass
  - Add backward-compat test for quiet behavior
  - Add edge-case test for repeated --verbose flags
  - Update help text for --verbose
  - Add usage example to quickstart
  - Verify CI matrix on Node 18 and 20
  - Add manual test checklist
  - Update .gitignore for log dumps
  - Add cleanup logic for stale logs
  - Verify exit code on verbose error
  - Add stderr routing for warnings
  - Update troubleshooting guide
  - Verify version sync across docs
  - Add timestamp prefix to verbose lines

Synthetic plan run economy-2 — Add --verbose flag to CLI (PARKED)

Companion fixture to profile-plan-run-economy-1.md. Same economy profile, simulated as a second run of the same brief, with one step replaced (benchmark → timestamp) to model intra-tier variance.

See profile-plan-run-economy-1.md for full parked-synthetic rationale.

Intra-tier Jaccard

Economy-1 and economy-2 share 29 of their 30 step titles (one differs), so the union holds 30 + 30 − 29 = 31 titles and Jaccard = 29/31 ≈ 0.935, well above any reasonable cross-tier floor. This is the expected intra-tier band: variance is small because the same profile produces near-identical plans modulo language drift.
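The arithmetic above can be reproduced with a minimal sketch. It assumes step titles are compared as exact strings in a set, which may not match how the real calibration normalizes titles. The helper name `jaccard` and the placeholder titles are hypothetical and stand in for the fixtures' actual step lists.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of step titles."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty plans count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical stand-in titles: 29 shared steps plus one differing step per run.
shared = [f"shared step {i}" for i in range(29)]
economy_1 = shared + ["benchmark step (title assumed)"]
economy_2 = shared + ["Add timestamp prefix to verbose lines"]

score = jaccard(economy_1, economy_2)
print(round(score, 3))  # 0.935
```

Sets are used because the measure only needs to know which titles appear in each plan; step order and any duplicate titles are ignored in this sketch.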

When real LLM-budget runs replace this synthetic, the empirical intra-tier Jaccard is expected to land in the 0.85–0.95 band per research/02. Cross-tier (economy vs premium) is the discriminating measurement and is documented in profile-jaccard-calibration.md.
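Once real runs exist, the band expectation could be checked with something like the sketch below. The band limits come from the paragraph above. The name `in_expected_band` and the inclusive comparison are assumptions, not part of the documented procedure.

```python
# Expected intra-tier band quoted above (attributed to research/02).
EXPECTED_INTRA_TIER_BAND = (0.85, 0.95)


def in_expected_band(score, band=EXPECTED_INTRA_TIER_BAND):
    """Return True when score lies inside the band (inclusive bounds are an assumption)."""
    low, high = band
    return low <= score <= high


# The synthetic economy-1 vs economy-2 score from the section above.
synthetic_score = 29 / 31
print(in_expected_band(synthetic_score))  # True
```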
