Step 17 of v4.1 — escalate-handler invoked. Live LLM-budget ($60-120 for
4 plan-runs á /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.
Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md
fallback-strategy): document economy-Plan as parked, use balanced as
low-threshold profile, defer empirical calibration to v4.2.
Files:
tests/synthetic/profile-plan-run-economy-1.md — 30 steps, parked-synthetic
tests/synthetic/profile-plan-run-economy-2.md — 30 steps, parked-synthetic
tests/synthetic/profile-plan-run-premium-1.md — 40 steps, parked-synthetic
tests/synthetic/profile-plan-run-premium-2.md — 40 steps, parked-synthetic
tests/synthetic/profile-jaccard-calibration.md — threshold 0.55 pinned per
research/02 conservative starting value
Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
1. Cross-tier smoke-test (Step 18) flips red on a real run
2. v4.2 LLM-budget approval
3. New profile tier added
2.2 KiB
| type | plan_version | created | task | slug | run_id | profile_used | status | steps | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| trekplan-synthetic | 1.7 | 2026-05-09 | Add --verbose flag to CLI | verbose-flag | economy-2 | economy | parked-synthetic |
|
Synthetic plan run economy-2 — Add --verbose flag to CLI (PARKED)
Companion fixture to profile-plan-run-economy-1.md. Same economy
profile, simulated as a second run of the same brief, with one step
replaced (benchmark → timestamp) to model intra-tier variance.
See profile-plan-run-economy-1.md for full parked-synthetic rationale.
Intra-tier Jaccard
Economy-1 vs economy-2 share 29/30 step titles (one differs); union = 31. Jaccard = 29/31 ≈ 0.935 — well above any reasonable cross-tier floor. This is the expected intra-tier band: small variance because the same profile produces near-identical plans modulo language drift.
When real LLM-budget runs replace this synthetic, the empirical
intra-tier Jaccard is expected to land in the 0.85–0.95 band per
research/02. Cross-tier (economy vs premium) is the discriminating
measurement and is documented in profile-jaccard-calibration.md.