ktg-plugin-marketplace/plugins/voyage/tests/synthetic/profile-plan-run-economy-1.md
Kjell Tore Guttormsen 90425073b2 test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin
Step 17 of v4.1 — escalate-handler invoked. Live LLM-budget ($60-120 for
4 plan-runs á /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.

Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md
fallback-strategy): document economy-Plan as parked, use balanced as
low-threshold profile, defer empirical calibration to v4.2.

Files:
  tests/synthetic/profile-plan-run-economy-1.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-economy-2.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-1.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-2.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-jaccard-calibration.md  — threshold 0.55 pinned per
                                                    research/02 conservative starting value

Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
  1. Cross-tier smoke-test (Step 18) flips red on a real run
  2. v4.2 LLM-budget approval
  3. New profile tier added
2026-05-09 09:54:45 +02:00


type: trekplan-synthetic
plan_version: 1.7
created: 2026-05-09
task: Add --verbose flag to CLI
slug: verbose-flag
run_id: economy-1
profile_used: economy
status: parked-synthetic
steps:
Add verbose flag config to package.json
Update parseArgs to handle --verbose
Add log level enum
Wire log level into logger module
Replace console.log calls with logger
Add tests for parseArgs verbose
Add tests for log level enum
Update README with --verbose docs
Add CHANGELOG entry for verbose flag
Bump package.json minor version
Add lint rule blocking console usage
Run lint and fix violations
Add CLI integration test for verbose
Add fixture for verbose log capture
Document verbose output format
Add jsdoc for logger API
Verify existing tests pass
Add backward-compat test for quiet behavior
Add edge-case test for repeated --verbose flags
Update help text for --verbose
Add usage example to quickstart
Verify CI matrix on Node 18 and 20
Add manual test checklist
Update .gitignore for log dumps
Add cleanup logic for stale logs
Verify exit code on verbose error
Add stderr routing for warnings
Update troubleshooting guide
Verify version sync across docs
Add benchmark for verbose emission

Synthetic plan run economy-1 — Add --verbose flag to CLI (PARKED)

This fixture is a SYNTHETIC PLACEHOLDER for the empirical Jaccard calibration, which requires a live LLM budget ($60-120 for 4 plan-runs). It is marked status: parked-synthetic per the Step 17 escalate-handler in plan.md.

Why parked

Per the NEXT-SESSION-PROMPT.local.md fallback: "If the Step 17 LLM budget blocks: document the economy Plan as parked in the calibration file and continue with Steps 18-19, using balanced as the low-threshold profile."

The session running v4.1-execute-4b was not authorized to invoke a live LLM via /trekplan --profile economy --brief examples/01-add-verbose-flag/brief.md. The synthetic fixtures here represent the shape such a run would produce: fewer total steps (30 vs 40 in baseline plan-run-A) and larger, coarser-grained steps that omit sub-verification and benchmark items.

How this fixture is consumed

tests/integration/profile-jaccard-smoke.test.mjs (Step 18) reads the steps array from the frontmatter and pairs it with the corresponding premium fixtures to compute cross-tier Jaccard.
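The comparison described above can be sketched as follows. This is a minimal illustration, not the contents of profile-jaccard-smoke.test.mjs: the helper names `normalizeStep` and `jaccard` are hypothetical, the sketch assumes Jaccard is computed over sets of whole step titles (the real test may tokenize differently), and the step arrays are toy data rather than the real fixtures. Only the 0.55 threshold comes from profile-jaccard-calibration.md; whether the real test treats it as a floor or a ceiling is not stated here, so the sketch only reports the comparison.

```javascript
// Hypothetical helper: normalise a step title so that case and
// whitespace differences do not count as mismatches.
function normalizeStep(step) {
  return step.trim().toLowerCase().replace(/\s+/g, " ");
}

// Jaccard similarity |A ∩ B| / |A ∪ B| over two arrays of step titles.
function jaccard(stepsA, stepsB) {
  const a = new Set(stepsA.map(normalizeStep));
  const b = new Set(stepsB.map(normalizeStep));
  // Convention chosen for this sketch: two empty plans count as identical.
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const step of a) {
    if (b.has(step)) intersection += 1;
  }
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Threshold pinned in profile-jaccard-calibration.md.
const THRESHOLD = 0.55;

// Toy data standing in for the frontmatter steps arrays (assumption).
const economySteps = [
  "Update parseArgs to handle --verbose",
  "Add log level enum",
  "Update README with --verbose docs",
];
const premiumSteps = [
  "Update parseArgs to handle --verbose",
  "Add log level enum",
  "Update README with --verbose docs",
  "Add edge-case test for repeated --verbose flags",
];

// 3 shared titles out of 4 distinct titles gives 0.75.
const score = jaccard(economySteps, premiumSteps);
console.log({ score, atOrAboveThreshold: score >= THRESHOLD });
```

Using sets of normalized titles keeps the measure independent of step order, which suits a comparison of plan content rather than plan sequencing.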

When real LLM budget is approved (deferred to v4.2), regenerate this fixture by running the actual command and overwriting the frontmatter steps array. Then update status: parked-synthetic to status: empirical.

Step-shape rationale

The economy profile uses sonnet for all phases (per lib/profiles/economy.yaml). Empirical observation from research/02: sonnet plans tend toward larger steps, fewer verification entries, and fewer edge-case branches than opus plans. The 30 entries here capture the typical gist and omit roughly 10 of the finer-grained items present in opus runs.
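As a rough sanity check on these step counts, and assuming Jaccard is taken over sets of whole step titles (an assumption; the smoke test may compare at a different granularity), the 30-step economy set E and the 40-step premium set P bound the cross-tier score as follows:

```latex
J(E, P) = \frac{|E \cap P|}{|E \cup P|}
        \le \frac{\min(|E|, |P|)}{\max(|E|, |P|)}
        = \frac{30}{40} = 0.75
```

Under that assumption, an economy plan whose steps all reappear in the premium plan would score 0.75, which is above the pinned 0.55.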