Kjell Tore Guttormsen 90425073b2 test(voyage): empirical jaccard calibration — parked-synthetic placeholders + threshold pin

Step 17 of v4.1 — escalate-handler invoked. Live LLM-budget ($60-120 for
4 plan-runs of /trekplan --profile {economy,premium} on
examples/01-add-verbose-flag/brief.md) was not authorized for the
v4.1-execute-4b session.

Per Step 17 escalate-fallback (and NEXT-SESSION-PROMPT.local.md
fallback-strategy): document economy-Plan as parked, use balanced as
low-threshold profile, defer empirical calibration to v4.2.

Files:
  tests/synthetic/profile-plan-run-economy-1.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-economy-2.md   — 30 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-1.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-plan-run-premium-2.md   — 40 steps, parked-synthetic
  tests/synthetic/profile-jaccard-calibration.md  — threshold 0.55 pinned per
                                                    research/02 conservative starting value

Replacement procedure documented in calibration.md "How to replace"
section. Trigger conditions for empirical re-run:
  1. Cross-tier smoke-test (Step 18) flips red on a real run
  2. v4.2 LLM-budget approval
  3. New profile tier added

2026-05-09 09:54:45 +02:00

3.3 KiB

type: trekplan-synthetic
plan_version: 1.7
created: 2026-05-09
task: Add --verbose flag to CLI
slug: verbose-flag
run_id: premium-1
profile_used: premium
status: parked-synthetic
steps:

Add config entry for verbose flag in package.json

Define types for verbose mode in types.ts

Update parseArgs to recognize --verbose flag

Pass verbose context through main entry point

Add log level enum (silent, normal, verbose)

Wire log level into logger module

Replace console.log with logger.info in handler.ts

Add tests for parseArgs --verbose recognition

Add tests for log level enum mapping

Update README with --verbose flag documentation

Add CHANGELOG entry for verbose flag

Bump package.json minor version

Add lint rule blocking direct console usage

Run lint and fix new violations

Add CLI integration test for --verbose end-to-end

Add fixture file for verbose log capture

Document verbose output format in docs/cli.md

Add jsdoc for new logger API

Verify all existing tests pass with verbose disabled

Add backward-compat test for legacy quiet behavior

Add edge-case test for repeated --verbose flags

Add edge-case test for --verbose with --silent collision

Update help text to list --verbose flag

Add usage example to docs/quickstart.md

Verify CI matrix runs on Node 18 and 20

Add npm script for verbose mode debugging

Run security audit on logger dependency tree

Verify no PII leaks in verbose log output

Add manual test checklist to CONTRIBUTING.md

Update .gitignore for verbose log dump files

Add cleanup logic for stale verbose logs

Add unit test for cleanup logic

Verify exit code on verbose mode error

Add stderr routing for warnings in verbose

Add timestamp prefix in verbose log lines

Add test for timestamp format

Update troubleshooting guide with verbose flag

Verify version sync across all docs

Add benchmark for verbose log emission cost

Document benchmark methodology in PERF.md

Synthetic plan run premium-1 — Add --verbose flag to CLI (PARKED)

This fixture is a SYNTHETIC PLACEHOLDER. It stands in for the empirical Jaccard calibration, which requires a live LLM budget ($60-120 for 4 plan-runs). It is marked status: parked-synthetic per the Step 17 escalate-handler.

Why parked

Same rationale as profile-plan-run-economy-1.md: the session running v4.1-execute-4b was not authorized for live LLM invocation. This fixture mirrors the existing baseline plan-run-A.md (40 steps, opus granularity), since the premium profile uses opus for the plan and review phases per lib/profiles/premium.yaml.

Step-shape rationale

The premium profile uses opus for the plan and review phases (per lib/profiles/premium.yaml). According to the empirical observations in research/02, opus plans tend toward finer-grained steps, more explicit verification entries, and richer edge-case decomposition than sonnet plans. The 40 entries here reflect the level of detail typical of an opus run.

Cross-tier Jaccard pairing

This fixture is paired with profile-plan-run-economy-1.md and profile-plan-run-economy-2.md in tests/integration/profile-jaccard-smoke.test.mjs (Step 18). The expected cross-tier Jaccard for the parked-synthetic run-pair is documented in profile-jaccard-calibration.md.
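
For orientation, a cross-tier Jaccard check over step titles could look roughly like the sketch below. It is not the code in profile-jaccard-smoke.test.mjs. The helper names (normalizeStep, jaccard), the normalization rule, and the pass direction (score at or above the threshold) are all assumptions made for illustration. The economy step list is an invented subset, not the real fixture content. Only the 0.55 threshold comes from the calibration file.

```javascript
// Hypothetical helper: normalize a step title so that case and whitespace
// differences alone do not count as a mismatch (assumed rule, not from the repo).
function normalizeStep(step) {
  return step.trim().toLowerCase().replace(/\s+/g, " ");
}

// Jaccard similarity |A ∩ B| / |A ∪ B| over two lists of step titles.
function jaccard(stepsA, stepsB) {
  const a = new Set(stepsA.map(normalizeStep));
  const b = new Set(stepsB.map(normalizeStep));
  // Convention chosen for this sketch: two empty plans count as identical.
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const step of a) {
    if (b.has(step)) shared += 1;
  }
  return shared / (a.size + b.size - shared);
}

// Threshold pinned in profile-jaccard-calibration.md.
const THRESHOLD = 0.55;

// Illustrative subsets only; the real fixtures hold 30 and 40 steps.
const premiumSteps = [
  "Update parseArgs to recognize --verbose flag",
  "Add tests for parseArgs --verbose recognition",
  "Update README with --verbose flag documentation",
  "Add edge-case test for repeated --verbose flags",
];
const economySteps = [
  "Update parseArgs to recognize --verbose flag",
  "Add tests for parseArgs --verbose recognition",
  "Update README with --verbose flag documentation",
];

// Assumption: a pair passes when its score is at or above the threshold.
const score = jaccard(premiumSteps, economySteps);
console.log(score.toFixed(2), score >= THRESHOLD ? "pass" : "fail");
```

Comparing sets of whole titles is the simplest possible choice. Plans that word the same step differently would score lower than a token-level comparison would report, which is one reason the threshold needs empirical runs to calibrate.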
