ktg-plugin-marketplace/plugins/ultraplan-local/tests/fixtures/ultrareview
Kjell Tore Guttormsen ea715b65de test(ultraplan-local): add SC3(b) source_findings structural test
Synthetic plan.md fixture with source_findings: block-style YAML list of 3
40-char hex IDs in frontmatter, plus minimal plan structure (Title +
Implementation Plan + 1 Step + Manifest). 3 tests verify:

1. plan-validator accepts a plan with source_findings (additive optional field)
2. frontmatter parser extracts source_findings as array of strings
3. each ID matches the 40-char lowercase hex format from finding-id.mjs

Closes the SC3(b) gap flagged by adversarial review (scope-guardian Gap 2).
LLM-level behavior (planner emitting source_findings) remains non-testable
without live invocation; this covers the structural contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 17:23:30 +02:00
plan-with-source-findings.md test(ultraplan-local): add SC3(b) source_findings structural test 2026-05-01 17:23:30 +02:00
README.md test(ultraplan-local): add synthetic ultrareview determinism fixtures 2026-05-01 17:21:02 +02:00
review-run-A.md test(ultraplan-local): add synthetic ultrareview determinism fixtures 2026-05-01 17:21:02 +02:00
review-run-B.md test(ultraplan-local): add synthetic ultrareview determinism fixtures 2026-05-01 17:21:02 +02:00

ultrareview determinism fixtures

Synthetic fixtures for the Jaccard-similarity determinism test in tests/lib/review-determinism.test.mjs.

What's here

  • review-run-A.md — synthetic review with 5 findings on a fictional JWT auth task
  • review-run-B.md — same fictional task, "re-reviewed" — same 5 findings as A plus 1 extra (a placeholder TODO that A missed)

Construction

Run A's finding-IDs are a strict subset of Run B's (A ⊂ B), so:

  • Intersection: |A ∩ B| = 5
  • Union: |A ∪ B| = 6
  • Jaccard: 5 / 6 = 0.833… (above the 0.70 SC4 threshold from brief.md)
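
The arithmetic above can be reproduced with a small set-based sketch. This is not the code in tests/lib/review-determinism.test.mjs; the function name `jaccard`, the placeholder IDs, and the choice to score two empty runs as 1 are assumptions made for illustration.

```javascript
// Jaccard similarity of two collections of finding IDs: |A ∩ B| / |A ∪ B|.
function jaccard(idsA, idsB) {
  const a = new Set(idsA);
  const b = new Set(idsB);
  // Assumption: two empty runs are treated as identical (score 1).
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const id of a) {
    if (b.has(id)) intersection += 1;
  }
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Placeholder IDs standing in for the real 40-char SHA1s in the fixtures.
const runA = ['f1', 'f2', 'f3', 'f4', 'f5'];
const runB = [...runA, 'f6']; // A is a strict subset of B

const score = jaccard(runA, runB);
console.log(score.toFixed(3), score >= 0.70); // 0.833 true
```

Building both sides as `Set`s keeps the result independent of finding order and of accidental duplicate IDs within a run.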

Each ID is a real 40-char SHA1 computed via lib/parsers/finding-id.mjs: sha1(file:line:rule_key). Don't hand-edit the IDs — recompute via the helper if you change the underlying (file, line, rule_key) triplet, or both fixtures will fall out of sync.
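
To make the ID shape concrete, here is a hedged sketch of what a `sha1(file:line:rule_key)` computation could look like. It assumes the three parts are joined with colons exactly as written; the real `computeFindingId` in lib/parsers/finding-id.mjs may normalise its inputs differently, so use that helper for fixture IDs. `sketchFindingId` and `FINDING_ID_PATTERN` are hypothetical names.

```javascript
// Assumption: the ID is the SHA1 hex digest of "file:line:rule_key".
// This is an illustration only, not the project's computeFindingId.
async function sketchFindingId(file, line, ruleKey) {
  const { createHash } = await import('node:crypto');
  return createHash('sha1').update(`${file}:${line}:${ruleKey}`).digest('hex');
}

// 40 lowercase hex characters, the format the fixtures rely on.
const FINDING_ID_PATTERN = /^[0-9a-f]{40}$/;

sketchFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION').then((id) => {
  console.log(id, FINDING_ID_PATTERN.test(id));
});
```

The same triplet always hashes to the same ID, which is why changing a file, line, or rule key in one fixture requires recomputing the ID in both.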

Why synthetic for v1.0

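The fixtures are hand-curated for v1.0. To test a new Jaccard scenario, change which findings each run contains and recompute the affected IDs with the helper rather than typing them by hand. Real-LLM determinism measurement is deferred to v1.1, once /ultrareview-local has produced enough real outputs to capture as fixtures.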

These fixtures prove the Jaccard PIPELINE works given a known input — they do NOT measure real LLM determinism. The brief's SC4 (Jaccard ≥ 0.70 across two runs) is verified at the pipeline level today; capturing real LLM runs to verify the model-level claim is open work for v1.1.

Adding a new scenario

  1. Pick (file, line, rule_key) triplets — rule_key must be one of the 12 keys in lib/review/rule-catalogue.mjs.
  2. Compute IDs via:
    node -e "import('./lib/parsers/finding-id.mjs').then(({computeFindingId}) => console.log(computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION')))"
    
  3. Add the IDs to findings: block-style YAML in frontmatter and to ### <id> subsections in the body.
  4. Run node lib/validators/review-validator.mjs --json tests/fixtures/ultrareview/review-run-X.md to confirm the fixture validates.
  5. Update tests/lib/review-determinism.test.mjs if you want a new assertion.
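
After step 3, it can be useful to read the IDs back out of a fixture and confirm the list is what you intended. The sketch below is a minimal, hypothetical extractor for a block-style YAML list in Markdown frontmatter; it is not the project's frontmatter parser, and `extractBlockList` handles only the simple `key:` followed by `- value` lines, not general YAML.

```javascript
// Hypothetical helper: pull a block-style YAML list out of Markdown frontmatter.
// Supports only "key:" followed by "  - value" lines; not a general YAML parser.
function extractBlockList(markdown, key) {
  const lines = markdown.split(/\r?\n/);
  if (lines[0].trim() !== '---') return []; // no frontmatter
  const items = [];
  let inList = false;
  for (const line of lines.slice(1)) {
    if (line.trim() === '---') break; // end of frontmatter
    if (!inList) {
      inList = line.trim() === `${key}:`;
      continue;
    }
    const match = /^\s*-\s+(.+?)\s*$/.exec(line);
    if (!match) break; // first non-item line ends the list
    items.push(match[1].replace(/^["']|["']$/g, ''));
  }
  return items;
}

// Synthetic frontmatter with placeholder IDs (not real finding IDs).
const sampleFixture = [
  '---',
  'title: demo review',
  'findings:',
  '  - ' + 'a'.repeat(40),
  '  - ' + 'b'.repeat(40),
  'status: draft',
  '---',
  '### ' + 'a'.repeat(40),
].join('\n');

console.log(extractBlockList(sampleFixture, 'findings').length); // 2
```

The review-validator run in step 4 remains the authoritative check; this only helps spot a missing or mistyped list entry before running it.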