test(ultraplan-local): add synthetic ultrareview determinism fixtures

review-run-A.md (5 findings) and review-run-B.md (6 findings, A ⊂ B) form a
known-Jaccard fixture pair: |A ∩ B| = 5, |A ∪ B| = 6, Jaccard = 5/6 ≈ 0.833,
above the SC4 threshold of 0.70.

IDs are real 40-char SHA1s computed via lib/parsers/finding-id.mjs from
realistic (file, line, rule_key) triplets. Both fixtures pass
review-validator --strict (frontmatter + body sections + findings shape).

Real-LLM determinism measurement deferred to v1.1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 17:21:02 +02:00
commit 5aa37941ed
parent 09e7fb9364
3 changed files with 270 additions and 0 deletions
--- a/plugins/ultraplan-local/tests/fixtures/ultrareview/README.md
+++ b/plugins/ultraplan-local/tests/fixtures/ultrareview/README.md
@@ -0,0 +1,47 @@
+# ultrareview determinism fixtures
+
+Synthetic fixtures for the Jaccard-similarity determinism test in
+`tests/lib/review-determinism.test.mjs`.
+
+## What's here
+
+- `review-run-A.md` — synthetic review with 5 findings on a fictional JWT auth task
+- `review-run-B.md` — same fictional task, "re-reviewed" — same 5 findings as A plus 1 extra (a placeholder TODO that A missed)
+
+## Construction
+
+Run A's finding-IDs are a strict subset of Run B's (`A ⊂ B`), so:
+
+- Intersection: `|A ∩ B| = 5`
+- Union: `|A ∪ B| = 6`
+- Jaccard: `5 / 6 = 0.833…` (above the 0.70 SC4 threshold from `brief.md`)
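
The arithmetic above can be reproduced with a few lines of JavaScript. This is a minimal sketch, not the code in `tests/lib/review-determinism.test.mjs`: the `jaccard` function, the placeholder IDs, and the choice to score two empty sets as 1 are all assumptions made for illustration.

```javascript
// Jaccard similarity of two collections of finding IDs: |A ∩ B| / |A ∪ B|.
// Hypothetical helper for illustration; the real test may be structured differently.
function jaccard(idsA, idsB) {
  const a = new Set(idsA);
  const b = new Set(idsB);
  // Convention chosen for this sketch: two empty runs count as identical.
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const id of a) {
    if (b.has(id)) shared += 1;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (a.size + b.size - shared);
}

// Placeholder IDs standing in for the real 40-char SHA1s in the fixtures.
const runA = ['id-1', 'id-2', 'id-3', 'id-4', 'id-5'];
const runB = [...runA, 'id-6']; // A ⊂ B: one extra finding

const score = jaccard(runA, runB);
console.log(score.toFixed(3)); // 0.833
console.log(score >= 0.70);    // true
```

Because A is a subset of B, the intersection is A itself and the union is B, so the score reduces to `|A| / |B|`.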
+
+Each ID is a real 40-char SHA1 computed via `lib/parsers/finding-id.mjs`:
+`sha1(file:line:rule_key)`. Don't hand-edit the IDs — recompute via the helper if
+you change the underlying `(file, line, rule_key)` triplet, or both fixtures will
+fall out of sync.
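
For intuition, the `sha1(file:line:rule_key)` scheme can be sketched with Node's built-in `crypto` module. This is a hypothetical stand-in, not the contents of `lib/parsers/finding-id.mjs`: the function name `sketchFindingId` and the exact string joined with `:` are assumptions, and the real helper may normalize paths or encode fields differently. Use `computeFindingId` for anything that goes into a fixture.

```javascript
// Hypothetical stand-in for computeFindingId. The real helper may build its
// input string differently, so digests from this sketch are NOT fixture IDs.
async function sketchFindingId(file, line, ruleKey) {
  // Dynamic import works from both CommonJS and ES module contexts.
  const { createHash } = await import('node:crypto');
  return createHash('sha1')
    .update(`${file}:${line}:${ruleKey}`)
    .digest('hex');
}

sketchFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION').then((id) => {
  console.log(id.length);                 // 40
  console.log(/^[0-9a-f]{40}$/.test(id)); // true
});
```

The point of hashing the triplet is that the same `(file, line, rule_key)` always yields the same ID, which is what lets two runs be compared as sets.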
+
+## Why synthetic for v1.0
+
+Hand-curated for v1.0. To test new Jaccard scenarios, change which finding IDs each run lists.
+Real-LLM determinism measurement is deferred to v1.1 once `/ultrareview-local`
+has produced enough real outputs to capture as fixtures.
+
+These fixtures prove the Jaccard PIPELINE works given a known input — they do
+NOT measure real LLM determinism. The brief's SC4 (Jaccard ≥ 0.70 across two
+runs) is verified at the pipeline level today; capturing real LLM runs to
+verify the model-level claim is open work for v1.1.
+
+## Adding a new scenario
+
+1. Pick `(file, line, rule_key)` triplets — `rule_key` must be one of the 12
+   keys in `lib/review/rule-catalogue.mjs`.
+2. Compute IDs via:
+   ```bash
+   node -e "import('./lib/parsers/finding-id.mjs').then(({computeFindingId}) => console.log(computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION')))"
+   ```
+3. Add the IDs to `findings:` block-style YAML in frontmatter and to `### <id>`
+   subsections in the body.
+4. Run `node lib/validators/review-validator.mjs --json tests/fixtures/ultrareview/review-run-X.md`
+   to confirm the fixture validates.
+5. Update `tests/lib/review-determinism.test.mjs` if you want a new assertion.