# ultrareview determinism fixtures
Synthetic fixtures for the Jaccard-similarity determinism test in
`tests/lib/review-determinism.test.mjs`.
## What's here

- `review-run-A.md` — synthetic review with 5 findings on a fictional JWT auth task
- `review-run-B.md` — same fictional task, "re-reviewed" — same 5 findings as A plus 1 extra (a placeholder TODO that A missed)
## Construction

Run A's finding-IDs are a strict subset of Run B's (A ⊂ B), so:

- Intersection: `|A ∩ B| = 5`
- Union: `|A ∪ B| = 6`
- Jaccard: `5 / 6 = 0.833…` (above the 0.70 SC4 threshold from `brief.md`)
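The subset construction above can be sketched directly. Placeholder strings stand in for the fixtures' real 40-char SHA1 IDs:

```javascript
// Sketch: Jaccard similarity over two runs' finding-ID sets.
function jaccard(a, b) {
  const A = new Set(a);
  const B = new Set(b);
  const intersection = [...A].filter((id) => B.has(id)).length;
  const union = new Set([...A, ...B]).size;
  return intersection / union;
}

// Mirror the construction above: A ⊂ B, |A| = 5, |B| = 6.
const runA = ['id-1', 'id-2', 'id-3', 'id-4', 'id-5'];
const runB = [...runA, 'id-6'];

console.log(jaccard(runA, runB)); // 5/6 ≈ 0.833, above the 0.70 SC4 threshold
```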
Each ID is a real 40-char SHA1 computed via `lib/parsers/finding-id.mjs`:
`sha1(file:line:rule_key)`. Don't hand-edit the IDs — recompute via the helper if
you change the underlying `(file, line, rule_key)` triplet, or both fixtures will
fall out of sync.
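The scheme can be reproduced inline with `node:crypto`. This is a sketch under the assumption that the helper hashes the plain `file:line:rule_key` string as described above; the canonical implementation remains `lib/parsers/finding-id.mjs`:

```javascript
// Sketch of the ID scheme: sha1 over "file:line:rule_key".
// Assumption: lib/parsers/finding-id.mjs hashes this exact string shape.
import { createHash } from 'node:crypto';

function computeFindingId(file, line, ruleKey) {
  return createHash('sha1')
    .update(`${file}:${line}:${ruleKey}`)
    .digest('hex'); // always 40 lowercase hex chars
}

const id = computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION');
console.log(id.length, /^[0-9a-f]{40}$/.test(id)); // 40 true
```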
## Why synthetic for v1.0

The fixtures are hand-curated for v1.0. To test a new Jaccard scenario, recompute the frontmatter IDs with the helper (see "Adding a new scenario") rather than editing them by hand.
Real-LLM determinism measurement is deferred to v1.1, once `/ultrareview-local`
has produced enough real outputs to capture as fixtures.
These fixtures prove the Jaccard *pipeline* works given a known input — they do **not** measure real LLM determinism. The brief's SC4 (Jaccard ≥ 0.70 across two runs) is verified at the pipeline level today; capturing real LLM runs to verify the model-level claim is open work for v1.1.
## Adding a new scenario

- Pick `(file, line, rule_key)` triplets — `rule_key` must be one of the 12 keys in `lib/review/rule-catalogue.mjs`.
- Compute IDs via `node -e "import('./lib/parsers/finding-id.mjs').then(({computeFindingId}) => console.log(computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION')))"`.
- Add the IDs to the `findings:` block-style YAML in the frontmatter and to `### <id>` subsections in the body.
- Run `node lib/validators/review-validator.mjs --json tests/fixtures/ultrareview/review-run-X.md` to confirm the fixture validates.
- Update `tests/lib/review-determinism.test.mjs` if you want a new assertion.
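The pick/compute/add steps above can be batched in a small throwaway script. This is a sketch assuming the same `sha1(file:line:rule_key)` scheme described under Construction; the file paths and line numbers are hypothetical:

```javascript
// Sketch: generate a block-style YAML `findings:` list for a new fixture.
// Assumes the sha1(file:line:rule_key) scheme; the real helper is
// lib/parsers/finding-id.mjs. Triplet paths/lines here are hypothetical.
import { createHash } from 'node:crypto';

const triplets = [
  ['lib/foo.mjs', 42, 'SECURITY_INJECTION'],
  ['lib/bar.mjs', 7, 'SECURITY_INJECTION'],
];

const ids = triplets.map(([file, line, ruleKey]) =>
  createHash('sha1').update(`${file}:${line}:${ruleKey}`).digest('hex'),
);

// Emit the frontmatter block; each ID is 40-char lowercase hex.
console.log('findings:');
for (const id of ids) console.log(`  - ${id}`);
```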