test(ultraplan-local): add synthetic ultrareview determinism fixtures

review-run-A.md (5 findings) and review-run-B.md (6 findings, A ⊂ B) form a known-Jaccard fixture pair: |A ∩ B| = 5, |A ∪ B| = 6, Jaccard = 5/6 = 0.833, above the SC4 threshold of 0.70. IDs are real 40-char SHA1s computed via lib/parsers/finding-id.mjs from realistic (file, line, rule_key) triplets. Both fixtures pass review-validator --strict (frontmatter + body sections + findings shape). Real-LLM determinism measurement deferred to v1.1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 17:21:02 +02:00 · 2026-05-01 17:21:02 +02:00 · 5aa37941ed
commit 5aa37941ed
parent 09e7fb9364
3 changed files with 270 additions and 0 deletions
--- a/plugins/ultraplan-local/tests/fixtures/ultrareview/README.md
+++ b/plugins/ultraplan-local/tests/fixtures/ultrareview/README.md
@ -0,0 +1,47 @@
+# ultrareview determinism fixtures
+
+Synthetic fixtures for the Jaccard-similarity determinism test in
+`tests/lib/review-determinism.test.mjs`.
+
+## What's here
+
+- `review-run-A.md` — synthetic review with 5 findings on a fictional JWT auth task
+- `review-run-B.md` — same fictional task, "re-reviewed" — same 5 findings as A plus 1 extra (a placeholder TODO that A missed)
+
+## Construction
+
+Run A's finding-IDs are a strict subset of Run B's (`A ⊂ B`), so:
+
+- Intersection: `|A ∩ B| = 5`
+- Union: `|A ∪ B| = 6`
+- Jaccard: `5 / 6 = 0.833…` (above the 0.70 SC4 threshold from `brief.md`)
+
+Each ID is a real 40-char SHA1 computed via `lib/parsers/finding-id.mjs`:
+`sha1(file:line:rule_key)`. Don't hand-edit the IDs — recompute via the helper if
+you change the underlying `(file, line, rule_key)` triplet, or both fixtures will
+fall out of sync.
+
+## Why synthetic for v1.0
+
+Hand-curated for v1.0. Edit JSON IDs directly to test new Jaccard scenarios.
+Real-LLM determinism measurement is deferred to v1.1 once `/ultrareview-local`
+has produced enough real outputs to capture as fixtures.
+
+These fixtures prove the Jaccard PIPELINE works given a known input — they do
+NOT measure real LLM determinism. The brief's SC4 (Jaccard ≥ 0.70 across two
+runs) is verified at the pipeline level today; capturing real LLM runs to
+verify the model-level claim is open work for v1.1.
+
+## Adding a new scenario
+
+1. Pick `(file, line, rule_key)` triplets — `rule_key` must be one of the 12
+   keys in `lib/review/rule-catalogue.mjs`.
+2. Compute IDs via:
+   ```bash
+   node -e "import('./lib/parsers/finding-id.mjs').then(({computeFindingId}) => console.log(computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION')))"
+   ```
+3. Add the IDs to `findings:` block-style YAML in frontmatter and to `### <id>`
+   subsections in the body.
+4. Run `node lib/validators/review-validator.mjs --json tests/fixtures/ultrareview/review-run-X.md`
+   to confirm the fixture validates.
+5. Update `tests/lib/review-determinism.test.mjs` if you want a new assertion.
--- a/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md
+++ b/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md
@ -0,0 +1,106 @@
+---
+type: ultrareview
+review_version: "1.0"
+created: 2026-05-01
+task: "Add JWT authentication with refresh-token rotation"
+slug: jwt-auth
+project_dir: .claude/projects/2026-05-01-jwt-auth/
+brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md
+scope_sha_start: 0123456789abcdef0123456789abcdef01234567
+scope_sha_end: fedcba9876543210fedcba9876543210fedcba98
+reviewed_files_count: 3
+verdict: WARN
+findings:
+  - d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
+  - 7861519c326c207aabf17072db51c469bebc217b
+  - 400dfcff81e0e219eb04a7123c68ae870696f121
+  - 763d174e6c519fafbadcba5d1706708479e36e61
+  - 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
+---
+
+# Review: Add JWT authentication with refresh-token rotation (Run A)
+
+## Executive Summary
+
+Implementation hits the brief's core success criteria (login + refresh + logout) but
+has one BLOCKER and four MAJOR/MINOR issues. Verdict: **WARN** — fix the BLOCKER
+before merge; the MAJORs should land in a follow-up plan.
+
+This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline. It is
+NOT the output of a real LLM review.
+
+## Coverage
+
+| File | Treatment | Reason |
+|---|---|---|
+| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) |
+| `lib/handlers/login.mjs` | deep-review | Auth surface |
+| `lib/handlers/logout.mjs` | deep-review | Auth surface |
+| `package-lock.json` | skip | Lockfile |
+| `dist/**` | skip | Build output |
+
+## Findings (BLOCKER)
+
+### 763d174e6c519fafbadcba5d1706708479e36e61
+
+- **Location:** `lib/handlers/login.mjs:23`
+- **Rule:** `UNIMPLEMENTED_CRITERION`
+- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials")
+- **Evidence:** Handler returns 200 with empty body when password mismatch occurs.
+- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2.
+
+## Findings (MAJOR)
+
+### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
+
+- **Location:** `lib/auth/jwt.mjs:42`
+- **Rule:** `SECURITY_INJECTION`
+- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header")
+- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body.
+- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg.
+
+### 7861519c326c207aabf17072db51c469bebc217b
+
+- **Location:** `lib/auth/jwt.mjs:88`
+- **Rule:** `MISSING_TEST`
+- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh")
+- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised.
+- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window.
+
+### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
+
+- **Location:** `lib/handlers/login.mjs:56`
+- **Rule:** `PLAN_EXECUTE_DRIFT`
+- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()")
+- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers.
+- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan.
+
+## Findings (MINOR)
+
+### 400dfcff81e0e219eb04a7123c68ae870696f121
+
+- **Location:** `lib/auth/jwt.mjs:117`
+- **Rule:** `MISSING_ERROR_HANDLING`
+- **Brief ref:** none (engineering hygiene)
+- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler.
+- **Fix:** Wrap in try/catch; log + 503 on store failure.
+
+## Remediation Summary
+
+5 findings total: 1 BLOCKER, 3 MAJOR, 1 MINOR. Run a remediation plan via
+`/ultraplan-local --brief review.md` — it will pick up BLOCKER + MAJOR findings as
+plan goals and emit `source_findings: [<id>, ...]` audit trail (Handover 6).
+
+```json
+{
+  "fixture_kind": "synthetic-v1.0",
+  "jaccard_with_run_B": "5/6 = 0.833",
+  "findings": [
+    {"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23},
+    {"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42},
+    {"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88},
+    {"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56},
+    {"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117}
+  ]
+}
+```
--- a/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md
+++ b/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md
@ -0,0 +1,117 @@
+---
+type: ultrareview
+review_version: "1.0"
+created: 2026-05-01
+task: "Add JWT authentication with refresh-token rotation"
+slug: jwt-auth
+project_dir: .claude/projects/2026-05-01-jwt-auth/
+brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md
+scope_sha_start: 0123456789abcdef0123456789abcdef01234567
+scope_sha_end: fedcba9876543210fedcba9876543210fedcba98
+reviewed_files_count: 3
+verdict: WARN
+findings:
+  - d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
+  - 7861519c326c207aabf17072db51c469bebc217b
+  - 400dfcff81e0e219eb04a7123c68ae870696f121
+  - 763d174e6c519fafbadcba5d1706708479e36e61
+  - 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
+  - bf3e8b347cf4269ad005a9cf64dab6f601345704
+---
+
+# Review: Add JWT authentication with refresh-token rotation (Run B)
+
+## Executive Summary
+
+Same diff as Run A, re-reviewed independently to test determinism. This run found
+the same 5 findings plus one extra (a placeholder TODO in logout.mjs that Run A
+missed). Verdict: **WARN** — same as Run A; the extra finding is MAJOR but does
+not change the merge gate.
+
+This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline.
+Run A's set is a strict subset of Run B's set, giving Jaccard = 5/6 = 0.833.
+
+## Coverage
+
+| File | Treatment | Reason |
+|---|---|---|
+| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) |
+| `lib/handlers/login.mjs` | deep-review | Auth surface |
+| `lib/handlers/logout.mjs` | deep-review | Auth surface |
+| `package-lock.json` | skip | Lockfile |
+| `dist/**` | skip | Build output |
+
+## Findings (BLOCKER)
+
+### 763d174e6c519fafbadcba5d1706708479e36e61
+
+- **Location:** `lib/handlers/login.mjs:23`
+- **Rule:** `UNIMPLEMENTED_CRITERION`
+- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials")
+- **Evidence:** Handler returns 200 with empty body when password mismatch occurs.
+- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2.
+
+## Findings (MAJOR)
+
+### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
+
+- **Location:** `lib/auth/jwt.mjs:42`
+- **Rule:** `SECURITY_INJECTION`
+- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header")
+- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body.
+- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg.
+
+### 7861519c326c207aabf17072db51c469bebc217b
+
+- **Location:** `lib/auth/jwt.mjs:88`
+- **Rule:** `MISSING_TEST`
+- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh")
+- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised.
+- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window.
+
+### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
+
+- **Location:** `lib/handlers/login.mjs:56`
+- **Rule:** `PLAN_EXECUTE_DRIFT`
+- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()")
+- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers.
+- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan.
+
+### bf3e8b347cf4269ad005a9cf64dab6f601345704
+
+- **Location:** `lib/handlers/logout.mjs:14`
+- **Rule:** `PLACEHOLDER_IN_CODE`
+- **Brief ref:** none (Rule 7a violation)
+- **Evidence:** `// TODO: invalidate refresh-token cookie before responding` — left in committed code.
+- **Fix:** Implement the cookie invalidation or remove the comment with an issue link.
+
+## Findings (MINOR)
+
+### 400dfcff81e0e219eb04a7123c68ae870696f121
+
+- **Location:** `lib/auth/jwt.mjs:117`
+- **Rule:** `MISSING_ERROR_HANDLING`
+- **Brief ref:** none (engineering hygiene)
+- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler.
+- **Fix:** Wrap in try/catch; log + 503 on store failure.
+
+## Remediation Summary
+
+6 findings total: 1 BLOCKER, 4 MAJOR, 1 MINOR. Same merge gate as Run A; one
+extra MAJOR (PLACEHOLDER_IN_CODE) that Run A missed. Run a remediation plan via
+`/ultraplan-local --brief review.md`.
+
+```json
+{
+  "fixture_kind": "synthetic-v1.0",
+  "jaccard_with_run_A": "5/6 = 0.833",
+  "findings": [
+    {"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23},
+    {"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42},
+    {"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88},
+    {"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56},
+    {"id": "bf3e8b347cf4269ad005a9cf64dab6f601345704", "severity": "MAJOR", "rule": "PLACEHOLDER_IN_CODE", "file": "lib/handlers/logout.mjs", "line": 14},
+    {"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117}
+  ]
+}
+```