test(ultraplan-local): add synthetic ultrareview determinism fixtures

review-run-A.md (5 findings) and review-run-B.md (6 findings, A ⊂ B) form a
known-Jaccard fixture pair: |A ∩ B| = 5, |A ∪ B| = 6, Jaccard = 5/6 = 0.833,
above the SC4 threshold of 0.70. IDs are real 40-char SHA1s computed via
lib/parsers/finding-id.mjs from realistic (file, line, rule_key) triplets.

Both fixtures pass review-validator --strict (frontmatter + body sections +
findings shape). Real-LLM determinism measurement deferred to v1.1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Kjell Tore Guttormsen 2026-05-01 17:21:02 +02:00
commit 5aa37941ed
3 changed files with 270 additions and 0 deletions

View file

@ -0,0 +1,47 @@
# ultrareview determinism fixtures
Synthetic fixtures for the Jaccard-similarity determinism test in
`tests/lib/review-determinism.test.mjs`.
## What's here
- `review-run-A.md` — synthetic review with 5 findings on a fictional JWT auth task
- `review-run-B.md` — same fictional task, "re-reviewed" — same 5 findings as A plus 1 extra (a placeholder TODO that A missed)
## Construction
Run A's finding-IDs are a strict subset of Run B's (`A ⊂ B`), so:
- Intersection: `|A ∩ B| = 5`
- Union: `|A B| = 6`
- Jaccard: `5 / 6 = 0.833…` (above the 0.70 SC4 threshold from `brief.md`)
Each ID is a real 40-char SHA1 computed via `lib/parsers/finding-id.mjs`:
`sha1(file:line:rule_key)`. Don't hand-edit the IDs — recompute via the helper if
you change the underlying `(file, line, rule_key)` triplet, or both fixtures will
fall out of sync.
## Why synthetic for v1.0
Hand-curated for v1.0. Edit JSON IDs directly to test new Jaccard scenarios.
Real-LLM determinism measurement is deferred to v1.1 once `/ultrareview-local`
has produced enough real outputs to capture as fixtures.
These fixtures prove the Jaccard PIPELINE works given a known input — they do
NOT measure real LLM determinism. The brief's SC4 (Jaccard ≥ 0.70 across two
runs) is verified at the pipeline level today; capturing real LLM runs to
verify the model-level claim is open work for v1.1.
## Adding a new scenario
1. Pick `(file, line, rule_key)` triplets — `rule_key` must be one of the 12
keys in `lib/review/rule-catalogue.mjs`.
2. Compute IDs via:
```bash
node -e "import('./lib/parsers/finding-id.mjs').then(({computeFindingId}) => console.log(computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION')))"
```
3. Add the IDs to `findings:` block-style YAML in frontmatter and to `### <id>`
subsections in the body.
4. Run `node lib/validators/review-validator.mjs --json tests/fixtures/ultrareview/review-run-X.md`
to confirm the fixture validates.
5. Update `tests/lib/review-determinism.test.mjs` if you want a new assertion.

View file

@ -0,0 +1,106 @@
---
type: ultrareview
review_version: "1.0"
created: 2026-05-01
task: "Add JWT authentication with refresh-token rotation"
slug: jwt-auth
project_dir: .claude/projects/2026-05-01-jwt-auth/
brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md
scope_sha_start: 0123456789abcdef0123456789abcdef01234567
scope_sha_end: fedcba9876543210fedcba9876543210fedcba98
reviewed_files_count: 3
verdict: WARN
findings:
- d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
- 7861519c326c207aabf17072db51c469bebc217b
- 400dfcff81e0e219eb04a7123c68ae870696f121
- 763d174e6c519fafbadcba5d1706708479e36e61
- 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
---
# Review: Add JWT authentication with refresh-token rotation (Run A)
## Executive Summary
Implementation hits the brief's core success criteria (login + refresh + logout) but
has one BLOCKER and four MAJOR/MINOR issues. Verdict: **WARN** — fix the BLOCKER
before merge; the MAJORs should land in a follow-up plan.
This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline. It is
NOT the output of a real LLM review.
## Coverage
| File | Treatment | Reason |
|---|---|---|
| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) |
| `lib/handlers/login.mjs` | deep-review | Auth surface |
| `lib/handlers/logout.mjs` | deep-review | Auth surface |
| `package-lock.json` | skip | Lockfile |
| `dist/**` | skip | Build output |
## Findings (BLOCKER)
### 763d174e6c519fafbadcba5d1706708479e36e61
- **Location:** `lib/handlers/login.mjs:23`
- **Rule:** `UNIMPLEMENTED_CRITERION`
- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials")
- **Evidence:** Handler returns 200 with empty body when password mismatch occurs.
- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2.
## Findings (MAJOR)
### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
- **Location:** `lib/auth/jwt.mjs:42`
- **Rule:** `SECURITY_INJECTION`
- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header")
- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body.
- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg.
### 7861519c326c207aabf17072db51c469bebc217b
- **Location:** `lib/auth/jwt.mjs:88`
- **Rule:** `MISSING_TEST`
- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh")
- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised.
- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window.
### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
- **Location:** `lib/handlers/login.mjs:56`
- **Rule:** `PLAN_EXECUTE_DRIFT`
- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()")
- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers.
- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan.
## Findings (MINOR)
### 400dfcff81e0e219eb04a7123c68ae870696f121
- **Location:** `lib/auth/jwt.mjs:117`
- **Rule:** `MISSING_ERROR_HANDLING`
- **Brief ref:** none (engineering hygiene)
- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler.
- **Fix:** Wrap in try/catch; log + 503 on store failure.
## Remediation Summary
5 findings total: 1 BLOCKER, 3 MAJOR, 1 MINOR. Run a remediation plan via
`/ultraplan-local --brief review.md` — it will pick up BLOCKER + MAJOR findings as
plan goals and emit `source_findings: [<id>, ...]` audit trail (Handover 6).
```json
{
"fixture_kind": "synthetic-v1.0",
"jaccard_with_run_B": "5/6 = 0.833",
"findings": [
{"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23},
{"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42},
{"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88},
{"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56},
{"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117}
]
}
```

View file

@ -0,0 +1,117 @@
---
type: ultrareview
review_version: "1.0"
created: 2026-05-01
task: "Add JWT authentication with refresh-token rotation"
slug: jwt-auth
project_dir: .claude/projects/2026-05-01-jwt-auth/
brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md
scope_sha_start: 0123456789abcdef0123456789abcdef01234567
scope_sha_end: fedcba9876543210fedcba9876543210fedcba98
reviewed_files_count: 3
verdict: WARN
findings:
- d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
- 7861519c326c207aabf17072db51c469bebc217b
- 400dfcff81e0e219eb04a7123c68ae870696f121
- 763d174e6c519fafbadcba5d1706708479e36e61
- 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
- bf3e8b347cf4269ad005a9cf64dab6f601345704
---
# Review: Add JWT authentication with refresh-token rotation (Run B)
## Executive Summary
Same diff as Run A, re-reviewed independently to test determinism. This run found
the same 5 findings plus one extra (a placeholder TODO in logout.mjs that Run A
missed). Verdict: **WARN** — same as Run A; the extra finding is MAJOR but does
not change the merge gate.
This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline.
Run A's set is a strict subset of Run B's set, giving Jaccard = 5/6 = 0.833.
## Coverage
| File | Treatment | Reason |
|---|---|---|
| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) |
| `lib/handlers/login.mjs` | deep-review | Auth surface |
| `lib/handlers/logout.mjs` | deep-review | Auth surface |
| `package-lock.json` | skip | Lockfile |
| `dist/**` | skip | Build output |
## Findings (BLOCKER)
### 763d174e6c519fafbadcba5d1706708479e36e61
- **Location:** `lib/handlers/login.mjs:23`
- **Rule:** `UNIMPLEMENTED_CRITERION`
- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials")
- **Evidence:** Handler returns 200 with empty body when password mismatch occurs.
- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2.
## Findings (MAJOR)
### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
- **Location:** `lib/auth/jwt.mjs:42`
- **Rule:** `SECURITY_INJECTION`
- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header")
- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body.
- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg.
### 7861519c326c207aabf17072db51c469bebc217b
- **Location:** `lib/auth/jwt.mjs:88`
- **Rule:** `MISSING_TEST`
- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh")
- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised.
- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window.
### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
- **Location:** `lib/handlers/login.mjs:56`
- **Rule:** `PLAN_EXECUTE_DRIFT`
- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()")
- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers.
- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan.
### bf3e8b347cf4269ad005a9cf64dab6f601345704
- **Location:** `lib/handlers/logout.mjs:14`
- **Rule:** `PLACEHOLDER_IN_CODE`
- **Brief ref:** none (Rule 7a violation)
- **Evidence:** `// TODO: invalidate refresh-token cookie before responding` — left in committed code.
- **Fix:** Implement the cookie invalidation or remove the comment with an issue link.
## Findings (MINOR)
### 400dfcff81e0e219eb04a7123c68ae870696f121
- **Location:** `lib/auth/jwt.mjs:117`
- **Rule:** `MISSING_ERROR_HANDLING`
- **Brief ref:** none (engineering hygiene)
- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler.
- **Fix:** Wrap in try/catch; log + 503 on store failure.
## Remediation Summary
6 findings total: 1 BLOCKER, 4 MAJOR, 1 MINOR. Same merge gate as Run A; one
extra MAJOR (PLACEHOLDER_IN_CODE) that Run A missed. Run a remediation plan via
`/ultraplan-local --brief review.md`.
```json
{
"fixture_kind": "synthetic-v1.0",
"jaccard_with_run_A": "5/6 = 0.833",
"findings": [
{"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23},
{"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42},
{"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88},
{"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56},
{"id": "bf3e8b347cf4269ad005a9cf64dab6f601345704", "severity": "MAJOR", "rule": "PLACEHOLDER_IN_CODE", "file": "lib/handlers/logout.mjs", "line": 14},
{"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117}
]
}
```