From 5aa37941edefcc1ff805a7a0ce36e4ff739ed013 Mon Sep 17 00:00:00 2001 From: Kjell Tore Guttormsen Date: Fri, 1 May 2026 17:21:02 +0200 Subject: [PATCH] test(ultraplan-local): add synthetic ultrareview determinism fixtures MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit review-run-A.md (5 findings) and review-run-B.md (6 findings, A ⊂ B) form a known-Jaccard fixture pair: |A ∩ B| = 5, |A ∪ B| = 6, Jaccard = 5/6 = 0.833, above the SC4 threshold of 0.70. IDs are real 40-char SHA1s computed via lib/parsers/finding-id.mjs from realistic (file, line, rule_key) triplets. Both fixtures pass review-validator --strict (frontmatter + body sections + findings shape). Real-LLM determinism measurement deferred to v1.1. Co-Authored-By: Claude Opus 4.7 --- .../tests/fixtures/ultrareview/README.md | 47 +++++++ .../fixtures/ultrareview/review-run-A.md | 106 ++++++++++++++++ .../fixtures/ultrareview/review-run-B.md | 117 ++++++++++++++++++ 3 files changed, 270 insertions(+) create mode 100644 plugins/ultraplan-local/tests/fixtures/ultrareview/README.md create mode 100644 plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md create mode 100644 plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md diff --git a/plugins/ultraplan-local/tests/fixtures/ultrareview/README.md b/plugins/ultraplan-local/tests/fixtures/ultrareview/README.md new file mode 100644 index 0000000..93498f6 --- /dev/null +++ b/plugins/ultraplan-local/tests/fixtures/ultrareview/README.md @@ -0,0 +1,47 @@ +# ultrareview determinism fixtures + +Synthetic fixtures for the Jaccard-similarity determinism test in +`tests/lib/review-determinism.test.mjs`. + +## What's here + +- `review-run-A.md` — synthetic review with 5 findings on a fictional JWT auth task +- `review-run-B.md` — same fictional task, "re-reviewed" — same 5 findings as A plus 1 extra (a placeholder TODO that A missed) + +## Construction + +Run A's finding-IDs are a strict subset of Run B's (`A ⊂ B`), so: + +- Intersection: `|A ∩ B| = 5` +- Union: `|A ∪ B| = 6` +- Jaccard: `5 / 6 = 0.833…` (above the 0.70 SC4 threshold from `brief.md`) + +Each ID is a real 40-char SHA1 computed via `lib/parsers/finding-id.mjs`: +`sha1(file:line:rule_key)`. Don't hand-edit the IDs — recompute via the helper if +you change the underlying `(file, line, rule_key)` triplet, or both fixtures will +fall out of sync. + +## Why synthetic for v1.0 + +Hand-curated for v1.0. Edit JSON IDs directly to test new Jaccard scenarios. +Real-LLM determinism measurement is deferred to v1.1 once `/ultrareview-local` +has produced enough real outputs to capture as fixtures. + +These fixtures prove the Jaccard PIPELINE works given a known input — they do +NOT measure real LLM determinism. The brief's SC4 (Jaccard ≥ 0.70 across two +runs) is verified at the pipeline level today; capturing real LLM runs to +verify the model-level claim is open work for v1.1. + +## Adding a new scenario + +1. Pick `(file, line, rule_key)` triplets — `rule_key` must be one of the 12 + keys in `lib/review/rule-catalogue.mjs`. +2. Compute IDs via: + ```bash + node -e "import('./lib/parsers/finding-id.mjs').then(({computeFindingId}) => console.log(computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION')))" + ``` +3. Add the IDs to `findings:` block-style YAML in frontmatter and to `### ` + subsections in the body. +4. Run `node lib/validators/review-validator.mjs --json tests/fixtures/ultrareview/review-run-X.md` + to confirm the fixture validates. +5. Update `tests/lib/review-determinism.test.mjs` if you want a new assertion. diff --git a/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md b/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md new file mode 100644 index 0000000..afcca18 --- /dev/null +++ b/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md @@ -0,0 +1,106 @@ +--- +type: ultrareview +review_version: "1.0" +created: 2026-05-01 +task: "Add JWT authentication with refresh-token rotation" +slug: jwt-auth +project_dir: .claude/projects/2026-05-01-jwt-auth/ +brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md +scope_sha_start: 0123456789abcdef0123456789abcdef01234567 +scope_sha_end: fedcba9876543210fedcba9876543210fedcba98 +reviewed_files_count: 3 +verdict: WARN +findings: + - d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232 + - 7861519c326c207aabf17072db51c469bebc217b + - 400dfcff81e0e219eb04a7123c68ae870696f121 + - 763d174e6c519fafbadcba5d1706708479e36e61 + - 7a3d7d0a668f6431ef3877ceeb106023b0f6295e +--- + +# Review: Add JWT authentication with refresh-token rotation (Run A) + +## Executive Summary + +Implementation hits the brief's core success criteria (login + refresh + logout) but +has one BLOCKER and four MAJOR/MINOR issues. Verdict: **WARN** — fix the BLOCKER +before merge; the MAJORs should land in a follow-up plan. + +This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline. It is +NOT the output of a real LLM review. + +## Coverage + +| File | Treatment | Reason | +|---|---|---| +| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) | +| `lib/handlers/login.mjs` | deep-review | Auth surface | +| `lib/handlers/logout.mjs` | deep-review | Auth surface | +| `package-lock.json` | skip | Lockfile | +| `dist/**` | skip | Build output | + +## Findings (BLOCKER) + +### 763d174e6c519fafbadcba5d1706708479e36e61 + +- **Location:** `lib/handlers/login.mjs:23` +- **Rule:** `UNIMPLEMENTED_CRITERION` +- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials") +- **Evidence:** Handler returns 200 with empty body when password mismatch occurs. +- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2. + +## Findings (MAJOR) + +### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232 + +- **Location:** `lib/auth/jwt.mjs:42` +- **Rule:** `SECURITY_INJECTION` +- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header") +- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body. +- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg. + +### 7861519c326c207aabf17072db51c469bebc217b + +- **Location:** `lib/auth/jwt.mjs:88` +- **Rule:** `MISSING_TEST` +- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh") +- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised. +- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window. + +### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e + +- **Location:** `lib/handlers/login.mjs:56` +- **Rule:** `PLAN_EXECUTE_DRIFT` +- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()") +- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers. +- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan. + +## Findings (MINOR) + +### 400dfcff81e0e219eb04a7123c68ae870696f121 + +- **Location:** `lib/auth/jwt.mjs:117` +- **Rule:** `MISSING_ERROR_HANDLING` +- **Brief ref:** none (engineering hygiene) +- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler. +- **Fix:** Wrap in try/catch; log + 503 on store failure. + +## Remediation Summary + +5 findings total: 1 BLOCKER, 3 MAJOR, 1 MINOR. Run a remediation plan via +`/ultraplan-local --brief review.md` — it will pick up BLOCKER + MAJOR findings as +plan goals and emit `source_findings: [, ...]` audit trail (Handover 6). + +```json +{ + "fixture_kind": "synthetic-v1.0", + "jaccard_with_run_B": "5/6 = 0.833", + "findings": [ + {"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23}, + {"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42}, + {"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88}, + {"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56}, + {"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117} + ] +} +``` diff --git a/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md b/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md new file mode 100644 index 0000000..d6ee8d9 --- /dev/null +++ b/plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md @@ -0,0 +1,117 @@ +--- +type: ultrareview +review_version: "1.0" +created: 2026-05-01 +task: "Add JWT authentication with refresh-token rotation" +slug: jwt-auth +project_dir: .claude/projects/2026-05-01-jwt-auth/ +brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md +scope_sha_start: 0123456789abcdef0123456789abcdef01234567 +scope_sha_end: fedcba9876543210fedcba9876543210fedcba98 +reviewed_files_count: 3 +verdict: WARN +findings: + - d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232 + - 7861519c326c207aabf17072db51c469bebc217b + - 400dfcff81e0e219eb04a7123c68ae870696f121 + - 763d174e6c519fafbadcba5d1706708479e36e61 + - 7a3d7d0a668f6431ef3877ceeb106023b0f6295e + - bf3e8b347cf4269ad005a9cf64dab6f601345704 +--- + +# Review: Add JWT authentication with refresh-token rotation (Run B) + +## Executive Summary + +Same diff as Run A, re-reviewed independently to test determinism. This run found +the same 5 findings plus one extra (a placeholder TODO in logout.mjs that Run A +missed). Verdict: **WARN** — same as Run A; the extra finding is MAJOR but does +not change the merge gate. + +This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline. +Run A's set is a strict subset of Run B's set, giving Jaccard = 5/6 = 0.833. + +## Coverage + +| File | Treatment | Reason | +|---|---|---| +| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) | +| `lib/handlers/login.mjs` | deep-review | Auth surface | +| `lib/handlers/logout.mjs` | deep-review | Auth surface | +| `package-lock.json` | skip | Lockfile | +| `dist/**` | skip | Build output | + +## Findings (BLOCKER) + +### 763d174e6c519fafbadcba5d1706708479e36e61 + +- **Location:** `lib/handlers/login.mjs:23` +- **Rule:** `UNIMPLEMENTED_CRITERION` +- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials") +- **Evidence:** Handler returns 200 with empty body when password mismatch occurs. +- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2. + +## Findings (MAJOR) + +### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232 + +- **Location:** `lib/auth/jwt.mjs:42` +- **Rule:** `SECURITY_INJECTION` +- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header") +- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body. +- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg. + +### 7861519c326c207aabf17072db51c469bebc217b + +- **Location:** `lib/auth/jwt.mjs:88` +- **Rule:** `MISSING_TEST` +- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh") +- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised. +- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window. + +### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e + +- **Location:** `lib/handlers/login.mjs:56` +- **Rule:** `PLAN_EXECUTE_DRIFT` +- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()") +- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers. +- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan. + +### bf3e8b347cf4269ad005a9cf64dab6f601345704 + +- **Location:** `lib/handlers/logout.mjs:14` +- **Rule:** `PLACEHOLDER_IN_CODE` +- **Brief ref:** none (Rule 7a violation) +- **Evidence:** `// TODO: invalidate refresh-token cookie before responding` — left in committed code. +- **Fix:** Implement the cookie invalidation or remove the comment with an issue link. + +## Findings (MINOR) + +### 400dfcff81e0e219eb04a7123c68ae870696f121 + +- **Location:** `lib/auth/jwt.mjs:117` +- **Rule:** `MISSING_ERROR_HANDLING` +- **Brief ref:** none (engineering hygiene) +- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler. +- **Fix:** Wrap in try/catch; log + 503 on store failure. + +## Remediation Summary + +6 findings total: 1 BLOCKER, 4 MAJOR, 1 MINOR. Same merge gate as Run A; one +extra MAJOR (PLACEHOLDER_IN_CODE) that Run A missed. Run a remediation plan via +`/ultraplan-local --brief review.md`. + +```json +{ + "fixture_kind": "synthetic-v1.0", + "jaccard_with_run_A": "5/6 = 0.833", + "findings": [ + {"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23}, + {"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42}, + {"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88}, + {"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56}, + {"id": "bf3e8b347cf4269ad005a9cf64dab6f601345704", "severity": "MAJOR", "rule": "PLACEHOLDER_IN_CODE", "file": "lib/handlers/logout.mjs", "line": 14}, + {"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117} + ] +} +```