test(ultraplan-local): add synthetic ultrareview determinism fixtures
review-run-A.md (5 findings) and review-run-B.md (6 findings, A ⊂ B) form a known-Jaccard fixture pair: |A ∩ B| = 5, |A ∪ B| = 6, Jaccard = 5/6 = 0.833, above the SC4 threshold of 0.70. IDs are real 40-char SHA1s computed via lib/parsers/finding-id.mjs from realistic (file, line, rule_key) triplets. Both fixtures pass review-validator --strict (frontmatter + body sections + findings shape). Real-LLM determinism measurement deferred to v1.1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
09e7fb9364
commit
5aa37941ed
3 changed files with 270 additions and 0 deletions
47
plugins/ultraplan-local/tests/fixtures/ultrareview/README.md
vendored
Normal file
47
plugins/ultraplan-local/tests/fixtures/ultrareview/README.md
vendored
Normal file
|
|
@ -0,0 +1,47 @@
|
|||
# ultrareview determinism fixtures
|
||||
|
||||
Synthetic fixtures for the Jaccard-similarity determinism test in
|
||||
`tests/lib/review-determinism.test.mjs`.
|
||||
|
||||
## What's here
|
||||
|
||||
- `review-run-A.md` — synthetic review with 5 findings on a fictional JWT auth task
|
||||
- `review-run-B.md` — same fictional task, "re-reviewed" — same 5 findings as A plus 1 extra (a placeholder TODO that A missed)
|
||||
|
||||
## Construction
|
||||
|
||||
Run A's finding-IDs are a strict subset of Run B's (`A ⊂ B`), so:
|
||||
|
||||
- Intersection: `|A ∩ B| = 5`
|
||||
- Union: `|A ∪ B| = 6`
|
||||
- Jaccard: `5 / 6 = 0.833…` (above the 0.70 SC4 threshold from `brief.md`)
|
||||
|
||||
Each ID is a real 40-char SHA1 computed via `lib/parsers/finding-id.mjs`:
|
||||
`sha1(file:line:rule_key)`. Don't hand-edit the IDs — recompute via the helper if
|
||||
you change the underlying `(file, line, rule_key)` triplet, or both fixtures will
|
||||
fall out of sync.
|
||||
|
||||
## Why synthetic for v1.0
|
||||
|
||||
Hand-curated for v1.0. Edit JSON IDs directly to test new Jaccard scenarios.
|
||||
Real-LLM determinism measurement is deferred to v1.1 once `/ultrareview-local`
|
||||
has produced enough real outputs to capture as fixtures.
|
||||
|
||||
These fixtures prove the Jaccard PIPELINE works given a known input — they do
|
||||
NOT measure real LLM determinism. The brief's SC4 (Jaccard ≥ 0.70 across two
|
||||
runs) is verified at the pipeline level today; capturing real LLM runs to
|
||||
verify the model-level claim is open work for v1.1.
|
||||
|
||||
## Adding a new scenario
|
||||
|
||||
1. Pick `(file, line, rule_key)` triplets — `rule_key` must be one of the 12
|
||||
keys in `lib/review/rule-catalogue.mjs`.
|
||||
2. Compute IDs via:
|
||||
```bash
|
||||
node -e "import('./lib/parsers/finding-id.mjs').then(({computeFindingId}) => console.log(computeFindingId('lib/foo.mjs', 42, 'SECURITY_INJECTION')))"
|
||||
```
|
||||
3. Add the IDs to `findings:` block-style YAML in frontmatter and to `### <id>`
|
||||
subsections in the body.
|
||||
4. Run `node lib/validators/review-validator.mjs --json tests/fixtures/ultrareview/review-run-X.md`
|
||||
to confirm the fixture validates.
|
||||
5. Update `tests/lib/review-determinism.test.mjs` if you want a new assertion.
|
||||
106
plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md
vendored
Normal file
106
plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-A.md
vendored
Normal file
|
|
@ -0,0 +1,106 @@
|
|||
---
|
||||
type: ultrareview
|
||||
review_version: "1.0"
|
||||
created: 2026-05-01
|
||||
task: "Add JWT authentication with refresh-token rotation"
|
||||
slug: jwt-auth
|
||||
project_dir: .claude/projects/2026-05-01-jwt-auth/
|
||||
brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md
|
||||
scope_sha_start: 0123456789abcdef0123456789abcdef01234567
|
||||
scope_sha_end: fedcba9876543210fedcba9876543210fedcba98
|
||||
reviewed_files_count: 3
|
||||
verdict: WARN
|
||||
findings:
|
||||
- d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
|
||||
- 7861519c326c207aabf17072db51c469bebc217b
|
||||
- 400dfcff81e0e219eb04a7123c68ae870696f121
|
||||
- 763d174e6c519fafbadcba5d1706708479e36e61
|
||||
- 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
|
||||
---
|
||||
|
||||
# Review: Add JWT authentication with refresh-token rotation (Run A)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Implementation hits the brief's core success criteria (login + refresh + logout) but
|
||||
has one BLOCKER and four MAJOR/MINOR issues. Verdict: **WARN** — fix the BLOCKER
|
||||
before merge; the MAJORs should land in a follow-up plan.
|
||||
|
||||
This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline. It is
|
||||
NOT the output of a real LLM review.
|
||||
|
||||
## Coverage
|
||||
|
||||
| File | Treatment | Reason |
|
||||
|---|---|---|
|
||||
| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) |
|
||||
| `lib/handlers/login.mjs` | deep-review | Auth surface |
|
||||
| `lib/handlers/logout.mjs` | deep-review | Auth surface |
|
||||
| `package-lock.json` | skip | Lockfile |
|
||||
| `dist/**` | skip | Build output |
|
||||
|
||||
## Findings (BLOCKER)
|
||||
|
||||
### 763d174e6c519fafbadcba5d1706708479e36e61
|
||||
|
||||
- **Location:** `lib/handlers/login.mjs:23`
|
||||
- **Rule:** `UNIMPLEMENTED_CRITERION`
|
||||
- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials")
|
||||
- **Evidence:** Handler returns 200 with empty body when password mismatch occurs.
|
||||
- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2.
|
||||
|
||||
## Findings (MAJOR)
|
||||
|
||||
### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
|
||||
|
||||
- **Location:** `lib/auth/jwt.mjs:42`
|
||||
- **Rule:** `SECURITY_INJECTION`
|
||||
- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header")
|
||||
- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body.
|
||||
- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg.
|
||||
|
||||
### 7861519c326c207aabf17072db51c469bebc217b
|
||||
|
||||
- **Location:** `lib/auth/jwt.mjs:88`
|
||||
- **Rule:** `MISSING_TEST`
|
||||
- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh")
|
||||
- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised.
|
||||
- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window.
|
||||
|
||||
### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
|
||||
|
||||
- **Location:** `lib/handlers/login.mjs:56`
|
||||
- **Rule:** `PLAN_EXECUTE_DRIFT`
|
||||
- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()")
|
||||
- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers.
|
||||
- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan.
|
||||
|
||||
## Findings (MINOR)
|
||||
|
||||
### 400dfcff81e0e219eb04a7123c68ae870696f121
|
||||
|
||||
- **Location:** `lib/auth/jwt.mjs:117`
|
||||
- **Rule:** `MISSING_ERROR_HANDLING`
|
||||
- **Brief ref:** none (engineering hygiene)
|
||||
- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler.
|
||||
- **Fix:** Wrap in try/catch; log + 503 on store failure.
|
||||
|
||||
## Remediation Summary
|
||||
|
||||
5 findings total: 1 BLOCKER, 3 MAJOR, 1 MINOR. Run a remediation plan via
|
||||
`/ultraplan-local --brief review.md` — it will pick up BLOCKER + MAJOR findings as
|
||||
plan goals and emit `source_findings: [<id>, ...]` audit trail (Handover 6).
|
||||
|
||||
```json
|
||||
{
|
||||
"fixture_kind": "synthetic-v1.0",
|
||||
"jaccard_with_run_B": "5/6 = 0.833",
|
||||
"findings": [
|
||||
{"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23},
|
||||
{"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42},
|
||||
{"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88},
|
||||
{"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56},
|
||||
{"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117}
|
||||
]
|
||||
}
|
||||
```
|
||||
117
plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md
vendored
Normal file
117
plugins/ultraplan-local/tests/fixtures/ultrareview/review-run-B.md
vendored
Normal file
|
|
@ -0,0 +1,117 @@
|
|||
---
|
||||
type: ultrareview
|
||||
review_version: "1.0"
|
||||
created: 2026-05-01
|
||||
task: "Add JWT authentication with refresh-token rotation"
|
||||
slug: jwt-auth
|
||||
project_dir: .claude/projects/2026-05-01-jwt-auth/
|
||||
brief_path: .claude/projects/2026-05-01-jwt-auth/brief.md
|
||||
scope_sha_start: 0123456789abcdef0123456789abcdef01234567
|
||||
scope_sha_end: fedcba9876543210fedcba9876543210fedcba98
|
||||
reviewed_files_count: 3
|
||||
verdict: WARN
|
||||
findings:
|
||||
- d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
|
||||
- 7861519c326c207aabf17072db51c469bebc217b
|
||||
- 400dfcff81e0e219eb04a7123c68ae870696f121
|
||||
- 763d174e6c519fafbadcba5d1706708479e36e61
|
||||
- 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
|
||||
- bf3e8b347cf4269ad005a9cf64dab6f601345704
|
||||
---
|
||||
|
||||
# Review: Add JWT authentication with refresh-token rotation (Run B)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Same diff as Run A, re-reviewed independently to test determinism. This run found
|
||||
the same 5 findings plus one extra (a placeholder TODO in logout.mjs that Run A
|
||||
missed). Verdict: **WARN** — same as Run A; the extra finding is MAJOR but does
|
||||
not change the merge gate.
|
||||
|
||||
This is a SYNTHETIC v1.0 fixture for testing the Jaccard determinism pipeline.
|
||||
Run A's set is a strict subset of Run B's set, giving Jaccard = 5/6 = 0.833.
|
||||
|
||||
## Coverage
|
||||
|
||||
| File | Treatment | Reason |
|
||||
|---|---|---|
|
||||
| `lib/auth/jwt.mjs` | deep-review | Security-critical (token signing/verification) |
|
||||
| `lib/handlers/login.mjs` | deep-review | Auth surface |
|
||||
| `lib/handlers/logout.mjs` | deep-review | Auth surface |
|
||||
| `package-lock.json` | skip | Lockfile |
|
||||
| `dist/**` | skip | Build output |
|
||||
|
||||
## Findings (BLOCKER)
|
||||
|
||||
### 763d174e6c519fafbadcba5d1706708479e36e61
|
||||
|
||||
- **Location:** `lib/handlers/login.mjs:23`
|
||||
- **Rule:** `UNIMPLEMENTED_CRITERION`
|
||||
- **Brief ref:** SC-2 ("login endpoint MUST return 401 on invalid credentials")
|
||||
- **Evidence:** Handler returns 200 with empty body when password mismatch occurs.
|
||||
- **Fix:** Return 401 with WWW-Authenticate header per brief SC-2.
|
||||
|
||||
## Findings (MAJOR)
|
||||
|
||||
### d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232
|
||||
|
||||
- **Location:** `lib/auth/jwt.mjs:42`
|
||||
- **Rule:** `SECURITY_INJECTION`
|
||||
- **Brief ref:** Non-Goal #3 ("must not accept user-supplied algorithm header")
|
||||
- **Evidence:** `jwt.verify(token, secret, { algorithms: req.body.alg })` — algorithm taken from request body.
|
||||
- **Fix:** Hard-code `algorithms: ['RS256']`; reject any token claiming a different alg.
|
||||
|
||||
### 7861519c326c207aabf17072db51c469bebc217b
|
||||
|
||||
- **Location:** `lib/auth/jwt.mjs:88`
|
||||
- **Rule:** `MISSING_TEST`
|
||||
- **Brief ref:** SC-4 ("refresh-token rotation must be tested under concurrent refresh")
|
||||
- **Evidence:** No test in `tests/` covers the concurrent-refresh path; only happy-path is exercised.
|
||||
- **Fix:** Add `tests/auth/concurrent-refresh.test.mjs` covering the race window.
|
||||
|
||||
### 7a3d7d0a668f6431ef3877ceeb106023b0f6295e
|
||||
|
||||
- **Location:** `lib/handlers/login.mjs:56`
|
||||
- **Rule:** `PLAN_EXECUTE_DRIFT`
|
||||
- **Brief ref:** Plan Step 4 ("login.mjs uses bcrypt.compare()")
|
||||
- **Evidence:** Plan said `bcrypt.compare`; implementation uses `crypto.timingSafeEqual` over plaintext-derived buffers.
|
||||
- **Fix:** Either update plan + brief to record the deviation or refactor to bcrypt.compare per plan.
|
||||
|
||||
### bf3e8b347cf4269ad005a9cf64dab6f601345704
|
||||
|
||||
- **Location:** `lib/handlers/logout.mjs:14`
|
||||
- **Rule:** `PLACEHOLDER_IN_CODE`
|
||||
- **Brief ref:** none (Rule 7a violation)
|
||||
- **Evidence:** `// TODO: invalidate refresh-token cookie before responding` — left in committed code.
|
||||
- **Fix:** Implement the cookie invalidation or remove the comment with an issue link.
|
||||
|
||||
## Findings (MINOR)
|
||||
|
||||
### 400dfcff81e0e219eb04a7123c68ae870696f121
|
||||
|
||||
- **Location:** `lib/auth/jwt.mjs:117`
|
||||
- **Rule:** `MISSING_ERROR_HANDLING`
|
||||
- **Brief ref:** none (engineering hygiene)
|
||||
- **Evidence:** `await refreshTokenStore.delete(jti)` is not wrapped — store-down throws bubble to top-level handler.
|
||||
- **Fix:** Wrap in try/catch; log + 503 on store failure.
|
||||
|
||||
## Remediation Summary
|
||||
|
||||
6 findings total: 1 BLOCKER, 4 MAJOR, 1 MINOR. Same merge gate as Run A; one
|
||||
extra MAJOR (PLACEHOLDER_IN_CODE) that Run A missed. Run a remediation plan via
|
||||
`/ultraplan-local --brief review.md`.
|
||||
|
||||
```json
|
||||
{
|
||||
"fixture_kind": "synthetic-v1.0",
|
||||
"jaccard_with_run_A": "5/6 = 0.833",
|
||||
"findings": [
|
||||
{"id": "763d174e6c519fafbadcba5d1706708479e36e61", "severity": "BLOCKER", "rule": "UNIMPLEMENTED_CRITERION", "file": "lib/handlers/login.mjs", "line": 23},
|
||||
{"id": "d2d0e27875ae9ef0d818cb08bb6f14e6d33c4232", "severity": "MAJOR", "rule": "SECURITY_INJECTION", "file": "lib/auth/jwt.mjs", "line": 42},
|
||||
{"id": "7861519c326c207aabf17072db51c469bebc217b", "severity": "MAJOR", "rule": "MISSING_TEST", "file": "lib/auth/jwt.mjs", "line": 88},
|
||||
{"id": "7a3d7d0a668f6431ef3877ceeb106023b0f6295e", "severity": "MAJOR", "rule": "PLAN_EXECUTE_DRIFT", "file": "lib/handlers/login.mjs", "line": 56},
|
||||
{"id": "bf3e8b347cf4269ad005a9cf64dab6f601345704", "severity": "MAJOR", "rule": "PLACEHOLDER_IN_CODE", "file": "lib/handlers/logout.mjs", "line": 14},
|
||||
{"id": "400dfcff81e0e219eb04a7123c68ae870696f121", "severity": "MINOR", "rule": "MISSING_ERROR_HANDLING", "file": "lib/auth/jwt.mjs", "line": 117}
|
||||
]
|
||||
}
|
||||
```
|
||||
Loading…
Add table
Add a link
Reference in a new issue