fix(llm-security): A2 batch — JSDoc arithmetic + co-monotonicity test + CaMeL tone-down

Closes A2 of v7.1.0 critical-review patch (docs/critical-review-2026-04-20.md):

- B4 (severity JSDoc): 4 critical = 93, not 90. Fixed in scanners/lib/severity.mjs:23
  and CHANGELOG.md v7.0.0 tier description. The actual computation has always been
  93 (70 + log2(5)*10 = 93.22 → round); only the docs were wrong.

- §5.4 co-monotonicity: new sweep test in tests/lib/severity.test.mjs over 15
  representative count vectors. Asserts that (verdict, riskBand) agree under the
  v7.0.0 contract for every case — catches future drift between riskScore tiers,
  verdict cutoffs, and riskBand cutoffs. Includes a B4 anchor test (riskScore
  {critical: 4} === 93) so doc/code drift fails loudly.

- B8 (CaMeL claims toned down): post-session-guard.mjs:646 comment block and
  CLAUDE.md:184 Defense Philosophy bullet now describe the implementation
  honestly — opportunistic byte-matching of truncated output fingerprints
  (first 200 bytes, SHA-256/16-hex), not semantic data-flow tracking.
  Trivially bypassed by mutation, summarisation, or re-encoding. Inspired by
  CaMeL (DeepMind 2025), but not a CaMeL capability-tracking implementation.
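Why the fingerprint is so easy to defeat can be seen in a minimal sketch of the scheme described in B8. `outputFingerprint` is a hypothetical name, not the function in post-session-guard.mjs, and the sketch truncates by characters (as the JSDoc states) rather than bytes. It uses `require` so it runs as a plain script; in an .mjs file the equivalent is `import { createHash } from "node:crypto"`.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper mirroring the description in this commit:
// SHA-256 over the first 200 characters, truncated to 16 hex characters.
function outputFingerprint(text) {
  return createHash("sha256")
    .update(text.slice(0, 200))
    .digest("hex")
    .slice(0, 16);
}

const original = "API_KEY=abc123 " + "x".repeat(300);

// Changing a single character inside the first 200 yields a different tag.
const mutated = "api_KEY=abc123 " + "x".repeat(300);

// Re-encoding (here base64) also yields a different tag.
const reencoded = Buffer.from(original, "utf8").toString("base64");

// Anything after the first 200 characters is ignored entirely.
const differentTail = original.slice(0, 200) + "completely different tail";

console.log(outputFingerprint(original) === outputFingerprint(mutated));       // false
console.log(outputFingerprint(original) === outputFingerprint(reencoded));     // false
console.log(outputFingerprint(original) === outputFingerprint(differentTail)); // true
```

The last line shows the other side of the truncation: two outputs that differ only after character 200 share a tag.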

Tests: 1495 → 1511 (+16: 15 sweep cases + 1 B4 anchor). All green.
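
The B4 arithmetic can be checked with a minimal sketch. It assumes the critical tier is scored as 70 + log2(n + 1) * 10, rounded; that form is inferred only from the single data point quoted in B4 (n = 4). `criticalTierScore` is a hypothetical name, not the function in scanners/lib/severity.mjs.

```javascript
// Hypothetical helper: formula inferred from "70 + log2(5)*10 = 93.22 → round"
// for 4 critical findings. The real riskScore may take other inputs.
function criticalTierScore(criticalCount) {
  return Math.round(70 + Math.log2(criticalCount + 1) * 10);
}

console.log(criticalTierScore(4)); // 93

// For comparison: log2(4) instead of log2(5) gives the stale documented value.
console.log(Math.round(70 + Math.log2(4) * 10)); // 90
```

The second line is only an observation about the arithmetic; the commit does not say how the figure of 90 got into the docs.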
Kjell Tore Guttormsen 2026-04-29 11:49:08 +02:00
commit 4aa5318bcb
5 changed files with 84 additions and 6 deletions

@@ -643,12 +643,22 @@ function formatDriftWarning(jsd, firstTools, lastTools) {
 }
 // ---------------------------------------------------------------------------
-// CaMeL-inspired data flow tagging (DeepMind CaMeL, v5.0 S6)
+// Output fingerprint matching (inspired by CaMeL, DeepMind 2025; v5.0 S6)
+//
+// NOTE: This is opportunistic byte-matching of truncated output fingerprints,
+// not semantic data-flow tracking. We hash the first 200 bytes of tool output
+// (SHA-256, truncated to 16 hex chars) and check whether that exact tag
+// appears verbatim in the next tool input. Trivially bypassed by:
+// - Mutating any of the first 200 bytes
+// - Summarising the output before passing it on
+// - Re-encoding (base64, JSON-escape, whitespace changes)
+// Inspired by CaMeL but NOT a CaMeL capability-tracking implementation.
 // ---------------------------------------------------------------------------
 /**
- * Compute a short data tag from tool output (first 200 chars, SHA-256 truncated to 16 hex).
- * Used for lightweight data provenance tracking.
+ * Compute a short output fingerprint from tool output (first 200 chars,
+ * SHA-256 truncated to 16 hex). Used for opportunistic byte-matching, not
+ * semantic provenance.
  * @param {string} text - tool output text
  * @returns {string} 16-char hex hash
  */