Commit graph

3 commits

Author SHA1 Message Date
Kjell Tore Guttormsen
f0a1d4024a feat(post-session-guard): E17 — configurable escalation window + 20-call MEDIUM advisory
Critical-review §4 E17 finding: pre-v7.2.0 the delegation-after-input
advisory fired only within a 5-call window. Attackers who deliberately
waited 6+ calls before delegating bypassed detection. Window was also
hardcoded — operators couldn't tune it for their environment.

Two coordinated changes:

1. LLM_SECURITY_ESCALATION_WINDOW env var (primary window override)
   - parseInt(env) || getPolicyValue('trifecta', 'escalation_window', 5)
   - Mirrors the established pattern from
     LLM_SECURITY_TRIFECTA_MODE et al.
   - Setting env=3 narrows; env=8 expands.

2. Secondary 20-call MEDIUM advisory (slow-burn variant)
   - DELEGATION_ESCALATION_WINDOW_MEDIUM = 20 (hardcoded — same value
     for all operators; tunable in a future patch if needed)
   - checkEscalationAfterInput now returns `tier: 'primary'|'secondary'|null`
   - formatEscalationWarning emits a different message for secondary —
     mentions "slow-burn", references env-var, distinct from the
     primary "DeepMind Category 4" framing
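
The two windows described above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: `resolvePrimaryWindow`, `classifyTier`, and the stubbed `getPolicyValue` body are hypothetical, and the inclusive boundaries are an assumption drawn from the test descriptions below (5 and 20 both still match).

```javascript
// Value quoted in the commit message; hardcoded for all operators.
const DELEGATION_ESCALATION_WINDOW_MEDIUM = 20;

// Stand-in for the real policy lookup (assumption: it returns the
// fallback when no policy value is configured).
function getPolicyValue(section, key, fallback) {
  return fallback;
}

// Hypothetical helper: the env var wins, otherwise the policy value, otherwise 5.
function resolvePrimaryWindow(env = process.env) {
  // parseInt yields NaN for unset or non-numeric input, so `||` falls through.
  return parseInt(env.LLM_SECURITY_ESCALATION_WINDOW, 10) ||
    getPolicyValue('trifecta', 'escalation_window', 5);
}

// Hypothetical helper: map the call distance between the input and the
// delegation to the tier values named in the commit message.
function classifyTier(distance, primaryWindow) {
  if (distance <= primaryWindow) return 'primary';
  if (distance <= DELEGATION_ESCALATION_WINDOW_MEDIUM) return 'secondary';
  return null;
}
```

One consequence of the `parseInt(...) || fallback` pattern is that a value of `0` is falsy and also falls back to the policy value, so the primary window cannot be disabled this way.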

Hook reads max(WINDOW_SIZE, secondary+5) entries to cover the wider
window. Existing duplicate-suppression (`escalation_warning` state
entry) covers both tiers. Audit-trail event captures `tier` field.
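
As a rough sketch of the read depth and the shared duplicate suppression: the helper names and the `{ type: ... }` entry shape are assumptions for illustration only, and `WINDOW_SIZE` is passed as a parameter because its value is not stated here.

```javascript
const DELEGATION_ESCALATION_WINDOW_MEDIUM = 20;

// Hypothetical helper: read enough state entries to cover the wider
// secondary window plus a margin of 5, but never fewer than WINDOW_SIZE.
function entriesToRead(windowSize) {
  return Math.max(windowSize, DELEGATION_ESCALATION_WINDOW_MEDIUM + 5);
}

// Hypothetical helper: a single `escalation_warning` state entry
// suppresses further warnings regardless of which tier produced it.
// The `type` field is an assumed entry shape.
function alreadyWarned(entries) {
  return entries.some((entry) => entry.type === 'escalation_warning');
}
```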

Tests: +5 cases in tests/hooks/post-session-guard.test.mjs:
- secondary window catches 9-call distance (slow-burn)
- secondary boundary at exactly 20 calls
- primary regression guard (1-call distance)
- env=3 narrows primary (4-call distance becomes secondary)
- env=8 expands primary (7-call distance stays primary)

Updated existing test "does NOT trigger when input_source is >5 calls
ago" — now requires >20 calls (secondary window catches 6-20).

Suite: 1644 → 1672 (+28 from new tests + extended scope). All green.

CLAUDE.md hooks table updated to document both windows and the env var.
2026-04-29 14:26:18 +02:00
Kjell Tore Guttormsen
36be963d4d fix(llm-security): B2 block-mode blocks all detected trifectas, not only high-confidence
Previously, `LLM_SECURITY_TRIFECTA_MODE=block` exited with status 2 only when the
detected trifecta was MCP-concentrated (all three legs via the same MCP
server) or involved sensitive-path + exfil. Distributed trifectas —
three legs originating from different tools, with a non-sensitive data
path and a non-sensitive exfiltration sink — were detected and warned
but not blocked. This mismatched the documented semantics of block mode
and gave operators a false sense of enforcement.

Change: remove the `(mcpInfo.concentrated || sensitiveExfil)` AND-gate
in the `TRIFECTA_MODE === 'block'` branch so any detected trifecta
blocks in block mode. Audit event `severity` still differentiates
critical (concentrated / sensitive-exfil) from high (distributed); the
blocked stderr message now explicitly names "Distributed trifecta:
three legs from different sources" when the confidence sub-signals
are absent.
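
The effect of removing the gate can be sketched as below. This is an illustrative reduction, not the hook's code: `decideTrifectaAction` and its return shape are hypothetical, and exit code 0 for non-block modes is an assumption (the message only says such trifectas were "detected and warned").

```javascript
// Hypothetical decision function for an already-detected trifecta.
function decideTrifectaAction(mode, mcpInfo, sensitiveExfil) {
  // Confidence sub-signals: previously these gated blocking; now they
  // only drive the audit severity and the wording of the message.
  const highConfidence = Boolean(mcpInfo.concentrated || sensitiveExfil);
  const severity = highConfidence ? 'critical' : 'high';

  if (mode === 'block') {
    // Before the fix this branch also required `highConfidence`.
    return {
      exitCode: 2,
      severity,
      note: highConfidence
        ? null
        : 'Distributed trifecta: three legs from different sources',
    };
  }
  // Assumed behaviour for other modes: warn without blocking.
  return { exitCode: 0, severity, note: null };
}
```

With this shape, a distributed trifecta in block mode yields exit code 2 with severity `high`, which matches the added test case.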

Addresses critical review 2026-04-20 §2 B2 (HIGH) and §9 row 1
("enforces the Rule of Two").

Tests: 1 added (distributed trifecta in block mode now exits 2).
All 1495 tests pass.
2026-04-20 00:04:36 +02:00
Kjell Tore Guttormsen
f93d6abdae feat: initial open marketplace with llm-security, config-audit, ultraplan-local
2026-04-06 18:47:49 +02:00