From 3bcd0d4bc450fa6adc63c39a73be6e617a23abb8 Mon Sep 17 00:00:00 2001
From: Kjell Tore Guttormsen <ktg@humanize.no>
Date: Fri, 17 Apr 2026 14:50:07 +0200
Subject: [PATCH] docs(claude-md): link Defense Philosophy to Opus 4.7 system
 card

---
 plugins/llm-security/CLAUDE.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/plugins/llm-security/CLAUDE.md b/plugins/llm-security/CLAUDE.md
index 6ac7205..5fefd1d 100644
--- a/plugins/llm-security/CLAUDE.md
+++ b/plugins/llm-security/CLAUDE.md
@@ -163,6 +163,12 @@ Prompt injection is **structurally unsolvable** with current architectures (join
 
 **Bash evasion layers (T1-T6):** `bash-normalize.mjs` collapses six known obfuscation techniques before gate matching as a defense-in-depth layer. T1 empty quotes (`rm''-rf`), T2 `${}` parameter expansion, T3 backslash continuation, T4 tab/whitespace splitting, T5 `${IFS}` word-splitting, T6 ANSI-C hex quoting (`$'\x72\x6d'`). These layers complement — not replace — Claude Code 2.1.98+ harness-level protections. Full reference: `docs/security-hardening-guide.md`.
 
+**Opus 4.7 system card alignment:**
+
+- System card §5.2.1 (agentic safety evaluations) documents that multi-layer defenses outperform single-layer defenses against adaptive attacks. This plugin layers prompt-scan, pathguard, trifecta-guard, and pre-compact-scan as defense in depth, consistent with that guidance.
+- System card §6.3.1.1 (instruction following and hierarchy) documents that Opus 4.7 interprets agent instructions more literally. Stacking imperatives in prompt text (e.g., "MUST NOT do X") is therefore less useful than enforcing restrictions at the tool level via `tools:` frontmatter. Agent files in this plugin have been updated accordingly.
+- See `docs/security-hardening-guide.md` §5 for the full mapping.
+
 **What v5.0 cannot do:**
 - Prevent adaptive attacks from motivated human red-teamers (100% ASR per joint paper)
 - Fix CLAUDE.md loading before hooks (platform limitation)