chore: WIP marketplace doc adjustments across plugins

Pre-trekexecute snapshot of in-progress CLAUDE.md/SKILL.md edits and extracted docs/ files. Captured as one commit so /trekexecute claude-design can run against a clean working tree.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 12:04:02 +02:00 · f460814fe9
commit f460814fe9
parent 0dc7ff485f
26 changed files with 805 additions and 1078 deletions
--- a/plugins/llm-security/docs/defense-philosophy.md
+++ b/plugins/llm-security/docs/defense-philosophy.md
@@ -0,0 +1,27 @@
+# LLM Security — Defense philosophy (v5.0)
+
+Imported from `CLAUDE.md` via `@docs/defense-philosophy.md`.
+
+Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers: 95-100% attack success rate (ASR) against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
+
+- **Broader detection**: MEDIUM advisories for obfuscation signals (leetspeak, homoglyphs, zero-width characters, multi-language text), Unicode Tag steganography, and bash expansion evasion
+- **Increased attack cost**: Rule of Two detection for the lethal trifecta, configurable as `block`/`warn`/`off` (default `warn`; the opt-in `block` mode blocks on high-confidence trifectas; trifectas distributed across MCP servers are detected but not blocked by default), plus bash normalization before gate matching
+- **Longer monitoring windows**: a 100-call long-horizon window alongside the 20-call sliding window, slow-burn trifecta detection, and behavioral drift detection via Jensen-Shannon divergence
+- **Architectural constraints**: opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, and human-in-the-loop (HITL) trap detection. The fingerprinting is inspired by CaMeL (DeepMind, 2025), but it is a lightweight byte fingerprint, not semantic capability tracking
+- **Honest documentation**: the Known Limitations section acknowledges what deterministic hooks cannot detect
+
+**Bash evasion layers (T1-T6):** `bash-normalize.mjs` collapses six known obfuscation techniques before gate matching, as a defense-in-depth layer: T1 empty quotes (`rm''-rf`), T2 `${}` parameter expansion, T3 backslash continuation, T4 tab/whitespace splitting, T5 `${IFS}` word-splitting, and T6 ANSI-C hex quoting (`$'\x72\x6d'`). These layers complement, rather than replace, Claude Code 2.1.98+ harness-level protections. Full reference: `docs/security-hardening-guide.md`.
+
+**Opus 4.7 system card alignment:**
+
+- System card §5.2.1 (agentic safety evaluations) documents that multi-layer defenses outperform single-layer defenses against adaptive attacks. This plugin's posture (prompt-scan + pathguard + trifecta-guard + pre-compact-scan operating in depth) matches that guidance.
+- System card §6.3.1.1 (instruction following and hierarchy) documents that Opus 4.7 interprets agent instructions more literally. Stacked imperatives (e.g., "MUST NOT do X") are therefore less useful than tool-level enforcement via `tools:` frontmatter. Agent files in this plugin have been updated accordingly.
+- See `docs/security-hardening-guide.md` §5 for the full mapping.
+
+**What v5.0 cannot do:**
+
+- Prevent adaptive attacks from motivated human red-teamers (100% ASR per joint paper)
+- Fix `CLAUDE.md` being loaded before hooks run (platform limitation)
+- Detect novel natural-language (NL) indirection without ML
+- Prevent long-horizon attacks without detectable patterns
+- Provide formal worst-case guarantees
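
Note on the bash evasion layers in the diff above: the idea of collapsing obfuscation before gate matching can be illustrated with a minimal sketch. This is not the plugin's `bash-normalize.mjs`; the function names `decodeAnsiCHex` and `normalizeBash`, the regular expressions, and the order of the passes are all assumptions made for illustration. T2 (`${}` parameter expansion) is left out because the source does not say how such expansions are resolved. The passes are naive string rewrites and are not aware of quoting context.

```javascript
// Hypothetical sketch only: NOT the plugin's actual bash-normalize.mjs.
// All names and regexes below are illustrative assumptions.

// T6: decode ANSI-C quoted strings made purely of \xHH escapes,
// e.g. $'\x72\x6d' becomes rm. Other escapes are left untouched.
function decodeAnsiCHex(command) {
  return command.replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_match, body) =>
    body.replace(/\\x([0-9a-fA-F]{2})/g, (_m, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    )
  );
}

function normalizeBash(command) {
  let out = decodeAnsiCHex(command);            // T6: ANSI-C hex quoting
  out = out.replace(/\\\r?\n/g, "");            // T3: backslash-newline continuation
  out = out.replace(/\$\{IFS\}|\$IFS\b/g, " "); // T5: ${IFS} / $IFS used as a separator
  out = out.replace(/''|""/g, "");              // T1: empty quote pairs inside a word
  out = out.replace(/[ \t]+/g, " ");            // T4: runs of tabs and spaces
  return out.trim();
}
```

A gate pattern would then run against the normalized string rather than the raw command, for example `/\brm -rf\b/.test(normalizeBash(rawCommand))`, where `rawCommand` is a hypothetical input. Under this sketch, `r''m -rf /tmp/x` and `rm${IFS}-rf${IFS}/tmp/x` both normalize to `rm -rf /tmp/x`.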
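
Note on the behavioral drift bullet in the diff above: Jensen-Shannon divergence compares two probability distributions and, with base-2 logarithms, yields a score between 0 (identical) and 1 (no overlap). The sketch below assumes the distributions are tool-name frequencies in a baseline window and a recent window. The source does not state what the plugin actually measures, so that feature choice and the names `toDistribution`, `jensenShannon`, and `driftScore` are hypothetical.

```javascript
// Hypothetical sketch: the source does not say which features the plugin
// compares. Tool-name frequencies are an assumption for illustration.

// Turn a list of tool names into relative frequencies over a fixed vocabulary.
function toDistribution(calls, vocabulary) {
  const counts = new Map(vocabulary.map((name) => [name, 0]));
  for (const name of calls) counts.set(name, counts.get(name) + 1);
  return vocabulary.map((name) => counts.get(name) / calls.length);
}

// Kullback-Leibler divergence in bits; zero-probability terms contribute 0.
function klDivergence(p, m) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    if (p[i] > 0) sum += p[i] * Math.log2(p[i] / m[i]);
  }
  return sum;
}

// Jensen-Shannon divergence: mean KL divergence to the midpoint distribution.
function jensenShannon(p, q) {
  const m = p.map((pi, i) => (pi + q[i]) / 2);
  return 0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m);
}

// Drift score between two windows of tool calls; 0 if either window is empty.
function driftScore(baselineCalls, recentCalls) {
  if (baselineCalls.length === 0 || recentCalls.length === 0) return 0;
  const vocabulary = [...new Set([...baselineCalls, ...recentCalls])];
  return jensenShannon(
    toDistribution(baselineCalls, vocabulary),
    toDistribution(recentCalls, vocabulary)
  );
}
```

One way to apply this, again as an assumption, is to compare the last 100 calls against the last 20, as in `driftScore(history.slice(-100), history.slice(-20))` for a hypothetical `history` array, and raise an advisory when the score crosses a tuned threshold.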