# llm-security v3.0 Upgrade — Master Session Document
> This document tracks the multi-session upgrade from v2.5.0 to v3.0.0.
> Updated after each session. Read this at session start.
## Session Prompt Template
At the start of each new session, paste this:
```
I am continuing the llm-security v3 upgrade. Read V3-UPGRADE.md in the plugin root
for full context, current status, and what the next session should do.
```
---
## Overall Status
| Session | Version | Status | Date | Commit |
|---------|---------|--------|------|--------|
| S1 | v2.6.0 | DONE | 2026-04-02 | b36312c |
| S2 | v2.7.0 | DONE | 2026-04-02 | 41d7493 |
| S3 | v2.7.1 | DONE | 2026-04-02 | ec01163 |
| S4 | v2.8.0 | DONE | 2026-04-02 | b004f46 |
| S4+ | v2.8.1 | DONE | 2026-04-03 | — |
| S5 | v2.9.0 | DONE | 2026-04-03 | 162a23a |
| S6 | v2.9.1 | DONE | 2026-04-03 | 110032e |
| S7 | v2.9.2 | DONE | 2026-04-03 | 3129e7a |
| S8 | v3.0.0 | DONE | 2026-04-03 | 293dee5 |

**Current:** Session 8 complete, v3.0.0 released

**Status:** ALL SESSIONS DONE
---
## Competitive Context
**Why v3:** Public release. Close gaps vs Snyk Agent Scan (toxic flow analysis, MCP live inspection, continuous scanning, skill registry) while keeping architectural advantages (100% local, pre-extraction defense, full lifecycle coverage).
**Key differentiators to maintain:**
- Pre-extraction layer (no competitor has this)
- 7+ deterministic scanners + LLM analysis in same pipeline
- 100% local, no cloud dependency
- Full lifecycle: hooks + scanning + audit + threat modeling + remediation
- Supply chain hook covering 7 package managers + OSV.dev
**v3.0 target inventory:**
- 9 scanners (was 7): +toxic-flow-analyzer (done), +mcp-live-inspect (done)
- 7 hooks (was 6): +post-session-guard
- 14 commands (was 10): +mcp-inspect (done), +diff (done), +watch (done), +registry (done)
- 6 agents (all updated with new OWASP mappings)
- 9 knowledge files (was 7): +owasp-skills-top10 (done), +skill-registry.json (done)
---
## Session 1: Enhanced Patterns + OWASP Mapping (v2.6.0)
**Goal:** Foundation for all subsequent sessions.
### Tasks
- [x] **1a.** Add `MEDIUM_PATTERNS` tier to `scanners/lib/injection-patterns.mjs`
  - ~15-20 patterns: base64 payloads, leetspeak, multi-language mixing, markdown/HTML comment injection, homoglyph-obfuscated keywords, invisible Unicode separators
  - Update `scanForInjection()` to return severity level (not just boolean)
- [x] **1b.** Update OWASP mappings in `scanners/lib/severity.mjs`
  - Add ASI01-ASI10 (Agentic Top 10) prefix mappings
  - Add MCP1-MCP7 (MCP Top 10) prefix mappings
  - Add AST01-AST10 (Skills Top 10) prefix mappings
  - Add TFA scanner prefix
  - Update `owaspCategorize()` for all frameworks
- [x] **1c.** Create `knowledge/owasp-skills-top10.md`
  - AST01-AST10 definitions and mapping
- [x] **1d.** Update agent prompts with new OWASP references
  - `agents/skill-scanner-agent.md`: AST10 mapping
  - `agents/mcp-scanner-agent.md`: MCP Top 10 mapping
  - `agents/posture-assessor-agent.md`: ASI mapping
  - `agents/deep-scan-synthesizer-agent.md`: new scanner prefixes
- [x] **1e.** Update `CLAUDE.md` with new knowledge file
- [x] **1f.** Verify: `node scanners/scan-orchestrator.mjs .` passes, new OWASP IDs in output
### Files Modified
- `scanners/lib/injection-patterns.mjs` — MEDIUM tier
- `scanners/lib/severity.mjs` — ASI/AST/MCP/TFA mappings
- `agents/skill-scanner-agent.md` — AST10
- `agents/mcp-scanner-agent.md` — MCP Top 10
- `agents/posture-assessor-agent.md` — ASI
- `agents/deep-scan-synthesizer-agent.md` — new prefixes
- `CLAUDE.md` — knowledge table update
### Files Created
- `knowledge/owasp-skills-top10.md`
### Acceptance Criteria
- `scanForInjection()` returns `{ found, severity, patterns }` instead of boolean
- All 4 OWASP frameworks mapped in severity.mjs
- `node scanners/scan-orchestrator.mjs .` runs clean
- MEDIUM patterns detect base64 instruction payloads and homoglyph obfuscation
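
The return shape in the first criterion can be illustrated with a minimal sketch. The pattern IDs and regexes below are hypothetical stand-ins, not the contents of `injection-patterns.mjs`, and `HIGH_PATTERNS` is an assumed name for the pre-existing tier. The sketch only shows how a tiered scan might report the highest matching severity alongside the matched pattern IDs.

```javascript
// Hypothetical pattern tiers; the real lists live in injection-patterns.mjs.
const HIGH_PATTERNS = [
  { id: "ignore-previous", regex: /ignore (all )?(previous|prior) instructions/i },
];
const MEDIUM_PATTERNS = [
  // Long base64-looking run (40+ chars), a possible carrier for hidden payloads.
  { id: "base64-run", regex: /[A-Za-z0-9+/]{40,}={0,2}/ },
  // Instruction-like text hidden inside an HTML/markdown comment.
  { id: "comment-injection", regex: /<!--[\s\S]*?(ignore|disregard|system prompt)[\s\S]*?-->/i },
];

// Returns { found, severity, patterns } instead of a bare boolean.
function scanForInjection(text) {
  const patterns = [];
  let severity = null;
  for (const [tier, list] of [["HIGH", HIGH_PATTERNS], ["MEDIUM", MEDIUM_PATTERNS]]) {
    for (const { id, regex } of list) {
      if (regex.test(text)) {
        patterns.push(id);
        severity = severity ?? tier; // tiers are visited highest first
      }
    }
  }
  return { found: patterns.length > 0, severity, patterns };
}
```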
---
## Session 2: Toxic Flow Analysis (v2.7.0) — FLAGSHIP
**Goal:** Detect the lethal trifecta: combinations of individually safe tools that together form an exfiltration chain.
### Concept
"Lethal trifecta" (Willison/Invariant Labs):
1. Agent exposed to **untrusted input** (prompt injection surface)
2. Agent has access to **sensitive data** via tools
3. An **exfiltration sink** exists (HTTP, email, file write)
### Tasks
- [x] **2a.** Create `scanners/toxic-flow-analyzer.mjs` (~380 lines)
  - Phase 1: Component inventory from plugin frontmatter + MCP/hook detection
  - Phase 2: Trifecta leg classification with prior scanner enrichment
  - Phase 3: Trifecta detection (direct/cross-component/project-level) with mitigation downgrades
  - Scanner prefix: `TFA`, OWASP: ASI01, ASI02, ASI05
- [x] **2b.** Modify `scanners/scan-orchestrator.mjs`
  - TFA runs LAST, after the other 7 scanners
  - Pass accumulated scanner results to TFA via `requiresPriorResults` flag
- [x] **2c.** Update `commands/scan.md` + `commands/deep-scan.md` to render TFA findings
- [x] **2d.** Update `agents/deep-scan-synthesizer-agent.md` for TFA report section
- [x] **2e.** Create test fixture: `test-fixtures/trifecta-plugin/` with known trifecta pattern
- [x] **2f.** Update `CLAUDE.md` — version v2.7.0, scanner count 8
### Key Design Decisions
- **Post-processing correlator** — does NOT re-scan files, consumes existing scanner output
- **Severity:** CRITICAL (2-hop + confirmed taint), HIGH (3+ hop or unconfirmed), MEDIUM (theoretical chain)
- **Graph model:** Adjacency list, not full graph library (keep dependencies at zero)
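
The adjacency-list model and the severity rule could be sketched roughly as follows. This is one possible reading, not the code in `toxic-flow-analyzer.mjs`: the node names, the `shortestHops` and `chainSeverity` helpers, and the interpretation of "theoretical chain" as "all legs exist but no concrete path was found" are assumptions made for illustration.

```javascript
// Adjacency list as a plain Map keeps the sketch dependency-free.
// Breadth-first search returns the hop count of the shortest path, or null.
function shortestHops(graph, from, to) {
  const queue = [[from, 0]];
  const seen = new Set([from]);
  while (queue.length > 0) {
    const [node, hops] = queue.shift();
    if (node === to) return hops;
    for (const next of graph.get(node) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([next, hops + 1]);
      }
    }
  }
  return null; // no path between the two nodes
}

// Assumed mapping of the severity rule onto hop count and taint confirmation.
function chainSeverity(hops, taintConfirmed) {
  if (hops === null) return "MEDIUM"; // theoretical chain only
  if (hops <= 2 && taintConfirmed) return "CRITICAL";
  return "HIGH"; // 3+ hops, or taint not confirmed
}

// Hypothetical component inventory: input source -> data access -> sink.
const flowGraph = new Map([
  ["web-fetch", ["read-secrets"]],
  ["read-secrets", ["http-post"]],
  ["http-post", []],
]);
const hopCount = shortestHops(flowGraph, "web-fetch", "http-post");
console.log(hopCount, chainSeverity(hopCount, true)); // prints: 2 CRITICAL
```

Because the graph is only a `Map` of arrays, it can be rebuilt cheaply from existing scanner output on every run, which fits the post-processing design described above.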
### Files Modified
- `scanners/scan-orchestrator.mjs`
- `scanners/lib/severity.mjs` (TFA prefix already added in S1)
- `commands/scan.md`
- `agents/deep-scan-synthesizer-agent.md`
- `CLAUDE.md`
### Files Created
- `scanners/toxic-flow-analyzer.mjs`
### Acceptance Criteria
- Test fixture with read+exfil tools produces TFA-001 CRITICAL finding
- Scan-orchestrator runs 8 scanners with TFA last
- `/security scan` on fixture shows chain description
- `/security deep-scan` includes TFA section in report
---
## Session 3: Runtime Session Guard (v2.7.1)
**Goal:** Real-time PostToolUse hook detecting lethal trifecta forming during a session.
### Tasks
- [x] **3a.** Create `hooks/scripts/post-session-guard.mjs` (~200-250 lines)
  - Append tool calls to `/tmp/llm-security-session-${ppid}.jsonl`
  - Classify each tool: `input_source | data_access | exfil_sink | neutral`
  - Sliding window (20 calls) trifecta detection
  - Emit `systemMessage` warning (never block)
  - Cleanup state files >24h old
- [x] **3b.** Update `hooks/hooks.json` — add PostToolUse entry
- [x] **3c.** Update `CLAUDE.md` — hooks table
- [x] **3d.** Test: simulate trifecta sequence, verify warning
### Files Modified
- `hooks/hooks.json`
- `CLAUDE.md`
### Files Created
- `hooks/scripts/post-session-guard.mjs`
### Acceptance Criteria
- Hook fires on every PostToolUse
- Trifecta sequence (Read sensitive → Bash curl) triggers warning
- State file is JSONL, keyed by ppid
- Old state files cleaned up
- No false positives on normal tool sequences
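
A rough sketch of the classification and sliding-window check is shown below. The classification rules, the `{ tool, input }` entry shape, and the helper names are hypothetical; the real hook's rules are not given in this document. The sketch also assumes that all three legs must appear within the window before a warning is emitted, which may be stricter than the actual hook.

```javascript
const WINDOW_SIZE = 20;

// Hypothetical classification rules for illustration only.
function classifyTool(call) {
  const { tool, input = "" } = call;
  if (tool === "WebFetch") return "input_source";
  if (tool === "Read" && /\.env|id_rsa|credentials/.test(input)) return "data_access";
  if (tool === "Bash" && /\b(curl|wget|nc)\b/.test(input)) return "exfil_sink";
  return "neutral";
}

// Parse JSONL state text and keep only the most recent WINDOW_SIZE calls.
function readWindow(jsonlText) {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .slice(-WINDOW_SIZE);
}

// The trifecta is present when all three legs occur inside the window.
function detectTrifecta(calls) {
  const legs = new Set(calls.map(classifyTool));
  return ["input_source", "data_access", "exfil_sink"].every((leg) => legs.has(leg));
}

// Warn only, never block: the hook would print this object as JSON.
function buildHookOutput(calls) {
  return detectTrifecta(calls)
    ? { systemMessage: "Possible lethal trifecta in recent tool calls" }
    : {};
}
```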
---
## Session 4: MCP Runtime Inspection (v2.8.0)
**Goal:** Connect to running MCP servers, fetch live tool descriptions, scan for injection/poisoning/shadowing.
### Tasks
- [x] **4a.** Create `scanners/mcp-live-inspect.mjs` (~350-400 lines)
  - Config discovery (6 locations, reuse mcp-scanner-agent logic)
  - Spawn servers, JSON-RPC 2.0 initialize + tools/list + prompts/list + resources/list
  - Scan descriptions with injection-patterns.mjs
  - Tool shadowing detection (same names across servers)
  - Description drift (live vs static config)
  - 10s timeout per server
- [x] **4b.** Create `commands/mcp-inspect.md` (~40-50 lines)
- [x] **4c.** Update `commands/mcp-audit.md` with `--live` flag
- [x] **4d.** Update `agents/mcp-scanner-agent.md` for live inspection context
- [x] **4e.** Update `CLAUDE.md`
- [x] **4f.** Update `README.md` — badges, tables, version history
- [x] **4g.** Update `plugin.json` version
- [x] **4h.** Subtree push to public repo
### Files Modified
- `commands/mcp-audit.md`
- `agents/mcp-scanner-agent.md`
- `CLAUDE.md`
### Files Created
- `scanners/mcp-live-inspect.mjs`
- `commands/mcp-inspect.md`
### Acceptance Criteria
- Successfully connects to at least one MCP server and fetches tool list
- Injection patterns detected in tool descriptions
- Tool shadowing flagged when two servers expose same tool name
- Servers that fail to start are skipped gracefully (10s timeout)
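
Tool shadowing detection can be illustrated with a small sketch. The input shape (`{ serverName: [toolName, ...] }`, as might be collected from `tools/list` replies) and the function name `findShadowedTools` are assumptions for this example, not the scanner's actual interface.

```javascript
// Report every tool name that is exposed by more than one server.
function findShadowedTools(serverTools) {
  const owners = new Map(); // tool name -> list of servers exposing it
  for (const [server, tools] of Object.entries(serverTools)) {
    // A Set avoids counting a name twice if one server lists it repeatedly.
    for (const name of new Set(tools)) {
      if (!owners.has(name)) owners.set(name, []);
      owners.get(name).push(server);
    }
  }
  return [...owners.entries()]
    .filter(([, servers]) => servers.length > 1)
    .map(([tool, servers]) => ({ tool, servers }));
}
```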
---
## Session 5: Report Diffing & Baseline (v2.9.0)
**Goal:** Compare scan results over time. Show new/resolved/unchanged findings.
### Tasks
- [x] **5a.** Create `scanners/lib/diff-engine.mjs` (~200-250 lines)
  - Baseline storage in `reports/baselines/<target-hash>.json`
  - Match findings by: scanner prefix + file path + line (fuzzy ±3) + pattern type
  - Categories: `new`, `resolved`, `unchanged`, `moved`
- [x] **5b.** Update `scanners/scan-orchestrator.mjs` — add `--baseline` and `--save-baseline` flags
- [x] **5c.** Create `commands/diff.md` (~40-50 lines)
- [x] **5d.** Update `CLAUDE.md`
- [x] **5e.** Update `README.md` — badges, tables, version history
- [x] **5f.** Update `plugin.json` version
- [x] **5g.** Subtree push to public repo
### Files Modified
- `scanners/scan-orchestrator.mjs`
- `CLAUDE.md`
### Files Created
- `scanners/lib/diff-engine.mjs`
- `commands/diff.md`
- `reports/baselines/` (directory)
### Acceptance Criteria
- `--save-baseline` stores results, `--baseline` loads and diffs
- NEW findings flagged after adding a vulnerability
- RESOLVED findings flagged after removing one
- Fuzzy line matching handles ±3 line drift
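
The matching rule could be sketched along these lines. The finding fields (`scanner`, `file`, `line`, `type`) and the reading of `moved` as "matched, but the line number drifted within the tolerance" are assumptions; the actual logic in `diff-engine.mjs` may differ.

```javascript
const LINE_TOLERANCE = 3;

// Two findings match when scanner, file, and type agree and lines are within ±3.
function sameFinding(a, b) {
  return a.scanner === b.scanner && a.file === b.file && a.type === b.type &&
    Math.abs(a.line - b.line) <= LINE_TOLERANCE;
}

function diffFindings(baseline, current) {
  const result = { new: [], resolved: [], unchanged: [], moved: [] };
  const unmatched = [...baseline]; // baseline findings not yet paired up
  for (const finding of current) {
    const index = unmatched.findIndex((old) => sameFinding(old, finding));
    if (index === -1) {
      result.new.push(finding);
      continue;
    }
    const [old] = unmatched.splice(index, 1);
    const bucket = old.line === finding.line ? result.unchanged : result.moved;
    bucket.push(finding);
  }
  result.resolved = unmatched; // baseline findings with no counterpart
  return result;
}
```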
---
## Session 6: Continuous/Background Scanning (v2.9.1)
**Goal:** Automated periodic scanning with delta reporting.
### Tasks
- [x] **6a.** Create `commands/watch.md` (~50-60 lines)
  - `/security watch [path] [--interval 6h]`
  - Uses /loop as execution engine
  - Runs scan-orchestrator with --baseline --save-baseline
  - Reports delta only
- [x] **6b.** Create `scanners/watch-cron.mjs` (~150-200 lines)
  - Standalone Node.js script for cron/launchd
  - Config: `reports/watch/config.json`
  - Output: `reports/watch/latest.json`
- [x] **6c.** Update `CLAUDE.md`
- [x] **6d.** Update `README.md` — badges, tables, version history
- [x] **6e.** Update `plugin.json` version
- [x] **6f.** Subtree push to public repo
### Files Modified
- `CLAUDE.md`
### Files Created
- `commands/watch.md`
- `scanners/watch-cron.mjs`
- `reports/watch/` (directory)
### Acceptance Criteria
- `/security watch .` creates baseline and shows "No changes"
- After modification: shows delta with NEW findings
- Cron wrapper runs standalone: `node scanners/watch-cron.mjs`
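
As a minimal sketch of how an interval such as `6h` might be interpreted by a standalone wrapper: the accepted units (`m`, `h`, `d`) and the helper names `parseInterval` and `isScanDue` are assumptions, not taken from `watch-cron.mjs`.

```javascript
// Milliseconds per supported unit (units are an assumption for this sketch).
const UNIT_MS = { m: 60_000, h: 3_600_000, d: 86_400_000 };

// Convert "30m", "6h", or "1d" to milliseconds; reject anything else.
function parseInterval(text) {
  const match = /^(\d+)([mhd])$/.exec(text.trim());
  if (!match) throw new Error(`Invalid interval: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// A scan is due when no previous run is recorded or the interval has elapsed.
function isScanDue(lastRunMs, intervalText, nowMs = Date.now()) {
  return lastRunMs == null || nowMs - lastRunMs >= parseInterval(intervalText);
}
```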
---
## Session 7: Skill Signature Registry (v2.9.2)
**Goal:** Local database of known skill patterns and risk profiles.
### Tasks
- [x] **7a.** Create `scanners/lib/skill-registry.mjs` (~300-350 lines)
  - Fingerprinting: SHA-256 of normalized SKILL.md content
  - `scanAndRegister(skillPath)` and `checkRegistry(fingerprint)`
  - Registry format: JSON with skill metadata + findings summary
- [x] **7b.** Create `knowledge/skill-registry.json` (seed data)
- [x] **7c.** Create `commands/registry.md` (~40-50 lines)
  - `/security registry`: stats
  - `/security registry scan <url>`: scan and register
  - `/security registry search <pattern>`: search
- [x] **7d.** Integrate with `commands/scan.md` — check registry before full scan
- [x] **7e.** Update `CLAUDE.md`
- [x] **7f.** Update `README.md` — badges, tables, version history
- [x] **7g.** Update `plugin.json` version
- [x] **7h.** Subtree push to public repo
### Files Modified
- `commands/scan.md`
- `CLAUDE.md`
### Files Created
- `scanners/lib/skill-registry.mjs`
- `knowledge/skill-registry.json`
- `commands/registry.md`
### Acceptance Criteria
- Scan a skill → fingerprint added to registry
- Re-scan same skill → registry hit, instant result
- `/security registry search` returns matches
---
## Session 8: Polish & Public Release (v3.0.0)
**Goal:** Quality pass, documentation, public release, announcement prep.
### Tasks
- [x] **8a.** Full quality pass
  - 544/544 tests pass
  - Scan-orchestrator: 8/8 scanners OK (0 findings with ignore, ~190 suppressed)
  - All 14 commands verified (valid frontmatter)
  - All 8 hooks verified (parse without errors)
  - Scan-orchestrator: ~7.5s on plugin self-scan
- [x] **8b.** Documentation
  - README.md: v3 badge, mermaid architecture diagram, TFA in scanner table, updated stats, v3.0.0 version history
  - CHANGELOG.md: full version history v1.0→v3.0 in Keep a Changelog format
  - package.json + plugin.json bumped to v3.0.0
  - .llm-security-ignore updated with TFA suppressions
- [x] **8c.** Public repo sync
  - Subtree push to `git.fromaitochitta.com/open/claude-code-llm-security`
- [x] **8d.** Announcement prep
  - V3-ANNOUNCEMENT.md with feature comparison matrix (vs Snyk Agent Scan, Lasso Claude Hooks)
  - Key differentiators narrative (6 points)
  - Demo scenario with scan/diff/watch workflow
### Acceptance Criteria
- `/security audit` on plugin itself scores A or B
- All commands documented in CLAUDE.md
- All hooks documented in CLAUDE.md
- README has complete v3 feature list
- Public repo updated and accessible
---
## Technical Notes
### Reusable Infrastructure (do not duplicate)
- `scanners/lib/injection-patterns.mjs`: all injection pattern matching
- `scanners/lib/output.mjs`: `finding()` and `scannerResult()` builders
- `scanners/lib/severity.mjs`: risk scoring, OWASP mapping
- `scanners/lib/file-discovery.mjs`: `discoverFiles()` and `readTextFile()`
- `scanners/lib/string-utils.mjs`: entropy, Levenshtein, base64 detection
- `scanners/content-extractor.mjs`: pre-extraction for remote repos
### Constraints
- All code is Node.js (>=18), no external dependencies beyond Node stdlib
- Hooks are separate processes per invocation (no shared memory)
- Context budget: max 3 knowledge files per agent invocation
- Intel Mac target (no Apple Silicon-specific features)
- Plugin convention: commands ~30-60 lines, agents use registered subagent_type
- CLAUDE.md updated in same commit as the change it documents
- **README.md + plugin.json + subtree push are MANDATORY per session** — not optional, not deferred to S8. Every version bump must update: plugin.json version, README badges/tables/version history, then subtree push. Session is NOT done until public repo is current.
### Scanner Integration Pattern
```javascript
// In scan-orchestrator.mjs, TFA scanner receives prior results:
const tfaResults = await runTfaScanner(target, files, priorResults);
// All other scanners: (target, files) signature unchanged
```
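
A slightly fuller sketch of how an orchestrator loop might honor the `requiresPriorResults` flag is given below. The scanner object shape (`prefix`, `run`, `requiresPriorResults`) and the function name `runScanners` are hypothetical and only illustrate the ordering; the real orchestrator may be structured differently.

```javascript
// Run every scanner in order, deferring those that need prior results to the end.
async function runScanners(scanners, target, files) {
  const priorResults = [];
  // Array.prototype.sort is stable, so relative order is otherwise preserved.
  const ordered = [...scanners].sort(
    (a, b) => Number(Boolean(a.requiresPriorResults)) - Number(Boolean(b.requiresPriorResults))
  );
  for (const scanner of ordered) {
    const result = scanner.requiresPriorResults
      ? await scanner.run(target, files, [...priorResults]) // pass a snapshot copy
      : await scanner.run(target, files); // unchanged (target, files) signature
    priorResults.push({ prefix: scanner.prefix, ...result });
  }
  return priorResults;
}
```

Sorting on the flag instead of hard-coding the TFA scanner's position would let any future correlator opt in the same way.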
### Hook State Pattern
```javascript
// Session guard uses temp file for cross-invocation state:
const stateFile = `/tmp/llm-security-session-${process.ppid}.jsonl`;
// Append on each invocation, read sliding window for analysis
```