
llm-security v3.0 Upgrade — Master Session Document

This document tracks the multi-session upgrade from v2.5.0 to v3.0.0. Updated after each session. Read this at session start.

Session Prompt Template

At the start of each new session, paste this:

I'm continuing the llm-security v3 upgrade. Read V3-UPGRADE.md in the plugin root
for full context, current status, and what the next session should do.

Overall Status

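| Session | Version | Status | Date       | Commit  |
|---------|---------|--------|------------|---------|
| S1      | v2.6.0  | DONE   | 2026-04-02 | b36312c |
| S2      | v2.7.0  | DONE   | 2026-04-02 | 41d7493 |
| S3      | v2.7.1  | DONE   | 2026-04-02 | ec01163 |
| S4      | v2.8.0  | DONE   | 2026-04-02 | b004f46 |
| S4+     | v2.8.1  | DONE   | 2026-04-03 |         |
| S5      | v2.9.0  | DONE   | 2026-04-03 | 162a23a |
| S6      | v2.9.1  | DONE   | 2026-04-03 | 110032e |
| S7      | v2.9.2  | DONE   | 2026-04-03 | 3129e7a |
| S8      | v3.0.0  | DONE   | 2026-04-03 | 293dee5 |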

Current: Session 8 complete, v3.0.0 released
Status: ALL SESSIONS DONE


Competitive Context

Why v3: Public release. Close gaps vs Snyk Agent Scan (toxic flow analysis, MCP live inspection, continuous scanning, skill registry) while keeping architectural advantages (100% local, pre-extraction defense, full lifecycle coverage).

Key differentiators to maintain:

  • Pre-extraction layer (no competitor has this)
  • 7+ deterministic scanners + LLM analysis in same pipeline
  • 100% local, no cloud dependency
  • Full lifecycle: hooks + scanning + audit + threat modeling + remediation
  • Supply chain hook covering 7 package managers + OSV.dev

v3.0 target inventory:

  • 9 scanners (was 7): +toxic-flow-analyzer (done), +mcp-live-inspect (done)
  • 7 hooks (was 6): +post-session-guard
  • 14 commands (was 10): +mcp-inspect (done), +diff (done), +watch (done), +registry (done)
  • 6 agents (all updated with new OWASP mappings)
  • 9 knowledge files (was 7): +owasp-skills-top10 (done), +skill-registry.json (done)

Session 1: Enhanced Patterns + OWASP Mapping (v2.6.0)

Goal: Foundation for all subsequent sessions.

Tasks

  • 1a. Add MEDIUM_PATTERNS tier to scanners/lib/injection-patterns.mjs
    • ~15-20 patterns: base64 payloads, leetspeak, multi-language mixing, markdown/HTML comment injection, homoglyph-obfuscated keywords, invisible Unicode separators
    • Update scanForInjection() to return severity level (not just boolean)
  • 1b. Update OWASP mappings in scanners/lib/severity.mjs
    • Add ASI01-ASI10 (Agentic Top 10) prefix mappings
    • Add MCP1-MCP7 (MCP Top 10) prefix mappings
    • Add AST01-AST10 (Skills Top 10) prefix mappings
    • Add TFA scanner prefix
    • Update owaspCategorize() for all frameworks
  • 1c. Create knowledge/owasp-skills-top10.md
    • AST01-AST10 definitions and mapping
  • 1d. Update agent prompts with new OWASP references
    • agents/skill-scanner-agent.md: AST10 mapping
    • agents/mcp-scanner-agent.md: MCP Top 10 mapping
    • agents/posture-assessor-agent.md: ASI mapping
    • agents/deep-scan-synthesizer-agent.md: new scanner prefixes
  • 1e. Update CLAUDE.md with new knowledge file
  • 1f. Verify: node scanners/scan-orchestrator.mjs . passes, new OWASP IDs in output
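
The prefix mappings in 1b come down to a lookup from an OWASP ID to its framework and a valid index range. The sketch below is a minimal illustration, not the contents of severity.mjs. The helper name frameworkForOwaspId and the table layout are hypothetical, and treating the fourth framework as the OWASP LLM Top 10 (LLM01-LLM10) is an assumption. The TFA entry reuses the ASI01, ASI02 and ASI05 mapping from Session 2.

```javascript
// Hypothetical sketch of the framework tables that owaspCategorize() must cover.
// Assumption: the fourth framework is the OWASP LLM Top 10 (LLM01-LLM10).
const FRAMEWORKS = {
  ASI: { name: 'OWASP Agentic Top 10', max: 10 },
  AST: { name: 'OWASP Skills Top 10', max: 10 },
  MCP: { name: 'OWASP MCP Top 10', max: 7 }, // the plan maps MCP1-MCP7 only
  LLM: { name: 'OWASP LLM Top 10', max: 10 },
};

// Scanner prefix -> OWASP IDs (TFA values taken from the Session 2 plan).
const SCANNER_OWASP = {
  TFA: ['ASI01', 'ASI02', 'ASI05'],
};

// Resolve an ID such as 'ASI05' or 'MCP3' to its framework, or null if unknown or out of range.
function frameworkForOwaspId(id) {
  const match = /^([A-Z]{3})(\d{1,2})$/.exec(id);
  if (!match) return null;
  const framework = FRAMEWORKS[match[1]];
  const index = Number(match[2]);
  if (!framework || index < 1 || index > framework.max) return null;
  return { id, framework: framework.name, index };
}
```

Returning null for out-of-range IDs lets a caller flag a typo instead of silently miscategorizing a finding. For example, MCP8 returns null because only MCP1-MCP7 are mapped.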

Files Modified

  • scanners/lib/injection-patterns.mjs — MEDIUM tier
  • scanners/lib/severity.mjs — ASI/AST/MCP/TFA mappings
  • agents/skill-scanner-agent.md — AST10
  • agents/mcp-scanner-agent.md — MCP Top 10
  • agents/posture-assessor-agent.md — ASI
  • agents/deep-scan-synthesizer-agent.md — new prefixes
  • CLAUDE.md — knowledge table update

Files Created

  • knowledge/owasp-skills-top10.md

Acceptance Criteria

  • scanForInjection() returns { found, severity, patterns } instead of boolean
  • All 4 OWASP frameworks mapped in severity.mjs
  • node scanners/scan-orchestrator.mjs . runs clean
  • MEDIUM patterns detect base64 instruction payloads and homoglyph obfuscation
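
The new return shape for scanForInjection() can be shown with a two-tier table in which the highest matching tier sets the severity. This is a hedged sketch. The pattern names, the regexes and the base64 decoding heuristic are hypothetical stand-ins for the MEDIUM_PATTERNS tier, not the patterns shipped in injection-patterns.mjs.

```javascript
// Hypothetical tier contents; the real pattern lists live in injection-patterns.mjs.
// Each pattern is { name, test(text) -> boolean }.
const CRITICAL_PATTERNS = [
  { name: 'ignore-previous', test: (t) => /ignore\s+(all\s+)?previous\s+instructions/i.test(t) },
];

const MEDIUM_PATTERNS = [
  // Base64 runs that decode to instruction-like text
  {
    name: 'base64-payload',
    test: (t) =>
      (t.match(/[A-Za-z0-9+\/]{24,}={0,2}/g) ?? []).some((run) =>
        /ignore|instructions?|system prompt/i.test(Buffer.from(run, 'base64').toString('utf8')),
      ),
  },
  // Zero-width and other invisible separators
  { name: 'invisible-unicode', test: (t) => /[\u200B-\u200D\u2060\uFEFF]/.test(t) },
  // Latin letters directly adjacent to Cyrillic lookalikes (e.g. "ign\u043Ere")
  { name: 'homoglyph-mixed-script', test: (t) => /[a-z][\u0430-\u044F]|[\u0430-\u044F][a-z]/i.test(t) },
  // Instructions hidden in HTML or markdown comments
  { name: 'comment-injection', test: (t) => /<!--[\s\S]*?(ignore|system|instruction)[\s\S]*?-->/i.test(t) },
];

const TIERS = [
  { severity: 'critical', patterns: CRITICAL_PATTERNS },
  { severity: 'medium', patterns: MEDIUM_PATTERNS },
];

// Returns { found, severity, patterns } instead of a boolean; severity is the highest tier matched.
function scanForInjection(text) {
  const patterns = [];
  let severity = null;
  for (const tier of TIERS) {
    for (const pattern of tier.patterns) {
      if (pattern.test(text)) {
        patterns.push(pattern.name);
        severity ??= tier.severity; // tiers are ordered highest first
      }
    }
  }
  return { found: patterns.length > 0, severity, patterns };
}
```

Decoding each base64 run before matching keeps ordinary long tokens, such as hashes, from being flagged on length alone.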

Session 2: Toxic Flow Analysis (v2.7.0) — FLAGSHIP

Goal: Detect the lethal trifecta: combinations of individually safe tools that together form an exfiltration chain.

Concept

"Lethal trifecta" (Willison/Invariant Labs):

  1. Agent exposed to untrusted input (prompt injection surface)
  2. Agent has access to sensitive data via tools
  3. An exfiltration sink exists (HTTP, email, file write)

Tasks

  • 2a. Create scanners/toxic-flow-analyzer.mjs (~380 lines)
    • Phase 1: Component inventory from plugin frontmatter + MCP/hook detection
    • Phase 2: Trifecta leg classification with prior scanner enrichment
    • Phase 3: Trifecta detection (direct/cross-component/project-level) with mitigation downgrades
    • Scanner prefix: TFA, OWASP: ASI01, ASI02, ASI05
  • 2b. Modify scanners/scan-orchestrator.mjs
    • TFA runs LAST after all 7 scanners
    • Pass accumulated scanner results to TFA via a requiresPriorResults flag
  • 2c. Update commands/scan.md + commands/deep-scan.md to render TFA findings
  • 2d. Update agents/deep-scan-synthesizer-agent.md for TFA report section
  • 2e. Create test fixture: test-fixtures/trifecta-plugin/ with known trifecta pattern
  • 2f. Update CLAUDE.md — version v2.7.0, scanner count 8

Key Design Decisions

  • Post-processing correlator — does NOT re-scan files, consumes existing scanner output
  • Severity: CRITICAL (2-hop + confirmed taint), HIGH (3+ hops or unconfirmed taint), MEDIUM (theoretical chain)
  • Graph model: Adjacency list, not full graph library (keep dependencies at zero)
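
The adjacency-list model and severity rules above can be sketched as a breadth-first search over data-flow edges. Several assumptions are hypothetical:

  • Nodes are component names, and an edge means data can flow from one component to the next.
  • "2-hop" is read as at most two edges from the untrusted-input node to the sink.
  • When all legs are present but no path connects them, the chain counts as theoretical.

```javascript
// Adjacency list: node -> nodes it can pass data to (no graph library, per the design note).
// Breadth-first search returns the fewest hops from `from` to `to`, or Infinity.
function shortestHops(adjacency, from, to) {
  const queue = [[from, 0]];
  const seen = new Set([from]);
  while (queue.length > 0) {
    const [node, hops] = queue.shift();
    if (node === to) return hops;
    for (const next of adjacency[node] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([next, hops + 1]);
      }
    }
  }
  return Infinity;
}

// Assumed reading of the severity rules: "2-hop" means at most two edges.
function chainSeverity(adjacency, source, sink, { taintConfirmed = false } = {}) {
  const hops = shortestHops(adjacency, source, sink);
  if (hops === Infinity) return 'MEDIUM'; // legs present but no known path: theoretical chain
  if (hops <= 2 && taintConfirmed) return 'CRITICAL';
  return 'HIGH'; // 3+ hops, or a short path whose taint is unconfirmed
}
```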

Files Modified

  • scanners/scan-orchestrator.mjs
  • scanners/lib/severity.mjs (TFA prefix already added in S1)
  • commands/scan.md
  • agents/deep-scan-synthesizer-agent.md
  • CLAUDE.md

Files Created

  • scanners/toxic-flow-analyzer.mjs

Acceptance Criteria

  • Test fixture with read+exfil tools produces TFA-001 CRITICAL finding
  • Scan-orchestrator runs 8 scanners with TFA last
  • /security scan on fixture shows chain description
  • /security deep-scan includes TFA section in report

Session 3: Runtime Session Guard (v2.7.1)

Goal: Real-time PostToolUse hook that detects a lethal trifecta as it forms during a session.

Tasks

  • 3a. Create hooks/scripts/post-session-guard.mjs (~200-250 lines)
    • Append tool calls to /tmp/llm-security-session-${ppid}.jsonl
    • Classify each tool: input_source | data_access | exfil_sink | neutral
    • Sliding window (20 calls) trifecta detection
    • Emit systemMessage warning (never block)
    • Cleanup state files >24h old
  • 3b. Update hooks/hooks.json — add PostToolUse entry
  • 3c. Update CLAUDE.md — hooks table
  • 3d. Test: simulate trifecta sequence, verify warning
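
The classification and window check in 3a can be sketched as pure functions. The matching rules below are hypothetical examples, not the hook's actual classification table. The tool names (Read, Bash, WebFetch, WebSearch) and the tool_input fields (file_path, command) follow Claude Code's PostToolUse payload. The systemMessage field is hook JSON output that displays a warning without blocking.

```javascript
// Hypothetical classification rules; the categories match the plan.
function classifyTool(toolName, toolInput = {}) {
  if (toolName === 'WebFetch' || toolName === 'WebSearch') return 'input_source';
  if (toolName === 'Read' && /\.env\b|id_rsa|credentials|secret/i.test(toolInput.file_path ?? '')) {
    return 'data_access';
  }
  if (toolName === 'Bash' && /\b(curl|wget|nc)\b/.test(toolInput.command ?? '')) return 'exfil_sink';
  return 'neutral';
}

const WINDOW_SIZE = 20;
const LEGS = ['input_source', 'data_access', 'exfil_sink'];

// history: [{ tool, category }] in call order; only the last WINDOW_SIZE calls count.
function detectTrifecta(history) {
  const seen = new Set(history.slice(-WINDOW_SIZE).map((entry) => entry.category));
  return LEGS.every((leg) => seen.has(leg));
}

// Warn, never block: return hook output JSON, or null when there is nothing to report.
function guardOutput(history) {
  if (!detectTrifecta(history)) return null;
  return {
    systemMessage:
      'llm-security: untrusted input, sensitive data access and an exfiltration-capable call occurred within the last 20 tool calls.',
  };
}
```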

Files Modified

  • hooks/hooks.json
  • CLAUDE.md

Files Created

  • hooks/scripts/post-session-guard.mjs

Acceptance Criteria

  • Hook fires on every PostToolUse
  • Trifecta sequence (Read sensitive → Bash curl) triggers warning
  • State file is JSONL, keyed by ppid
  • Old state files cleaned up
  • No false positives on normal tool sequences

Session 4: MCP Runtime Inspection (v2.8.0)

Goal: Connect to running MCP servers, fetch live tool descriptions, scan for injection/poisoning/shadowing.

Tasks

  • 4a. Create scanners/mcp-live-inspect.mjs (~350-400 lines)
    • Config discovery (6 locations, reuse mcp-scanner-agent logic)
    • Spawn servers, JSON-RPC 2.0 initialize + tools/list + prompts/list + resources/list
    • Scan descriptions with injection-patterns.mjs
    • Tool shadowing detection (same names across servers)
    • Description drift (live vs static config)
    • 10s timeout per server
  • 4b. Create commands/mcp-inspect.md (~40-50 lines)
  • 4c. Update commands/mcp-audit.md with --live flag
  • 4d. Update agents/mcp-scanner-agent.md for live inspection context
  • 4e. Update CLAUDE.md
  • 4f. Update README.md — badges, tables, version history
  • 4g. Update plugin.json version
  • 4h. Subtree push to public repo
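
The JSON-RPC exchange in 4a can be sketched as follows, assuming the stdio transport (one JSON object per line on the server's stdin and stdout). The protocolVersion string, the clientInfo values and the helper names are assumptions. A compliant client sends initialize and waits for its result before sending the notifications/initialized notification and the list requests. The timeout helper shows one way to enforce the 10s per-server budget.

```javascript
// Hypothetical client identity; the protocolVersion value is an assumption.
const CLIENT_INFO = { name: 'llm-security-mcp-live-inspect', version: '0.0.0' };

function initializeRequest(id = 1) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'initialize',
    params: { protocolVersion: '2024-11-05', capabilities: {}, clientInfo: CLIENT_INFO },
  };
}

// Sent after the initialize result arrives. Notifications carry no id.
function postInitMessages(firstId = 2) {
  const lists = ['tools/list', 'prompts/list', 'resources/list'];
  return [
    { jsonrpc: '2.0', method: 'notifications/initialized' },
    ...lists.map((method, i) => ({ jsonrpc: '2.0', id: firstId + i, method, params: {} })),
  ];
}

// stdio framing: newline-delimited JSON (JSON.stringify escapes embedded newlines).
const frame = (message) => JSON.stringify(message) + '\n';

// Reject if `promise` does not settle within `ms`; the timer is always cleared.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label}: no response within ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```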

Files Modified

  • commands/mcp-audit.md
  • agents/mcp-scanner-agent.md
  • CLAUDE.md

Files Created

  • scanners/mcp-live-inspect.mjs
  • commands/mcp-inspect.md

Acceptance Criteria

  • Successfully connects to at least one MCP server and fetches tool list
  • Injection patterns detected in tool descriptions
  • Tool shadowing flagged when two servers expose same tool name
  • Servers that fail to start are skipped gracefully (10s timeout)
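
Tool shadowing detection reduces to grouping tool names by the servers that expose them. The sketch below assumes each server's tools/list result has already been collected into a map keyed by server name; that map shape and the function name are hypothetical. A tool listed twice by the same server is not treated as shadowing.

```javascript
// servers: { serverName: tools[] }, where tools is the `tools` array from a tools/list result.
function findShadowedTools(servers) {
  const owners = new Map(); // tool name -> Set of server names
  for (const [server, tools] of Object.entries(servers)) {
    for (const { name } of tools) {
      if (!owners.has(name)) owners.set(name, new Set());
      owners.get(name).add(server);
    }
  }
  return [...owners]
    .filter(([, serverSet]) => serverSet.size > 1)
    .map(([tool, serverSet]) => ({ tool, servers: [...serverSet] }));
}
```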

Session 5: Report Diffing & Baseline (v2.9.0)

Goal: Compare scan results over time. Show new/resolved/unchanged findings.

Tasks

  • 5a. Create scanners/lib/diff-engine.mjs (~200-250 lines)
    • Baseline storage in reports/baselines/<target-hash>.json
    • Match findings by: scanner prefix + file path + line (fuzzy ±3) + pattern type
    • Categories: new, resolved, unchanged, moved
  • 5b. Update scanners/scan-orchestrator.mjs — add --baseline and --save-baseline flags
  • 5c. Create commands/diff.md (~40-50 lines)
  • 5d. Update CLAUDE.md
  • 5e. Update README.md — badges, tables, version history
  • 5f. Update plugin.json version
  • 5g. Subtree push to public repo

Files Modified

  • scanners/scan-orchestrator.mjs
  • CLAUDE.md

Files Created

  • scanners/lib/diff-engine.mjs
  • commands/diff.md
  • reports/baselines/ (directory)

Acceptance Criteria

  • --save-baseline stores results, --baseline loads and diffs
  • NEW findings flagged after adding a vulnerability
  • RESOLVED findings flagged after removing one
  • Fuzzy line matching handles ±3 line drift
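
The matching rule in 5a can be sketched as a greedy pass over the current findings. Several assumptions are hypothetical:

  • Findings have the shape { scanner, file, line, pattern }.
  • The INJ prefix is a placeholder.
  • An exact line match counts as unchanged.
  • A match that drifted by one to three lines is reported as moved.

```javascript
// Hypothetical finding shape: { scanner, file, line, pattern }.
const LINE_TOLERANCE = 3;

const sameIdentity = (a, b) =>
  a.scanner === b.scanner && a.file === b.file && a.pattern === b.pattern;

function diffFindings(baseline, current) {
  const remaining = [...baseline];
  const result = { new: [], resolved: [], unchanged: [], moved: [] };

  for (const finding of current) {
    // Pick the closest baseline finding with the same identity within ±LINE_TOLERANCE lines.
    let bestIndex = -1;
    let bestDelta = Infinity;
    remaining.forEach((candidate, index) => {
      if (!sameIdentity(candidate, finding)) return;
      const delta = Math.abs(candidate.line - finding.line);
      if (delta <= LINE_TOLERANCE && delta < bestDelta) {
        bestIndex = index;
        bestDelta = delta;
      }
    });

    if (bestIndex === -1) {
      result.new.push(finding);
      continue;
    }
    const [match] = remaining.splice(bestIndex, 1); // each baseline finding matches once
    const bucket = bestDelta === 0 ? result.unchanged : result.moved;
    bucket.push({ ...finding, previousLine: match.line });
  }

  result.resolved = remaining; // baseline findings with no current counterpart
  return result;
}
```

Greedy matching is enough for a sketch. A stricter version would pair findings globally by smallest total drift.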

Session 6: Continuous/Background Scanning (v2.9.1)

Goal: Automated periodic scanning with delta reporting.

Tasks

  • 6a. Create commands/watch.md (~50-60 lines)
    • /security watch [path] [--interval 6h]
    • Uses /loop as execution engine
    • Runs scan-orchestrator with --baseline --save-baseline
    • Reports delta only
  • 6b. Create scanners/watch-cron.mjs (~150-200 lines)
    • Standalone Node.js script for cron/launchd
    • Config: reports/watch/config.json
    • Output: reports/watch/latest.json
  • 6c. Update CLAUDE.md
  • 6d. Update README.md — badges, tables, version history
  • 6e. Update plugin.json version
  • 6f. Subtree push to public repo
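
Interval handling for /security watch and watch-cron.mjs can be sketched as a parser plus a due check. The plan only shows 6h, so accepting m, h and d suffixes is an assumption. Storing the last run time in reports/watch/latest.json is also an assumption, and the function names are hypothetical.

```javascript
// Hypothetical: accepts <number><m|h|d>; the plan only shows 6h.
const UNIT_MS = { m: 60_000, h: 3_600_000, d: 86_400_000 };

function parseInterval(spec) {
  const match = /^(\d+)([mhd])$/.exec(String(spec).trim());
  if (!match) throw new Error(`Invalid interval "${spec}" (expected e.g. 30m, 6h, 1d)`);
  const ms = Number(match[1]) * UNIT_MS[match[2]];
  if (ms === 0) throw new Error('Interval must be greater than zero');
  return ms;
}

// lastRunIso: ISO timestamp of the previous run (e.g. read from latest.json), or null.
function isScanDue(lastRunIso, intervalMs, now = Date.now()) {
  if (!lastRunIso) return true;
  return now - Date.parse(lastRunIso) >= intervalMs;
}
```

With a due check in place, the cron or launchd entry can fire more often than the interval and simply exit when no scan is due.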

Files Modified

  • CLAUDE.md

Files Created

  • commands/watch.md
  • scanners/watch-cron.mjs
  • reports/watch/ (directory)

Acceptance Criteria

  • /security watch . creates baseline and shows "No changes"
  • After modification: shows delta with NEW findings
  • Cron wrapper runs standalone: node scanners/watch-cron.mjs

Session 7: Skill Signature Registry (v2.9.2)

Goal: Local database of known skill patterns and risk profiles.

Tasks

  • 7a. Create scanners/lib/skill-registry.mjs (~300-350 lines)
    • Fingerprinting: SHA-256 of normalized SKILL.md content
    • scanAndRegister(skillPath) and checkRegistry(fingerprint)
    • Registry format: JSON with skill metadata + findings summary
  • 7b. Create knowledge/skill-registry.json (seed data)
  • 7c. Create commands/registry.md (~40-50 lines)
    • /security registry — stats
    • /security registry scan <url> — scan and register
    • /security registry search <pattern> — search
  • 7d. Integrate with commands/scan.md — check registry before full scan
  • 7e. Update CLAUDE.md
  • 7f. Update README.md — badges, tables, version history
  • 7g. Update plugin.json version
  • 7h. Subtree push to public repo
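
Fingerprinting in 7a can be sketched with Node's built-in crypto module. The normalization steps (BOM removal, LF line endings, trailing-whitespace and outer trimming) are assumptions about what "normalized" means. lookupFingerprint and the registry shape are hypothetical stand-ins for checkRegistry().

```javascript
import { createHash } from 'node:crypto';

// Assumed normalization so cosmetic differences do not change the fingerprint:
// drop a UTF-8 BOM, convert CRLF/CR to LF, strip trailing whitespace, trim the file.
function normalizeSkill(markdown) {
  return markdown
    .replace(/^\uFEFF/, '')
    .replace(/\r\n?/g, '\n')
    .split('\n')
    .map((line) => line.trimEnd())
    .join('\n')
    .trim();
}

// SHA-256 of the normalized SKILL.md content, as a 64-character hex string.
function fingerprintSkill(markdown) {
  return createHash('sha256').update(normalizeSkill(markdown), 'utf8').digest('hex');
}

// Hypothetical registry shape: { [fingerprint]: { name, riskLevel, findingsSummary } }.
function lookupFingerprint(registry, fingerprint) {
  return Object.hasOwn(registry, fingerprint) ? registry[fingerprint] : null;
}
```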

Files Modified

  • commands/scan.md
  • CLAUDE.md

Files Created

  • scanners/lib/skill-registry.mjs
  • knowledge/skill-registry.json
  • commands/registry.md

Acceptance Criteria

  • Scan a skill → fingerprint added to registry
  • Re-scan same skill → registry hit, instant result
  • /security registry search returns matches

Session 8: Polish & Public Release (v3.0.0)

Goal: Quality pass, documentation, public release, announcement prep.

Tasks

  • 8a. Full quality pass
    • 544/544 tests pass
    • Scan-orchestrator: 8/8 scanners OK (0 findings after .llm-security-ignore is applied, ~190 suppressed)
    • All 14 commands verified (valid frontmatter)
    • All 8 hooks verified (parse without errors)
    • Scan-orchestrator: ~7.5s on plugin self-scan
  • 8b. Documentation
    • README.md: v3 badge, mermaid architecture diagram, TFA in scanner table, updated stats, v3.0.0 version history
    • CHANGELOG.md: full version history v1.0→v3.0 in Keep a Changelog format
    • package.json + plugin.json bumped to v3.0.0
    • .llm-security-ignore updated with TFA suppressions
  • 8c. Public repo sync
    • Subtree push to git.fromaitochitta.com/open/claude-code-llm-security
  • 8d. Announcement prep
    • V3-ANNOUNCEMENT.md with feature comparison matrix (vs Snyk Agent Scan, Lasso Claude Hooks)
    • Key differentiators narrative (6 points)
    • Demo scenario with scan/diff/watch workflow

Acceptance Criteria

  • /security audit on plugin itself scores A or B
  • All commands documented in CLAUDE.md
  • All hooks documented in CLAUDE.md
  • README has complete v3 feature list
  • Public repo updated and accessible

Technical Notes

Reusable Infrastructure (do not duplicate)

  • scanners/lib/injection-patterns.mjs — all injection pattern matching
  • scanners/lib/output.mjs: finding() and scannerResult() builders
  • scanners/lib/severity.mjs — risk scoring, OWASP mapping
  • scanners/lib/file-discovery.mjs: discoverFiles() and readTextFile()
  • scanners/lib/string-utils.mjs — entropy, Levenshtein, base64 detection
  • scanners/content-extractor.mjs — pre-extraction for remote repos

Constraints

  • All code is Node.js (>=18), no external dependencies beyond Node stdlib
  • Hooks are separate processes per invocation (no shared memory)
  • Context budget: max 3 knowledge files per agent invocation
  • Intel Mac target (no Apple Silicon-specific features)
  • Plugin convention: commands ~30-60 lines, agents use registered subagent_type
  • CLAUDE.md updated in same commit as the change it documents
  • README.md + plugin.json + subtree push are MANDATORY per session — not optional, not deferred to S8. Every version bump must update: plugin.json version, README badges/tables/version history, then subtree push. Session is NOT done until public repo is current.

Scanner Integration Pattern

```javascript
// In scan-orchestrator.mjs, TFA scanner receives prior results:
const tfaResults = await runTfaScanner(target, files, priorResults);
// All other scanners: (target, files) signature unchanged
```
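
The following sketch shows how the requiresPriorResults flag from 2b could drive ordering. It assumes each scanner is described by an object { name, requiresPriorResults, run }; that descriptor shape and the runScanners name are hypothetical. The sketch runs independent scanners concurrently only for illustration. Running them sequentially would preserve the same contract.

```javascript
// Hypothetical scanner descriptor: { name, requiresPriorResults?, run(target, files, priorResults?) }.
async function runScanners(target, files, scanners) {
  const independent = scanners.filter((s) => !s.requiresPriorResults);
  const correlators = scanners.filter((s) => s.requiresPriorResults);

  // Regular scanners keep the (target, files) signature.
  const results = await Promise.all(independent.map((s) => s.run(target, files)));

  // Correlators such as TFA run last and see every result gathered so far.
  for (const s of correlators) {
    results.push(await s.run(target, files, [...results]));
  }
  return results;
}
```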

Hook State Pattern

```javascript
// Session guard uses temp file for cross-invocation state:
const stateFile = `/tmp/llm-security-session-${process.ppid}.jsonl`;
// Append on each invocation, read sliding window for analysis
```
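
Building on the pattern above, here is a minimal sketch of the append, window-read and 24h cleanup steps from 3a. The helper names are hypothetical. The sketch keeps /tmp as in the plan rather than os.tmpdir(), which on macOS typically resolves to a per-user directory. Because every hook invocation is a separate process, the sketch tolerates malformed lines and files that vanish between listing and deletion.

```javascript
import { appendFileSync, existsSync, readFileSync, readdirSync, statSync, unlinkSync } from 'node:fs';
import { join } from 'node:path';

const STATE_PREFIX = 'llm-security-session-';
const MAX_AGE_MS = 24 * 60 * 60 * 1000;

function stateFilePath(ppid = process.ppid, dir = '/tmp') {
  return join(dir, `${STATE_PREFIX}${ppid}.jsonl`);
}

// One JSON object per line; appends need no read-modify-write.
function appendEvent(file, event) {
  appendFileSync(file, JSON.stringify(event) + '\n');
}

// Read the last `size` events, skipping any line that fails to parse (e.g. a partial write).
function readWindow(file, size = 20) {
  if (!existsSync(file)) return [];
  const events = [];
  for (const line of readFileSync(file, 'utf8').split('\n')) {
    if (!line) continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // ignore malformed line
    }
  }
  return events.slice(-size);
}

// Delete state files older than 24h; another hook may remove a file first, so errors are ignored.
function cleanupStale(dir = '/tmp', now = Date.now()) {
  for (const name of readdirSync(dir)) {
    if (!name.startsWith(STATE_PREFIX) || !name.endsWith('.jsonl')) continue;
    const full = join(dir, name);
    try {
      if (now - statSync(full).mtimeMs > MAX_AGE_MS) unlinkSync(full);
    } catch {
      // file vanished or is not accessible; skip it
    }
  }
}
```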