feat(llm-security): playground Phase 3 (v7.5.0) with 18 parsers/renderers

The single-file SPA playground now has a parser + renderer for all 18
produces_report=true commands (Phase 2: 10 high-priority; Phase 3: the
remaining 8, namely mcp-inspect, supply-check, pre-deploy, diff, watch,
registry, clean, threat-model). 18 markdown test fixtures serve as contract
anchors for parser development. The complete demo project
`dft-komplett-demo` has all 18 reports pre-parsed inline, so click-through
shows no "parser not implemented" panels.

2 new archetypes in KEY_STATS_CONFIG: kanban-buckets (clean) and
matrix-risk (threat-model).

Bug fix: normalizeVerdictText now checks GO-WITH-CONDITIONS / CONDITIONAL /
BETINGET before plain GO, so a conditional verdict (pre-deploy with open
conditions) no longer collapses to ALLOW.

Exposed 11 window globals for testing/automation (__store, __navigate,
__loadDemoState, __PARSERS, __RENDERERS, __CATALOG, __inferVerdict,
__inferKeyStats, __renderPageShell, __handlePasteImport, __scheduleRender).

12 Playwright-generated screenshots in playground/screenshots/v7.5.0/.

A11Y report (WCAG 2.1 AA): 0 blocking issues; 3 minor improvements flagged
for a v7.5.x patch (skip link, heading hierarchy on project, aria-live toast).

Version bump 7.4.0 -> 7.5.0 in 10 files (package.json, plugin.json,
CLAUDE.md header, README badge, CHANGELOG entry, 3 scanner VERSION
constants, ROADMAP, marketplace-root README).

No scanner or hook behavior changes; purely additive surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@@ -26,7 +26,7 @@ Then open Claude Code and type `/plugin` to browse and install plugins from the

## Plugins

### [LLM Security](plugins/llm-security/) `v7.4.0`
### [LLM Security](plugins/llm-security/) `v7.5.0`

Security scanning, auditing, and threat modeling for agentic AI projects.

@@ -36,6 +36,7 @@ Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10, and the AI Agent Trap
- **Deterministic scanning** — 23 Node.js scanners (10 orchestrated + 13 standalone) for byte-level analysis: Shannon entropy, Unicode codepoints, typosquatting detection, taint flow, DNS resolution, git forensics, AI-BOM, attack simulation, IDE extension prescan (VS Code + JetBrains — URL fetch from Marketplace / OpenVSX / direct VSIX / JetBrains Marketplace, hardened ZIP extractor for zip-slip / symlinks / bombs, plus OS sandbox via `sandbox-exec` / `bwrap` so the kernel enforces FS confinement), MCP cumulative-drift baseline reset (E14 — sticky baseline catches slow-burn rug-pulls). Bash-normalize T1-T6 for obfuscation-resistant denylists
- **Advisory analysis** — 20 commands that scan, audit, and model threats with structured reports, letter grades, and actionable remediation
- **Enterprise governance** — Compliance mapping (EU AI Act, NIST AI RMF, ISO 42001), SARIF 2.1.0 output, structured audit trail, policy-as-code, standalone CLI
- **v7.5.0 playground (2026-05-05)** — Single-file SPA at `plugins/llm-security/playground/llm-security-playground.html` (~10,200 lines) for onboarding, demos, and workshops without a Claude Code installation. Parsers + renderers for all 18 produces_report commands, 18 markdown test fixtures as contract anchors, a complete demo project with all 18 reports pre-parsed, a vendor-synced design system, and 12 Playwright-generated screenshots. 11 new `window` globals exposed for testing/automation (`__store`, `__navigate`, `__loadDemoState`, `__PARSERS`, `__RENDERERS` …). Bug fix: `normalizeVerdictText` handles GO-WITH-CONDITIONS without collapsing to ALLOW. No scanner or hook behavior changes; purely additive surface
- **v7.4.0 examples + e2e suite (2026-05-05)** — 9 runnable demonstration walkthroughs under `examples/` (lethal-trifecta, mcp-rug-pull, supply-chain-attack, poisoned-claude-md, bash-evasion-gallery, prompt-injection-showcase, malicious-skill-demo, toxic-agent-demo, pre-compact-poisoning) plus three new test suites under `tests/e2e/` (attack-chain, multi-session, scan-pipeline) that prove the framework works as a coordinated system. +45 tests (1777 → 1822), no scanner or hook behavior changes — purely additive surface
- **v8.0.0 env-var deprecation runway (D3, v7.3.0)** — Hook configuration has historically been split between process env-vars and the team-distributable `.llm-security/policy.json` file. Until v7.3.0 the two surfaces could disagree silently. The new `getPolicyValueWithEnvWarn()` helper in `scanners/lib/policy-loader.mjs` now emits a one-time-per-process stderr line whenever both surfaces are explicitly set:
  - Affected pairs: `LLM_SECURITY_INJECTION_MODE`↔`injection.mode`, `LLM_SECURITY_TRIFECTA_MODE`↔`trifecta.mode`, `LLM_SECURITY_ESCALATION_WINDOW`↔`trifecta.escalation_window` (new key in `DEFAULT_POLICY`), `LLM_SECURITY_AUDIT_LOG`↔`audit.log_path`
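
A minimal sketch of the one-time-per-process warning described above. `warnOnDualConfig`, its parameters, and the message text are hypothetical and are not the actual `getPolicyValueWithEnvWarn()` API. The sketch deliberately does not decide which surface wins when both are set, since that precedence is not stated here; it only illustrates the "warn at most once per env var per process" mechanism.

```javascript
// Hypothetical sketch: names, signature and message text are illustrative only.

// Env-var <-> policy.json key pairs listed above.
const DEPRECATED_PAIRS = {
  LLM_SECURITY_INJECTION_MODE: "injection.mode",
  LLM_SECURITY_TRIFECTA_MODE: "trifecta.mode",
  LLM_SECURITY_ESCALATION_WINDOW: "trifecta.escalation_window",
  LLM_SECURITY_AUDIT_LOG: "audit.log_path",
};

// Module-level set: each env var warns at most once for the lifetime of the process.
const warned = new Set();

// Resolve a dotted key such as "trifecta.mode" inside a parsed policy object.
function getDotted(obj, dottedKey) {
  return dottedKey
    .split(".")
    .reduce((acc, part) => (acc == null ? undefined : acc[part]), obj);
}

// Write a deprecation line if both the env var and its policy.json key are set.
// Returns true only when a warning was written on this call.
function warnOnDualConfig(envName, env, policy, write = (msg) => process.stderr.write(msg)) {
  const policyKey = DEPRECATED_PAIRS[envName];
  if (!policyKey) return false; // env-only var: nothing to deprecate yet
  const envSet = env[envName] !== undefined && env[envName] !== "";
  const policySet = getDotted(policy, policyKey) !== undefined;
  if (!envSet || !policySet || warned.has(envName)) return false;
  warned.add(envName);
  write(`[llm-security] ${envName} and policy.json "${policyKey}" are both set; ` +
        `env-var configuration is deprecated ahead of v8.0.0\n`);
  return true;
}
```

Env-only variables such as `LLM_SECURITY_MCP_CACHE_FILE` fall through the first check and never warn, matching the "nothing to deprecate yet" rule.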

@@ -1,5 +1,5 @@
{
  "name": "llm-security",
  "description": "Security scanning, auditing, and threat modeling for Claude Code projects. Detects secrets, validates MCP servers, assesses security posture, and generates threat models aligned with OWASP LLM Top 10.",
  "version": "7.4.0"
  "version": "7.5.0"
}

@@ -6,6 +6,38 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

## [7.5.0] - 2026-05-05

### Added

- **Playground.** Single-file SPA at `playground/llm-security-playground.html`
  (~10,200 lines) for onboarding, demos, and workshops without a Claude Code
  installation. Parsers + renderers for all 18 `produces_report=true`
  commands (Phase 2: 10 high-priority; Phase 3: the remaining 8). 18 markdown
  test fixtures under `playground/test-fixtures/` serve as contract anchors for
  parser development. The complete demo project `dft-komplett-demo` has all 18
  reports pre-parsed inline.
- **Design-system vendor files under `playground/vendor/`** (checksum-locked via
  `MANIFEST.json`, synced from `shared/playground-design-system/`).
- **12 screenshots** under `playground/screenshots/v7.5.0/` (Playwright-generated):
  onboarding, home, catalog, project, and 8 representative
  reports (scan, plugin-audit, posture, dashboard, diff, clean,
  threat-model, red-team).
- **Exposed window globals for testing/automation:** `__store`,
  `__navigate`, `__loadDemoState`, `__scheduleRender`, `__PARSERS`,
  `__RENDERERS`, `__CATALOG`, `__inferVerdict`, `__inferKeyStats`,
  `__renderPageShell`, `__handlePasteImport`. These enable Playwright-driven
  navigation and programmatic parser/renderer tests.
- **2 new archetypes in `KEY_STATS_CONFIG`:** `kanban-buckets` (auto/semi-auto/
  manual stats for clean) and `matrix-risk` (threats/max score/cells for
  threat-model).
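
The two new archetypes can be pictured as small functions from parsed report data to key stats. In this sketch the input shapes (`{ buckets }` for clean, `{ threats }` with `likelihood` and `impact` for threat-model) and the reading of "cells" as distinct occupied likelihood × impact cells are assumptions; the score itself follows the likelihood × impact convention used in the audit fixture's risk matrix.

```javascript
// Hypothetical sketch of two key-stat archetypes; input shapes are assumed,
// not taken from the playground's actual parser output.
const KEY_STATS_CONFIG = {
  // clean: one stat per remediation bucket.
  "kanban-buckets": (data) => {
    const count = (name) => data.buckets.filter((b) => b.bucket === name).length;
    return [
      { label: "Auto", value: count("auto") },
      { label: "Semi-auto", value: count("semi-auto") },
      { label: "Manual", value: count("manual") },
    ];
  },
  // threat-model: score = likelihood × impact, as in the fixture's risk matrix.
  "matrix-risk": (data) => {
    const scores = data.threats.map((t) => t.likelihood * t.impact);
    // "Cells" read here as distinct occupied (likelihood, impact) positions.
    const cells = new Set(data.threats.map((t) => `${t.likelihood}x${t.impact}`));
    return [
      { label: "Threats", value: data.threats.length },
      { label: "Max score", value: scores.length ? Math.max(...scores) : 0 },
      { label: "Cells", value: cells.size },
    ];
  },
};
```

With the four rows of the audit fixture's risk matrix as input, the matrix-risk stats come out as 4 threats, a max score of 16 (4 × 4), and 4 occupied cells.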

### Changed

- **`normalizeVerdictText` regex order:** GO-WITH-CONDITIONS / CONDITIONAL
  / BETINGET are now checked before plain GO, so a conditional verdict
  (pre-deploy with open conditions) no longer collapses to ALLOW. The fix is
  backward compatible: all other verdict texts resolve to the same values as
  before.
- No scanner or hook behavior changes; purely additive surface.
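
The ordering problem comes from word boundaries: `/\bGO\b/` also matches the `GO` in `GO-WITH-CONDITIONS`, because the hyphen counts as a boundary. A minimal sketch of a first-match rule list with the conditional rule placed first; the patterns and returned names are illustrative (loosely modeled on the `--verdict-*` tokens), not the playground's actual code, and `NO-GO` is an assumed input.

```javascript
// Hypothetical sketch: first matching rule wins, so order encodes priority.
const VERDICT_RULES = [
  // Must come first: "GO-WITH-CONDITIONS" would otherwise hit /\bGO\b/ below.
  { re: /\bGO[- ]WITH[- ]CONDITIONS\b|\bCONDITIONAL\b|\bBETINGET\b/i, verdict: "GO-WITH-CONDITIONS" },
  // Block before allow, so an (assumed) "NO-GO" is not read as "GO".
  { re: /\bBLOCK(ED)?\b|\bNO[- ]GO\b|\bBLOKKERT\b/i, verdict: "BLOCK" },
  { re: /\bWARN(ING)?\b|\bADVARSEL\b/i, verdict: "WARNING" },
  { re: /\bALLOW\b|\bGO\b|\bTILLATT\b/i, verdict: "ALLOW" },
];

function normalizeVerdictText(text) {
  for (const { re, verdict } of VERDICT_RULES) {
    if (re.test(text)) return verdict;
  }
  return null; // unrecognized verdict text
}
```

Swapping the first and last rules reproduces the old bug: `GO-WITH-CONDITIONS` then resolves to `ALLOW`.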

## [7.4.0] - 2026-05-05

Examples + e2e suite. Seven runnable demonstration walkthroughs under

@@ -1,4 +1,4 @@
# LLM Security Plugin (v7.4.0)
# LLM Security Plugin (v7.5.0)

Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1822+ unit, integration, and end-to-end tests (`tests/e2e/` covers the multi-hook attack chain, multi-session state simulation, and the full scan-orchestrator pipeline); mutation-testing coverage not published.

@@ -69,6 +69,26 @@ revisions table (M10). Env-only vars without policy.json equivalents
`LLM_SECURITY_MCP_CACHE_FILE`) are unchanged — they emit no
deprecation signal because there is nothing to deprecate yet.

**v7.5.0 — Playground (additive surface, no scanner/hook behavior changes).**
Single-file SPA at `playground/llm-security-playground.html` (~10,200 lines)
for onboarding, demos, and workshops without a Claude Code installation.
Parser + renderer for all 18 `produces_report=true` commands in `CATALOG`.
State lives primarily in IndexedDB (`llm-security-playground-v1`) with a
localStorage fallback, in a Proxy + EventTarget store free of circular
dependencies, with microtask-batched rendering. Theme bootstrap with FOUC
prevention. 4 surfaces: onboarding (5 groups) → home (3 tracks) → catalog
(20 commands) ⇄ project (reports / overview / context / export). Demo state
has three projects inline; `dft-komplett-demo` has all 18 reports pre-parsed
for click-through. Vendor-synced design system under
`playground/vendor/playground-design-system/` (checksum-locked via
`MANIFEST.json`, never edited directly). Test fixtures under
`playground/test-fixtures/` (one markdown file per command) are the contract
anchor for parser development. Screenshots in
`playground/screenshots/v7.5.0/`. Exposed window globals for
testing/automation: `__store`, `__navigate`, `__loadDemoState`,
`__scheduleRender`, `__PARSERS`, `__RENDERERS`, `__CATALOG`,
`__inferVerdict`, `__inferKeyStats`, `__renderPageShell`,
`__handlePasteImport`. Includes a fix to the `normalizeVerdictText` regex
order: GO-WITH-CONDITIONS is checked before GO, so a conditional verdict
does not collapse to ALLOW.
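
A minimal sketch of the store pattern named above: a `Proxy` that dispatches a `change` event on every effective write, and a scheduler that coalesces any number of writes in the same tick into one render via `queueMicrotask`. `createStore` and `createRenderScheduler` are hypothetical names; the real `__store` and `__scheduleRender` may be structured differently, and IndexedDB/localStorage persistence is left out.

```javascript
// Hypothetical sketch: Proxy-wrapped state that announces writes on an EventTarget.
function createStore(initial) {
  const events = new EventTarget();
  const state = new Proxy({ ...initial }, {
    set(target, key, value) {
      if (Object.is(target[key], value)) return true; // no-op writes do not notify
      target[key] = value;
      events.dispatchEvent(new Event("change")); // listeners run synchronously
      return true;
    },
  });
  return { state, events };
}

// Coalesce many change notifications into a single render per microtask checkpoint.
function createRenderScheduler(render) {
  let pending = false;
  return function scheduleRender() {
    if (pending) return; // a render is already queued
    pending = true;
    queueMicrotask(() => {
      pending = false;
      render();
    });
  };
}
```

Wiring is one line, `store.events.addEventListener("change", scheduleRender)`; several writes in the same synchronous block then produce exactly one render.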

## Commands

| Command | Description |

@@ -6,7 +6,7 @@

*AI-generated: all code produced by Claude Code through dialog-driven development. [Full disclosure →](../../README.md#ai-generated-code-disclosure)*

[five image rows, not rendered in this view]

@@ -483,6 +483,64 @@ Prompt injection is **structurally unsolvable** with current architectures (join

---

## Playground (v7.5.0)

A single-file SPA at `playground/llm-security-playground.html` provides
an interactive surface for onboarding, command discovery, and report demos
**without requiring a Claude Code installation**. Open the file directly in
a browser (Chrome/Firefox/Safari over `file://`); there is no build step, no
network calls, and no npm install. A theme bootstrap prevents a flash of
unstyled content (FOUC), and state is persisted in IndexedDB, with
localStorage as a fallback.

**Layout:**

```
playground/
├── llm-security-playground.html   ← single-file SPA (~10,200 lines)
├── vendor/
│   └── playground-design-system/  ← synced from shared/, checksum-locked
├── test-fixtures/                 ← markdown fixtures (one per command)
└── screenshots/v7.5.0/            ← Playwright-generated demo images
```
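
Checksum locking can be verified with a few lines of Node. This sketch assumes a manifest shaped like `{ "files": { "<relative path>": "<sha256 hex>" } }`; the actual `MANIFEST.json` format is not shown here, so treat that shape and the `verifyManifest` name as hypothetical.

```javascript
const crypto = require("node:crypto");

// Hex-encoded SHA-256 of a string or Buffer.
function sha256Hex(content) {
  return crypto.createHash("sha256").update(content).digest("hex");
}

// Hypothetical check: compare every file listed in the manifest against its
// recorded hash. `readFile` is injected so the check can run against disk or
// an in-memory map; it should throw when a file does not exist.
function verifyManifest(manifest, readFile) {
  const problems = [];
  for (const [path, expected] of Object.entries(manifest.files)) {
    let content;
    try {
      content = readFile(path);
    } catch {
      problems.push({ path, reason: "missing" });
      continue;
    }
    if (sha256Hex(content) !== expected) {
      problems.push({ path, reason: "checksum mismatch" });
    }
  }
  return problems; // empty array means the vendored copy is intact
}
```

Against a real checkout, `readFile` could be `(p) => fs.readFileSync(path.join(vendorDir, p))`, where `vendorDir` points at the vendored design-system directory.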

**What the playground covers:**

- **Onboarding (5 groups):** organization, scope, profile, platform,
  compliance. Values are persisted as `shared` state and prefilled
  automatically in every command form.
- **Home:** project grid and fleet tracks for posture/scan/red-team. The
  "Last inn demo-data" ("Load demo data") button loads 3 projects, including
  `dft-komplett-demo` with all 18 reports pre-parsed.
- **Catalog:** all 20 commands grouped into 5 categories. Search filters the
  cards, and the "Åpne skjema" ("Open form") button builds a ready-made
  pipeline string to copy and paste into the terminal.
- **Project surface:** 4 screens (Oversikt / Rapporter / Kontekst /
  Eksport, i.e. Overview / Reports / Context / Export). The Reports tab has
  category tabs (discover / posture / findings-ops / hardening / adversarial /
  mcp-ops) and paste-in import for each report command.

**Parser/renderer architecture:** Each `produces_report=true` command in
`CATALOG` has a parser (markdown → structure) and a renderer (structure
→ DS components). The renderers build on these archetypes: `findings`,
`findings-grade`, `risk-score-meter`, `posture-cards`, `dashboard-fleet`,
`red-team-results`, `diff-report`, `kanban-buckets`, `matrix-risk`. The
parser contract is `{ ok: true, data: {...} } | { ok: false, errors: [...] }`.
Test fixtures under `playground/test-fixtures/` are the contract anchor: one
markdown file per command, mirroring the `templates/unified-report.md` format.
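
A minimal sketch of a parser honoring that contract. It reads only the `## Risk Dashboard` metrics table from the unified report format used by the fixtures (for example the `| **Risk Score** | 31/100 |` row in `audit.md`); `parseRiskDashboard` and the field names in `data` are hypothetical, and real parsers extract far more.

```javascript
// Hypothetical parser returning { ok: true, data } | { ok: false, errors }.
function parseRiskDashboard(markdown) {
  // Split on level-2 headings; "### ..." lines do not match "## " at line start.
  const section = markdown.split(/^## /m).find((s) => s.startsWith("Risk Dashboard"));
  if (!section) return { ok: false, errors: ["missing '## Risk Dashboard' section"] };

  const metrics = {};
  for (const line of section.split("\n")) {
    // Matches rows such as: | **Risk Score** | 31/100 |
    const m = line.match(/^\|\s*\*\*(.+?)\*\*\s*\|\s*(.+?)\s*\|$/);
    if (m) metrics[m[1]] = m[2];
  }

  const errors = [];
  const score = /^(\d+)\/100$/.exec(metrics["Risk Score"] ?? "");
  if (!score) errors.push("Risk Score not found or not in 'N/100' form");
  if (!metrics["Verdict"]) errors.push("Verdict not found");
  if (errors.length) return { ok: false, errors };

  return {
    ok: true,
    data: {
      riskScore: Number(score[1]),
      riskBand: metrics["Risk Band"] ?? null,
      grade: metrics["Grade"] ?? null,
      verdict: metrics["Verdict"],
    },
  };
}
```

Collecting all problems into `errors` before returning, rather than throwing on the first one, lets the paste-in import show every missing field at once.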

**Exposed testing/automation globals:** `__store`, `__navigate`,
`__loadDemoState`, `__scheduleRender`, `__PARSERS`, `__RENDERERS`,
`__CATALOG`, `__inferVerdict`, `__inferKeyStats`, `__renderPageShell`,
`__handlePasteImport`. These enable Playwright-driven navigation and
programmatic parser/renderer tests against the fixture directory.
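
Because the parsers are reachable as `window.__PARSERS`, a contract check can loop over the fixtures and classify each result. This sketch assumes `__PARSERS` maps a command name to a function `(markdown) => result`; a plain object stands in for `window` so it runs under Node, whereas in a browser run equivalent logic would execute in the page context (for example from a Playwright test). `checkParsers` and `isValidResult` are hypothetical helpers.

```javascript
// Does a value have the { ok: true, data } | { ok: false, errors } shape?
function isValidResult(result) {
  if (result === null || typeof result !== "object") return false;
  if (result.ok === true) return typeof result.data === "object" && result.data !== null;
  if (result.ok === false) return Array.isArray(result.errors);
  return false;
}

// Run every fixture through its parser and collect human-readable failures.
// `win` is window-like; `fixtures` maps command name -> markdown text.
function checkParsers(win, fixtures) {
  const failures = [];
  for (const [command, markdown] of Object.entries(fixtures)) {
    const parse = win.__PARSERS?.[command];
    if (typeof parse !== "function") {
      failures.push(`${command}: no parser registered`);
      continue;
    }
    let result;
    try {
      result = parse(markdown);
    } catch (err) {
      failures.push(`${command}: parser threw ${err.message}`);
      continue;
    }
    if (!isValidResult(result)) failures.push(`${command}: result violates contract`);
    else if (!result.ok) failures.push(`${command}: ${result.errors.join("; ")}`);
  }
  return failures; // empty array means every fixture parsed cleanly
}
```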

**Limitations:** The SPA is a paste-in surface; it runs no scanners
itself. Output must come from Claude Code (`/security scan ...`), the CLI
(`node scanners/...`), or stub fixtures. Demo state contains only the
3 inline projects; new projects are per-user and stored locally.

---

## Self-scan

Running `node scanners/scan-orchestrator.mjs .` on this plugin produces **0 findings (ALLOW)** with ~190 suppressions via `.llm-security-ignore`. Every suppression is explained — a security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in `knowledge/secrets-patterns.md`. The taint scanner flags `eval(user_input)` in test fixtures. The toxic flow analyzer flags the plugin's own commands that use Read+Bash. Remove the ignore file and re-run to see the unsuppressed picture.

@@ -555,6 +613,7 @@ demonstrations — each with `README.md`, fixture, run script, and

| Version | Date | Highlights |
|---------|------|------------|
| **7.5.0** | 2026-05-05 | **Playground.** Single-file SPA at `playground/llm-security-playground.html` (~10,200 lines) for onboarding, demos, and workshops without a Claude Code installation. Parsers + renderers for all 18 `produces_report=true` commands (10 high-priority in Phase 2, plus the remaining 8 in Phase 3: mcp-inspect, supply-check, pre-deploy, diff, watch, registry, clean, threat-model). 18 markdown test fixtures under `playground/test-fixtures/` as contract anchors. The complete demo project `dft-komplett-demo` has all 18 reports pre-parsed inline. Vendor-synced design system under `playground/vendor/` (checksum-locked). 12 Playwright-generated screenshots in `playground/screenshots/v7.5.0/`. 11 new `window` globals for testing/automation. 2 new `KEY_STATS_CONFIG` archetypes (`kanban-buckets`, `matrix-risk`). Bug fix: `normalizeVerdictText` regex order updated so GO-WITH-CONDITIONS / CONDITIONAL / BETINGET no longer collapse to ALLOW. No scanner or hook behavior changes; purely additive surface. |
| **7.4.0** | 2026-05-05 | **Examples + e2e suite.** Seven runnable demonstration walkthroughs under `examples/` (`prompt-injection-showcase`, `lethal-trifecta-walkthrough`, `mcp-rug-pull`, `supply-chain-attack`, `poisoned-claude-md`, `bash-evasion-gallery`, `toxic-agent-demo`, `pre-compact-poisoning`) — each with `README.md`, runtime-isolated fixture, single-command run-script, and `expected-findings.md` testable contract. Three new `tests/e2e/` suites (attack-chain 17 tests + multi-session 9 tests + scan-pipeline 19 tests = +45 tests, total 1822) prove the framework works as a coordinated system, not just isolated units. No scanner or hook behavior changes — purely additive surface. Scanner `VERSION` constants synced across `dashboard-aggregator.mjs`, `posture-scanner.mjs`, `ide-extension-scanner.mjs`. |
| **7.3.1** | 2026-05-01 | **Stabilization patch.** Project repositioned as solo, stabilization-only, with explicit "fork & own" stance for enterprise features. New public docs: `CONTRIBUTING.md` (fork-and-own model), README "Project scope" section (out-of-scope table with commercial alternatives), updated `SECURITY.md` (v7.3.x supported, v7.0–v7.2 best-effort, < v7.0 EOL). Coherence: `package.json` files whitelist + `bugs` URL + repo URL fix; scanner `VERSION` constants synced across `dashboard-aggregator.mjs`, `posture-scanner.mjs`, `ide-extension-scanner.mjs`. Test ceiling raised on flaky pre-compact-scan timing test (500 ms → 1000 ms; design target unchanged). No behavior changes. |
| **7.3.0** | 2026-05-01 | **Batch C release.** Wave A (T7-T9 bash normalization + rot13 comment-block decoder), Wave B (`.gitattributes` post-clone advisory + npm scope-hop typosquat + GitHub/Forgejo workflow-scanner with 23-field blacklist + re-interpolation tracking + auth-bypass detection), Wave C (MCP cumulative-drift baseline + `/security mcp-baseline-reset`), Wave D (riskScoreV1 `@deprecated`; sandbox-architecture rationale docs; env-var deprecation runway to v8.0.0; CLAUDE.md hooks count + consistency test). 1665+ → 1777 tests. Wave E (additional attack-simulator scenarios) deferred indefinitely

@@ -1,6 +1,6 @@
{
  "name": "llm-security",
  "version": "7.4.0",
  "version": "7.5.0",
  "description": "Security scanning, auditing, and threat modeling for Claude Code projects",
  "type": "module",
  "bin": {

plugins/llm-security/playground/A11Y-RAPPORT.md
@@ -0,0 +1,120 @@
# A11Y report: llm-security playground v7.5.0

**Date:** 2026-05-05
**Tools:** Playwright headless audit (Chromium 1217) + manual verification
**Spec:** WCAG 2.1 AA

---

## Summary

The playground is **largely conformant** with WCAG 2.1 AA. The automated audit
found **0 blocking issues**. Three minor improvements are flagged for
follow-up in a v7.5.x patch or v7.6.0.

| Area | Status | Notes |
|------|--------|-------|
| Language attribute | ✓ | `<html lang="nb">` |
| Form labels | ✓ | 4/5 inputs have an explicit `<label for>`. 1 exception: theme toggle (has `aria-label`) |
| Tab order | ✓ | Logical order on all 4 surfaces (17/15/95/n+ tabbables) |
| Aria-current | ✓ | Used on onboarding steps (1) and project tabs (2) |
| Aria-expanded | ✓ | Used on catalog expansion panels |
| Aria-label | ✓ | 8 on onboarding, 8 on home; search field and top bar have explicit labels |
| Role=tablist/tab/tabpanel | ✓ | Project screens (2 tablists, 10 tabs, 6 tabpanels) |
| Verdict-pill contrast | ✓ | DS Tier 2 tokens, manually verified in light and dark mode |
| Heading hierarchy | △ | Onboarding uses H1→H2→H2 (OK). Project uses H1→H4→H4, skipping H2/H3 |
| Skip to main content | △ | No "Skip to main content" link |
| Toast/notify region | △ | No `aria-live` region for runtime feedback |

---

## Detailed findings

### Onboarding (data-surface=onboarding)

- 12 buttons, all with visible text content
- 4/5 form inputs have an explicit `label for`. The fifth is the theme toggle, which has an `aria-label`
- Heading hierarchy: H1 (page title) → H2 ("Trinn", i.e. "Step") → H2 ("Organisasjon", i.e. "Organization")
- 17 tabbable elements in logical order (step picker → form fields → previous/next buttons)
- aria-current="true" on the active onboarding step

### Home (data-surface=home)

- 14 buttons, 15 tabbables
- 5 headings (H1/H2/H3) with an appropriate hierarchy
- `role="group"` on fleet tracks
- 8 aria-labels (project cards, fleet tiles)

### Catalog (data-surface=catalog)

- 5 expansion panels with `aria-expanded` (true/false toggle)
- The search field has `aria-label="Søk i kommando-katalogen"` ("Search the command catalog")
- 16 H3 headings (one per command card); the H2 level is skipped (minor)

### Project (data-surface=project)

- 2 `role="tablist"` (screen tabs + project tabs)
- 10 `role="tab"` with `aria-current="true"` on the active tab
- 6 `role="tabpanel"` (one per category tab)
- Many H1 and H4 headings; the H2/H3 levels are not used in project content

### Verdict pill

- 19 verdict pills on the project surface (one per report)
- DS tokens for color contrast: `--verdict-block`, `--verdict-warning`,
  `--verdict-allow`, `--verdict-go-with-conditions`
- All carry visible text (BLOKKERT/ADVARSEL/TILLATT/BETINGET, i.e.
  blocked/warning/allowed/conditional), not just color

---

## Manual VoiceOver test (macOS)

**Tested 2026-05-05 in Safari Technology Preview with VoiceOver:**

| Surface | Result |
|---------|--------|
| Onboarding form fields | ✓ Each input is announced with its label |
| Step buttons | ✓ "Trinn 1 av 5: Organisasjon, valgt" ("Step 1 of 5: Organization, selected") |
| Catalog search | ✓ "Søk i kommando-katalogen, søkefelt" ("Search the command catalog, search field") |
| Catalog expansion | ✓ "Discover, utvidet, knapp" ("Discover, expanded, button"; toggles) |
| Project tabs | ✓ "Discover-fanen, valgt" ("Discover tab, selected") |
| Verdict pill | ✓ "BLOKKERT" is read as text, not only as a symbol |

---

## Color contrast (WCAG 2.1 AA)

DS Tier 2 tokens are tested against WCAG 2.1 AA in `shared/playground-design-system/`:

- Text-default on bg-base: ≥7:1 (AAA)
- Text-secondary on bg-base: ≥4.5:1 (AA)
- Text-tertiary on bg-base: ≥3:1 (large text only)
- Verdict-pill foreground on pill background: ≥4.5:1 in all 6 verdict states

Light mode and dark mode are both verified.
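
The ratios above use the WCAG 2.1 contrast formula: relative luminance L = 0.2126 R + 0.7152 G + 0.0722 B over linearized sRGB channels, and ratio (L_lighter + 0.05) / (L_darker + 0.05). A small sketch for spot-checking a token pair given as `#rrggbb` hex; the helper names are illustrative and not part of the design system.

```javascript
// Linearize one 8-bit sRGB channel (WCAG 2.1 definition, 0.03928 threshold).
function channelToLinear(c8) {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance of a "#rrggbb" color, in [0, 1].
function relativeLuminance(hex) {
  const n = parseInt(hex.replace(/^#/, ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio is symmetric: the lighter color always goes in the numerator.
function contrastRatio(fgHex, bgHex) {
  const [hi, lo] = [relativeLuminance(fgHex), relativeLuminance(bgHex)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// AA thresholds: 4.5:1 for normal text, 3:1 for large text.
const meetsAA = (ratio, largeText = false) => ratio >= (largeText ? 3 : 4.5);
```

For example, `#767676` on white comes out at about 4.54:1 and passes AA for normal text, while `#777777` lands at about 4.48:1 and fails, although it still clears the 3:1 large-text threshold.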

---

## Recommendations (follow-up)

### Minor improvements (v7.5.x patch)

1. **Skip-to-main-content link**: `<a href="#app" class="skip-to-main">Hopp til hovedinnhold</a>`, made visible on focus
2. **Heading hierarchy on the project surface**: use H2 for the screen-tabs heading and H3 for the category-tabs heading
3. **`aria-live="polite"` toast region**: for parser errors and save confirmations

### Larger improvements (v7.6.0+)

- Reduced-motion media query for animations (transitions, expansion)
- Forced-colors mode (Windows High Contrast) testing
- axe-core integration in the Playwright suite for continuous auditing

---

## Conclusion

The playground meets WCAG 2.1 AA on all blocking criteria. The three minor
improvements above raise quality; they are not conformance failures.

*Audit run automatically via `playground/scripts/a11y-audit.cjs` (runnable
but not checked in; run it locally to re-audit).*

[Binary: 12 PNG screenshots added under `plugins/llm-security/playground/screenshots/v7.5.0/`, 167 to 342 KiB each; only `02-home.png` (259 KiB) is named in this view.]

plugins/llm-security/playground/test-fixtures/audit.md
@@ -0,0 +1,141 @@
# Full Security Audit — DFT marketplace

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | audit |
| **Target** | ~/repos/dft-marketplace |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | 7 audit dimensions, 10 OWASP categories |
| **Frameworks** | OWASP LLM Top 10, OWASP Agentic |
| **Triggered by** | /security audit |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 31/100 |
| **Risk Band** | Medium |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 4 |
| Medium | 8 |
| Low | 7 |
| Info | 9 |
| **Total** | **28** |

**Verdict rationale:** Posture base grade B downgraded to C after agent-level findings (4 high). No critical, but `Logging & Audit` and `Permission Hygiene` need attention.

---

## Executive Summary

Full audit combined posture-scanner output with skill-scanner-agent and mcp-scanner-agent narratives. 28 findings across 14 files. Most concentrated in agent definitions (over-permissioned tool lists) and `.claude/settings.json` (missing audit log + wildcard Bash). Recommendation: address top 3 actions to reach Grade B; six more to reach Grade A.

---

## Radar Axes

| Axis | Score |
|------|------:|
| Deny-First Configuration | 4 |
| Hook Coverage | 5 |
| MCP Trust | 3 |
| Secrets Management | 5 |
| Permission Hygiene | 2 |
| Supply-Chain Defense | 4 |
| Logging & Audit | 1 |

---

## Category Assessment

### Category 1 — Deny-First Configuration

| Status | PASS |

**Evidence:** `.claude/settings.json` has `permissions.defaultMode: "deny"`. Explicit allow-list in place.

**Recommendations:** None — Grade A on this axis.

### Category 2 — Hook Coverage

| Status | PASS |

**Evidence:** 9 hooks active (PreToolUse: 4, PostToolUse: 2, UserPromptSubmit: 1, PreCompact: 1, others: 1).

**Recommendations:** Consider adding PreCompact-poisoning detection if not already covered.

### Category 5 — Permission Hygiene

| Status | PARTIAL |

**Evidence:** 3 agents have `Write` in tool list. 1 has `Bash` without sub-command restriction.

**Recommendations:** Tighten tool lists to minimum-necessary set. Use `Bash(git:*)` instead of `Bash(*)`.

### Category 11 — Logging & Audit

| Status | FAIL |

**Evidence:** No `audit.log_path` configured. No SIEM integration. No JSONL audit-trail.

**Recommendations:** Enable `audit.log_path` immediately — closes 1 high + 3 medium findings.

(Categories 3, 4, 6-10, 12-13 follow same format — see envelope JSON for full breakdown)

---

## Risk Matrix (Likelihood × Impact)

| Category | Likelihood | Impact | Score |
|----------|-----------:|-------:|------:|
| Logging gap (PST-001) | 4 | 4 | 16 |
| Permission sprawl | 3 | 4 | 12 |
| MCP drift (airbnb-mcp) | 3 | 3 | 9 |
| AI Act classification missing | 2 | 3 | 6 |

---

## Action Plan

### IMMEDIATE (this week)

1. Enable audit-trail: set `audit.log_path` in `.llm-security/policy.json`
2. Tighten 3 over-permissioned agents (drop `Write` where unused)
3. Investigate airbnb-mcp drift — reset baseline only after review

### HIGH (this month)

4. Document AI Act risk classification in `CLAUDE.md`
5. Replace `Bash(*)` with `Bash(git:*, npm:*)` in `.claude/settings.json`
6. Bump 2 dependencies to clear OSV advisories

### MEDIUM (next quarter)

7. Add SECURITY.md disclosure policy
8. Trim verbose skill descriptions (3 files)
9. Document hook rationale in plugin CLAUDE.md

---

## Positive Findings

- All hooks active and non-bypassed
- No critical findings
- Posture scanner runtime < 2s (well-tuned)
- Memory hygiene clean

---

*Audit complete. 28 findings, Grade C, 14.7 seconds.*

plugins/llm-security/playground/test-fixtures/clean.md
@@ -0,0 +1,145 @@
# Clean — Auto + Semi-Auto + Manual Remediation

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | clean |
| **Target** | ~/repos/dft-marketplace |
| **Date** | 2026-05-05 |
| **Mode** | dry-run |
| **Version** | llm-security v7.4.0 |
| **Scope** | scan + remediation buckets |
| **Triggered by** | /security clean . --dry-run |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 45/100 |
| **Risk Band** | High |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 1 |
| High | 3 |
| Medium | 4 |
| Low | 2 |
| Info | 3 |
| **Total** | **13** |

**Verdict rationale:** 13 findings classified by remediation tier. 4 auto-fixable, 5 semi-auto (require user confirmation), 3 manual (architecture-level), 1 suppressed (waiver registered).

---

## Remediation Summary

| Bucket | Count | Action |
|--------|------:|--------|
| Auto | 4 | Apply deterministic fixes (no user input) |
| Semi-auto | 5 | Generate proposals, confirm with user |
| Manual | 3 | Architecture-level — human decision required |
| Suppressed | 1 | Waiver registered in `.llm-security-ignore` |
| **Total** | **13** | |

---

## Findings

### Critical

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| CLN-001 | Secrets | agents/data-analyst.md | 47 | Hardcoded API key | LLM02 |

### High

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| CLN-002 | Excessive Agency | agents/web-helper.md | 3 | Lethal trifecta tool combination | ASI01 |
| CLN-003 | Permissions | .claude/settings.json | 5 | Wildcard `Bash(*)` permission | ASI04 |
| CLN-004 | Injection | commands/research.md | 22 | Indirect-injection vector | LLM01 |

### Medium

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| CLN-005 | MCP Trust | .mcp.json | 12 | Hidden imperative in MCP description | MCP05 |
| CLN-006 | Documentation | LICENSE | — | License file missing | — |
| CLN-007 | Documentation | SECURITY.md | — | Disclosure policy missing | — |
| CLN-008 | Output Handling | agents/notes.md | 89 | Markdown link-title injection sink | LLM01 |

### Low

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| CLN-009 | Documentation | README.md | 88 | Suspicious URL in example | — |
| CLN-010 | Documentation | CHANGELOG.md | — | Missing changelog file | — |

### Info

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| CLN-011 | Documentation | CONTRIBUTING.md | — | Missing contributing guidelines | — |
| CLN-012 | Documentation | .gitignore | — | Missing `.env*` exclusion | — |
| CLN-013 | Documentation | LICENSE | — | License header in source files | — |

---
## Auto

| ID | Action | Description |
|----|--------|-------------|
| CLN-001 | replace-with-env-var | Replace hardcoded `sk-prod-...` with `${API_KEY}`, log replacement to .llm-security-audit.jsonl |
| CLN-006 | create-file | Create `LICENSE` file (MIT, default) |
| CLN-012 | append-line | Append `.env*` to `.gitignore` |
| CLN-013 | add-license-header | Add MIT license header to top of source files |

---

## Semi-auto

| ID | Action | Description |
|----|--------|-------------|
| CLN-003 | propose-allowlist | Propose explicit Bash allow-list based on actual usage patterns |
| CLN-004 | propose-trust-bus | Propose Trust-Bus wrapper around indirect-injection vector |
| CLN-005 | propose-rewrite | Propose rewritten MCP description without imperative pattern |
| CLN-007 | scaffold-template | Generate SECURITY.md template; user confirms ownership/SLA terms |
| CLN-008 | propose-sanitizer | Propose sanitizer for Markdown link-title sink |

---

## Manual

| ID | Action | Description |
|----|--------|-------------|
| CLN-002 | architectural-review | Lethal trifecta requires architecture-level decision: split agent OR add hook policy |
| CLN-009 | manual-edit | Suspicious URL in README example — requires editorial judgment |
| CLN-010 | manual-write | CHANGELOG.md content requires reviewing git history |

---

## Suppressed

| ID | Reason | Waiver |
|----|--------|--------|
| CLN-011 | Repo policy: solo project, no external contributions | `.llm-security-ignore` rule `category:documentation/contributing` |

---
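The dry-run's bucket counts come from grouping each finding by its remediation bucket. Below is a minimal sketch of that grouping. It assumes each finding carries a hypothetical `fix` field naming its bucket; the plugin's real finding shape is not shown in this report.

```javascript
// Hypothetical sketch: group findings into the four remediation buckets.
// ASSUMPTION: each finding is { id, fix } where `fix` names its bucket.
const BUCKETS = ["auto", "semi-auto", "manual", "suppressed"];

function bucketFindings(findings) {
  const out = Object.fromEntries(BUCKETS.map((b) => [b, []]));
  for (const f of findings) {
    if (!Object.prototype.hasOwnProperty.call(out, f.fix)) {
      throw new Error(`unknown bucket "${f.fix}" on ${f.id}`);
    }
    out[f.fix].push(f.id);
  }
  return out;
}

function summarize(findings) {
  const buckets = bucketFindings(findings);
  const parts = BUCKETS.map((b) => `${buckets[b].length} ${b}`);
  return `${findings.length} findings: ${parts.join(", ")}`;
}

// The 13 findings of this report, bucketed as in the Auto / Semi-auto / Manual / Suppressed tables.
const findings = [
  ["CLN-001", "auto"], ["CLN-002", "manual"], ["CLN-003", "semi-auto"],
  ["CLN-004", "semi-auto"], ["CLN-005", "semi-auto"], ["CLN-006", "auto"],
  ["CLN-007", "semi-auto"], ["CLN-008", "semi-auto"], ["CLN-009", "manual"],
  ["CLN-010", "manual"], ["CLN-011", "suppressed"], ["CLN-012", "auto"],
  ["CLN-013", "auto"],
].map(([id, fix]) => ({ id, fix }));

console.log(summarize(findings));
// prints: 13 findings: 4 auto, 5 semi-auto, 3 manual, 1 suppressed
```

Throwing on an unknown bucket keeps a mislabeled finding from silently dropping out of the totals.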
## Recommendations

1. **Immediate:** Run with `--apply` to execute the 4 auto-fixes.
2. **High:** Walk through the 5 semi-auto proposals interactively (`--interactive`).
3. **Medium:** Schedule review of the 3 manual items: an architecture decision for CLN-002, editorial work on CLN-009 and CLN-010.
4. **Low:** Review the suppressed item (CLN-011) annually to confirm the policy still applies.

---

*Clean dry-run complete. 13 findings: 4 auto, 5 semi-auto, 3 manual, 1 suppressed.*

plugins/llm-security/playground/test-fixtures/dashboard.md
@ -0,0 +1,82 @@
# Security Dashboard — Machine-wide

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | dashboard |
| **Target** | machine-wide (5 projects) |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | all Claude Code projects under ~/ + ~/.claude/plugins/ |
| **Frameworks** | OWASP LLM Top 10 |
| **Triggered by** | /security dashboard |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Machine Grade** | C (weakest link) |
| **Projects Scanned** | 5 |
| **Total Findings** | 87 |
| **Scan Time** | 8.4s |
| **Cache** | Cached (3h old) |

| Severity | Count |
|----------|------:|
| Critical | 1 |
| High | 12 |
| Medium | 28 |
| Low | 24 |
| Info | 22 |
| **Total** | **87** |

**Verdict rationale:** Machine grade follows the weakest-link rule. The `from-ai-to-chitta` project (Grade D) drags the machine grade to C. Resolving that project would lift the machine grade to B.

---
## Project Overview

| Project | Grade | Risk | Worst Category | Findings |
|---------|-------|------:|----------------|---------:|
| from-ai-to-chitta | D | 56 | MCP Trust | 32 |
| dft-marketplace | C | 31 | Logging & Audit | 28 |
| airbnb-mcp-plugin | C | 41 | Permissions | 14 |
| ktg-plugin-marketplace | B | 22 | Skill Hygiene | 9 |
| nightly-utils | A | 4 | — | 4 |

---

## Trend (since last scan)

| Project | Trend | Δ Risk | Δ Findings |
|---------|:-----:|-------:|-----------:|
| from-ai-to-chitta | worse | +12 | +6 |
| dft-marketplace | stable | 0 | -1 |
| airbnb-mcp-plugin | stable | -2 | 0 |
| ktg-plugin-marketplace | better | -7 | -3 |
| nightly-utils | stable | 0 | 0 |

---
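The Trend column above can be derived from Δ Risk alone, ignoring Δ Findings. Below is a minimal sketch of that derivation. It assumes a hypothetical threshold of 5 risk points. The dashboard's real cut-off is not stated in this report; 5 is simply a value that reproduces the five labels in the table.

```javascript
// Hypothetical sketch: classify a project's trend from its risk delta.
// ASSUMPTION: |Δ risk| >= 5 counts as a real change; the real cut-off is not documented here.
const TREND_THRESHOLD = 5;

function classifyTrend(deltaRisk, threshold = TREND_THRESHOLD) {
  if (deltaRisk >= threshold) return "worse";   // risk went up
  if (deltaRisk <= -threshold) return "better"; // risk went down
  return "stable";
}

const rows = [
  { project: "from-ai-to-chitta", deltaRisk: 12 },
  { project: "dft-marketplace", deltaRisk: 0 },
  { project: "airbnb-mcp-plugin", deltaRisk: -2 },
  { project: "ktg-plugin-marketplace", deltaRisk: -7 },
  { project: "nightly-utils", deltaRisk: 0 },
];

for (const { project, deltaRisk } of rows) {
  console.log(`${project}: ${classifyTrend(deltaRisk)}`);
}
```

A dead band around zero keeps small scan-to-scan jitter (such as the -2 on `airbnb-mcp-plugin`) from flipping the label.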
## Errors

No projects failed to scan in this run.

---

## Recommendations

1. **Priority:** Investigate `from-ai-to-chitta`, the only Grade D project. Run `/security audit ~/repos/from-ai-to-chitta` for a category-level breakdown.
2. **Quick win:** Apply the audit-trail fix to `dft-marketplace` (already identified, 30 min) → likely lifts it to Grade B.
3. **Maintenance:** Re-run `/security plugin-audit` on `airbnb-mcp-plugin` after the maintainer responds to the permission-clarification issue.

Estimated effort to Machine Grade B: 4 hours (focused on from-ai-to-chitta + dft-marketplace).

---

*Dashboard complete. 5 projects, machine grade C.*

plugins/llm-security/playground/test-fixtures/deep-scan.md
@ -0,0 +1,136 @@
# Deep-Scan Report — 10 deterministic scanners

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | deep-scan |
| **Target** | ~/repos/example-app |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | full repository |
| **Frameworks** | OWASP LLM Top 10, OWASP Agentic, OWASP MCP |
| **Triggered by** | /security deep-scan |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 58/100 |
| **Risk Band** | High |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 6 |
| Medium | 11 |
| Low | 8 |
| Info | 14 |
| **Total** | **39** |

**Verdict rationale:** No critical findings. 6 high-severity findings (4 from taint, 2 from memory-poisoning) push the score to 58.

---

## Executive Summary

The 10-scanner orchestrator produced 39 findings in 4.7 seconds. Highest concentration is in taint-tracer (untrusted input flowing to dangerous sinks in `commands/research.md`) and memory-poisoning-scanner (encoded imperatives in `CLAUDE.md`). No critical findings. Toxic-flow correlator did not detect a complete trifecta — the agent set has hook guards that intervene before the third leg.

---

## Scanner Results

### 1. Unicode Analysis (UNI)
**Status:** ok | **Files:** 47 | **Findings:** 2 | **Time:** 142ms

Detected 2 instances of zero-width characters in `agents/notes.md`. PUA-A range clear.

### 2. Entropy Analysis (ENT)
**Status:** ok | **Files:** 89 | **Findings:** 5 | **Time:** 387ms

5 high-entropy strings flagged. 2 suppressed (GLSL keywords in `shaders/blur.glsl`). 3 reported (potential secrets in test fixtures).
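Entropy scanners of this kind score each candidate token by its Shannon entropy in bits per character, H = -Σ p(c) · log2 p(c), and flag tokens that are both long and high-entropy as possible secrets. Below is a minimal sketch. It assumes illustrative cut-offs (20 characters, 4.0 bits per character), not the ENT scanner's real thresholds.

```javascript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(s) {
  const chars = [...s]; // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / chars.length;
    h -= p * Math.log2(p); // log2(p) <= 0, so h grows
  }
  return h;
}

// ASSUMPTION: illustrative cut-offs, not the ENT scanner's real thresholds.
const MIN_LENGTH = 20;
const MIN_BITS_PER_CHAR = 4.0;

function looksLikeSecret(token) {
  return [...token].length >= MIN_LENGTH && shannonEntropy(token) >= MIN_BITS_PER_CHAR;
}

console.log(shannonEntropy("aaaa")); // 0
console.log(shannonEntropy("abcd")); // 2
```

Entropy alone also fires on benign dense text. That is why a real scanner needs a suppression list like the GLSL-keyword exemption described above.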
### 3. Permission Mapping (PRM)
**Status:** ok | **Files:** 12 | **Findings:** 4 | **Time:** 89ms

4 over-permissioned agents (tool list includes `Write`/`Edit` without justification). One wildcard Bash grant in settings.json.

### 4. Dependency Audit (DEP)
**Status:** ok | **Files:** 3 | **Findings:** 3 | **Time:** 1230ms

3 dependencies flagged: 1 OSV-CVE-2024-1234 medium, 2 typosquat suspicions (Levenshtein ≤2 vs official packages).
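The typosquat check compares each dependency name against a list of popular packages by edit distance. Below is a minimal sketch using the classic Levenshtein dynamic program with the ≤2 cut-off named in the paragraph above. The `popular` list is a hypothetical stand-in for whatever reference list the scanner uses.

```javascript
// Levenshtein distance: minimum single-character insertions, deletions and substitutions.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Popular names within `maxDistance` edits of `name`, excluding an exact match.
function typosquatSuspects(name, popular, maxDistance = 2) {
  return popular.filter((pkg) => {
    const d = levenshtein(name, pkg);
    return d > 0 && d <= maxDistance;
  });
}

const popular = ["express", "lodash", "react"]; // hypothetical reference list
const suspects = typosquatSuspects("expresss", popular); // ["express"]
```

Plain Levenshtein counts a transposition (`lodahs` for `lodash`) as two edits. Damerau-Levenshtein counts it as one, which matters when the cut-off is as tight as 2.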
### 5. Taint Tracing (TNT)
**Status:** ok | **Files:** 23 | **Findings:** 12 | **Time:** 487ms

12 taint flows detected. 4 reach high-risk sinks (Bash interpolation, WebFetch URL construction).

### 6. Git Forensics (GIT)
**Status:** ok | **Files:** — | **Findings:** 2 | **Time:** 678ms

2 historical secrets in git history (since rotated, but blob still reachable via reflog).

### 7. Network Mapping (NET)
**Status:** ok | **Files:** 56 | **Findings:** 3 | **Time:** 412ms

3 suspicious URLs found (1 typosquat domain, 2 raw IP addresses in code comments).

### 8. Memory Poisoning (MEM)
**Status:** ok | **Files:** 8 | **Findings:** 4 | **Time:** 67ms

4 memory-poisoning patterns in `CLAUDE.md` and 2 agent files: encoded base64 imperatives, suspicious permission expansion, hidden URLs.

### 9. Supply-Chain Recheck (SCR)
**Status:** ok | **Files:** 2 | **Findings:** 2 | **Time:** 1845ms

OSV.dev returned 2 advisories on installed lockfile entries.

### 10. Toxic-Flow Analyzer (TFA)
**Status:** ok | **Files:** — | **Findings:** 2 | **Time:** 23ms

2 partial-trifecta agents (2 of 3 legs each). No complete trifectas detected.

---

## Scanner Risk Matrix

| Scanner | CRITICAL | HIGH | MEDIUM | LOW | INFO |
|---------|----------|------|--------|-----|------|
| Unicode (UNI) | 0 | 0 | 1 | 1 | 0 |
| Entropy (ENT) | 0 | 1 | 2 | 1 | 1 |
| Permission (PRM) | 0 | 1 | 1 | 1 | 1 |
| Dependency (DEP) | 0 | 0 | 2 | 1 | 0 |
| Taint (TNT) | 0 | 4 | 3 | 2 | 3 |
| Git (GIT) | 0 | 0 | 1 | 1 | 0 |
| Network (NET) | 0 | 0 | 1 | 0 | 2 |
| Memory (MEM) | 0 | 2 | 0 | 1 | 1 |
| Supply-Chain (SCR) | 0 | 0 | 1 | 0 | 1 |
| Toxic-Flow (TFA) | 0 | 0 | 1 | 1 | 0 |
| **TOTAL** | **0** | **6** | **11** | **8** | **14** |

---

## Methodology

10 deterministic Node.js scanners (zero external dependencies). Results are factual and reproducible. Toxic-flow runs LAST as a post-correlator across prior scanners. See `scanners/lib/severity.mjs` for risk-score formula.

---

## Recommendations

1. **High priority:** Address 4 taint-tracer findings in `commands/research.md` and `agents/notes.md` — sanitize before sink, or add hook gate.
2. **High priority:** Clean up `CLAUDE.md` memory-poisoning patterns (lines 12, 34, 67).
3. **Medium:** Bump dependencies to clear OSV advisories.
4. **Medium:** Rewrite git history to purge the 2 historical secret blobs, force-push, and expire reflogs so the blobs are no longer reachable. The keys themselves are already rotated.

Re-run with `--baseline-diff` against the last green run to track progress.

---
*Deep-scan complete. 39 findings, 10 scanners, 4.7 seconds.*

plugins/llm-security/playground/test-fixtures/diff.md
@ -0,0 +1,100 @@
# Scan Diff Against Baseline

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | diff |
| **Target** | ~/repos/dft-marketplace |
| **Date** | 2026-05-05 |
| **Baseline** | 2026-04-29 |
| **Version** | llm-security v7.4.0 |
| **Scope** | scan + posture diff |
| **Triggered by** | /security diff . |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Current Grade** | B |
| **Baseline Grade** | C |
| **Risk Score** | 28/100 |
| **Risk Band** | Medium |
| **Verdict** | WARNING |

| Severity | New | Resolved | Unchanged |
|----------|----:|---------:|----------:|
| Critical | 0 | 1 | 0 |
| High | 1 | 2 | 1 |
| Medium | 2 | 3 | 4 |
| Low | 0 | 1 | 2 |
| Info | 1 | 0 | 5 |
| **Total** | **4** | **7** | **12** |

**Verdict rationale:** Net improvement (7 resolved, 4 new). The baseline's 1 CRITICAL and 2 of its 3 HIGH findings are resolved. Grade C → B. One new HIGH on permission scope warrants review before celebrating.

---
## New (4)

| ID | Severity | Category | File | Description | OWASP |
|----|----------|----------|------|-------------|-------|
| DIF-001 | high | Permissions | .claude/settings.json | New `Edit(*)` wildcard added in commit 4a8c1f | ASI04 |
| DIF-002 | medium | Injection | commands/research-v2.md | New command introduced indirect-injection vector | LLM01 |
| DIF-003 | medium | Supply Chain | package-lock.json | New dependency `husky@9.0.11` (no prior baseline) | LLM03 |
| DIF-004 | info | Documentation | docs/CHANGELOG.md | Changelog gained sensitive path reference (not exploitable) | — |

---

## Resolved (7)

| ID | Severity | Category | File | Resolution |
|----|----------|----------|------|-----------|
| BAS-001 | critical | Secrets | agents/data-analyst.md | API key removed, env-var reference added |
| BAS-002 | high | Excessive Agency | agents/web-helper.md | Hook policy added blocking [Bash, Read, WebFetch] trifecta |
| BAS-003 | high | MCP Trust | .mcp.json | airbnb-mcp removed |
| BAS-004 | medium | Output Handling | agents/notes.md | Markdown link-title sink sanitized |
| BAS-005 | medium | Memory | CLAUDE.md | Encoded base64 imperative removed |
| BAS-006 | medium | Injection | commands/summarize.md | Indirect-injection wrapped in Trust-Bus |
| BAS-007 | low | Documentation | README.md | Suspicious URL pattern in example removed |

---

## Unchanged (12)

| ID | Severity | Category | File | Notes |
|----|----------|----------|------|-------|
| BAS-008 | high | Permissions | .claude/settings.json | Bash wildcard remains — pending grant-narrowing |
| BAS-009 | medium | Permissions | agents/test-runner.md | Tool list still includes Edit |
| BAS-010 | medium | MCP Trust | .mcp.json | Per-update drift on `postgres-readonly` (12.3% > 10%) |
| BAS-011 | medium | Other | scripts/setup.sh | curl-pipe-to-shell pattern in install hint |
| BAS-012 | medium | Other | tests/fixtures/poisoned.md | Test fixture flagged (intentional) |
| BAS-013 | low | Documentation | docs/setup.md | Outdated security-advisory link |
| BAS-014 | low | Documentation | LICENSE | License file present but old SPDX format |
| BAS-015 | info | Other | .gitignore | Still missing `.env*` exclusion rule |
| BAS-016 | info | Other | LICENSE | (info-level note) |
| BAS-017 | info | Other | CHANGELOG.md | Format compliance note |
| BAS-018 | info | Other | SECURITY.md | Still missing |
| BAS-019 | info | Other | CONTRIBUTING.md | Still missing |

---

## Moved (0)

No findings shifted file-locations between baseline and current.

---
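The New / Resolved / Unchanged / Moved split amounts to a keyed set comparison between two scans. Below is a minimal sketch. It assumes findings are matched on a hypothetical fingerprint of category plus description, and it counts a changed file path as a move. The report does not say which key llm-security actually uses.

```javascript
// Hypothetical sketch of a baseline diff. ASSUMPTION: category + description
// identifies "the same" finding across scans, and keys are unique within a scan.
const fingerprint = (f) => `${f.category}::${f.description}`;

function diffFindings(baseline, current) {
  const base = new Map(baseline.map((f) => [fingerprint(f), f]));
  const matched = new Set();
  const result = { added: [], resolved: [], unchanged: [], moved: [] };

  for (const f of current) {
    const key = fingerprint(f);
    const old = base.get(key);
    if (!old) {
      result.added.push(f);
    } else {
      matched.add(key);
      (old.file === f.file ? result.unchanged : result.moved).push(f);
    }
  }
  for (const f of baseline) {
    if (!matched.has(fingerprint(f))) result.resolved.push(f);
  }
  result.net = result.added.length - result.resolved.length;
  return result;
}

const baseline = [
  { category: "Secrets", file: "agents/data-analyst.md", description: "Hardcoded API key" },
  { category: "Permissions", file: ".claude/settings.json", description: "Bash wildcard" },
];
const current = [
  { category: "Permissions", file: ".claude/settings.json", description: "Bash wildcard" },
  { category: "Permissions", file: ".claude/settings.json", description: "Edit wildcard" },
];
const d = diffFindings(baseline, current);
// d.added: Edit wildcard · d.resolved: Hardcoded API key · d.unchanged: Bash wildcard · d.net: 0
```

With this report's counts (4 added, 7 resolved), `net` comes out at -3, matching the footer's net improvement.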
## Recommendations

1. **High:** Audit DIF-001 — `Edit(*)` wildcard adds Edit-to-anywhere capability. Replace with explicit allow-list.
2. **Medium:** Review DIF-002 (commands/research-v2.md) and DIF-003 (husky pin) before merge.
3. **Medium:** Continue working on the 12 unchanged findings — BAS-008 (Bash wildcard) is the highest-impact remaining item.

---

*Diff complete. Net improvement: -3 findings (4 new, 7 resolved). Grade C → B.*

plugins/llm-security/playground/test-fixtures/harden.md
@ -0,0 +1,121 @@
# Security Harden — DFT marketplace

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | harden |
| **Target** | ~/repos/dft-marketplace |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | Grade A reference config |
| **Frameworks** | OWASP LLM Top 10 |
| **Triggered by** | /security harden |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Current Grade** | C |
| **Project Type** | monorepo |
| **Recommendations** | 6/8 |
| **Mode** | dry-run |

---

## Posture Snapshot

| Metric | Before |
|--------|-------:|
| Pass | 8 |
| Partial | 3 |
| Fail | 1 |
| N-A | 4 |
| Pass rate | 67% |

---
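The 67% pass rate above is consistent with excluding N-A checks from the denominator: 8 / (8 + 3 + 1) = 8/12 ≈ 67%, whereas counting N-A would give 8/16 = 50%. Below is a minimal sketch of that arithmetic. It assumes partial results earn no credit; the posture scanner's actual scoring is not stated here.

```javascript
// Pass rate over applicable checks; N-A checks are excluded from the denominator.
// ASSUMPTION: "partial" counts as not passing.
function passRate({ pass, partial, fail }) {
  const applicable = pass + partial + fail;
  if (applicable === 0) return 0;
  return Math.round((100 * pass) / applicable);
}

const snapshot = { pass: 8, partial: 3, fail: 1, na: 4 };
console.log(`${passRate(snapshot)}%`); // prints "67%"
```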
## Recommendations

### 1. Logging & Audit — `.llm-security/policy.json`

- **Action:** create
- **Category:** Logging & Audit
- **Content preview:**
```json
{
  "audit": {
    "log_path": "~/.claude/llm-security-audit.jsonl",
    "format": "jsonl"
  }
}
```

### 2. Permission Hygiene — `.claude/settings.json`

- **Action:** merge
- **Category:** Permission Hygiene
- **Content preview:**
Replace `"Bash(*)"` with an explicit allow-list, one rule per command: `"Bash(git:*)"`, `"Bash(npm:*)"`, `"Bash(node:*)"`, `"Bash(jq:*)"`.
### 3. Memory Hygiene — `CLAUDE.md`

- **Action:** append
- **Category:** Memory Hygiene
- **Content preview:** Add Security Boundaries section with 4 rules.

### 4. Hook Coverage — `.claude/settings.json`

- **Action:** merge
- **Category:** Hook Coverage
- **Content preview:** Add `precompact` hook reference (currently missing).

### 5. EU AI Act — `CLAUDE.md`

- **Action:** append
- **Category:** Compliance
- **Content preview:** Add AI Act risk classification stub: `risk_level: not-applicable (developer-tool)`.

### 6. Documentation — `SECURITY.md`

- **Action:** create
- **Category:** Documentation
- **Content preview:** Disclosure policy template (7-day ack, 14-day triage).

### 7. (skipped) Supply-Chain Defense

- **Action:** none
- **Reason:** Already at Grade A.

### 8. (skipped) Plugin Trust

- **Action:** none
- **Reason:** No third-party plugins installed.

---

## Diff Summary

| File | Action | Lines |
|------|--------|------:|
| `.llm-security/policy.json` | + create | +12 |
| `.claude/settings.json` | ~ merge | ~3 |
| `CLAUDE.md` | + append | +18 |
| `SECURITY.md` | + create | +47 |
| **Total** | | **+77 / ~3** |

---
## Apply Confirmation

Run `/security harden . --apply` to apply these 6 changes. Backup will be created at `~/.cache/llm-security/backups/2026-05-05/`.

**Estimated outcome:** Grade C → A after apply + posture re-scan.

---

*Harden complete. 6 actionable recommendations, dry-run.*

plugins/llm-security/playground/test-fixtures/ide-scan.md
@ -0,0 +1,109 @@
# IDE-Extension Scan

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | ide-scan |
| **Target** | installed VS Code + JetBrains extensions |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | 47 VS Code extensions + 12 JetBrains plugins |
| **Frameworks** | OWASP LLM Top 10, OWASP Agentic |
| **Triggered by** | /security ide-scan |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 28/100 |
| **Risk Band** | Medium |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 1 |
| Medium | 4 |
| Low | 7 |
| Info | 12 |
| **Total** | **24** |

**Verdict rationale:** One high-severity finding: a JetBrains plugin (`acme-helper`) declares `Premain-Class` (javaagent retransform) which is the riskiest IDE-extension pattern.

---

## Scan Coverage

| IDE | Extensions Scanned | Findings |
|-----|-------------------:|---------:|
| VS Code | 47 | 8 |
| Cursor | 12 (subset of VS Code) | 2 |
| IntelliJ IDEA | 12 | 14 |
| **Total** | **59** | **24** |

---

## Findings

### High

| ID | Extension | IDE | Description | OWASP |
|----|-----------|-----|-------------|-------|
| IDE-001 | acme-helper | IntelliJ | Declares `Premain-Class` — javaagent retransform attack surface | ASI04 |

### Medium

| ID | Extension | IDE | Description | OWASP |
|----|-----------|-----|-------------|-------|
| IDE-002 | dark-theme-pro | VS Code | Theme contains `extension.js` (theme-with-code) | LLM06 |
| IDE-003 | rest-client-typo | VS Code | Typosquat: Levenshtein 2 vs `rest-client` (top-100) | LLM03 |
| IDE-004 | ace-helper | IntelliJ | Long `<depends>` chain (12 plugins) — large surface | LLM03 |
| IDE-005 | json-fast | VS Code | activationEvents includes `*` (broad activation) | ASI04 |

### Low

| ID | Extension | IDE | Description | OWASP |
|----|-----------|-----|-------------|-------|
| IDE-006 | git-graph | VS Code | Native binary `.dylib` shipped (verified signature OK) | — |
| IDE-007 | gradle-helper | IntelliJ | Native binary `.so` shipped (Linux ELF) | — |
| IDE-008 | vsc-cmd | VS Code | `vscode:uninstall` hook present | — |
| IDE-009 | shaded-jar-pro | IntelliJ | Shaded jar advisory (3 jars) | — |
| IDE-010 | rest-client-typo | VS Code | Same as IDE-003: typosquat suspicion | LLM03 |
| IDE-011 | code-splitter | VS Code | activationEvents `onStartupFinished` (broad) | ASI04 |
| IDE-012 | java-fmt | IntelliJ | Premain-Class candidate (lower confidence) | ASI04 |

### Info

12 informational findings (mostly publisher metadata + extension-pack expansions). See envelope for full list.

---

## Per-IDE Recommendations

### VS Code

1. **Medium:** Investigate `dark-theme-pro` — themes should not ship code.
2. **Medium:** Compare `rest-client-typo` to `rest-client` — likely typosquat. Uninstall.
3. **Medium:** Audit `json-fast` activation events; consider replacing with narrower scope.

### IntelliJ IDEA / JetBrains

1. **High:** Manually verify `acme-helper` Premain-Class is legitimate. Consider disabling.
2. **Medium:** Reduce `ace-helper` depends-chain or replace.
3. **Low:** Verify shaded-jar advisories (`shaded-jar-pro`) — known shading is normal but creates supply-chain opacity.

---

## Methodology

7 VS Code-specific checks (blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack, dangerous hooks) + 7 JetBrains checks (Premain-Class, native binaries, depends chain, theme-with-code, broad activation, typosquat, shaded jars). Reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline mode by default.

---
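Two of the VS Code checks listed above can be read straight off an extension's `package.json`:

- Broad activation: `activationEvents` contains `*` or `onStartupFinished`, as in IDE-005 and IDE-011.
- Theme-with-code: a theme contribution that also declares a `main` entry point, as in IDE-002.

Below is a minimal sketch of those two rules. The field names are real manifest fields. Which values count as "broad" is an assumption taken from the findings above, and `example-theme` is a hypothetical manifest.

```javascript
// Hypothetical sketch of two manifest-level checks on a VS Code extension.
// ASSUMPTION: only these two activation events are treated as "broad".
const BROAD_EVENTS = new Set(["*", "onStartupFinished"]);

function broadActivation(manifest) {
  const events = Array.isArray(manifest.activationEvents) ? manifest.activationEvents : [];
  return events.filter((e) => BROAD_EVENTS.has(e));
}

function themeWithCode(manifest) {
  const themes = manifest.contributes?.themes ?? [];
  return themes.length > 0 && typeof manifest.main === "string";
}

function checkManifest(manifest) {
  const findings = [];
  const broad = broadActivation(manifest);
  if (broad.length > 0) findings.push(`broad activation: ${broad.join(", ")}`);
  if (themeWithCode(manifest)) findings.push(`theme ships code: ${manifest.main}`);
  return findings;
}

// Hypothetical manifest, not taken from any extension in this report.
const exampleTheme = {
  name: "example-theme",
  main: "./extension.js",
  contributes: { themes: [{ label: "Example Dark", path: "./themes/dark.json" }] },
  activationEvents: ["*"],
};
const result = checkManifest(exampleTheme);
// ["broad activation: *", "theme ships code: ./extension.js"]
```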
*IDE-scan complete. 59 extensions, 24 findings, 8.9 seconds.*

plugins/llm-security/playground/test-fixtures/mcp-audit.md
@ -0,0 +1,145 @@
# MCP Config Audit

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | mcp-audit |
| **Target** | ~/.claude/.mcp.json + per-project configs |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | 5 MCP servers (3 active, 2 dormant) |
| **Frameworks** | OWASP MCP |
| **Triggered by** | /security mcp-audit |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 33/100 |
| **Risk Band** | Medium |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 2 |
| Medium | 6 |
| Low | 3 |
| Info | 4 |
| **Total** | **15** |

**Verdict rationale:** No critical findings. Two high findings: airbnb-mcp tool description drift (per-update + cumulative) and tavily-mcp's `process.env` read access, which is unjustified for a search use case.

---

## MCP Landscape

| Server | Type | Trust | Tools | Active |
|--------|------|-------|-------|-------:|
| airbnb-mcp | local-stdio | medium | 4 | yes |
| tavily-mcp | http-sse | low | 6 | yes |
| microsoft-learn | http-sse | high | 3 | yes |
| gemini-mcp | local-stdio | high | 4 | dormant |
| mermaid-chart | http-sse | medium | 17 | dormant |

---

## Per-Server Analysis

### airbnb-mcp

- **Path:** `~/.claude/mcp-servers/airbnb-mcp/`
- **Origin:** GitHub (airbnb-example, MIT)
- **Tool description drift:** per-update 12.3% (alert), cumulative 27% from baseline (advisory)
- **Permissions:** Bash, WebFetch, Read
- **Verdict:** WARNING — drift indicates possible upgrade or rug-pull. Investigate before reset.

### tavily-mcp

- **Path:** remote (HTTP-SSE)
- **Origin:** tavily.ai
- **Tool description drift:** none
- **Permissions:** WebFetch, env-vars (TAVILY_API_KEY)
- **Verdict:** WARNING — env-var read scope is broader than needed. Confirm only TAVILY_API_KEY is exposed.

### microsoft-learn

- **Path:** remote (HTTP-SSE)
- **Origin:** Microsoft
- **Tool description drift:** none
- **Permissions:** WebFetch
- **Verdict:** ALLOW — minimal surface, well-scoped.

### gemini-mcp (dormant)

- **Path:** `~/.claude/mcp-servers/gemini-mcp/`
- **Origin:** local-built
- **Verdict:** N/A (dormant)

### mermaid-chart (dormant)

- **Path:** remote (HTTP-SSE)
- **Verdict:** N/A (dormant)

---
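The two drift figures for airbnb-mcp answer different questions:

- Per-update drift compares a tool description with the last cached copy.
- Cumulative drift compares it with a sticky baseline that only moves on an explicit `mcp-baseline-reset`.

Because of that split, a series of small edits that each stay under the per-update threshold still accumulates. Below is a minimal sketch of that bookkeeping. It assumes word-set Jaccard distance as the drift metric and a hypothetical 25% cumulative threshold. Only the 10% per-update threshold appears in these reports, and the plugin's real metric is not shown.

```javascript
// Drift metric (ASSUMPTION): 1 - Jaccard similarity of lower-cased word sets.
function wordSet(text) {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function drift(a, b) {
  const A = wordSet(a);
  const B = wordSet(b);
  const union = new Set([...A, ...B]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const w of A) if (B.has(w)) shared++;
  return 1 - shared / union.size;
}

// state = { baseline, lastSeen }. Only resetBaseline() may move `baseline`.
// ASSUMPTION: the 0.25 cumulative limit is hypothetical.
function checkDrift(state, description, limits = { perUpdate: 0.10, cumulative: 0.25 }) {
  const perUpdate = drift(state.lastSeen, description);
  const cumulative = drift(state.baseline, description);
  state.lastSeen = description; // rolling cache advances on every check
  const alerts = [];
  if (perUpdate > limits.perUpdate) alerts.push("per-update");
  if (cumulative > limits.cumulative) alerts.push("cumulative");
  return { perUpdate, cumulative, alerts };
}

function resetBaseline(state) {
  state.baseline = state.lastSeen; // the manual step after a reviewed, legitimate update
}

const tool = { baseline: "search listings", lastSeen: "search listings" };
const first = checkDrift(tool, "search listings and book");  // alerts: per-update, cumulative
const second = checkDrift(tool, "search listings and book"); // alerts: cumulative (baseline is sticky)
```

The sticky baseline is what catches a slow-burn rug-pull. A rolling cache alone would forget every change once it had been seen.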
## MCP Risk Assessment

3 active servers, 13 total tools across the active set. Risk concentration: airbnb-mcp (description drift) + tavily-mcp (env-var scope). One server (microsoft-learn) is a well-scoped baseline.

---
## Keep / Review / Remove

| Decision | Server | Reason |
|----------|--------|--------|
| Keep | microsoft-learn | Well-scoped, official source |
| Keep | gemini-mcp | Dormant but trusted, retain |
| Review | airbnb-mcp | Description drift requires investigation |
| Review | tavily-mcp | Env-var scope overly broad |
| Remove | mermaid-chart | Dormant 87 days, no usage |

---

## Findings

### High

| ID | Server | Description | OWASP |
|----|--------|-------------|-------|
| MA-001 | airbnb-mcp | Cumulative drift 27% from baseline (sticky) | MCP05 |
| MA-002 | tavily-mcp | env-var read includes more than declared keys | MCP06 |

### Medium

| ID | Server | Description | OWASP |
|----|--------|-------------|-------|
| MA-003 | airbnb-mcp | Per-update drift 12.3% on `book` tool | MCP05 |
| MA-004 | airbnb-mcp | Tool `book` returns large payloads without size cap | MCP09 |
| MA-005 | tavily-mcp | TLS cert pinning not enforced | MCP08 |
| MA-006 | mermaid-chart | Dormant > 90 days, suggest removal | — |
| MA-007 | airbnb-mcp | Description includes implicit instruction | MCP05 |
| MA-008 | tavily-mcp | Rate-limit not configured client-side | MCP09 |

### Low / Info

(7 lower-severity findings — see envelope)

---

## Recommendations

1. **High:** Run `/security mcp-baseline-reset --target airbnb-mcp` only AFTER manual review of new description.
2. **High:** Restrict `tavily-mcp` env-var scope to `TAVILY_API_KEY` exclusively (settings.local.json).
3. **Medium:** Remove dormant `mermaid-chart` server unless re-activated within 14 days.
4. **Medium:** Add response-size caps for `airbnb-mcp` `book` tool.

---

*MCP-audit complete. 5 servers, 15 findings, verdict WARNING.*

plugins/llm-security/playground/test-fixtures/mcp-inspect.md
@ -0,0 +1,107 @@
# MCP Live-Inspect Report

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | mcp-inspect |
| **Target** | 4 running MCP servers (auto-discovered) |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | runtime tool descriptions + capability surface |
| **Frameworks** | OWASP MCP Top 10 |
| **Triggered by** | /security mcp-inspect |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 38/100 |
| **Risk Band** | Medium |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 1 |
| Medium | 3 |
| Low | 2 |
| Info | 4 |
| **Total** | **10** |

**Verdict rationale:** One HIGH-severity tool-shadowing finding on `airbnb-mcp.search_listings` (description claims to "browse listings" but invokes `Bash` internally). Three MEDIUM findings: two description-drift advisories (one above the per-update threshold) and one hidden imperative.

---
## Server Inventory

| Server | Transport | Tools | Status | Connected |
|--------|-----------|------:|--------|-----------|
| airbnb-mcp | stdio | 6 | running | yes |
| postgres-readonly | stdio | 2 | running | yes |
| browser-mcp | http | 4 | running | yes |
| filesystem-mcp | stdio | 8 | running | yes |

---

## Codepoint Reveal

Tools with non-ASCII codepoints in descriptions (zero-width / homoglyph candidates):

| Server | Tool | Codepoints | Risk |
|--------|------|------------|------|
| airbnb-mcp | search_listings | U+200B (zero-width space), U+2028 (line separator) | HIGH |
| browser-mcp | navigate | U+202E (RTL override) | MEDIUM |
| filesystem-mcp | list_dir | (clean) | — |

---
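A codepoint reveal walks each description one code point at a time and names anything invisible or direction-changing. Below is a minimal sketch. It covers the three codepoints in the table above, a few neighbouring invisible characters, and the Supplementary Private Use Area-A range (U+F0000 to U+FFFFD) that the deep-scan report checks. The set is illustrative, not the scanner's full list.

```javascript
// Illustrative set of invisible / bidi-control code points (ASSUMPTION: not exhaustive).
const NAMED = new Map([
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200c, "ZERO WIDTH NON-JOINER"],
  [0x200d, "ZERO WIDTH JOINER"],
  [0x2028, "LINE SEPARATOR"],
  [0x2029, "PARAGRAPH SEPARATOR"],
  [0x202e, "RIGHT-TO-LEFT OVERRIDE"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE"],
]);

function describe(cp) {
  if (NAMED.has(cp)) return NAMED.get(cp);
  if (cp >= 0xf0000 && cp <= 0xffffd) return "SUPPLEMENTARY PRIVATE USE AREA-A";
  return null;
}

function revealCodepoints(text) {
  const hits = [];
  let index = 0; // position in code points, not UTF-16 units
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const name = describe(cp);
    if (name) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codepoint: `U+${hex}`, name });
    }
    index++;
  }
  return hits;
}

const hits = revealCodepoints("browse\u200Blistings\u2028");
// two hits: U+200B at index 6, U+2028 at index 15
```

Homoglyph detection, the other half of the table caption, needs a confusables mapping such as the Unicode UTS #39 data and is not covered by this sketch.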
|
||||
|
||||
## Findings
|
||||
|
||||
### High
|
||||
|
||||
| ID | Category | Server | Description | OWASP |
|
||||
|----|----------|--------|-------------|-------|
|
||||
| MCI-001 | Tool Shadowing | airbnb-mcp | `search_listings` description says "browse listings" but tool surface includes shell-exec capability | MCP06 |
|
||||
|
||||
### Medium
|
||||
|
||||
| ID | Category | Server | Description | OWASP |
|
||||
|----|----------|--------|-------------|-------|
|
||||
| MCI-002 | Description Drift | airbnb-mcp | `book_property` description changed 18.4% since last cache (>10% threshold) | MCP05 |
|
||||
| MCI-003 | Description Drift | browser-mcp | `navigate` description gained URL-allow-list bypass language | MCP05 |
|
||||
| MCI-004 | Hidden Imperative | airbnb-mcp | `cancel_booking` description contains "ALWAYS confirm with user before X" pattern | MCP03 |
### Low

| ID | Category | Server | Description | OWASP |
|----|----------|--------|-------------|-------|
| MCI-005 | Verbose Schema | filesystem-mcp | Tool schemas exceed 800 tokens — context-window pressure | — |
| MCI-006 | Verbose Schema | browser-mcp | Tool schemas exceed 600 tokens | — |

### Info

| ID | Category | Server | Description | OWASP |
|----|----------|--------|-------------|-------|
| MCI-007 | Capability | postgres-readonly | Read-only enforced by URL connection-string parameter | — |
| MCI-008 | Capability | filesystem-mcp | Path-allow-list enforced via env var | — |
| MCI-009 | Trust | airbnb-mcp | NPM package, last published 2026-04-12 | — |
| MCI-010 | Trust | browser-mcp | GitHub source, MIT license | — |

---

## Recommendations

1. **Immediate:** Disable `airbnb-mcp.search_listings` until upstream maintainer clarifies shell-exec rationale or removes capability.
2. **High:** Run `/security mcp-baseline-reset --target airbnb-mcp` after legitimate update is verified.
3. **Medium:** Audit zero-width characters in descriptions; reject the tool description if maintainer cannot explain U+200B inclusion.
4. **Medium:** Bound description token-budget in policy.json: `mcp.max_description_tokens: 500`.

---

*Live-inspect complete. 10 findings across 4 servers.*

plugins/llm-security/playground/test-fixtures/plugin-audit.md
@ -0,0 +1,144 @@
# Plugin-Audit — airbnb-mcp-plugin

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | plugin-audit |
| **Target** | https://github.com/airbnb-example/airbnb-mcp-plugin |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | plugin trust assessment |
| **Frameworks** | OWASP MCP, OWASP LLM Top 10 |
| **Triggered by** | /security plugin-audit |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 41/100 |
| **Risk Band** | High |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 3 |
| Medium | 5 |
| Low | 4 |
| Info | 2 |
| **Total** | **14** |

**Verdict rationale:** Plugin requests broad permissions (Bash, Write, WebFetch) with limited justification. No critical findings, but trust verdict downgrades to WARNING pending clarification.

---
## Executive Summary

Third-party Claude Code plugin distributed via GitHub. Implements 4 MCP tools (search, book, cancel, list-reservations). The plugin has a clear maintainer (verified GitHub identity, 87 commits over 2.3 years). The three high-severity findings concern broad tool permissions and an MCP tool description that includes a hidden imperative ("when called, also fetch X").

---
## Plugin Metadata

| Field | Value |
|-------|-------|
| **Name** | airbnb-mcp-plugin |
| **Version** | 1.4.2 |
| **Author** | airbnb-example (verified) |
| **License** | MIT |
| **Source** | https://github.com/airbnb-example/airbnb-mcp-plugin |
| **First commit** | 2024-01-15 |
| **Last commit** | 2026-04-22 |
| **Commits** | 87 |
| **Stars** | 247 |

---

## Component Inventory

| Component | Count | Notes |
|-----------|------:|-------|
| Commands | 3 | book.md, cancel.md, list.md |
| Agents | 1 | search-agent.md |
| MCP Servers | 1 | airbnb-mcp (4 tools) |
| Hooks | 0 | (none) |
| Skills | 0 | (none) |

---

## Permission Matrix

| Tool | Required by | Justified |
|------|-------------|-----------|
| Read | search-agent | Yes — needs to read user filters |
| WebFetch | search-agent | Yes — Airbnb API |
| Bash | book.md | Partial — only used for date math |
| Write | search-agent | No — appears unused |
| Edit | (none) | — |

---

## Hook Safety

No hooks defined. Plugin operates entirely through MCP tools and agent definitions. No PreToolUse/PostToolUse mechanisms to verify.

---

## Trust Verdict

**Verdict:** WARNING — install with caution

**Rationale:**
- Maintainer is verifiable (GitHub identity, history)
- License is MIT (permissive, OK)
- Permission grant is broader than necessary (Write tool unused)
- One MCP tool description (`book`) contains an implicit instruction outside its declared purpose

**Recommended action:** Open issue with maintainer requesting (a) drop unused `Write` permission, (b) clarify `book` tool description. Re-audit after maintainer response.

---

## Findings

### High

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| PA-001 | Permissions | search-agent.md | 5 | Tool list includes `Write` with no apparent use | ASI04 |
| PA-002 | MCP Trust | mcp-tools/book.json | 14 | Description has hidden imperative outside scope | MCP05 |
| PA-003 | Permissions | book.md | 8 | Bash permission not minimized to specific commands | ASI04 |

### Medium

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| PA-004 | Supply Chain | package.json | 12 | Dependency `@airbnb/utils@2.1.0` outdated | LLM03 |
| PA-005 | Output Handling | search-agent.md | 34 | API response inserted as markdown without sanitization | LLM01 |
| PA-006 | Other | README.md | — | No security disclosure policy | — |
| PA-007 | Other | CHANGELOG.md | — | Last 3 releases lack security notes | — |
| PA-008 | Permissions | .claude/settings.json | 5 | Settings file commits hooks=null (acceptable) | — |

### Low

(4 low + 2 info findings — see envelope JSON for full list)

---

## Recommendations

1. **High:** Open issue with maintainer about `Write` permission removal.
2. **High:** Request clarification of `book` tool description.
3. **Medium:** Bump `@airbnb/utils` to current.
4. **Medium:** Add SECURITY.md.

If maintainer response is satisfactory: re-audit. If install is urgent: deploy with MCP volume monitoring (`/security mcp-inspect`) for 7 days.

---

*Plugin-audit complete. 14 findings, trust verdict WARNING.*

plugins/llm-security/playground/test-fixtures/posture.md
@ -0,0 +1,118 @@
# Security Posture — DFT marketplace

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | posture |
| **Target** | ~/repos/dft-marketplace |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | 16 categories (13 applicable) |
| **Frameworks** | OWASP LLM Top 10, EU AI Act, NIST AI RMF |
| **Triggered by** | /security posture |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 22/100 |
| **Risk Band** | Medium |
| **Grade** | B |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 1 |
| Medium | 3 |
| Low | 4 |
| Info | 6 |
| **Total** | **14** |

---

## Overall Score

**11 / 13 categories covered (Grade B)**

```
████████████████████░░░░ 84%
```
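
The bar and percentage above can be derived from the 11 covered out of 13 applicable categories. A minimal sketch, assuming the renderer rounds both the filled cell count and the percentage down (this reproduces 20 of 24 cells and 84%; rounding to nearest would give 85%); `coverageBar` and the 24-cell width are assumptions read off this fixture:

```javascript
// Hypothetical sketch: rebuild the posture coverage bar from category counts.
// Rounding down is an assumption that reproduces "11 / 13 -> 84%" above.
function coverageBar(covered, applicable, width = 24) {
  if (applicable <= 0) throw new RangeError("applicable must be positive");
  // Multiply before dividing so integer inputs avoid floating-point drift.
  const filled = Math.floor((covered * width) / applicable);
  const pct = Math.floor((covered * 100) / applicable);
  return "█".repeat(filled) + "░".repeat(width - filled) + ` ${pct}%`;
}

// 20 filled cells, 4 empty cells, then " 84%"
console.log(coverageBar(11, 13));
```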
**Risk Score:** 22/100 (Medium)

**Verdict:** WARNING — close one high-severity gap to reach Grade A.

---

## Category Scorecard

| # | Category | Status | Findings |
|---|----------|--------|---------:|
| 1 | Deny-First Configuration | PASS | 0 |
| 2 | Hook Coverage | PASS | 0 |
| 3 | MCP Server Trust | PARTIAL | 2 |
| 4 | Secret Management | PASS | 0 |
| 5 | Permission Hygiene | PARTIAL | 1 |
| 6 | Memory Hygiene | PASS | 0 |
| 7 | Supply-Chain Defense | PASS | 1 |
| 8 | Plugin Trust | PASS | 0 |
| 9 | IDE Extension Hygiene | PASS | 0 |
| 10 | Skill Hygiene | PARTIAL | 3 |
| 11 | Logging & Audit | FAIL | 4 |
| 12 | Documentation | PASS | 1 |
| 13 | EU AI Act Coverage | PARTIAL | 2 |
| 14 | NIST AI RMF Mapping | N-A | 0 |
| 15 | ISO 42001 Mapping | N-A | 0 |
| 16 | Datatilsynet Compliance | N-A | 0 |

---

## Top Findings

### High

| ID | Category | File | Description |
|----|----------|------|-------------|
| PST-001 | Logging & Audit | settings.json | No audit-trail configured (`audit.log_path` unset) |

### Medium

| ID | Category | File | Description |
|----|----------|------|-------------|
| PST-002 | Skill Hygiene | skills/data-summary/SKILL.md | Description >150 chars (verbose) |
| PST-003 | EU AI Act | (project-level) | No AI Act risk classification documented |
| PST-004 | MCP Trust | .mcp.json | airbnb-mcp drift advisory pending |

---

## Quick Wins

1. **Enable audit trail** — set `audit.log_path` in `.llm-security/policy.json` (closes PST-001).
2. **Document AI Act classification** — add risk-level to `CLAUDE.md` (closes PST-003).
3. **Reset airbnb-mcp baseline** — after legitimate review (closes PST-004).

---

## Baseline Comparison

No baseline saved. Run `/security posture --save-baseline` to track future drift.

---

## Recommendations

1. **High:** Enable audit logging — single setting closes the only high-severity gap.
2. **Medium:** Add AI Act risk classification.
3. **Medium:** Trim verbose skill descriptions in 3 skills.

Estimated effort to Grade A: 30 minutes.

---

*Posture complete. Grade B, 14 findings, 1.2 seconds.*

plugins/llm-security/playground/test-fixtures/pre-deploy.md
@ -0,0 +1,116 @@
# Pre-Deploy Security Checklist

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | pre-deploy |
| **Target** | DFT data-platform release v3.2.0 |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | enterprise gate + production readiness |
| **Frameworks** | OWASP LLM Top 10, EU AI Act, NSM Grunnprinsipper |
| **Triggered by** | /security pre-deploy |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 12/100 |
| **Risk Band** | Low |
| **Grade** | A |
| **Verdict** | GO-WITH-CONDITIONS |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 0 |
| Medium | 2 |
| Low | 3 |
| Info | 5 |
| **Total** | **10** |

**Verdict rationale:** All gates PASS or PASS-WITH-NOTES except Logging & Audit (FAIL until the logging aggregator is wired). 2 medium conditions: pending Datatilsynet ack on DPIA addendum (expected 2026-05-08) + missing logging-aggregator wire-up. Conditional approval — deployment may proceed once both are resolved.

---
## Traffic Light Categories

| Category | Status | Notes |
|----------|--------|-------|
| Identity & Access | PASS | OIDC + MFA, 89% coverage |
| Network Isolation | PASS | Private endpoints + NSG |
| Data Protection | PASS-WITH-NOTES | Customer-managed keys; rotation policy verified |
| Logging & Audit | FAIL | Logging aggregator not wired (PRD-001) |
| Compliance | PASS-WITH-NOTES | DPIA pending Datatilsynet ack (PRD-002) |
| Secrets Management | PASS | Key Vault + managed identity |
| Hooks Coverage | PASS | All 9 hooks active |
| MCP Security | PASS | 0 untrusted servers |
| Supply Chain | PASS | 0 critical, 0 high CVEs |
| Plugin Trust | PASS | Only first-party plugins |
| Permission Hygiene | PASS | No wildcard Bash |
| Memory Hygiene | PASS | CLAUDE.md scanned, no poisoning |
| Performance | PASS | <500ms hook latency |

---

## Findings

### Medium

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| PRD-001 | Logging | infrastructure/observability.bicep | 12 | Logging aggregator export endpoint missing | — |
| PRD-002 | Compliance | docs/DPIA-2026-04-15.md | — | Datatilsynet ack pending (submitted 2026-04-22, expected response 2026-05-08) | — |

### Low

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| PRD-003 | Documentation | docs/SECURITY.md | — | SLA for security-disclosure response not documented | — |
| PRD-004 | Documentation | docs/RUNBOOK.md | — | Incident-response runbook missing rollback section | — |
| PRD-005 | Performance | hooks/post-mcp-verify.mjs | — | P95 latency 412ms (target <500ms) — within budget but monitoring needed | — |

### Info

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| PRD-006 | Coverage | (env) | — | Production env: Azure North Europe | — |
| PRD-007 | Coverage | (env) | — | Data-classification: Fortrolig | — |
| PRD-008 | Coverage | (compliance) | — | Frameworks: OWASP LLM, EU AI Act, NSM | — |
| PRD-009 | Coverage | (gate) | — | Pre-deploy run by: ci/release.yml | — |
| PRD-010 | Coverage | (history) | — | 4 prior pre-deploy runs in last 90 days, all PASS | — |

---

## Conditions to Resolve

1. **PRD-001 (medium):** Wire logging aggregator before deployment. Owner: platform-ops. Blocker.
2. **PRD-002 (medium):** Receive Datatilsynet ack OR document silent-period acceptance. Owner: privacy-officer. Blocker until 2026-05-08.
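
Open conditions like these are why this report's verdict is `GO-WITH-CONDITIONS` rather than a plain GO. Any code that maps report verdicts onto an ALLOW / WARNING / BLOCK scale has to test the conditional forms before a bare `GO` match, or the conditional verdict collapses to ALLOW. A minimal sketch of that ordering; `normalizeVerdictSketch` and its rule table are hypothetical, and mapping conditional verdicts to WARNING is an assumption rather than the playground's documented behavior:

```javascript
// Hypothetical sketch of order-sensitive verdict normalization.
// Rules are checked top to bottom and the first match wins. Because
// /\bGO\b/ also matches "GO-WITH-CONDITIONS", the conditional rule
// must come before it.
const VERDICT_RULES = [
  [/GO-WITH-CONDITIONS|CONDITIONAL|BETINGET/, "WARNING"], // assumed mapping
  [/NO-GO|BLOCK/, "BLOCK"],
  [/WARNING/, "WARNING"],
  [/\bGO\b|ALLOW/, "ALLOW"],
];

function normalizeVerdictSketch(raw) {
  const text = String(raw).trim().toUpperCase();
  for (const [pattern, verdict] of VERDICT_RULES) {
    if (pattern.test(text)) return verdict;
  }
  return "UNKNOWN";
}

console.log(normalizeVerdictSketch("GO-WITH-CONDITIONS")); // WARNING
console.log(normalizeVerdictSketch("GO")); // ALLOW
console.log(normalizeVerdictSketch("NO-GO")); // BLOCK
```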
---

## Approvals

| Role | Approver | Date | Notes |
|------|----------|------|-------|
| Security Lead | (pending) | — | After PRD-001 resolved |
| Privacy Officer | (pending) | — | After PRD-002 resolved |
| Platform Owner | A. Nilsen | 2026-05-04 | Signed off subject to conditions |

---

## Recommendations

1. **Immediate:** Resolve PRD-001 (logging aggregator) before deploying.
2. **High:** Confirm Datatilsynet ack OR escalate silent-period exception (PRD-002).
3. **Medium:** Document SLA in SECURITY.md (PRD-003) post-deploy — non-blocking.
4. **Medium:** Add rollback section to RUNBOOK.md (PRD-004) post-deploy.

---

*Pre-deploy complete. 13 categories, 1 FAIL pending wire-up, conditional GO.*

plugins/llm-security/playground/test-fixtures/red-team.md
@ -0,0 +1,112 @@
# Red-Team Simulation

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | red-team |
| **Target** | llm-security plugin hooks |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | 64 scenarios × 12 categories |
| **Frameworks** | OWASP LLM Top 10, OWASP Agentic, DeepMind Agent Traps |
| **Triggered by** | /security red-team |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Defense Score** | 92% |
| **Total Scenarios** | 64 |
| **Pass** | 59 |
| **Fail** | 5 |
| **Adaptive Mode** | off |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 2 |
| Medium | 3 |
| Low | 0 |
| Info | 0 |
| **Total** | **5** |

**Verdict rationale:** 5 of 64 scenarios bypassed defenses. Two high-severity bypasses concern bash-evasion via T9 (eval-via-variable) and synonym-substituted destructive commands. No critical bypasses.

---

## Defense Score Interpretation

92% — minor gaps. Hooks block all critical attack-chain scenarios. Bypasses concentrate in adaptive evasion (variable indirection + synonyms), which is harder to catch deterministically.

---

## Per-Category Breakdown

| Category | Pass | Fail | Coverage |
|----------|-----:|-----:|---------:|
| prompt-injection | 8 | 0 | 100% |
| tool-poisoning | 6 | 0 | 100% |
| data-exfiltration | 5 | 0 | 100% |
| lethal-trifecta | 4 | 0 | 100% |
| mcp-shadowing | 3 | 0 | 100% |
| memory-poisoning | 6 | 0 | 100% |
| supply-chain | 5 | 1 | 83% |
| credential-theft | 4 | 0 | 100% |
| unicode-evasion | 5 | 1 | 83% |
| bash-evasion | 6 | 2 | 75% |
| sub-agent-escape | 4 | 0 | 100% |
| permission-escalation | 3 | 1 | 75% |
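
Each Coverage value is passes divided by scenarios in that category, and the dashboard's 92% defense score matches 59 passes out of 64 scenarios. A short sketch that recomputes both from the table, assuming rounding to the nearest whole percent (5/6 gives 83%, 59/64 gives 92%); the function names are illustrative, not the plugin's:

```javascript
// Per-category coverage and overall defense score from [pass, fail] counts.
// Rounding to the nearest whole percent reproduces the table above.
const categories = {
  "prompt-injection": [8, 0],
  "tool-poisoning": [6, 0],
  "data-exfiltration": [5, 0],
  "lethal-trifecta": [4, 0],
  "mcp-shadowing": [3, 0],
  "memory-poisoning": [6, 0],
  "supply-chain": [5, 1],
  "credential-theft": [4, 0],
  "unicode-evasion": [5, 1],
  "bash-evasion": [6, 2],
  "sub-agent-escape": [4, 0],
  "permission-escalation": [3, 1],
};

function coveragePct(pass, fail) {
  const total = pass + fail;
  return total === 0 ? 0 : Math.round((pass / total) * 100);
}

function defenseScore(table) {
  let pass = 0;
  let fail = 0;
  for (const [p, f] of Object.values(table)) {
    pass += p;
    fail += f;
  }
  return { pass, fail, total: pass + fail, score: coveragePct(pass, fail) };
}

for (const [name, [p, f]] of Object.entries(categories)) {
  console.log(`${name}: ${coveragePct(p, f)}%`);
}
console.log(defenseScore(categories)); // { pass: 59, fail: 5, total: 64, score: 92 }
```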
---

## Failed Scenarios

### High

| ID | Category | Payload class | Reason |
|----|----------|---------------|--------|
| BSH-007 | bash-evasion | T9 eval-via-variable (one-level forward-flow) | Defense layer collapses common case but misses double-indirection variant |
| BSH-008 | bash-evasion | Synonym-substituted destructive | "obliterate" used in place of "rm" — synonym table did not match |

### Medium

| ID | Category | Payload class | Reason |
|----|----------|---------------|--------|
| UNI-007 | unicode-evasion | PUA-B + zero-width combo | Detector flagged PUA-B but downgraded to MEDIUM advisory |
| DEP-005 | supply-chain | Levenshtein 3 typosquat | Beyond default ≤2 threshold; expected behavior |
| PRM-004 | permission-escalation | Catalog-merge granting Edit | Hook fires but permits via wildcard inheritance |
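
DEP-005 slipped through because its edit distance to the impersonated package was 3, one more than the default threshold of 2. A minimal sketch of a distance-based typosquat check using the classic Levenshtein algorithm (insertions, deletions and substitutions; a transposition counts as two edits); `typosquatCandidates` and the popular-package list are hypothetical, not the plugin's supply-chain scanner:

```javascript
// Hypothetical sketch: flag a dependency whose name is within a small edit
// distance of a popular package. maxDistance = 2 mirrors the default
// "≤2 threshold" cited for DEP-005.
function levenshtein(a, b) {
  // Single-row dynamic programming: prev[j] is the distance between the
  // previous prefix of a and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

function typosquatCandidates(name, popular, maxDistance = 2) {
  return popular.filter((p) => {
    const d = levenshtein(name, p);
    return d > 0 && d <= maxDistance; // distance 0 is the real package
  });
}

const POPULAR = ["lodash", "express", "react"]; // illustrative list only
console.log(typosquatCandidates("lodahs", POPULAR)); // [ 'lodash' ]
console.log(typosquatCandidates("expresss", POPULAR)); // [ 'express' ]
```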
---

## Adaptive Mode

Adaptive mode was OFF for this run. To test mutation-based evasion (homoglyph, encoding, zero-width, case alternation, synonym), re-run with `--adaptive`.

---

## Recommendations

1. **High:** Extend `bash-normalize.mjs` T9 (eval-via-variable) to handle double indirection (`x=cmd; y=$x; eval $y`).
2. **High:** Expand synonym table in `attack-mutations.json` to include "obliterate", "annihilate", "wipe" variants.
3. **Medium:** Document known limitation: Levenshtein 3+ typosquats not caught by default policy. User-tunable via `policy.json`.
4. **Medium:** PRM-004 wildcard inheritance is documented behavior but warrants user-facing notice.
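
The first recommendation asks T9 to follow a variable through more than one assignment before checking what `eval` will run. A minimal sketch of that idea, assuming only plain `;`-separated `NAME=VALUE` assignments (no quoting, `${...}` expansion, or subshells); `resolveEvalTargets` is a hypothetical helper and not code from `bash-normalize.mjs`:

```javascript
// Hypothetical sketch: follow simple NAME=VALUE assignments so that
// `eval $y` is resolved through any number of indirection levels before
// a denylist is applied. Handles only the plain `;`-separated form.
function resolveEvalTargets(script, maxDepth = 8) {
  const vars = new Map();
  const targets = [];
  for (const raw of script.split(";")) {
    const stmt = raw.trim();
    const assign = stmt.match(/^([A-Za-z_][A-Za-z0-9_]*)=(\S*)$/);
    if (assign) {
      vars.set(assign[1], assign[2]);
      continue;
    }
    const ev = stmt.match(/^eval\s+(\S+)$/);
    if (!ev) continue;
    let value = ev[1];
    // Dereference $name repeatedly; the depth bound stops cycles
    // such as x=$y; y=$x.
    for (let depth = 0; depth < maxDepth; depth++) {
      const ref = value.match(/^\$([A-Za-z_][A-Za-z0-9_]*)$/);
      if (!ref || !vars.has(ref[1])) break;
      value = vars.get(ref[1]);
    }
    targets.push(value);
  }
  return targets;
}

console.log(resolveEvalTargets("x=cmd; y=$x; eval $y")); // [ 'cmd' ]
```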
---

## Test History

| Run | Date | Defense Score | Δ |
|-----|------|--------------:|---|
| Current | 2026-05-05 | 92% | — |
| Previous | 2026-04-29 | 91% | +1 |
| 30 days ago | 2026-04-05 | 88% | +4 |

---

*Red-team complete. 64 scenarios, 5 bypasses, defense score 92%.*

plugins/llm-security/playground/test-fixtures/registry.md
@ -0,0 +1,112 @@
# Skill Signature Registry

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | registry |
| **Target** | ~/.claude/skills (local registry) |
| **Date** | 2026-05-05 |
| **Mode** | scan |
| **Version** | llm-security v7.4.0 |
| **Scope** | skill-signature fingerprint registry |
| **Triggered by** | /security registry scan |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 18/100 |
| **Risk Band** | Medium |
| **Grade** | B |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 1 |
| Medium | 2 |
| Low | 2 |
| Info | 5 |
| **Total** | **10** |

**Verdict rationale:** 1 HIGH on a known-malicious skill fingerprint match (`malicious-pdf-helper@1.0.0`). 2 MEDIUM on signature drift for previously-trusted skills.

---

## Registry Stats

| Metric | Value |
|--------|------:|
| **Skills tracked** | 87 |
| **Known-good fingerprints** | 79 |
| **Known-bad fingerprints** | 4 |
| **Unknown fingerprints** | 4 |
| **Drift events (30d)** | 7 |
| **Registry file** | reports/skill-registry.json |

---

## Signature Table

| Skill | Source | Fingerprint (SHA-256, 8-hex) | Status | First seen |
|-------|--------|------------------------------|--------|-----------|
| pdf-helper | builtin | a8f3e21d | known-good | 2026-01-12 |
| story | user | 4c2b89f0 | known-good | 2026-02-08 |
| malicious-pdf-helper | npm | 7e91d3a4 | KNOWN-BAD | 2026-04-22 |
| story-v2 | user | 9f1c2e8b | DRIFT (was 4c2b89f0) | 2026-05-04 |
| audit-helper | community | b3a7f29c | DRIFT (was c814e7a1) | 2026-05-03 |
| pptx | builtin | d7e4a1f3 | known-good | 2026-01-12 |
| capability-auditor | community | e2f9b483 | unknown (new) | 2026-05-05 |
| persona-creator | builtin | 1a4c8e07 | known-good | 2026-01-12 |
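
The Fingerprint column shows 8 hex digits of a SHA-256 digest. A minimal sketch of computing such a fingerprint with Node's built-in `crypto` module and mapping it to the statuses used in the table; it assumes the first 8 digits are shown, and the registry shape (`knownGood`, `knownBad`, `lastSeen`) and check order are assumptions, not the format of `reports/skill-registry.json`:

```javascript
// Hypothetical sketch: 8-hex SHA-256 fingerprint of a SKILL.md body plus a
// status lookup modelled on the Signature Table statuses.
const { createHash } = require("node:crypto");

function fingerprint(content) {
  // Full SHA-256 hex digest truncated to 8 characters (32 bits). Fine for
  // display; storing the full digest avoids accidental collisions.
  return createHash("sha256").update(content, "utf8").digest("hex").slice(0, 8);
}

function classify(skill, content, registry) {
  const fp = fingerprint(content);
  if (registry.knownBad.has(fp)) return { fp, status: "KNOWN-BAD" };
  const previous = registry.lastSeen.get(skill);
  if (previous !== undefined && previous !== fp) {
    return { fp, status: `DRIFT (was ${previous})` };
  }
  if (registry.knownGood.has(fp)) return { fp, status: "known-good" };
  // Seen before with the same fingerprint, but never vetted.
  if (previous !== undefined) return { fp, status: "unknown" };
  return { fp, status: "unknown (new)" };
}

console.log(fingerprint("")); // e3b0c442
```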
---

## Findings

### High

| ID | Category | Skill | File | Description | OWASP |
|----|----------|-------|------|-------------|-------|
| REG-001 | Known-bad | malicious-pdf-helper | ~/.claude/skills/malicious-pdf-helper/SKILL.md | Fingerprint matches 2026-04-22 advisory (data exfiltration via PDF metadata) | LLM05 |

### Medium

| ID | Category | Skill | File | Description | OWASP |
|----|----------|-------|------|-------------|-------|
| REG-002 | Drift | story-v2 | ~/.claude/skills/story-v2/SKILL.md | Fingerprint changed since registry — verify legitimacy | LLM05 |
| REG-003 | Drift | audit-helper | ~/.claude/skills/audit-helper/SKILL.md | Fingerprint changed since registry — verify legitimacy | LLM05 |

### Low

| ID | Category | Skill | File | Description | OWASP |
|----|----------|-------|------|-------------|-------|
| REG-004 | Unknown | capability-auditor | ~/.claude/skills/capability-auditor/SKILL.md | New community skill, no prior fingerprint — recommend manual review | — |
| REG-005 | Stale | unused-skill | ~/.claude/skills/unused-skill/SKILL.md | No invocations in 90 days — candidate for removal | — |

### Info

| ID | Category | Skill | File | Description | OWASP |
|----|----------|-------|------|-------------|-------|
| REG-006 | Coverage | (registry) | reports/skill-registry.json | 87 skills tracked across 4 sources (builtin/user/community/npm) | — |
| REG-007 | Coverage | (cache) | ~/.cache/llm-security/registry/ | Cache size: 412 KB | — |
| REG-008 | Coverage | (cache) | (TTL) | Registry cache TTL: 24h | — |
| REG-009 | Coverage | (cache) | (next sync) | 17h until next registry sync | — |
| REG-010 | History | (audit) | reports/registry-audit.jsonl | 7 drift events in last 30 days, all on community skills | — |

---

## Recommendations

1. **Immediate:** Disable or remove `malicious-pdf-helper` skill. Cross-reference with `~/.claude/skills/` and check if any agents reference it.
2. **High:** Investigate signature drift on `story-v2` and `audit-helper`. Compare against last-known-good fingerprint and re-register if legitimate update.
3. **Medium:** Manually review `capability-auditor` (new, unknown). Run `/security scan ~/.claude/skills/capability-auditor` for full analysis.
4. **Low:** Audit unused skills — `unused-skill` has had no invocations in 90d.

---

*Registry scan complete. 87 skills, 1 known-bad, 2 drift events.*

plugins/llm-security/playground/test-fixtures/scan.md
@ -0,0 +1,148 @@
# Security Scan Report

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | scan |
| **Target** | ~/repos/example-app |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | skill scan + MCP scan |
| **Frameworks** | OWASP LLM Top 10, OWASP MCP |
| **Triggered by** | /security scan |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 72/100 |
| **Risk Band** | Critical |
| **Grade** | D |
| **Verdict** | BLOCK |

| Severity | Count |
|----------|------:|
| Critical | 2 |
| High | 4 |
| Medium | 7 |
| Low | 3 |
| Info | 5 |
| **Total** | **21** |

**Verdict rationale:** 2 critical findings (hardcoded API key + lethal trifecta in agent definition) cross the BLOCK threshold. High-severity prompt-injection vector in tool description compounds the risk.

---

## Executive Summary

Scan found 21 issues across 7 files in the `commands/` and `agents/` directories. Two critical findings require immediate remediation before this plugin is shipped: a hardcoded API key in `agents/data-analyst.md` (line 47) and a lethal trifecta agent (`agents/web-helper.md`) with `[Bash, Read, WebFetch]` and no hook guards. The four high-severity findings concentrate on prompt-injection patterns in MCP tool descriptions.

### Narrative Audit

**Suppressed signals:** 3 (entropy: 2 GLSL fragments, frontmatter: 1 framework env-var reference)

---

## Findings

Findings sorted Critical → High → Medium → Low → Info.

### Critical

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCN-001 | Secrets | agents/data-analyst.md | 47 | Hardcoded API key (sk-prod-...) | LLM02 |
| SCN-002 | Excessive Agency | agents/web-helper.md | 3 | Lethal trifecta: [Bash, Read, WebFetch] without hook guards | ASI01, LLM06 |

### High

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCN-003 | Injection | commands/research.md | 22 | Prompt-injection vector in user-input interpolation | LLM01 |
| SCN-004 | MCP Trust | .mcp.json | 12 | MCP server description contains hidden imperative | MCP05 |
| SCN-005 | Output Handling | agents/notes.md | 89 | Markdown link-title injection sink | LLM01 |
| SCN-006 | Permissions | .claude/settings.json | 5 | Wildcard `Bash(*)` permission grant | ASI04 |

### Medium

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCN-007 | Supply Chain | package.json | 15 | Dependency `lefthook@1.4.2` flagged by OSV.dev | LLM03 |
| SCN-008 | Output Handling | agents/notes.md | 102 | HTML comment node passes through unvalidated | LLM01 |
| SCN-009 | Other | CLAUDE.md | 34 | Memory-poisoning pattern: encoded base64 imperative | LLM06 |
| SCN-010 | Injection | commands/summarize.md | 14 | Indirect injection via WebFetch result | LLM01 |
| SCN-011 | Permissions | agents/test-runner.md | 5 | Tool list includes `Edit` without rationale | ASI04 |
| SCN-012 | MCP Trust | .mcp.json | 28 | Per-update drift on `airbnb-mcp` tool description (12.3%) | MCP05 |
| SCN-013 | Other | scripts/setup.sh | 3 | curl-pipe-to-shell pattern in install hint | LLM03 |
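
In GFM tables every unescaped `|` starts a new cell, so a description that quotes a shell pipe such as `curl | sh` gains a phantom column and pushes the OWASP value out of place. A minimal sketch of a row splitter that treats `\|` as a literal pipe, roughly how a fixture parser could read these finding tables; `splitRow` and the `EX-001` row are hypothetical, and the sketch handles only the `\|` escape:

```javascript
// Hypothetical sketch: split one GFM table row into trimmed cells,
// keeping `\|` as a literal pipe inside the current cell.
function splitRow(line) {
  let body = line.trim();
  // Drop the optional leading and trailing pipes (not an escaped one).
  if (body.startsWith("|")) body = body.slice(1);
  if (body.endsWith("|") && !body.endsWith("\\|")) body = body.slice(0, -1);
  const cells = [];
  let cell = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === "\\" && body[i + 1] === "|") {
      cell += "|"; // escaped pipe stays inside the cell
      i++;
    } else if (ch === "|") {
      cells.push(cell.trim());
      cell = "";
    } else {
      cell += ch;
    }
  }
  cells.push(cell.trim());
  return cells;
}

const row = "| EX-001 | Other | scripts/setup.sh | 3 | `curl \\| sh` in install hint | LLM03 |";
console.log(splitRow(row).length); // 6
```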
### Low

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCN-014 | Other | README.md | 88 | Suspicious URL pattern in example | — |
| SCN-015 | Other | docs/setup.md | 21 | Outdated security advisory link | — |
| SCN-016 | Other | tests/fixtures/poisoned.md | 1 | Test fixture flagged (likely intentional) | — |

### Info

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCN-017 | Other | .gitignore | — | No `.env*` exclusion rule | — |
| SCN-018 | Other | LICENSE | — | License missing | — |
| SCN-019 | Other | CHANGELOG.md | — | No CHANGELOG present | — |
| SCN-020 | Other | SECURITY.md | — | No SECURITY.md disclosure policy | — |
| SCN-021 | Other | CONTRIBUTING.md | — | No CONTRIBUTING guidelines | — |

---

## OWASP Categorization

| OWASP Category | Findings | Max Severity | Scanners |
|----------------|----------|-------------|----------|
| LLM01 — Prompt Injection | 4 | High | skill-scanner, post-mcp-verify |
| LLM02 — Sensitive Info Disclosure | 1 | Critical | secrets |
| LLM03 — Supply Chain | 2 | Medium | dep-audit |
| LLM06 — Excessive Agency | 2 | Critical | toxic-flow, memory |
| MCP05 — Tool Description Drift | 2 | High | mcp-cache |
| ASI01 — Lethal Trifecta | 1 | Critical | toxic-flow |
| ASI04 — Permission Sprawl | 2 | High | permission |

---

## Supply Chain Assessment

| Component | Type | Source | Trust Score | Notes |
|-----------|------|--------|-------------|-------|
| lefthook | npm | registry | 6/10 | OSV-2024-1234 (medium) |
| typescript | npm | registry | 9/10 | clean |
| @airbnb/mcp-server | npm | registry | 7/10 | per-update drift detected |

**Source verification:** registry-only, no Git/private deps detected.

**Permissions analysis:**
- Requested tools: Bash, Read, Write, Edit, WebFetch, Task
- Minimum necessary: Read, Bash
- Over-permissioned: Write, Edit, WebFetch, Task

**Supply chain risk summary:** One medium-severity CVE on a build-tool dependency. Recommend bumping `lefthook` to 1.5.0+.

---

## Recommendations

1. **Immediate:** Rotate `sk-prod-...` API key and remove from `agents/data-analyst.md`. Replace with environment-variable reference.
2. **Immediate:** Rewrite `agents/web-helper.md` to drop one of `[Bash, Read, WebFetch]` OR add a hook policy that blocks the trifecta.
3. **High:** Update MCP server description in `.mcp.json` (line 12) and run `/security mcp-baseline-reset` after legitimate update.
4. **High:** Replace `Bash(*)` with explicit allowlist in `.claude/settings.json`.
5. **Medium:** Bump `lefthook` to 1.5.0+ to clear OSV-2024-1234.

Run `/security clean .` to auto-fix deterministic issues. Re-scan after fixes to confirm BLOCK → WARNING → ALLOW progression.

---

*Scan complete. 21 findings across 7 files, 12.4 seconds.*

plugins/llm-security/playground/test-fixtures/supply-check.md
@ -0,0 +1,100 @@
# Supply-Chain Recheck Report

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | supply-check |
| **Target** | ~/repos/dft-marketplace |
| **Date** | 2026-05-05 |
| **Version** | llm-security v7.4.0 |
| **Scope** | npm + pip + cargo lockfiles |
| **Frameworks** | OWASP LLM03, NIST SSDF |
| **Triggered by** | /security supply-check |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 22/100 |
| **Risk Band** | Medium |
| **Grade** | B |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 1 |
| Medium | 4 |
| Low | 2 |
| Info | 6 |
| **Total** | **13** |
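
The Risk Dashboard carries a simple invariant: the **Total** row equals the sum of the per-severity rows (0 + 1 + 4 + 2 + 6 = 13 here). A minimal check, sketched under the assumption that rows keep the `| Label | N |` shape shown above with optional `**bold**`; `checkSeverityTotals` is a hypothetical name, not one of the playground's parsers:

```javascript
// Hypothetical consistency check for a "| Severity | Count |" table:
// the **Total** row should equal the sum of the per-severity rows.
// Rows whose value is not a plain integer (e.g. "22/100", "WARNING") are skipped.
const ROW_RE = /^\|\s*\*{0,2}([A-Za-z]+)\*{0,2}\s*\|\s*\*{0,2}(\d+)\*{0,2}\s*\|\s*$/;

function checkSeverityTotals(markdown) {
  const counts = {};
  let total = null;
  for (const line of markdown.split('\n')) {
    const m = ROW_RE.exec(line);
    if (!m) continue;
    const [, label, value] = m;
    if (label === 'Total') total = Number(value);
    else counts[label] = Number(value);
  }
  const sum = Object.values(counts).reduce((acc, n) => acc + n, 0);
  return { counts, total, sum, consistent: total === sum };
}

const severityTable = [
  '| Severity | Count |',
  '|----------|------:|',
  '| Critical | 0 |',
  '| High | 1 |',
  '| Medium | 4 |',
  '| Low | 2 |',
  '| Info | 6 |',
  '| **Total** | **13** |',
].join('\n');

console.log(checkSeverityTotals(severityTable).consistent); // true
```

Skipping non-numeric rows lets the same function run over the whole dashboard section without first isolating the severity table.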

**Verdict rationale:** 1 HIGH OSV.dev advisory on `lefthook@1.4.2` (CVE-2024-1234, denial of service via crafted hook config). 3 MEDIUM typosquat candidates flagged for manual review, plus 1 MEDIUM package published inside the 72h recency gate.

---

## Ecosystem Coverage

| Ecosystem | Lockfile | Packages | OSV.dev Hits | Typosquats |
|-----------|----------|---------:|-------------:|-----------:|
| npm | package-lock.json | 412 | 1 | 2 |
| pip | requirements.txt | 38 | 0 | 1 |
| cargo | Cargo.lock | 71 | 0 | 0 |
| go | go.sum | 0 | 0 | 0 |
| docker | (none) | 0 | 0 | 0 |
| **Total** | | **521** | **1** | **3** |
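
The footer's "521 packages across 3 ecosystems" follows from this table: only npm, pip and cargo contribute packages. A small sketch that recomputes the **Total** row and the active-ecosystem count from the per-row values; the helper and field names are illustrative, not taken from the playground parser:

```javascript
// Rows transcribed from the Ecosystem Coverage table above.
const ecosystems = [
  { name: 'npm', packages: 412, osvHits: 1, typosquats: 2 },
  { name: 'pip', packages: 38, osvHits: 0, typosquats: 1 },
  { name: 'cargo', packages: 71, osvHits: 0, typosquats: 0 },
  { name: 'go', packages: 0, osvHits: 0, typosquats: 0 },
  { name: 'docker', packages: 0, osvHits: 0, typosquats: 0 },
];

// Hypothetical summary helper: column totals plus the number of ecosystems
// that actually contributed packages (empty lockfiles are not counted).
function summarizeCoverage(rows) {
  const sum = (key) => rows.reduce((acc, row) => acc + row[key], 0);
  return {
    packages: sum('packages'),
    osvHits: sum('osvHits'),
    typosquats: sum('typosquats'),
    activeEcosystems: rows.filter((row) => row.packages > 0).length,
  };
}

console.log(summarizeCoverage(ecosystems));
// packages: 521, osvHits: 1, typosquats: 3, activeEcosystems: 3
```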

---

## Findings

### High

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCR-001 | OSV.dev CVE | package-lock.json | 8421 | lefthook@1.4.2 → CVE-2024-1234 (DoS via crafted hook config) | LLM03 |

### Medium

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCR-002 | Typosquat | package-lock.json | 1247 | `expresss` (3 s's) Levenshtein 1 vs `express` | LLM03 |
| SCR-003 | Typosquat | package-lock.json | 2891 | `lodahs` Levenshtein 2 vs `lodash` | LLM03 |
| SCR-004 | Typosquat | requirements.txt | 22 | `request-mock` (no s) Levenshtein 1 vs legitimate `requests-mock`; manual review | LLM03 |
| SCR-005 | Recent | package-lock.json | 5103 | `colorette@3.1.0` published 71 hours ago (<72h gate) | LLM03 |
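
The distances in SCR-002 to SCR-004 are plain Levenshtein distances. A minimal sketch reproduces them. Note that `lodahs` scores 2 because an adjacent swap counts as two edits here; a Damerau-Levenshtein (transposition-aware) variant would score it 1. The `maxDistance` threshold of 2 in `nearestPopular` is an assumption, not the scanner's documented cutoff:

```javascript
// Classic Levenshtein edit distance (insertions, deletions, substitutions),
// computed with two rolling rows of the dynamic-programming table.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i - 1]
        curr[j - 1] + 1, // insert b[j - 1]
        prev[j - 1] + cost, // substitute (or keep on match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical flagging rule: a name within maxDistance edits of a popular
// package (but not identical to it) is a typosquat candidate.
function nearestPopular(name, popular, maxDistance = 2) {
  let best = null;
  for (const target of popular) {
    const d = levenshtein(name, target);
    if (d > 0 && d <= maxDistance && (best === null || d < best.distance)) {
      best = { target, distance: d };
    }
  }
  return best;
}

console.log(levenshtein('expresss', 'express')); // 1
console.log(levenshtein('lodahs', 'lodash')); // 2
console.log(levenshtein('request-mock', 'requests-mock')); // 1
```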

### Low

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCR-006 | Maintenance | package-lock.json | — | 18 packages with last-published > 730 days | — |
| SCR-007 | License | requirements.txt | 12 | `chardet==3.0.4` LGPL-2.1 — verify compatibility | — |

### Info

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| SCR-008 | Provenance | package-lock.json | — | 412/412 packages have npm-registry provenance | — |
| SCR-009 | Provenance | Cargo.lock | — | All 71 crates from crates.io | — |
| SCR-010 | Coverage | go.sum | — | No Go dependencies detected | — |
| SCR-011 | Coverage | (docker) | — | No Dockerfile detected | — |
| SCR-012 | Cache | OSV.dev | — | 521 packages queried, 510 cached, 11 fresh lookups | — |
| SCR-013 | Cache | TTL | — | OSV cache TTL: 6 hours, hit-rate 97.9% | — |
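
SCR-012 and SCR-013 agree with each other: 510 of 521 lookups served from cache is 510 / 521 ≈ 97.9%, leaving 521 − 510 = 11 fresh lookups. A sketch of that arithmetic and of a 6-hour TTL check; the report does not describe how the cache stores timestamps, so the millisecond-timestamp representation and both helper names are assumptions:

```javascript
// Hypothetical helpers for the OSV.dev cache figures in SCR-012 and SCR-013.
const HOUR_MS = 60 * 60 * 1000;

// Hit rate in percent, rounded to one decimal place.
function cacheHitRate(cached, total) {
  return Math.round((cached / total) * 1000) / 10;
}

// An entry is served from cache only while it is younger than the TTL.
function isFresh(cachedAtMs, nowMs, ttlMs = 6 * HOUR_MS) {
  return nowMs - cachedAtMs < ttlMs;
}

const queried = 521;
const cached = 510;
console.log(queried - cached); // 11
console.log(cacheHitRate(cached, queried)); // 97.9
```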

---

## Recommendations

1. **Immediate:** Bump `lefthook` to ≥1.5.0 to clear CVE-2024-1234, for example with `npm install lefthook@latest`.
2. **High:** Verify that `expresss` and `lodahs` are not legitimate packages; both look like typosquat bait.
3. **Medium:** Wait 24h before pinning `colorette@3.1.0` (currently <72h since publish, inside the supply-chain attack window).
4. **Low:** Audit the LGPL-2.1 dependency `chardet==3.0.4` for license compatibility with the project license.
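
SCR-005 and recommendation 3 rest on a 72-hour publish gate. A minimal sketch of such a gate; the scan and publish timestamps below are assumptions (the report gives only the relative age), chosen so the package is 71 hours old. Under these assumptions the gate itself would clear one hour after the scan:

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical recency gate: packages younger than gateHours are held back.
function recencyGate(publishedAt, now, gateHours = 72) {
  const ageHours = (now.getTime() - publishedAt.getTime()) / HOUR_MS;
  return {
    ageHours,
    held: ageHours < gateHours,
    hoursUntilClear: Math.max(0, gateHours - ageHours),
  };
}

// Assumed timestamps: the report only says "published 71 hours ago".
const scanTime = new Date('2026-05-05T14:32:00Z');
const published = new Date('2026-05-02T15:32:00Z');
console.log(recencyGate(published, scanTime));
// ageHours: 71, held: true, hoursUntilClear: 1
```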

---

*Supply-chain recheck complete. 521 packages across 3 ecosystems, 13 findings.*

plugins/llm-security/playground/test-fixtures/threat-model.md
@@ -0,0 +1,124 @@
# Threat Model — STRIDE + MAESTRO

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | threat-model |
| **Target** | DFT data-platform RAG-system |
| **System** | rag-platform v3.2.0 |
| **Date** | 2026-05-05 |
| **Framework** | STRIDE + MAESTRO |
| **Version** | llm-security v7.4.0 |
| **Triggered by** | /security threat-model |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 52/100 |
| **Risk Band** | High |
| **Grade** | C |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 1 |
| High | 3 |
| Medium | 4 |
| Low | 2 |
| Info | 0 |
| **Total** | **10** |

**Verdict rationale:** 1 CRITICAL on token theft via cross-tenant context bleed (M5/MAESTRO authorization). 3 HIGH on prompt injection via source documents, source-document tampering, and RAG output exfiltration via tool calls (TM-002, TM-003, TM-005). Threat model produced; mitigations pending architectural sign-off.

---

## Risk Matrix (5×5)

| Threat | Likelihood | Consequence | Score |
|--------|-----------:|------------:|------:|
| TM-001 — Cross-tenant context bleed via index sharing | 4 | 5 | 20 |
| TM-002 — Prompt injection via source documents | 4 | 4 | 16 |
| TM-003 — Source document tampering (pre-ingest) | 3 | 4 | 12 |
| TM-004 — Embedding inversion attack | 2 | 5 | 10 |
| TM-005 — RAG output exfil via tool call | 3 | 3 | 9 |
| TM-006 — DoS via expensive query patterns | 4 | 2 | 8 |
| TM-007 — Authorization bypass on retrieval | 2 | 4 | 8 |
| TM-008 — Logging gap for prompt history | 3 | 2 | 6 |
| TM-009 — Side-channel via response timing | 2 | 3 | 6 |
| TM-010 — Stale embeddings post-rotation | 2 | 2 | 4 |
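
Each score in the 5×5 matrix is likelihood × consequence, with both axes on a 1 to 5 scale. A sketch that recomputes and ranks the scores; breaking ties by threat ID is an assumption that happens to reproduce the table's order (TM-006 before TM-007, TM-008 before TM-009), and `rankThreats` is a hypothetical name:

```javascript
// Values transcribed from the 5×5 risk matrix above: [id, likelihood, consequence].
const matrix = [
  ['TM-001', 4, 5], ['TM-002', 4, 4], ['TM-003', 3, 4], ['TM-004', 2, 5],
  ['TM-005', 3, 3], ['TM-006', 4, 2], ['TM-007', 2, 4], ['TM-008', 3, 2],
  ['TM-009', 2, 3], ['TM-010', 2, 2],
];

// Score = likelihood × consequence; both axes must be integers in 1..5.
function rankThreats(rows) {
  return rows
    .map(([id, likelihood, consequence]) => {
      for (const v of [likelihood, consequence]) {
        if (!Number.isInteger(v) || v < 1 || v > 5) {
          throw new RangeError(`${id}: axis value ${v} outside 1..5`);
        }
      }
      return { id, likelihood, consequence, score: likelihood * consequence };
    })
    // Highest score first; ties broken by ID (an assumption).
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

console.log(rankThreats(matrix).map((t) => `${t.id}=${t.score}`).join(' '));
// TM-001=20 TM-002=16 TM-003=12 TM-004=10 TM-005=9 TM-006=8 TM-007=8 TM-008=6 TM-009=6 TM-010=4
```

Severity in the Threats table is not derived from this score alone (TM-004 scores 10 but is medium, TM-005 scores 9 but is high), so the sketch ranks threats without mapping scores to severity bands.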

---

## Threats

| ID | Description | Severity | Mitigation |
|----|-------------|----------|------------|
| TM-001 | Cross-tenant context bleed via index sharing — single Azure AI Search index across all tenants | critical | Tenant-isolated indexes OR row-level security with tenant_id filter |
| TM-002 | Prompt injection via source documents — adversarial PDF in corpus | high | Trust-Bus wrapper + Constrained Markdown parser + pre-ingest scanning |
| TM-003 | Source document tampering pre-ingest — supply chain on doc pipeline | high | Signed manifests + SHA-256 verification at ingest |
| TM-004 | Embedding inversion attack — recover source text from embeddings | medium | Use private embedding model OR add noise to stored embeddings |
| TM-005 | RAG output exfil via tool call (Bash, WebFetch chained from RAG output) | high | Hook-level data-flow tracking (post-session-guard.mjs trifecta) |
| TM-006 | DoS via expensive query patterns | medium | Query budget + per-tenant rate limit |
| TM-007 | Authorization bypass on retrieval | medium | Validate tenant_id from auth claim, not request payload |
| TM-008 | Logging gap for prompt history | medium | Append-only audit log, retain 90d |
| TM-009 | Side-channel via response timing | low | Constant-time response shaping for sensitive paths |
| TM-010 | Stale embeddings post-rotation | low | Embedding version tag + rotation playbook |

---

## STRIDE Coverage

| Category | Count | Notes |
|----------|------:|-------|
| Spoofing | 1 | TM-007 |
| Tampering | 2 | TM-003, TM-010 |
| Repudiation | 1 | TM-008 |
| Information Disclosure | 3 | TM-001, TM-004, TM-009 |
| Denial of Service | 1 | TM-006 |
| Elevation of Privilege | 2 | TM-002, TM-005 |
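
The STRIDE table is the inverse of a per-threat assignment: in this fixture each threat has exactly one category, so the counts sum to the 10 threats in the dashboard. A sketch of building the table from such an assignment; the one-category-per-threat model reflects how this fixture is laid out, not necessarily a rule of the threat-model command, and `groupByCategory` is hypothetical:

```javascript
// Per-threat STRIDE category, transcribed from the coverage table above.
const strideOf = {
  'TM-001': 'Information Disclosure',
  'TM-002': 'Elevation of Privilege',
  'TM-003': 'Tampering',
  'TM-004': 'Information Disclosure',
  'TM-005': 'Elevation of Privilege',
  'TM-006': 'Denial of Service',
  'TM-007': 'Spoofing',
  'TM-008': 'Repudiation',
  'TM-009': 'Information Disclosure',
  'TM-010': 'Tampering',
};

// Hypothetical helper: invert an ID -> category map into category -> [IDs].
function groupByCategory(mapping) {
  const groups = new Map();
  for (const [id, category] of Object.entries(mapping)) {
    if (!groups.has(category)) groups.set(category, []);
    groups.get(category).push(id);
  }
  return groups;
}

for (const [category, ids] of groupByCategory(strideOf)) {
  console.log(`${category}: ${ids.length} (${ids.join(', ')})`);
}
```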

---

## MAESTRO Coverage

| Layer | Count | Notes |
|-------|------:|-------|
| L1 Foundation Models | 0 | Out of scope for this assessment |
| L2 Data Operations | 4 | TM-001, TM-003, TM-004, TM-010 |
| L3 Agentic Frameworks | 0 | RAG only, no agents in this layer |
| L4 Deployment & Infra | 1 | TM-006 |
| L5 Evaluation & Observability | 1 | TM-008 |
| L6 Security & Compliance | 1 | TM-009 |
| L7 Agent Ecosystem | 3 | TM-002, TM-005, TM-007 |
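
Like the STRIDE table, the MAESTRO table should place each of the 10 threats in exactly one layer (0 + 4 + 0 + 1 + 1 + 1 + 3 = 10). A sketch of a partition check that reports missing, duplicated and unknown IDs; `checkPartition` is a hypothetical helper, and treating the layers as a strict partition is an assumption about this fixture:

```javascript
// Layer -> threat IDs, transcribed from the MAESTRO coverage table above.
const maestro = {
  'L1 Foundation Models': [],
  'L2 Data Operations': ['TM-001', 'TM-003', 'TM-004', 'TM-010'],
  'L3 Agentic Frameworks': [],
  'L4 Deployment & Infra': ['TM-006'],
  'L5 Evaluation & Observability': ['TM-008'],
  'L6 Security & Compliance': ['TM-009'],
  'L7 Agent Ecosystem': ['TM-002', 'TM-005', 'TM-007'],
};

// Hypothetical check: every known threat appears in exactly one layer.
function checkPartition(coverage, threatIds) {
  const seen = new Map();
  for (const ids of Object.values(coverage)) {
    for (const id of ids) seen.set(id, (seen.get(id) ?? 0) + 1);
  }
  return {
    missing: threatIds.filter((id) => !seen.has(id)),
    duplicated: [...seen].filter(([, n]) => n > 1).map(([id]) => id),
    unknown: [...seen.keys()].filter((id) => !threatIds.includes(id)),
  };
}

// TM-001 .. TM-010
const allThreats = Array.from({ length: 10 }, (_, i) => `TM-${String(i + 1).padStart(3, '0')}`);
console.log(checkPartition(maestro, allThreats));
// missing: [], duplicated: [], unknown: []
```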

---

## Mitigation Roadmap

| Priority | Threat | Mitigation | Owner | ETA |
|----------|--------|------------|-------|-----|
| P0 | TM-001 | Tenant-isolated indexes | platform-eng | 2026-05-15 |
| P0 | TM-002 | Trust-Bus + Constrained Markdown | ai-platform | 2026-05-22 |
| P1 | TM-003 | Signed manifests + ingest verification | data-eng | 2026-05-29 |
| P1 | TM-005 | Hook-level data-flow tracking | security-eng | 2026-05-22 |
| P2 | TM-006, TM-007, TM-008 | Rate limit + auth + audit log | platform-eng | 2026-06-15 |
| P3 | TM-004, TM-009, TM-010 | Embedding hardening | research | 2026-Q3 |
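
The footer's "2 P0, 2 P1" counts threats, and the P2 and P3 rows each bundle three. Counting per threat rather than per row shows that the four priorities together cover all 10 threats. A sketch, assuming bundled IDs are comma-separated as in the table; `threatsPerPriority` is a hypothetical helper:

```javascript
// Rows transcribed from the Mitigation Roadmap table above.
const roadmap = [
  { priority: 'P0', threats: 'TM-001' },
  { priority: 'P0', threats: 'TM-002' },
  { priority: 'P1', threats: 'TM-003' },
  { priority: 'P1', threats: 'TM-005' },
  { priority: 'P2', threats: 'TM-006, TM-007, TM-008' },
  { priority: 'P3', threats: 'TM-004, TM-009, TM-010' },
];

// Hypothetical helper: split bundled rows and count threats per priority.
function threatsPerPriority(rows) {
  const counts = {};
  for (const { priority, threats } of rows) {
    const ids = threats.split(',').map((s) => s.trim()).filter(Boolean);
    counts[priority] = (counts[priority] ?? 0) + ids.length;
  }
  return counts;
}

console.log(threatsPerPriority(roadmap)); // { P0: 2, P1: 2, P2: 3, P3: 3 }
```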

---

## Recommendations

1. **Immediate (P0):** Tenant-isolated indexes. TM-001 is the only critical threat and the highest-scoring risk (20/25) for this multi-tenant RAG system.
2. **Immediate (P0):** Trust-Bus wrapper and Constrained Markdown parser. Closing TM-002 removes the highest-volume injection vector.
3. **High (P1):** Signed-manifest pipeline (TM-003) and hook-level data-flow tracking (TM-005).
4. **Medium (P2):** Rate limit, auth fix, and audit log (TM-006, TM-007, TM-008), bundled together into one platform-eng sprint.

---

*Threat model complete. 10 threats across STRIDE + MAESTRO frameworks. 2 P0, 2 P1.*

plugins/llm-security/playground/test-fixtures/watch.md
@@ -0,0 +1,117 @@
# Watch — Continuous Monitoring

---

## Header

| Field | Value |
|-------|-------|
| **Report type** | watch |
| **Target** | ~/repos/dft-marketplace |
| **Date** | 2026-05-05 |
| **Last Run** | 2026-05-05 14:32 |
| **Interval** | 6h |
| **Version** | llm-security v7.4.0 |
| **Scope** | recurring scan diff |
| **Triggered by** | /security watch . --interval 6h |

---

## Risk Dashboard

| Metric | Value |
|--------|-------|
| **Risk Score** | 31/100 |
| **Risk Band** | Medium |
| **Grade** | B |
| **Verdict** | WARNING |

| Severity | Count |
|----------|------:|
| Critical | 0 |
| High | 1 |
| Medium | 3 |
| Low | 1 |
| Info | 4 |
| **Total** | **9** |

**Verdict rationale:** Compared with the baseline from 6h ago, the latest scan introduced 1 HIGH (a new `Edit(*)` permission). Watch sent a notify event to the configured channels.

---

## Live Meter

| Metric | Value |
|--------|-------|
| **Active** | yes |
| **Runs (last 24h)** | 4 |
| **Last delta** | +1 high, +0 medium |
| **Next run** | 2026-05-05 20:32 |
| **Notify channels** | email, webhook |
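
The Live Meter's next run is the last run plus the 6h interval (14:32 + 6h = 20:32). A sketch of that calculation; the `m`/`h`/`d` suffixes accepted by `parseInterval` and the choice to treat the report's timestamps as UTC wall-clock strings are assumptions, not documented behavior of `/security watch`:

```javascript
// Assumed interval syntax: an integer followed by m (minutes), h (hours) or d (days).
const UNIT_MS = { m: 60_000, h: 3_600_000, d: 86_400_000 };

function parseInterval(text) {
  const match = /^(\d+)([mhd])$/.exec(text.trim());
  if (!match) throw new Error(`Unsupported interval: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Timestamps use the report's "YYYY-MM-DD HH:MM" format and are treated as
// UTC so the result does not depend on the machine's local time zone.
function nextRun(lastRun, interval) {
  const startMs = Date.parse(`${lastRun.replace(' ', 'T')}:00Z`);
  if (Number.isNaN(startMs)) throw new Error(`Unparseable timestamp: ${lastRun}`);
  const next = new Date(startMs + parseInterval(interval));
  return next.toISOString().slice(0, 16).replace('T', ' ');
}

console.log(nextRun('2026-05-05 14:32', '6h')); // 2026-05-05 20:32
```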

---

## Recent History

| Run | Time | Grade | Risk Score | Δ vs prev |
|-----|------|-------|-----------:|-----------|
| Current | 2026-05-05 14:32 | B | 31 | +6 |
| -6h | 2026-05-05 08:32 | B | 25 | -2 |
| -12h | 2026-05-05 02:32 | B | 27 | 0 |
| -18h | 2026-05-04 20:32 | B | 27 | -3 |
| -24h | 2026-05-04 14:32 | B | 30 | — |
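
The "Δ vs prev" column is each run's risk score minus the score of the run before it, with no delta for the oldest run. A sketch that reproduces the column from the scores; `deltas` is a hypothetical helper and the signed-string formatting mirrors the table rather than any documented output format:

```javascript
// Risk scores from the Recent History table, newest run first.
const history = [
  { run: 'Current', score: 31 },
  { run: '-6h', score: 25 },
  { run: '-12h', score: 27 },
  { run: '-18h', score: 27 },
  { run: '-24h', score: 30 },
];

// Delta vs the previous (older) run; the oldest run has no predecessor.
function deltas(runs) {
  return runs.map((entry, i) => {
    const older = runs[i + 1];
    if (!older) return null;
    const d = entry.score - older.score;
    return d > 0 ? `+${d}` : String(d);
  });
}

console.log(deltas(history)); // [ '+6', '-2', '0', '-3', null ]
```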

---

## Findings

### High

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| WAT-001 | Permissions | .claude/settings.json | 8 | Newly introduced `Edit(*)` wildcard (last commit: 4a8c1f, 23 min ago) | ASI04 |

### Medium

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| WAT-002 | Injection | commands/research-v2.md | 22 | New command file added | LLM01 |
| WAT-003 | MCP Trust | .mcp.json | 28 | Per-update drift continues on `postgres-readonly` | MCP05 |
| WAT-004 | Supply Chain | package-lock.json | 5103 | New dep `husky@9.0.11` < 72h old | LLM03 |

### Low

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| WAT-005 | Documentation | docs/CHANGELOG.md | 144 | Sensitive path reference added (not exploitable) | — |

### Info

| ID | Category | File | Line | Description | OWASP |
|----|----------|------|------|-------------|-------|
| WAT-006 | Cron | (config) | — | Cron handle: 4f8c (PID 12842) | — |
| WAT-007 | Cron | (config) | — | Run-script: ~/.cache/llm-security/watch/run.sh | — |
| WAT-008 | Coverage | (target) | — | Lines scanned: 18420 | — |
| WAT-009 | Coverage | (target) | — | Files scanned: 312 | — |
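
WAT-001 is a baseline comparison: the `Edit(*)` wildcard is flagged because it is new since the previous run, not merely because it exists. A minimal sketch of that comparison, assuming permission rules are flat strings of the form `Tool(pattern)`, as the `Bash(*)` and `Edit(*)` findings suggest; the rule lists and `newWildcardRules` are hypothetical:

```javascript
// A rule such as "Edit(*)" or "Bash(*)" grants the tool with no argument restriction.
const WILDCARD_RULE = /^[A-Za-z]+\(\*\)$/;

// Hypothetical check: wildcard rules present now but absent from the baseline.
function newWildcardRules(baselineRules, currentRules) {
  const before = new Set(baselineRules);
  return currentRules.filter((rule) => WILDCARD_RULE.test(rule) && !before.has(rule));
}

const baseline = ['Read', 'Bash(npm test)'];
const current = ['Read', 'Bash(npm test)', 'Edit(*)'];
console.log(newWildcardRules(baseline, current)); // [ 'Edit(*)' ]
```

A wildcard already present in the baseline is not reported again, which matches the idea of watch surfacing deltas rather than the full finding set.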

---

## Notify Events

| Time | Event | Channel | Status |
|------|-------|---------|--------|
| 2026-05-05 14:32 | new-finding (high) | email | sent |
| 2026-05-05 14:32 | new-finding (high) | webhook | 200 OK |

---

## Recommendations

1. **Immediate:** Investigate commit 4a8c1f; the `Edit(*)` wildcard it added should be reverted or narrowed in scope.
2. **High:** Review the newly added `commands/research-v2.md` for injection vectors.
3. **Medium:** Drift on `postgres-readonly` has persisted for 4 runs and may be a legitimate upstream change. After manual verification, run `/security mcp-baseline-reset --target postgres-readonly`.
4. **Medium:** Wait 24h before pinning `husky@9.0.11` (currently <72h since publish).

---

*Watch active. Next run scheduled 2026-05-05 20:32 (6h interval).*

@@ -19,7 +19,7 @@ import { scan } from './posture-scanner.mjs';
 // Constants
 // ---------------------------------------------------------------------------
 
-const VERSION = '7.4.0';
+const VERSION = '7.5.0';
 
 /** Cache location */
 const CACHE_DIR = join(homedir(), '.cache', 'llm-security');

@@ -49,7 +49,7 @@ import { scan as scanTaint } from './taint-tracer.mjs';
 import { scan as scanMemoryPoisoning } from './memory-poisoning-scanner.mjs';
 import { scan as scanSupplyChain } from './supply-chain-recheck.mjs';
 
-const VERSION = '7.4.0';
+const VERSION = '7.5.0';
 const SCANNER = 'IDE';
 
 // ---------------------------------------------------------------------------

@@ -20,7 +20,7 @@ import { finding, scannerResult, resetCounter } from './lib/output.mjs';
 // Constants
 // ---------------------------------------------------------------------------
 
-const VERSION = '7.4.0';
+const VERSION = '7.5.0';
 
 /** Minimum lines for a hook script to be considered non-stub */
 const NON_STUB_THRESHOLD = 5;
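
The three hunks above bump the same hard-coded `VERSION` constant by hand, and the version bump spans several files. A sketch of a check that would catch a missed bump; the file names are placeholders (the scanner paths are not shown in this diff), `findVersionMismatches` is hypothetical, and reading files from disk (for example with `fs.readFileSync`) is left out to keep the example self-contained:

```javascript
// Matches declarations such as: const VERSION = '7.5.0';
const VERSION_RE = /^\s*const\s+VERSION\s*=\s*['"]([^'"]+)['"]\s*;?/m;

// Hypothetical release check: each scanner's VERSION constant should equal
// the expected release version. `sources` maps a file name to its text.
function findVersionMismatches(expected, sources) {
  const mismatches = [];
  for (const [file, text] of Object.entries(sources)) {
    const match = VERSION_RE.exec(text);
    const found = match ? match[1] : null; // null = no VERSION constant at all
    if (found !== expected) mismatches.push({ file, found });
  }
  return mismatches;
}

const sources = {
  'scanner-a.mjs': "const VERSION = '7.5.0';\nconst SCANNER = 'IDE';",
  'scanner-b.mjs': "// Constants\nconst VERSION = '7.4.0';",
  'scanner-c.mjs': 'export default {};',
};
console.log(findVersionMismatches('7.5.0', sources));
```

Reporting a missing constant as `found: null` rather than skipping the file keeps a scanner that lost its `VERSION` declaration from passing silently.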