chore: WIP marketplace doc adjustments across plugins

Pre-trekexecute snapshot of in-progress CLAUDE.md/SKILL.md edits and
extracted docs/ files. Captured as one commit so /trekexecute claude-design
can run against a clean working tree.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-05-18 12:04:02 +02:00
commit f460814fe9
26 changed files with 805 additions and 1078 deletions

@ -31,11 +31,11 @@ Legacy bash scripts were removed in v1.0 (available in git history).
## Data storage
```
${CLAUDE_PLUGIN_DATA}/
$CLAUDE_PLUGIN_DATA/
├── sessions.jsonl Compact JSONL, one record per session
├── events.jsonl {ts, session_id, tool_name} per tool call
└── state/
└── {session_id}.json Live state during active session
└── <session_id>.json Live state during active session
```
State files are created at SessionStart and deleted at SessionEnd.
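
As a rough illustration of the one-record-per-line layout above, the sketch below serializes a single `events.jsonl` record and parses a JSONL buffer back. The helper names `formatEventLine` and `parseJsonl` are hypothetical, and the ISO-8601 timestamp format for `ts` is an assumption; the plugin's actual writer may differ.

```javascript
// Sketch only: field names follow the layout above; helper names are hypothetical.

// Serialize one tool-call event as a single JSONL line (one JSON object + newline).
// Assumption: `ts` is an ISO-8601 string; the real format is not stated in the doc.
function formatEventLine(sessionId, toolName, now = new Date()) {
  const record = { ts: now.toISOString(), session_id: sessionId, tool_name: toolName };
  return JSON.stringify(record) + "\n";
}

// Parse a JSONL buffer back into records, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```

A caller would typically append the returned line to `events.jsonl` with `fs.appendFile`, which keeps each write to exactly one record.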
@ -92,20 +92,3 @@ node --test tests/*.test.mjs
- Conventional Commits: `type(scope): description`
- English for all code, comments, and documentation
- Norwegian for project-internal communication
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)

@ -41,20 +41,3 @@ The pipeline is the source of truth until release:
5. Review is the release gate
Voyage policy: Opus on all sub-agents and orchestrator phases.
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)

@ -50,79 +50,6 @@ Analyzes and optimizes Claude Code configuration across three pillars:
| verifier-agent | Verify results | sonnet | purple | Read, Glob, Grep |
| feature-gap-agent | Context-aware feature recommendations | opus | green | Read, Glob, Grep, Write |
## Deterministic Scanners
Node.js scanners (zero external dependencies), run via `node scanners/scan-orchestrator.mjs <path>`.
Posture CLI: `node scanners/posture.mjs <path> [--json] [--global] [--full-machine] [--output-file path]`.
Scanner CLI: `node scanners/scan-orchestrator.mjs <path> [--global] [--full-machine] [--no-suppress]`.
| Scanner | Prefix | Detects |
|---------|--------|---------|
| `claude-md-linter.mjs` | CML | Structure, length, sections, @imports, duplicates, TODOs |
| `settings-validator.mjs` | SET | Schema, unknown/deprecated keys, type mismatches, permissions |
| `hook-validator.mjs` | HKV | Format, script existence, event validity, timeouts |
| `rules-validator.mjs` | RUL | Glob matching, orphan rules, deprecated fields, unscoped rules |
| `mcp-config-validator.mjs` | MCP | Server types, trust levels, env vars, unknown fields |
| `import-resolver.mjs` | IMP | Broken @imports, circular refs, deep chains, tilde paths |
| `conflict-detector.mjs` | CNF | Settings conflicts, permission contradictions, hook duplicates |
| `feature-gap-scanner.mjs` | GAP | 25 feature checks across 4 tiers — shown as opportunities, not grades |
| `token-hotspots.mjs` | TOK | Cache-breaking volatile content, redundant tool permissions, deep import chains, oversized cascade, bloated SKILL.md descriptions, MCP tool-schema budget (Opus 4.7 patterns) |
| `cache-prefix-scanner.mjs` | CPS | Volatile content in lines 31-150 of CLAUDE.md cascade (beyond Pattern A's top-30 window) |
| `disabled-in-schema-scanner.mjs` | DIS | Tools listed in BOTH `permissions.deny` AND `permissions.allow` — deny wins, allow entries are dead config |
| `collision-scanner.mjs` | COL | Cross-plugin skill name collisions (low); user-vs-plugin overlaps (medium); `details.namespaces` payload |
### Scanner Lib (`scanners/lib/`)
| Module | Purpose |
|--------|---------|
| `severity.mjs` | Severity constants, risk scoring, verdict logic, `WEIGHTS` named export (v5 F3) |
| `output.mjs` | Finding objects (CA-XXX-NNN format), scanner results, envelope, optional `details` payload (v5 N6) |
| `file-discovery.mjs` | Config file discovery: single-path, multi-path (`discoverConfigFilesMulti`), full-machine (`discoverFullMachinePaths`) |
| `yaml-parser.mjs` | Frontmatter parsing, JSON parsing, @import/section extraction |
| `string-utils.mjs` | Line counting, truncation, similarity, key extraction |
| `scoring.mjs` | Severity-weighted `scoreByArea` (v5 F3), health scorecard, dedup-by-area (v5 N3), `scoringVersion: 'v5'` |
| `backup.mjs` | Backup creation, manifest parsing, checksum verification |
| `diff-engine.mjs` | Drift diffing: diffEnvelopes(), formatDiffReport() |
| `baseline.mjs` | Baseline save/load/list/delete for drift detection |
| `report-generator.mjs` | Unified markdown reports: posture, drift, plugin health |
| `suppression.mjs` | .config-audit-ignore parsing, finding suppression, audit trail |
| `active-config-reader.mjs` | Read-only inventory: readActiveConfig(), detectGitRoot(), walkClaudeMdCascade(), readClaudeJsonProjectSlice() (longest-prefix match), enumeratePlugins(), enumerateSkills(), readActiveHooks(), readActiveMcpServers() (with cache → package.json tool-count fallback), estimateTokens() (v5: `'mcp'` kind = 500 + toolCount × 200) |
| `tokenizer-api.mjs` | Anthropic `count_tokens` wrapper for `--accurate-tokens` (v5 N5); 5s AbortController timeout, exponential 429 backoff, key masking |
| `humanizer.mjs` | Plain-language output translator (v5.1.0): `humanizeFinding`, `humanizeFindings`, `humanizeEnvelope`, `computeRelevanceContext`. Pure functions; never mutate inputs. Adds `userImpactCategory`, `userActionLanguage`, `relevanceContext` fields and replaces title/description/recommendation when a translation exists. Bypassed by `--raw` and `--json` paths. |
| `humanizer-data.mjs` | TRANSLATIONS table for 13 scanner prefixes (CML/SET/HKV/RUL/MCP/IMP/CNF/COL/TOK/CPS/DIS/GAP/PLH). Three-step lookup: exact title → regex pattern → `_default` → fall through to original |
### Action Engines (`scanners/`)
| Module | Purpose |
|--------|---------|
| `fix-engine.mjs` | planFixes(), applyFixes(), verifyFixes() — 9 fix types |
| `rollback-engine.mjs` | listBackups(), restoreBackup(), deleteBackup() |
| `fix-cli.mjs` | CLI: `node fix-cli.mjs <path> [--apply] [--json] [--global]` |
| `drift-cli.mjs` | CLI: `node drift-cli.mjs <path> [--save] [--baseline name] [--json]` |
| `whats-active.mjs` | CLI: `node whats-active.mjs <path> [--json] [--verbose] [--suggest-disables]` — read-only active-config inventory |
| `token-hotspots-cli.mjs` | CLI: `node token-hotspots-cli.mjs <path> [--json] [--global] [--output-file path] [--accurate-tokens] [--with-telemetry-recipe]` — Opus-4.7 token hotspots ranking with optional API calibration |
| `manifest.mjs` | CLI: `node manifest.mjs <path> [--json]` — ranked system-prompt token-source table (v5 N2) |
### Standalone Scanner
| Module | Prefix | Purpose |
|--------|--------|---------|
| `plugin-health-scanner.mjs` | PLH | Plugin structure, frontmatter, cross-plugin conflicts (runs independently) |
| `self-audit.mjs` | — | Runs all scanners + plugin health on this plugin itself |
## Knowledge Base (`knowledge/`)
| File | Content |
|------|---------|
| `claude-code-capabilities.md` | Feature register: 18 config surfaces, Anthropic guidance, relevance table |
| `configuration-best-practices.md` | Per-layer best practices (v5: Opus 4.7 cache-stability guidance replaces Sonnet-era 200-line rule) |
| `anti-patterns.md` | Common mistakes mapped to scanner IDs |
| `hook-events-reference.md` | All 26 hook events with details |
| `feature-evolution.md` | Feature timeline for staleness detection |
| `gap-closure-templates.md` | Config-specific templates for closing gaps |
| `opus-4.7-patterns.md` | Token-cost dynamics for Opus 4.7 era — patterns powering the TOK scanner |
| `cache-telemetry-recipe.md` | Manual `jq` recipe for verifying prompt-cache hit rate from session transcripts (v5 M7) |
## Hooks
| Event | Script | Purpose |
@ -132,56 +59,20 @@ Scanner CLI: `node scanners/scan-orchestrator.mjs <path> [--global] [--full-mach
| SessionStart | `session-start.mjs` | Checks for active (unfinished) sessions |
| Stop | `stop-session-reminder.mjs` | Reminds about current session phase |
## Plain-Language Output (v5.1.0)
## Reference docs (read on demand)
Default output of all 18 commands routes through `humanizeEnvelope` from `lib/humanizer.mjs`. Findings are decorated with three additive fields and may have title/description/recommendation replaced when a translation exists.
- **Scanner inventory, lib modules, action engines, knowledge base:** `docs/scanner-internals.md`
- **Plain-language output (v5.1.0), humanizer vocabularies, output modes:** `docs/humanizer.md`
### Output modes
## Plain-Language Output (v5.1.0) — summary
| Flag | Behavior |
|------|----------|
| (default, no flag) | Plain-language: humanizer applied, findings group by user-impact, titles lead with prose. Self-audit terminal render also humanized. |
| `--raw` | Byte-stable v5.0.0 verbatim — humanizer bypassed, technical IDs and severity-only labels. For tooling that scrapes stderr from v5.0.0. |
| `--json` | Unchanged from v5.0.0 — humanizer bypassed, byte-stable JSON envelope. Always preferred for programmatic consumption over `--raw`. |
| `--output-file <path>` | Writes raw v5.0.0-shape JSON (humanizer bypassed). Posture-specific. |
Default output of all 18 commands routes through `humanizeEnvelope` from `lib/humanizer.mjs`. Findings get three decorated fields:
`--raw` is threaded through every CLI: `posture.mjs`, `scan-orchestrator.mjs`, `token-hotspots-cli.mjs`, `manifest.mjs`, `whats-active.mjs`, `fix-cli.mjs`, `drift-cli.mjs`, `self-audit.mjs`.
- `userImpactCategory` — Configuration mistake / Conflict / Wasted tokens / Dead config / Missed opportunity
- `userActionLanguage` — Fix this now / Fix soon / Fix when convenient / Optional cleanup / FYI (derived from severity)
- `relevanceContext`: `affects-everyone` (default) / `affects-this-machine-only` (`*.local.*` files) / `test-fixture-no-impact`
### Vocabularies
User-impact category (added to each finding as `userImpactCategory`, derived from scanner prefix):
| Label | Scanners |
|-------|----------|
| Configuration mistake | CML, SET, HKV, RUL, MCP, IMP, PLH |
| Conflict | CNF, COL |
| Wasted tokens | TOK, CPS |
| Dead config | DIS |
| Missed opportunity | GAP |
Action language (added to each finding as `userActionLanguage`, derived from severity):
| Severity | Phrase |
|----------|--------|
| critical | Fix this now |
| high | Fix soon |
| medium | Fix when convenient |
| low | Optional cleanup |
| info | FYI |
Relevance context (added to each finding as `relevanceContext`, computed from finding's file path):
| Value | When |
|-------|------|
| `test-fixture-no-impact` | Path contains `/tests/fixtures/` or `/test/fixtures/` |
| `affects-this-machine-only` | Basename matches `*.local.*` (e.g., `settings.local.json`) |
| `affects-everyone` | Default — assumed shared/committed config |
### Wave 5 lessons
- Posture's stderr scorecard is rendered prose-side and is not part of the JSON envelope; `humanized.areas[].titleHumanized` referenced by command templates lives only in the prose render.
- Posture's `--output-file` writes raw v5.0.0-shape JSON because `posture.mjs` does not call `humanizeEnvelope`. If session-files should later be humanized, posture needs its own humanize pass — out of v5.1.0 scope.
- The default-output snapshot at `tests/snapshots/default-output/posture.json` is frozen — change requires `UPDATE_SNAPSHOT=1` plus intent confirmation.
`--raw` bypasses the humanizer for byte-stable v5.0.0 output. `--json` is also byte-stable. Full detail and Wave 5 lessons: `docs/humanizer.md`.
## Suppressions
@ -225,20 +116,3 @@ node --test 'tests/**/*.test.mjs'
- Session directories accumulate — use `/config-audit cleanup` to manage
- Scanners run on Node.js >= 18 (uses node:test, node:fs/promises)
- Plugin CLAUDE.md files in node_modules should be excluded via scope
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)

@ -0,0 +1,52 @@
# Config-Audit — Plain-language output (v5.1.0)
Imported from `CLAUDE.md` via pointer.
Default output of all 18 commands routes through `humanizeEnvelope` from `lib/humanizer.mjs`. Findings are decorated with three additive fields and may have title/description/recommendation replaced when a translation exists.
## Output modes
| Flag | Behavior |
|------|----------|
| (default, no flag) | Plain-language: humanizer applied, findings group by user-impact, titles lead with prose. Self-audit terminal render also humanized. |
| `--raw` | Byte-stable v5.0.0 verbatim — humanizer bypassed, technical IDs and severity-only labels. For tooling that scrapes stderr from v5.0.0. |
| `--json` | Unchanged from v5.0.0 — humanizer bypassed, byte-stable JSON envelope. Always preferred for programmatic consumption over `--raw`. |
| `--output-file <path>` | Writes raw v5.0.0-shape JSON (humanizer bypassed). Posture-specific. |
`--raw` is threaded through every CLI: `posture.mjs`, `scan-orchestrator.mjs`, `token-hotspots-cli.mjs`, `manifest.mjs`, `whats-active.mjs`, `fix-cli.mjs`, `drift-cli.mjs`, `self-audit.mjs`.
## Vocabularies
User-impact category (added to each finding as `userImpactCategory`, derived from scanner prefix):
| Label | Scanners |
|-------|----------|
| Configuration mistake | CML, SET, HKV, RUL, MCP, IMP, PLH |
| Conflict | CNF, COL |
| Wasted tokens | TOK, CPS |
| Dead config | DIS |
| Missed opportunity | GAP |
Action language (added to each finding as `userActionLanguage`, derived from severity):
| Severity | Phrase |
|----------|--------|
| critical | Fix this now |
| high | Fix soon |
| medium | Fix when convenient |
| low | Optional cleanup |
| info | FYI |
Relevance context (added to each finding as `relevanceContext`, computed from finding's file path):
| Value | When |
|-------|------|
| `test-fixture-no-impact` | Path contains `/tests/fixtures/` or `/test/fixtures/` |
| `affects-this-machine-only` | Basename matches `*.local.*` (e.g., `settings.local.json`) |
| `affects-everyone` | Default — assumed shared/committed config |
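
The three vocabularies above amount to three small lookups. The sketch below is a hypothetical re-implementation for illustration, not the code in `lib/humanizer.mjs`: the finding fields `prefix`, `severity`, and `file` are assumed names, and the precedence (test fixture first, then `*.local.*`, then default) is assumed to follow the table order.

```javascript
// Hypothetical sketch of the three decorations; not the actual humanizer source.
const CATEGORY_BY_PREFIX = {
  CML: "Configuration mistake", SET: "Configuration mistake", HKV: "Configuration mistake",
  RUL: "Configuration mistake", MCP: "Configuration mistake", IMP: "Configuration mistake",
  PLH: "Configuration mistake",
  CNF: "Conflict", COL: "Conflict",
  TOK: "Wasted tokens", CPS: "Wasted tokens",
  DIS: "Dead config",
  GAP: "Missed opportunity",
};

const ACTION_BY_SEVERITY = {
  critical: "Fix this now",
  high: "Fix soon",
  medium: "Fix when convenient",
  low: "Optional cleanup",
  info: "FYI",
};

// Derive the relevance context from the finding's file path.
function relevanceFromPath(filePath = "") {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  if (normalized.includes("/tests/fixtures/") || normalized.includes("/test/fixtures/")) {
    return "test-fixture-no-impact";
  }
  const basename = normalized.split("/").pop();
  if (/\.local\./.test(basename)) return "affects-this-machine-only";
  return "affects-everyone";
}

// Return a decorated copy; the input finding is never mutated.
function decorateFinding(finding) {
  return {
    ...finding,
    userImpactCategory: CATEGORY_BY_PREFIX[finding.prefix],
    userActionLanguage: ACTION_BY_SEVERITY[finding.severity],
    relevanceContext: relevanceFromPath(finding.file),
  };
}
```

Returning a spread copy mirrors the "pure functions; never mutate inputs" contract stated for the humanizer module.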
## Wave 5 lessons
- Posture's stderr scorecard is rendered prose-side and is not part of the JSON envelope; `humanized.areas[].titleHumanized` referenced by command templates lives only in the prose render.
- Posture's `--output-file` writes raw v5.0.0-shape JSON because `posture.mjs` does not call `humanizeEnvelope`. If session-files should later be humanized, posture needs its own humanize pass — out of v5.1.0 scope.
- The default-output snapshot at `tests/snapshots/default-output/posture.json` is frozen — change requires `UPDATE_SNAPSHOT=1` plus intent confirmation.

@ -0,0 +1,76 @@
# Config-Audit — Scanner internals
Detailed scanner inventory, lib modules, action engines, knowledge base. Imported from `CLAUDE.md` via pointer.
## Deterministic Scanners
Node.js scanners (zero external dependencies), run via `node scanners/scan-orchestrator.mjs <path>`.
Posture CLI: `node scanners/posture.mjs <path> [--json] [--global] [--full-machine] [--output-file path]`.
Scanner CLI: `node scanners/scan-orchestrator.mjs <path> [--global] [--full-machine] [--no-suppress]`.
| Scanner | Prefix | Detects |
|---------|--------|---------|
| `claude-md-linter.mjs` | CML | Structure, length, sections, @imports, duplicates, TODOs |
| `settings-validator.mjs` | SET | Schema, unknown/deprecated keys, type mismatches, permissions |
| `hook-validator.mjs` | HKV | Format, script existence, event validity, timeouts |
| `rules-validator.mjs` | RUL | Glob matching, orphan rules, deprecated fields, unscoped rules |
| `mcp-config-validator.mjs` | MCP | Server types, trust levels, env vars, unknown fields |
| `import-resolver.mjs` | IMP | Broken @imports, circular refs, deep chains, tilde paths |
| `conflict-detector.mjs` | CNF | Settings conflicts, permission contradictions, hook duplicates |
| `feature-gap-scanner.mjs` | GAP | 25 feature checks across 4 tiers — shown as opportunities, not grades |
| `token-hotspots.mjs` | TOK | Cache-breaking volatile content, redundant tool permissions, deep import chains, oversized cascade, bloated SKILL.md descriptions, MCP tool-schema budget (Opus 4.7 patterns) |
| `cache-prefix-scanner.mjs` | CPS | Volatile content in lines 31-150 of CLAUDE.md cascade (beyond Pattern A's top-30 window) |
| `disabled-in-schema-scanner.mjs` | DIS | Tools listed in BOTH `permissions.deny` AND `permissions.allow` — deny wins, allow entries are dead config |
| `collision-scanner.mjs` | COL | Cross-plugin skill name collisions (low); user-vs-plugin overlaps (medium); `details.namespaces` payload |
## Scanner Lib (`scanners/lib/`)
| Module | Purpose |
|--------|---------|
| `severity.mjs` | Severity constants, risk scoring, verdict logic, `WEIGHTS` named export (v5 F3) |
| `output.mjs` | Finding objects (CA-XXX-NNN format), scanner results, envelope, optional `details` payload (v5 N6) |
| `file-discovery.mjs` | Config file discovery: single-path, multi-path (`discoverConfigFilesMulti`), full-machine (`discoverFullMachinePaths`) |
| `yaml-parser.mjs` | Frontmatter parsing, JSON parsing, @import/section extraction |
| `string-utils.mjs` | Line counting, truncation, similarity, key extraction |
| `scoring.mjs` | Severity-weighted `scoreByArea` (v5 F3), health scorecard, dedup-by-area (v5 N3), `scoringVersion: 'v5'` |
| `backup.mjs` | Backup creation, manifest parsing, checksum verification |
| `diff-engine.mjs` | Drift diffing: diffEnvelopes(), formatDiffReport() |
| `baseline.mjs` | Baseline save/load/list/delete for drift detection |
| `report-generator.mjs` | Unified markdown reports: posture, drift, plugin health |
| `suppression.mjs` | .config-audit-ignore parsing, finding suppression, audit trail |
| `active-config-reader.mjs` | Read-only inventory: readActiveConfig(), detectGitRoot(), walkClaudeMdCascade(), readClaudeJsonProjectSlice() (longest-prefix match), enumeratePlugins(), enumerateSkills(), readActiveHooks(), readActiveMcpServers() (with cache → package.json tool-count fallback), estimateTokens() (v5: `'mcp'` kind = 500 + toolCount × 200) |
| `tokenizer-api.mjs` | Anthropic `count_tokens` wrapper for `--accurate-tokens` (v5 N5); 5s AbortController timeout, exponential 429 backoff, key masking |
| `humanizer.mjs` | Plain-language output translator (v5.1.0): `humanizeFinding`, `humanizeFindings`, `humanizeEnvelope`, `computeRelevanceContext`. Pure functions; never mutate inputs. Adds `userImpactCategory`, `userActionLanguage`, `relevanceContext` fields and replaces title/description/recommendation when a translation exists. Bypassed by `--raw` and `--json` paths. |
| `humanizer-data.mjs` | TRANSLATIONS table for 13 scanner prefixes (CML/SET/HKV/RUL/MCP/IMP/CNF/COL/TOK/CPS/DIS/GAP/PLH). Three-step lookup: exact title → regex pattern → `_default` → fall through to original |
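
The lookup order described for `humanizer-data.mjs` (exact title, then regex pattern, then `_default`, then fall through to the original) could look roughly like the sketch below. The table shape (`exact`, `patterns`, `_default` keys per prefix) and the function name `humanizeTitle` are assumptions made for illustration; the real `TRANSLATIONS` structure is not shown in this document.

```javascript
// Hypothetical table shape, assumed for this sketch:
// { [prefix]: { exact: { [title]: string }, patterns: [{ regex, title }], _default: string } }
function humanizeTitle(translations, prefix, originalTitle) {
  const entry = translations[prefix];
  if (!entry) return originalTitle; // unknown prefix: keep the original

  // Step 1: exact title match.
  if (entry.exact && Object.hasOwn(entry.exact, originalTitle)) {
    return entry.exact[originalTitle];
  }

  // Step 2: first matching regex pattern (patterns must not use the `g` flag,
  // otherwise `test` becomes stateful via `lastIndex`).
  for (const { regex, title } of entry.patterns ?? []) {
    if (regex.test(originalTitle)) return title;
  }

  // Step 3: per-prefix default, else fall through to the original title.
  return entry._default ?? originalTitle;
}
```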
## Action Engines (`scanners/`)
| Module | Purpose |
|--------|---------|
| `fix-engine.mjs` | planFixes(), applyFixes(), verifyFixes() — 9 fix types |
| `rollback-engine.mjs` | listBackups(), restoreBackup(), deleteBackup() |
| `fix-cli.mjs` | CLI: `node fix-cli.mjs <path> [--apply] [--json] [--global]` |
| `drift-cli.mjs` | CLI: `node drift-cli.mjs <path> [--save] [--baseline name] [--json]` |
| `whats-active.mjs` | CLI: `node whats-active.mjs <path> [--json] [--verbose] [--suggest-disables]` — read-only active-config inventory |
| `token-hotspots-cli.mjs` | CLI: `node token-hotspots-cli.mjs <path> [--json] [--global] [--output-file path] [--accurate-tokens] [--with-telemetry-recipe]` — Opus-4.7 token hotspots ranking with optional API calibration |
| `manifest.mjs` | CLI: `node manifest.mjs <path> [--json]` — ranked system-prompt token-source table (v5 N2) |
## Standalone Scanner
| Module | Prefix | Purpose |
|--------|--------|---------|
| `plugin-health-scanner.mjs` | PLH | Plugin structure, frontmatter, cross-plugin conflicts (runs independently) |
| `self-audit.mjs` | — | Runs all scanners + plugin health on this plugin itself |
## Knowledge Base (`knowledge/`)
| File | Content |
|------|---------|
| `claude-code-capabilities.md` | Feature register: 18 config surfaces, Anthropic guidance, relevance table |
| `configuration-best-practices.md` | Per-layer best practices (v5: Opus 4.7 cache-stability guidance replaces Sonnet-era 200-line rule) |
| `anti-patterns.md` | Common mistakes mapped to scanner IDs |
| `hook-events-reference.md` | All 26 hook events with details |
| `feature-evolution.md` | Feature timeline for staleness detection |
| `gap-closure-templates.md` | Config-specific templates for closing gaps |
| `opus-4.7-patterns.md` | Token-cost dynamics for Opus 4.7 era — patterns powering the TOK scanner |
| `cache-telemetry-recipe.md` | Manual `jq` recipe for verifying prompt-cache hit rate from session transcripts (v5 M7) |

@ -63,20 +63,3 @@ node --test plugins/graceful-handoff/tests/
- v1.0.0 (2026-04-19): initial declarative command
- v2.0.0 (2026-05-01): skill architecture + JSON pipeline + 3 hooks + auto-trigger (BREAKING)
- v2.1.0 (2026-05-01): model-aware context window, with a 4-step resolution chain (used_percentage → payload-size → model-map → 1M default). Fixes premature auto-handoff on Opus 4.7
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)

@ -45,20 +45,3 @@ If `keep-coding-instructions` is removed or set to `false`, Claude Code will str
- Per-plugin variants (code-focused, deep-technical, etc.) — would belong in a future v1.1 if there's real demand
- Forcing the style on other plugins — it remains opt-in. Other plugins may reference it in their READMEs.
- Translation of the style file itself into Norwegian — defeats the purpose of language-agnostic instruction
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)

@ -98,20 +98,3 @@ All content commands (post, quick, react, pipeline, first-post, video, multiplat
5. Topic must align with user's 5 core expertise areas (360Brew signal)
6. Topic rotation: no back-to-back same pillar, no pillar >50% in 14 days (warn-only)
7. Progressive onboarding: personalization score hidden until 3+ posts; voice guardian suppressed until 5+ voice samples
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)

@ -2,167 +2,7 @@
Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1822+ unit, integration, and end-to-end tests (`tests/e2e/` covers the multi-hook attack chain, multi-session state simulation, and the full scan-orchestrator pipeline); mutation-testing coverage not published.
**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70-95, high only → 40-65, medium only → 15-35, low only → 1-11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15). `info` findings are observability-only — counted in OWASP aggregates but contribute zero to risk_score, verdict, and riskBand (B3, v7.2.0 — was undocumented pre-7.2.0). See `severity.mjs` JSDoc for full contract.
2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).
See `docs/security-hardening-guide.md` §6 for the calibration story.
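
To make "severity-dominated, log-scaled within tier" concrete, here is a minimal sketch. It is not the formula in `severity.mjs`: the band table and verdict cutoffs come from the text above, but the saturation constant, the exact log curve, the `ALLOW` label for the lowest verdict, and the choice to ignore lower-severity findings once a higher tier is present are all assumptions.

```javascript
// Illustrative only; the authoritative contract lives in severity.mjs.
const BANDS = { critical: [70, 95], high: [40, 65], medium: [15, 35], low: [1, 11] };
const TIER_ORDER = ["critical", "high", "medium", "low"];
const SATURATION = 20; // hypothetical: finding count at which a band tops out

// counts: e.g. { critical: 0, high: 3, medium: 10, low: 2, info: 40 }
function riskScore(counts) {
  // The highest severity present picks the band; `info` never contributes.
  const tier = TIER_ORDER.find((sev) => (counts[sev] ?? 0) > 0);
  if (!tier) return 0;
  const [lo, hi] = BANDS[tier];
  // Log-scale within the band: 1 finding sits at the floor, SATURATION+ at the ceiling.
  const fraction = Math.min(1, Math.log2(counts[tier]) / Math.log2(SATURATION));
  return Math.round(lo + (hi - lo) * fraction);
}

function verdict(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "ALLOW"; // assumed name for the lowest verdict
}
```

Under these assumptions a scan with only low findings can never leave the 1-11 band, no matter how many there are, which is the property that stops noisy scans from collapsing to 100/Extreme.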
**v7.1.1 — Scan-report narrative coherence (patch).** Three coordinated
edits address the whiplash symptom that survived v7.0.0 (numbers fixed,
narrative still walked findings back as "false positive" in prose):
(a) `agents/skill-scanner-agent.md` Step 2.5 mandates context-first
severity assignment — every signal has exactly one disposition (suppressed
OR reported), no per-finding walk-back; (b) `templates/unified-report.md`
gains a `### Narrative Audit` block in Executive Summary surfacing
`summary.narrative_audit.suppressed_findings.{count, by_category}` from
the agent's trailing JSON; (c) both files updated from stale v1
risk-formula constants to the v2 model that has been authoritative in
`severity.mjs` since v7.0.0. Counter is distinct from the existing
top-level `output.suppressed` (`.llm-security-ignore` rule integer).
Out-of-scope but flagged: `commands/scan.md:113-114` retains the v1
formula; resolution deferred to Batch B.
**v7.3.0 — MCP cumulative-drift baseline (in progress, Wave C of Batch C).**
Closes E14 from `docs/critical-review-2026-04-20.md`. The
`mcp-description-cache.mjs` schema gains a sticky `baseline` slot per
tool plus a 10-event rolling `history` array (FIFO). Cumulative drift =
`levenshtein(current, baseline) / max(|current|, |baseline|)`; when the
ratio crosses `mcp.cumulative_drift_threshold` (default 0.25),
`post-mcp-verify.mjs` emits a separate MEDIUM `mcp-cumulative-drift`
advisory. The existing per-update >10% drift signal is unchanged — both
fire independently. Slow-burn rug-pulls that keep each update under the
per-update threshold but cumulatively diverge from baseline are now
caught. Baseline survives the 7-day TTL purge so detection persists
across the full window. New `/security mcp-baseline-reset` slash command
(plus `scanners/mcp-baseline-reset.mjs` CLI: `--list`, `--target <tool>`,
or no-args clear-all) lets the user acknowledge a legitimate MCP server
upgrade — clearing the baseline causes the next call to seed a fresh
one from the incoming description; description, firstSeen, lastSeen, and
history are preserved for audit. `LLM_SECURITY_MCP_CACHE_FILE` env var
overrides the cache path for end-to-end testing without polluting the
user's real `~/.cache/llm-security/mcp-descriptions.json`.
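
The cumulative-drift ratio above can be sketched as follows. This is an illustrative implementation under assumptions, not the code in `mcp-description-cache.mjs`: string lengths are measured in UTF-16 code units, and "crosses the threshold" is interpreted as strictly greater than.

```javascript
// Classic two-row Levenshtein edit distance (insert, delete, substitute all cost 1).
// Operates on UTF-16 code units, which is an assumption of this sketch.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Drift ratio: levenshtein(current, baseline) / max(|current|, |baseline|).
function cumulativeDrift(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// Default mirrors mcp.cumulative_drift_threshold (0.25); strict `>` is an assumption.
function exceedsDriftThreshold(current, baseline, threshold = 0.25) {
  return cumulativeDrift(current, baseline) > threshold;
}
```

Because the ratio is always taken against the sticky baseline rather than the previous description, a series of small edits that each stay under the per-update limit still accumulates toward the threshold.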
**v7.3.0 — Env-var deprecation warnings (D3 of Batch C, Wave D).**
Closes 8.7 from `.claude/projects/2026-04-29-batch-c-scope-finalize/plan.md`.
`scanners/lib/policy-loader.mjs` exports a new helper
`getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue)`:
env still wins per Preferences (existing behaviour), but when both the
env-var AND the `policy.json` key are explicitly set, the helper emits a
single per-process stderr line: `[llm-security] Deprecation: env-var
${ENVVAR} will be removed in v8.0.0; policy.json key ${section}.${key}
also set — env wins for now. Suppress with LLM_SECURITY_DEPRECATION_QUIET=1.`
Module-scoped `Set` dedupes per env-var name across call-sites. Four
overlapping vars are wired through the helper:
`LLM_SECURITY_INJECTION_MODE` → `injection.mode` (in
`pre-prompt-inject-scan.mjs`), `LLM_SECURITY_TRIFECTA_MODE` →
`trifecta.mode` and `LLM_SECURITY_ESCALATION_WINDOW` →
`trifecta.escalation_window` (in `post-session-guard.mjs`),
`LLM_SECURITY_AUDIT_LOG` → `audit.log_path` (in
`scanners/lib/audit-trail.mjs`). `DEFAULT_POLICY` gains
`trifecta.escalation_window: 5` to close the gap noted in the plan
revisions table (M10). Env-only vars without policy.json equivalents
(`LLM_SECURITY_UPDATE_CHECK`, `LLM_SECURITY_PRECOMPACT_MODE`,
`LLM_SECURITY_PRECOMPACT_MAX_BYTES`, `LLM_SECURITY_IDE_ROOTS`,
`LLM_SECURITY_MCP_CACHE_FILE`) are unchanged — they emit no
deprecation signal because there is nothing to deprecate yet.
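
A minimal sketch of the helper's behaviour, assuming the description above. The four positional parameters match the documented signature; the fifth `options` argument (`policy`, `env`, `warn`) is a hypothetical addition so the sketch is self-contained and testable. The real helper presumably loads `policy.json` itself and must distinguish explicitly set keys from merged defaults, which this sketch sidesteps by taking the user's raw policy object.

```javascript
// Module-scoped Set so each env-var warns at most once per process.
const warnedEnvVars = new Set();

// Sketch only: `options` is a hypothetical parameter, not part of the documented signature.
function getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue, options = {}) {
  const {
    policy = {}, // assumed: the user's raw policy.json content, without defaults merged in
    env = process.env,
    warn = (msg) => process.stderr.write(msg + "\n"),
  } = options;

  const envValue = env[envVarName];
  const policyValue = policy[section]?.[key];
  const envSet = envValue !== undefined && envValue !== "";
  const policySet = policyValue !== undefined;

  // Warn only when BOTH sources are explicitly set, once per env-var name.
  const quiet = env.LLM_SECURITY_DEPRECATION_QUIET === "1";
  if (envSet && policySet && !quiet && !warnedEnvVars.has(envVarName)) {
    warnedEnvVars.add(envVarName);
    warn(
      `[llm-security] Deprecation: env-var ${envVarName} will be removed in v8.0.0; ` +
        `policy.json key ${section}.${key} also set — env wins for now. ` +
        `Suppress with LLM_SECURITY_DEPRECATION_QUIET=1.`
    );
  }

  // Precedence: env, then policy, then the supplied default.
  if (envSet) return envValue;
  return policySet ? policyValue : defaultValue;
}
```

Note that the env value is returned as a string; any parsing (for example of a numeric escalation window) would be up to the caller in this sketch.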
**v7.5.0: Playground (additive surface, no scanner/hook behavior changes).**
Single-file SPA at `playground/llm-security-playground.html` (~10 200 lines)
for onboarding, demos, and workshop use without a Claude Code installation.
Parser + renderer for all 18 `produces_report=true` commands in `CATALOG`.
State lives primarily in IndexedDB (`llm-security-playground-v1`) with a
localStorage fallback, a cycle-free Proxy + EventTarget store, and
microtask-batched rendering. Theme bootstrap with FOUC prevention.
4 surfaces: onboarding (5 groups) → home (3 tracks) → catalog (20 commands)
⇄ project (reports / overview / context / export). The demo state has three
projects inline; `dft-komplett-demo` has all 18 reports pre-parsed for
click-through. Vendor-synced design system under
`playground/vendor/playground-design-system/` (checksum-locked via
`MANIFEST.json`, never edited directly). Test fixtures under
`playground/test-fixtures/` (one markdown file per command) are the contract
anchor for parser development. Screenshots in `playground/screenshots/v7.5.0/`.
Exposed window globals for testing/automation: `__store`, `__navigate`,
`__loadDemoState`, `__scheduleRender`, `__PARSERS`, `__RENDERERS`, `__CATALOG`,
`__inferVerdict`, `__inferKeyStats`, `__renderPageShell`, `__handlePasteImport`.
Includes a fix to the `normalizeVerdictText` regex order: GO-WITH-CONDITIONS
is checked before GO so a conditional verdict does not collapse to ALLOW.
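The ordering issue behind the `normalizeVerdictText` fix can be sketched as follows. This is a hypothetical reduction, not the playground's code: the name `normalizeVerdictSketch`, the two patterns, and the `WARNING` label for the conditional verdict are assumptions. Only the GO → ALLOW collapse comes from the note above.

```javascript
// Hypothetical sketch: rule order matters because the plain GO pattern
// also matches inside "GO-WITH-CONDITIONS".
const VERDICT_RULES = [
  [/\bGO[-\s]WITH[-\s]CONDITIONS\b/i, 'WARNING'], // assumed label; must come first
  [/\bGO\b/i, 'ALLOW'],
];

function normalizeVerdictSketch(text) {
  for (const [pattern, verdict] of VERDICT_RULES) {
    if (pattern.test(text)) return verdict;
  }
  return null; // no recognizable verdict in the text
}
```

With the two rules swapped, `GO-WITH-CONDITIONS` would hit `/\bGO\b/` first and be reported as ALLOW, which is the collapse the fix avoids.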
**v7.6.0: Playground Tier 3 reference case (additive surface, no
scanner/hook behavior changes).** The playground is now a visually and
structurally complete reference implementation for the
`shared/playground-design-system/` Tier 3 supplement. 8 new Tier 3
components integrated into the 18 report renderers: `tfa-flow` + `tfa-leg`
+ `tfa-arrow` (lethal trifecta chain with `<button>` elements + ARIA
group/aria-label) in `renderScan` + `renderDeepScan`; `mat-ladder` +
`mat-step` (5-step maturity ladder with thresholds 0/25/50/75/95% PASS)
in `renderPosture`; `suppressed-group` (narrative audit from
`summary.narrative_audit.suppressed_findings`) in `renderScan` +
`renderDeepScan`; `codepoint-reveal` + `cp-tag`/`cp-zw`/`cp-bidi`
(side-by-side reveal of Unicode steganography, with detection of
U+200B-D|FEFF|2060|180E → `cp-zw` and U+202A-E|2066-9 → `cp-bidi`) in
`renderMcpInspect`; `top-risks` + `top-risk[data-severity]` (ranked
top-findings listing, semantic `<ol>`, excludes info findings) in
`renderScan`/`renderDeepScan`/`renderPluginAudit`/`renderPosture`/
`renderAudit`; extended `recommendation-card[data-severity]`
(severity-tinted advisory) on all inline uses + new per-bucket advisory
cards in `renderClean` + intro snapshot + diff rows in `renderHarden`
(action mapping CREATE→positive / APPEND→medium / MERGE→low / SKIP→low);
`risk-meter` (band visualization 0-100 with Low/Medium/High/Critical/Extreme
bands) on 5 archetypes (scan, deep-scan, plugin-audit, audit, red-team);
`card--severity-{level}` modifier on `findings__item` cards. Wave 1
(Session 2) added `badge--scope-security` (identity chip),
`verdict-pill-lg` with `__verdict`+`__sub` (replaces the custom verdict
pill on all 18 report types), and DS Tier 3 `form-progress` + `fp-step` in
the onboarding wizard. Wave 0 (Session 1) deleted ~30 duplicate CSS
declarations from the `<style>` block (DS wins the cascade) and harmonized
the page shell across all 4 surfaces. 5 new DS helpers: `renderToxicFlow`,
`renderMatLadder`, `renderSuppressedGroup`, `renderCodepointReveal`,
`renderTopRisks`. 2 new normalization helpers:
`mapSeverityToCardLevel(input)` (severity + action types to DS conventions)
and `parseNarrativeAudit(md)`. 12 screenshots planned in
`playground/screenshots/v7.6.0/`. A11Y report updated
(`playground/A11Y-RAPPORT.md`): WCAG 2.1 AA confirmed, severity-soft color
pairs verified, semantic elements (`<ol>`, `<button>`, `<section>`) replace
generic `<div>`. Total file change over 5 sessions: 10209 → 10677 lines.
Known limitation: `parsed.findings` is empty for the `deep-scan`/`audit`
demo fixtures (parser limitation, not fixed in v7.6.0, tracked for a
v7.6.x patch).
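The codepoint ranges listed for `codepoint-reveal` can be illustrated with a small classifier. This is a sketch under stated assumptions: the function names and the returned object shape are hypothetical, and mapping `cp-tag` to the Unicode Tags block (U+E0000 to U+E007F) is an assumption, since the release note names the class but not its range.

```javascript
// Hypothetical classifier for the ranges quoted in the v7.6.0 note.
function classifyCodepoint(cp) {
  // Zero-width: U+200B-D, U+FEFF, U+2060, U+180E
  if ((cp >= 0x200b && cp <= 0x200d) || cp === 0xfeff || cp === 0x2060 || cp === 0x180e) {
    return 'cp-zw';
  }
  // Bidi controls: U+202A-E, U+2066-9
  if ((cp >= 0x202a && cp <= 0x202e) || (cp >= 0x2066 && cp <= 0x2069)) {
    return 'cp-bidi';
  }
  // Assumption: cp-tag covers the Unicode Tags block.
  if (cp >= 0xe0000 && cp <= 0xe007f) return 'cp-tag';
  return null;
}

// Returns one record per hidden codepoint, with its UTF-16 offset in the text.
function findHiddenCodepoints(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) { // for...of iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    const cls = classifyCodepoint(cp);
    if (cls) {
      const label = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
      hits.push({ index, codepoint: label, cls });
    }
    index += ch.length;
  }
  return hits;
}
```

A renderer could use the `cls` value as the CSS class for the revealed character and show `codepoint` next to it.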
**v7.6.1: Playground visual patch (no scanner/hook behavior changes).**
Six bugs caught by the maintainer during manual browser verification after
the v7.6.0 release. All were caused by a mismatch between DS classes and
how the playground renderers used them (or by missing DS implementations of
classes the playground renderers assumed existed).
(1) `renderFindingsBlock` used the `.findings` outer class, which DS defines
as a 2-column grid (`grid-template-columns: 360px 1fr`) for a list +
detail-panel layout. The playground used it without a detail panel, so the
header landed in the left 360px column and the items in 1fr. Replaced with
`<section class="report-meta">` + `<h4>` + the correct `findings__list >
findings__group > findings__group-header + findings__items` pattern.
(2) `.report-table` was missing entirely from DS but is used in 7+ renderers
(OWASP categories, Supply chain, Scanner Risk Matrix, Plugin meta,
Permission matrix, Live meter, Latest runs, Approvals, Mitigation roadmap).
Added a local CSS implementation in the playground HTML `<style>` block
(border-collapse, zebra hover, header styling). (3) `renderPreDeploy`
traffic lights used `.sm-card__grade`, which is a fixed 28×28 px (designed
for a single A-F letter). This cut "PASS" to "AS" and "PASS-WITH-NOTES" to
"PASS-WITH-..." in all traffic-light cards. Replaced with a width-adaptive
status pill via inline styling (severity-soft + on tokens).
(4) Threat-model matrix bubbles were not clickable: `<span>` without an
event handler. Replaced with `<button type="button" data-threat-id>` +
`aria-label`. The click handler scrolls to the corresponding row in the
threats table and highlights it for 1.6 s. (5) Radar labels overlapped at
6+ axes because all used `text-anchor="middle"` with the same offset.
Increased the SVG size from 280×280 to 380×380 and the radius from 105 to
125, and switched `text-anchor` from `middle` to `start`/`end` based on
horizontal position (`Math.cos(ang)` > 0.2 / < -0.2 / in between). (6)
`recommendation-card__body` text overflow on long single-line text
(conditions, owner tags, dates): added `overflow-wrap: anywhere; word-break:
break-word` in the local `<style>` block. 4/4 fix-specific smoke tests pass,
and 18/18 renderers still produce complete HTML against `dft-komplett-demo`
(regression test). File change 10677 → 10753 lines (+76 net).
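The radar-label rule from fix (5) can be sketched as below. The 0.2 thresholds, the 380×380 size, and the radius of 125 come from the note. The function names, the `labelGap` value, and placing the first axis at the top are assumptions made for illustration.

```javascript
// Pick an SVG text-anchor from the label's horizontal position on the circle.
function radarLabelAnchor(angleRad) {
  const c = Math.cos(angleRad);
  if (c > 0.2) return 'start';  // right half: text grows to the right
  if (c < -0.2) return 'end';   // left half: text grows to the left
  return 'middle';              // near the top or bottom: keep centered
}

// Hypothetical layout helper: one label per axis, evenly spaced.
function radarLabels(axisCount, { size = 380, radius = 125, labelGap = 14 } = {}) {
  const center = size / 2;
  return Array.from({ length: axisCount }, (_, i) => {
    // Assumption: the first axis points straight up.
    const ang = -Math.PI / 2 + (2 * Math.PI * i) / axisCount;
    return {
      x: center + (radius + labelGap) * Math.cos(ang),
      y: center + (radius + labelGap) * Math.sin(ang),
      anchor: radarLabelAnchor(ang),
    };
  });
}
```

With six axes this gives `middle` for the top and bottom labels, `start` for the two on the right, and `end` for the two on the left, so neighbouring labels no longer share a center point.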
Release notes for v7.0.0 → v7.6.1: see `docs/version-history.md` — read on demand.
## Commands
@ -176,7 +16,7 @@ passerer + 18/18 renderere produserer fortsatt komplett HTML mot
| `/security mcp-audit [--live]` | MCP server config audit (add `--live` for runtime inspection) |
| `/security mcp-inspect` | Live MCP server inspection — connect via JSON-RPC 2.0, scan tool descriptions |
| `/security mcp-baseline-reset` | Reset MCP description baseline cache (E14, v7.3.0) — after legitimate MCP server upgrade |
| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps — no fuzz-testing results published to date). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins, or fetch a remote VSIX/JetBrains plugin via URL. Details: `docs/scanner-reference.md` |
| `/security posture` | Quick scorecard (13 categories) |
| `/security threat-model` | Interactive STRIDE/MAESTRO session |
| `/security diff [path]` | Compare scan against baseline — shows new/resolved/unchanged/moved |
@ -186,7 +26,7 @@ passerer + 18/18 renderere produserer fortsatt komplett HTML mot
| `/security clean [path]` | Scan + remediate (auto/semi-auto/manual) |
| `/security dashboard` | Cross-project security dashboard — machine-wide posture overview |
| `/security harden [path]` | Generate Grade A config — settings.json, CLAUDE.md, .gitignore |
| `/security red-team [--category] [--adaptive]` | Attack simulation — 64 scenarios across 12 categories against plugin hooks. `--adaptive` for mutation-based evasion testing |
| `/security red-team [--category] [--adaptive]` | Attack simulation — 64 scenarios across 12 categories against plugin hooks |
| `/security pre-deploy` | Pre-deployment checklist |
## Agents
@ -206,185 +46,47 @@ passerer + 18/18 renderere produserer fortsatt komplett HTML mot
|--------|-------|---------|---------|
| `pre-prompt-inject-scan.mjs` | UserPromptSubmit | — | Block prompt injection, warn on manipulation (incl. oversight evasion, HTML obfuscation, MEDIUM advisory for leetspeak/homoglyphs/zero-width/multi-lang). Unicode Tag steganography detection. Mode: `LLM_SECURITY_INJECTION_MODE=block\|warn\|off` |
| `pre-edit-secrets.mjs` | PreToolUse | `Edit\|Write` | Block credentials in files |
| `pre-bash-destructive.mjs` | PreToolUse | `Bash` | Block rm -rf, curl\|sh, fork bombs, eval. Bash evasion normalization (T1-T6 via `bash-normalize.mjs`: empty quotes, ${} expansion, backslash splitting, IFS, ANSI-C hex) as defense-in-depth against T1-T6; Claude Code 2.1.98+ covers the harness level |
| `pre-bash-destructive.mjs` | PreToolUse | `Bash` | Block rm -rf, curl\|sh, fork bombs, eval. Bash evasion normalization (T1-T6 via `bash-normalize.mjs`) — defense-in-depth |
| `pre-install-supply-chain.mjs` | PreToolUse | `Bash` | Block compromised packages across ALL ecosystems. Bash evasion normalization before gate matching |
| `pre-write-pathguard.mjs` | PreToolUse | `Write` | Block writes to .env, .ssh/, .aws/, credentials, settings |
| `post-mcp-verify.mjs` | PostToolUse | — (all) | Injection scan on ALL tool output (incl. MEDIUM patterns, HITL traps, sub-agent spawn, NL indirection, cognitive load, hybrid P2SQL/recursive/XSS). HTML content trap detection. Bash-specific: secrets/URLs/size. MCP: per-update description drift (MCP05) AND cumulative drift vs sticky baseline (E14, v7.3.0) — slow-burn rug-pulls that stay under the per-update threshold but diverge >=25% from baseline emit MEDIUM `mcp-cumulative-drift` advisory. Per-tool volume tracking |
| `post-session-guard.mjs` | PostToolUse | — (all) | Runtime trifecta detection (Rule of Two). Sliding window (20 calls) + 100-call long-horizon. MCP-concentrated trifecta (same server = elevated severity). Sensitive path + exfil detection. Slow-burn trifecta (legs >50 calls apart = MEDIUM). Behavioral drift detection (Jensen-Shannon divergence). CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output→input linking). Mode: `LLM_SECURITY_TRIFECTA_MODE=block\|warn\|off` (default: warn). Cumulative data volume tracking (100KB/500KB/1MB thresholds). Sub-agent delegation tracking (Task/Agent tools): escalation-after-input advisory when delegation occurs within `LLM_SECURITY_ESCALATION_WINDOW` calls (default 5) of untrusted input (DeepMind Agent Traps kat. 4); secondary 20-call MEDIUM advisory catches slow-burn variants outside the primary window (E17, v7.2.0) |
| `post-mcp-verify.mjs` | PostToolUse | — (all) | Injection scan on ALL tool output. MCP per-update drift + cumulative drift vs sticky baseline (E14, v7.3.0). Per-tool volume tracking |
| `post-session-guard.mjs` | PostToolUse | — (all) | Runtime trifecta detection (Rule of Two). Sliding window + long-horizon. Behavioral drift (Jensen-Shannon). Mode: `LLM_SECURITY_TRIFECTA_MODE=block\|warn\|off` (default: warn) |
| `update-check.mjs` | UserPromptSubmit | — | Checks for newer versions (max 1x/24h, cached). Disable: `LLM_SECURITY_UPDATE_CHECK=off` |
| `pre-compact-scan.mjs` | PreCompact | — | Scan transcript for injection patterns + credentials before context compaction; prevents poisoned content from surviving in compact form. Reads at most last 512 KB for <500ms latency. Mode: `LLM_SECURITY_PRECOMPACT_MODE=block\|warn\|off` (default: warn). Cap: `LLM_SECURITY_PRECOMPACT_MAX_BYTES` |
| `pre-compact-scan.mjs` | PreCompact | — | Scan transcript for injection + credentials before context compaction. Reads at most last 512 KB. Mode: `LLM_SECURITY_PRECOMPACT_MODE=block\|warn\|off` (default: warn) |
> `pre-install-supply-chain.mjs` covers 7 package managers: npm/yarn/pnpm, pip/pip3/uv, brew, docker, go, cargo, gem. Per-ecosystem blocklists, age gate (<72h), npm audit (critical=block, high=warn), PyPI API inspection, Levenshtein typosquat detection, Docker image verification.
Scanner internals, CLI surface, CI/CD templates, knowledge files, and runnable examples: see `docs/scanner-reference.md`.
Defense philosophy (v5.0), Opus 4.7 alignment, known limitations: see `docs/defense-philosophy.md`.
## Remote Repo Support
`scan` and `plugin-audit` accept GitHub URLs directly. The command clones to a temp dir via `scanners/lib/git-clone.mjs`, scans locally, then cleans up. Use `--branch <name>` for non-default branches.
**Clone sandboxing (v5.1):** `git clone` executes code via `.gitattributes` filter/smudge drivers — this is a known attack vector. Two layers of defense:
**Clone sandboxing (v5.1):** Two layers of defense against `git clone` filter/smudge driver attacks:
1. **Git config flags (all platforms):** `core.hooksPath=/dev/null`, `core.symlinks=false`, `core.fsmonitor=false`, all LFS filter drivers disabled, `protocol.file.allow=never`, `transfer.fsckObjects=true`. Environment: `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`, `GIT_TERMINAL_PROMPT=0`.
2. **OS sandbox:** macOS `sandbox-exec` or Linux `bubblewrap` (bwrap) restricts file writes to only the specific temp directory. Even if a filter driver bypasses git config, it cannot write outside the clone dir. Fallback on Windows or when neither sandbox is available: git config flags only, WARN logged.
2. **OS sandbox:** macOS `sandbox-exec` or Linux `bubblewrap` (bwrap) restricts file writes to only the specific temp directory. Fallback on Windows: git config flags only.
Platform matrix: macOS (`sandbox-exec`) — always works. Linux (`bwrap`) — works on Fedora/Arch, may fail on Ubuntu 24.04+ without admin AppArmor config. Windows — no OS sandbox, git config flags only.
Platform matrix: macOS (`sandbox-exec`) — always works. Linux (`bwrap`) — Fedora/Arch fine, may fail on Ubuntu 24.04+ without admin AppArmor config. Windows — no OS sandbox.
Post-clone: size check (100MB max), cleanup guarantee (temp dir + evidence file always removed, even on error).
**Prompt injection defense:** Remote scans use `scanners/content-extractor.mjs` to pre-extract structured evidence and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos.
## Scanners
**Orchestrated (10):** Run via `node scanners/scan-orchestrator.mjs <target> [--fail-on <severity>] [--compact] [--output-file <path>] [--baseline] [--save-baseline]`.
`--fail-on <critical|high|medium|low>`: exit 1 if findings at/above severity, exit 0 otherwise. `--compact`: one-liner per finding format. Both configurable via `policy.json` `ci` section.
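The `--fail-on` rule above amounts to a severity threshold check. A minimal sketch, assuming each finding is an object with a lowercase-comparable `severity` field; the function name and finding shape are hypothetical, not the orchestrator's actual code.

```javascript
// Rank of the four severities accepted by --fail-on.
const SEVERITY_RANK = { low: 1, medium: 2, high: 3, critical: 4 };

// Returns 1 if any finding is at or above the threshold, otherwise 0.
function exitCodeForFindings(findings, failOn) {
  if (!failOn) return 0; // flag not given: never fail the run
  const threshold = SEVERITY_RANK[String(failOn).toLowerCase()];
  if (threshold === undefined) {
    throw new Error(`Unknown --fail-on severity: ${failOn}`);
  }
  const hit = findings.some((f) => {
    // Severities outside the table (for example "info") rank below "low".
    const rank = SEVERITY_RANK[String(f.severity).toLowerCase()] ?? 0;
    return rank >= threshold;
  });
  return hit ? 1 : 0;
}
```

For example, a run with one `medium` finding exits 0 under `--fail-on high` and 1 under `--fail-on medium`.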
With `--output-file`: full JSON to file, compact aggregate to stdout. `--baseline` diffs against stored baseline. `--save-baseline` saves results for future diffs. Baselines stored in `reports/baselines/<target-hash>.json`.
10 scanners: unicode, entropy, permission, dep-audit, taint, git-forensics, network, memory-poisoning, supply-chain-recheck, toxic-flow.
Lib: `mcp-description-cache.mjs` — caches MCP tool descriptions in `~/.cache/llm-security/mcp-descriptions.json`, detects per-update drift via Levenshtein (>10% = alert), 7-day TTL. v7.3.0 (E14) adds a sticky baseline slot per tool plus a 10-event rolling history; cumulative drift = `levenshtein(current, baseline) / max(|current|,|baseline|)`. When ratio ≥ `mcp.cumulative_drift_threshold` (default 0.25), emits `mcp-cumulative-drift` advisory through `post-mcp-verify.mjs`. Baseline survives TTL purge so slow-burn drift is preserved across the 7-day window. `clearBaseline(tool?)` exposed for the `/security mcp-baseline-reset` command. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for testing.
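The cumulative drift formula above can be worked through with a plain Levenshtein implementation. This is a sketch, not the code in `mcp-description-cache.mjs`: the function names are illustrative, and only the formula and the 0.25 default come from the description above.

```javascript
// Classic two-row Levenshtein edit distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// drift = levenshtein(current, baseline) / max(|current|, |baseline|)
function cumulativeDriftRatio(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// True when the ratio reaches the threshold (default 0.25).
function isCumulativeDrift(current, baseline, threshold = 0.25) {
  return cumulativeDriftRatio(current, baseline) >= threshold;
}
```

Because the comparison is always against the sticky baseline rather than the previous description, a series of small edits that each stay under the per-update threshold can still add up to a ratio at or above 0.25.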
Supply-chain-recheck (SCR) re-audits installed dependencies from lockfiles (package-lock.json, yarn.lock, requirements.txt, Pipfile.lock) against blocklists, OSV.dev batch API, and typosquat detection. Offline fallback available. Shared data module: `scanners/lib/supply-chain-data.mjs`.
Memory-poisoning (MEM) detects cognitive state poisoning in CLAUDE.md, memory files, and .claude/rules — injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads.
Toxic-flow (TFA) is a post-processing correlator that runs LAST — detects "lethal trifecta" (untrusted input + sensitive data access + exfiltration sink) by correlating output from prior scanners.
Utility: `node scanners/lib/fs-utils.mjs <backup|restore|cleanup|tmppath> [args]`.
Lib: `sarif-formatter.mjs` — converts scan output to OASIS SARIF 2.1.0 format. Used by `--format sarif` flag.
Lib: `audit-trail.mjs` — writes structured JSONL audit events (ISO 8601, OWASP tags, SIEM-ready). Env: `LLM_SECURITY_AUDIT_*`.
Lib: `policy-loader.mjs` — reads `.llm-security/policy.json` for distributable hook configuration. Includes `ci` section (`failOn`, `compact`) for CI/CD defaults. Defaults match hardcoded values.
**Standalone (8):** `posture-scanner.mjs` — deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms. NOT in scan-orchestrator (meta-level, not code-level).
Run: `node scanners/posture-scanner.mjs [path]` → JSON stdout. Scanner prefix: PST. Used by `/security posture` and `/security audit`.
`mcp-live-inspect.mjs` — NOT in scan-orchestrator. MCP servers are running processes, not files.
Run: `node scanners/mcp-live-inspect.mjs [target] [--timeout 10000] [--skip-global]`
Scanner prefix: MCI. OWASP: MCP03, MCP06, MCP09. Invoked by `mcp-inspect` and `mcp-audit --live`.
`watch-cron.mjs` — standalone cron wrapper. Reads `reports/watch/config.json`, scans all targets, writes `reports/watch/latest.json`. Run: `node scanners/watch-cron.mjs [--config <path>]`
`reference-config-generator.mjs` — generates Grade A reference config based on posture gaps. Detects project type (plugin/monorepo/standalone). Templates in `templates/reference-config/`. Run: `node scanners/reference-config-generator.mjs [path] [--apply]`
`dashboard-aggregator.mjs` — cross-project security dashboard. Discovers Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each, aggregates to machine-grade (weakest link). Cache in `~/.cache/llm-security/dashboard-latest.json` (24h staleness). Run: `node scanners/dashboard-aggregator.mjs [--no-cache] [--max-depth N]`
`attack-simulator.mjs` — red-team harness. Data-driven: 64 scenarios in 12 categories from `knowledge/attack-scenarios.json`. Payloads constructed at runtime (fragment assembly to avoid triggering hooks on source). Uses `runHook()` from test helper. Adaptive mode (`--adaptive`): 5 mutation rounds per passing scenario (homoglyph, encoding, zero-width, case alternation, synonym). Mutation rules in `knowledge/attack-mutations.json`. Benchmark mode (`--benchmark`): outputs structured pass/fail metrics. Run: `node scanners/attack-simulator.mjs [--category <name>] [--json] [--verbose] [--adaptive] [--benchmark]`
`ai-bom-generator.mjs` — AI Bill of Materials generator. Discovers AI components (models, MCP servers, plugins, knowledge, hooks) and outputs CycloneDX 1.6 JSON. Scanner prefix: BOM. Run: `node scanners/ai-bom-generator.mjs <target> [--output-file <path>]`
`ide-extension-scanner.mjs` — scans installed VS Code (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH) extensions and JetBrains IDE plugins (IntelliJ IDEA, PyCharm, GoLand, WebStorm, RubyMine, PhpStorm, CLion, DataGrip, RustRover, Rider, Aqua, Writerside, Android Studio). Fleet + Toolbox excluded. OS-aware discovery via `lib/ide-extension-discovery.mjs` (`~/.vscode/extensions/` + `~/Library/Application Support/JetBrains/<IDE><version>/plugins/` on macOS, `%APPDATA%\JetBrains\...` on Windows, `~/.config/JetBrains/...` on Linux). Parses VS Code `package.json` via `lib/ide-extension-parser.mjs` and JetBrains `META-INF/plugin.xml` + `META-INF/MANIFEST.MF` (with nested-jar extraction) via `lib/ide-extension-parser-jb.mjs`. 7 VS Code checks: blocklist match, theme-with-code, sideload (vsix), broad activation (`*` / `onStartupFinished`), typosquat (Levenshtein ≤2 vs top-100), extension-pack expansion, dangerous `vscode:uninstall` hooks. 7 JetBrains checks: theme-with-code, broad activation (`application-components`), `Premain-Class` instrumentation (HIGH — javaagent retransform), native binaries (`.so`/`.dylib`/`.dll`/`.jnilib`), long `<depends>` chains, typosquat vs top JetBrains plugins, shaded-jar advisory. Both branches orchestrate reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension with bounded concurrency (default 4). Scanner prefix: IDE. OWASP: LLM01, LLM02, LLM03, LLM06, ASI02, ASI04. Offline by default, `--online` opt-in for Marketplace/OSV.dev lookups. Knowledge: `knowledge/top-vscode-extensions.json`, `knowledge/top-jetbrains-plugins.json`, `knowledge/ide-extension-threat-patterns.md`, `knowledge/marketplace-api-notes.md`, `knowledge/jetbrains-marketplace-api-notes.md`.
**v6.4.0 — URL support.** Targets can be Marketplace, OpenVSX, or direct `.vsix` URLs. Pipeline: `lib/vsix-fetch.mjs` (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual redirect host whitelist) → `lib/zip-extract.mjs` (zero-dep ZIP parser, rejects zip-slip/symlink/absolute/drive-letter/encrypted/ZIP64, caps: 10 000 entries, 500MB uncomp, 100x ratio, depth 20) → existing scan pipeline against extracted `extension/` subdir → temp dir always cleaned in `try/finally`. Envelope.meta.source = `{ type: "url", kind, url, finalUrl, sha256, size, publisher?, name?, version? }`.
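The path and size rejections listed for `lib/zip-extract.mjs` can be sketched as two pure checks. The caps are taken from the note above. Everything else is an assumption of this sketch: the function names, the entry shape, the reason strings, reading 500MB as 500 × 1024 × 1024 bytes, and measuring the ratio on archive totals.

```javascript
// Caps quoted from the v6.4.0 note; how each is measured is assumed here.
const LIMITS = {
  maxEntries: 10000,
  maxUncompressedBytes: 500 * 1024 * 1024,
  maxRatio: 100,
  maxDepth: 20,
};

// Returns a rejection reason for one entry name, or null if it looks safe.
function checkEntryName(entryName, limits = LIMITS) {
  const name = entryName.replace(/\\/g, '/'); // treat backslashes as separators
  if (name.startsWith('/')) return 'absolute-path';
  if (/^[A-Za-z]:/.test(name)) return 'drive-letter';
  const parts = name.split('/').filter((p) => p !== '' && p !== '.');
  // Conservative: any ".." segment is rejected, even if it stays inside the root.
  if (parts.includes('..')) return 'zip-slip';
  if (parts.length > limits.maxDepth) return 'too-deep';
  return null;
}

// entries: [{ name, compressedSize, uncompressedSize }] (hypothetical shape)
function checkArchive(entries, limits = LIMITS) {
  if (entries.length > limits.maxEntries) return 'too-many-entries';
  let compressed = 0;
  let uncompressed = 0;
  for (const entry of entries) {
    const reason = checkEntryName(entry.name, limits);
    if (reason) return `${reason}: ${entry.name}`;
    compressed += entry.compressedSize;
    uncompressed += entry.uncompressedSize;
  }
  if (uncompressed > limits.maxUncompressedBytes) return 'too-large';
  if (compressed > 0 && uncompressed / compressed > limits.maxRatio) return 'ratio-exceeded';
  return null;
}
```

Running both checks on the central directory before writing any bytes keeps a hostile archive from touching disk at all. Symlink, encrypted, and ZIP64 rejection depend on header fields and are left out of this sketch.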
**v6.5.0 — OS sandbox.** Fetch + extract for URL targets now spawns `lib/vsix-fetch-worker.mjs` in a sub-process wrapped by `sandbox-exec` (macOS) or `bwrap` (Linux) — same primitives reused from `git-clone.mjs`. Helper: `lib/vsix-sandbox.mjs` exports `buildSandboxProfile`, `buildBwrapArgs`, `buildSandboxedWorker`, `runVsixWorker`. Worker IPC: argv `--url <url> --tmpdir <dir>` → single JSON line on stdout (`{ok, sha256, size, finalUrl, source, extRoot}` or `{ok:false, error, code?}`). Defense-in-depth — if the in-process ZIP parser ever has a bypass, the kernel still refuses writes outside `<tmpdir>`. `scan(target, { useSandbox })` defaults to `true`; tests pass `false` since `globalThis.fetch` mocks do not cross process boundaries. Windows fallback: in-process with `meta.warnings` advisory. Envelope `meta.source.sandbox`: `'sandbox-exec' | 'bwrap' | 'none' | 'in-process'`.
**v6.6.0 — JetBrains Marketplace URL fetch + JetBrains branch.** URL targets can also be `https://plugins.jetbrains.com/plugin/<numericId>-<slug>` (metadata-resolved → xmlId download) or `https://plugins.jetbrains.com/plugin/download?pluginId=<xmlId>&version=<v>` (direct). `lib/vsix-fetch.mjs` gains `detectUrlType` JetBrains kinds, `fetchJetBrainsPlugin`, host allowlist `plugins.jetbrains.com`. `buildSandboxedWorker(dirs, workerPath)` now accepts a custom worker path — `lib/jetbrains-fetch-worker.mjs` reuses the same IPC contract. Envelope `meta.source.kind` can be `'jetbrains-marketplace' | 'jetbrains-download'`. Installed-plugin scan runs JB-specific checks (see scanner bullet above) and shares the UNI/ENT/NET/TNT/MEM/SCR orchestration. `.kt`, `.groovy`, `.scala` added to `taint-tracer` code extensions.
Run: `node scanners/ide-extension-scanner.mjs [target|url] [--vscode-only] [--intellij-only] [--include-builtin] [--online] [--format json|compact] [--fail-on <sev>] [--output-file <path>]`. Invoked by `/security ide-scan`.
## Token Budget (ENFORCED)
All commands total ~600 lines. All commands use registered subagent types.
- Commands are short dispatchers (~30-60 lines) — no inline report templates or format specs
- All agents use registered `subagent_type` — agent instructions are system prompt, never file reads
- Max 1-2 knowledge files per agent invocation (threat-patterns + secrets-patterns)
- OWASP files are NEVER passed by commands — agents reference them from their own system prompt
- Agents run sequentially to avoid burst rate limits
- `pre-install-supply-chain.mjs` queries OSV.dev for CVEs on every package install
## CLI
`bin/llm-security.mjs` — standalone CLI entry point. Works without Claude Code via `npx llm-security` or `node bin/llm-security.mjs`.
Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatches to scanner scripts via `child_process.spawn`.
`package.json` `bin` field: `"llm-security": "./bin/llm-security.mjs"`. `files` whitelist: only `bin/` + `scanners/` published to npm.
## CI/CD Integration
Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.
## Knowledge Files (20)
| File | Content |
|------|---------|
| `skill-threat-patterns.md` | 7 threat categories for skill/command scanning |
| `mcp-threat-patterns.md` | 9 MCP threat categories (MCP01-MCP10) |
| `secrets-patterns.md` | Regex patterns for 10+ secret types |
| `owasp-llm-top10.md` | OWASP LLM Top 10 (2025) with Claude Code mappings |
| `owasp-agentic-top10.md` | OWASP Agentic AI Top 10 (ASI01-ASI10) |
| `owasp-skills-top10.md` | OWASP Skills Top 10 (AST01-AST10) — skill-specific threats |
| `mitigation-matrix.md` | Threat-to-control mappings |
| `top-packages.json` | Known package lists for supply chain checks |
| `skill-registry.json` | Seed data for skill signature registry |
| `prompt-injection-research-2025-2026.md` | 7 research papers (2025-2026) with implications for hook defenses |
| `deepmind-agent-traps.md` | DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix |
| `attack-scenarios.json` | 64 red-team scenarios across 12 categories for attack simulation |
| `attack-mutations.json` | Synonym tables and mutation rules for adaptive red-team testing |
| `compliance-mapping.md` | EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS mappings to plugin capabilities |
| `norwegian-context.md` | Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet |
| `ide-extension-threat-patterns.md` | 10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies |
| `top-vscode-extensions.json` | Top ~100 VS Code Marketplace extension IDs (typosquat seed) + blocklist entries |
| `top-jetbrains-plugins.json` | Top JetBrains plugin IDs (typosquat seed) + blocklist entries (v6.6.0) |
| `marketplace-api-notes.md` | VS Code Marketplace + OpenVSX API endpoints used by `lib/vsix-fetch.mjs` (v6.4.0) |
| `jetbrains-marketplace-api-notes.md` | JetBrains Marketplace API endpoints used by `fetchJetBrainsPlugin` (v6.6.0) |
## Reports
Scan reports are stored in `reports/` as `.docx` (for sharing) with `.md` source.
## Examples (runnable demonstrations)
Self-contained, deterministic threat-fixture folders under `examples/`.
Each folder has a `README.md`, a fixture/script/transcript, `run-*.{sh,mjs}`,
and `expected-findings.md`. These are demonstrations, not unit tests.
| Folder | Demonstrates | Hooks/scanners | Sentinel |
|-------|--------------|----------------|----------|
| `malicious-skill-demo/` | Skill scanner end-to-end (UNI/ENT/PRM/DEP/TNT/NET + 7 LLM categories) | `scan-orchestrator` + agents | BLOCK 100/100 |
| `prompt-injection-showcase/` | 61 payloads × 19 categories against `pre-prompt-inject-scan`, `post-mcp-verify`, `pre-bash-destructive` | runtime hooks | per-category expected outcome |
| `lethal-trifecta-walkthrough/` | Rule-of-Two advisory on leg 3 (WebFetch → Read .env → Bash curl POST) + suppression | `post-session-guard` | advisory at stage 3 |
| `mcp-rug-pull/` | Cumulative drift advisory (E14, v7.3.0): 7 stages below the per-update threshold, cumulatively more than 25% from baseline | `post-mcp-verify` + `mcp-description-cache.mjs` | advisory at stage 7 |
| `supply-chain-attack/` | PreToolUse block on a compromised package + scope-hop advisory + dep-auditor typosquats + postinstall curl-pipe | `pre-install-supply-chain` + `dep-auditor` + `supply-chain-data` | 6+ findings, 2 advisories, 1 BLOCK |
| `poisoned-claude-md/` | 6 detectors (injection / shell / URL / credential paths / permission expansion / encoded payloads) incl. the E15 agent-file surface | `memory-poisoning-scanner` | ≥18 findings across 2 files |
| `bash-evasion-gallery/` | T1-T9 disguised destructive commands → normalized + blocked (defense-in-depth on top of Claude Code 2.1.98+) | `pre-bash-destructive` + `bash-normalize` | 10 BLOCK exit codes |
| `toxic-agent-demo/` | Single-component lethal trifecta: an agent with [Bash, Read, WebFetch] and no hook guards = CRITICAL TFA finding | `toxic-flow-analyzer` (TFA) | 1 CRITICAL `Lethal trifecta:` |
| `pre-compact-poisoning/` | PreCompact hook catches injection + an AWS-shaped credential in a synthetic transcript across off/warn/block modes | `pre-compact-scan` | 9 pass: block exit 2 + reason; warn systemMessage; off skip; benign passes |
State isolation: all examples that mutate global state use the run-script
PID (post-session-guard via `${ppid}.jsonl`) or env overrides
(`LLM_SECURITY_MCP_CACHE_FILE` for the MCP cache). The user's real
`/tmp/llm-security-session-*.jsonl` and `~/.cache/llm-security/` are never touched.
## Distribution
This plugin lives in the `ktg-plugin-marketplace` monorepo at
`https://git.fromaitochitta.com/open/ktg-plugin-marketplace` under
`plugins/llm-security/`. It is not published as a standalone repo —
users install it via the Claude Code marketplace mechanism:
This plugin lives in the `ktg-plugin-marketplace` monorepo at `https://git.fromaitochitta.com/open/ktg-plugin-marketplace` under `plugins/llm-security/`. It is not published as a standalone repo — users install it via the Claude Code marketplace mechanism:
```bash
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
```
Issues, bug reports, and security disclosures all route to the
marketplace repo.
Issues, bug reports, and security disclosures all route to the marketplace repo.
## State
No persistent state except `post-session-guard.mjs` which maintains a per-session JSONL file in `/tmp/llm-security-session-${ppid}.jsonl` (auto-cleaned after 24h), `post-mcp-verify.mjs` which tracks per-MCP-tool volume in `/tmp/llm-security-mcp-volume-${ppid}.json`, `mcp-description-cache.mjs` which caches MCP tool descriptions in `~/.cache/llm-security/mcp-descriptions.json` (7-day TTL), `update-check.mjs` which caches version info in `~/.cache/llm-security/update-check.json` (24h TTL), `dashboard-aggregator.mjs` which caches dashboard results in `~/.cache/llm-security/dashboard-latest.json` (24h staleness), `reports/baselines/*.json` for scan diff baselines, `reports/watch/latest.json` for cron scan results (overwritten on each run), and `reports/skill-registry.json` for the skill signature registry (grows as skills are scanned). All scan outputs fresh per invocation.
## Defense Philosophy (v5.0)
Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
- **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
- **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
- **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
- **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect
**Bash evasion layers (T1-T6):** `bash-normalize.mjs` collapses six known obfuscation techniques before gate matching as a defense-in-depth layer. T1 empty quotes (`rm''-rf`), T2 `${}` parameter expansion, T3 backslash continuation, T4 tab/whitespace splitting, T5 `${IFS}` word-splitting, T6 ANSI-C hex quoting (`$'\x72\x6d'`). These layers complement — not replace — Claude Code 2.1.98+ harness-level protections. Full reference: `docs/security-hardening-guide.md`.
**Opus 4.7 system card alignment:**
- System card §5.2.1 (agentic safety evaluations) documents that multi-layer defenses outperform single-layer defenses against adaptive attacks. This plugin's posture (prompt-scan + pathguard + trifecta-guard + pre-compact-scan operating in depth) matches that guidance.
- System card §6.3.1.1 (instruction following and hierarchy) documents that Opus 4.7 interprets agent instructions more literally. Stacked imperatives (e.g., "MUST NOT do X") are therefore less useful than tool-level enforcement via `tools:` frontmatter. Agent files in this plugin have been updated accordingly.
- See `docs/security-hardening-guide.md` §5 for the full mapping.
**What v5.0 cannot do:**
- Prevent adaptive attacks from motivated human red-teamers (100% ASR per joint paper)
- Fix CLAUDE.md loading before hooks (platform limitation)
- Detect novel NL indirection without ML
- Prevent long-horizon attacks without detectable patterns
- Provide formal worst-case guarantees
Per-session JSONL in `/tmp/llm-security-session-${ppid}.jsonl` (auto-cleaned 24h). MCP description cache in `~/.cache/llm-security/mcp-descriptions.json` (7-day TTL). Update-check + dashboard caches in `~/.cache/llm-security/` (24h). Scan baselines under `reports/baselines/*.json`. Watch results in `reports/watch/latest.json`. Skill registry in `reports/skill-registry.json` (grows). All scan outputs fresh per invocation.
## Security Boundaries
@ -392,20 +94,3 @@ Prompt injection is **structurally unsolvable** with current architectures (join
- Agents operate read-only unless the specific command explicitly grants Write/Edit (`clean` and `harden` do)
- Irreversible operations (baseline overwrites, file edits) require user confirmation via AskUserQuestion
- Do not access paths outside the project root without explicit user instruction
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)


@ -0,0 +1,27 @@
# LLM Security — Defense philosophy (v5.0)
Imported from `CLAUDE.md` via `@docs/defense-philosophy.md`.
Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
- **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
- **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
- **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
- **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect
**Bash evasion layers (T1-T6):** `bash-normalize.mjs` collapses six known obfuscation techniques before gate matching as a defense-in-depth layer. T1 empty quotes (`rm''-rf`), T2 `${}` parameter expansion, T3 backslash continuation, T4 tab/whitespace splitting, T5 `${IFS}` word-splitting, T6 ANSI-C hex quoting (`$'\x72\x6d'`). These layers complement — not replace — Claude Code 2.1.98+ harness-level protections. Full reference: `docs/security-hardening-guide.md`.
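To make the normalization idea concrete, here is a minimal sketch of how a few of the listed techniques could be collapsed before gate matching. It is not the code in `bash-normalize.mjs`: the function name `normalizeBash`, the regexes, and the order of the passes are assumptions, and T2 (`${}` parameter expansion) is left out because the source does not say how it is handled.

```javascript
// Hypothetical sketch, not the plugin's bash-normalize.mjs.
// The output is meant for pattern matching only; it is not a safe command to run.
function normalizeBash(command) {
  return command
    // T3: join lines split by a backslash-newline continuation
    .replace(/\\\r?\n/g, "")
    // T6: decode ANSI-C hex quoting, e.g. $'\x72\x6d' becomes rm
    .replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_match, body) =>
      body.replace(/\\x([0-9a-fA-F]{2})/g, (_m, hex) =>
        String.fromCharCode(parseInt(hex, 16))))
    // T5: ${IFS} or $IFS used as a word separator
    .replace(/\$\{IFS\}|\$IFS\b/g, " ")
    // T1: empty quote pairs inserted to split a keyword
    .replace(/''|""/g, "")
    // T4: tabs and repeated whitespace
    .replace(/\s+/g, " ")
    .trim();
}

console.log(normalizeBash("r''m\t-rf${IFS}/tmp/demo"));
```

A gate regex would then run against the normalized string instead of the raw command, so each disguise costs the attacker one more layer to defeat.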
**Opus 4.7 system card alignment:**
- System card §5.2.1 (agentic safety evaluations) documents that multi-layer defenses outperform single-layer defenses against adaptive attacks. This plugin's posture (prompt-scan + pathguard + trifecta-guard + pre-compact-scan operating in depth) matches that guidance.
- System card §6.3.1.1 (instruction following and hierarchy) documents that Opus 4.7 interprets agent instructions more literally. Stacked imperatives (e.g., "MUST NOT do X") are therefore less useful than tool-level enforcement via `tools:` frontmatter. Agent files in this plugin have been updated accordingly.
- See `docs/security-hardening-guide.md` §5 for the full mapping.
**What v5.0 cannot do:**
- Prevent adaptive attacks from motivated human red-teamers (100% ASR per joint paper)
- Fix CLAUDE.md loading before hooks (platform limitation)
- Detect novel NL indirection without ML
- Prevent long-horizon attacks without detectable patterns
- Provide formal worst-case guarantees


@ -0,0 +1,122 @@
# LLM Security — Scanner reference
Detailed scanner, CLI, CI/CD, knowledge-file and example documentation. Imported from `CLAUDE.md` via `@docs/scanner-reference.md`.
## Scanners
**Orchestrated (10):** Run via `node scanners/scan-orchestrator.mjs <target> [--fail-on <severity>] [--compact] [--output-file <path>] [--baseline] [--save-baseline]`.
`--fail-on <critical|high|medium|low>`: exit 1 if findings at/above severity, exit 0 otherwise. `--compact`: one-liner per finding format. Both configurable via `policy.json` `ci` section.
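As a rough illustration of the `--fail-on` contract, the sketch below maps a list of findings to an exit code. The `severity` field name and the `exitCodeFor` helper are assumptions made for this example, not the orchestrator's actual code.

```javascript
// Hypothetical sketch of the --fail-on exit-code rule.
const SEVERITY_ORDER = ["low", "medium", "high", "critical"];

function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(failOn);
  if (threshold === -1) throw new Error(`unknown severity: ${failOn}`);
  // Severities outside the list (for example "info") get index -1 and never trip the gate.
  const hit = findings.some(
    (finding) => SEVERITY_ORDER.indexOf(finding.severity) >= threshold
  );
  return hit ? 1 : 0;
}

console.log(exitCodeFor([{ severity: "medium" }, { severity: "high" }], "high"));
```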
With `--output-file`: full JSON to file, compact aggregate to stdout. `--baseline` diffs against stored baseline. `--save-baseline` saves results for future diffs. Baselines stored in `reports/baselines/<target-hash>.json`.
10 scanners: unicode, entropy, permission, dep-audit, taint, git-forensics, network, memory-poisoning, supply-chain-recheck, toxic-flow.
Lib: `mcp-description-cache.mjs` — caches MCP tool descriptions in `~/.cache/llm-security/mcp-descriptions.json`, detects per-update drift via Levenshtein (>10% = alert), 7-day TTL. v7.3.0 (E14) adds a sticky baseline slot per tool plus a 10-event rolling history; cumulative drift = `levenshtein(current, baseline) / max(|current|,|baseline|)`. When ratio ≥ `mcp.cumulative_drift_threshold` (default 0.25), emits `mcp-cumulative-drift` advisory through `post-mcp-verify.mjs`. Baseline survives TTL purge so slow-burn drift is preserved across the 7-day window. `clearBaseline(tool?)` exposed for the `/security mcp-baseline-reset` command. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for testing.
Supply-chain-recheck (SCR) re-audits installed dependencies from lockfiles (package-lock.json, yarn.lock, requirements.txt, Pipfile.lock) against blocklists, OSV.dev batch API, and typosquat detection. Offline fallback available. Shared data module: `scanners/lib/supply-chain-data.mjs`.
Memory-poisoning (MEM) detects cognitive state poisoning in CLAUDE.md, memory files, and .claude/rules — injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads.
Toxic-flow (TFA) is a post-processing correlator that runs LAST — detects "lethal trifecta" (untrusted input + sensitive data access + exfiltration sink) by correlating output from prior scanners.
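The correlation step can be pictured as checking whether all three legs are present in the findings produced by earlier scanners. The sketch below assumes each finding carries a hypothetical `leg` tag and `id`; the real TFA schema is not shown in this document.

```javascript
// Hypothetical sketch of a lethal-trifecta correlator; the `leg` tag is an
// assumption for illustration, not the TFA scanner's real schema.
const TRIFECTA_LEGS = ["untrusted-input", "sensitive-access", "exfil-sink"];

function correlateTrifecta(findings) {
  const evidence = new Map();
  for (const finding of findings) {
    if (!TRIFECTA_LEGS.includes(finding.leg)) continue;
    if (!evidence.has(finding.leg)) evidence.set(finding.leg, []);
    evidence.get(finding.leg).push(finding.id);
  }
  // Only report when every leg has at least one supporting finding.
  const complete = TRIFECTA_LEGS.every((leg) => evidence.has(leg));
  if (!complete) return null;
  return {
    severity: "critical",
    title: "Lethal trifecta",
    evidence: Object.fromEntries(evidence),
  };
}

console.log(correlateTrifecta([
  { id: "NET-1", leg: "untrusted-input" },
  { id: "PRM-4", leg: "sensitive-access" },
  { id: "TNT-2", leg: "exfil-sink" },
]));
```

Running last matters in this design: the correlator has nothing to combine until the other scanners have emitted their findings.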
Utility: `node scanners/lib/fs-utils.mjs <backup|restore|cleanup|tmppath> [args]`.
Lib: `sarif-formatter.mjs` — converts scan output to OASIS SARIF 2.1.0 format. Used by `--format sarif` flag.
Lib: `audit-trail.mjs` — writes structured JSONL audit events (ISO 8601, OWASP tags, SIEM-ready). Env: `LLM_SECURITY_AUDIT_*`.
Lib: `policy-loader.mjs` — reads `.llm-security/policy.json` for distributable hook configuration. Includes `ci` section (`failOn`, `compact`) for CI/CD defaults. Defaults match hardcoded values.
**Standalone (8):** `posture-scanner.mjs` — deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms. NOT in scan-orchestrator (meta-level, not code-level).
Run: `node scanners/posture-scanner.mjs [path]` → JSON stdout. Scanner prefix: PST. Used by `/security posture` and `/security audit`.
`mcp-live-inspect.mjs` — NOT in scan-orchestrator. MCP servers are running processes, not files.
Run: `node scanners/mcp-live-inspect.mjs [target] [--timeout 10000] [--skip-global]`
Scanner prefix: MCI. OWASP: MCP03, MCP06, MCP09. Invoked by `mcp-inspect` and `mcp-audit --live`.
`watch-cron.mjs` — standalone cron wrapper. Reads `reports/watch/config.json`, scans all targets, writes `reports/watch/latest.json`. Run: `node scanners/watch-cron.mjs [--config <path>]`
`reference-config-generator.mjs` — generates Grade A reference config based on posture gaps. Detects project type (plugin/monorepo/standalone). Templates in `templates/reference-config/`. Run: `node scanners/reference-config-generator.mjs [path] [--apply]`
`dashboard-aggregator.mjs` — cross-project security dashboard. Discovers Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each, aggregates to machine-grade (weakest link). Cache in `~/.cache/llm-security/dashboard-latest.json` (24h staleness). Run: `node scanners/dashboard-aggregator.mjs [--no-cache] [--max-depth N]`
`attack-simulator.mjs` — red-team harness. Data-driven: 64 scenarios in 12 categories from `knowledge/attack-scenarios.json`. Payloads constructed at runtime (fragment assembly to avoid triggering hooks on source). Uses `runHook()` from test helper. Adaptive mode (`--adaptive`): 5 mutation rounds per passing scenario (homoglyph, encoding, zero-width, case alternation, synonym). Mutation rules in `knowledge/attack-mutations.json`. Benchmark mode (`--benchmark`): outputs structured pass/fail metrics. Run: `node scanners/attack-simulator.mjs [--category <name>] [--json] [--verbose] [--adaptive] [--benchmark]`
`ai-bom-generator.mjs` — AI Bill of Materials generator. Discovers AI components (models, MCP servers, plugins, knowledge, hooks) and outputs CycloneDX 1.6 JSON. Scanner prefix: BOM. Run: `node scanners/ai-bom-generator.mjs <target> [--output-file <path>]`
`ide-extension-scanner.mjs` — scans installed VS Code (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH) extensions and JetBrains IDE plugins (IntelliJ IDEA, PyCharm, GoLand, WebStorm, RubyMine, PhpStorm, CLion, DataGrip, RustRover, Rider, Aqua, Writerside, Android Studio). Fleet + Toolbox excluded. OS-aware discovery via `lib/ide-extension-discovery.mjs` (`~/.vscode/extensions/` + `~/Library/Application Support/JetBrains/<IDE><version>/plugins/` on macOS, `%APPDATA%\JetBrains\...` on Windows, `~/.config/JetBrains/...` on Linux). Parses VS Code `package.json` via `lib/ide-extension-parser.mjs` and JetBrains `META-INF/plugin.xml` + `META-INF/MANIFEST.MF` (with nested-jar extraction) via `lib/ide-extension-parser-jb.mjs`. 7 VS Code checks: blocklist match, theme-with-code, sideload (vsix), broad activation (`*` / `onStartupFinished`), typosquat (Levenshtein ≤2 vs top-100), extension-pack expansion, dangerous `vscode:uninstall` hooks. 7 JetBrains checks: theme-with-code, broad activation (`application-components`), `Premain-Class` instrumentation (HIGH — javaagent retransform), native binaries (`.so`/`.dylib`/`.dll`/`.jnilib`), long `<depends>` chains, typosquat vs top JetBrains plugins, shaded-jar advisory. Both branches orchestrate reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension with bounded concurrency (default 4). Scanner prefix: IDE. OWASP: LLM01, LLM02, LLM03, LLM06, ASI02, ASI04. Offline by default, `--online` opt-in for Marketplace/OSV.dev lookups. Knowledge: `knowledge/top-vscode-extensions.json`, `knowledge/top-jetbrains-plugins.json`, `knowledge/ide-extension-threat-patterns.md`, `knowledge/marketplace-api-notes.md`, `knowledge/jetbrains-marketplace-api-notes.md`.
**v6.4.0 — URL support.** Targets can be Marketplace, OpenVSX, or direct `.vsix` URLs. Pipeline: `lib/vsix-fetch.mjs` (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual redirect host whitelist) → `lib/zip-extract.mjs` (zero-dep ZIP parser, rejects zip-slip/symlink/absolute/drive-letter/encrypted/ZIP64, caps: 10 000 entries, 500MB uncomp, 100x ratio, depth 20) → existing scan pipeline against extracted `extension/` subdir → temp dir always cleaned in `try/finally`. Envelope.meta.source = `{ type: "url", kind, url, finalUrl, sha256, size, publisher?, name?, version? }`.
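The rejection rules and caps listed for `lib/zip-extract.mjs` can be sketched as a pre-extraction check over the archive's entry list. Everything below is an assumption for illustration: the entry shape (`name`, `compressedSize`, `uncompressedSize`), the reading of "depth 20" as path-segment depth, the ratio being computed over archive totals, and MB meaning 1024 × 1024 bytes.

```javascript
// Hypothetical sketch of archive pre-checks; not the code in lib/zip-extract.mjs.
const LIMITS = {
  maxEntries: 10000,
  maxUncompressed: 500 * 1024 * 1024, // assumes MB = 1024 * 1024 bytes
  maxRatio: 100,
  maxDepth: 20,
};

function entryNameProblem(name) {
  if (name.startsWith("/") || name.startsWith("\\")) return "absolute path";
  if (/^[A-Za-z]:/.test(name)) return "drive letter";
  const parts = name.split(/[\\/]/).filter((part) => part.length > 0);
  if (parts.includes("..")) return "zip-slip traversal";
  if (parts.length > LIMITS.maxDepth) return "path too deep";
  return null;
}

function archiveProblem(entries) {
  if (entries.length > LIMITS.maxEntries) return "too many entries";
  let compressed = 0;
  let uncompressed = 0;
  for (const entry of entries) {
    const nameIssue = entryNameProblem(entry.name);
    if (nameIssue) return `${nameIssue}: ${entry.name}`;
    compressed += entry.compressedSize;
    uncompressed += entry.uncompressedSize;
  }
  if (uncompressed > LIMITS.maxUncompressed) return "uncompressed size over cap";
  if (compressed > 0 && uncompressed / compressed > LIMITS.maxRatio) {
    return "compression ratio over cap";
  }
  return null; // nothing objectionable found
}

console.log(archiveProblem([
  { name: "extension/../../etc/passwd", compressedSize: 10, uncompressedSize: 20 },
]));
```

Checking the declared sizes before writing anything keeps a decompression bomb from consuming disk in the first place; a real extractor would also need to enforce the caps while streaming, since declared sizes can lie.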
**v6.5.0 — OS sandbox.** Fetch + extract for URL targets now spawns `lib/vsix-fetch-worker.mjs` in a sub-process wrapped by `sandbox-exec` (macOS) or `bwrap` (Linux) — same primitives reused from `git-clone.mjs`. Helper: `lib/vsix-sandbox.mjs` exports `buildSandboxProfile`, `buildBwrapArgs`, `buildSandboxedWorker`, `runVsixWorker`. Worker IPC: argv `--url <url> --tmpdir <dir>` → single JSON line on stdout (`{ok, sha256, size, finalUrl, source, extRoot}` or `{ok:false, error, code?}`). Defense-in-depth — if the in-process ZIP parser ever has a bypass, the kernel still refuses writes outside `<tmpdir>`. `scan(target, { useSandbox })` defaults to `true`; tests pass `false` since `globalThis.fetch` mocks do not cross process boundaries. Windows fallback: in-process with `meta.warnings` advisory. Envelope `meta.source.sandbox`: `'sandbox-exec' | 'bwrap' | 'none' | 'in-process'`.
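On the parent side, the single-line JSON contract could be consumed roughly as follows. This is a sketch under assumptions: `parseWorkerOutput` is a hypothetical name, and the validation is deliberately limited to the fields the text above lists.

```javascript
// Hypothetical sketch of parsing the worker's one-line JSON reply.
function parseWorkerOutput(stdout) {
  const line = stdout.split("\n").find((candidate) => candidate.trim().length > 0);
  if (!line) return { ok: false, error: "worker produced no output" };

  let message;
  try {
    message = JSON.parse(line);
  } catch {
    return { ok: false, error: "worker output was not valid JSON" };
  }

  const isObject = message !== null && typeof message === "object";
  // Success shape: {ok, sha256, size, finalUrl, source, extRoot}
  if (isObject && message.ok === true &&
      typeof message.sha256 === "string" && typeof message.extRoot === "string") {
    return message;
  }
  // Failure shape: {ok:false, error, code?}
  if (isObject && message.ok === false && typeof message.error === "string") {
    return message;
  }
  return { ok: false, error: "worker output did not match the expected shape" };
}

console.log(parseWorkerOutput('{"ok":false,"error":"size cap exceeded","code":"E_SIZE"}\n'));
```

Treating anything unexpected as a failure keeps a compromised or crashed worker from being mistaken for a successful fetch.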
**v6.6.0 — JetBrains Marketplace URL fetch + JetBrains branch.** URL targets can also be `https://plugins.jetbrains.com/plugin/<numericId>-<slug>` (metadata-resolved → xmlId download) or `https://plugins.jetbrains.com/plugin/download?pluginId=<xmlId>&version=<v>` (direct). `lib/vsix-fetch.mjs` gains `detectUrlType` JetBrains kinds, `fetchJetBrainsPlugin`, host allowlist `plugins.jetbrains.com`. `buildSandboxedWorker(dirs, workerPath)` now accepts a custom worker path — `lib/jetbrains-fetch-worker.mjs` reuses the same IPC contract. Envelope `meta.source.kind` can be `'jetbrains-marketplace' | 'jetbrains-download'`. Installed-plugin scan runs JB-specific checks (see scanner bullet above) and shares the UNI/ENT/NET/TNT/MEM/SCR orchestration. `.kt`, `.groovy`, `.scala` added to `taint-tracer` code extensions.
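The two JetBrains URL shapes can be told apart with the standard `URL` parser. The sketch below is illustrative only: `classifyJetBrainsUrl` is a hypothetical stand-in rather than the real `detectUrlType`, and the assignment of `jetbrains-marketplace` to the plugin-page form and `jetbrains-download` to the direct form is an assumption based on the names.

```javascript
// Hypothetical sketch; not the detectUrlType implementation in lib/vsix-fetch.mjs.
function classifyJetBrainsUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a parseable URL
  }
  // HTTPS-only, and only the allowlisted host.
  if (url.protocol !== "https:" || url.hostname !== "plugins.jetbrains.com") return null;

  if (url.pathname === "/plugin/download") {
    const xmlId = url.searchParams.get("pluginId");
    if (!xmlId) return null;
    return { kind: "jetbrains-download", xmlId, version: url.searchParams.get("version") };
  }

  const match = /^\/plugin\/(\d+)-([^/]+)\/?$/.exec(url.pathname);
  if (match) return { kind: "jetbrains-marketplace", numericId: match[1], slug: match[2] };
  return null;
}

// The URLs below are made-up examples.
console.log(classifyJetBrainsUrl("https://plugins.jetbrains.com/plugin/1234-example-plugin"));
console.log(classifyJetBrainsUrl(
  "https://plugins.jetbrains.com/plugin/download?pluginId=com.example.demo&version=1.2.3"
));
```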
Run: `node scanners/ide-extension-scanner.mjs [target|url] [--vscode-only] [--intellij-only] [--include-builtin] [--online] [--format json|compact] [--fail-on <sev>] [--output-file <path>]`. Invoked by `/security ide-scan`.
## Token Budget (ENFORCED)
All commands total ~600 lines. All commands use registered subagent types.
- Commands are short dispatchers (~30-60 lines) — no inline report templates or format specs
- All agents use registered `subagent_type` — agent instructions are system prompt, never file reads
- Max 1-2 knowledge files per agent invocation (threat-patterns + secrets-patterns)
- OWASP files are NEVER passed by commands — agents reference them from their own system prompt
- Agents run sequentially to avoid burst rate limits
- `pre-install-supply-chain.mjs` queries OSV.dev for CVEs on every package install
## CLI
`bin/llm-security.mjs` — standalone CLI entry point. Works without Claude Code via `npx llm-security` or `node bin/llm-security.mjs`.
Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatches to scanner scripts via `child_process.spawn`.
`package.json` `bin` field: `"llm-security": "./bin/llm-security.mjs"`. `files` whitelist: only `bin/` + `scanners/` published to npm.
## CI/CD Integration
Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.
## Knowledge Files (20)
| File | Content |
|------|---------|
| `skill-threat-patterns.md` | 7 threat categories for skill/command scanning |
| `mcp-threat-patterns.md` | 9 MCP threat categories (MCP01-MCP10) |
| `secrets-patterns.md` | Regex patterns for 10+ secret types |
| `owasp-llm-top10.md` | OWASP LLM Top 10 (2025) with Claude Code mappings |
| `owasp-agentic-top10.md` | OWASP Agentic AI Top 10 (ASI01-ASI10) |
| `owasp-skills-top10.md` | OWASP Skills Top 10 (AST01-AST10) — skill-specific threats |
| `mitigation-matrix.md` | Threat-to-control mappings |
| `top-packages.json` | Known package lists for supply chain checks |
| `skill-registry.json` | Seed data for skill signature registry |
| `prompt-injection-research-2025-2026.md` | 7 research papers (2025-2026) with implications for hook defenses |
| `deepmind-agent-traps.md` | DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix |
| `attack-scenarios.json` | 64 red-team scenarios across 12 categories for attack simulation |
| `attack-mutations.json` | Synonym tables and mutation rules for adaptive red-team testing |
| `compliance-mapping.md` | EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS mappings to plugin capabilities |
| `norwegian-context.md` | Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet |
| `ide-extension-threat-patterns.md` | 10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies |
| `top-vscode-extensions.json` | Top ~100 VS Code Marketplace extension IDs (typosquat seed) + blocklist entries |
| `top-jetbrains-plugins.json` | Top JetBrains plugin IDs (typosquat seed) + blocklist entries (v6.6.0) |
| `marketplace-api-notes.md` | VS Code Marketplace + OpenVSX API endpoints used by `lib/vsix-fetch.mjs` (v6.4.0) |
| `jetbrains-marketplace-api-notes.md` | JetBrains Marketplace API endpoints used by `fetchJetBrainsPlugin` (v6.6.0) |
## Reports
Scan reports are stored in `reports/` as `.docx` (for sharing) with `.md` source.
## Examples (runnable demonstrations)
Self-contained, deterministic threat-fixture folders under `examples/`. Each folder has a `README.md`, a fixture/script/transcript, `run-*.{sh,mjs}`, and `expected-findings.md`. These are demonstrations, not unit tests.
| Folder | Demonstrates | Hooks/scanners | Sentinel |
|--------|--------------|----------------|----------|
| `malicious-skill-demo/` | Skill scanner end-to-end (UNI/ENT/PRM/DEP/TNT/NET + 7 LLM categories) | `scan-orchestrator` + agents | BLOCK 100/100 |
| `prompt-injection-showcase/` | 61 payloads × 19 categories against `pre-prompt-inject-scan`, `post-mcp-verify`, `pre-bash-destructive` | runtime hooks | per-category expected outcome |
| `lethal-trifecta-walkthrough/` | Rule-of-Two advisory on leg 3 (WebFetch → Read .env → Bash curl POST) + suppression | `post-session-guard` | advisory at stage 3 |
| `mcp-rug-pull/` | Cumulative drift advisory (E14, v7.3.0): 7 stages, each under the per-update threshold, cumulatively over the 25% baseline threshold | `post-mcp-verify` + `mcp-description-cache.mjs` | advisory at stage 7 |
| `supply-chain-attack/` | PreToolUse block on a compromised package + scope-hop advisory + dep-auditor typosquats + postinstall curl-pipe | `pre-install-supply-chain` + `dep-auditor` + `supply-chain-data` | 6+ findings, 2 advisories, 1 BLOCK |
| `poisoned-claude-md/` | 6 detectors (injection / shell / URL / credential paths / permission expansion / encoded payloads) incl. the E15 agent-file surface | `memory-poisoning-scanner` | ≥18 findings across 2 files |
| `bash-evasion-gallery/` | T1-T9 disguised destructive commands → normalized + blocked (defense-in-depth on top of Claude Code 2.1.98+) | `pre-bash-destructive` + `bash-normalize` | 10 BLOCK exit codes |
| `toxic-agent-demo/` | Single-component lethal trifecta: an agent with [Bash, Read, WebFetch] and no hook guards = CRITICAL TFA finding | `toxic-flow-analyzer` (TFA) | 1 CRITICAL `Lethal trifecta:` |
| `pre-compact-poisoning/` | PreCompact hook catches injection + an AWS-shaped credential in a synthetic transcript across off/warn/block modes | `pre-compact-scan` | 9 pass: block exit 2 + reason; warn systemMessage; off skip; benign passes |
State isolation: every example that mutates global state uses the run-script PID (post-session-guard via `${ppid}.jsonl`) or env overrides (`LLM_SECURITY_MCP_CACHE_FILE` for the MCP cache). The user's real `/tmp/llm-security-session-*.jsonl` and `~/.cache/llm-security/` are never touched.


@ -0,0 +1,47 @@
# LLM Security — Version history
Per-release notes for v7.0.0 onward. Imported from `CLAUDE.md` via `@docs/version-history.md`.
## v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING)
Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`): severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70-95, high only → 40-65, medium only → 15-35, low only → 1-11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15). `info` findings are observability-only: counted in OWASP aggregates but contributing zero to risk_score, verdict, and riskBand (B3, v7.2.0; was undocumented pre-7.2.0). See `severity.mjs` JSDoc for full contract.
2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).
See `docs/security-hardening-guide.md` §6 for the calibration story.
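The shape of a severity-dominated, log-scaled score can be sketched as below. Only the tier bands and verdict cutoffs come from the notes above; the exact scaling curve, the `SATURATION` constant, the `counts` input shape, and the `ALLOW` label for scores under 15 are assumptions for illustration. The authoritative contract is the JSDoc in `severity.mjs`.

```javascript
// Hypothetical sketch of a severity-dominated score; the real formula lives in
// scanners/lib/severity.mjs and may scale differently within each tier.
const TIERS = [
  { severity: "critical", min: 70, max: 95 },
  { severity: "high", min: 40, max: 65 },
  { severity: "medium", min: 15, max: 35 },
  { severity: "low", min: 1, max: 11 },
];
const SATURATION = 50; // assumed finding count at which a tier reaches its max

function riskScore(counts) {
  // The highest severity present picks the tier; lower tiers cannot raise the score.
  for (const tier of TIERS) {
    const n = counts[tier.severity] ?? 0;
    if (n > 0) {
      const fraction = Math.min(1, Math.log2(n) / Math.log2(SATURATION));
      return Math.round(tier.min + (tier.max - tier.min) * fraction);
    }
  }
  return 0; // info-only or empty scans contribute nothing
}

function verdict(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "ALLOW"; // assumed label for scores below the WARNING cutoff
}

const score = riskScore({ high: 3, medium: 40, info: 12 });
console.log(score, verdict(score));
```

The point of the tiering is that forty medium findings can never outrank a single high one, which is what stops noisy codebases from collapsing to 100.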
## v7.1.1 — Scan-rapport narrative coherence (patch)
Three coordinated edits address the whiplash symptom that survived v7.0.0 (numbers fixed, narrative still walked findings back as "false positive" in prose):
(a) `agents/skill-scanner-agent.md` Step 2.5 mandates context-first severity assignment — every signal has exactly one disposition (suppressed OR reported), no per-finding walk-back; (b) `templates/unified-report.md` gains a `### Narrative Audit` block in Executive Summary surfacing `summary.narrative_audit.suppressed_findings.{count, by_category}` from the agent's trailing JSON; (c) both files updated from stale v1 risk-formula constants to the v2 model that has been authoritative in `severity.mjs` since v7.0.0. Counter is distinct from the existing top-level `output.suppressed` (`.llm-security-ignore` rule integer). Out-of-scope but flagged: `commands/scan.md:113-114` retains the v1 formula; resolution deferred to Batch B.
## v7.3.0 — MCP cumulative-drift baseline (Wave C of Batch C)
Closes E14 from `docs/critical-review-2026-04-20.md`. The `mcp-description-cache.mjs` schema gains a sticky `baseline` slot per tool plus a 10-event rolling `history` array (FIFO). Cumulative drift = `levenshtein(current, baseline) / max(|current|, |baseline|)`; when the ratio crosses `mcp.cumulative_drift_threshold` (default 0.25), `post-mcp-verify.mjs` emits a separate MEDIUM `mcp-cumulative-drift` advisory. The existing per-update >10% drift signal is unchanged — both fire independently. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught. Baseline survives the 7-day TTL purge so detection persists across the full window. New `/security mcp-baseline-reset` slash command (plus `scanners/mcp-baseline-reset.mjs` CLI: `--list`, `--target <tool>`, or no-args clear-all) lets the user acknowledge a legitimate MCP server upgrade — clearing the baseline causes the next call to seed a fresh one from the incoming description; description, firstSeen, lastSeen, and history are preserved for audit. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for end-to-end testing without polluting the user's real `~/.cache/llm-security/mcp-descriptions.json`.
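A small sketch shows why the two drift signals are independent. The Levenshtein routine is the textbook dynamic-programming version; `checkDrift`, its input shape, and the `mcp-description-drift` label for the per-update signal are hypothetical names chosen for this example, while the thresholds (>10% per update, ≥0.25 cumulative) come from the text above.

```javascript
// Hypothetical sketch; not the code in mcp-description-cache.mjs.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

function driftRatio(current, reference) {
  const longest = Math.max(current.length, reference.length);
  return longest === 0 ? 0 : levenshtein(current, reference) / longest;
}

function checkDrift({ previous, baseline, current }, cumulativeThreshold = 0.25) {
  const advisories = [];
  // Per-update signal: compare against the last seen description.
  if (driftRatio(current, previous) > 0.10) advisories.push("mcp-description-drift");
  // Cumulative signal: compare against the sticky baseline.
  if (driftRatio(current, baseline) >= cumulativeThreshold) advisories.push("mcp-cumulative-drift");
  return advisories;
}

// A slow-burn change: each step edits 5% of the text, but the total reaches 25%.
const baseline = "a".repeat(20);
const previous = "a".repeat(16) + "b".repeat(4);
const current = "a".repeat(15) + "b".repeat(5);
console.log(checkDrift({ previous, baseline, current }));
```

In this example only the cumulative advisory fires, which is the rug-pull pattern the sticky baseline is meant to surface.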
## v7.3.0 — Env-var deprecation warnings (D3 of Batch C, Wave D)
Closes 8.7 from `.claude/projects/2026-04-29-batch-c-scope-finalize/plan.md`. `scanners/lib/policy-loader.mjs` exports a new helper `getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue)`. Env still wins per Preferences (existing behaviour), but when both the env-var AND the `policy.json` key are explicitly set, the helper emits a single per-process stderr line: `[llm-security] Deprecation: env-var ${ENVVAR} will be removed in v8.0.0; policy.json key ${section}.${key} also set — env wins for now. Suppress with LLM_SECURITY_DEPRECATION_QUIET=1.` Module-scoped `Set` dedupes per env-var name across call-sites. Four overlapping vars are wired through the helper: `LLM_SECURITY_INJECTION_MODE` ↔ `injection.mode` (in `pre-prompt-inject-scan.mjs`), `LLM_SECURITY_TRIFECTA_MODE` ↔ `trifecta.mode` and `LLM_SECURITY_ESCALATION_WINDOW` ↔ `trifecta.escalation_window` (in `post-session-guard.mjs`), `LLM_SECURITY_AUDIT_LOG` ↔ `audit.log_path` (in `scanners/lib/audit-trail.mjs`). `DEFAULT_POLICY` gains `trifecta.escalation_window: 5` to close the gap noted in the plan revisions table (M10). Env-only vars without policy.json equivalents (`LLM_SECURITY_UPDATE_CHECK`, `LLM_SECURITY_PRECOMPACT_MODE`, `LLM_SECURITY_PRECOMPACT_MAX_BYTES`, `LLM_SECURITY_IDE_ROOTS`, `LLM_SECURITY_MCP_CACHE_FILE`) are unchanged; they emit no deprecation signal because there is nothing to deprecate yet.
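The precedence and dedupe behaviour described above might look roughly like this. The helper signature matches the one named in the release note, but the surrounding `makePolicyReader` factory, the injected `env` and `warn` arguments, and the shortened warning text are assumptions added so the sketch can run without touching `process.env` or stderr.

```javascript
// Hypothetical sketch; the real helper lives in scanners/lib/policy-loader.mjs.
function makePolicyReader(policy, env, warn) {
  const warned = new Set(); // one warning per env-var name per process

  return function getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue) {
    const envValue = env[envVarName];
    const policyValue = policy?.[section]?.[key];
    const envSet = envValue !== undefined && envValue !== "";
    const policySet = policyValue !== undefined;

    const quiet = env.LLM_SECURITY_DEPRECATION_QUIET === "1";
    if (envSet && policySet && !quiet && !warned.has(envVarName)) {
      warned.add(envVarName);
      // Paraphrased message; see the release note for the exact wording.
      warn(`[llm-security] Deprecation: env-var ${envVarName} will be removed in v8.0.0; ` +
           `policy.json key ${section}.${key} also set; env wins for now.`);
    }

    if (envSet) return envValue; // env still wins
    if (policySet) return policyValue;
    return defaultValue;
  };
}

const lines = [];
const read = makePolicyReader(
  { trifecta: { mode: "block" } },
  { LLM_SECURITY_TRIFECTA_MODE: "warn" },
  (line) => lines.push(line)
);
console.log(read("trifecta", "mode", "LLM_SECURITY_TRIFECTA_MODE", "warn"), lines.length);
```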
## v7.5.0 — Playground (additive surface, no scanner/hook behavior changes)
Single-file SPA at `playground/llm-security-playground.html` (~10 200 lines) for onboarding, demo, and workshop use without a Claude Code installation. Parser + renderer for all 18 `produces_report=true` commands in `CATALOG`. State lives primarily in IndexedDB (`llm-security-playground-v1`) with a localStorage fallback, a cycle-free Proxy + EventTarget store, and microtask-batched rendering. Theme bootstrap with FOUC prevention. 4 surfaces: onboarding (5 groups) → home (3 tracks) → catalog (20 commands) ⇄ project (reports / overview / context / export). The demo state has three projects inline; `dft-komplett-demo` has all 18 reports pre-parsed for click-through. Vendor-synced design system under `playground/vendor/playground-design-system/` (checksum-locked via `MANIFEST.json`, never edited directly). Test fixtures under `playground/test-fixtures/` (one markdown file per command) are the contract anchor for parser development. Screenshots in `playground/screenshots/v7.5.0/`. Window globals exposed for testing/automation: `__store`, `__navigate`, `__loadDemoState`, `__scheduleRender`, `__PARSERS`, `__RENDERERS`, `__CATALOG`, `__inferVerdict`, `__inferKeyStats`, `__renderPageShell`, `__handlePasteImport`. Includes a fix to the `normalizeVerdictText` regex order: GO-WITH-CONDITIONS is checked before GO so that a conditional verdict does not collapse to ALLOW.
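The regex-order fix is easy to reproduce in isolation: because a hyphen counts as a word boundary, a pattern for `GO` also matches inside `GO-WITH-CONDITIONS`, so the more specific rule has to be tried first. The sketch below is not the playground's `normalizeVerdictText`; the rule table and the `WARNING` target for the conditional verdict are assumptions.

```javascript
// Hypothetical sketch of order-sensitive verdict matching.
const VERDICT_RULES = [
  [/\bGO-WITH-CONDITIONS\b/i, "WARNING"], // assumed mapping; must come before plain GO
  [/\bGO\b/i, "ALLOW"],
];

function normalizeVerdict(text) {
  for (const [pattern, verdict] of VERDICT_RULES) {
    if (pattern.test(text)) return verdict;
  }
  return null;
}

// The plain GO pattern also matches the conditional text, which is why order matters.
console.log(/\bGO\b/i.test("Verdict: GO-WITH-CONDITIONS"));
console.log(normalizeVerdict("Verdict: GO-WITH-CONDITIONS"));
```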
## v7.6.0 - Playground Tier 3 reference case (additive surface, no scanner/hook behavior changes)
The playground is now a visually and structurally complete reference implementation for the `shared/playground-design-system/` Tier 3 supplement. 8 new Tier 3 components are integrated into the 18 report renderers: `tfa-flow` + `tfa-leg` + `tfa-arrow` (lethal-trifecta chain with `<button>` elements + ARIA group/aria-label) in `renderScan` + `renderDeepScan`; `mat-ladder` + `mat-step` (5-step maturity ladder with thresholds at 0/25/50/75/95% PASS) in `renderPosture`; `suppressed-group` (narrative audit from `summary.narrative_audit.suppressed_findings`) in `renderScan` + `renderDeepScan`; `codepoint-reveal` + `cp-tag`/`cp-zw`/`cp-bidi` (side-by-side Unicode steganography reveal, detecting U+200B-D|FEFF|2060|180E → `cp-zw` and U+202A-E|2066-9 → `cp-bidi`) in `renderMcpInspect`; `top-risks` + `top-risk[data-severity]` (ranked top-findings listing, semantic `<ol>`, excludes info findings) in `renderScan`/`renderDeepScan`/`renderPluginAudit`/`renderPosture`/`renderAudit`; extended `recommendation-card[data-severity]` (severity-tinted advisory) on all inline uses + new per-bucket advisory cards in `renderClean` + intro snapshot + diff rows in `renderHarden` (action mapping CREATE→positive / APPEND→medium / MERGE→low / SKIP→low); `risk-meter` (band visualization 0-100 with Low/Medium/High/Critical/Extreme bands) on 5 archetypes (scan, deep-scan, plugin-audit, audit, red-team); `card--severity-{level}` modifier on `findings__item` cards. Wave 1 (Session 2) added `badge--scope-security` (identity chip), `verdict-pill-lg` with `__verdict`+`__sub` (replaces the custom verdict pill on all 18 report types), and DS Tier 3 `form-progress` + `fp-step` in the onboarding wizard. Wave 0 (Session 1) deleted ~30 duplicate CSS declarations from the `<style>` block (DS wins the cascade) and harmonized the page shell on all 4 surfaces. 5 new DS helpers: `renderToxicFlow`, `renderMatLadder`, `renderSuppressedGroup`, `renderCodepointReveal`, `renderTopRisks`. 2 new normalization helpers: `mapSeverityToCardLevel(input)` (maps severity + action types to DS conventions) and `parseNarrativeAudit(md)`. 12 screenshots planned in `playground/screenshots/v7.6.0/`. A11Y report updated (`playground/A11Y-RAPPORT.md`): WCAG 2.1 AA confirmed, severity-soft color pairs verified, semantic elements (`<ol>`, `<button>`, `<section>`) replace generic `<div>`. Total file change over 5 sessions: 10209 → 10677 lines. Known limitation: `parsed.findings` is empty for the `deep-scan`/`audit` demo fixtures (a parser limitation, not fixed in v7.6.0; tracked for a v7.6.x patch).
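The codepoint classes named for `codepoint-reveal` can be illustrated with a small classifier. The `cp-zw` and `cp-bidi` ranges follow the release note; treating `cp-tag` as the Unicode Tags block (U+E0000 to U+E007F) is an assumption, as are the function names and the shape of the returned hits.

```javascript
// Hypothetical sketch; not the playground's renderCodepointReveal.
function classifyCodepoint(cp) {
  if ((cp >= 0x200b && cp <= 0x200d) || cp === 0xfeff || cp === 0x2060 || cp === 0x180e) {
    return "cp-zw"; // zero-width characters
  }
  if ((cp >= 0x202a && cp <= 0x202e) || (cp >= 0x2066 && cp <= 0x2069)) {
    return "cp-bidi"; // bidirectional controls
  }
  if (cp >= 0xe0000 && cp <= 0xe007f) {
    return "cp-tag"; // assumption: the Unicode Tags block
  }
  return null;
}

function revealHidden(text) {
  const hits = [];
  let index = 0; // UTF-16 offset of the current character
  for (const ch of text) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    const cls = classifyCodepoint(cp);
    if (cls) {
      hits.push({ index, codepoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"), cls });
    }
    index += ch.length;
  }
  return hits;
}

console.log(revealHidden("a\u200Bb\u202Ec"));
```

A renderer could then place the raw text next to a version where each hit is replaced by a visible chip carrying its class.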
## v7.6.1: Playground visual patch (no scanner/hook behavior changes)
Six bugs caught by the maintainer during manual browser verification after the v7.6.0 release. All were caused by a mismatch between DS classes and how the playground renderers used them (or by missing DS implementations of classes the playground renderers assumed existed).
(1) `renderFindingsBlock` used the `.findings` outer class, which the DS defines as a 2-column grid (`grid-template-columns: 360px 1fr`) for a list + detail-panel layout. The playground used it without a detail panel, so the header landed in the left 360px column and the items in the 1fr column. Replaced with `<section class="report-meta">` + `<h4>` + the correct `findings__list > findings__group > findings__group-header + findings__items` pattern.
(2) `.report-table` was missing entirely from the DS but is used in 7+ renderers (OWASP categories, Supply chain, Scanner Risk Matrix, Plugin meta, Permission matrix, Live meter, Latest runs, Approvals, Mitigation roadmap). Added a local CSS implementation in the playground HTML `<style>` block (border-collapse, zebra hover, header styling).
(3) The `renderPreDeploy` traffic lights used `.sm-card__grade`, which is a fixed 28×28 px (designed for a single A-F letter). It clipped "PASS" to "AS" and "PASS-WITH-NOTES" to "PASS-WITH-..." in all traffic-light cards. Replaced with a width-adaptive status pill via inline styling (severity-soft + on tokens).
(4) Threat-model matrix bubbles were not clickable: they were `<span>` elements without an event handler. Replaced with `<button type="button" data-threat-id>` + `aria-label`. The click handler scrolls to the corresponding row in the Threats (Trusler) table and highlights it for 1.6 s.
(5) Radar labels overlapped at 6+ axes because all used `text-anchor="middle"` with the same offset. Increased the SVG size from 280×280 to 380×380 and the radius from 105 to 125, and switched `text-anchor` from `middle` to `start`/`end` based on horizontal position (`Math.cos(ang)` > 0.2 / < -0.2 / in between).
(6) `recommendation-card__body` text overflowed on long single-line texts (terms, owner tags, dates). Added `overflow-wrap: anywhere; word-break: break-word` in the local `<style>` block.
4/4 fix-specific smoke tests pass, and 18/18 renderers still produce complete HTML against `dft-komplett-demo` (regression test). File change 10677 → 10753 lines (+76 net).
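
The anchor rule from fix (5) can be sketched as follows. Only the 0.2 threshold, the 380×380 size, and the radius of 125 come from the notes above; the function names, the `labelOffset` parameter, and the choice to place the first axis at the top are assumptions made for illustration.

```javascript
// Picks the SVG text-anchor for a radar label from its horizontal position.
function radarLabelAnchor(ang) {
  const c = Math.cos(ang);
  if (c > 0.2) return 'start';  // right of center: text grows rightward
  if (c < -0.2) return 'end';   // left of center: text grows leftward
  return 'middle';              // near the vertical axis: keep centered
}

// Hypothetical layout helper: one label position per axis, first axis at the top.
function radarLabelPositions(axisCount, center = 190, radius = 125, labelOffset = 16) {
  const labels = [];
  for (let i = 0; i < axisCount; i++) {
    const ang = -Math.PI / 2 + (2 * Math.PI * i) / axisCount;
    labels.push({
      x: center + (radius + labelOffset) * Math.cos(ang),
      y: center + (radius + labelOffset) * Math.sin(ang),
      anchor: radarLabelAnchor(ang),
    });
  }
  return labels;
}
```

With the anchor following the side of the chart, labels on the left and right grow away from the polygon instead of toward their neighbors, which is what removes the overlap at six or more axes.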


@ -96,178 +96,6 @@ Agenter leser navngitte kjernefiler, ikke hele kataloger:
See `references/architecture/recommended-mcp-servers.md` for details.
## Development
### Adding a new knowledge base
1. Create a `.md` file in the right subfolder under the relevant skill's `references/` folder (e.g. `skills/ms-ai-engineering/references/`)
2. Follow the format of existing files (header, date, sections, "For Cosmo" section)
3. Update the relevant SKILL.md with a reference
### Adding a new command
1. Create `commands/navn.md` (where `navn` is the command name) with frontmatter (`description`, `argument-hint`)
2. Follow the pattern of existing commands
3. Update `commands/help.md` with the new command
4. Update this CLAUDE.md
### Adding a new agent
1. Create `agents/navn-agent.md` (where `navn` is the agent name) with frontmatter (`name`, `description`, `model`, `color`, `tools`)
2. Include a clear "triggers on" in the description
3. Update this CLAUDE.md
### Testing
#### Static validation
```bash
# Run plugin validation (frontmatter, encoding, KB references)
bash tests/validate-plugin.sh
```
#### KB freshness (sitemap-based, manual operation)
**The apply phase runs via the slash command** (requires an active Claude Code session, which keeps us within Anthropic Consumer Terms § 3):
```text
/architect:kb-update # default: critical + high
/architect:kb-update --priorities critical # critical only
/architect:kb-update --skip-discover # skip new-URL discovery
/architect:kb-update --dry-run # report without apply
```
**The change-report phase can run as a plain Node script (no LLM cost):**
```bash
# Poll sitemaps → change report (no file changes)
node scripts/kb-update/run-weekly-update.mjs --force
# With discovery of new relevant pages
node scripts/kb-update/run-weekly-update.mjs --force --discover
# Show the report again after polling
node scripts/kb-update/report-changes.mjs
# Build/update the URL registry from reference files
node scripts/kb-update/build-registry.mjs [--merge]
```
The system compares the Microsoft Learn sitemap `<lastmod>` with each file's `Last updated:` header and generates a prioritized change report (critical/high/medium/low).
**Match rate:** ~69% of 1342 URLs match against the sitemaps. ~31% (mostly `azure/ai-foundry/openai/` paths) are not in the sitemaps because of Microsoft's URL restructuring.
**Scheduling:** The plugin schedules nothing. A user who wants periodic notification can set up their own cron / launchd / systemd / GitHub Actions job that runs `node scripts/kb-update/run-weekly-update.mjs --force --discover` (the report phase, not apply). The apply phase is deliberately manual: it requires LLM reasoning over the diff and runs from an open Claude Code session.
Legacy (deprecated):
```bash
bash scripts/kb-staleness-check.sh # mtime-based, unreliable after git clone
```
#### E2E regression tests
```bash
# Run all E2E suites
bash tests/run-e2e.sh
# Run individual suites
bash tests/run-e2e.sh --security
bash tests/run-e2e.sh --cost
bash tests/run-e2e.sh --summary
bash tests/run-e2e.sh --ai-act
```
Fixture-based validation of agent output (security, cost, summary). Tests structure, encoding, and domain-specific requirements without invoking Claude.
#### Manual test
```bash
# Test that the plugin registers
cd <plugin-root>
claude --plugin ./plugins/ms-ai-architect
# Run the main command
/architect
# Show all commands
/architect:help
```
## Playground (v3 / v1.15.0)
Interactive decision builder + report viewer for Microsoft AI decisions. Replaces the v2 5-step pipeline with a multi-surface app that persists state and visualizes imported reports inline. Spec: the v3 architecture is documented under `.claude/projects/2026-05-03-playground-v3-architecture/`. The v1.10.0 extensions are documented under `.claude/projects/2026-05-03-ms-ai-architect-v1-10-playground/`. v1.11.0 delivers 100% design-system adoption. v1.13.0/.1 patched 10+ symptomatic visual bugs. v1.14.0 delivers a root-cause refactor over 6 sessions (DS-convention adoption in 14 renderers, local CSS halved).
**v1.15.0 (session 5 of ~8 in the v2 project):** The project surface switched from v2 `renderProjectSurface` (screen tabs + category tabs + per-command paste cards) to v3 `renderProjectView` (sidebar with 17 artifacts + main area + import-modal overlay). `renderActive()` routes the `project` surface to `renderProjectSurfaceV3()`, which wraps renderProjectView + topbar + app shell. The v2 surface is removed entirely: `renderProjectSurface` (152 lines), `renderCommandSubCard` (87 lines), `rehydratePasteImports` (15 lines), `currentProjectScreen`, `ACTIONS['project-screen']`, 5 v2 CSS classes. Zombie handlers kept for test back-compat: `currentProjectTab`, `ACTIONS['project-tab']`, `ACTIONS['parse']`, `handlePasteImport`, `window.__handlePasteImport`. 2 fingerprint gaps closed: requirements.headers + license.headers. `migrateDataVersion` extended with `parserFor`, so demo state (only `raw_markdown`) is auto-parsed into `project.artifacts[cid]`. Ship-QA bug fixes: `components-tier4-project-view.css` added to the `<link>` chain (it was missing, so the modal overlay and two-column layout did not work); `renderImportModal` sets `data-open="true"` (DS contract).
- **File:** `playground/ms-ai-architect-playground.html` (~3870+ lines, single-file v3 architecture)
- **4 surfaces:** Onboarding (18 shared fields: 4 structured / 14 free-text after v1.10.0) → Home (project list + 3 entry tracks) → Catalog (25 commands grouped into 5 expansion groups with search) → **Project v3** (sidebar with 17 artifacts grouped into 4 categories + search + main area with a per-artifact view or an overview with top-risks/next-actions + import modal as a DS overlay)
- **Persistence:** IndexedDB primary with a localStorage fallback. Schema-versioned (`STATE_KEY = 'ms-ai-architect-state-v1'`) with an eager `MIGRATIONS` pipeline. v1.10.0 introduces the `dataVersion v1→v2` migration (idempotent), which backfills `verdict`+`keyStats`.
- **17 report renderers (shared skeleton):** All wrap their output via `renderPageShell()` with eyebrow + h1 + optional verdict pill + optional key-stats grid + archetype-specific body. Parser → structure → HTML is routed via a canonical archetype-routing table.
- **Foundation helpers:** `renderPageShell`, `renderVerdictPill`, `renderKeyStatsGrid`, `inferVerdict`, `inferKeyStats`, `KEY_STATS_CONFIG`.
- **Tier 3 adoption:** kanban (conformity, review), mat-ladder (migrate, poc), screen-tabs (utredning, project surface), scenario-card-grid (license, compare), residual-pair (dpia, ros), top-risks (ros), recommendation-card (security, ros), suppressed-panel (review), critique-card (adr), read-more (utredning, summary), traffic-light (poc).
- **Theme:** Dark by default + light theme toggle with Aksel tokens in both modes (added in v1.10.0). Persisted in `localStorage('ms-ai-architect-theme')`. A theme-bootstrap script in `<head>` avoids FOUC.
- **Export/import:** JSON Decision Record envelope (Blob + FileReader), schema-version-aware on import.
### Validation (v1.15.0 numbers)
| Test | Command | Coverage |
|------|----------|---------|
| Static structure | `bash tests/test-playground-v3.sh` | 219 PASS, 2 WARN (pre-existing): vendored CSS, surfaces, 25 commands, 14 parsers, 17 renderers via PROJECT_VIEW_CONFIG.renderers routing, action handlers |
| Parser fixtures | `bash tests/test-playground-parsers.sh` | 70 PASS: 17 fixtures × parser routing |
| Migration | `bash tests/test-playground-migrations.sh` | 16 PASS: v1→v2 + v2→v3 idempotent migration |
| Fingerprints | `bash tests/test-playground-fingerprints.sh` | 32 PASS: 17-fixture true positives + 4 anti-matches + API sanity |
| Project view | `bash tests/test-playground-projectview.sh` | 30 PASS: 4 view states + nav search + null guard |
| ACTIONS | `bash tests/test-playground-actions.sh` | 19 PASS: 6 pure-state handlers + projectViewUiState |
| Combined (E2E) | `bash tests/run-e2e.sh --playground` | 386 PASS, 0 FAIL, 2 WARN |
| Plugin validation | `bash tests/validate-plugin.sh` | 219 PASS |
| Manual A11Y QA | See `playground/MANUAL-CHECKLIST.md` | 10 sections incl. an axe-core run per surface |
| A11Y report | `playground/A11Y-RAPPORT.md` | Static assessment ready; browser axe run pending |
### Demo system (v1.11.0 → v1.15.0)
`scripts/build-demo-state.mjs` reads all 17 fixture files from `playground/test-fixtures/` and injects them as a `<script type="application/json" id="demo-state-v1">` block in the playground HTML (idempotent: it replaces an existing block). The "Last inn demo-data" ("Load demo data") button on the onboarding surface calls `ACTIONS['load-demo']`, which reads the block, replaces all state branches via Proxy mutation, runs `migrateDataVersion` (v2→v3 auto-parses raw_markdown into artifacts), and navigates to the project surface. The demo shows 17 artifacts grouped in the sidebar with severity badges, an aggregate verdict (BLOKKERT), a top-risks list, and working re-import/delete buttons per artifact.
`tests/screenshot/` contains a standalone Playwright runner with its own `package.json` (gitignored `node_modules`). `node run.mjs` produces 24 PNGs (12 surfaces × 2 themes) under `playground/screenshots/v1.15.0/`. v1.15.0 surfaces: onboarding-empty, project-overview, project-artifact-{classify,security,ros,cost,summary}, project-import-modal (viewport-only, since the modal is a position:fixed overlay), project-search, home, catalog, onboarding-prefilled. v1.10.0/v1.11.0/v1.14.0 are kept as historical reference. These are committed so that forkers can see the plugin without installing anything. The demo org is "Acme Kommune" and the demo project is "Acme: Kunde-chatbot".
### Design-system 100%-adoption (v1.11.0 → v1.14.0)
Sessions 3-5 added inline CSS in `playground/ms-ai-architect-playground.html`. v1.11.0 hoisted all generic components to `shared/playground-design-system/components-tier3-supplement.css` (DS v0.3.0):
- `.pyramide-desc` / `.pyramide-desc__item`
- `.scenario-card-grid` / `.scenario-card`
- `.residual-pair` / `__cell` / `__cell-label/__cell-value/__cell-meta` / `__arrow`
- `.read-more` / `.read-more__trigger` / `.read-more__chev` / `.read-more__body`
- `.top-risks` / `.top-risk[data-severity]`
- `.recommendation-card`
- `.suppressed-panel`
- `.screen-tabs` / `.screen-tab` / `.screen[data-active]`
v1.14.0 (DS v0.4.0): root-cause fix for three DS bugs that had previously been patched symptomatically in local CSS: `.kanban-card__name` (break-word + overflow-wrap; was break-all), `.expansion__title-main/sub` (display: block), `.matrix__bubble` (cursor + hover/focus). The fix was re-synced to the vendored DS and the corresponding local overrides were deleted. In addition, 14 renderers were refactored to the DS convention (3 risk renderers → DS summary grid + ros layout, 6 compliance/govern renderers → DS convention, renderMigrate + renderPoc → expansion list per phase). Local `<style>` block: 191 → 122 effective lines (~36% reduction since v1.13.1).
All PARALLEL-CSS name groups migrated to the DS convention. `renderPageShell` + `renderKeyStatsGrid` refactored to DS markup. Severity-coded card borders on report cards, app-header restructuring, `.stack-lg` body spacing on home/project/catalog, AI Act pyramid width fix, eyebrow label on home-projects.
For further hoisting: re-sync via `node scripts/sync-design-system.mjs ms-ai-architect`. These are changes to a shared asset and require drift-detection handling per `MANIFEST.json`.
### Vendored design-system
The playground loads CSS from `playground/vendor/playground-design-system/`, a vendored
copy of the marketplace root's `shared/playground-design-system/`. This keeps the plugin
**standalone**: the HTML file can be opened from `file://` independently of the marketplace root.
- **Sync script:** `node scripts/sync-design-system.mjs ms-ai-architect` (at the marketplace root)
- **Drift detection:** `MANIFEST.json` stores a SHA-256 per file. Re-sync fails if a
vendored file has been changed locally; `--force` overrides.
- **Loaded in HTML:** `<link>` tags for `fonts.css`, `tokens.css`, `base.css`,
`components.css`, `components-tier2.css`, `components-tier3.css`,
`components-tier3-supplement.css` (in that order).
- **Never edit** files under `vendor/playground-design-system/` directly;
changes go in `shared/`, then re-sync.
> The v2 spec under `docs/playground-v2-spec.md` is kept as a historical
> reference but is NOT the current contract. The v3 architecture is
> documented in `.claude/projects/2026-05-03-playground-v3-architecture/`.
## Related plugins (future)
- `ms-rag-architect`: RAG specialist (separate plugin)
- `ms-power-automate-architect`: Power Automate deep dive
- `ms-azure-ai-architect`: Azure AI Services deep dive
- `ms-foundry-architect`: Azure AI Foundry specialist
- `ms-copilot-studio-architect`: Copilot Studio specialist
## Hooks (2)
| Event | Script | Purpose |
@ -277,6 +105,12 @@ kopi av marketplace-rotens `shared/playground-design-system/`. Dette holder plug
> Secrets scanning consolidated to llm-security plugin.
## Reference docs (read on demand)
- **Development, testing, KB refresh workflow:** `docs/development.md`
- **Playground v3 (decision builder + report viewer):** `docs/playground.md`
- **Recommended MCP servers (detail):** `references/architecture/recommended-mcp-servers.md`
## Key deadlines (EU AI Act)
| Deadline | Requirement | Status |
@ -288,20 +122,10 @@ kopi av marketplace-rotens `shared/playground-design-system/`. Dette holder plug
**Supervisory authorities:** Datatilsynet (data protection), national AI supervisory authority (being established), sector regulators.
## Related plugins (future)
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
- `ms-rag-architect`: RAG specialist (separate plugin)
- `ms-power-automate-architect`: Power Automate deep dive
- `ms-azure-ai-architect`: Azure AI Services deep dive
- `ms-foundry-architect`: Azure AI Foundry specialist
- `ms-copilot-studio-architect`: Copilot Studio specialist


@ -0,0 +1,92 @@
# ms-ai-architect — Development
Plugin development, testing, KB-refresh. Imported from `CLAUDE.md` via pointer.
## Adding a new knowledge base
1. Create a `.md` file in the right subfolder under the relevant skill's `references/` folder (e.g. `skills/ms-ai-engineering/references/`)
2. Follow the format of existing files (header, date, sections, "For Cosmo" section)
3. Update the relevant SKILL.md with a reference
## Adding a new command
1. Create `commands/navn.md` (where `navn` is the command name) with frontmatter (`description`, `argument-hint`)
2. Follow the pattern of existing commands
3. Update `commands/help.md` with the new command
4. Update `CLAUDE.md`
## Adding a new agent
1. Create `agents/navn-agent.md` (where `navn` is the agent name) with frontmatter (`name`, `description`, `model`, `color`, `tools`)
2. Include a clear "triggers on" in the description
3. Update `CLAUDE.md`
## Testing
### Static validation
```bash
# Run plugin validation (frontmatter, encoding, KB references)
bash tests/validate-plugin.sh
```
### KB freshness (sitemap-based, manual operation)
**The apply phase runs via the slash command** (requires an active Claude Code session, which keeps us within Anthropic Consumer Terms § 3):
```text
/architect:kb-update # default: critical + high
/architect:kb-update --priorities critical # critical only
/architect:kb-update --skip-discover # skip new-URL discovery
/architect:kb-update --dry-run # report without apply
```
**The change-report phase can run as a plain Node script (no LLM cost):**
```bash
# Poll sitemaps → change report (no file changes)
node scripts/kb-update/run-weekly-update.mjs --force
# With discovery of new relevant pages
node scripts/kb-update/run-weekly-update.mjs --force --discover
# Show the report again after polling
node scripts/kb-update/report-changes.mjs
# Build/update the URL registry from reference files
node scripts/kb-update/build-registry.mjs [--merge]
```
The system compares the Microsoft Learn sitemap `<lastmod>` with each file's `Last updated:` header and generates a prioritized change report (critical/high/medium/low).
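
The comparison described above can be sketched as a pure function. This is an illustration only, not the code in `scripts/kb-update/`: it assumes the header looks like `Last updated: YYYY-MM-DD` and that `<lastmod>` is an ISO 8601 string, and the function names are hypothetical. How a stale file is assigned a priority is not covered.

```javascript
// Extracts the date from a "Last updated: YYYY-MM-DD" header (assumed format).
function parseLastUpdated(fileText) {
  const match = /Last updated:\s*(\d{4}-\d{2}-\d{2})/.exec(fileText);
  return match ? match[1] : null;
}

// True when the sitemap reports a newer date than the reference file's header.
function isStale(sitemapLastmod, fileText) {
  const fileDate = parseLastUpdated(fileText);
  if (fileDate === null) return true; // assumption: a missing header needs review
  // ISO dates sort lexicographically, so comparing the date part as strings is enough.
  return sitemapLastmod.slice(0, 10) > fileDate;
}
```

Comparing only the date part avoids time zone surprises when the sitemap carries a full timestamp and the file header carries a bare date.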
**Match rate:** ~69% of 1342 URLs match against the sitemaps. ~31% (mostly `azure/ai-foundry/openai/` paths) are not in the sitemaps because of Microsoft's URL restructuring.
**Scheduling:** The plugin schedules nothing. A user who wants periodic notification can set up their own cron / launchd / systemd / GitHub Actions job that runs `node scripts/kb-update/run-weekly-update.mjs --force --discover` (the report phase, not apply). The apply phase is deliberately manual: it requires LLM reasoning over the diff and runs from an open Claude Code session.
Legacy (deprecated):
```bash
bash scripts/kb-staleness-check.sh # mtime-based, unreliable after git clone
```
### E2E regression tests
```bash
# Run all E2E suites
bash tests/run-e2e.sh
# Run individual suites
bash tests/run-e2e.sh --security
bash tests/run-e2e.sh --cost
bash tests/run-e2e.sh --summary
bash tests/run-e2e.sh --ai-act
```
Fixture-based validation of agent output (security, cost, summary). Tests structure, encoding, and domain-specific requirements without invoking Claude.
### Manual test
```bash
# Test that the plugin registers
cd <plugin-root>
claude --plugin ./plugins/ms-ai-architect
# Run the main command
/architect
# Show all commands
/architect:help
```


@ -0,0 +1,66 @@
# ms-ai-architect — Playground (v3 / v1.15.0)
Interactive decision builder + report viewer for Microsoft AI decisions. Imported from `CLAUDE.md` via pointer.
Replaces the v2 5-step pipeline with a multi-surface app that persists state and visualizes imported reports inline. Spec: the v3 architecture is documented under `.claude/projects/2026-05-03-playground-v3-architecture/`. The v1.10.0 extensions are documented under `.claude/projects/2026-05-03-ms-ai-architect-v1-10-playground/`. v1.11.0 delivers 100% design-system adoption. v1.13.0/.1 patched 10+ symptomatic visual bugs. v1.14.0 delivers a root-cause refactor over 6 sessions (DS-convention adoption in 14 renderers, local CSS halved).
**v1.15.0 (session 5 of ~8 in the v2 project):** The project surface switched from v2 `renderProjectSurface` (screen tabs + category tabs + per-command paste cards) to v3 `renderProjectView` (sidebar with 17 artifacts + main area + import-modal overlay). `renderActive()` routes the `project` surface to `renderProjectSurfaceV3()`, which wraps renderProjectView + topbar + app shell. The v2 surface is removed entirely: `renderProjectSurface` (152 lines), `renderCommandSubCard` (87 lines), `rehydratePasteImports` (15 lines), `currentProjectScreen`, `ACTIONS['project-screen']`, 5 v2 CSS classes. Zombie handlers kept for test back-compat: `currentProjectTab`, `ACTIONS['project-tab']`, `ACTIONS['parse']`, `handlePasteImport`, `window.__handlePasteImport`. 2 fingerprint gaps closed: requirements.headers + license.headers. `migrateDataVersion` extended with `parserFor`, so demo state (only `raw_markdown`) is auto-parsed into `project.artifacts[cid]`. Ship-QA bug fixes: `components-tier4-project-view.css` added to the `<link>` chain (it was missing, so the modal overlay and two-column layout did not work); `renderImportModal` sets `data-open="true"` (DS contract).
- **File:** `playground/ms-ai-architect-playground.html` (~3870+ lines, single-file v3 architecture)
- **4 surfaces:** Onboarding (18 shared fields: 4 structured / 14 free-text after v1.10.0) → Home (project list + 3 entry tracks) → Catalog (25 commands grouped into 5 expansion groups with search) → **Project v3** (sidebar with 17 artifacts grouped into 4 categories + search + main area with a per-artifact view or an overview with top-risks/next-actions + import modal as a DS overlay)
- **Persistence:** IndexedDB primary with a localStorage fallback. Schema-versioned (`STATE_KEY = 'ms-ai-architect-state-v1'`) with an eager `MIGRATIONS` pipeline. v1.10.0 introduces the `dataVersion v1→v2` migration (idempotent), which backfills `verdict`+`keyStats`.
- **17 report renderers (shared skeleton):** All wrap their output via `renderPageShell()` with eyebrow + h1 + optional verdict pill + optional key-stats grid + archetype-specific body. Parser → structure → HTML is routed via a canonical archetype-routing table.
- **Foundation helpers:** `renderPageShell`, `renderVerdictPill`, `renderKeyStatsGrid`, `inferVerdict`, `inferKeyStats`, `KEY_STATS_CONFIG`.
- **Tier 3 adoption:** kanban (conformity, review), mat-ladder (migrate, poc), screen-tabs (utredning, project surface), scenario-card-grid (license, compare), residual-pair (dpia, ros), top-risks (ros), recommendation-card (security, ros), suppressed-panel (review), critique-card (adr), read-more (utredning, summary), traffic-light (poc).
- **Theme:** Dark by default + light theme toggle with Aksel tokens in both modes (added in v1.10.0). Persisted in `localStorage('ms-ai-architect-theme')`. A theme-bootstrap script in `<head>` avoids FOUC.
- **Export/import:** JSON Decision Record envelope (Blob + FileReader), schema-version-aware on import.
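
The versioned-migration idea from the persistence bullet above might look roughly like this. It is a sketch under assumptions: the state shape (`projects` array), the placeholder backfill values, and the `migrate` function name are hypothetical, and the real pipeline derives `verdict` and `keyStats` with `inferVerdict`/`inferKeyStats` rather than using defaults.

```javascript
// Each entry upgrades state from version N to N + 1 (hypothetical state shape).
const MIGRATIONS = {
  1: (state) => ({
    ...state,
    projects: (state.projects || []).map((p) => ({
      ...p,
      verdict: p.verdict ?? null, // placeholder backfill; existing values are kept
      keyStats: p.keyStats ?? [],
    })),
    dataVersion: 2,
  }),
};

// Runs every pending step in order; already-current state passes through unchanged.
function migrate(state) {
  let current = { dataVersion: 1, ...state };
  while (MIGRATIONS[current.dataVersion]) {
    current = MIGRATIONS[current.dataVersion](current);
  }
  return current;
}
```

Because each step only fills in missing fields and bumps `dataVersion`, running `migrate` on its own output is a no-op, which is the idempotency property the bullet describes.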
## Validation (v1.15.0 numbers)
| Test | Command | Coverage |
|------|----------|---------|
| Static structure | `bash tests/test-playground-v3.sh` | 219 PASS, 2 WARN (pre-existing): vendored CSS, surfaces, 25 commands, 14 parsers, 17 renderers via PROJECT_VIEW_CONFIG.renderers routing, action handlers |
| Parser fixtures | `bash tests/test-playground-parsers.sh` | 70 PASS: 17 fixtures × parser routing |
| Migration | `bash tests/test-playground-migrations.sh` | 16 PASS: v1→v2 + v2→v3 idempotent migration |
| Fingerprints | `bash tests/test-playground-fingerprints.sh` | 32 PASS: 17-fixture true positives + 4 anti-matches + API sanity |
| Project view | `bash tests/test-playground-projectview.sh` | 30 PASS: 4 view states + nav search + null guard |
| ACTIONS | `bash tests/test-playground-actions.sh` | 19 PASS: 6 pure-state handlers + projectViewUiState |
| Combined (E2E) | `bash tests/run-e2e.sh --playground` | 386 PASS, 0 FAIL, 2 WARN |
| Plugin validation | `bash tests/validate-plugin.sh` | 219 PASS |
| Manual A11Y QA | See `playground/MANUAL-CHECKLIST.md` | 10 sections incl. an axe-core run per surface |
| A11Y report | `playground/A11Y-RAPPORT.md` | Static assessment ready; browser axe run pending |
## Demo system (v1.11.0 → v1.15.0)
`scripts/build-demo-state.mjs` reads all 17 fixture files from `playground/test-fixtures/` and injects them as a `<script type="application/json" id="demo-state-v1">` block in the playground HTML (idempotent: it replaces an existing block). The "Last inn demo-data" ("Load demo data") button on the onboarding surface calls `ACTIONS['load-demo']`, which reads the block, replaces all state branches via Proxy mutation, runs `migrateDataVersion` (v2→v3 auto-parses raw_markdown into artifacts), and navigates to the project surface. The demo shows 17 artifacts grouped in the sidebar with severity badges, an aggregate verdict (BLOKKERT), a top-risks list, and working re-import/delete buttons per artifact.
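
An idempotent injection of the `demo-state-v1` block could be sketched like this. The real `build-demo-state.mjs` may work differently: the function name, the fallback of inserting before `</body>`, and the `<` escaping are assumptions added for illustration.

```javascript
// Matches an existing demo-state block so it can be replaced rather than duplicated.
const DEMO_BLOCK = /<script type="application\/json" id="demo-state-v1">[\s\S]*?<\/script>/;

// Hypothetical sketch: returns the HTML with exactly one up-to-date demo-state block.
function injectDemoState(html, demoState) {
  // Escape "<" so fixture text containing "</script>" cannot close the block early.
  const json = JSON.stringify(demoState).replace(/</g, '\\u003c');
  const block = `<script type="application/json" id="demo-state-v1">${json}</script>`;
  if (DEMO_BLOCK.test(html)) {
    // Function replacer avoids "$" patterns in the JSON being interpreted.
    return html.replace(DEMO_BLOCK, () => block);
  }
  return html.replace('</body>', () => `${block}\n</body>`);
}
```

The `\u003c` escape is still valid JSON, so `JSON.parse` on the block's text content restores the original fixture strings unchanged.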
`tests/screenshot/` contains a standalone Playwright runner with its own `package.json` (gitignored `node_modules`). `node run.mjs` produces 24 PNGs (12 surfaces × 2 themes) under `playground/screenshots/v1.15.0/`. v1.15.0 surfaces: onboarding-empty, project-overview, project-artifact-{classify,security,ros,cost,summary}, project-import-modal (viewport-only, since the modal is a position:fixed overlay), project-search, home, catalog, onboarding-prefilled. v1.10.0/v1.11.0/v1.14.0 are kept as historical reference. These are committed so that forkers can see the plugin without installing anything. The demo org is "Acme Kommune" and the demo project is "Acme: Kunde-chatbot".
## Design-system 100%-adoption (v1.11.0 → v1.14.0)
Sessions 3-5 added inline CSS in `playground/ms-ai-architect-playground.html`. v1.11.0 hoisted all generic components to `shared/playground-design-system/components-tier3-supplement.css` (DS v0.3.0):
- `.pyramide-desc` / `.pyramide-desc__item`
- `.scenario-card-grid` / `.scenario-card`
- `.residual-pair` / `__cell` / `__cell-label/__cell-value/__cell-meta` / `__arrow`
- `.read-more` / `.read-more__trigger` / `.read-more__chev` / `.read-more__body`
- `.top-risks` / `.top-risk[data-severity]`
- `.recommendation-card`
- `.suppressed-panel`
- `.screen-tabs` / `.screen-tab` / `.screen[data-active]`

v1.14.0 (DS v0.4.0): root-cause fix for three DS bugs that had previously been patched symptomatically in local CSS: `.kanban-card__name` (break-word + overflow-wrap; was break-all), `.expansion__title-main/sub` (display: block), `.matrix__bubble` (cursor + hover/focus). The fix was re-synced to the vendored DS and the corresponding local overrides were deleted. In addition, 14 renderers were refactored to the DS convention (3 risk renderers → DS summary grid + ros layout, 6 compliance/govern renderers → DS convention, renderMigrate + renderPoc → expansion list per phase). Local `<style>` block: 191 → 122 effective lines (~36% reduction since v1.13.1).
All PARALLEL-CSS name groups migrated to the DS convention. `renderPageShell` + `renderKeyStatsGrid` refactored to DS markup. Severity-coded card borders on report cards, app-header restructuring, `.stack-lg` body spacing on home/project/catalog, AI Act pyramid width fix, eyebrow label on home-projects.
For further hoisting: re-sync via `node scripts/sync-design-system.mjs ms-ai-architect`. These are changes to a shared asset and require drift-detection handling per `MANIFEST.json`.
## Vendored design-system
The playground loads CSS from `playground/vendor/playground-design-system/`, a vendored copy of the marketplace root's `shared/playground-design-system/`. This keeps the plugin **standalone**: the HTML file can be opened from `file://` independently of the marketplace root.
- **Sync script:** `node scripts/sync-design-system.mjs ms-ai-architect` (at the marketplace root)
- **Drift detection:** `MANIFEST.json` stores a SHA-256 per file. Re-sync fails if a vendored file has been changed locally; `--force` overrides.
- **Loaded in HTML:** `<link>` tags for `fonts.css`, `tokens.css`, `base.css`, `components.css`, `components-tier2.css`, `components-tier3.css`, `components-tier3-supplement.css` (in that order).
- **Never edit** files under `vendor/playground-design-system/` directly; changes go in `shared/`, then re-sync.
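
The drift check described in the list above can be sketched as a comparison of two hash maps. This is not the code in `sync-design-system.mjs`: the manifest shape (relative path mapped to a SHA-256 hex digest), the function names, and the result object are assumptions, and computing the digests is left to the caller.

```javascript
// Returns the vendored files whose current hash no longer matches the manifest.
// Both arguments map relative path -> SHA-256 hex digest (assumed shape).
function findDrift(manifest, vendoredHashes) {
  return Object.keys(manifest)
    .filter((file) => file in vendoredHashes && vendoredHashes[file] !== manifest[file])
    .sort();
}

// Decides whether a re-sync may proceed; force mirrors the --force flag.
function planSync(manifest, vendoredHashes, { force = false } = {}) {
  const drifted = findDrift(manifest, vendoredHashes);
  if (drifted.length > 0 && !force) {
    return {
      ok: false,
      drifted,
      message: `Locally modified: ${drifted.join(', ')} (use --force to overwrite)`,
    };
  }
  return { ok: true, drifted };
}
```

Refusing by default means a local hotfix in the vendored copy is never silently overwritten; the maintainer has to either move the change into `shared/` or opt in with `--force`.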
> The v2 spec under `docs/playground-v2-spec.md` is kept as a historical reference but is NOT the current contract. The v3 architecture is documented in `.claude/projects/2026-05-03-playground-v3-architecture/`.


@ -1,14 +1,7 @@
---
name: ms-ai-advisor
description: |
This skill should be used when the user needs Microsoft AI architecture guidance, wants help
choosing between Azure AI platforms, or asks about Copilot vs Foundry trade-offs. Cosmo Skyberg
persona guides through structured problem understanding before technology selection. Specialist
in Azure AI Foundry, M365 Copilot, Copilot Studio, Power Platform, Azure OpenAI, and
Microsoft Agent Framework.
Triggers on: "Microsoft AI architecture", "Copilot vs Foundry", "which Microsoft AI platform",
"Azure AI advice", "M365 Copilot vs Copilot Studio", "help me choose between Azure OpenAI and Copilot Studio",
"trenger arkitekturveiledning", "hvilken Copilot skal jeg bruke", "/architect", "Cosmo".
description: >-
Microsoft AI architecture guidance, choosing between Azure AI platforms, Copilot vs Foundry trade-offs. Cosmo Skyberg persona guides through structured problem understanding before technology selection. Specialist in Azure AI Foundry, M365 Copilot, Copilot Studio, Power Platform, Azure OpenAI, Microsoft Agent Framework. Triggers on: "Microsoft AI architecture", "Copilot vs Foundry", "which Microsoft AI platform", "Cosmo", "/architect".
---
> **INSTRUCTION:** You ARE Cosmo Skyberg. Follow the workflow below.


@ -1,13 +1,7 @@
---
name: ms-ai-engineering
description: |
This skill should be used when the user needs deep technical guidance for building AI solutions
in the Microsoft stack — RAG architecture, multi-agent orchestration, Azure AI Services,
data engineering with Fabric, MLOps/GenAIOps, multimodal AI, or API Management for AI.
Triggers on: "RAG architecture on Azure", "multi-agent orchestration pattern",
"MLOps for generative AI", "Azure AI Search implementation", "Semantic Kernel agent",
"Fabric data pipeline for AI", "API gateway for AI", "chunking strategy",
"embedding model", "APIM for Azure OpenAI".
description: >-
Deep technical guidance for building AI solutions in the Microsoft stack — RAG architecture, multi-agent orchestration, Azure AI Services, data engineering with Fabric, MLOps/GenAIOps, multimodal AI, API Management for AI. Triggers on: "RAG architecture on Azure", "multi-agent orchestration pattern", "MLOps for generative AI", "Azure AI Search", "Semantic Kernel agent", "Fabric data pipeline".
---
> **INSTRUCTION:** This skill provides deep technical knowledge for building AI solutions.


@ -1,14 +1,7 @@
---
name: ms-ai-governance
description: |
This skill should be used when the user asks about Norwegian public sector AI compliance,
utredningsinstruksen for AI projects, EU AI Act risk classification, DPIA for AI systems,
Digdir architecture principles, responsible AI governance, or monitoring and observability
for AI in production.
Triggers on: "Norwegian public sector AI compliance", "utredningsinstruksen for AI",
"AI Act risk classification", "DPIA for AI system", "Digdir architecture principles",
"ansvarlig AI i offentlig sektor", "compliance-vurdering for AI", "Forvaltningsloven AI",
"Schrems II AI", "bias detection", "AI governance framework".
description: >-
Norwegian public sector AI compliance, utredningsinstruksen for AI, EU AI Act risk classification, DPIA for AI systems, Digdir architecture principles, responsible AI governance, monitoring and observability. Triggers on: "Norwegian public sector AI compliance", "AI Act risk classification", "DPIA for AI system", "Digdir architecture principles", "ansvarlig AI i offentlig sektor", "Forvaltningsloven AI".
---
# ms-ai-governance


@ -1,14 +1,7 @@
---
name: ms-ai-infrastructure
description: |
This skill should be used when the user asks about disaster recovery for AI workloads,
multi-region Azure AI deployment, hybrid or edge AI architecture, sovereign cloud for Norway,
offline-first AI patterns, or AI infrastructure resilience planning.
Covers BCDR, Azure Arc for AI, ONNX Runtime edge deployment, disconnected scenarios,
and Norwegian data sovereignty requirements.
Triggers on: "disaster recovery for AI workloads", "edge AI deployment", "sovereign cloud AI",
"multi-region Azure AI", "Azure Arc for AI", "offline AI deployment",
"AI infrastructure resilience", "BCDR for AI", "hybrid AI", "Norway East failover".
description: >-
Disaster recovery for AI workloads, multi-region Azure AI deployment, hybrid or edge AI architecture, sovereign cloud for Norway, offline-first AI patterns, AI infrastructure resilience. Covers BCDR, Azure Arc for AI, ONNX Runtime edge deployment, disconnected scenarios, Norwegian data sovereignty. Triggers on: "disaster recovery for AI workloads", "edge AI deployment", "sovereign cloud AI", "Azure Arc for AI", "BCDR for AI".
---
> **INSTRUCTION:** This skill covers infrastructure resilience and operational architecture for AI workloads.

View file

@ -1,13 +1,7 @@
---
name: ms-ai-security
description: |
This skill should be used when the user needs a security assessment for an AI solution,
wants cost estimation for Azure AI workloads, asks about OWASP LLM Top 10 mitigations,
or needs performance optimization guidance. Provides deterministic 6x5 security scoring,
P10/P50/P90 cost confidence intervals, and FinOps practices for AI.
Triggers on: "security assessment for AI", "AI threat modeling", "cost estimation for Azure AI",
"FinOps for AI workloads", "prompt injection defense", "kostnadsestimat for AI-løsning",
"sikkerhetsscoring for AI", "OWASP LLM", "6x5 scoring", "PTU vs pay-as-you-go".
description: >-
Security assessment, cost estimation, OWASP LLM Top 10 mitigations, performance optimization for AI on Microsoft stack. Deterministic 6x5 security scoring, P10/P50/P90 cost confidence intervals, FinOps practices. Triggers on: "security assessment for AI", "AI threat modeling", "cost estimation for Azure AI", "FinOps for AI workloads", "OWASP LLM", "kostnadsestimat for AI-løsning".
---
> **INSTRUCTION:** This skill covers quantitative assessment activities with deterministic

View file

@ -75,20 +75,3 @@ Cycle archival: `/okr:oppsett arkiver` — moves `syklus/` to `historikk/`, gene
/okr:oppsett arkiver ──→ cycle archival + retrospektiv-generering
SessionStart ──→ coaching-hook.mjs (proactive coaching)
```
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)

View file

@ -1,13 +1,7 @@
---
name: okr-offentlig-sektor
description: |
This skill should be used when the user asks about OKR (Objectives and Key Results)
in Norwegian public sector context, including writing OKR, reviewing OKR quality,
cascading OKR from strategy to team level, tracking OKR progress, running OKR meetings,
or translating tildelingsbrev to OKR. Also for CFR, OKR antipatterns, scoring, and Oboard.
Triggers on: "OKR", "objectives and key results", "skriv OKR", "vurder OKR",
"OKR-scoring", "kaskadere OKR", "OKR-workshop", "tildelingsbrev til OKR",
"OKR for offentlig sektor", "Oboard", "CFR", "OKR antipatterns", "OKR kvalitetssjekk".
description: >-
OKR (Objectives and Key Results) for Norwegian public sector: writing OKR, reviewing OKR quality, cascading OKR from strategy to team, tracking progress, running OKR meetings, translating tildelingsbrev to OKR. Also CFR, OKR antipatterns, scoring, Oboard. Triggers on: "OKR", "skriv OKR", "vurder OKR", "OKR-scoring", "kaskadere OKR", "tildelingsbrev til OKR", "OKR for offentlig sektor".
version: "1.0.0"
---

View file

@ -22,87 +22,7 @@ Voyage — a contract-driven Claude Code pipeline: brief, research, plan, execut
| `/trekcontinue` | Continue — resumes the next session of a multi-session voyage project. Reads `.session-state.local.json` (Handover 7) and immediately begins executing | opus |
| `/trekendsession` | End-session — mark the current session complete and write session-state pointing at the next session. Helper for informal multi-session flows | opus |
### /trekbrief modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile: `economy` / `balanced` / `premium` / `<custom>`. Sets `phase_models` for the brief phase. See `## Profile system` below. |
Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
### /trekresearch modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Interview + research (local + external) + synthesis + brief (foreground) |
| `--project <dir>` | Write brief to `{dir}/research/{NN}-{slug}.md` (auto-incremented) |
| `--quick` | Interview (short) + inline research (no agent swarm) |
| `--local` | Only codebase analysis agents (skip external + Gemini) |
| `--external` | Only external research agents (skip codebase analysis) |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the research phase. See `## Profile system` below. |
Flags combine: `--project <dir> --local`, `--external --quick`.
### /trekplan modes
| Flag | Behavior |
|------|----------|
| `--project <dir>` | **Required path A** — read `{dir}/brief.md`, auto-discover `{dir}/research/*.md`, write `{dir}/plan.md` |
| `--brief <path>` | **Required path B** — plan from a specific brief file; write to `.claude/plans/trekplan-{date}-{slug}.md` |
| `--research <brief> [brief2]` | Enrich with extra research briefs beyond what is in `{project_dir}/research/` |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--quick` | Plan directly (no agent swarm) |
| `--export <pr\|issue\|markdown\|headless> <plan>` | Generate shareable output from existing plan |
| `--decompose <plan>` | Split plan into self-contained headless sessions |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the plan phase (and others, since plan emits `profile:` to plan.md frontmatter). See `## Profile system` below. |
**Breaking change (v2.0):** one of `--brief` or `--project` is required. There is no interview inside `/trekplan`. The `--spec` flag has been removed — use `/trekbrief` to produce a brief instead.
If `{project_dir}/architecture/overview.md` exists (typically produced by an opt-in upstream architect plugin, not bundled), the plan command auto-discovers it and treats `cc_features_proposed` as priors. Missing file is fine — discovery is additive, not required.
### /trekexecute modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Execute plan — auto-detects Execution Strategy for multi-session |
| `--project <dir>` | Read `{dir}/plan.md`, write `{dir}/progress.json` |
| `--resume` | Resume from last progress checkpoint |
| `--dry-run` | Validate plan structure without executing |
| `--validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution |
| `--step N` | Execute only step N |
| `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy |
| `--session N` | Execute only session N from plan's Execution Strategy |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the execute phase. Inherited from plan.md frontmatter `profile:` if present. See `## Profile system` below. |
### /trekreview modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Run brief-conformance + code-correctness reviewers in parallel, coordinator dedup + verdict, write `{project_dir}/review.md` |
| `--project <dir>` | **Required.** Path to trekplan project folder containing `brief.md`. Review is written to `{dir}/review.md` |
| `--since <ref>` | Override "before" SHA for the diff range. Validated via `git rev-parse --verify` |
| `--quick` | Skip brief-conformance reviewer; skip coordinator's reasonableness filter — fast correctness-only pass |
| `--validate` | Schema-only check on existing `{dir}/review.md`. No LLM calls |
| `--dry-run` | Print discovered scope + triage map; skip writes |
| `--fg` | No-op alias (foreground is default) |
| `--profile <name>` | (v4.1.0) Model profile for the review phase. See `## Profile system` below. |
### /trekcontinue modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Auto-discover active project's `.session-state.local.json` and resume |
| `<project-dir>` | Resume the next session of an explicit project directory |
| `--profile <name>` | (v4.1.0) Model profile for the resumed session. Inherited from the previous session's plan.md frontmatter when absent. See `## Profile system` below. |
The triage gate is deterministic — path-pattern classifier produces `{file → deep-review|summary-only|skip}`. Hard refuse-with-suggestion above 100 files / 100K diff tokens.
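The triage gate described above can be sketched as a pure function. This is a minimal illustration, not the plugin's implementation: the `RULES` patterns, the function names, and the wording of the refusal are assumptions; only the three verdicts and the 100-file / 100K-token limits come from the text.

```javascript
// Hypothetical path-pattern rules. The real patterns are not shown in this document.
const RULES = [
  { pattern: /(^|\/)(node_modules|dist|vendor)\//, verdict: "skip" },
  { pattern: /\.(lock|snap)$/, verdict: "skip" },
  { pattern: /\.(md|txt|json|ya?ml)$/, verdict: "summary-only" },
];

const MAX_FILES = 100;          // limit stated in the text
const MAX_DIFF_TOKENS = 100000; // limit stated in the text

// First matching rule wins; anything unmatched gets a deep review.
function classify(path) {
  const rule = RULES.find((r) => r.pattern.test(path));
  return rule ? rule.verdict : "deep-review";
}

// Returns either a refusal with a suggestion or a {file -> verdict} map.
function triage(files, diffTokens) {
  if (files.length > MAX_FILES || diffTokens > MAX_DIFF_TOKENS) {
    return {
      ok: false,
      suggestion: "diff too large; consider narrowing the range with --since <ref>",
    };
  }
  return { ok: true, map: Object.fromEntries(files.map((f) => [f, classify(f)])) };
}
```

Because the classifier depends only on the path string, the same input always yields the same map, which is what makes the gate deterministic.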
Full flag reference for each command (modes, `--gates`, `--profile`, breaking changes): see `docs/command-modes.md`.
## Agents
@ -132,191 +52,11 @@ The triage gate is deterministic — path-pattern classifier produces `{file →
| contrarian-researcher | opus | Counter-evidence, overlooked alternatives |
| gemini-bridge | opus | Gemini Deep Research second opinion (conditional) |
## Quality infrastructure (v3.4.0)
## Reference docs (read on demand)
`lib/` contains zero-dep validators, parsers, and autonomy primitives wired into the four commands:
- `lib/util/{frontmatter,result,atomic-write,autonomy-gate}.mjs` — shared YAML-frontmatter parser + Result helpers + `atomicWriteJson(path, obj)` for tmp+rename writes + autonomy-gate state machine (v3.4.0)
- `lib/parsers/{plan-schema,manifest-yaml,project-discovery,arg-parser,bash-normalize,jaccard,finding-id}.mjs` — pure parsers (no I/O), unit-tested. `manifest-yaml` extended in v3.4.0 with additive `skip_commit_check` + `memory_write` flags (forward-compat: unknown keys ignored)
- `lib/review/{rule-catalogue,plan-review-dedup}.mjs` — version-pinned rule catalogue (12 keys) + Phase 9 inline dedup helpers (v3.4.0)
- `lib/stats/event-emit.mjs` — single-source stats event emitter for autonomy-gate transitions and main-merge-gate (v3.4.0)
- `lib/validators/{brief,research,plan,progress,session-state}-validator.mjs` — schema validators with CLI shims (`node lib/validators/X.mjs --json <path>`)
- `lib/validators/architecture-discovery.mjs` — drift-WARN external-contract discovery for `architecture/overview.md`
Wiring points (replaces previous prose-grep instructions):
- `/trekbrief` Phase 4g → `brief-validator` (post-write sanity check)
- `/trekplan` Phase 1 → `brief-validator --soft`, `research-validator --dir`, `architecture-discovery`
- `planning-orchestrator` Phase 5.5 → `plan-validator --strict` (replaces 3 `grep -cE` calls)
- `/trekexecute --validate` → `plan-validator --strict` + `progress-validator`
Tests under `tests/**/*.test.mjs` (~290 tests, 0 deps). `npm test` is the fork-readiness gate. v3.4.0 adds: synthetic determinism fixtures (`tests/synthetic/plan-run-*.md` + `review-run-*.md` + companion `*-determinism.test.mjs` enforcing Jaccard ≥ 0.833 SC7 floor) and hook baseline regression pins (`tests/hooks/{path-guard,bash-guard}.test.mjs` exercising `pre-write-executor.mjs` + `pre-bash-executor.mjs` denylist BLOCK paths).
Doc-consistency test at `tests/lib/doc-consistency.test.mjs` pins agent-table count, command-table coverage, plan_version invariant, settings.json scope cleanliness, Handover 7 presence, and `session-state-validator` CLI shim.
`docs/HANDOVER-CONTRACTS.md` is the single source of truth for the 7 pipeline handovers (brief→research, research→plan, architecture→plan EXTERNAL, plan→execute, progress.json resume, review→plan, `.session-state.local.json`). Read it before changing any artifact format.
`hooks/scripts/pre-compact-flush.mjs` (PreCompact event, CC v2.1.105+) fixes the documented P0 in `docs/trekexecute-v2-observations-from-config-audit-v4.md`: keeps `progress.json` in sync with git history before context compaction so `--resume` works after long conversations. Atomic write, monotonic only, never blocks compaction.
`hooks/scripts/session-title.mjs` (UserPromptSubmit, CC v2.1.94+) sets `sessionTitle` to `voyage:<command>:<slug>` for voyage-command invocations. Helps multi-session headless runs identify themselves in process lists.
`hooks/scripts/post-bash-stats.mjs` (PostToolUse, CC v2.1.97+) appends `duration_ms` for each Bash call into `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`. Useful for finding long-running verify or checkpoint commands.
`hooks/scripts/post-compact-flush.mjs` (PostCompact event, v3.4.0) re-injects `.session-state.local.json` after context compaction so multi-session work survives a compaction boundary. Companion to `pre-compact-flush.mjs` (which writes the state file before compaction); together they form the rehydrate cycle that keeps `/trekcontinue` reliable across long-running multi-session work.
## Autonomy mode (`--gates`, v3.4.0)
All four pipeline commands accept `--gates {open|closed|adaptive}`:
| Value | Behavior |
|-------|----------|
| `open` | Skip optional checkpoints; trust manifests + verify gates only |
| `closed` | Stop at every autonomy boundary; operator confirms each transition |
| `adaptive` (default) | Stop only at meaningful boundaries (manifest-audit FAIL, plan-critic BLOCKER, main-merge gate) |
Under the hood: `lib/util/autonomy-gate.mjs` runs the state machine `idle → approved → executing → merge-pending → main-merged`. `lib/stats/event-emit.mjs` records each transition to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`. The main-merge gate is the final autonomy boundary before HEAD lands on `main`.
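As a rough illustration of the state machine named above, the sketch below treats the five states as a strictly linear sequence. That linearity, and the function names `nextState` and `transition`, are assumptions for illustration; the actual `autonomy-gate.mjs` may allow other transitions.

```javascript
// The five states listed in the text, in order.
const STATES = ["idle", "approved", "executing", "merge-pending", "main-merged"];

// Returns the state that follows `current`, or null at the terminal state.
function nextState(current) {
  const i = STATES.indexOf(current);
  if (i === -1) throw new Error(`unknown state: ${current}`);
  return i === STATES.length - 1 ? null : STATES[i + 1];
}

// Accepts only the single forward step (assumed: no skipping, no going back).
function transition(current, target) {
  if (nextState(current) !== target) {
    throw new Error(`illegal transition: ${current} -> ${target}`);
  }
  return target;
}
```

Rejecting every transition except the next one keeps the `main-merged` state reachable only through `merge-pending`, which matches the role of the main-merge gate as the final boundary.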
### Path A/B/C decision (v3.4.0; Path C closed 2026-05-05)
Three architectural options were considered for the speedup work:
- **Path A — cache-first** (drop `--allowedTools` per child to recover cross-phase cache sharing): REJECTED. Inverts the security model; plugin hooks don't fire reliably in `claude -p` (research/06 GH #36071).
- **Path B — sequential `--no-ff` parallel waves with manifest-driven failure recovery**: CHOSEN. Ships in v3.4.0. Phase 2.6 of `/trekexecute` runs the wave executor with hardenings for plugin-in-monorepo + gitignored-state topology.
- **Path C — hybrid (cache-warm sentinel + identical-tool parallel)**: **CLOSED 2026-05-05.** Q3 experiment measured median `cache_creation_input_tokens` = 163,903 across 3 fork-children at 186K parent context (CC v2.1.128, Sonnet 4.6). Master-plan thresholds: ≤ 1,500 POSITIVE / ≥ 3,500 NEGATIVE. Result is solidly NEGATIVE — `CLAUDE_CODE_FORK_SUBAGENT` does not preserve cache prefix across identical-tool children at our context size. Path C migration is deferred indefinitely; reassessment is appropriate when CC v2.2.xxx ships fork-cache-relevant features. Harness: `scripts/q3-cache-prefix-experiment.mjs`. Companion analyser: `lib/stats/cache-analyzer.mjs`.
A revived Path C (post-v2.2.xxx) would require: (1) re-architecting tool-list to be identical across all wave children, (2) cache-telemetry analysis confirming the new fork-cache behaviour holds, (3) prompt-level deny re-enablement to compensate for tool scoping rollback.
## Profile system (`--profile`, v4.1.0)
Three built-in model profiles plus operator-defined `<custom>.yaml`. Each profile pins `phase_models` for the six pipeline phases (`brief`, `research`, `plan`, `execute`, `review`, `continue`). Profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` for cost-attribution.
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks (operator-opt-in via `--profile economy`) |
| `balanced` | sonnet | sonnet | opus | sonnet | opus | sonnet | Mixed — opus where reasoning depth pays off (operator-opt-in via `--profile balanced`) |
| `premium` (default) | opus | opus | opus | opus | opus | opus | Maximum quality — Opus on every phase. Default since 2026-05-13 operator request; also the hardcoded resolver default at `lib/profiles/resolver.mjs:145` |
### Lookup order
1. Explicit `--profile <name>` flag passed to the command
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
3. `VOYAGE_PROFILE` environment variable
4. Default `balanced`
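The lookup order can be sketched as a first-non-empty chain. The function name `resolveProfile` and the argument shape are hypothetical; the fallback is passed in as a parameter rather than hard-coded, and the real logic lives in `lib/profiles/resolver.mjs`, which is not shown here.

```javascript
// Treat undefined, null, and empty strings as "not set".
function present(value) {
  return typeof value === "string" && value.length > 0;
}

// Mirrors the four-step lookup order: flag, plan frontmatter, env var, default.
function resolveProfile({ flag, planFrontmatter, env, fallback }) {
  if (present(flag)) return flag;
  if (planFrontmatter && present(planFrontmatter.profile)) return planFrontmatter.profile;
  if (env && present(env.VOYAGE_PROFILE)) return env.VOYAGE_PROFILE;
  return fallback;
}
```

For example, `resolveProfile({ planFrontmatter: { profile: "economy" }, env: { VOYAGE_PROFILE: "premium" }, fallback: "balanced" })` would pick the frontmatter value, because no explicit flag was given.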
### Custom profiles
Create `lib/profiles/<custom>.yaml` to define a new tier. The validator (`lib/validators/profile-validator.mjs`) enforces: every `phase_models[].phase` must be a known phase enum; every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or one of the canonical short names. Custom profiles override built-ins of the same name (lookup is alphabetical with `<custom>` taking precedence).
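A minimal sketch of the two validator rules above, assuming a parsed profile of the shape `{ phase_models: [{ phase, model }] }`. The function name `validateProfile` and the error strings are hypothetical, and the "canonical short names" alternative is left out because the text does not list them.

```javascript
// The six pipeline phases named in the profile-system description.
const PHASES = ["brief", "research", "plan", "execute", "review", "continue"];

// Model pattern quoted in the text.
const MODEL_RE = /^(opus|sonnet)(\b|-).*/;

// Returns a list of human-readable problems; an empty list means the profile passes.
function validateProfile(profile) {
  const errors = [];
  for (const entry of profile.phase_models ?? []) {
    if (!PHASES.includes(entry.phase)) errors.push(`unknown phase: ${entry.phase}`);
    if (!MODEL_RE.test(String(entry.model))) errors.push(`invalid model: ${entry.model}`);
  }
  return errors;
}
```

Note that the pattern accepts `opus` and `sonnet-4` but rejects `opusx`, since neither a word boundary nor a hyphen follows the prefix there.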
Drift between plan-frontmatter `profile:` and step-manifest `profile_used:` emits a `MANIFEST_PROFILE_DRIFT` warning from `plan-validator --strict` (Step 20). Plan remains valid; the warning surfaces accidental tier-mismatch.
## Observability (Stop hook, v4.1.0)
The `Stop` hook in `hooks/hooks.json` runs `hooks/scripts/otel-export.mjs` at session-end. The hook is **opt-in** — when `VOYAGE_EXPORT_MODE` is unset or `off`, no work is done.
| Mode | Output | Endpoint env-var |
|------|--------|------------------|
| `off` (default) | _(no export)_ | — |
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
Endpoint validation: `VOYAGE_OTEL_ALLOW_PRIVATE=1` is required to send to loopback or RFC1918 destinations (CWE-918 SSRF mitigation). Allowlist `lib/exporters/field-allowlist.mjs` redacts records before export (CWE-212). Path validation (`lib/exporters/path-validator.mjs`) rejects symlink + traversal (CWE-22).
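The endpoint check can be illustrated roughly as follows. This sketch only inspects literal hostnames (no DNS resolution, IPv4 ranges plus `localhost` and `[::1]`), and the helper names `isPrivateHost` and `endpointAllowed` are assumptions; the plugin's actual validation may be stricter.

```javascript
// True for loopback and RFC1918 IPv4 literals, plus the common loopback names.
function isPrivateHost(hostname) {
  if (hostname === "localhost" || hostname === "[::1]") return true;
  const m = hostname.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 ||                          // 127.0.0.0/8 loopback
    a === 10 ||                           // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

// Private destinations are allowed only when the operator opts in explicitly.
function endpointAllowed(endpoint, env) {
  const { hostname } = new URL(endpoint);
  return !isPrivateHost(hostname) || env.VOYAGE_OTEL_ALLOW_PRIVATE === "1";
}
```

A local collector at `http://127.0.0.1:4318` would therefore be rejected unless `VOYAGE_OTEL_ALLOW_PRIVATE=1` is set.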
Local Docker Compose stack: `examples/observability/`. Operator docs: `docs/observability.md`. Both pin minimum versions per CVE history (`prom/prometheus:v3.0.1`, `grafana/grafana:11.4.0`, `otel/opentelemetry-collector-contrib:0.115.0`).
## Architecture
**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 3.5 per-phase effort dialog (v5.1) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode runs research + plan inline in the main context (v2.4.0).
**Phase 3.5 (v5.1) — adaptive-depth signals:** Between Phase 3 completeness exit and Phase 4 draft, the operator commits an effort level (`low | standard | high`) and an optional `model` (`sonnet | opus`) per downstream phase (`research`, `plan`, `execute`, `review`) via 4 tier-coupled `AskUserQuestion` calls. The choices land in `brief.md` frontmatter as `phase_signals:` (a list of `{phase, effort?, model?}` entries) when committed, or `phase_signals_partial: true` when the operator force-stops. `brief_version: 2.1` activates the **sequencing gate**: validator emits `BRIEF_V51_MISSING_SIGNALS` if a 2.1-versioned brief lacks both fields. Downstream commands surface a friendly hint pointing back to `/trekbrief` — enforcement is validator-only. Composition is documented prose in each downstream command's `## Composition rule (v5.1)` section: `brief.phase_signals[phase] > profile.phase_models[phase]`. The brief signal wins per-phase when present; the profile fills gaps. `effort == low` activates each command's existing `--quick`-equivalent code-path (`/trekexecute` low-effort = `--gates open` + sequential-only). High-effort behavior is deferred to v5.1.1 per brief Non-Goal.
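The composition rule `brief.phase_signals[phase] > profile.phase_models[phase]` can be sketched as a per-phase fallback. The function name `resolveModel` is hypothetical, and both inputs are assumed to be already-parsed lists of `{ phase, model }` entries as described in the text.

```javascript
// The brief signal wins per phase when it names a model; the profile fills gaps.
function resolveModel(phase, brief, profile) {
  const signal = (brief.phase_signals ?? []).find((s) => s.phase === phase);
  if (signal && signal.model) return signal.model;
  const pinned = (profile.phase_models ?? []).find((p) => p.phase === phase);
  return pinned ? pinned.model : undefined;
}
```

A signal that sets only `effort` (no `model`) falls through to the profile, so the two fields compose independently.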
**Research:** Foreground workflow (v2.4.0): Parse mode → Interview → Parallel research swarm (5 local + 4 external + 1 bridge, spawned from main context) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`.
**Plan:** Foreground workflow (v2.4.0): Parse mode (validate brief input) → Codebase sizing → Brief review (`brief-reviewer`) → Parallel exploration (6-8 agents, spawned from main context) → Deep-dives → Synthesis (with architecture-note cross-reference if present) → Planning → Adversarial review (`plan-critic` + `scope-guardian`) → Present/refine → Handoff. With `--project`, writes to `{dir}/plan.md` and auto-detects `{dir}/architecture/overview.md` (produced by an opt-in upstream architect plugin if installed; not bundled).
**Decompose:** Parse plan → Analyze step dependencies → Group into sessions → Identify parallel waves → Generate session specs + dependency graph + launch script.
**Execute:** Parse plan → Security scan (Phase 2.4) → Detect Execution Strategy → Single-session (step loop) or multi-session (parallel waves via `claude -p` with scoped `--allowedTools`) → Phase 7.5 manifest audit → Phase 7.6 bounded recovery (if partial) → Phase 8 atomically writes `progress.json` + `.session-state.local.json` (Handover 7) → Report. With `--project`, reads `{dir}/plan.md`. Phase 2.55 (pre-flight stop) and Phase 4 (entry-condition stop) also write `.session-state.local.json` so `/trekcontinue` can surface the stop and prompt for next steps.
**Continue:** `/trekcontinue` reads `{dir}/.session-state.local.json` (Handover 7), validates schema-v1 via `session-state-validator`, narrates a 3-line summary (project / next-session-label / brief-path), and immediately begins executing the next session. Auto-discovers active project state files under `.claude/projects/*/.session-state.local.json` if no explicit `<project-dir>` argument. Operator-invoked only — never auto-loaded via SessionStart. The `/trekendsession` helper is the informal-flow producer: writes the same state file for ad-hoc multi-session handovers that don't run through `/trekexecute`.
**Operator-UX guarantee (since v5.0.2):** `/trekbrief`, `/trekplan`, and `/trekreview` MUST always emit (a) a plain `file://<abs path>` URL AND (b) a copy-pasteable `open file://<abs path>` command in the final report block. The file:// URL must use an ABSOLUTE path (not relative or `~/`-prefixed) so terminals with cmd+click support (Ghostty, iTerm2, modern Terminal.app) can resolve it without shell interpretation. This is a non-negotiable operator-UX contract — the doc-consistency test pins both forms in all three commands' final report blocks.
**Operator-annotation HTML (v5.0.3):** the last step of `/trekbrief`, `/trekplan`, and `/trekreview` runs `scripts/annotate.mjs` against the just-written `.md` and prints the resulting `file://<abs path>` link. The HTML is self-contained (zero npm deps, zero external network, design-system-styled, light + dark + print) and modelled on `~/repos/claude-code-100x/claude-code-100x/build-site.js` (lines 1431–2255). The operator opens the file, the document renders as a proper article (headings / paragraphs / lists / tables / code / quotes — every element gets a stable `data-anchor-id`). In annotation mode (default ON, pencil-toggle in topbar), the operator can **select any text or click any element** → a form popover opens at the cursor with: section context auto-detected from nearest h1/h2, the anchored snippet (selection if any, else element text), **three intent buttons (Fiks / Endre / Spørsmål)**, comment textarea, Save/Cancel. The sidebar (Show annotations button) lists every annotation grouped by section with intent badge + snippet + comment + delete; clicking a card scrolls to and flashes the source element. **Copy Prompt** assembles a structured markdown (`### N. [Intent] Section: <…>` + `Quote: «…»` + `Comment: …`) and copies to clipboard. Persistence: `localStorage` keyed on absolute artifact path (`voyage-annotate:v2:<abs path>`). v5.0.0 removed the v4.2/v4.3 bespoke playground SPA + `/trekrevise` + Handover 8; v5.0.1 pointed at `/playground document-critique` (Claude-leads, wrong direction); v5.0.2 was operator-led but too thin (line-click + freeform note, no intents); v5.0.3 matches the claude-code-100x reference the operator first pointed at, with pencil-toggle / selection capture / intent categories / popover form / structured export. See [CHANGELOG.md](CHANGELOG.md) § v5.0.3.
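The Copy Prompt export and the storage key can be sketched as two small pure functions. The annotation object shape (`intent`, `section`, `snippet`, `comment`), the function names, and the blank line between entries are assumptions for illustration; only the three-line entry format and the key prefix come from the text.

```javascript
// localStorage key for one artifact, keyed on its absolute path.
function storageKey(absPath) {
  return `voyage-annotate:v2:${absPath}`;
}

// Assembles the structured markdown: one numbered block per annotation.
function buildPrompt(annotations) {
  return annotations
    .map((a, i) =>
      [
        `### ${i + 1}. [${a.intent}] Section: ${a.section}`,
        `Quote: «${a.snippet}»`,
        `Comment: ${a.comment}`,
      ].join("\n")
    )
    .join("\n\n");
}
```

Keeping the formatter free of DOM access means it can be unit-tested without a browser.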
**Security:** 4-layer defense-in-depth: plugin hooks (pre-bash-executor, pre-write-executor), prompt-level denylist (works in headless sessions), pre-execution plan scan (Phase 2.4), scoped `--allowedTools` replacing `--dangerously-skip-permissions`. Hard Rules 14-16 enforce verify command security, repo-boundary writes, and sensitive path protection.
**Pipeline:** `/trekbrief` produces the task brief. `/trekresearch --project <dir>` fills in `{dir}/research/`. `/trekplan --project <dir>` reads brief + research to produce `{dir}/plan.md` (and auto-discovers `{dir}/architecture/overview.md` if an opt-in upstream architect plugin produced one). `/trekexecute --project <dir>` executes and writes `{dir}/progress.json`. `/trekreview --project <dir>` produces `{dir}/review.md`. `/trekbrief`, `/trekplan`, and `/trekreview` each end by running `scripts/annotate.mjs` on the just-written artifact, producing `{dir}/{artifact}.html` — a self-contained operator-annotation surface — and printing the `file://` link. The operator opens it, clicks lines, writes their own notes, copies a structured prompt, pastes back, Claude revises the `.md`. All artifacts live in one project directory.
**Project-directory contract (v3.0.0):** trekplan owns the directory layout below. The `architecture/` subdirectory is opt-in and produced by an opt-in upstream architect plugin (not bundled) — the architect plugin is no longer publicly distributed, but the `architecture/overview.md` slot remains available for any compatible producer.
```
.claude/projects/{YYYY-MM-DD}-{slug}/
brief.md ← trekbrief writes; everyone reads
brief.html ← trekbrief annotates (operator-annotation HTML, gitignored, re-buildable from brief.md)
research/*.md ← trekresearch writes; plan + architect read
architecture/ ← OPT-IN, owned by an opt-in upstream architect plugin (not bundled)
overview.md
gaps.md
plan.md ← trekplan writes; trekexecute reads
plan.html ← trekplan annotates
progress.json ← trekexecute writes
review.md ← trekreview writes; trekplan reads (Handover 6)
review.html ← trekreview annotates
```
The `.html` files (`brief.html`, `plan.html`, `review.html`) are produced by `scripts/annotate.mjs` and live alongside their `.md` siblings in the project directory. They are re-buildable from the `.md` source at any time (deterministic, byte-identical output on re-run), so they are conventionally gitignored along with the rest of `.claude/projects/`. Operator annotations live in browser `localStorage` keyed on the absolute artifact path — they survive refresh and browser-close, but are local to the operator's machine.
No code-level dependency between plugins — the contract is filesystem-level only.
## State
All artifacts in one project directory (default):
- Project root: `.claude/projects/{YYYY-MM-DD}-{slug}/`
- `brief.md` + `brief.html` (task brief from `/trekbrief`; `.html` is the operator-annotation surface from `scripts/annotate.mjs`)
- `research/{NN}-{slug}.md` (research briefs from `/trekresearch --project`)
- `architecture/overview.md` + `architecture/gaps.md` (opt-in, produced by an opt-in upstream architect plugin, not bundled)
- `plan.md` + `plan.html` (from `/trekplan --project`)
- `sessions/session-*.md` (from `--decompose`)
- `progress.json` (from `/trekexecute --project`)
- `review.md` + `review.html` (from `/trekreview --project`)
- `.session-state.local.json` (Handover 7 — gitignored via `*.local.json`; written by `/trekexecute` Phase 8/2.55/4 or `/trekendsession`; read by `/trekcontinue`)
Legacy paths (still work without `--project`):
- Research briefs: `.claude/research/trekresearch-{date}-{slug}.md`
- Plans: `.claude/plans/trekplan-{date}-{slug}.md`
- Sessions: `.claude/trekplan-sessions/{slug}/session-*.md`
- Launch scripts: `.claude/trekplan-sessions/{slug}/launch.sh`
- Progress: `{plan-dir}/.trekexecute-progress-{slug}.json`
Stats:
- Brief stats: `${CLAUDE_PLUGIN_DATA}/trekbrief-stats.jsonl`
- Plan stats: `${CLAUDE_PLUGIN_DATA}/trekplan-stats.jsonl`
- Exec stats: `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`
- Research stats: `${CLAUDE_PLUGIN_DATA}/trekresearch-stats.jsonl`
- Continue stats: `${CLAUDE_PLUGIN_DATA}/trekcontinue-stats.jsonl`
## Terminology
- **Task brief** — produced by `/trekbrief`. Declares intent, goal, and research plan. Drives planning.
- **Research brief** — produced by `/trekresearch`. Answers a specific research question. Feeds planning.
- **Architecture note** — opt-in, produced by an opt-in upstream architect plugin (not bundled; the architect plugin is no longer publicly distributed, but the `architecture/overview.md` filesystem slot remains available for any compatible producer). Proposes which Claude Code features fit the task with brief-anchored rationale + explicit gaps. When present, enriches planning.
- **Review** — produced by `/trekreview`. Independent post-hoc review of delivered code against the task brief. **Handover 6 (review → plan)** routes BLOCKER + MAJOR findings into `/trekplan --brief review.md` for a remediation plan. The plan's optional `source_findings:` frontmatter list is the audit trail back to the consumed findings. MINOR + SUGGESTION are skipped for v1.0 plan-input.
- **Session state** — `.session-state.local.json` per project. **Handover 7** — produced by any session-end mechanism (`/trekexecute` Phase 8/2.55/4, `/trekendsession` helper, future graceful-handoff v2.2). Consumed by `/trekcontinue` to resume the next session in a fresh chat. Schema v1 is forward-compat (unknown top-level keys ignored). Never committed (gitignored via `*.local.json`).
A project typically has 1 task brief, 0–N research briefs, 0 or 1 architecture note, 0–N reviews (one per review iteration), and 0 or 1 session-state file (overwritten on every session-end).
## Communication patterns
### Linking to local files
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
Example:
- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
- **Architecture, workflows, project-directory contract, state, terminology:** `docs/architecture.md`
- **Quality infrastructure (`lib/` validators, parsers, autonomy primitives, hooks):** `docs/architecture.md` §Quality infrastructure
- **Autonomy gates (`--gates`), Path A/B/C decision:** `docs/operations.md`
- **Profile system (`--profile economy/balanced/premium`), lookup order, custom profiles:** `docs/operations.md`
- **Observability (Stop hook, OTLP/textfile export, SSRF mitigation):** `docs/operations.md`
- **Handover contracts (the 7 pipeline handovers):** `docs/HANDOVER-CONTRACTS.md`

@ -0,0 +1,116 @@
# Voyage — Architecture, project layout, state, terminology
Imported from `CLAUDE.md` via pointer.
## Quality infrastructure (v3.4.0)
`lib/` contains zero-dep validators, parsers, and autonomy primitives wired into the four commands:
- `lib/util/{frontmatter,result,atomic-write,autonomy-gate}.mjs` — shared YAML-frontmatter parser + Result helpers + `atomicWriteJson(path, obj)` for tmp+rename writes + autonomy-gate state machine (v3.4.0)
- `lib/parsers/{plan-schema,manifest-yaml,project-discovery,arg-parser,bash-normalize,jaccard,finding-id}.mjs` — pure parsers (no I/O), unit-tested. `manifest-yaml` extended in v3.4.0 with additive `skip_commit_check` + `memory_write` flags (forward-compat: unknown keys ignored)
- `lib/review/{rule-catalogue,plan-review-dedup}.mjs` — version-pinned rule catalogue (12 keys) + Phase 9 inline dedup helpers (v3.4.0)
- `lib/stats/event-emit.mjs` — single-source stats event emitter for autonomy-gate transitions and main-merge-gate (v3.4.0)
- `lib/validators/{brief,research,plan,progress,session-state}-validator.mjs` — schema validators with CLI shims (`node lib/validators/X.mjs --json <path>`)
- `lib/validators/architecture-discovery.mjs` — drift-WARN external-contract discovery for `architecture/overview.md`
Wiring points (replaces previous prose-grep instructions):
- `/trekbrief` Phase 4g → `brief-validator` (post-write sanity check)
- `/trekplan` Phase 1 → `brief-validator --soft`, `research-validator --dir`, `architecture-discovery`
- `planning-orchestrator` Phase 5.5 → `plan-validator --strict` (replaces 3 `grep -cE` calls)
- `/trekexecute --validate` → `plan-validator --strict` + `progress-validator`
Tests under `tests/**/*.test.mjs` (~290 tests, 0 deps). `npm test` is the fork-readiness gate. v3.4.0 adds: synthetic determinism fixtures (`tests/synthetic/plan-run-*.md` + `review-run-*.md` + companion `*-determinism.test.mjs` enforcing Jaccard ≥ 0.833 SC7 floor) and hook baseline regression pins (`tests/hooks/{path-guard,bash-guard}.test.mjs` exercising `pre-write-executor.mjs` + `pre-bash-executor.mjs` denylist BLOCK paths).
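The Jaccard floor mentioned above can be illustrated with a small sketch. It assumes each run is reduced to a set of string identifiers; what `lib/parsers/jaccard.mjs` actually compares is not stated here, and the names `jaccard`, `SC7_FLOOR`, `runA`, and `runB` are hypothetical.

```javascript
// Jaccard similarity |A ∩ B| / |A ∪ B| over two collections of strings.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  // Assumption: two empty sets count as identical.
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const item of setA) if (setB.has(item)) shared += 1;
  return shared / (setA.size + setB.size - shared);
}

const SC7_FLOOR = 0.833; // floor quoted in the text

// Hypothetical step identifiers from two runs over the same input.
const runA = ['s1', 's2', 's3', 's4', 's5'];
const runB = ['s1', 's2', 's3', 's4', 's5', 's6'];

const score = jaccard(runA, runB);
console.log(score.toFixed(3), score >= SC7_FLOOR ? 'PASS' : 'FAIL');
```

With five shared items out of a six-item union the score is 5/6, which only just clears a 0.833 floor; one further divergent item would fail it.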
Doc-consistency test at `tests/lib/doc-consistency.test.mjs` pins agent-table count, command-table coverage, plan_version invariant, settings.json scope cleanliness, Handover 7 presence, and `session-state-validator` CLI shim.
`docs/HANDOVER-CONTRACTS.md` is the single source of truth for the 7 pipeline handovers (brief→research, research→plan, architecture→plan EXTERNAL, plan→execute, progress.json resume, review→plan, `.session-state.local.json`). Read it before changing any artifact format.
`hooks/scripts/pre-compact-flush.mjs` (PreCompact event, CC v2.1.105+) fixes the documented P0 in `docs/trekexecute-v2-observations-from-config-audit-v4.md`: keeps `progress.json` in sync with git history before context compaction so `--resume` works after long conversations. Atomic write, monotonic only, never blocks compaction.
`hooks/scripts/session-title.mjs` (UserPromptSubmit, CC v2.1.94+) sets `sessionTitle` to `voyage:<command>:<slug>` for voyage-command invocations. Helps multi-session headless runs identify themselves in process lists.
`hooks/scripts/post-bash-stats.mjs` (PostToolUse, CC v2.1.97+) appends `duration_ms` for each Bash call into `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`. Useful for finding long-running verify or checkpoint commands.
`hooks/scripts/post-compact-flush.mjs` (PostCompact event, v3.4.0) re-injects `.session-state.local.json` after context compaction so multi-session work survives a compaction boundary. Companion to `pre-compact-flush.mjs` (which writes the state file before compaction); together they form the rehydrate cycle that keeps `/trekcontinue` reliable across long-running multi-session work.
## Architecture
**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 3.5 per-phase effort dialog (v5.1) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode runs research + plan inline in the main context (v2.4.0).
**Phase 3.5 (v5.1) — adaptive-depth signals:** Between Phase 3 completeness exit and Phase 4 draft, the operator commits an effort level (`low | standard | high`) and an optional `model` (`sonnet | opus`) per downstream phase (`research`, `plan`, `execute`, `review`) via 4 tier-coupled `AskUserQuestion` calls. The choices land in `brief.md` frontmatter as `phase_signals:` (a list of `{phase, effort?, model?}` entries) when committed, or `phase_signals_partial: true` when the operator force-stops. `brief_version: 2.1` activates the **sequencing gate**: validator emits `BRIEF_V51_MISSING_SIGNALS` if a 2.1-versioned brief lacks both fields. Downstream commands surface a friendly hint pointing back to `/trekbrief` — enforcement is validator-only. Composition is documented prose in each downstream command's `## Composition rule (v5.1)` section: `brief.phase_signals[phase] > profile.phase_models[phase]`. The brief signal wins per-phase when present; the profile fills gaps. `effort == low` activates each command's existing `--quick`-equivalent code-path (`/trekexecute` low-effort = `--gates open` + sequential-only). High-effort behavior is deferred to v5.1.1 per brief Non-Goal.
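A minimal sketch of the sequencing gate and the composition rule described above, assuming the brief frontmatter is already parsed into a plain object and that a profile carries `phase_models` as a list of `{phase, model}` entries. The function names and sample objects are hypothetical; only the field names and the `BRIEF_V51_MISSING_SIGNALS` code come from the text.

```javascript
// Sequencing gate: a 2.1-versioned brief must carry phase_signals
// or phase_signals_partial: true.
function checkSequencingGate(frontmatter) {
  const isV21 = String(frontmatter.brief_version) === '2.1';
  const hasSignals = Array.isArray(frontmatter.phase_signals);
  const isPartial = frontmatter.phase_signals_partial === true;
  return isV21 && !hasSignals && !isPartial ? ['BRIEF_V51_MISSING_SIGNALS'] : [];
}

// Composition rule: the brief signal wins per phase, the profile fills gaps.
function resolvePhaseModel(frontmatter, profile, phase) {
  const signal = (frontmatter.phase_signals || []).find((s) => s.phase === phase);
  if (signal && signal.model) return signal.model;
  const entry = (profile.phase_models || []).find((p) => p.phase === phase);
  return entry ? entry.model : undefined;
}

// Hypothetical sample data.
const brief = {
  brief_version: '2.1',
  phase_signals: [
    { phase: 'plan', effort: 'high', model: 'opus' },
    { phase: 'execute', effort: 'low' }, // effort only, no model override
  ],
};
const profile = {
  phase_models: [
    { phase: 'plan', model: 'sonnet' },
    { phase: 'execute', model: 'sonnet' },
  ],
};

console.log(checkSequencingGate(brief));
console.log(resolvePhaseModel(brief, profile, 'plan'));
console.log(resolvePhaseModel(brief, profile, 'execute'));
console.log(checkSequencingGate({ brief_version: '2.1' }));
```

In this sketch the `plan` phase takes the brief's model, while `execute` falls back to the profile because its signal sets only `effort`.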
**Research:** Foreground workflow (v2.4.0): Parse mode → Interview → Parallel research swarm (5 local + 4 external + 1 bridge, spawned from main context) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`.
**Plan:** Foreground workflow (v2.4.0): Parse mode (validate brief input) → Codebase sizing → Brief review (`brief-reviewer`) → Parallel exploration (6-8 agents, spawned from main context) → Deep-dives → Synthesis (with architecture-note cross-reference if present) → Planning → Adversarial review (`plan-critic` + `scope-guardian`) → Present/refine → Handoff. With `--project`, writes to `{dir}/plan.md` and auto-detects `{dir}/architecture/overview.md` (produced by an opt-in upstream architect plugin if installed; not bundled).
**Decompose:** Parse plan → Analyze step dependencies → Group into sessions → Identify parallel waves → Generate session specs + dependency graph + launch script.
**Execute:** Parse plan → Security scan (Phase 2.4) → Detect Execution Strategy → Single-session (step loop) or multi-session (parallel waves via `claude -p` with scoped `--allowedTools`) → Phase 7.5 manifest audit → Phase 7.6 bounded recovery (if partial) → Phase 8 atomically writes `progress.json` + `.session-state.local.json` (Handover 7) → Report. With `--project`, reads `{dir}/plan.md`. Phase 2.55 (pre-flight stop) and Phase 4 (entry-condition stop) also write `.session-state.local.json` so `/trekcontinue` can surface the stop and prompt for next steps.
**Continue:** `/trekcontinue` reads `{dir}/.session-state.local.json` (Handover 7), validates schema-v1 via `session-state-validator`, narrates a 3-line summary (project / next-session-label / brief-path), and immediately begins executing the next session. Auto-discovers active project state files under `.claude/projects/*/.session-state.local.json` if no explicit `<project-dir>` argument. Operator-invoked only — never auto-loaded via SessionStart. The `/trekendsession` helper is the informal-flow producer: writes the same state file for ad-hoc multi-session handovers that don't run through `/trekexecute`.
**Operator-UX guarantee (since v5.0.2):** `/trekbrief`, `/trekplan`, and `/trekreview` MUST always emit (a) a plain `file://<abs path>` URL AND (b) a copy-pasteable `open file://<abs path>` command in the final report block. The file:// URL must use an ABSOLUTE path (not relative or `~/`-prefixed) so terminals with cmd+click support (Ghostty, iTerm2, modern Terminal.app) can resolve it without shell interpretation. This is a non-negotiable operator-UX contract — the doc-consistency test pins both forms in all three commands' final report blocks.
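As an illustration of the two required report lines, the sketch below builds both from an absolute POSIX path. The helper name `reportLinks` and the sample path are hypothetical, and the actual commands may assemble these strings differently; Node's built-in `url.pathToFileURL` is a more thorough alternative for real use.

```javascript
// Builds the plain file:// URL and the copy-pasteable `open` command.
// Assumption: POSIX-style paths only; anything not starting with "/" is rejected,
// which also rules out relative and "~/"-prefixed paths.
function reportLinks(absPath) {
  if (typeof absPath !== 'string' || !absPath.startsWith('/')) {
    throw new Error(`expected an absolute path, got: ${absPath}`);
  }
  // encodeURI keeps "/" intact and percent-encodes spaces and non-ASCII
  // characters; "?" and "#" are encoded separately because they would
  // otherwise be read as query and fragment delimiters.
  const encoded = encodeURI(absPath).replace(/[?#]/g, (c) => encodeURIComponent(c));
  const url = 'file://' + encoded;
  return { url, openCommand: `open ${url}` };
}

// Hypothetical artifact path.
const links = reportLinks('/tmp/voyage demo/brief.md');
console.log(links.url);
console.log(links.openCommand);
```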
**Operator-annotation HTML (v5.0.3):** the last step of `/trekbrief`, `/trekplan`, and `/trekreview` runs `scripts/annotate.mjs` against the just-written `.md` and prints the resulting `file://<abs path>` link. The HTML is self-contained (zero npm deps, zero external network, design-system-styled, light + dark + print) and modelled on `~/repos/claude-code-100x/claude-code-100x/build-site.js` (lines 1431–2255). When the operator opens the file, the document renders as a proper article (headings / paragraphs / lists / tables / code / quotes), and every element gets a stable `data-anchor-id`. In annotation mode (default ON, pencil-toggle in topbar), the operator can **select any text or click any element**, which opens a form popover at the cursor with: section context auto-detected from the nearest h1/h2, the anchored snippet (the selection if any, else the element text), **three intent buttons (Fiks / Endre / Spørsmål, i.e. Fix / Change / Question)**, a comment textarea, and Save/Cancel. The sidebar (Show annotations button) lists every annotation grouped by section with intent badge + snippet + comment + delete; clicking a card scrolls to and flashes the source element. **Copy Prompt** assembles structured markdown (`### N. [Intent] Section: <…>` + `Quote: «…»` + `Comment: …`) and copies it to the clipboard. Persistence: `localStorage` keyed on the absolute artifact path (`voyage-annotate:v2:<abs path>`). Version history: v5.0.0 removed the v4.2/v4.3 bespoke playground SPA + `/trekrevise` + Handover 8; v5.0.1 pointed at `/playground document-critique` (Claude-leads, wrong direction); v5.0.2 was operator-led but too thin (line-click + freeform note, no intents); v5.0.3 matches the claude-code-100x reference the operator first pointed at, with pencil-toggle / selection capture / intent categories / popover form / structured export. See [CHANGELOG.md](../CHANGELOG.md) § v5.0.3.
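The Copy Prompt export and the storage key can be sketched as follows. This assumes each annotation is an object with `intent`, `section`, `snippet`, and `comment` fields; those field names, the function names, and the sample data are hypothetical, and only the three-line layout and the key prefix come from the description above.

```javascript
// localStorage key for one artifact, keyed on its absolute path.
function storageKey(absArtifactPath) {
  return `voyage-annotate:v2:${absArtifactPath}`;
}

// Assembles the structured markdown: one numbered block per annotation,
// blocks separated by a blank line.
function buildPrompt(annotations) {
  return annotations
    .map((a, i) =>
      [
        `### ${i + 1}. [${a.intent}] Section: ${a.section}`,
        `Quote: «${a.snippet}»`,
        `Comment: ${a.comment}`,
      ].join('\n')
    )
    .join('\n\n');
}

// Hypothetical annotations.
const sample = [
  { intent: 'Fiks', section: 'Goal', snippet: 'ship by Friday', comment: 'Date is wrong.' },
  { intent: 'Spørsmål', section: 'Non-Goals', snippet: 'no migration', comment: 'Why not?' },
];

console.log(storageKey('/tmp/demo/brief.md'));
console.log(buildPrompt(sample));
```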
**Security:** 4-layer defense-in-depth: plugin hooks (pre-bash-executor, pre-write-executor), prompt-level denylist (works in headless sessions), pre-execution plan scan (Phase 2.4), scoped `--allowedTools` replacing `--dangerously-skip-permissions`. Hard Rules 14-16 enforce verify command security, repo-boundary writes, and sensitive path protection.
**Pipeline:** `/trekbrief` produces the task brief. `/trekresearch --project <dir>` fills in `{dir}/research/`. `/trekplan --project <dir>` reads brief + research to produce `{dir}/plan.md` (and auto-discovers `{dir}/architecture/overview.md` if an opt-in upstream architect plugin produced one). `/trekexecute --project <dir>` executes and writes `{dir}/progress.json`. `/trekreview --project <dir>` produces `{dir}/review.md`. `/trekbrief`, `/trekplan`, and `/trekreview` each end by running `scripts/annotate.mjs` on the just-written artifact, producing `{dir}/{artifact}.html` (a self-contained operator-annotation surface) and printing the `file://` link. The operator opens it, selects text or clicks elements, writes their own notes, copies the structured prompt, and pastes it back; Claude then revises the `.md`. All artifacts live in one project directory.
**Project-directory contract (v3.0.0):** trekplan owns the directory layout below. The `architecture/` subdirectory is opt-in and produced by an opt-in upstream architect plugin (not bundled) — the architect plugin is no longer publicly distributed, but the `architecture/overview.md` slot remains available for any compatible producer.
```
.claude/projects/{YYYY-MM-DD}-{slug}/
brief.md ← trekbrief writes; everyone reads
brief.html ← trekbrief annotates (operator-annotation HTML, gitignored, re-buildable from brief.md)
research/*.md ← trekresearch writes; plan + architect read
architecture/ ← OPT-IN, owned by an opt-in upstream architect plugin (not bundled)
overview.md
gaps.md
plan.md ← trekplan writes; trekexecute reads
plan.html ← trekplan annotates
progress.json ← trekexecute writes
review.md ← trekreview writes; trekplan reads (Handover 6)
review.html ← trekreview annotates
```
The `.html` files (`brief.html`, `plan.html`, `review.html`) are produced by `scripts/annotate.mjs` and live alongside their `.md` siblings in the project directory. They are re-buildable from the `.md` source at any time (deterministic, byte-identical output on re-run), so they are conventionally gitignored along with the rest of `.claude/projects/`. Operator annotations live in browser `localStorage` keyed on the absolute artifact path — they survive refresh and browser-close, but are local to the operator's machine.
No code-level dependency between plugins — the contract is filesystem-level only.
## State
All artifacts in one project directory (default):
- Project root: `.claude/projects/{YYYY-MM-DD}-{slug}/`
- `brief.md` + `brief.html` (task brief from `/trekbrief`; `.html` is the operator-annotation surface from `scripts/annotate.mjs`)
- `research/{NN}-{slug}.md` (research briefs from `/trekresearch --project`)
- `architecture/overview.md` + `architecture/gaps.md` (opt-in, produced by an opt-in upstream architect plugin, not bundled)
- `plan.md` + `plan.html` (from `/trekplan --project`)
- `sessions/session-*.md` (from `--decompose`)
- `progress.json` (from `/trekexecute --project`)
- `review.md` + `review.html` (from `/trekreview --project`)
- `.session-state.local.json` (Handover 7 — gitignored via `*.local.json`; written by `/trekexecute` Phase 8/2.55/4 or `/trekendsession`; read by `/trekcontinue`)
Legacy paths (still work without `--project`):
- Research briefs: `.claude/research/trekresearch-{date}-{slug}.md`
- Plans: `.claude/plans/trekplan-{date}-{slug}.md`
- Sessions: `.claude/trekplan-sessions/{slug}/session-*.md`
- Launch scripts: `.claude/trekplan-sessions/{slug}/launch.sh`
- Progress: `{plan-dir}/.trekexecute-progress-{slug}.json`
Stats:
- Brief stats: `${CLAUDE_PLUGIN_DATA}/trekbrief-stats.jsonl`
- Plan stats: `${CLAUDE_PLUGIN_DATA}/trekplan-stats.jsonl`
- Exec stats: `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`
- Research stats: `${CLAUDE_PLUGIN_DATA}/trekresearch-stats.jsonl`
- Continue stats: `${CLAUDE_PLUGIN_DATA}/trekcontinue-stats.jsonl`
## Terminology
- **Task brief** — produced by `/trekbrief`. Declares intent, goal, and research plan. Drives planning.
- **Research brief** — produced by `/trekresearch`. Answers a specific research question. Feeds planning.
- **Architecture note** — opt-in, produced by an opt-in upstream architect plugin (not bundled; the architect plugin is no longer publicly distributed, but the `architecture/overview.md` filesystem slot remains available for any compatible producer). Proposes which Claude Code features fit the task with brief-anchored rationale + explicit gaps. When present, enriches planning.
- **Review** — produced by `/trekreview`. Independent post-hoc review of delivered code against the task brief. **Handover 6 (review → plan)** routes BLOCKER + MAJOR findings into `/trekplan --brief review.md` for a remediation plan. The plan's optional `source_findings:` frontmatter list is the audit trail back to the consumed findings. MINOR + SUGGESTION are skipped for v1.0 plan-input.
- **Session state** (`.session-state.local.json`, one per project): **Handover 7**, produced by any session-end mechanism (`/trekexecute` Phase 8/2.55/4, `/trekendsession` helper, future graceful-handoff v2.2). Consumed by `/trekcontinue` to resume the next session in a fresh chat. Schema v1 is forward-compatible (unknown top-level keys are ignored). Never committed (gitignored via `*.local.json`).
A project typically has 1 task brief, 0–N research briefs, 0 or 1 architecture note, 0–N reviews (one per review iteration), and 0 or 1 session-state file (overwritten on every session-end).
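The Handover 6 routing described in the Review entry above (BLOCKER + MAJOR consumed, MINOR + SUGGESTION skipped) can be sketched as a simple filter. The finding shape, the id format, and the function name are hypothetical; only the severity names and the `source_findings` key come from the text.

```javascript
// Severities that feed a remediation plan under Handover 6.
const PLAN_INPUT_SEVERITIES = new Set(['BLOCKER', 'MAJOR']);

// Splits findings into those consumed by the plan and those skipped,
// and derives the source_findings audit list from the consumed ids.
function selectPlanInput(findings) {
  const consumed = findings.filter((f) => PLAN_INPUT_SEVERITIES.has(f.severity));
  const skipped = findings.filter((f) => !PLAN_INPUT_SEVERITIES.has(f.severity));
  return { consumed, skipped, source_findings: consumed.map((f) => f.id) };
}

// Hypothetical findings.
const findings = [
  { id: 'F-001', severity: 'BLOCKER', title: 'Missing input validation' },
  { id: 'F-002', severity: 'MINOR', title: 'Inconsistent naming' },
  { id: 'F-003', severity: 'MAJOR', title: 'Brief goal not met' },
  { id: 'F-004', severity: 'SUGGESTION', title: 'Consider caching' },
];

const routed = selectPlanInput(findings);
console.log(routed.source_findings);
```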

@ -0,0 +1,85 @@
# Voyage — Command flag reference
Per-command flag tables, imported from `CLAUDE.md` via pointer.
## /trekbrief modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile: `economy` / `balanced` / `premium` / `<custom>`. Sets `phase_models` for the brief phase. See `## Profile system` in `docs/operations.md`. |
Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
## /trekresearch modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Interview + research (local + external) + synthesis + brief (foreground) |
| `--project <dir>` | Write brief to `{dir}/research/{NN}-{slug}.md` (auto-incremented) |
| `--quick` | Interview (short) + inline research (no agent swarm) |
| `--local` | Only codebase analysis agents (skip external + Gemini) |
| `--external` | Only external research agents (skip codebase analysis) |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the research phase. |
Flags combine: `--project <dir> --local`, `--external --quick`.
## /trekplan modes
| Flag | Behavior |
|------|----------|
| `--project <dir>` | **Required path A** — read `{dir}/brief.md`, auto-discover `{dir}/research/*.md`, write `{dir}/plan.md` |
| `--brief <path>` | **Required path B** — plan from a specific brief file; write to `.claude/plans/trekplan-{date}-{slug}.md` |
| `--research <brief> [brief2]` | Enrich with extra research briefs beyond what is in `{project_dir}/research/` |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--quick` | Plan directly (no agent swarm) |
| `--export <pr\|issue\|markdown\|headless> <plan>` | Generate shareable output from existing plan |
| `--decompose <plan>` | Split plan into self-contained headless sessions |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the plan phase (and others, since plan emits `profile:` to plan.md frontmatter). |
**Breaking change (v2.0):** one of `--brief` or `--project` is required. There is no interview inside `/trekplan`. The `--spec` flag has been removed — use `/trekbrief` to produce a brief instead.
If `{project_dir}/architecture/overview.md` exists (typically produced by an opt-in upstream architect plugin, not bundled), the plan command auto-discovers it and treats `cc_features_proposed` as priors. Missing file is fine — discovery is additive, not required.
## /trekexecute modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Execute plan — auto-detects Execution Strategy for multi-session |
| `--project <dir>` | Read `{dir}/plan.md`, write `{dir}/progress.json` |
| `--resume` | Resume from last progress checkpoint |
| `--dry-run` | Validate plan structure without executing |
| `--validate` | Schema-only check — parse steps + manifests, report `READY \| FAIL`, no execution |
| `--step N` | Execute only step N |
| `--fg` | Force foreground — run all steps sequentially, ignore Execution Strategy |
| `--session N` | Execute only session N from plan's Execution Strategy |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the execute phase. Inherited from plan.md frontmatter `profile:` if present. |
## /trekreview modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Run brief-conformance + code-correctness reviewers in parallel, coordinator dedup + verdict, write `{project_dir}/review.md` |
| `--project <dir>` | **Required.** Path to trekplan project folder containing `brief.md`. Review is written to `{dir}/review.md` |
| `--since <ref>` | Override "before" SHA for the diff range. Validated via `git rev-parse --verify` |
| `--quick` | Skip brief-conformance reviewer; skip coordinator's reasonableness filter — fast correctness-only pass |
| `--validate` | Schema-only check on existing `{dir}/review.md`. No LLM calls |
| `--dry-run` | Print discovered scope + triage map; skip writes |
| `--fg` | No-op alias (foreground is default) |
| `--profile <name>` | (v4.1.0) Model profile for the review phase. |
## /trekcontinue modes
| Flag | Behavior |
|------|----------|
| _(default)_ | Auto-discover active project's `.session-state.local.json` and resume |
| `<project-dir>` | Resume the next session of an explicit project directory |
| `--profile <name>` | (v4.1.0) Model profile for the resumed session. Inherited from the previous session's plan.md frontmatter when absent. |
The `/trekreview` triage gate is deterministic: a path-pattern classifier produces the triage map `{file → deep-review|summary-only|skip}`. It refuses, with a suggestion, above 100 files / 100K diff tokens.
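A deterministic path-pattern triage of this kind might look like the sketch below. The patterns, the rule order, the default verdict, and the wording of the suggestion are all assumptions made for illustration; only the three verdicts and the 100-file / 100K-token limits come from the text.

```javascript
// Hypothetical rules, checked in order; the first match wins.
const TRIAGE_RULES = [
  { pattern: /(^|\/)node_modules\//, verdict: 'skip' },
  { pattern: /(^|\/)package-lock\.json$/, verdict: 'skip' },
  { pattern: /\.md$/, verdict: 'summary-only' },
];

const MAX_FILES = 100;
const MAX_DIFF_TOKENS = 100000;

function triage(files, diffTokens) {
  // Hard refusal above either limit ("above" is read as strictly greater).
  if (files.length > MAX_FILES || diffTokens > MAX_DIFF_TOKENS) {
    return { refused: true, suggestion: 'narrow the diff range and retry' };
  }
  const map = {};
  for (const file of files) {
    const rule = TRIAGE_RULES.find((r) => r.pattern.test(file));
    // Assumption: anything unmatched gets the most thorough treatment.
    map[file] = rule ? rule.verdict : 'deep-review';
  }
  return { refused: false, map };
}

console.log(triage(['lib/util/result.mjs', 'README.md', 'node_modules/x/index.js'], 4200));
```

Because the verdict depends only on the path and the fixed rule order, the same file list always yields the same map.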

@ -0,0 +1,62 @@
# Voyage — Autonomy gates, profile system, observability
Imported from `CLAUDE.md` via pointer.
## Autonomy mode (`--gates`, v3.4.0)
All four pipeline commands accept `--gates {open|closed|adaptive}`:
| Value | Behavior |
|-------|----------|
| `open` | Skip optional checkpoints; trust manifests + verify gates only |
| `closed` | Stop at every autonomy boundary; operator confirms each transition |
| `adaptive` (default) | Stop only at meaningful boundaries (manifest-audit FAIL, plan-critic BLOCKER, main-merge gate) |
Under the hood: `lib/util/autonomy-gate.mjs` runs the state machine `idle → approved → executing → merge-pending → main-merged`. `lib/stats/event-emit.mjs` records each transition to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`. The main-merge gate is the final autonomy boundary before HEAD lands on `main`.
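The state machine named above can be sketched as a strictly linear sequence. This is an assumption-laden illustration, not the contents of `lib/util/autonomy-gate.mjs`: the one-step-forward rule, the `advance` function, and the event record shape are hypothetical.

```javascript
// States in the order quoted in the text.
const GATE_STATES = ['idle', 'approved', 'executing', 'merge-pending', 'main-merged'];

// Assumption: the machine only ever advances one step; no skips, no rollbacks.
function advance(current, target) {
  const from = GATE_STATES.indexOf(current);
  const to = GATE_STATES.indexOf(target);
  if (from === -1) throw new Error(`unknown state: ${current}`);
  if (to === -1) throw new Error(`unknown state: ${target}`);
  if (to !== from + 1) throw new Error(`illegal transition: ${current} -> ${target}`);
  // Hypothetical event shape for a stats record.
  return { event: 'gate-transition', from: current, to: target, ts: new Date().toISOString() };
}

let gateState = 'idle';
for (const target of GATE_STATES.slice(1)) {
  const event = advance(gateState, target);
  console.log(JSON.stringify(event)); // one JSONL-style line per transition
  gateState = event.to;
}
```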
### Path A/B/C decision (v3.4.0; Path C closed 2026-05-05)
Three architectural options were considered for the speedup work:
- **Path A — cache-first** (drop `--allowedTools` per child to recover cross-phase cache sharing): REJECTED. Inverts the security model; plugin hooks don't fire reliably in `claude -p` (research/06 GH #36071).
- **Path B — sequential `--no-ff` parallel waves with manifest-driven failure recovery**: CHOSEN. Ships in v3.4.0. Phase 2.6 of `/trekexecute` runs the wave executor with hardenings for plugin-in-monorepo + gitignored-state topology.
- **Path C — hybrid (cache-warm sentinel + identical-tool parallel)**: **CLOSED 2026-05-05.** Q3 experiment measured median `cache_creation_input_tokens` = 163,903 across 3 fork-children at 186K parent context (CC v2.1.128, Sonnet 4.6). Master-plan thresholds: ≤ 1,500 POSITIVE / ≥ 3,500 NEGATIVE. Result is solidly NEGATIVE — `CLAUDE_CODE_FORK_SUBAGENT` does not preserve cache prefix across identical-tool children at our context size. Path C migration is deferred indefinitely; reassessment is appropriate when CC v2.2.xxx ships fork-cache-relevant features. Harness: `scripts/q3-cache-prefix-experiment.mjs`. Companion analyser: `lib/stats/cache-analyzer.mjs`.
A revived Path C (post-v2.2.xxx) would require: (1) re-architecting tool-list to be identical across all wave children, (2) cache-telemetry analysis confirming the new fork-cache behaviour holds, (3) prompt-level deny re-enablement to compensate for tool scoping rollback.
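The Q3 decision rule quoted for Path C reduces to a threshold check on the median. The sketch below uses the thresholds and the reported median from the text; the `INCONCLUSIVE` label for the band in between and the function names are assumptions, and the three-element sample passed to `median` is synthetic, not measured data.

```javascript
// Median of a list of numbers (average of the two middle values for even length).
function median(values) {
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Master-plan thresholds from the text: <= 1,500 POSITIVE, >= 3,500 NEGATIVE.
function classifyCacheCreation(medianTokens) {
  if (medianTokens <= 1500) return 'POSITIVE';
  if (medianTokens >= 3500) return 'NEGATIVE';
  return 'INCONCLUSIVE'; // assumption: the text does not name the middle band
}

console.log(classifyCacheCreation(163903)); // the reported median
console.log(classifyCacheCreation(median([1000, 1200, 9000]))); // synthetic values
```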
## Profile system (`--profile`, v4.1.0)
Three built-in model profiles plus operator-defined `<custom>.yaml`. Each profile pins `phase_models` for the six pipeline phases (`brief`, `research`, `plan`, `execute`, `review`, `continue`). Profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` for cost-attribution.
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks (operator-opt-in via `--profile economy`) |
| `balanced` | sonnet | sonnet | opus | sonnet | opus | sonnet | Mixed — opus where reasoning depth pays off (operator-opt-in via `--profile balanced`) |
| `premium` (default) | opus | opus | opus | opus | opus | opus | Maximum quality — Opus on every phase. Default since 2026-05-13 operator request; also the hardcoded resolver default at `lib/profiles/resolver.mjs:145` |
### Lookup order
1. Explicit `--profile <name>` flag passed to the command
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
3. `VOYAGE_PROFILE` environment variable
4. Default `premium` (the hardcoded resolver default at `lib/profiles/resolver.mjs:145`)
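The lookup order above amounts to a first-match-wins chain. In this sketch the final fallback is passed in by the caller rather than hardcoded, and the function name, argument names, and the `my-default` placeholder are hypothetical; it is not the code in `lib/profiles/resolver.mjs`.

```javascript
// Returns the first profile name found, together with where it came from.
function resolveProfileName({ flag, planProfile, env = {}, fallback }) {
  if (flag) return { name: flag, source: '--profile flag' };
  if (planProfile) return { name: planProfile, source: 'plan frontmatter' };
  if (env.VOYAGE_PROFILE) return { name: env.VOYAGE_PROFILE, source: 'VOYAGE_PROFILE' };
  return { name: fallback, source: 'default' };
}

// Frontmatter beats the environment variable when no flag is given.
console.log(
  resolveProfileName({
    planProfile: 'economy',
    env: { VOYAGE_PROFILE: 'balanced' },
    fallback: 'my-default', // placeholder for the resolver's built-in default
  })
);
```

Returning the source alongside the name makes it easy to log why a given profile was chosen, which is useful when cost attribution looks surprising.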
### Custom profiles
Create `lib/profiles/<custom>.yaml` to define a new tier. The validator (`lib/validators/profile-validator.mjs`) enforces: every `phase_models[].phase` must be a known phase enum; every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or one of the canonical short names. Custom profiles override built-ins of the same name (lookup is alphabetical with `<custom>` taking precedence).
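The two validator rules described above can be sketched as follows, assuming the YAML has already been parsed into an object whose `phase_models` is a list of `{phase, model}` entries. The function name and error strings are hypothetical, the phase list is taken from the six phases named in the profile section, and the "canonical short names" branch is omitted because those names are not listed here.

```javascript
// The six pipeline phases named in the profile section.
const KNOWN_PHASES = ['brief', 'research', 'plan', 'execute', 'review', 'continue'];

// Model pattern quoted in the text.
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/;

function validateProfile(profile) {
  const errors = [];
  (profile.phase_models || []).forEach((entry, i) => {
    if (!KNOWN_PHASES.includes(entry.phase)) {
      errors.push(`phase_models[${i}].phase: unknown phase "${entry.phase}"`);
    }
    if (!MODEL_PATTERN.test(String(entry.model))) {
      errors.push(`phase_models[${i}].model: unsupported model "${entry.model}"`);
    }
  });
  return errors;
}

// Hypothetical custom profile with one bad entry.
console.log(
  validateProfile({
    phase_models: [
      { phase: 'plan', model: 'opus' },
      { phase: 'deploy', model: 'opusx' },
    ],
  })
);
```

Note that the word boundary in the pattern means `opus` and `sonnet-4` pass while `opusx` does not.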
Drift between plan-frontmatter `profile:` and step-manifest `profile_used:` emits a `MANIFEST_PROFILE_DRIFT` warning from `plan-validator --strict` (Step 20). Plan remains valid; the warning surfaces accidental tier-mismatch.
## Observability (Stop hook, v4.1.0)
The `Stop` hook in `hooks/hooks.json` runs `hooks/scripts/otel-export.mjs` at session-end. The hook is **opt-in** — when `VOYAGE_EXPORT_MODE` is unset or `off`, no work is done.
| Mode | Output | Endpoint env-var |
|------|--------|------------------|
| `off` (default) | _(no export)_ | — |
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
Endpoint validation: `VOYAGE_OTEL_ALLOW_PRIVATE=1` is required to send to loopback or RFC1918 destinations (CWE-918 SSRF mitigation). Allowlist `lib/exporters/field-allowlist.mjs` redacts records before export (CWE-212). Path validation (`lib/exporters/path-validator.mjs`) rejects symlink + traversal (CWE-22).
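The private-destination rule above can be approximated with a check on the endpoint's hostname. This sketch covers only `localhost`, the IPv6 loopback literal, and IPv4 literals in the loopback and RFC1918 ranges; it does not resolve DNS names, and the function names and return shape are hypothetical rather than taken from the exporter.

```javascript
// True for localhost, [::1], 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
function isPrivateHost(hostname) {
  if (hostname === 'localhost' || hostname === '[::1]') return true;
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(hostname);
  if (!m) return false; // sketch limitation: DNS names are not resolved
  const a = Number(m[1]);
  const b = Number(m[2]);
  return a === 127 || a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Private destinations are allowed only when VOYAGE_OTEL_ALLOW_PRIVATE=1.
function checkEndpoint(endpoint, env) {
  const { hostname } = new URL(endpoint);
  if (isPrivateHost(hostname) && env.VOYAGE_OTEL_ALLOW_PRIVATE !== '1') {
    return { ok: false, reason: `private destination ${hostname} requires VOYAGE_OTEL_ALLOW_PRIVATE=1` };
  }
  return { ok: true };
}

console.log(checkEndpoint('http://127.0.0.1:4318/v1/metrics', {}));
console.log(checkEndpoint('http://127.0.0.1:4318/v1/metrics', { VOYAGE_OTEL_ALLOW_PRIVATE: '1' }));
```

A hostname that merely resolves to a private address would slip past a literal-only check like this one, so a production guard would also need to inspect the resolved address.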
Local Docker Compose stack: `examples/observability/`. Operator docs: `docs/observability.md`. Both pin minimum versions per CVE history (`prom/prometheus:v3.0.1`, `grafana/grafana:11.4.0`, `otel/opentelemetry-collector-contrib:0.115.0`).