chore: WIP marketplace doc adjustments across plugins
Pre-trekexecute snapshot of in-progress CLAUDE.md/SKILL.md edits and extracted docs/ files. Captured as one commit so /trekexecute claude-design can run against a clean working tree.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 0dc7ff485f
commit f460814fe9
26 changed files with 805 additions and 1078 deletions
@ -31,11 +31,11 @@ Legacy bash scripts were removed in v1.0 (available in git history).
## Data storage

```
${CLAUDE_PLUGIN_DATA}/
$CLAUDE_PLUGIN_DATA/
├── sessions.jsonl Compact JSONL, one record per session
├── events.jsonl {ts, session_id, tool_name} per tool call
└── state/
    └── {session_id}.json Live state during active session
    └── <session_id>.json Live state during active session
```

State files are created at SessionStart and deleted at SessionEnd.
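
As an illustration of the layout above, the following sketch serializes one `events.jsonl` record and derives the path of a session's live state file. The helper names (`formatEventLine`, `stateFilePath`, `parseJsonl`) and the ISO-8601 format for `ts` are assumptions made for this example; only the `{ts, session_id, tool_name}` shape and the `state/<session_id>.json` location come from the tree above.

```javascript
// Hypothetical helpers; only the record shape and the paths come from the layout above.

// Serialize one tool-call event as a single JSONL line (compact JSON plus newline).
// The ISO-8601 timestamp format is an assumption.
function formatEventLine(sessionId, toolName, now = new Date()) {
  const record = { ts: now.toISOString(), session_id: sessionId, tool_name: toolName };
  return JSON.stringify(record) + "\n";
}

// Build the path of the live state file for a session.
// dataDir stands in for the value of $CLAUDE_PLUGIN_DATA.
function stateFilePath(dataDir, sessionId) {
  return `${dataDir.replace(/\/+$/, "")}/state/${sessionId}.json`;
}

// Parse JSONL text back into records, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```

Keeping one compact JSON object per line means a writer can append records without rewriting the file, and a reader can process it line by line.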
@ -92,20 +92,3 @@ node --test tests/*.test.mjs
- Conventional Commits: `type(scope): description`
- English for all code, comments, and documentation
- Norwegian for project-internal communication

## Communication patterns

### Linking to local files

When pointing to local files in responses, always use markdown link syntax with a descriptive name:

- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.

Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.

Example:

- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
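
The linking rule above could be applied mechanically with a small helper. This is a minimal sketch, not part of any plugin: `toFileLink` and `toFileLinkList` are hypothetical names, and the check for a leading `/` assumes POSIX-style absolute paths.

```javascript
// Hypothetical helpers illustrating the linking rule; not taken from the repository.

// Turn a display name and an absolute path into a named markdown link.
function toFileLink(name, absolutePath) {
  // Reject "~/..." and relative paths (assumes POSIX-style absolute paths).
  if (!absolutePath.startsWith("/")) {
    throw new Error(`Expected an absolute path, got: ${absolutePath}`);
  }
  // Percent-encode each segment so spaces and similar characters stay valid in the URL.
  const encoded = absolutePath.split("/").map(encodeURIComponent).join("/");
  return `[${name}](file://${encoded})`;
}

// Render several files as a bullet list of named links.
function toFileLinkList(files) {
  return files.map(({ name, path }) => `- ${toFileLink(name, path)}`).join("\n");
}
```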
@ -41,20 +41,3 @@ The pipeline is the source of truth until release:
5. Review is the release gate

Voyage policy: Opus for all sub-agents and orchestrator phases.

## Communication patterns

### Linking to local files

When pointing to local files in responses, always use markdown link syntax with a descriptive name:

- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.

Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.

Example:

- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
@ -50,79 +50,6 @@ Analyzes and optimizes Claude Code configuration across three pillars:
| verifier-agent | Verify results | sonnet | purple | Read, Glob, Grep |
| feature-gap-agent | Context-aware feature recommendations | opus | green | Read, Glob, Grep, Write |

## Deterministic Scanners

Node.js scanners (zero external dependencies), run via `node scanners/scan-orchestrator.mjs <path>`.
Posture CLI: `node scanners/posture.mjs <path> [--json] [--global] [--full-machine] [--output-file path]`.
Scanner CLI: `node scanners/scan-orchestrator.mjs <path> [--global] [--full-machine] [--no-suppress]`.

| Scanner | Prefix | Detects |
|---------|--------|---------|
| `claude-md-linter.mjs` | CML | Structure, length, sections, @imports, duplicates, TODOs |
| `settings-validator.mjs` | SET | Schema, unknown/deprecated keys, type mismatches, permissions |
| `hook-validator.mjs` | HKV | Format, script existence, event validity, timeouts |
| `rules-validator.mjs` | RUL | Glob matching, orphan rules, deprecated fields, unscoped rules |
| `mcp-config-validator.mjs` | MCP | Server types, trust levels, env vars, unknown fields |
| `import-resolver.mjs` | IMP | Broken @imports, circular refs, deep chains, tilde paths |
| `conflict-detector.mjs` | CNF | Settings conflicts, permission contradictions, hook duplicates |
| `feature-gap-scanner.mjs` | GAP | 25 feature checks across 4 tiers — shown as opportunities, not grades |
| `token-hotspots.mjs` | TOK | Cache-breaking volatile content, redundant tool permissions, deep import chains, oversized cascade, bloated SKILL.md descriptions, MCP tool-schema budget (Opus 4.7 patterns) |
| `cache-prefix-scanner.mjs` | CPS | Volatile content in lines 31–150 of CLAUDE.md cascade (beyond Pattern A's top-30 window) |
| `disabled-in-schema-scanner.mjs` | DIS | Tools listed in BOTH `permissions.deny` AND `permissions.allow` — deny wins, allow entries are dead config |
| `collision-scanner.mjs` | COL | Cross-plugin skill name collisions (low); user-vs-plugin overlaps (medium); `details.namespaces` payload |

### Scanner Lib (`scanners/lib/`)

| Module | Purpose |
|--------|---------|
| `severity.mjs` | Severity constants, risk scoring, verdict logic, `WEIGHTS` named export (v5 F3) |
| `output.mjs` | Finding objects (CA-XXX-NNN format), scanner results, envelope, optional `details` payload (v5 N6) |
| `file-discovery.mjs` | Config file discovery: single-path, multi-path (`discoverConfigFilesMulti`), full-machine (`discoverFullMachinePaths`) |
| `yaml-parser.mjs` | Frontmatter parsing, JSON parsing, @import/section extraction |
| `string-utils.mjs` | Line counting, truncation, similarity, key extraction |
| `scoring.mjs` | Severity-weighted `scoreByArea` (v5 F3), health scorecard, dedup-by-area (v5 N3), `scoringVersion: 'v5'` |
| `backup.mjs` | Backup creation, manifest parsing, checksum verification |
| `diff-engine.mjs` | Drift diffing: diffEnvelopes(), formatDiffReport() |
| `baseline.mjs` | Baseline save/load/list/delete for drift detection |
| `report-generator.mjs` | Unified markdown reports: posture, drift, plugin health |
| `suppression.mjs` | .config-audit-ignore parsing, finding suppression, audit trail |
| `active-config-reader.mjs` | Read-only inventory: readActiveConfig(), detectGitRoot(), walkClaudeMdCascade(), readClaudeJsonProjectSlice() (longest-prefix match), enumeratePlugins(), enumerateSkills(), readActiveHooks(), readActiveMcpServers() (with cache → package.json tool-count fallback), estimateTokens() (v5: `'mcp'` kind = 500 + toolCount × 200) |
| `tokenizer-api.mjs` | Anthropic `count_tokens` wrapper for `--accurate-tokens` (v5 N5); 5s AbortController timeout, exponential 429 backoff, key masking |
| `humanizer.mjs` | Plain-language output translator (v5.1.0): `humanizeFinding`, `humanizeFindings`, `humanizeEnvelope`, `computeRelevanceContext`. Pure functions; never mutate inputs. Adds `userImpactCategory`, `userActionLanguage`, `relevanceContext` fields and replaces title/description/recommendation when a translation exists. Bypassed by `--raw` and `--json` paths. |
| `humanizer-data.mjs` | TRANSLATIONS table for 13 scanner prefixes (CML/SET/HKV/RUL/MCP/IMP/CNF/COL/TOK/CPS/DIS/GAP/PLH). Three-step lookup: exact title → regex pattern → `_default` → fall through to original |

### Action Engines (`scanners/`)

| Module | Purpose |
|--------|---------|
| `fix-engine.mjs` | planFixes(), applyFixes(), verifyFixes() — 9 fix types |
| `rollback-engine.mjs` | listBackups(), restoreBackup(), deleteBackup() |
| `fix-cli.mjs` | CLI: `node fix-cli.mjs <path> [--apply] [--json] [--global]` |
| `drift-cli.mjs` | CLI: `node drift-cli.mjs <path> [--save] [--baseline name] [--json]` |
| `whats-active.mjs` | CLI: `node whats-active.mjs <path> [--json] [--verbose] [--suggest-disables]` — read-only active-config inventory |
| `token-hotspots-cli.mjs` | CLI: `node token-hotspots-cli.mjs <path> [--json] [--global] [--output-file path] [--accurate-tokens] [--with-telemetry-recipe]` — Opus-4.7 token hotspots ranking with optional API calibration |
| `manifest.mjs` | CLI: `node manifest.mjs <path> [--json]` — ranked system-prompt token-source table (v5 N2) |

### Standalone Scanner

| Module | Prefix | Purpose |
|--------|--------|---------|
| `plugin-health-scanner.mjs` | PLH | Plugin structure, frontmatter, cross-plugin conflicts (runs independently) |
| `self-audit.mjs` | — | Runs all scanners + plugin health on this plugin itself |

## Knowledge Base (`knowledge/`)

| File | Content |
|------|---------|
| `claude-code-capabilities.md` | Feature register: 18 config surfaces, Anthropic guidance, relevance table |
| `configuration-best-practices.md` | Per-layer best practices (v5: Opus 4.7 cache-stability guidance replaces Sonnet-era 200-line rule) |
| `anti-patterns.md` | Common mistakes mapped to scanner IDs |
| `hook-events-reference.md` | All 26 hook events with details |
| `feature-evolution.md` | Feature timeline for staleness detection |
| `gap-closure-templates.md` | Config-specific templates for closing gaps |
| `opus-4.7-patterns.md` | Token-cost dynamics for Opus 4.7 era — patterns powering the TOK scanner |
| `cache-telemetry-recipe.md` | Manual `jq` recipe for verifying prompt-cache hit rate from session transcripts (v5 M7) |

## Hooks

| Event | Script | Purpose |
@ -132,56 +59,20 @@ Scanner CLI: `node scanners/scan-orchestrator.mjs <path> [--global] [--full-mach
| SessionStart | `session-start.mjs` | Checks for active (unfinished) sessions |
| Stop | `stop-session-reminder.mjs` | Reminds about current session phase |

## Plain-Language Output (v5.1.0)
## Reference docs (read on demand)

Default output of all 18 commands routes through `humanizeEnvelope` from `lib/humanizer.mjs`. Findings are decorated with three additive fields and may have title/description/recommendation replaced when a translation exists.
- **Scanner inventory, lib modules, action engines, knowledge base:** `docs/scanner-internals.md`
- **Plain-language output (v5.1.0), humanizer vocabularies, output modes:** `docs/humanizer.md`

### Output modes
## Plain-Language Output (v5.1.0) — summary

| Flag | Behavior |
|------|----------|
| (default, no flag) | Plain-language: humanizer applied, findings group by user-impact, titles lead with prose. Self-audit terminal render also humanized. |
| `--raw` | Byte-stable v5.0.0 verbatim — humanizer bypassed, technical IDs and severity-only labels. For tooling that scrapes stderr from v5.0.0. |
| `--json` | Unchanged from v5.0.0 — humanizer bypassed, byte-stable JSON envelope. Always preferred for programmatic consumption over `--raw`. |
| `--output-file <path>` | Writes raw v5.0.0-shape JSON (humanizer bypassed). Posture-specific. |
Default output of all 18 commands routes through `humanizeEnvelope` from `lib/humanizer.mjs`. Findings get three decorated fields:

`--raw` is threaded through every CLI: `posture.mjs`, `scan-orchestrator.mjs`, `token-hotspots-cli.mjs`, `manifest.mjs`, `whats-active.mjs`, `fix-cli.mjs`, `drift-cli.mjs`, `self-audit.mjs`.
- `userImpactCategory` — Configuration mistake / Conflict / Wasted tokens / Dead config / Missed opportunity
- `userActionLanguage` — Fix this now / Fix soon / Fix when convenient / Optional cleanup / FYI (derived from severity)
- `relevanceContext` — `affects-everyone` (default) / `affects-this-machine-only` (`*.local.*` files) / `test-fixture-no-impact`

### Vocabularies

User-impact category (added to each finding as `userImpactCategory`, derived from scanner prefix):

| Label | Scanners |
|-------|----------|
| Configuration mistake | CML, SET, HKV, RUL, MCP, IMP, PLH |
| Conflict | CNF, COL |
| Wasted tokens | TOK, CPS |
| Dead config | DIS |
| Missed opportunity | GAP |

Action language (added to each finding as `userActionLanguage`, derived from severity):

| Severity | Phrase |
|----------|--------|
| critical | Fix this now |
| high | Fix soon |
| medium | Fix when convenient |
| low | Optional cleanup |
| info | FYI |

Relevance context (added to each finding as `relevanceContext`, computed from finding's file path):

| Value | When |
|-------|------|
| `test-fixture-no-impact` | Path contains `/tests/fixtures/` or `/test/fixtures/` |
| `affects-this-machine-only` | Basename matches `*.local.*` (e.g., `settings.local.json`) |
| `affects-everyone` | Default — assumed shared/committed config |

### Wave 5 lessons

- Posture's stderr scorecard is rendered prose-side and is not part of the JSON envelope; `humanized.areas[].titleHumanized` referenced by command templates lives only in the prose render.
- Posture's `--output-file` writes raw v5.0.0-shape JSON because `posture.mjs` does not call `humanizeEnvelope`. If session-files should later be humanized, posture needs its own humanize pass — out of v5.1.0 scope.
- The default-output snapshot at `tests/snapshots/default-output/posture.json` is frozen — change requires `UPDATE_SNAPSHOT=1` plus intent confirmation.
`--raw` bypasses the humanizer for byte-stable v5.0.0 output. `--json` is also byte-stable. Full detail and Wave 5 lessons: `docs/humanizer.md`.

## Suppressions

@ -225,20 +116,3 @@ node --test 'tests/**/*.test.mjs'
- Session directories accumulate — use `/config-audit cleanup` to manage
- Scanners run on Node.js >= 18 (uses node:test, node:fs/promises)
- Plugin CLAUDE.md files in node_modules should be excluded via scope

## Communication patterns

### Linking to local files

When pointing to local files in responses, always use markdown link syntax with a descriptive name:

- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.

Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.

Example:

- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
52
plugins/config-audit/docs/humanizer.md
Normal file
@ -0,0 +1,52 @@
# Config-Audit — Plain-language output (v5.1.0)

Imported from `CLAUDE.md` via pointer.

Default output of all 18 commands routes through `humanizeEnvelope` from `lib/humanizer.mjs`. Findings are decorated with three additive fields and may have title/description/recommendation replaced when a translation exists.

## Output modes

| Flag | Behavior |
|------|----------|
| (default, no flag) | Plain-language: humanizer applied, findings group by user-impact, titles lead with prose. Self-audit terminal render also humanized. |
| `--raw` | Byte-stable v5.0.0 verbatim — humanizer bypassed, technical IDs and severity-only labels. For tooling that scrapes stderr from v5.0.0. |
| `--json` | Unchanged from v5.0.0 — humanizer bypassed, byte-stable JSON envelope. Always preferred for programmatic consumption over `--raw`. |
| `--output-file <path>` | Writes raw v5.0.0-shape JSON (humanizer bypassed). Posture-specific. |

`--raw` is threaded through every CLI: `posture.mjs`, `scan-orchestrator.mjs`, `token-hotspots-cli.mjs`, `manifest.mjs`, `whats-active.mjs`, `fix-cli.mjs`, `drift-cli.mjs`, `self-audit.mjs`.

## Vocabularies

User-impact category (added to each finding as `userImpactCategory`, derived from scanner prefix):

| Label | Scanners |
|-------|----------|
| Configuration mistake | CML, SET, HKV, RUL, MCP, IMP, PLH |
| Conflict | CNF, COL |
| Wasted tokens | TOK, CPS |
| Dead config | DIS |
| Missed opportunity | GAP |

Action language (added to each finding as `userActionLanguage`, derived from severity):

| Severity | Phrase |
|----------|--------|
| critical | Fix this now |
| high | Fix soon |
| medium | Fix when convenient |
| low | Optional cleanup |
| info | FYI |

Relevance context (added to each finding as `relevanceContext`, computed from finding's file path):

| Value | When |
|-------|------|
| `test-fixture-no-impact` | Path contains `/tests/fixtures/` or `/test/fixtures/` |
| `affects-this-machine-only` | Basename matches `*.local.*` (e.g., `settings.local.json`) |
| `affects-everyone` | Default — assumed shared/committed config |
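
The three vocabularies above can be read as plain lookups. The sketch below transcribes the tables into code to show how one finding might be decorated without mutating it. It is not the `humanizer.mjs` implementation: `decorateFinding` and `relevanceFor` are hypothetical names, the `id`, `severity`, and `file` field names are assumed, and checking the fixture rule before the `*.local.*` rule is an assumption based on the table order.

```javascript
// Vocabulary tables transcribed from the documentation above.
const IMPACT_BY_PREFIX = {
  CML: "Configuration mistake", SET: "Configuration mistake", HKV: "Configuration mistake",
  RUL: "Configuration mistake", MCP: "Configuration mistake", IMP: "Configuration mistake",
  PLH: "Configuration mistake",
  CNF: "Conflict", COL: "Conflict",
  TOK: "Wasted tokens", CPS: "Wasted tokens",
  DIS: "Dead config",
  GAP: "Missed opportunity",
};

const ACTION_BY_SEVERITY = {
  critical: "Fix this now",
  high: "Fix soon",
  medium: "Fix when convenient",
  low: "Optional cleanup",
  info: "FYI",
};

// Hypothetical helper: derive the relevance value from a file path.
// Rule order (fixtures first) is an assumption.
function relevanceFor(filePath) {
  if (filePath.includes("/tests/fixtures/") || filePath.includes("/test/fixtures/")) {
    return "test-fixture-no-impact";
  }
  const basename = filePath.split("/").pop();
  if (basename.includes(".local.")) return "affects-this-machine-only";
  return "affects-everyone";
}

// Hypothetical helper: return a decorated copy; the input object is left untouched.
// Assumes finding IDs follow the CA-XXX-NNN format, where XXX is the scanner prefix.
function decorateFinding(finding) {
  const prefix = finding.id.split("-")[1];
  return {
    ...finding,
    userImpactCategory: IMPACT_BY_PREFIX[prefix],
    userActionLanguage: ACTION_BY_SEVERITY[finding.severity],
    relevanceContext: relevanceFor(finding.file ?? ""),
  };
}
```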

## Wave 5 lessons

- Posture's stderr scorecard is rendered prose-side and is not part of the JSON envelope; `humanized.areas[].titleHumanized` referenced by command templates lives only in the prose render.
- Posture's `--output-file` writes raw v5.0.0-shape JSON because `posture.mjs` does not call `humanizeEnvelope`. If session-files should later be humanized, posture needs its own humanize pass — out of v5.1.0 scope.
- The default-output snapshot at `tests/snapshots/default-output/posture.json` is frozen — change requires `UPDATE_SNAPSHOT=1` plus intent confirmation.
76
plugins/config-audit/docs/scanner-internals.md
Normal file
@ -0,0 +1,76 @@
# Config-Audit — Scanner internals

Detailed scanner inventory, lib modules, action engines, knowledge base. Imported from `CLAUDE.md` via pointer.

## Deterministic Scanners

Node.js scanners (zero external dependencies), run via `node scanners/scan-orchestrator.mjs <path>`.
Posture CLI: `node scanners/posture.mjs <path> [--json] [--global] [--full-machine] [--output-file path]`.
Scanner CLI: `node scanners/scan-orchestrator.mjs <path> [--global] [--full-machine] [--no-suppress]`.

| Scanner | Prefix | Detects |
|---------|--------|---------|
| `claude-md-linter.mjs` | CML | Structure, length, sections, @imports, duplicates, TODOs |
| `settings-validator.mjs` | SET | Schema, unknown/deprecated keys, type mismatches, permissions |
| `hook-validator.mjs` | HKV | Format, script existence, event validity, timeouts |
| `rules-validator.mjs` | RUL | Glob matching, orphan rules, deprecated fields, unscoped rules |
| `mcp-config-validator.mjs` | MCP | Server types, trust levels, env vars, unknown fields |
| `import-resolver.mjs` | IMP | Broken @imports, circular refs, deep chains, tilde paths |
| `conflict-detector.mjs` | CNF | Settings conflicts, permission contradictions, hook duplicates |
| `feature-gap-scanner.mjs` | GAP | 25 feature checks across 4 tiers — shown as opportunities, not grades |
| `token-hotspots.mjs` | TOK | Cache-breaking volatile content, redundant tool permissions, deep import chains, oversized cascade, bloated SKILL.md descriptions, MCP tool-schema budget (Opus 4.7 patterns) |
| `cache-prefix-scanner.mjs` | CPS | Volatile content in lines 31–150 of CLAUDE.md cascade (beyond Pattern A's top-30 window) |
| `disabled-in-schema-scanner.mjs` | DIS | Tools listed in BOTH `permissions.deny` AND `permissions.allow` — deny wins, allow entries are dead config |
| `collision-scanner.mjs` | COL | Cross-plugin skill name collisions (low); user-vs-plugin overlaps (medium); `details.namespaces` payload |
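
To make the DIS row concrete: the check amounts to intersecting the two permission lists. The sketch below is a simplified illustration and not the `disabled-in-schema-scanner.mjs` code. The function name is hypothetical, and it assumes that rules are compared as exact strings.

```javascript
// Hypothetical sketch of the DIS idea: allow entries that also appear in deny
// are dead config, because deny wins.
// Assumption: rules are compared as exact strings.
function findDeadAllowEntries(settings) {
  const allow = settings?.permissions?.allow ?? [];
  const deny = new Set(settings?.permissions?.deny ?? []);
  return allow.filter((rule) => deny.has(rule));
}
```

Given `allow: ["Read", "Bash(rm:*)"]` and `deny: ["Bash(rm:*)", "WebFetch"]`, the function returns only `"Bash(rm:*)"`, the entry that can never take effect.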

## Scanner Lib (`scanners/lib/`)

| Module | Purpose |
|--------|---------|
| `severity.mjs` | Severity constants, risk scoring, verdict logic, `WEIGHTS` named export (v5 F3) |
| `output.mjs` | Finding objects (CA-XXX-NNN format), scanner results, envelope, optional `details` payload (v5 N6) |
| `file-discovery.mjs` | Config file discovery: single-path, multi-path (`discoverConfigFilesMulti`), full-machine (`discoverFullMachinePaths`) |
| `yaml-parser.mjs` | Frontmatter parsing, JSON parsing, @import/section extraction |
| `string-utils.mjs` | Line counting, truncation, similarity, key extraction |
| `scoring.mjs` | Severity-weighted `scoreByArea` (v5 F3), health scorecard, dedup-by-area (v5 N3), `scoringVersion: 'v5'` |
| `backup.mjs` | Backup creation, manifest parsing, checksum verification |
| `diff-engine.mjs` | Drift diffing: diffEnvelopes(), formatDiffReport() |
| `baseline.mjs` | Baseline save/load/list/delete for drift detection |
| `report-generator.mjs` | Unified markdown reports: posture, drift, plugin health |
| `suppression.mjs` | .config-audit-ignore parsing, finding suppression, audit trail |
| `active-config-reader.mjs` | Read-only inventory: readActiveConfig(), detectGitRoot(), walkClaudeMdCascade(), readClaudeJsonProjectSlice() (longest-prefix match), enumeratePlugins(), enumerateSkills(), readActiveHooks(), readActiveMcpServers() (with cache → package.json tool-count fallback), estimateTokens() (v5: `'mcp'` kind = 500 + toolCount × 200) |
| `tokenizer-api.mjs` | Anthropic `count_tokens` wrapper for `--accurate-tokens` (v5 N5); 5s AbortController timeout, exponential 429 backoff, key masking |
| `humanizer.mjs` | Plain-language output translator (v5.1.0): `humanizeFinding`, `humanizeFindings`, `humanizeEnvelope`, `computeRelevanceContext`. Pure functions; never mutate inputs. Adds `userImpactCategory`, `userActionLanguage`, `relevanceContext` fields and replaces title/description/recommendation when a translation exists. Bypassed by `--raw` and `--json` paths. |
| `humanizer-data.mjs` | TRANSLATIONS table for 13 scanner prefixes (CML/SET/HKV/RUL/MCP/IMP/CNF/COL/TOK/CPS/DIS/GAP/PLH). Three-step lookup: exact title → regex pattern → `_default` → fall through to original |
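
The three-step lookup named in the `humanizer-data.mjs` row can be sketched as follows. The table shape (`exact`, `patterns`, `_default` keys under each prefix), the sample entries, and the function name `lookupTranslation` are all assumptions made for illustration; the real TRANSLATIONS structure may differ. A `null` result stands for "fall through to original".

```javascript
// Hypothetical table shape and sample entries; not the real TRANSLATIONS data.
const EXAMPLE_TRANSLATIONS = {
  DIS: {
    exact: {
      "Tool listed in both allow and deny": { title: "This permission has no effect" },
    },
    patterns: [
      {
        pattern: /^Denied tool .+ is also allowed$/,
        translation: { title: "A blocked tool is also on the allow list" },
      },
    ],
    _default: { title: "Some permission entries are unused" },
  },
};

// Resolve a translation: exact title, then regex pattern, then _default.
// Returns null when nothing applies, so the caller keeps the original text.
function lookupTranslation(table, prefix, title) {
  const entry = table[prefix];
  if (!entry) return null;
  const exact = entry.exact ?? {};
  if (Object.prototype.hasOwnProperty.call(exact, title)) return exact[title];
  const hit = (entry.patterns ?? []).find(({ pattern }) => pattern.test(title));
  if (hit) return hit.translation;
  return entry._default ?? null;
}
```

Ordering the steps from most to least specific lets a scanner ship a broad `_default` message while still overriding individual findings.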

## Action Engines (`scanners/`)

| Module | Purpose |
|--------|---------|
| `fix-engine.mjs` | planFixes(), applyFixes(), verifyFixes() — 9 fix types |
| `rollback-engine.mjs` | listBackups(), restoreBackup(), deleteBackup() |
| `fix-cli.mjs` | CLI: `node fix-cli.mjs <path> [--apply] [--json] [--global]` |
| `drift-cli.mjs` | CLI: `node drift-cli.mjs <path> [--save] [--baseline name] [--json]` |
| `whats-active.mjs` | CLI: `node whats-active.mjs <path> [--json] [--verbose] [--suggest-disables]` — read-only active-config inventory |
| `token-hotspots-cli.mjs` | CLI: `node token-hotspots-cli.mjs <path> [--json] [--global] [--output-file path] [--accurate-tokens] [--with-telemetry-recipe]` — Opus-4.7 token hotspots ranking with optional API calibration |
| `manifest.mjs` | CLI: `node manifest.mjs <path> [--json]` — ranked system-prompt token-source table (v5 N2) |

## Standalone Scanner

| Module | Prefix | Purpose |
|--------|--------|---------|
| `plugin-health-scanner.mjs` | PLH | Plugin structure, frontmatter, cross-plugin conflicts (runs independently) |
| `self-audit.mjs` | — | Runs all scanners + plugin health on this plugin itself |

## Knowledge Base (`knowledge/`)

| File | Content |
|------|---------|
| `claude-code-capabilities.md` | Feature register: 18 config surfaces, Anthropic guidance, relevance table |
| `configuration-best-practices.md` | Per-layer best practices (v5: Opus 4.7 cache-stability guidance replaces Sonnet-era 200-line rule) |
| `anti-patterns.md` | Common mistakes mapped to scanner IDs |
| `hook-events-reference.md` | All 26 hook events with details |
| `feature-evolution.md` | Feature timeline for staleness detection |
| `gap-closure-templates.md` | Config-specific templates for closing gaps |
| `opus-4.7-patterns.md` | Token-cost dynamics for Opus 4.7 era — patterns powering the TOK scanner |
| `cache-telemetry-recipe.md` | Manual `jq` recipe for verifying prompt-cache hit rate from session transcripts (v5 M7) |
@ -63,20 +63,3 @@ node --test plugins/graceful-handoff/tests/
- v1.0.0 (2026-04-19): initial declarative command
- v2.0.0 (2026-05-01): skill architecture + JSON pipeline + 3 hooks + auto-trigger (BREAKING)
- v2.1.0 (2026-05-01): model-aware context window with a 4-step resolution chain (used_percentage → payload-size → model-map → 1M default). Fixes premature auto-handoff on Opus 4.7

## Communication patterns

### Linking to local files

When pointing to local files in responses, always use markdown link syntax with a descriptive name:

- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.

Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.

Example:

- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
@ -45,20 +45,3 @@ If `keep-coding-instructions` is removed or set to `false`, Claude Code will str
- Per-plugin variants (code-focused, deep-technical, etc.) — would belong in a future v1.1 if there's real demand
- Forcing the style on other plugins — it remains opt-in. Other plugins may reference it in their READMEs.
- Translation of the style file itself into Norwegian — defeats the purpose of language-agnostic instruction

## Communication patterns

### Linking to local files

When pointing to local files in responses, always use markdown link syntax with a descriptive name:

- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.

Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.

Example:

- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
@ -98,20 +98,3 @@ All content commands (post, quick, react, pipeline, first-post, video, multiplat
5. Topic must align with user's 5 core expertise areas (360Brew signal)
6. Topic rotation: no back-to-back same pillar, no pillar >50% in 14 days (warn-only)
7. Progressive onboarding: personalization score hidden until 3+ posts; voice guardian suppressed until 5+ voice samples

## Communication patterns

### Linking to local files

When pointing to local files in responses, always use markdown link syntax with a descriptive name:

- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
- Always use absolute paths. Never `~/` or relative paths.
- For multiple files, render as a bullet list of named markdown links.

Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.

Example:

- [Brief](file:///Users/ktg/.../brief.html)
- [Research summary](file:///Users/ktg/.../research/summary.md)
@ -2,167 +2,7 @@

Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1822+ unit, integration, and end-to-end tests (`tests/e2e/` covers the multi-hook attack chain, multi-session state simulation, and the full scan-orchestrator pipeline); mutation-testing coverage not published.

**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):

1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15). `info` findings are observability-only — counted in OWASP aggregates but contribute zero to risk_score, verdict, and riskBand (B3, v7.2.0 — was undocumented pre-7.2.0). See `severity.mjs` JSDoc for full contract.
2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).

See `docs/security-hardening-guide.md` §6 for the calibration story.
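
To illustrate what "severity-dominated, log-scaled within tier" can mean, here is a rough sketch. It is not the formula in `severity.mjs`: the tier bands and verdict cutoffs are taken from item 1 above, but the log2 interpolation, the saturation count of 32, the function names, and the `"ALLOW"` label for scores below 15 are assumptions chosen only for this example.

```javascript
// Tier bands from the description above, ordered from most to least severe.
const TIERS = [
  { severity: "critical", lo: 70, hi: 95 },
  { severity: "high", lo: 40, hi: 65 },
  { severity: "medium", lo: 15, hi: 35 },
  { severity: "low", lo: 1, hi: 11 },
];

// Assumption: the band tops out at 32 findings of the dominant severity.
const SATURATION = 32;

// counts maps severity name to number of findings, e.g. { high: 3, info: 10 }.
function riskScore(counts) {
  for (const { severity, lo, hi } of TIERS) {
    const n = counts[severity] ?? 0;
    if (n > 0) {
      // The worst severity present picks the band; log2(n) places the score inside it.
      const t = Math.min(1, Math.log2(n) / Math.log2(SATURATION));
      return Math.round(lo + (hi - lo) * t);
    }
  }
  return 0; // no findings, or info-only findings, contribute nothing
}

function verdict(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "ALLOW"; // assumed label; the source names only BLOCK and WARNING
}
```

Under these assumptions a single critical finding always outranks any number of lower-severity findings, while many findings of one severity still move the score toward the top of its band.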

**v7.1.1 — Scan-report narrative coherence (patch).** Three coordinated
edits address the whiplash symptom that survived v7.0.0 (numbers fixed,
narrative still walked findings back as "false positive" in prose):
(a) `agents/skill-scanner-agent.md` Step 2.5 mandates context-first
severity assignment — every signal has exactly one disposition (suppressed
OR reported), no per-finding walk-back; (b) `templates/unified-report.md`
gains a `### Narrative Audit` block in Executive Summary surfacing
`summary.narrative_audit.suppressed_findings.{count, by_category}` from
the agent's trailing JSON; (c) both files updated from stale v1
risk-formula constants to the v2 model that has been authoritative in
`severity.mjs` since v7.0.0. Counter is distinct from the existing
top-level `output.suppressed` (`.llm-security-ignore` rule integer).
Out-of-scope but flagged: `commands/scan.md:113-114` retains the v1
formula; resolution deferred to Batch B.
|
||||
|
||||
**v7.3.0 — MCP cumulative-drift baseline (in progress, Wave C of Batch C).**
|
||||
Closes E14 from `docs/critical-review-2026-04-20.md`. The
|
||||
`mcp-description-cache.mjs` schema gains a sticky `baseline` slot per
|
||||
tool plus a 10-event rolling `history` array (FIFO). Cumulative drift =
|
||||
`levenshtein(current, baseline) / max(|current|, |baseline|)`; when the
|
||||
ratio crosses `mcp.cumulative_drift_threshold` (default 0.25),
|
||||
`post-mcp-verify.mjs` emits a separate MEDIUM `mcp-cumulative-drift`
|
||||
advisory. The existing per-update >10% drift signal is unchanged — both
|
||||
fire independently. Slow-burn rug-pulls that keep each update under the
|
||||
per-update threshold but cumulatively diverge from baseline are now
|
||||
caught. Baseline survives the 7-day TTL purge so detection persists
|
||||
across the full window. New `/security mcp-baseline-reset` slash command
|
||||
(plus `scanners/mcp-baseline-reset.mjs` CLI: `--list`, `--target <tool>`,
|
||||
or no-args clear-all) lets the user acknowledge a legitimate MCP server
|
||||
upgrade — clearing the baseline causes the next call to seed a fresh
|
||||
one from the incoming description; description, firstSeen, lastSeen, and
|
||||
history are preserved for audit. `LLM_SECURITY_MCP_CACHE_FILE` env var
|
||||
overrides the cache path for end-to-end testing without polluting the
|
||||
user's real `~/.cache/llm-security/mcp-descriptions.json`.
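
The cumulative-drift ratio described above is simple enough to sketch directly. This is an illustrative version under stated assumptions: the function names `cumulativeDrift` and `exceedsCumulativeThreshold` are hypothetical, and the actual `mcp-description-cache.mjs` implementation may differ in detail.

```javascript
// Two-row dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// levenshtein(current, baseline) / max(|current|, |baseline|), as in the note.
function cumulativeDrift(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// Hypothetical helper: the default mirrors mcp.cumulative_drift_threshold (0.25).
function exceedsCumulativeThreshold(current, baseline, threshold = 0.25) {
  return cumulativeDrift(current, baseline) >= threshold;
}
```

Because the comparison is always against the sticky baseline rather than the previous description, a series of small edits accumulates in the ratio even when each individual update stays below the per-update signal.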
|
||||
|
||||
**v7.3.0 — Env-var deprecation warnings (D3 of Batch C, Wave D).**
|
||||
Closes 8.7 from `.claude/projects/2026-04-29-batch-c-scope-finalize/plan.md`.
|
||||
`scanners/lib/policy-loader.mjs` exports a new helper
|
||||
`getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue)` —
|
||||
env still wins per Preferences (existing behaviour), but when both the
|
||||
env-var AND the `policy.json` key are explicitly set, the helper emits a
|
||||
single per-process stderr line: `[llm-security] Deprecation: env-var
|
||||
${ENVVAR} will be removed in v8.0.0; policy.json key ${section}.${key}
|
||||
also set — env wins for now. Suppress with LLM_SECURITY_DEPRECATION_QUIET=1.`
|
||||
Module-scoped `Set` dedupes per env-var name across call-sites. Four
|
||||
overlapping vars are wired through the helper:
|
||||
`LLM_SECURITY_INJECTION_MODE` ↔ `injection.mode` (in
|
||||
`pre-prompt-inject-scan.mjs`), `LLM_SECURITY_TRIFECTA_MODE` ↔
|
||||
`trifecta.mode` and `LLM_SECURITY_ESCALATION_WINDOW` ↔
|
||||
`trifecta.escalation_window` (in `post-session-guard.mjs`),
|
||||
`LLM_SECURITY_AUDIT_LOG` ↔ `audit.log_path` (in
|
||||
`scanners/lib/audit-trail.mjs`). `DEFAULT_POLICY` gains
|
||||
`trifecta.escalation_window: 5` to close the gap noted in the plan
|
||||
revisions table (M10). Env-only vars without policy.json equivalents
|
||||
(`LLM_SECURITY_UPDATE_CHECK`, `LLM_SECURITY_PRECOMPACT_MODE`,
|
||||
`LLM_SECURITY_PRECOMPACT_MAX_BYTES`, `LLM_SECURITY_IDE_ROOTS`,
|
||||
`LLM_SECURITY_MCP_CACHE_FILE`) are unchanged — they emit no
|
||||
deprecation signal because there is nothing to deprecate yet.
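
The precedence and dedupe rules above can be sketched as follows. This is a hypothetical reconstruction, not the code in `policy-loader.mjs`: the `createPolicyGetter` factory, the injected `policy`, `env`, and `warn` arguments, and the shortened message text are assumptions made so the example is self-contained.

```javascript
// Hypothetical factory: the documented helper takes only the four arguments
// (section, key, envVarName, defaultValue) and reads policy and env itself.
function createPolicyGetter(policy, env, warn) {
  const warned = new Set(); // dedupes warnings per env-var name
  return function getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue) {
    const envValue = env[envVarName];
    const policyValue = policy?.[section]?.[key];
    const envSet = envValue !== undefined && envValue !== '';
    const policySet = policyValue !== undefined;
    const quiet = env.LLM_SECURITY_DEPRECATION_QUIET === '1';
    if (envSet && policySet && !quiet && !warned.has(envVarName)) {
      warned.add(envVarName);
      // Message abbreviated from the one quoted in the release note.
      warn(`[llm-security] Deprecation: env-var ${envVarName} will be removed in v8.0.0; ` +
        `policy.json key ${section}.${key} also set, env wins for now.`);
    }
    if (envSet) return envValue; // env still wins
    if (policySet) return policyValue;
    return defaultValue;
  };
}
```

A caller would pass something like `createPolicyGetter(loadedPolicy, process.env, (m) => process.stderr.write(m + '\n'))`, where `loadedPolicy` stands in for whatever the loader has parsed.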
|
||||
|
||||
**v7.5.0 — Playground (additive surface, no scanner/hook behavior changes).**
Single-file SPA at `playground/llm-security-playground.html` (~10 200 lines)
for onboarding, demos, and workshop use without a Claude Code installation.
Parser + renderer for all 18 `produces_report=true` commands in `CATALOG`.
State lives primarily in IndexedDB (`llm-security-playground-v1`) with a
localStorage fallback, a cycle-free Proxy + EventTarget store, and
microtask-batched rendering. Theme bootstrap with FOUC prevention.
4 surfaces: onboarding (5 groups) → home (3 tracks) → catalog (20 commands)
⇄ project (reports / overview / context / export). The demo state ships
three projects inline; `dft-komplett-demo` has all 18 reports pre-parsed for
click-through. Vendor-synced design system under
`playground/vendor/playground-design-system/` (checksum-locked via
`MANIFEST.json`, never edited directly). Test fixtures under
`playground/test-fixtures/` (one markdown file per command) are the contract
anchor for parser development. Screenshots in `playground/screenshots/v7.5.0/`.
Window globals exposed for testing/automation: `__store`, `__navigate`,
`__loadDemoState`, `__scheduleRender`, `__PARSERS`, `__RENDERERS`, `__CATALOG`,
`__inferVerdict`, `__inferKeyStats`, `__renderPageShell`, `__handlePasteImport`.
Includes a fix to the `normalizeVerdictText` regex order: GO-WITH-CONDITIONS
is checked before GO so that a conditional verdict does not collapse to ALLOW.
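
The regex-ordering fix mentioned for `normalizeVerdictText` is easiest to see in a small example. The sketch below is not the playground's code: the rule table, the NO-GO entry, and the WARNING/BLOCK mappings are assumptions chosen only to show why the specific pattern has to be tested before the general one.

```javascript
// Order matters: /\bGO\b/ also matches the "GO" inside "GO-WITH-CONDITIONS"
// and inside "NO-GO", so the more specific patterns must come first.
// The target verdict names for the first two rules are assumed.
const VERDICT_RULES = [
  [/\bNO[-\s]?GO\b/i, 'BLOCK'],
  [/\bGO[-\s]WITH[-\s]CONDITIONS\b/i, 'WARNING'],
  [/\bGO\b/i, 'ALLOW'],
];

function normalizeVerdictText(text) {
  for (const [pattern, verdict] of VERDICT_RULES) {
    if (pattern.test(text)) return verdict;
  }
  return null; // no recognizable verdict in the text
}
```

With the last two rules swapped, `Verdict: GO-WITH-CONDITIONS` would match the bare `GO` rule first and be reported as ALLOW, which is the collapse the release note describes.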
|
||||
|
||||
**v7.6.0 — Playground Tier 3 reference case (additive surface, no
scanner/hook behavior changes).** The playground is now a visually and
structurally complete reference implementation for the
`shared/playground-design-system/` Tier 3 supplement. 8 new Tier 3
components are integrated into the 18 report renderers: `tfa-flow` +
`tfa-leg` + `tfa-arrow` (lethal-trifecta chain built from `<button>`
elements with ARIA group/aria-label) in `renderScan` + `renderDeepScan`;
`mat-ladder` + `mat-step` (5-step maturity ladder with thresholds
0/25/50/75/95% PASS) in `renderPosture`; `suppressed-group` (narrative
audit from `summary.narrative_audit.suppressed_findings`) in `renderScan` +
`renderDeepScan`; `codepoint-reveal` + `cp-tag`/`cp-zw`/`cp-bidi`
(side-by-side Unicode steganography reveal, detecting
U+200B-D|FEFF|2060|180E → `cp-zw` and U+202A-E|2066-9 → `cp-bidi`) in
`renderMcpInspect`; `top-risks` + `top-risk[data-severity]` (ranked
top-findings listing, semantic `<ol>`, excludes info findings) in
`renderScan`/`renderDeepScan`/`renderPluginAudit`/`renderPosture`/`renderAudit`;
extended `recommendation-card[data-severity]` (severity-tinted advisory) on
all inline uses + new per-bucket advisory cards in `renderClean` + intro
snapshot + diff rows in `renderHarden` (action mapping CREATE→positive /
APPEND→medium / MERGE→low / SKIP→low); `risk-meter` (band visualization
0-100 with Low/Medium/High/Critical/Extreme bands) on 5 archetypes
(scan, deep-scan, plugin-audit, audit, red-team); `card--severity-{level}`
modifier on `findings__item` cards. Wave 1 (Session 2) added
`badge--scope-security` (identity chip), `verdict-pill-lg` with
`__verdict`+`__sub` (replaces the custom verdict pill on all 18 report
types), and DS Tier 3 `form-progress` + `fp-step` in the onboarding wizard.
Wave 0 (Session 1) deleted ~30 duplicate CSS declarations from the
`<style>` block (DS wins the cascade) and harmonized the page shell across
all 4 surfaces. 5 new DS helpers: `renderToxicFlow`, `renderMatLadder`,
`renderSuppressedGroup`, `renderCodepointReveal`, `renderTopRisks`.
2 new normalization helpers: `mapSeverityToCardLevel(input)` (severity
+ action types to DS conventions) and `parseNarrativeAudit(md)`. 12
screenshots planned in `playground/screenshots/v7.6.0/`. A11Y report
updated (`playground/A11Y-RAPPORT.md`): WCAG 2.1 AA confirmed,
severity-soft color pairs verified, semantic elements (`<ol>`,
`<button>`, `<section>`) replace generic `<div>`. Total file change
over 5 sessions: 10209 → 10677 lines. Known limitation: `parsed.findings`
is empty for the `deep-scan`/`audit` demo fixtures (parser limitation,
not fixed in v7.6.0; tracked for a v7.6.x patch).
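
The code-point ranges listed for `codepoint-reveal` translate directly into a classifier. The following is a minimal sketch, not the playground's `renderCodepointReveal`: the function names are hypothetical, and mapping `cp-tag` to the Unicode Tags block (U+E0000 to U+E007F) is an assumption, since the note does not spell out that range.

```javascript
// Classify a single code point into one of the reveal classes.
function classifyCodepoint(cp) {
  // Zero-width and invisible characters: U+200B-D, U+FEFF, U+2060, U+180E.
  if ((cp >= 0x200b && cp <= 0x200d) || cp === 0xfeff || cp === 0x2060 || cp === 0x180e) {
    return 'cp-zw';
  }
  // Bidirectional controls: U+202A-E and U+2066-9.
  if ((cp >= 0x202a && cp <= 0x202e) || (cp >= 0x2066 && cp <= 0x2069)) {
    return 'cp-bidi';
  }
  // Assumed: cp-tag covers the Unicode Tags block.
  if (cp >= 0xe0000 && cp <= 0xe007f) return 'cp-tag';
  return null;
}

// Hypothetical scanner: returns every hidden code point with its UTF-16 offset.
function findHiddenCodepoints(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) { // string iteration walks by code point, not code unit
    const cp = ch.codePointAt(0);
    const cls = classifyCodepoint(cp);
    if (cls) {
      hits.push({ index, codepoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'), cls });
    }
    index += ch.length;
  }
  return hits;
}
```

Iterating with `for...of` keeps supplementary-plane characters such as the Tags block intact instead of splitting them into surrogate halves.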
|
||||
|
||||
**v7.6.1 — Playground visual patch (no scanner/hook behavior changes).**
Six bugs caught by the maintainer during manual browser verification
after the v7.6.0 release. All stemmed from a mismatch between DS classes
and how the playground renderers used them (or from missing DS
implementations of classes the playground renderers assumed existed).
(1) `renderFindingsBlock` used the `.findings` outer class, which DS
defines as a 2-column grid (`grid-template-columns: 360px 1fr`) for a
list+detail panel layout; the playground used it without a detail panel,
so the header landed in the left 360px column and the items in the 1fr
column. Replaced with `<section class="report-meta">` + `<h4>` + the
correct `findings__list > findings__group > findings__group-header +
findings__items` pattern.
(2) `.report-table` was missing from DS entirely but is used in 7+
renderers (OWASP categories, Supply chain, Scanner Risk Matrix, Plugin
meta, Permission matrix, Live meter, Latest runs, Approvals, Mitigation
roadmap); added a local CSS implementation in the playground HTML
`<style>` block (border-collapse, zebra hover, header styling).
(3) `renderPreDeploy` traffic lights used `.sm-card__grade`, which is a
fixed 28×28 px (designed for a single A-F letter); it truncated "PASS" to
"AS" and "PASS-WITH-NOTES" to "PASS-WITH-..." in all traffic-light cards.
Replaced with a width-adaptive status pill via inline styling
(severity-soft + on tokens).
(4) Threat-model matrix bubbles were not clickable: `<span>` without an
event handler. Replaced with `<button type="button" data-threat-id>` +
`aria-label`. The click handler scrolls to the corresponding row in the
Threats table and highlights it for 1.6 s.
(5) Radar labels overlapped at 6+ axes because all of them used
`text-anchor="middle"` with the same offset. Increased the SVG size from
280×280 to 380×380 and the radius from 105 to 125, and switched
`text-anchor` from `middle` to `start`/`end` based on horizontal position
(`Math.cos(ang)` > 0.2 / < -0.2 / in between).
(6) `recommendation-card__body` text overflowed on long single-line texts
(conditions, owner tags, date); added `overflow-wrap: anywhere;
word-break: break-word` in the local `<style>` block.
4/4 fix-specific smoke tests pass + 18/18 renderers still produce complete
HTML against `dft-komplett-demo` (regression test). File change
10677 → 10753 lines (+76 net).
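
The radar-label fix in (5) comes down to choosing `text-anchor` from the cosine of the axis angle. A minimal sketch, assuming the first axis points straight up and a 14 px label offset (both assumptions, as are the function names); the 380 px size, 125 px radius, and ±0.2 cutoffs are taken from the note.

```javascript
// Pick an SVG text-anchor from the label's horizontal position.
function labelAnchor(angleRad) {
  const horizontal = Math.cos(angleRad);
  if (horizontal > 0.2) return 'start'; // right of center: text grows rightwards
  if (horizontal < -0.2) return 'end'; // left of center: text grows leftwards
  return 'middle'; // near the vertical axis
}

// Hypothetical layout helper returning label positions and anchors per axis.
function radarLabels(axisCount, { size = 380, radius = 125, offset = 14 } = {}) {
  const center = size / 2;
  return Array.from({ length: axisCount }, (_, i) => {
    const ang = -Math.PI / 2 + (2 * Math.PI * i) / axisCount; // assumed: axis 0 at the top
    return {
      x: center + (radius + offset) * Math.cos(ang),
      y: center + (radius + offset) * Math.sin(ang),
      anchor: labelAnchor(ang),
    };
  });
}
```

With six axes, only the top and bottom labels stay centered; the four side labels anchor away from the chart, so long labels no longer grow back over the polygon.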
|
||||
Release notes for v7.0.0 → v7.6.1: see `docs/version-history.md` — read on demand.
|
||||
|
||||
## Commands
|
||||
|
||||
|
|
@ -176,7 +16,7 @@ passerer + 18/18 renderere produserer fortsatt komplett HTML mot
|
|||
| `/security mcp-audit [--live]` | MCP server config audit (add `--live` for runtime inspection) |
|
||||
| `/security mcp-inspect` | Live MCP server inspection — connect via JSON-RPC 2.0, scan tool descriptions |
|
||||
| `/security mcp-baseline-reset` | Reset MCP description baseline cache (E14, v7.3.0) — after legitimate MCP server upgrade |
|
||||
| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps — no fuzz-testing results published to date). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
|
||||
| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins, or fetch a remote VSIX/JetBrains plugin via URL. Details: `docs/scanner-reference.md` |
|
||||
| `/security posture` | Quick scorecard (13 categories) |
|
||||
| `/security threat-model` | Interactive STRIDE/MAESTRO session |
|
||||
| `/security diff [path]` | Compare scan against baseline — shows new/resolved/unchanged/moved |
|
||||
|
|
@ -186,7 +26,7 @@ passerer + 18/18 renderere produserer fortsatt komplett HTML mot
|
|||
| `/security clean [path]` | Scan + remediate (auto/semi-auto/manual) |
|
||||
| `/security dashboard` | Cross-project security dashboard — machine-wide posture overview |
|
||||
| `/security harden [path]` | Generate Grade A config — settings.json, CLAUDE.md, .gitignore |
|
||||
| `/security red-team [--category] [--adaptive]` | Attack simulation — 64 scenarios across 12 categories against plugin hooks. `--adaptive` for mutation-based evasion testing |
|
||||
| `/security red-team [--category] [--adaptive]` | Attack simulation — 64 scenarios across 12 categories against plugin hooks |
|
||||
| `/security pre-deploy` | Pre-deployment checklist |
|
||||
|
||||
## Agents
|
||||
|
|
@ -206,185 +46,47 @@ passerer + 18/18 renderere produserer fortsatt komplett HTML mot
|
|||
|--------|-------|---------|---------|
|
||||
| `pre-prompt-inject-scan.mjs` | UserPromptSubmit | — | Block prompt injection, warn on manipulation (incl. oversight evasion, HTML obfuscation, MEDIUM advisory for leetspeak/homoglyphs/zero-width/multi-lang). Unicode Tag steganography detection. Mode: `LLM_SECURITY_INJECTION_MODE=block\|warn\|off` |
|
||||
| `pre-edit-secrets.mjs` | PreToolUse | `Edit\|Write` | Block credentials in files |
|
||||
| `pre-bash-destructive.mjs` | PreToolUse | `Bash` | Block rm -rf, curl\|sh, fork bombs, eval. Bash evasion normalization (T1-T6 via `bash-normalize.mjs`: empty quotes, ${} expansion, backslash splitting, IFS, ANSI-C hex) — defense-in-depth against T1-T6; Claude Code 2.1.98+ covers the harness level |
|
||||
| `pre-bash-destructive.mjs` | PreToolUse | `Bash` | Block rm -rf, curl\|sh, fork bombs, eval. Bash evasion normalization (T1-T6 via `bash-normalize.mjs`) — defense-in-depth |
|
||||
| `pre-install-supply-chain.mjs` | PreToolUse | `Bash` | Block compromised packages across ALL ecosystems. Bash evasion normalization before gate matching |
|
||||
| `pre-write-pathguard.mjs` | PreToolUse | `Write` | Block writes to .env, .ssh/, .aws/, credentials, settings |
|
||||
| `post-mcp-verify.mjs` | PostToolUse | — (all) | Injection scan on ALL tool output (incl. MEDIUM patterns, HITL traps, sub-agent spawn, NL indirection, cognitive load, hybrid P2SQL/recursive/XSS). HTML content trap detection. Bash-specific: secrets/URLs/size. MCP: per-update description drift (MCP05) AND cumulative drift vs sticky baseline (E14, v7.3.0) — slow-burn rug-pulls that stay under the per-update threshold but diverge >=25% from baseline emit MEDIUM `mcp-cumulative-drift` advisory. Per-tool volume tracking |
|
||||
| `post-session-guard.mjs` | PostToolUse | — (all) | Runtime trifecta detection (Rule of Two). Sliding window (20 calls) + 100-call long-horizon. MCP-concentrated trifecta (same server = elevated severity). Sensitive path + exfil detection. Slow-burn trifecta (legs >50 calls apart = MEDIUM). Behavioral drift detection (Jensen-Shannon divergence). CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output→input linking). Mode: `LLM_SECURITY_TRIFECTA_MODE=block\|warn\|off` (default: warn). Cumulative data volume tracking (100KB/500KB/1MB thresholds). Sub-agent delegation tracking (Task/Agent tools): escalation-after-input advisory when delegation occurs within `LLM_SECURITY_ESCALATION_WINDOW` calls (default 5) of untrusted input (DeepMind Agent Traps kat. 4); secondary 20-call MEDIUM advisory catches slow-burn variants outside the primary window (E17, v7.2.0) |
|
||||
| `post-mcp-verify.mjs` | PostToolUse | — (all) | Injection scan on ALL tool output. MCP per-update drift + cumulative drift vs sticky baseline (E14, v7.3.0). Per-tool volume tracking |
|
||||
| `post-session-guard.mjs` | PostToolUse | — (all) | Runtime trifecta detection (Rule of Two). Sliding window + long-horizon. Behavioral drift (Jensen-Shannon). Mode: `LLM_SECURITY_TRIFECTA_MODE=block\|warn\|off` (default: warn) |
|
||||
| `update-check.mjs` | UserPromptSubmit | — | Checks for newer versions (max 1x/24h, cached). Disable: `LLM_SECURITY_UPDATE_CHECK=off` |
|
||||
| `pre-compact-scan.mjs` | PreCompact | — | Scan transcript for injection patterns + credentials before context compaction; prevents poisoned content from surviving in compact form. Reads at most last 512 KB for <500ms latency. Mode: `LLM_SECURITY_PRECOMPACT_MODE=block\|warn\|off` (default: warn). Cap: `LLM_SECURITY_PRECOMPACT_MAX_BYTES` |
|
||||
| `pre-compact-scan.mjs` | PreCompact | — | Scan transcript for injection + credentials before context compaction. Reads at most last 512 KB. Mode: `LLM_SECURITY_PRECOMPACT_MODE=block\|warn\|off` (default: warn) |
|
||||
|
||||
> `pre-install-supply-chain.mjs` covers 7 package managers: npm/yarn/pnpm, pip/pip3/uv, brew, docker, go, cargo, gem. Per-ecosystem blocklists, age gate (<72h), npm audit (critical=block, high=warn), PyPI API inspection, Levenshtein typosquat detection, Docker image verification.
|
||||
|
||||
Scanner internals, CLI surface, CI/CD templates, knowledge files, and runnable examples: see `docs/scanner-reference.md`.
|
||||
|
||||
Defense philosophy (v5.0), Opus 4.7 alignment, known limitations: see `docs/defense-philosophy.md`.
|
||||
|
||||
## Remote Repo Support
|
||||
|
||||
`scan` and `plugin-audit` accept GitHub URLs directly. The command clones to a temp dir via `scanners/lib/git-clone.mjs`, scans locally, then cleans up. Use `--branch <name>` for non-default branches.
|
||||
|
||||
**Clone sandboxing (v5.1):** `git clone` executes code via `.gitattributes` filter/smudge drivers — this is a known attack vector. Two layers of defense:
|
||||
**Clone sandboxing (v5.1):** Two layers of defense against `git clone` filter/smudge driver attacks:
|
||||
1. **Git config flags (all platforms):** `core.hooksPath=/dev/null`, `core.symlinks=false`, `core.fsmonitor=false`, all LFS filter drivers disabled, `protocol.file.allow=never`, `transfer.fsckObjects=true`. Environment: `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`, `GIT_TERMINAL_PROMPT=0`.
|
||||
2. **OS sandbox:** macOS `sandbox-exec` or Linux `bubblewrap` (bwrap) restricts file writes to only the specific temp directory. Even if a filter driver bypasses git config, it cannot write outside the clone dir. Fallback on Windows or when neither sandbox is available: git config flags only, WARN logged.
|
||||
2. **OS sandbox:** macOS `sandbox-exec` or Linux `bubblewrap` (bwrap) restricts file writes to only the specific temp directory. Fallback on Windows: git config flags only.
|
||||
|
||||
Platform matrix: macOS (`sandbox-exec`) — always works. Linux (`bwrap`) — works on Fedora/Arch, may fail on Ubuntu 24.04+ without admin AppArmor config. Windows — no OS sandbox, git config flags only.
|
||||
Platform matrix: macOS (`sandbox-exec`) — always works. Linux (`bwrap`) — Fedora/Arch fine, may fail on Ubuntu 24.04+ without admin AppArmor config. Windows — no OS sandbox.
|
||||
|
||||
Post-clone: size check (100MB max), cleanup guarantee (temp dir + evidence file always removed, even on error).
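
The first defense layer (git config flags plus environment) can be sketched as an argument builder. This is an illustration under assumptions, not the contents of `scanners/lib/git-clone.mjs`: the function name and return shape are hypothetical, and the four `filter.lfs.*` overrides are one plausible way to express "all LFS filter drivers disabled".

```javascript
// Hypothetical builder for a hardened `git clone` invocation.
function buildHardenedClone(url, destDir, branch) {
  const config = [
    'core.hooksPath=/dev/null',
    'core.symlinks=false',
    'core.fsmonitor=false',
    // Assumed spelling of "all LFS filter drivers disabled":
    'filter.lfs.smudge=',
    'filter.lfs.clean=',
    'filter.lfs.process=',
    'filter.lfs.required=false',
    'protocol.file.allow=never',
    'transfer.fsckObjects=true',
  ];
  const args = [];
  for (const entry of config) args.push('-c', entry); // per-invocation config
  args.push('clone');
  if (branch) args.push('--branch', branch);
  args.push('--', url, destDir); // `--` stops a hostile URL being read as an option
  const env = {
    GIT_CONFIG_NOSYSTEM: '1',
    GIT_CONFIG_GLOBAL: '/dev/null',
    GIT_ATTR_NOSYSTEM: '1',
    GIT_TERMINAL_PROMPT: '0',
  };
  return { command: 'git', args, env };
}
```

The returned `env` holds only the overrides; a caller would merge it over a minimal base environment before spawning, and the OS sandbox layer would wrap the resulting command.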
|
||||
|
||||
**Prompt injection defense:** Remote scans use `scanners/content-extractor.mjs` to pre-extract structured evidence and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos.
|
||||
|
||||
## Scanners
|
||||
|
||||
**Orchestrated (10):** Run via `node scanners/scan-orchestrator.mjs <target> [--fail-on <severity>] [--compact] [--output-file <path>] [--baseline] [--save-baseline]`.
|
||||
`--fail-on <critical|high|medium|low>`: exit 1 if findings at/above severity, exit 0 otherwise. `--compact`: one-liner per finding format. Both configurable via `policy.json` `ci` section.
|
||||
With `--output-file`: full JSON to file, compact aggregate to stdout. `--baseline` diffs against stored baseline. `--save-baseline` saves results for future diffs. Baselines stored in `reports/baselines/<target-hash>.json`.
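
The `--fail-on` semantics described above reduce to a severity comparison. A minimal sketch, assuming findings carry a `severity` string; the function name and the rank table are hypothetical and not taken from `scan-orchestrator.mjs`.

```javascript
// Severity ranks for the four values --fail-on accepts; `info` is unranked.
const SEVERITY_RANK = { low: 1, medium: 2, high: 3, critical: 4 };

// Returns 1 if any finding is at or above the threshold, 0 otherwise.
function exitCodeFor(findings, failOn) {
  if (!failOn) return 0; // no gate requested
  const threshold = SEVERITY_RANK[failOn];
  if (threshold === undefined) {
    throw new Error(`unknown --fail-on severity: ${failOn}`);
  }
  const tripped = findings.some((f) => (SEVERITY_RANK[f.severity] ?? 0) >= threshold);
  return tripped ? 1 : 0;
}
```

For example, a scan with one `medium` and one `info` finding would exit 0 under `--fail-on high` and 1 under `--fail-on medium`.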
|
||||
|
||||
10 scanners: unicode, entropy, permission, dep-audit, taint, git-forensics, network, memory-poisoning, supply-chain-recheck, toxic-flow.
|
||||
Lib: `mcp-description-cache.mjs` — caches MCP tool descriptions in `~/.cache/llm-security/mcp-descriptions.json`, detects per-update drift via Levenshtein (>10% = alert), 7-day TTL. v7.3.0 (E14) adds a sticky baseline slot per tool plus a 10-event rolling history; cumulative drift = `levenshtein(current, baseline) / max(|current|,|baseline|)`. When ratio ≥ `mcp.cumulative_drift_threshold` (default 0.25), emits `mcp-cumulative-drift` advisory through `post-mcp-verify.mjs`. Baseline survives TTL purge so slow-burn drift is preserved across the 7-day window. `clearBaseline(tool?)` exposed for the `/security mcp-baseline-reset` command. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for testing.
|
||||
Supply-chain-recheck (SCR) re-audits installed dependencies from lockfiles (package-lock.json, yarn.lock, requirements.txt, Pipfile.lock) against blocklists, OSV.dev batch API, and typosquat detection. Offline fallback available. Shared data module: `scanners/lib/supply-chain-data.mjs`.
|
||||
Memory-poisoning (MEM) detects cognitive state poisoning in CLAUDE.md, memory files, and .claude/rules — injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads.
|
||||
Toxic-flow (TFA) is a post-processing correlator that runs LAST — detects "lethal trifecta" (untrusted input + sensitive data access + exfiltration sink) by correlating output from prior scanners.
|
||||
Utility: `node scanners/lib/fs-utils.mjs <backup|restore|cleanup|tmppath> [args]`.
|
||||
|
||||
Lib: `sarif-formatter.mjs` — converts scan output to OASIS SARIF 2.1.0 format. Used by `--format sarif` flag.
|
||||
Lib: `audit-trail.mjs` — writes structured JSONL audit events (ISO 8601, OWASP tags, SIEM-ready). Env: `LLM_SECURITY_AUDIT_*`.
|
||||
Lib: `policy-loader.mjs` — reads `.llm-security/policy.json` for distributable hook configuration. Includes `ci` section (`failOn`, `compact`) for CI/CD defaults. Defaults match hardcoded values.
|
||||
|
||||
**Standalone (8):** `posture-scanner.mjs` — deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms. NOT in scan-orchestrator (meta-level, not code-level).
|
||||
Run: `node scanners/posture-scanner.mjs [path]` → JSON stdout. Scanner prefix: PST. Used by `/security posture` and `/security audit`.
|
||||
`mcp-live-inspect.mjs` — NOT in scan-orchestrator. MCP servers are running processes, not files.
|
||||
Run: `node scanners/mcp-live-inspect.mjs [target] [--timeout 10000] [--skip-global]`
|
||||
Scanner prefix: MCI. OWASP: MCP03, MCP06, MCP09. Invoked by `mcp-inspect` and `mcp-audit --live`.
|
||||
`watch-cron.mjs` — standalone cron wrapper. Reads `reports/watch/config.json`, scans all targets, writes `reports/watch/latest.json`. Run: `node scanners/watch-cron.mjs [--config <path>]`
|
||||
`reference-config-generator.mjs` — generates Grade A reference config based on posture gaps. Detects project type (plugin/monorepo/standalone). Templates in `templates/reference-config/`. Run: `node scanners/reference-config-generator.mjs [path] [--apply]`
|
||||
`dashboard-aggregator.mjs` — cross-project security dashboard. Discovers Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each, aggregates to machine-grade (weakest link). Cache in `~/.cache/llm-security/dashboard-latest.json` (24h staleness). Run: `node scanners/dashboard-aggregator.mjs [--no-cache] [--max-depth N]`
|
||||
|
||||
`attack-simulator.mjs` — red-team harness. Data-driven: 64 scenarios in 12 categories from `knowledge/attack-scenarios.json`. Payloads constructed at runtime (fragment assembly to avoid triggering hooks on source). Uses `runHook()` from test helper. Adaptive mode (`--adaptive`): 5 mutation rounds per passing scenario (homoglyph, encoding, zero-width, case alternation, synonym). Mutation rules in `knowledge/attack-mutations.json`. Benchmark mode (`--benchmark`): outputs structured pass/fail metrics. Run: `node scanners/attack-simulator.mjs [--category <name>] [--json] [--verbose] [--adaptive] [--benchmark]`
|
||||
`ai-bom-generator.mjs` — AI Bill of Materials generator. Discovers AI components (models, MCP servers, plugins, knowledge, hooks) and outputs CycloneDX 1.6 JSON. Scanner prefix: BOM. Run: `node scanners/ai-bom-generator.mjs <target> [--output-file <path>]`
|
||||
`ide-extension-scanner.mjs` — scans installed VS Code (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH) extensions and JetBrains IDE plugins (IntelliJ IDEA, PyCharm, GoLand, WebStorm, RubyMine, PhpStorm, CLion, DataGrip, RustRover, Rider, Aqua, Writerside, Android Studio). Fleet + Toolbox excluded. OS-aware discovery via `lib/ide-extension-discovery.mjs` (`~/.vscode/extensions/` + `~/Library/Application Support/JetBrains/<IDE><version>/plugins/` on macOS, `%APPDATA%\JetBrains\...` on Windows, `~/.config/JetBrains/...` on Linux). Parses VS Code `package.json` via `lib/ide-extension-parser.mjs` and JetBrains `META-INF/plugin.xml` + `META-INF/MANIFEST.MF` (with nested-jar extraction) via `lib/ide-extension-parser-jb.mjs`. 7 VS Code checks: blocklist match, theme-with-code, sideload (vsix), broad activation (`*` / `onStartupFinished`), typosquat (Levenshtein ≤2 vs top-100), extension-pack expansion, dangerous `vscode:uninstall` hooks. 7 JetBrains checks: theme-with-code, broad activation (`application-components`), `Premain-Class` instrumentation (HIGH — javaagent retransform), native binaries (`.so`/`.dylib`/`.dll`/`.jnilib`), long `<depends>` chains, typosquat vs top JetBrains plugins, shaded-jar advisory. Both branches orchestrate reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension with bounded concurrency (default 4). Scanner prefix: IDE. OWASP: LLM01, LLM02, LLM03, LLM06, ASI02, ASI04. Offline by default, `--online` opt-in for Marketplace/OSV.dev lookups. Knowledge: `knowledge/top-vscode-extensions.json`, `knowledge/top-jetbrains-plugins.json`, `knowledge/ide-extension-threat-patterns.md`, `knowledge/marketplace-api-notes.md`, `knowledge/jetbrains-marketplace-api-notes.md`.
|
||||
|
||||
**v6.4.0 — URL support.** Targets can be Marketplace, OpenVSX, or direct `.vsix` URLs. Pipeline: `lib/vsix-fetch.mjs` (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual redirect host whitelist) → `lib/zip-extract.mjs` (zero-dep ZIP parser, rejects zip-slip/symlink/absolute/drive-letter/encrypted/ZIP64, caps: 10 000 entries, 500MB uncomp, 100x ratio, depth 20) → existing scan pipeline against extracted `extension/` subdir → temp dir always cleaned in `try/finally`. Envelope.meta.source = `{ type: "url", kind, url, finalUrl, sha256, size, publisher?, name?, version? }`.
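
The path and size checks listed for `lib/zip-extract.mjs` can be illustrated with a small validator. This is a sketch under assumptions, not the extractor's code: the function names and entry shape are hypothetical, 500MB is read as 500 MiB, and the ratio cap is applied to archive totals (the original may apply it per entry). Symlink, encryption, and ZIP64 checks need header parsing and are left out.

```javascript
// Caps from the release note; byte interpretation of 500MB is assumed.
const LIMITS = { maxEntries: 10000, maxUncompressed: 500 * 1024 * 1024, maxRatio: 100, maxDepth: 20 };

// Returns a rejection reason for an entry name, or null if it looks safe.
function rejectEntryName(name) {
  const normalized = name.replace(/\\/g, '/');
  if (normalized.startsWith('/')) return 'absolute path';
  if (/^[A-Za-z]:/.test(normalized)) return 'drive letter';
  const parts = normalized.split('/').filter((p) => p !== '' && p !== '.');
  if (parts.includes('..')) return 'zip-slip'; // any parent traversal is refused
  if (parts.length > LIMITS.maxDepth) return 'too deep';
  return null;
}

// entries: [{ name, compressedSize, uncompressedSize }] (hypothetical shape).
function rejectArchive(entries) {
  if (entries.length > LIMITS.maxEntries) return 'too many entries';
  let compressed = 0;
  let uncompressed = 0;
  for (const e of entries) {
    const reason = rejectEntryName(e.name);
    if (reason) return `${reason}: ${e.name}`;
    compressed += e.compressedSize;
    uncompressed += e.uncompressedSize;
  }
  if (uncompressed > LIMITS.maxUncompressed) return 'uncompressed size cap';
  if (compressed > 0 && uncompressed / compressed > LIMITS.maxRatio) return 'compression ratio cap';
  return null;
}
```

Validating names and declared sizes before writing anything means a hostile archive is rejected from its central directory alone, and the OS sandbox then only has to cover parser bugs.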
|
||||
|
||||
**v6.5.0 — OS sandbox.** Fetch + extract for URL targets now spawns `lib/vsix-fetch-worker.mjs` in a sub-process wrapped by `sandbox-exec` (macOS) or `bwrap` (Linux) — same primitives reused from `git-clone.mjs`. Helper: `lib/vsix-sandbox.mjs` exports `buildSandboxProfile`, `buildBwrapArgs`, `buildSandboxedWorker`, `runVsixWorker`. Worker IPC: argv `--url <url> --tmpdir <dir>` → single JSON line on stdout (`{ok, sha256, size, finalUrl, source, extRoot}` or `{ok:false, error, code?}`). Defense-in-depth — if the in-process ZIP parser ever has a bypass, the kernel still refuses writes outside `<tmpdir>`. `scan(target, { useSandbox })` defaults to `true`; tests pass `false` since `globalThis.fetch` mocks do not cross process boundaries. Windows fallback: in-process with `meta.warnings` advisory. Envelope `meta.source.sandbox`: `'sandbox-exec' | 'bwrap' | 'none' | 'in-process'`.
|
||||
|
||||
**v6.6.0 — JetBrains Marketplace URL fetch + JetBrains branch.** URL targets can also be `https://plugins.jetbrains.com/plugin/<numericId>-<slug>` (metadata-resolved → xmlId download) or `https://plugins.jetbrains.com/plugin/download?pluginId=<xmlId>&version=<v>` (direct). `lib/vsix-fetch.mjs` gains `detectUrlType` JetBrains kinds, `fetchJetBrainsPlugin`, host allowlist `plugins.jetbrains.com`. `buildSandboxedWorker(dirs, workerPath)` now accepts a custom worker path — `lib/jetbrains-fetch-worker.mjs` reuses the same IPC contract. Envelope `meta.source.kind` can be `'jetbrains-marketplace' | 'jetbrains-download'`. Installed-plugin scan runs JB-specific checks (see scanner bullet above) and shares the UNI/ENT/NET/TNT/MEM/SCR orchestration. `.kt`, `.groovy`, `.scala` added to `taint-tracer` code extensions.
|
||||
|
||||
Run: `node scanners/ide-extension-scanner.mjs [target|url] [--vscode-only] [--intellij-only] [--include-builtin] [--online] [--format json|compact] [--fail-on <sev>] [--output-file <path>]`. Invoked by `/security ide-scan`.
|
||||
|
||||
## Token Budget (ENFORCED)
|
||||
|
||||
All commands total ~600 lines. All commands use registered subagent types.
|
||||
|
||||
- Commands are short dispatchers (~30-60 lines) — no inline report templates or format specs
|
||||
- All agents use registered `subagent_type` — agent instructions are system prompt, never file reads
|
||||
- Max 1-2 knowledge files per agent invocation (threat-patterns + secrets-patterns)
|
||||
- OWASP files are NEVER passed by commands — agents reference them from their own system prompt
|
||||
- Agents run sequentially to avoid burst rate limits
|
||||
- `pre-install-supply-chain.mjs` queries OSV.dev for CVEs on every package install
|
||||
|
||||
## CLI
|
||||
|
||||
`bin/llm-security.mjs` — standalone CLI entry point. Works without Claude Code via `npx llm-security` or `node bin/llm-security.mjs`.
|
||||
Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatches to scanner scripts via `child_process.spawn`.
|
||||
`package.json` `bin` field: `"llm-security": "./bin/llm-security.mjs"`. `files` whitelist: only `bin/` + `scanners/` published to npm.
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
|
||||
All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
|
||||
Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.
|
||||
|
||||
## Knowledge Files (20)
|
||||
|
||||
| File | Content |
|------|---------|
| `skill-threat-patterns.md` | 7 threat categories for skill/command scanning |
| `mcp-threat-patterns.md` | 9 MCP threat categories (MCP01-MCP10) |
| `secrets-patterns.md` | Regex patterns for 10+ secret types |
| `owasp-llm-top10.md` | OWASP LLM Top 10 (2025) with Claude Code mappings |
| `owasp-agentic-top10.md` | OWASP Agentic AI Top 10 (ASI01-ASI10) |
| `owasp-skills-top10.md` | OWASP Skills Top 10 (AST01-AST10) — skill-specific threats |
| `mitigation-matrix.md` | Threat-to-control mappings |
| `top-packages.json` | Known package lists for supply chain checks |
| `skill-registry.json` | Seed data for skill signature registry |
| `prompt-injection-research-2025-2026.md` | 7 research papers (2025-2026) with implications for hook defenses |
| `deepmind-agent-traps.md` | DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix |
| `attack-scenarios.json` | 64 red-team scenarios across 12 categories for attack simulation |
| `attack-mutations.json` | Synonym tables and mutation rules for adaptive red-team testing |
| `compliance-mapping.md` | EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS mappings to plugin capabilities |
| `norwegian-context.md` | Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet |
| `ide-extension-threat-patterns.md` | 10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies |
| `top-vscode-extensions.json` | Top ~100 VS Code Marketplace extension IDs (typosquat seed) + blocklist entries |
| `top-jetbrains-plugins.json` | Top JetBrains plugin IDs (typosquat seed) + blocklist entries (v6.6.0) |
| `marketplace-api-notes.md` | VS Code Marketplace + OpenVSX API endpoints used by `lib/vsix-fetch.mjs` (v6.4.0) |
| `jetbrains-marketplace-api-notes.md` | JetBrains Marketplace API endpoints used by `fetchJetBrainsPlugin` (v6.6.0) |
|
||||
|
||||
## Reports
|
||||
|
||||
Scan reports are stored in `reports/` as `.docx` (for sharing) with `.md` source.
|
||||
|
||||
## Examples (runnable demonstrations)
|
||||
|
||||
Self-contained, deterministic threat-fixture folders under `examples/`.
Each folder has a `README.md`, a fixture/script/transcript, `run-*.{sh,mjs}`,
and `expected-findings.md`. These are demonstrations, not unit tests.
|
||||
|
||||
| Folder | Demonstrates | Hooks/scanners | Sentinel |
|--------|--------------|----------------|----------|
| `malicious-skill-demo/` | Skill scanner end-to-end (UNI/ENT/PRM/DEP/TNT/NET + 7 LLM categories) | `scan-orchestrator` + agents | BLOCK 100/100 |
| `prompt-injection-showcase/` | 61 payloads × 19 categories against `pre-prompt-inject-scan`, `post-mcp-verify`, `pre-bash-destructive` | runtime hooks | per-category expected outcome |
| `lethal-trifecta-walkthrough/` | Rule-of-Two advisory on leg 3 (WebFetch → Read .env → Bash curl POST) + suppression | `post-session-guard` | advisory at stage 3 |
| `mcp-rug-pull/` | Cumulative drift advisory (E14, v7.3.0): 7 stages, each under the per-update threshold, cumulatively over 25% from baseline | `post-mcp-verify` + `mcp-description-cache.mjs` | advisory at stage 7 |
| `supply-chain-attack/` | PreToolUse block on a compromised package + scope-hop advisory + dep-auditor typosquats + postinstall curl-pipe | `pre-install-supply-chain` + `dep-auditor` + `supply-chain-data` | 6+ findings, 2 advisories, 1 BLOCK |
| `poisoned-claude-md/` | 6 detectors (injection / shell / URL / credential paths / permission expansion / encoded payloads) incl. the E15 agent-file surface | `memory-poisoning-scanner` | ≥18 findings across 2 files |
| `bash-evasion-gallery/` | T1-T9 disguised destructive commands → normalized + blocked (defense-in-depth over Claude Code 2.1.98+) | `pre-bash-destructive` + `bash-normalize` | 10 BLOCK exit codes |
| `toxic-agent-demo/` | Single-component lethal trifecta: agent with [Bash, Read, WebFetch] and no hook guards = CRITICAL TFA finding | `toxic-flow-analyzer` (TFA) | 1 CRITICAL `Lethal trifecta:` |
| `pre-compact-poisoning/` | PreCompact hook catches injection + AWS-shaped credential in a synthetic transcript across off/warn/block modes | `pre-compact-scan` | 9 pass: block exit 2 + reason; warn systemMessage; off skip; benign passes |
|
||||
|
||||
State isolation: all examples that mutate global state use the run-script PID (post-session-guard via `${ppid}.jsonl`) or env overrides (`LLM_SECURITY_MCP_CACHE_FILE` for the MCP cache). The user's real `/tmp/llm-security-session-*.jsonl` and `~/.cache/llm-security/` are never touched.
|
||||
|
||||
## Distribution
|
||||
|
||||
This plugin lives in the `ktg-plugin-marketplace` monorepo at
|
||||
`https://git.fromaitochitta.com/open/ktg-plugin-marketplace` under
|
||||
`plugins/llm-security/`. It is not published as a standalone repo —
|
||||
users install it via the Claude Code marketplace mechanism:
|
||||
This plugin lives in the `ktg-plugin-marketplace` monorepo at `https://git.fromaitochitta.com/open/ktg-plugin-marketplace` under `plugins/llm-security/`. It is not published as a standalone repo — users install it via the Claude Code marketplace mechanism:
|
||||
|
||||
```bash
|
||||
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
|
||||
```
|
||||
|
||||
Issues, bug reports, and security disclosures all route to the
|
||||
marketplace repo.
|
||||
Issues, bug reports, and security disclosures all route to the marketplace repo.
|
||||
|
||||
## State
|
||||
|
||||
No persistent state except the following:

- `post-session-guard.mjs` maintains a per-session JSONL file in `/tmp/llm-security-session-${ppid}.jsonl` (auto-cleaned after 24h)
- `post-mcp-verify.mjs` tracks per-MCP-tool volume in `/tmp/llm-security-mcp-volume-${ppid}.json`
- `mcp-description-cache.mjs` caches MCP tool descriptions in `~/.cache/llm-security/mcp-descriptions.json` (7-day TTL)
- `update-check.mjs` caches version info in `~/.cache/llm-security/update-check.json` (24h TTL)
- `dashboard-aggregator.mjs` caches dashboard results in `~/.cache/llm-security/dashboard-latest.json` (24h staleness)
- `reports/baselines/*.json` holds scan diff baselines
- `reports/watch/latest.json` holds cron scan results (overwritten on each run)
- `reports/skill-registry.json` holds the skill signature registry (grows as skills are scanned)

All scan outputs are generated fresh per invocation.
|
||||
|
||||
## Defense Philosophy (v5.0)
|
||||
|
||||
Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
|
||||
|
||||
- **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
|
||||
- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
|
||||
- **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
|
||||
- **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
|
||||
- **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect
|
||||
|
||||
**Bash evasion layers (T1-T6):** `bash-normalize.mjs` collapses six known obfuscation techniques before gate matching as a defense-in-depth layer. T1 empty quotes (`rm''-rf`), T2 `${}` parameter expansion, T3 backslash continuation, T4 tab/whitespace splitting, T5 `${IFS}` word-splitting, T6 ANSI-C hex quoting (`$'\x72\x6d'`). These layers complement — not replace — Claude Code 2.1.98+ harness-level protections. Full reference: `docs/security-hardening-guide.md`.
|
||||
|
||||
**Opus 4.7 system card alignment:**
|
||||
|
||||
- System card §5.2.1 (agentic safety evaluations) documents that multi-layer defenses outperform single-layer defenses against adaptive attacks. This plugin's posture (prompt-scan + pathguard + trifecta-guard + pre-compact-scan operating in depth) matches that guidance.
|
||||
- System card §6.3.1.1 (instruction following and hierarchy) documents that Opus 4.7 interprets agent instructions more literally. Stacked imperatives (e.g., "MUST NOT do X") are therefore less useful than tool-level enforcement via `tools:` frontmatter. Agent files in this plugin have been updated accordingly.
|
||||
- See `docs/security-hardening-guide.md` §5 for the full mapping.
|
||||
|
||||
**What v5.0 cannot do:**
|
||||
- Prevent adaptive attacks from motivated human red-teamers (100% ASR per joint paper)
|
||||
- Fix CLAUDE.md loading before hooks (platform limitation)
|
||||
- Detect novel NL indirection without ML
|
||||
- Prevent long-horizon attacks without detectable patterns
|
||||
- Provide formal worst-case guarantees
|
||||
Per-session JSONL in `/tmp/llm-security-session-${ppid}.jsonl` (auto-cleaned 24h). MCP description cache in `~/.cache/llm-security/mcp-descriptions.json` (7-day TTL). Update-check + dashboard caches in `~/.cache/llm-security/` (24h). Scan baselines under `reports/baselines/*.json`. Watch results in `reports/watch/latest.json`. Skill registry in `reports/skill-registry.json` (grows). All scan outputs fresh per invocation.
|
||||
|
||||
## Security Boundaries
|
||||
|
||||
|
|
@ -392,20 +94,3 @@ Prompt injection is **structurally unsolvable** with current architectures (join
|
|||
- Agents operate read-only unless the specific command explicitly grants Write/Edit (`clean` and `harden` do)
|
||||
- Irreversible operations (baseline overwrites, file edits) require user confirmation via AskUserQuestion
|
||||
- Do not access paths outside the project root without explicit user instruction
|
||||
|
||||
## Communication patterns
|
||||
|
||||
### Linking to local files
|
||||
|
||||
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
|
||||
|
||||
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
|
||||
- Always use absolute paths. Never `~/` or relative paths.
|
||||
- For multiple files, render as a bullet list of named markdown links.
|
||||
|
||||
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
|
||||
|
||||
Example:
|
||||
|
||||
- [Brief](file:///Users/ktg/.../brief.html)
|
||||
- [Research summary](file:///Users/ktg/.../research/summary.md)
|
||||
|
|
|
|||
27
plugins/llm-security/docs/defense-philosophy.md
Normal file
|
|
@ -0,0 +1,27 @@
|
|||
# LLM Security — Defense philosophy (v5.0)
|
||||
|
||||
Imported from `CLAUDE.md` via `@docs/defense-philosophy.md`.
|
||||
|
||||
Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
|
||||
|
||||
- **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
|
||||
- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
|
||||
- **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
|
||||
- **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
|
||||
- **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect
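
The behavioral-drift signal named in the list above relies on Jensen-Shannon divergence. As a rough illustration only, the sketch below compares two tool-call histograms. The function names, the sample tool counts, and the choice of base-2 logarithms are assumptions made for this example; the plugin's own implementation may differ.

```javascript
// Hypothetical sketch: Jensen-Shannon divergence between two tool-usage histograms.
// Function names, inputs, and the base-2 log are illustrative assumptions.

// Turn raw counts ({ toolName: count }) into probabilities over a shared key list.
function toDistribution(counts, keys) {
  const total = keys.reduce((sum, k) => sum + (counts[k] || 0), 0);
  return keys.map((k) => (total === 0 ? 0 : (counts[k] || 0) / total));
}

// Kullback-Leibler divergence KL(p || q) in bits; terms with p[i] === 0 contribute 0.
function klDivergence(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    if (p[i] > 0) sum += p[i] * Math.log2(p[i] / q[i]);
  }
  return sum;
}

// JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m = (p + q) / 2.
// With base-2 logs the result lies in [0, 1].
function jensenShannon(countsA, countsB) {
  const keys = [...new Set([...Object.keys(countsA), ...Object.keys(countsB)])];
  const p = toDistribution(countsA, keys);
  const q = toDistribution(countsB, keys);
  const m = p.map((pi, i) => (pi + q[i]) / 2);
  return 0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m);
}

const baselineCalls = { Read: 12, Grep: 6, Edit: 2 };
const recentCalls = { Read: 2, Bash: 10, WebFetch: 8 };
console.log(jensenShannon(baselineCalls, baselineCalls)); // identical histograms
console.log(jensenShannon(baselineCalls, recentCalls).toFixed(3)); // shifted tool mix
```

A value near 0 means the recent tool mix looks like the baseline; a value near 1 means the two barely overlap. Where to place an alert threshold is a tuning decision that this sketch does not address.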
|
||||
|
||||
**Bash evasion layers (T1-T6):** `bash-normalize.mjs` collapses six known obfuscation techniques before gate matching as a defense-in-depth layer. T1 empty quotes (`rm''-rf`), T2 `${}` parameter expansion, T3 backslash continuation, T4 tab/whitespace splitting, T5 `${IFS}` word-splitting, T6 ANSI-C hex quoting (`$'\x72\x6d'`). These layers complement — not replace — Claude Code 2.1.98+ harness-level protections. Full reference: `docs/security-hardening-guide.md`.
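
To make the normalization idea concrete, here is a minimal sketch covering five of the techniques listed above (T1, T3, T4, T5, T6). It is not the code in `bash-normalize.mjs`: the function name `normalizeBash`, the regular expressions, and the order of the steps are assumptions, and T2 parameter expansion is left out.

```javascript
// Hypothetical sketch of pre-gate normalization; NOT the actual bash-normalize.mjs.
// It only rewrites the string for pattern matching and does not emulate a shell.
function normalizeBash(command) {
  let out = command;

  // T6: decode ANSI-C hex quoting, e.g. $'\x72\x6d' becomes rm
  out = out.replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_match, body) =>
    body.replace(/\\x([0-9a-fA-F]{2})/g, (_m, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    )
  );

  // T3: join backslash-newline continuations
  out = out.replace(/\\\r?\n/g, '');

  // T5: treat ${IFS} / $IFS as a plain space
  out = out.replace(/\$\{IFS\}|\$IFS\b/g, ' ');

  // T1: drop empty quote pairs ('' or "")
  out = out.replace(/''|""/g, '');

  // T4: collapse runs of tabs and spaces into one space
  out = out.replace(/[ \t]+/g, ' ').trim();

  return out;
}

console.log(normalizeBash("r''m${IFS}-rf /tmp/demo")); // rm -rf /tmp/demo
```

A gate pattern such as a destructive-command regex would then be matched against the normalized string rather than the raw command.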
|
||||
|
||||
**Opus 4.7 system card alignment:**
|
||||
|
||||
- System card §5.2.1 (agentic safety evaluations) documents that multi-layer defenses outperform single-layer defenses against adaptive attacks. This plugin's posture (prompt-scan + pathguard + trifecta-guard + pre-compact-scan operating in depth) matches that guidance.
|
||||
- System card §6.3.1.1 (instruction following and hierarchy) documents that Opus 4.7 interprets agent instructions more literally. Stacked imperatives (e.g., "MUST NOT do X") are therefore less useful than tool-level enforcement via `tools:` frontmatter. Agent files in this plugin have been updated accordingly.
|
||||
- See `docs/security-hardening-guide.md` §5 for the full mapping.
|
||||
|
||||
**What v5.0 cannot do:**
|
||||
|
||||
- Prevent adaptive attacks from motivated human red-teamers (100% ASR per joint paper)
|
||||
- Fix CLAUDE.md loading before hooks (platform limitation)
|
||||
- Detect novel NL indirection without ML
|
||||
- Prevent long-horizon attacks without detectable patterns
|
||||
- Provide formal worst-case guarantees
|
||||
122
plugins/llm-security/docs/scanner-reference.md
Normal file
|
|
@ -0,0 +1,122 @@
|
|||
# LLM Security — Scanner reference
|
||||
|
||||
Detailed scanner, CLI, CI/CD, knowledge-file and example documentation. Imported from `CLAUDE.md` via `@docs/scanner-reference.md`.
|
||||
|
||||
## Scanners
|
||||
|
||||
**Orchestrated (10):** Run via `node scanners/scan-orchestrator.mjs <target> [--fail-on <severity>] [--compact] [--output-file <path>] [--baseline] [--save-baseline]`.
|
||||
`--fail-on <critical|high|medium|low>`: exit 1 if findings at/above severity, exit 0 otherwise. `--compact`: one-liner per finding format. Both configurable via `policy.json` `ci` section.
|
||||
With `--output-file`: full JSON to file, compact aggregate to stdout. `--baseline` diffs against stored baseline. `--save-baseline` saves results for future diffs. Baselines stored in `reports/baselines/<target-hash>.json`.
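
The `--fail-on` rule described above can be sketched as a small threshold check. The finding shape (`{ severity }`), the `SEVERITY_RANK` table, and the function name are assumptions for illustration, not the orchestrator's actual code.

```javascript
// Hypothetical sketch of the --fail-on decision; names and shapes are assumed.
const SEVERITY_RANK = { low: 1, medium: 2, high: 3, critical: 4 };

// Returns 1 if any finding is at or above the requested severity, else 0.
// Severities outside the table (for example `info`) rank as 0 and never fail the run.
function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_RANK[failOn];
  if (threshold === undefined) {
    throw new Error(`unknown --fail-on severity: ${failOn}`);
  }
  const hit = findings.some((f) => (SEVERITY_RANK[f.severity] || 0) >= threshold);
  return hit ? 1 : 0;
}

const sampleFindings = [{ severity: 'medium' }, { severity: 'info' }];
console.log(exitCodeFor(sampleFindings, 'high'));   // 0
console.log(exitCodeFor(sampleFindings, 'medium')); // 1
```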
|
||||
|
||||
10 scanners: unicode, entropy, permission, dep-audit, taint, git-forensics, network, memory-poisoning, supply-chain-recheck, toxic-flow.
|
||||
|
||||
Lib: `mcp-description-cache.mjs` — caches MCP tool descriptions in `~/.cache/llm-security/mcp-descriptions.json`, detects per-update drift via Levenshtein (>10% = alert), 7-day TTL. v7.3.0 (E14) adds a sticky baseline slot per tool plus a 10-event rolling history; cumulative drift = `levenshtein(current, baseline) / max(|current|,|baseline|)`. When ratio ≥ `mcp.cumulative_drift_threshold` (default 0.25), emits `mcp-cumulative-drift` advisory through `post-mcp-verify.mjs`. Baseline survives TTL purge so slow-burn drift is preserved across the 7-day window. `clearBaseline(tool?)` exposed for the `/security mcp-baseline-reset` command. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for testing.
|
||||
|
||||
Supply-chain-recheck (SCR) re-audits installed dependencies from lockfiles (package-lock.json, yarn.lock, requirements.txt, Pipfile.lock) against blocklists, OSV.dev batch API, and typosquat detection. Offline fallback available. Shared data module: `scanners/lib/supply-chain-data.mjs`.
|
||||
|
||||
Memory-poisoning (MEM) detects cognitive state poisoning in CLAUDE.md, memory files, and .claude/rules — injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads.
|
||||
|
||||
Toxic-flow (TFA) is a post-processing correlator that runs LAST — detects "lethal trifecta" (untrusted input + sensitive data access + exfiltration sink) by correlating output from prior scanners.
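
As a simplified picture of that correlation step, the sketch below groups prior findings by component and reports any component that shows all three legs. The `leg` and `component` fields and the leg labels are hypothetical; the text does not describe how the real analyzer tags scanner output.

```javascript
// Hypothetical sketch of lethal-trifecta correlation; field names are assumed.
const TRIFECTA_LEGS = ['untrusted-input', 'sensitive-access', 'exfil-sink'];

// Returns the components for which every leg of the trifecta was observed.
function findLethalTrifectas(findings) {
  const legsByComponent = new Map();
  for (const finding of findings) {
    if (!TRIFECTA_LEGS.includes(finding.leg)) continue; // unrelated finding
    if (!legsByComponent.has(finding.component)) {
      legsByComponent.set(finding.component, new Set());
    }
    legsByComponent.get(finding.component).add(finding.leg);
  }
  return [...legsByComponent]
    .filter(([, legs]) => legs.size === TRIFECTA_LEGS.length)
    .map(([component]) => component);
}

const priorFindings = [
  { component: 'agents/helper.md', leg: 'untrusted-input' },  // e.g. WebFetch
  { component: 'agents/helper.md', leg: 'sensitive-access' }, // e.g. Read .env
  { component: 'agents/helper.md', leg: 'exfil-sink' },       // e.g. Bash curl POST
  { component: 'agents/other.md', leg: 'untrusted-input' },
];
console.log(findLethalTrifectas(priorFindings)); // [ 'agents/helper.md' ]
```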
|
||||
|
||||
Utility: `node scanners/lib/fs-utils.mjs <backup|restore|cleanup|tmppath> [args]`.
|
||||
|
||||
Lib: `sarif-formatter.mjs` — converts scan output to OASIS SARIF 2.1.0 format. Used by `--format sarif` flag.
|
||||
Lib: `audit-trail.mjs` — writes structured JSONL audit events (ISO 8601, OWASP tags, SIEM-ready). Env: `LLM_SECURITY_AUDIT_*`.
|
||||
Lib: `policy-loader.mjs` — reads `.llm-security/policy.json` for distributable hook configuration. Includes `ci` section (`failOn`, `compact`) for CI/CD defaults. Defaults match hardcoded values.
|
||||
|
||||
**Standalone (8):** `posture-scanner.mjs` — deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms. NOT in scan-orchestrator (meta-level, not code-level).
|
||||
Run: `node scanners/posture-scanner.mjs [path]` → JSON stdout. Scanner prefix: PST. Used by `/security posture` and `/security audit`.
|
||||
|
||||
`mcp-live-inspect.mjs` — NOT in scan-orchestrator. MCP servers are running processes, not files.
|
||||
Run: `node scanners/mcp-live-inspect.mjs [target] [--timeout 10000] [--skip-global]`
|
||||
Scanner prefix: MCI. OWASP: MCP03, MCP06, MCP09. Invoked by `mcp-inspect` and `mcp-audit --live`.
|
||||
|
||||
`watch-cron.mjs` — standalone cron wrapper. Reads `reports/watch/config.json`, scans all targets, writes `reports/watch/latest.json`. Run: `node scanners/watch-cron.mjs [--config <path>]`
|
||||
|
||||
`reference-config-generator.mjs` — generates Grade A reference config based on posture gaps. Detects project type (plugin/monorepo/standalone). Templates in `templates/reference-config/`. Run: `node scanners/reference-config-generator.mjs [path] [--apply]`
|
||||
|
||||
`dashboard-aggregator.mjs` — cross-project security dashboard. Discovers Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each, aggregates to machine-grade (weakest link). Cache in `~/.cache/llm-security/dashboard-latest.json` (24h staleness). Run: `node scanners/dashboard-aggregator.mjs [--no-cache] [--max-depth N]`
|
||||
|
||||
`attack-simulator.mjs` — red-team harness. Data-driven: 64 scenarios in 12 categories from `knowledge/attack-scenarios.json`. Payloads constructed at runtime (fragment assembly to avoid triggering hooks on source). Uses `runHook()` from test helper. Adaptive mode (`--adaptive`): 5 mutation rounds per passing scenario (homoglyph, encoding, zero-width, case alternation, synonym). Mutation rules in `knowledge/attack-mutations.json`. Benchmark mode (`--benchmark`): outputs structured pass/fail metrics. Run: `node scanners/attack-simulator.mjs [--category <name>] [--json] [--verbose] [--adaptive] [--benchmark]`
|
||||
|
||||
`ai-bom-generator.mjs` — AI Bill of Materials generator. Discovers AI components (models, MCP servers, plugins, knowledge, hooks) and outputs CycloneDX 1.6 JSON. Scanner prefix: BOM. Run: `node scanners/ai-bom-generator.mjs <target> [--output-file <path>]`
|
||||
|
||||
`ide-extension-scanner.mjs` — scans installed VS Code (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH) extensions and JetBrains IDE plugins (IntelliJ IDEA, PyCharm, GoLand, WebStorm, RubyMine, PhpStorm, CLion, DataGrip, RustRover, Rider, Aqua, Writerside, Android Studio). Fleet + Toolbox excluded. OS-aware discovery via `lib/ide-extension-discovery.mjs` (`~/.vscode/extensions/` + `~/Library/Application Support/JetBrains/<IDE><version>/plugins/` on macOS, `%APPDATA%\JetBrains\...` on Windows, `~/.config/JetBrains/...` on Linux). Parses VS Code `package.json` via `lib/ide-extension-parser.mjs` and JetBrains `META-INF/plugin.xml` + `META-INF/MANIFEST.MF` (with nested-jar extraction) via `lib/ide-extension-parser-jb.mjs`. 7 VS Code checks: blocklist match, theme-with-code, sideload (vsix), broad activation (`*` / `onStartupFinished`), typosquat (Levenshtein ≤2 vs top-100), extension-pack expansion, dangerous `vscode:uninstall` hooks. 7 JetBrains checks: theme-with-code, broad activation (`application-components`), `Premain-Class` instrumentation (HIGH — javaagent retransform), native binaries (`.so`/`.dylib`/`.dll`/`.jnilib`), long `<depends>` chains, typosquat vs top JetBrains plugins, shaded-jar advisory. Both branches orchestrate reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension with bounded concurrency (default 4). Scanner prefix: IDE. OWASP: LLM01, LLM02, LLM03, LLM06, ASI02, ASI04. Offline by default, `--online` opt-in for Marketplace/OSV.dev lookups. Knowledge: `knowledge/top-vscode-extensions.json`, `knowledge/top-jetbrains-plugins.json`, `knowledge/ide-extension-threat-patterns.md`, `knowledge/marketplace-api-notes.md`, `knowledge/jetbrains-marketplace-api-notes.md`.
|
||||
|
||||
**v6.4.0 — URL support.** Targets can be Marketplace, OpenVSX, or direct `.vsix` URLs. Pipeline: `lib/vsix-fetch.mjs` (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual redirect host whitelist) → `lib/zip-extract.mjs` (zero-dep ZIP parser, rejects zip-slip/symlink/absolute/drive-letter/encrypted/ZIP64, caps: 10 000 entries, 500MB uncomp, 100x ratio, depth 20) → existing scan pipeline against extracted `extension/` subdir → temp dir always cleaned in `try/finally`. Envelope.meta.source = `{ type: "url", kind, url, finalUrl, sha256, size, publisher?, name?, version? }`.
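
The entry-name rejections listed above (zip-slip, absolute paths, drive letters, depth cap) can be illustrated with a small validator. This is a sketch under assumptions, not the code in `lib/zip-extract.mjs`: the function name is made up, depth is assumed to mean the number of path segments, and the symlink, encryption, ZIP64, size, and ratio checks are omitted because they need the ZIP headers.

```javascript
// Hypothetical sketch of entry-name validation; NOT the actual zip-extract.mjs.
const MAX_DEPTH = 20; // depth cap from the text; counting segments is an assumption

function isSafeEntryName(name) {
  if (typeof name !== 'string' || name.length === 0) return false;
  if (name.startsWith('/') || name.startsWith('\\')) return false; // absolute path
  if (/^[a-zA-Z]:/.test(name)) return false;                       // drive letter
  const segments = name.split(/[\\/]/);
  if (segments.includes('..')) return false;                       // zip-slip traversal
  return segments.length <= MAX_DEPTH;
}

console.log(isSafeEntryName('extension/package.json')); // true
console.log(isSafeEntryName('../../etc/passwd'));       // false
```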
|
||||
|
||||
**v6.5.0 — OS sandbox.** Fetch + extract for URL targets now spawns `lib/vsix-fetch-worker.mjs` in a sub-process wrapped by `sandbox-exec` (macOS) or `bwrap` (Linux) — same primitives reused from `git-clone.mjs`. Helper: `lib/vsix-sandbox.mjs` exports `buildSandboxProfile`, `buildBwrapArgs`, `buildSandboxedWorker`, `runVsixWorker`. Worker IPC: argv `--url <url> --tmpdir <dir>` → single JSON line on stdout (`{ok, sha256, size, finalUrl, source, extRoot}` or `{ok:false, error, code?}`). Defense-in-depth — if the in-process ZIP parser ever has a bypass, the kernel still refuses writes outside `<tmpdir>`. `scan(target, { useSandbox })` defaults to `true`; tests pass `false` since `globalThis.fetch` mocks do not cross process boundaries. Windows fallback: in-process with `meta.warnings` advisory. Envelope `meta.source.sandbox`: `'sandbox-exec' | 'bwrap' | 'none' | 'in-process'`.
|
||||
|
||||
**v6.6.0 — JetBrains Marketplace URL fetch + JetBrains branch.** URL targets can also be `https://plugins.jetbrains.com/plugin/<numericId>-<slug>` (metadata-resolved → xmlId download) or `https://plugins.jetbrains.com/plugin/download?pluginId=<xmlId>&version=<v>` (direct). `lib/vsix-fetch.mjs` gains `detectUrlType` JetBrains kinds, `fetchJetBrainsPlugin`, host allowlist `plugins.jetbrains.com`. `buildSandboxedWorker(dirs, workerPath)` now accepts a custom worker path — `lib/jetbrains-fetch-worker.mjs` reuses the same IPC contract. Envelope `meta.source.kind` can be `'jetbrains-marketplace' | 'jetbrains-download'`. Installed-plugin scan runs JB-specific checks (see scanner bullet above) and shares the UNI/ENT/NET/TNT/MEM/SCR orchestration. `.kt`, `.groovy`, `.scala` added to `taint-tracer` code extensions.
|
||||
|
||||
Run: `node scanners/ide-extension-scanner.mjs [target|url] [--vscode-only] [--intellij-only] [--include-builtin] [--online] [--format json|compact] [--fail-on <sev>] [--output-file <path>]`. Invoked by `/security ide-scan`.
|
||||
|
||||
## Token Budget (ENFORCED)
|
||||
|
||||
All commands total ~600 lines. All commands use registered subagent types.
|
||||
|
||||
- Commands are short dispatchers (~30-60 lines) — no inline report templates or format specs
|
||||
- All agents use registered `subagent_type` — agent instructions are system prompt, never file reads
|
||||
- Max 1-2 knowledge files per agent invocation (threat-patterns + secrets-patterns)
|
||||
- OWASP files are NEVER passed by commands — agents reference them from their own system prompt
|
||||
- Agents run sequentially to avoid burst rate limits
|
||||
- `pre-install-supply-chain.mjs` queries OSV.dev for CVEs on every package install
|
||||
|
||||
## CLI
|
||||
|
||||
`bin/llm-security.mjs` — standalone CLI entry point. Works without Claude Code via `npx llm-security` or `node bin/llm-security.mjs`.
|
||||
Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatches to scanner scripts via `child_process.spawn`.
|
||||
`package.json` `bin` field: `"llm-security": "./bin/llm-security.mjs"`. `files` whitelist: only `bin/` + `scanners/` published to npm.
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
|
||||
All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
|
||||
Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.
|
||||
|
||||
## Knowledge Files (20)
|
||||
|
||||
| File | Content |
|------|---------|
| `skill-threat-patterns.md` | 7 threat categories for skill/command scanning |
| `mcp-threat-patterns.md` | 9 MCP threat categories (MCP01-MCP10) |
| `secrets-patterns.md` | Regex patterns for 10+ secret types |
| `owasp-llm-top10.md` | OWASP LLM Top 10 (2025) with Claude Code mappings |
| `owasp-agentic-top10.md` | OWASP Agentic AI Top 10 (ASI01-ASI10) |
| `owasp-skills-top10.md` | OWASP Skills Top 10 (AST01-AST10) — skill-specific threats |
| `mitigation-matrix.md` | Threat-to-control mappings |
| `top-packages.json` | Known package lists for supply chain checks |
| `skill-registry.json` | Seed data for skill signature registry |
| `prompt-injection-research-2025-2026.md` | 7 research papers (2025-2026) with implications for hook defenses |
| `deepmind-agent-traps.md` | DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix |
| `attack-scenarios.json` | 64 red-team scenarios across 12 categories for attack simulation |
| `attack-mutations.json` | Synonym tables and mutation rules for adaptive red-team testing |
| `compliance-mapping.md` | EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS mappings to plugin capabilities |
| `norwegian-context.md` | Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet |
| `ide-extension-threat-patterns.md` | 10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies |
| `top-vscode-extensions.json` | Top ~100 VS Code Marketplace extension IDs (typosquat seed) + blocklist entries |
| `top-jetbrains-plugins.json` | Top JetBrains plugin IDs (typosquat seed) + blocklist entries (v6.6.0) |
| `marketplace-api-notes.md` | VS Code Marketplace + OpenVSX API endpoints used by `lib/vsix-fetch.mjs` (v6.4.0) |
| `jetbrains-marketplace-api-notes.md` | JetBrains Marketplace API endpoints used by `fetchJetBrainsPlugin` (v6.6.0) |
|
||||
|
||||
## Reports
|
||||
|
||||
Scan reports are stored in `reports/` as `.docx` (for sharing) with `.md` source.
|
||||
|
||||
## Examples (runnable demonstrations)
|
||||
|
||||
Self-contained, deterministic threat-fixture folders under `examples/`. Each folder has a `README.md`, a fixture/script/transcript, `run-*.{sh,mjs}`, and `expected-findings.md`. These are demonstrations, not unit tests.
|
||||
|
||||
| Folder | Demonstrates | Hooks/scanners | Sentinel |
|--------|--------------|----------------|----------|
| `malicious-skill-demo/` | Skill scanner end-to-end (UNI/ENT/PRM/DEP/TNT/NET + 7 LLM categories) | `scan-orchestrator` + agents | BLOCK 100/100 |
| `prompt-injection-showcase/` | 61 payloads × 19 categories against `pre-prompt-inject-scan`, `post-mcp-verify`, `pre-bash-destructive` | runtime hooks | per-category expected outcome |
| `lethal-trifecta-walkthrough/` | Rule-of-Two advisory on leg 3 (WebFetch → Read .env → Bash curl POST) + suppression | `post-session-guard` | advisory at stage 3 |
| `mcp-rug-pull/` | Cumulative drift advisory (E14, v7.3.0): 7 stages, each under the per-update threshold, cumulatively over 25% from baseline | `post-mcp-verify` + `mcp-description-cache.mjs` | advisory at stage 7 |
| `supply-chain-attack/` | PreToolUse block on a compromised package + scope-hop advisory + dep-auditor typosquats + postinstall curl-pipe | `pre-install-supply-chain` + `dep-auditor` + `supply-chain-data` | 6+ findings, 2 advisories, 1 BLOCK |
| `poisoned-claude-md/` | 6 detectors (injection / shell / URL / credential paths / permission expansion / encoded payloads) incl. the E15 agent-file surface | `memory-poisoning-scanner` | ≥18 findings across 2 files |
| `bash-evasion-gallery/` | T1-T9 disguised destructive commands → normalized + blocked (defense-in-depth over Claude Code 2.1.98+) | `pre-bash-destructive` + `bash-normalize` | 10 BLOCK exit codes |
| `toxic-agent-demo/` | Single-component lethal trifecta: agent with [Bash, Read, WebFetch] and no hook guards = CRITICAL TFA finding | `toxic-flow-analyzer` (TFA) | 1 CRITICAL `Lethal trifecta:` |
| `pre-compact-poisoning/` | PreCompact hook catches injection + AWS-shaped credential in a synthetic transcript across off/warn/block modes | `pre-compact-scan` | 9 pass: block exit 2 + reason; warn systemMessage; off skip; benign passes |
|
||||
|
||||
State isolation: all examples that mutate global state use the run-script PID (post-session-guard via `${ppid}.jsonl`) or env overrides (`LLM_SECURITY_MCP_CACHE_FILE` for the MCP cache). The user's real `/tmp/llm-security-session-*.jsonl` and `~/.cache/llm-security/` are never touched.
|
||||
47
plugins/llm-security/docs/version-history.md
Normal file
|
|
@ -0,0 +1,47 @@
|
|||
# LLM Security — Version history
|
||||
|
||||
Per-release notes for v7.0.0 onward. Imported from `CLAUDE.md` via `@docs/version-history.md`.
|
||||
|
||||
## v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING)
|
||||
|
||||
Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
|
||||
|
||||
1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15). `info` findings are observability-only — counted in OWASP aggregates but contribute zero to risk_score, verdict, and riskBand (B3, v7.2.0 — was undocumented pre-7.2.0). See `severity.mjs` JSDoc for full contract.
|
||||
2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
|
||||
3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).
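
The tier bands and verdict cutoffs from item 1 above can be illustrated with a small sketch. Only the band limits (70–95, 40–65, 15–35, 1–11) and the cutoffs (BLOCK ≥65, WARNING ≥15) come from the text. The exact log-scaling, the saturation count of 32, counting only findings at the highest severity present, and the `ALLOW` label for the lowest verdict are all assumptions; `severity.mjs` is the authoritative contract.

```javascript
// Hypothetical sketch of a severity-dominated, log-scaled score.
// Band limits and verdict cutoffs are from the release note; everything else is assumed.
const BANDS = [
  { severity: 'critical', min: 70, max: 95 },
  { severity: 'high', min: 40, max: 65 },
  { severity: 'medium', min: 15, max: 35 },
  { severity: 'low', min: 1, max: 11 },
];
const SATURATION_COUNT = 32; // assumed: this many findings reach the top of a band

function riskScore(findings) {
  for (const band of BANDS) {
    // The highest severity present picks the band; lower severities are ignored.
    const n = findings.filter((f) => f.severity === band.severity).length;
    if (n === 0) continue;
    // Log-scaled position within the band: 1 finding maps to min, SATURATION_COUNT+ to max.
    const t = Math.min(1, Math.log2(n) / Math.log2(SATURATION_COUNT));
    return Math.round(band.min + (band.max - band.min) * t);
  }
  return 0; // only `info` findings (or none): contributes nothing to the score
}

function verdictFor(score) {
  if (score >= 65) return 'BLOCK';
  if (score >= 15) return 'WARNING';
  return 'ALLOW'; // placeholder name; the text does not name the lowest verdict
}

const demoScore = riskScore([{ severity: 'high' }, { severity: 'low' }, { severity: 'info' }]);
console.log(demoScore, verdictFor(demoScore)); // 40 WARNING
```

The point of the shape is that adding more low-severity findings can never push a scan into a higher tier, which is what the v1 sum-and-cap model allowed.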
|
||||
|
||||
See `docs/security-hardening-guide.md` §6 for the calibration story.
|
||||
|
||||
## v7.1.1 — Scan-report narrative coherence (patch)
|
||||
|
||||
Three coordinated edits address the whiplash symptom that survived v7.0.0 (numbers fixed, narrative still walked findings back as "false positive" in prose):
|
||||
(a) `agents/skill-scanner-agent.md` Step 2.5 mandates context-first severity assignment — every signal has exactly one disposition (suppressed OR reported), no per-finding walk-back; (b) `templates/unified-report.md` gains a `### Narrative Audit` block in Executive Summary surfacing `summary.narrative_audit.suppressed_findings.{count, by_category}` from the agent's trailing JSON; (c) both files updated from stale v1 risk-formula constants to the v2 model that has been authoritative in `severity.mjs` since v7.0.0. Counter is distinct from the existing top-level `output.suppressed` (`.llm-security-ignore` rule integer). Out-of-scope but flagged: `commands/scan.md:113-114` retains the v1 formula; resolution deferred to Batch B.
|
||||
|
||||
## v7.3.0 — MCP cumulative-drift baseline (Wave C of Batch C)
|
||||
|
||||
Closes E14 from `docs/critical-review-2026-04-20.md`. The `mcp-description-cache.mjs` schema gains a sticky `baseline` slot per tool plus a 10-event rolling `history` array (FIFO). Cumulative drift = `levenshtein(current, baseline) / max(|current|, |baseline|)`; when the ratio crosses `mcp.cumulative_drift_threshold` (default 0.25), `post-mcp-verify.mjs` emits a separate MEDIUM `mcp-cumulative-drift` advisory. The existing per-update >10% drift signal is unchanged — both fire independently. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught. Baseline survives the 7-day TTL purge so detection persists across the full window. New `/security mcp-baseline-reset` slash command (plus `scanners/mcp-baseline-reset.mjs` CLI: `--list`, `--target <tool>`, or no-args clear-all) lets the user acknowledge a legitimate MCP server upgrade — clearing the baseline causes the next call to seed a fresh one from the incoming description; description, firstSeen, lastSeen, and history are preserved for audit. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for end-to-end testing without polluting the user's real `~/.cache/llm-security/mcp-descriptions.json`.
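
A minimal sketch of the cumulative-drift ratio described above, assuming plain string descriptions. The function names are illustrative and are not taken from `mcp-description-cache.mjs`; whether the real comparison against the threshold is strict or inclusive is also an assumption.

```javascript
// Classic two-row Levenshtein edit distance (by UTF-16 code unit).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// levenshtein(current, baseline) / max(|current|, |baseline|)
function cumulativeDrift(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// Assumed inclusive comparison against mcp.cumulative_drift_threshold.
function exceedsCumulativeDrift(current, baseline, threshold = 0.25) {
  return cumulativeDrift(current, baseline) >= threshold;
}
```

Because the baseline is sticky, a series of small edits that each stay under the per-update threshold still accumulates distance from the first-seen description.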
|
||||
|
||||
## v7.3.0 — Env-var deprecation warnings (D3 of Batch C, Wave D)
|
||||
|
||||
Closes 8.7 from `.claude/projects/2026-04-29-batch-c-scope-finalize/plan.md`. `scanners/lib/policy-loader.mjs` exports a new helper `getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue)` — env still wins per Preferences (existing behaviour), but when both the env-var AND the `policy.json` key are explicitly set, the helper emits a single per-process stderr line: `[llm-security] Deprecation: env-var ${ENVVAR} will be removed in v8.0.0; policy.json key ${section}.${key} also set — env wins for now. Suppress with LLM_SECURITY_DEPRECATION_QUIET=1.` Module-scoped `Set` dedupes per env-var name across call-sites. Four overlapping vars are wired through the helper: `LLM_SECURITY_INJECTION_MODE` ↔ `injection.mode` (in `pre-prompt-inject-scan.mjs`), `LLM_SECURITY_TRIFECTA_MODE` ↔ `trifecta.mode` and `LLM_SECURITY_ESCALATION_WINDOW` ↔ `trifecta.escalation_window` (in `post-session-guard.mjs`), `LLM_SECURITY_AUDIT_LOG` ↔ `audit.log_path` (in `scanners/lib/audit-trail.mjs`). `DEFAULT_POLICY` gains `trifecta.escalation_window: 5` to close the gap noted in the plan revisions table (M10). Env-only vars without policy.json equivalents (`LLM_SECURITY_UPDATE_CHECK`, `LLM_SECURITY_PRECOMPACT_MODE`, `LLM_SECURITY_PRECOMPACT_MAX_BYTES`, `LLM_SECURITY_IDE_ROOTS`, `LLM_SECURITY_MCP_CACHE_FILE`) are unchanged — they emit no deprecation signal because there is nothing to deprecate yet.
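
The precedence and dedupe behaviour can be sketched roughly as below. This is not the code in `policy-loader.mjs`: the fifth `deps` parameter (injected `env`, `policy`, and `warn`) is a hypothetical addition so the sketch runs without a real `policy.json`, and the warning text is abbreviated.

```javascript
// Module-scoped set: at most one warning per env-var name per process.
const warnedEnvVars = new Set();

function getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue, deps = {}) {
  const env = deps.env ?? process.env;
  const policy = deps.policy ?? {}; // hypothetical: already-parsed policy.json
  const warn = deps.warn ?? ((line) => process.stderr.write(line + "\n"));

  const envValue = env[envVarName];
  const policyValue = policy[section]?.[key];
  const envSet = envValue !== undefined && envValue !== "";
  const policySet = policyValue !== undefined;

  const quiet = env.LLM_SECURITY_DEPRECATION_QUIET === "1";
  if (envSet && policySet && !quiet && !warnedEnvVars.has(envVarName)) {
    warnedEnvVars.add(envVarName);
    // Abbreviated; the full line is quoted in the paragraph above.
    warn(`[llm-security] Deprecation: env-var ${envVarName} will be removed in v8.0.0; policy.json key ${section}.${key} also set.`);
  }

  if (envSet) return envValue; // env still wins
  if (policySet) return policyValue;
  return defaultValue;
}
```

Injecting `warn` keeps the helper testable, since the number of emitted lines can be counted without capturing stderr.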
|
||||
|
||||
## v7.5.0 — Playground (additive surface, no scanner/hook behavior changes)
|
||||
|
||||
Single-file SPA at `playground/llm-security-playground.html` (~10 200 lines) for onboarding, demos, and workshop use without a Claude Code installation. Parser + renderer for all 18 `produces_report=true` commands in `CATALOG`. State lives primarily in IndexedDB (`llm-security-playground-v1`) with a localStorage fallback, a cycle-free Proxy + EventTarget store, and microtask-batched rendering. Theme bootstrap with FOUC prevention. 4 surfaces: onboarding (5 groups) → home (3 tracks) → catalog (20 commands) ⇄ project (reports / overview / context / export). The demo state has three projects inline; `dft-komplett-demo` has all 18 reports pre-parsed for click-through. Vendor-synced design system under `playground/vendor/playground-design-system/` (checksum-locked via `MANIFEST.json`, never edited directly). Test fixtures under `playground/test-fixtures/` (one markdown file per command) are the contract anchor for parser development. Screenshots in `playground/screenshots/v7.5.0/`. Window globals exposed for testing/automation: `__store`, `__navigate`, `__loadDemoState`, `__scheduleRender`, `__PARSERS`, `__RENDERERS`, `__CATALOG`, `__inferVerdict`, `__inferKeyStats`, `__renderPageShell`, `__handlePasteImport`. Includes a fix to the `normalizeVerdictText` regex order: GO-WITH-CONDITIONS is checked before GO so that a conditional verdict does not collapse to ALLOW.
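
The regex-order issue at the end of this entry is easy to reproduce. The sketch below is a hypothetical stand-in (`normalizeVerdictSketch`), not the playground's `normalizeVerdictText`; mapping GO-WITH-CONDITIONS to `WARNING` is an assumption made only for illustration.

```javascript
// Order matters: /\bGO\b/ also matches inside "GO-WITH-CONDITIONS",
// because "-" is a word boundary. The specific pattern must come first.
const VERDICT_RULES = [
  [/\bGO-WITH-CONDITIONS\b/i, "WARNING"], // assumed target label
  [/\bGO\b/i, "ALLOW"],
];

function normalizeVerdictSketch(text, rules = VERDICT_RULES) {
  for (const [pattern, verdict] of rules) {
    if (pattern.test(text)) return verdict;
  }
  return null;
}
```

Passing the rules in reverse order shows the original bug: the conditional verdict is swallowed by the shorter pattern.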
|
||||
|
||||
## v7.6.0 — Playground Tier 3 reference case (additive surface, no scanner/hook behavior changes)
|
||||
|
||||
The playground is now a visually and structurally complete reference implementation for the `shared/playground-design-system/` Tier 3 supplement. 8 new Tier 3 components are integrated into the 18 report renderers:

- `tfa-flow` + `tfa-leg` + `tfa-arrow` (lethal-trifecta chain with `<button>` elements + ARIA group/aria-label) in `renderScan` + `renderDeepScan`
- `mat-ladder` + `mat-step` (5-step maturity ladder with thresholds at 0/25/50/75/95% PASS) in `renderPosture`
- `suppressed-group` (narrative audit from `summary.narrative_audit.suppressed_findings`) in `renderScan` + `renderDeepScan`
- `codepoint-reveal` + `cp-tag`/`cp-zw`/`cp-bidi` (side-by-side Unicode steganography reveal; U+200B-D|FEFF|2060|180E → `cp-zw`, U+202A-E|2066-9 → `cp-bidi` detection) in `renderMcpInspect`
- `top-risks` + `top-risk[data-severity]` (ranked top-findings listing, semantic `<ol>`, excludes info findings) in `renderScan`/`renderDeepScan`/`renderPluginAudit`/`renderPosture`/`renderAudit`
- extended `recommendation-card[data-severity]` (severity-tinted advisory) on all inline uses + new per-bucket advisory cards in `renderClean` + intro snapshot + diff rows in `renderHarden` (action mapping CREATE→positive / APPEND→medium / MERGE→low / SKIP→low)
- `risk-meter` (band visualization 0-100 with Low/Medium/High/Critical/Extreme bands) on 5 archetypes (scan, deep-scan, plugin-audit, audit, red-team)
- `card--severity-{level}` modifier on `findings__item` cards

Wave 1 (Session 2) added `badge--scope-security` (identity chip), `verdict-pill-lg` with `__verdict`+`__sub` (replaces the custom verdict pill on all 18 report types), and DS Tier 3 `form-progress` + `fp-step` in the onboarding wizard. Wave 0 (Session 1) deleted ~30 duplicate CSS declarations from the `<style>` block (DS wins the cascade) and harmonized the page shell on all 4 surfaces. 5 new DS helpers: `renderToxicFlow`, `renderMatLadder`, `renderSuppressedGroup`, `renderCodepointReveal`, `renderTopRisks`. 2 new normalization helpers: `mapSeverityToCardLevel(input)` (severity + action types to DS conventions) and `parseNarrativeAudit(md)`. 12 screenshots planned in `playground/screenshots/v7.6.0/`. A11Y report updated (`playground/A11Y-RAPPORT.md`): WCAG 2.1 AA confirmed, severity-soft color pairs verified, semantic elements (`<ol>`, `<button>`, `<section>`) replace generic `<div>`. Total file change over 5 sessions: 10209 → 10677 lines. Known limitation: `parsed.findings` is empty for the `deep-scan`/`audit` demo fixtures (parser limitation, not fixed in v7.6.0; tracked for a v7.6.x patch).
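
The `cp-zw` / `cp-bidi` split listed for `codepoint-reveal` boils down to a range check per code point. The sketch below illustrates those ranges only; `classifyCodepoint` and `findHiddenCodepoints` are hypothetical names, not the helpers used by `renderMcpInspect`.

```javascript
// Ranges taken from the entry above:
//   U+200B-D, U+FEFF, U+2060, U+180E  -> "cp-zw"   (zero-width / invisible)
//   U+202A-E, U+2066-9                -> "cp-bidi" (bidirectional controls)
function classifyCodepoint(cp) {
  if ((cp >= 0x200b && cp <= 0x200d) || cp === 0xfeff || cp === 0x2060 || cp === 0x180e) {
    return "cp-zw";
  }
  if ((cp >= 0x202a && cp <= 0x202e) || (cp >= 0x2066 && cp <= 0x2069)) {
    return "cp-bidi";
  }
  return null;
}

// Walk the string by code point and report each hidden character.
function findHiddenCodepoints(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const cls = classifyCodepoint(cp);
    if (cls) {
      const label = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codepoint: label, cls });
    }
    index += ch.length; // UTF-16 offset of the next character
  }
  return hits;
}
```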
|
||||
|
||||
## v7.6.1 — Playground visual patch (no scanner/hook behavior changes)
|
||||
|
||||
Six bugs caught by the maintainer during manual browser verification after the v7.6.0 release. All were caused by a mismatch between DS classes and how the playground renderers used them (or by missing DS implementations of classes the playground renderers assumed existed).

(1) `renderFindingsBlock` used the `.findings` outer class, which DS defines as a 2-column grid (`grid-template-columns: 360px 1fr`) for a list + detail-panel layout. The playground used it without a detail panel, so the header landed in the left 360px column and the items in the 1fr column. Replaced with `<section class="report-meta">` + `<h4>` + the correct `findings__list > findings__group > findings__group-header + findings__items` pattern.

(2) `.report-table` was missing from DS entirely but is used in 7+ renderers (OWASP categories, Supply chain, Scanner Risk Matrix, Plugin meta, Permission matrix, Live meter, Latest runs, Approvals, Mitigation roadmap). Added a local CSS implementation in the playground HTML `<style>` block (border-collapse, zebra hover, header styling).

(3) The `renderPreDeploy` traffic lights used `.sm-card__grade`, which is a fixed 28×28 px (designed for a single A-F letter). It cut "PASS" to "AS" and "PASS-WITH-NOTES" to "PASS-WITH-..." in all traffic-light cards. Replaced with a width-adaptive status pill via inline styling (severity-soft + on tokens).

(4) Threat-model matrix bubbles were not clickable: `<span>` without an event handler. Replaced with `<button type="button" data-threat-id>` + `aria-label`. The click handler scrolls to the corresponding row in the Trusler (threats) table and highlights it for 1.6 s.

(5) Radar labels overlapped at 6+ axes because all of them used `text-anchor="middle"` with the same offset. Increased the SVG size from 280×280 to 380×380 and the radius from 105 to 125, and switched `text-anchor` from `middle` to `start`/`end` based on horizontal position (`Math.cos(ang)` > 0.2 / < -0.2 / in between).

(6) `recommendation-card__body` text overflowed on long single-line text (conditions, owner tags, date). Added `overflow-wrap: anywhere; word-break: break-word` in the local `<style>` block.

4/4 fix-specific smoke tests pass + 18/18 renderers still produce complete HTML against `dft-komplett-demo` (regression test). File change 10677 → 10753 lines (+76 net).
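
The radar-label fix in item (5) can be illustrated with a small sketch. The 0.2 cutoffs, the 380×380 size, and the radius of 125 come from the entry; the function names, the starting angle of -π/2, and keeping `middle` for the in-between case are assumptions.

```javascript
// Choose the SVG text-anchor from the horizontal position of the axis.
function labelAnchor(ang) {
  const x = Math.cos(ang);
  if (x > 0.2) return "start"; // label sits to the right of the chart
  if (x < -0.2) return "end";  // label sits to the left of the chart
  return "middle";             // near the vertical axis (assumed)
}

// Hypothetical layout helper: one label position per axis.
function radarLabels(labels, size = 380, radius = 125) {
  const center = size / 2;
  return labels.map((text, i) => {
    const ang = -Math.PI / 2 + (i * 2 * Math.PI) / labels.length; // assumed start at 12 o'clock
    return {
      text,
      x: center + radius * Math.cos(ang),
      y: center + radius * Math.sin(ang),
      anchor: labelAnchor(ang),
    };
  });
}
```

Anchoring right-hand labels at `start` and left-hand labels at `end` makes the text grow away from the chart instead of across it.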
|
||||
|
|
@ -96,178 +96,6 @@ Agenter leser navngitte kjernefiler, ikke hele kataloger:
|
|||
|
||||
See `references/architecture/recommended-mcp-servers.md` for details.
|
||||
|
||||
## Development

### Adding a new knowledge base

1. Create a `.md` file in the right subfolder under the relevant skill's `references/` folder (e.g. `skills/ms-ai-engineering/references/`)
2. Follow the format of the existing files (header, date, sections, "For Cosmo" section)
3. Update the relevant SKILL.md with a reference

### Adding a new command

1. Create `commands/navn.md` (where `navn` is the command name) with frontmatter (`description`, `argument-hint`)
2. Follow the pattern of the existing commands
3. Update `commands/help.md` with the new command
4. Update this CLAUDE.md

### Adding a new agent

1. Create `agents/navn-agent.md` (where `navn` is the agent name) with frontmatter (`name`, `description`, `model`, `color`, `tools`)
2. Include a clear "triggers on" in the description
3. Update this CLAUDE.md

### Testing

#### Static validation

```bash
# Run plugin validation (frontmatter, encoding, KB references)
bash tests/validate-plugin.sh
```
|
||||
|
||||
#### KB freshness (sitemap-based, manually operated)

**The apply phase runs via the slash command** (requires an active Claude Code session, which keeps us within Anthropic Consumer Terms § 3):

```text
/architect:kb-update                         # default: critical + high
/architect:kb-update --priorities critical   # critical only
/architect:kb-update --skip-discover         # skip new-URL discovery
/architect:kb-update --dry-run               # report without apply
```

**The change-report phase can run as a plain Node script (no LLM cost):**

```bash
# Poll sitemaps → change report (no file changes)
node scripts/kb-update/run-weekly-update.mjs --force

# With discovery of new relevant pages
node scripts/kb-update/run-weekly-update.mjs --force --discover

# Show the report again after polling
node scripts/kb-update/report-changes.mjs

# Build/update the URL registry from the reference files
node scripts/kb-update/build-registry.mjs [--merge]
```

The system compares the Microsoft Learn sitemap `<lastmod>` with each file's `Last updated:` header and generates a prioritized change report (critical/high/medium/low).

**Match rate:** ~69% of 1342 URLs match against the sitemaps. ~31% (mostly `azure/ai-foundry/openai/` paths) are not in the sitemaps because of Microsoft's URL restructuring.

**Scheduling:** The plugin schedules nothing. Users who want periodic notifications can set up their own cron / launchd / systemd / GitHub Actions job that runs `node scripts/kb-update/run-weekly-update.mjs --force --discover` (the report phase, not apply). The apply phase is deliberately manual: it requires LLM reasoning over the diff and runs from an open Claude Code session.

Legacy (deprecated):

```bash
bash scripts/kb-staleness-check.sh  # mtime-based, unreliable after git clone
```
|
||||
|
||||
#### E2E regression tests

```bash
# Run all E2E suites
bash tests/run-e2e.sh

# Run individual suites
bash tests/run-e2e.sh --security
bash tests/run-e2e.sh --cost
bash tests/run-e2e.sh --summary
bash tests/run-e2e.sh --ai-act
```

Fixture-based validation of agent output (security, cost, summary). Tests structure, encoding, and domain-specific requirements without invoking Claude.

#### Manual test

```bash
# Check that the plugin registers
cd <plugin-root>
claude --plugin ./plugins/ms-ai-architect

# Run the main command
/architect

# List all commands
/architect:help
```
|
||||
|
||||
## Playground (v3 / v1.15.0)
|
||||
|
||||
Interaktiv decision-builder + rapport-viewer for Microsoft AI-beslutninger. Erstatter v2 5-stegs-pipelinen med en multi-surface-app som persisterer state og visualiserer importerte rapporter inline. Spec: v3-arkitektur dokumentert under `.claude/projects/2026-05-03-playground-v3-architecture/`. v1.10.0-utvidelser dokumentert under `.claude/projects/2026-05-03-ms-ai-architect-v1-10-playground/`. v1.11.0 leverer design-system 100%-adoption. v1.13.0/.1 patchet 10+ symptomatiske visuelle bugs. v1.14.0 leverer root-cause refaktor over 6 sesjoner (DS-konvensjon-adopsjon på 14 renderere, lokal CSS halvert).
|
||||
|
||||
**v1.15.0 (sesjon 5 av ~8 i v2-prosjektet):** Project-surface byttet fra v2 `renderProjectSurface` (screen-tabs + category-tabs + per-command paste-cards) til v3 `renderProjectView` (sidebar med 17 artifacts + main-area + import-modal overlay). `renderActive()` ruter `project`-surface til `renderProjectSurfaceV3()` som wrapper renderProjectView + topbar + app-shell. V2-surface helt fjernet: `renderProjectSurface` (152 linjer), `renderCommandSubCard` (87 linjer), `rehydratePasteImports` (15 linjer), `currentProjectScreen`, `ACTIONS['project-screen']`, 5 v2-CSS-klasser. Zombie-handlers beholdt for test-back-compat: `currentProjectTab`, `ACTIONS['project-tab']`, `ACTIONS['parse']`, `handlePasteImport`, `window.__handlePasteImport`. 2 fingerprint-gap lukket: requirements.headers + license.headers. `migrateDataVersion` utvidet med `parserFor` → demo-state (kun `raw_markdown`) auto-parses til `project.artifacts[cid]`. Ship-QA-bugfixes: `components-tier4-project-view.css` lagt til i `<link>`-kjeden (manglet → modal-overlay og two-column layout virket ikke); `renderImportModal` setter `data-open="true"` (DS-kontrakt).
|
||||
|
||||
- **Fil:** `playground/ms-ai-architect-playground.html` (~3870+ linjer, single-file v3-arkitektur)
|
||||
- **4 surfaces:** Onboarding (18 felles felt — 4 strukturerte / 14 fritekst etter v1.10.0) → Home (prosjekt-liste + 3 entry-tracks) → Catalog (25 commands gruppert i 5 expansion-grupper med søk) → **Project v3** (sidebar med 17 artifacts gruppert i 4 kategorier + søk + main-area med per-artifact view eller overview med top-risks/next-actions + import-modal som DS-overlay)
|
||||
- **Persistens:** IndexedDB-primær med localStorage-fallback. Schema-versjonert (`STATE_KEY = 'ms-ai-architect-state-v1'`) med eager `MIGRATIONS`-pipeline. v1.10.0 introduserer `dataVersion v1→v2`-migrasjon (idempotent) som backfill-er `verdict`+`keyStats`.
|
||||
- **17 rapport-renderers (felles grunnskjelett):** Alle wrapper output via `renderPageShell()` med eyebrow + h1 + valgfri verdict-pill + valgfri key-stats-grid + arketype-spesifikk body. Parser → struktur → HTML rutet via kanonisk archetype-routing-tabell.
|
||||
- **Foundation-helpers:** `renderPageShell`, `renderVerdictPill`, `renderKeyStatsGrid`, `inferVerdict`, `inferKeyStats`, `KEY_STATS_CONFIG`.
|
||||
- **Tier 3-adopsjon:** kanban (conformity, review), mat-ladder (migrate, poc), screen-tabs (utredning, project surface), scenario-card-grid (license, compare), residual-pair (dpia, ros), top-risks (ros), recommendation-card (security, ros), suppressed-panel (review), critique-card (adr), read-more (utredning, summary), traffic-light (poc).
|
||||
- **Theme:** Mørk default + lys theme-toggle med Aksel-tokens i begge moduser (lagt til i v1.10.0). Persistert i `localStorage('ms-ai-architect-theme')`. Theme-bootstrap-script i `<head>` unngår FOUC.
|
||||
- **Eksport/import:** JSON Decision Record-envelope (Blob + FileReader), schema-versjon-bevisst på import.
|
||||
|
||||
### Validation (v1.15.0 numbers)

| Test | Command | Coverage |
|------|---------|----------|
| Static structure | `bash tests/test-playground-v3.sh` | 219 PASS, 2 WARN (pre-existing): vendored CSS, surfaces, 25 commands, 14 parsers, 17 renderers via PROJECT_VIEW_CONFIG.renderers routing, action handlers |
| Parser fixtures | `bash tests/test-playground-parsers.sh` | 70 PASS: 17 fixtures × parser routing |
| Migration | `bash tests/test-playground-migrations.sh` | 16 PASS: v1→v2 + v2→v3 idempotent migration |
| Fingerprints | `bash tests/test-playground-fingerprints.sh` | 32 PASS: 17-fixture true-positive + 4 anti-match + API sanity |
| Project view | `bash tests/test-playground-projectview.sh` | 30 PASS: 4 view states + nav search + null guard |
| ACTIONS | `bash tests/test-playground-actions.sh` | 19 PASS: 6 pure-state handlers + projectViewUiState |
| Combined (E2E) | `bash tests/run-e2e.sh --playground` | 386 PASS, 0 FAIL, 2 WARN |
| Plugin validation | `bash tests/validate-plugin.sh` | 219 PASS |
| Manual A11Y QA | See `playground/MANUAL-CHECKLIST.md` | 10 sections incl. an axe-core run per surface |
| A11Y report | `playground/A11Y-RAPPORT.md` | Static assessment ready; browser axe run pending |
|
||||
|
||||
### Demo system (v1.11.0 → v1.15.0)
|
||||
|
||||
`scripts/build-demo-state.mjs` reads all 17 fixture files from `playground/test-fixtures/` and injects them as a `<script type="application/json" id="demo-state-v1">` block in the playground HTML (idempotent: it replaces an existing block). The "Last inn demo-data" (load demo data) button on the onboarding surface calls `ACTIONS['load-demo']`, which reads the block, replaces all state branches via Proxy mutation, runs `migrateDataVersion` (v2→v3 auto-parses raw_markdown into artifacts), and navigates to the project surface. The demo shows 17 artifacts grouped in the sidebar with severity badges, an aggregate verdict (BLOKKERT), a top-risks list, and working re-import/delete buttons per artifact.
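
The idempotent injection step could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `scripts/build-demo-state.mjs`: `injectDemoState` is a hypothetical name, and placing a first-time block just before `</body>` is a guess.

```javascript
// Matches an existing demo-state block so it can be swapped out in place.
const DEMO_BLOCK = /<script type="application\/json" id="demo-state-v1">[\s\S]*?<\/script>/;

function injectDemoState(html, state) {
  // Escape "<" so the JSON payload can never close the script element early.
  const json = JSON.stringify(state).replace(/</g, "\\u003c");
  const block = `<script type="application/json" id="demo-state-v1">${json}</script>`;

  // Function replacers avoid "$" being read as a replacement pattern.
  if (DEMO_BLOCK.test(html)) {
    return html.replace(DEMO_BLOCK, () => block);
  }
  // Assumed placement for the first injection; HTML without </body> is returned unchanged.
  return html.replace("</body>", () => `${block}\n</body>`);
}
```

Running the function twice with the same state yields the same HTML, which is what makes the build step safe to repeat.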
|
||||
|
||||
`tests/screenshot/` contains a standalone Playwright runner with its own `package.json` (gitignored `node_modules`). `node run.mjs` produces 24 PNGs (12 surfaces × 2 themes) under `playground/screenshots/v1.15.0/`. v1.15.0 surfaces: onboarding-empty, project-overview, project-artifact-{classify,security,ros,cost,summary}, project-import-modal (viewport-only, since the modal is a position:fixed overlay), project-search, home, catalog, onboarding-prefilled. v1.10.0/v1.11.0/v1.14.0 are kept as historical reference. These are committed so that forkers can see the plugin without installing anything. The demo org is "Acme Kommune" and the demo project is "Acme: Kunde-chatbot".
|
||||
|
||||
### Design-system 100%-adoption (v1.11.0 → v1.14.0)
|
||||
|
||||
Sesjon 3-5 la til inline CSS i `playground/ms-ai-architect-playground.html`. v1.11.0 hoisted alle generiske komponenter til `shared/playground-design-system/components-tier3-supplement.css` (DS v0.3.0):
|
||||
- `.pyramide-desc` / `.pyramide-desc__item`
|
||||
- `.scenario-card-grid` / `.scenario-card`
|
||||
- `.residual-pair` / `__cell` / `__cell-label/__cell-value/__cell-meta` / `__arrow`
|
||||
- `.read-more` / `.read-more__trigger` / `.read-more__chev` / `.read-more__body`
|
||||
- `.top-risks` / `.top-risk[data-severity]`
|
||||
- `.recommendation-card`
|
||||
- `.suppressed-panel`
|
||||
- `.screen-tabs` / `.screen-tab` / `.screen[data-active]`
|
||||
|
||||
v1.14.0 (DS v0.4.0): root-cause fix for tre DS-bugs som tidligere ble symptomatisk patchet i lokal CSS — `.kanban-card__name` (break-word + overflow-wrap; var break-all), `.expansion__title-main/sub` (display: block), `.matrix__bubble` (cursor + hover/focus). Fix-en re-syncet til vendored DS, og tilsvarende lokal-overrides slettet. Plus: 14 renderere refaktorert til DS-konvensjon (3 risk-renderere → DS-summary-grid + ros-layout, 6 compliance/govern-renderere → DS-konvensjon, renderMigrate + renderPoc → expansion-list per fase). Lokal `<style>`-blokk: 191 → 122 effektive linjer (~36% reduksjon siden v1.13.1).
|
||||
|
||||
Alle PARALLEL-CSS-navngrupper migrert til DS-konvensjon. `renderPageShell` + `renderKeyStatsGrid` refaktorert til DS markup. Severity-coded card-borders på rapport-cards, app-header-restruktur, `.stack-lg` body spacing på home/project/catalog, AI Act-pyramide bredde-fix, eyebrow-label på home-projects.
|
||||
|
||||
Ved videre hoisting: re-sync via `node scripts/sync-design-system.mjs ms-ai-architect`. Dette er endringer i delt asset — krever drift-deteksjon-handling per `MANIFEST.json`.
|
||||
|
||||
### Vendored design-system
|
||||
|
||||
The playground loads CSS from `playground/vendor/playground-design-system/`, a vendored copy of the marketplace root's `shared/playground-design-system/`. This keeps the plugin **standalone**: the HTML file can be opened from `file://` independently of the marketplace root.

- **Sync script:** `node scripts/sync-design-system.mjs ms-ai-architect` (at the marketplace root)
- **Drift detection:** `MANIFEST.json` stores a SHA-256 per file. Re-sync fails if a vendored file has been changed locally; `--force` overrides.
- **Loaded in HTML:** `<link>` tags to `fonts.css`, `tokens.css`, `base.css`, `components.css`, `components-tier2.css`, `components-tier3.css`, `components-tier3-supplement.css` (in that order).
- **Never edit** files under `vendor/playground-design-system/` directly: changes go in `shared/`, followed by a re-sync.

> The v2 spec under `docs/playground-v2-spec.md` is kept as historical reference, but is NOT the current contract. The v3 architecture is documented in `.claude/projects/2026-05-03-playground-v3-architecture/`.
|
||||
|
||||
## Relaterte plugins (fremtidig)
|
||||
|
||||
- `ms-rag-architect` — RAG-spesialist (egen plugin)
|
||||
- `ms-power-automate-architect` — Power Automate deep-dive
|
||||
- `ms-azure-ai-architect` — Azure AI Services deep-dive
|
||||
- `ms-foundry-architect` — Azure AI Foundry spesialist
|
||||
- `ms-copilot-studio-architect` — Copilot Studio spesialist
|
||||
|
||||
## Hooks (2)
|
||||
|
||||
| Event | Script | Formål |
|
||||
|
|
@ -277,6 +105,12 @@ kopi av marketplace-rotens `shared/playground-design-system/`. Dette holder plug
|
|||
|
||||
> Secrets scanning consolidated to llm-security plugin.
|
||||
|
||||
## Reference docs (read on demand)
|
||||
|
||||
- **Utvikling, testing, KB-refresh-workflow:** `docs/development.md`
|
||||
- **Playground v3 (decision-builder + rapport-viewer):** `docs/playground.md`
|
||||
- **Recommended MCP servers (detail):** `references/architecture/recommended-mcp-servers.md`
|
||||
|
||||
## Viktige frister (EU AI Act)
|
||||
|
||||
| Frist | Krav | Status |
|
||||
|
|
@ -288,20 +122,10 @@ kopi av marketplace-rotens `shared/playground-design-system/`. Dette holder plug
|
|||
|
||||
**Tilsynsmyndigheter:** Datatilsynet (personvern), nasjonal AI-tilsynsmyndighet (under etablering), sektortilsyn.
|
||||
|
||||
## Relaterte plugins (fremtidig)
|
||||
|
||||
## Communication patterns
|
||||
|
||||
### Linking to local files
|
||||
|
||||
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
|
||||
|
||||
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
|
||||
- Always use absolute paths. Never `~/` or relative paths.
|
||||
- For multiple files, render as a bullet list of named markdown links.
|
||||
|
||||
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
|
||||
|
||||
Example:
|
||||
|
||||
- [Brief](file:///Users/ktg/.../brief.html)
|
||||
- [Research summary](file:///Users/ktg/.../research/summary.md)
|
||||
- `ms-rag-architect` — RAG-spesialist (egen plugin)
|
||||
- `ms-power-automate-architect` — Power Automate deep-dive
|
||||
- `ms-azure-ai-architect` — Azure AI Services deep-dive
|
||||
- `ms-foundry-architect` — Azure AI Foundry spesialist
|
||||
- `ms-copilot-studio-architect` — Copilot Studio spesialist
|
||||
|
|
|
|||
92
plugins/ms-ai-architect/docs/development.md
Normal file
|
|
@ -0,0 +1,92 @@
|
|||
# ms-ai-architect — Development
|
||||
|
||||
Plugin development, testing, KB-refresh. Imported from `CLAUDE.md` via pointer.
|
||||
|
||||
## Adding a new knowledge base

1. Create a `.md` file in the right subfolder under the relevant skill's `references/` folder (e.g. `skills/ms-ai-engineering/references/`)
2. Follow the format of the existing files (header, date, sections, "For Cosmo" section)
3. Update the relevant SKILL.md with a reference

## Adding a new command

1. Create `commands/navn.md` (where `navn` is the command name) with frontmatter (`description`, `argument-hint`)
2. Follow the pattern of the existing commands
3. Update `commands/help.md` with the new command
4. Update `CLAUDE.md`

## Adding a new agent

1. Create `agents/navn-agent.md` (where `navn` is the agent name) with frontmatter (`name`, `description`, `model`, `color`, `tools`)
2. Include a clear "triggers on" in the description
3. Update `CLAUDE.md`

## Testing

### Static validation

```bash
# Run plugin validation (frontmatter, encoding, KB references)
bash tests/validate-plugin.sh
```
|
||||
|
||||
### KB freshness (sitemap-based, manually operated)

**The apply phase runs via the slash command** (requires an active Claude Code session, which keeps us within Anthropic Consumer Terms § 3):

```text
/architect:kb-update                         # default: critical + high
/architect:kb-update --priorities critical   # critical only
/architect:kb-update --skip-discover         # skip new-URL discovery
/architect:kb-update --dry-run               # report without apply
```

**The change-report phase can run as a plain Node script (no LLM cost):**

```bash
# Poll sitemaps → change report (no file changes)
node scripts/kb-update/run-weekly-update.mjs --force

# With discovery of new relevant pages
node scripts/kb-update/run-weekly-update.mjs --force --discover

# Show the report again after polling
node scripts/kb-update/report-changes.mjs

# Build/update the URL registry from the reference files
node scripts/kb-update/build-registry.mjs [--merge]
```

The system compares the Microsoft Learn sitemap `<lastmod>` with each file's `Last updated:` header and generates a prioritized change report (critical/high/medium/low).
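
The core comparison can be sketched as below. This is an illustration only, not code from `scripts/kb-update/`: the function names are hypothetical, the header is assumed to be a plain `Last updated: YYYY-MM-DD` line, and treating a missing header as stale is also an assumption.

```javascript
// Extract the date from a "Last updated: YYYY-MM-DD" line (assumed format).
function lastUpdatedOf(markdown) {
  const m = markdown.match(/^Last updated:\s*(\d{4}-\d{2}-\d{2})/m);
  return m ? m[1] : null;
}

// A file is stale when the sitemap <lastmod> date is newer than its header date.
function isStale(markdown, sitemapLastmod) {
  const local = lastUpdatedOf(markdown);
  if (local === null) return true; // assumed: no header means "needs review"
  // ISO 8601 dates sort correctly as strings, so no Date parsing is needed.
  return sitemapLastmod.slice(0, 10) > local;
}
```

Comparing only the date part keeps a same-day sitemap timestamp from flagging a file that was refreshed that day.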
|
||||
|
||||
**Match rate:** ~69% of 1342 URLs match against the sitemaps. ~31% (mostly `azure/ai-foundry/openai/` paths) are not in the sitemaps because of Microsoft's URL restructuring.

**Scheduling:** The plugin schedules nothing. Users who want periodic notifications can set up their own cron / launchd / systemd / GitHub Actions job that runs `node scripts/kb-update/run-weekly-update.mjs --force --discover` (the report phase, not apply). The apply phase is deliberately manual: it requires LLM reasoning over the diff and runs from an open Claude Code session.

Legacy (deprecated):

```bash
bash scripts/kb-staleness-check.sh  # mtime-based, unreliable after git clone
```
|
||||
|
||||
### E2E regression tests

```bash
# Run all E2E suites
bash tests/run-e2e.sh

# Run individual suites
bash tests/run-e2e.sh --security
bash tests/run-e2e.sh --cost
bash tests/run-e2e.sh --summary
bash tests/run-e2e.sh --ai-act
```

Fixture-based validation of agent output (security, cost, summary). Tests structure, encoding, and domain-specific requirements without invoking Claude.

### Manual test

```bash
# Check that the plugin registers
cd <plugin-root>
claude --plugin ./plugins/ms-ai-architect

# Run the main command
/architect

# List all commands
/architect:help
```
|
||||
66
plugins/ms-ai-architect/docs/playground.md
Normal file
|
|
@ -0,0 +1,66 @@
|
|||
# ms-ai-architect — Playground (v3 / v1.15.0)
|
||||
|
||||
Interaktiv decision-builder + rapport-viewer for Microsoft AI-beslutninger. Imported from `CLAUDE.md` via pointer.
|
||||
|
||||
Erstatter v2 5-stegs-pipelinen med en multi-surface-app som persisterer state og visualiserer importerte rapporter inline. Spec: v3-arkitektur dokumentert under `.claude/projects/2026-05-03-playground-v3-architecture/`. v1.10.0-utvidelser dokumentert under `.claude/projects/2026-05-03-ms-ai-architect-v1-10-playground/`. v1.11.0 leverer design-system 100%-adoption. v1.13.0/.1 patchet 10+ symptomatiske visuelle bugs. v1.14.0 leverer root-cause refaktor over 6 sesjoner (DS-konvensjon-adopsjon på 14 renderere, lokal CSS halvert).
|
||||
|
||||
**v1.15.0 (sesjon 5 av ~8 i v2-prosjektet):** Project-surface byttet fra v2 `renderProjectSurface` (screen-tabs + category-tabs + per-command paste-cards) til v3 `renderProjectView` (sidebar med 17 artifacts + main-area + import-modal overlay). `renderActive()` ruter `project`-surface til `renderProjectSurfaceV3()` som wrapper renderProjectView + topbar + app-shell. V2-surface helt fjernet: `renderProjectSurface` (152 linjer), `renderCommandSubCard` (87 linjer), `rehydratePasteImports` (15 linjer), `currentProjectScreen`, `ACTIONS['project-screen']`, 5 v2-CSS-klasser. Zombie-handlers beholdt for test-back-compat: `currentProjectTab`, `ACTIONS['project-tab']`, `ACTIONS['parse']`, `handlePasteImport`, `window.__handlePasteImport`. 2 fingerprint-gap lukket: requirements.headers + license.headers. `migrateDataVersion` utvidet med `parserFor` → demo-state (kun `raw_markdown`) auto-parses til `project.artifacts[cid]`. Ship-QA-bugfixes: `components-tier4-project-view.css` lagt til i `<link>`-kjeden (manglet → modal-overlay og two-column layout virket ikke); `renderImportModal` setter `data-open="true"` (DS-kontrakt).
|
||||
|
||||
- **File:** `playground/ms-ai-architect-playground.html` (~3870+ lines, single-file v3 architecture)
- **4 surfaces:** Onboarding (18 shared fields: 4 structured / 14 free-text after v1.10.0) → Home (project list + 3 entry tracks) → Catalog (25 commands grouped into 5 expansion groups with search) → **Project v3** (sidebar with 17 artifacts grouped into 4 categories + search + main-area with per-artifact view or overview with top-risks/next-actions + import modal as DS overlay)
- **Persistence:** IndexedDB as primary store with localStorage fallback. Schema-versioned (`STATE_KEY = 'ms-ai-architect-state-v1'`) with an eager `MIGRATIONS` pipeline. v1.10.0 introduces the `dataVersion v1→v2` migration (idempotent), which backfills `verdict` + `keyStats`.
- **17 report renderers (shared skeleton):** All wrap their output via `renderPageShell()` with eyebrow + h1 + optional verdict pill + optional key-stats grid + archetype-specific body. Parser → structure → HTML is routed via the canonical archetype-routing table.
- **Foundation helpers:** `renderPageShell`, `renderVerdictPill`, `renderKeyStatsGrid`, `inferVerdict`, `inferKeyStats`, `KEY_STATS_CONFIG`.
- **Tier 3 adoption:** kanban (conformity, review), mat-ladder (migrate, poc), screen-tabs (utredning, project surface), scenario-card-grid (license, compare), residual-pair (dpia, ros), top-risks (ros), recommendation-card (security, ros), suppressed-panel (review), critique-card (adr), read-more (utredning, summary), traffic-light (poc).
- **Theme:** Dark by default + light theme toggle with Aksel tokens in both modes (added in v1.10.0). Persisted in `localStorage('ms-ai-architect-theme')`. A theme bootstrap script in `<head>` avoids FOUC.
- **Export/import:** JSON Decision Record envelope (Blob + FileReader), schema-version-aware on import.
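
As a rough illustration of the persistence bullet above, an eager and idempotent migration pipeline could look like the sketch below. Only `dataVersion`, `verdict` and `keyStats` come from the text; the shape of `MIGRATIONS`, the `migrate` function and the placeholder backfill values are assumptions, not the playground's actual code.

```javascript
// Hypothetical sketch: each entry upgrades state from version N to N+1.
// The real MIGRATIONS table in the playground may be shaped differently.
const MIGRATIONS = {
  1: (state) => ({
    ...state,
    // Backfill only when missing, so re-running never overwrites real data.
    verdict: state.verdict ?? null,
    keyStats: state.keyStats ?? [],
    dataVersion: 2,
  }),
};

const LATEST_VERSION = 2;

// Eagerly apply every pending step; a state already at the latest
// version passes through unchanged, which makes the call idempotent.
function migrate(state) {
  let current = { dataVersion: 1, ...state };
  while (current.dataVersion < LATEST_VERSION) {
    const step = MIGRATIONS[current.dataVersion];
    if (!step) throw new Error(`No migration from v${current.dataVersion}`);
    current = step(current);
  }
  return current;
}

const once = migrate({ dataVersion: 1, projects: [] });
const twice = migrate(once);
console.log(JSON.stringify(once) === JSON.stringify(twice)); // true
```

Keeping each step a pure function of the previous state is what makes a second run harmless: the loop simply finds nothing left to do.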
|
||||
|
||||
## Validation (v1.15.0 figures)

| Test | Command | Coverage |
|------|---------|----------|
| Static structure | `bash tests/test-playground-v3.sh` | 219 PASS, 2 WARN (pre-existing): vendored CSS, surfaces, 25 commands, 14 parsers, 17 renderers via PROJECT_VIEW_CONFIG.renderers routing, action handlers |
| Parser fixtures | `bash tests/test-playground-parsers.sh` | 70 PASS: 17 fixtures × parser routing |
| Migration | `bash tests/test-playground-migrations.sh` | 16 PASS: v1→v2 + v2→v3 idempotent migration |
| Fingerprints | `bash tests/test-playground-fingerprints.sh` | 32 PASS: 17-fixture true-positive + 4 anti-match + API sanity |
| Project view | `bash tests/test-playground-projectview.sh` | 30 PASS: 4 view states + nav search + null guard |
| ACTIONS | `bash tests/test-playground-actions.sh` | 19 PASS: 6 pure-state handlers + projectViewUiState |
| Combined (E2E) | `bash tests/run-e2e.sh --playground` | 386 PASS, 0 FAIL, 2 WARN |
| Plugin validation | `bash tests/validate-plugin.sh` | 219 PASS |
| Manual A11Y QA | See `playground/MANUAL-CHECKLIST.md` | 10 sections incl. axe-core run per surface |
| A11Y report | `playground/A11Y-RAPPORT.md` | Static assessment ready; browser axe run pending |
|
||||
|
||||
## Demo system (v1.11.0 → v1.15.0)
|
||||
|
||||
`scripts/build-demo-state.mjs` reads all 17 fixture files from `playground/test-fixtures/` and injects them as a `<script type="application/json" id="demo-state-v1">` block in the playground HTML (idempotent: it replaces any existing block). The "Last inn demo-data" (load demo data) button on the onboarding surface calls `ACTIONS['load-demo']`, which reads the block, replaces all state branches via Proxy mutation, runs `migrateDataVersion` (v2→v3 auto-parses raw_markdown into artifacts), and navigates to the project surface. The demo shows 17 artifacts grouped in the sidebar with severity badges, an aggregate verdict (BLOKKERT, i.e. blocked), a top-risks list, and working re-import/delete buttons per artifact.
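
To make the "idempotent" claim concrete, here is a minimal sketch of how such an injection step could be written. The block id matches the text; the helper name `injectDemoState`, the fallback of inserting before `</body>`, and the sample state are assumptions for illustration and are not taken from `build-demo-state.mjs`.

```javascript
// Hypothetical helper: embed demo state as a JSON <script> block.
const BLOCK_RE = /<script type="application\/json" id="demo-state-v1">[\s\S]*?<\/script>/;

function injectDemoState(html, demoState) {
  // Escape "<" so fixture markdown containing "</script>" cannot end the block early.
  const json = JSON.stringify(demoState).replace(/</g, "\\u003c");
  const block = `<script type="application/json" id="demo-state-v1">${json}</script>`;
  if (BLOCK_RE.test(html)) {
    // Replace the existing block; a function replacer keeps "$" in the JSON literal.
    return html.replace(BLOCK_RE, () => block);
  }
  // First run (assumed placement): insert just before </body>.
  return html.replace("</body>", () => `${block}\n</body>`);
}

const page = "<html><body><main></main></body></html>";
const demo = { artifacts: { ros: { raw_markdown: "# ROS" } } };
const first = injectDemoState(page, demo);
const second = injectDemoState(first, demo);
console.log(first === second); // true
```

Running the function twice with the same input yields byte-identical HTML, which is the property a build script needs if it is re-run on every fixture change.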
|
||||
|
||||
`tests/screenshot/` contains a standalone Playwright runner with its own `package.json` (gitignored `node_modules`). `node run.mjs` produces 24 PNGs (12 surfaces × 2 themes) under `playground/screenshots/v1.15.0/`. v1.15.0 surfaces: onboarding-empty, project-overview, project-artifact-{classify,security,ros,cost,summary}, project-import-modal (viewport-only, since the modal is a position:fixed overlay), project-search, home, catalog, onboarding-prefilled. v1.10.0/v1.11.0/v1.14.0 are kept as historical reference. These are committed so that forkers can see the plugin without installing anything. The demo org is "Acme Kommune" and the demo project is "Acme: Kunde-chatbot".
|
||||
|
||||
## Design-system 100%-adoption (v1.11.0 → v1.14.0)
|
||||
|
||||
Sessions 3-5 added inline CSS to `playground/ms-ai-architect-playground.html`. v1.11.0 hoisted all generic components to `shared/playground-design-system/components-tier3-supplement.css` (DS v0.3.0):
- `.pyramide-desc` / `.pyramide-desc__item`
- `.scenario-card-grid` / `.scenario-card`
- `.residual-pair` / `__cell` / `__cell-label/__cell-value/__cell-meta` / `__arrow`
- `.read-more` / `.read-more__trigger` / `.read-more__chev` / `.read-more__body`
- `.top-risks` / `.top-risk[data-severity]`
- `.recommendation-card`
- `.suppressed-panel`
- `.screen-tabs` / `.screen-tab` / `.screen[data-active]`
|
||||
|
||||
v1.14.0 (DS v0.4.0): root-cause fix for three DS bugs that had previously been patched symptomatically in local CSS: `.kanban-card__name` (break-word + overflow-wrap; was break-all), `.expansion__title-main/sub` (display: block), `.matrix__bubble` (cursor + hover/focus). The fix was re-synced to the vendored DS and the corresponding local overrides were deleted. In addition, 14 renderers were refactored to DS convention (3 risk renderers → DS summary-grid + ros-layout, 6 compliance/govern renderers → DS convention, renderMigrate + renderPoc → expansion-list per phase). Local `<style>` block: 191 → 122 effective lines (~36% reduction since v1.13.1).
|
||||
|
||||
All PARALLEL-CSS name groups are migrated to DS convention. `renderPageShell` + `renderKeyStatsGrid` are refactored to DS markup. Also included: severity-coded card borders on report cards, app-header restructuring, `.stack-lg` body spacing on home/project/catalog, AI Act pyramid width fix, eyebrow label on home-projects.
|
||||
|
||||
For further hoisting: re-sync via `node scripts/sync-design-system.mjs ms-ai-architect`. These are changes to a shared asset and require drift-detection handling per `MANIFEST.json`.
|
||||
|
||||
## Vendored design-system
|
||||
|
||||
The playground loads CSS from `playground/vendor/playground-design-system/`, a vendored copy of the marketplace root's `shared/playground-design-system/`. This keeps the plugin **standalone**: the HTML file can be opened from `file://` independently of the marketplace root.
|
||||
|
||||
- **Sync script:** `node scripts/sync-design-system.mjs ms-ai-architect` (run from the marketplace root)
- **Drift detection:** `MANIFEST.json` stores a SHA-256 per file. Re-sync fails if a vendored file has been changed locally; `--force` overrides.
- **Loaded in HTML:** `<link>` tags for `fonts.css`, `tokens.css`, `base.css`, `components.css`, `components-tier2.css`, `components-tier3.css`, `components-tier3-supplement.css` (in that order).
- **Never edit** files under `vendor/playground-design-system/` directly: changes go into `shared/`, followed by a re-sync.
|
||||
|
||||
> The v2 spec under `docs/playground-v2-spec.md` is kept as historical reference but is NOT the governing contract. The v3 architecture is documented in `.claude/projects/2026-05-03-playground-v3-architecture/`.
|
||||
|
|
@ -1,14 +1,7 @@
|
|||
---
|
||||
name: ms-ai-advisor
|
||||
description: |
|
||||
This skill should be used when the user needs Microsoft AI architecture guidance, wants help
|
||||
choosing between Azure AI platforms, or asks about Copilot vs Foundry trade-offs. Cosmo Skyberg
|
||||
persona guides through structured problem understanding before technology selection. Specialist
|
||||
in Azure AI Foundry, M365 Copilot, Copilot Studio, Power Platform, Azure OpenAI, and
|
||||
Microsoft Agent Framework.
|
||||
Triggers on: "Microsoft AI architecture", "Copilot vs Foundry", "which Microsoft AI platform",
|
||||
"Azure AI advice", "M365 Copilot vs Copilot Studio", "help me choose between Azure OpenAI and Copilot Studio",
|
||||
"trenger arkitekturveiledning", "hvilken Copilot skal jeg bruke", "/architect", "Cosmo".
|
||||
description: >-
|
||||
Microsoft AI architecture guidance, choosing between Azure AI platforms, Copilot vs Foundry trade-offs. Cosmo Skyberg persona guides through structured problem understanding before technology selection. Specialist in Azure AI Foundry, M365 Copilot, Copilot Studio, Power Platform, Azure OpenAI, Microsoft Agent Framework. Triggers on: "Microsoft AI architecture", "Copilot vs Foundry", "which Microsoft AI platform", "Cosmo", "/architect".
|
||||
---
|
||||
|
||||
> **INSTRUCTION:** You ARE Cosmo Skyberg. Follow the work process below.
|
||||
|
|
|
|||
|
|
@ -1,13 +1,7 @@
|
|||
---
|
||||
name: ms-ai-engineering
|
||||
description: |
|
||||
This skill should be used when the user needs deep technical guidance for building AI solutions
|
||||
in the Microsoft stack — RAG architecture, multi-agent orchestration, Azure AI Services,
|
||||
data engineering with Fabric, MLOps/GenAIOps, multimodal AI, or API Management for AI.
|
||||
Triggers on: "RAG architecture on Azure", "multi-agent orchestration pattern",
|
||||
"MLOps for generative AI", "Azure AI Search implementation", "Semantic Kernel agent",
|
||||
"Fabric data pipeline for AI", "API gateway for AI", "chunking strategy",
|
||||
"embedding model", "APIM for Azure OpenAI".
|
||||
description: >-
|
||||
Deep technical guidance for building AI solutions in the Microsoft stack — RAG architecture, multi-agent orchestration, Azure AI Services, data engineering with Fabric, MLOps/GenAIOps, multimodal AI, API Management for AI. Triggers on: "RAG architecture on Azure", "multi-agent orchestration pattern", "MLOps for generative AI", "Azure AI Search", "Semantic Kernel agent", "Fabric data pipeline".
|
||||
---
|
||||
|
||||
> **INSTRUCTION:** This skill provides deep technical knowledge for building AI solutions.
|
||||
|
|
|
|||
|
|
@ -1,14 +1,7 @@
|
|||
---
|
||||
name: ms-ai-governance
|
||||
description: |
|
||||
This skill should be used when the user asks about Norwegian public sector AI compliance,
|
||||
utredningsinstruksen for AI projects, EU AI Act risk classification, DPIA for AI systems,
|
||||
Digdir architecture principles, responsible AI governance, or monitoring and observability
|
||||
for AI in production.
|
||||
Triggers on: "Norwegian public sector AI compliance", "utredningsinstruksen for AI",
|
||||
"AI Act risk classification", "DPIA for AI system", "Digdir architecture principles",
|
||||
"ansvarlig AI i offentlig sektor", "compliance-vurdering for AI", "Forvaltningsloven AI",
|
||||
"Schrems II AI", "bias detection", "AI governance framework".
|
||||
description: >-
|
||||
Norwegian public sector AI compliance, utredningsinstruksen for AI, EU AI Act risk classification, DPIA for AI systems, Digdir architecture principles, responsible AI governance, monitoring and observability. Triggers on: "Norwegian public sector AI compliance", "AI Act risk classification", "DPIA for AI system", "Digdir architecture principles", "ansvarlig AI i offentlig sektor", "Forvaltningsloven AI".
|
||||
---
|
||||
|
||||
# ms-ai-governance
|
||||
|
|
|
|||
|
|
@ -1,14 +1,7 @@
|
|||
---
|
||||
name: ms-ai-infrastructure
|
||||
description: |
|
||||
This skill should be used when the user asks about disaster recovery for AI workloads,
|
||||
multi-region Azure AI deployment, hybrid or edge AI architecture, sovereign cloud for Norway,
|
||||
offline-first AI patterns, or AI infrastructure resilience planning.
|
||||
Covers BCDR, Azure Arc for AI, ONNX Runtime edge deployment, disconnected scenarios,
|
||||
and Norwegian data sovereignty requirements.
|
||||
Triggers on: "disaster recovery for AI workloads", "edge AI deployment", "sovereign cloud AI",
|
||||
"multi-region Azure AI", "Azure Arc for AI", "offline AI deployment",
|
||||
"AI infrastructure resilience", "BCDR for AI", "hybrid AI", "Norway East failover".
|
||||
description: >-
|
||||
Disaster recovery for AI workloads, multi-region Azure AI deployment, hybrid or edge AI architecture, sovereign cloud for Norway, offline-first AI patterns, AI infrastructure resilience. Covers BCDR, Azure Arc for AI, ONNX Runtime edge deployment, disconnected scenarios, Norwegian data sovereignty. Triggers on: "disaster recovery for AI workloads", "edge AI deployment", "sovereign cloud AI", "Azure Arc for AI", "BCDR for AI".
|
||||
---
|
||||
|
||||
> **INSTRUCTION:** This skill covers infrastructure resilience and operational architecture for AI workloads.
|
||||
|
|
|
|||
|
|
@ -1,13 +1,7 @@
|
|||
---
|
||||
name: ms-ai-security
|
||||
description: |
|
||||
This skill should be used when the user needs a security assessment for an AI solution,
|
||||
wants cost estimation for Azure AI workloads, asks about OWASP LLM Top 10 mitigations,
|
||||
or needs performance optimization guidance. Provides deterministic 6x5 security scoring,
|
||||
P10/P50/P90 cost confidence intervals, and FinOps practices for AI.
|
||||
Triggers on: "security assessment for AI", "AI threat modeling", "cost estimation for Azure AI",
|
||||
"FinOps for AI workloads", "prompt injection defense", "kostnadsestimat for AI-løsning",
|
||||
"sikkerhetsscoring for AI", "OWASP LLM", "6x5 scoring", "PTU vs pay-as-you-go".
|
||||
description: >-
|
||||
Security assessment, cost estimation, OWASP LLM Top 10 mitigations, performance optimization for AI on Microsoft stack. Deterministic 6x5 security scoring, P10/P50/P90 cost confidence intervals, FinOps practices. Triggers on: "security assessment for AI", "AI threat modeling", "cost estimation for Azure AI", "FinOps for AI workloads", "OWASP LLM", "kostnadsestimat for AI-løsning".
|
||||
---
|
||||
|
||||
> **INSTRUCTION:** This skill covers quantitative assessment activities with deterministic
|
||||
|
|
|
|||
|
|
@ -75,20 +75,3 @@ Cycle archival: `/okr:oppsett arkiver` — moves `syklus/` to `historikk/`, gene
|
|||
/okr:oppsett arkiver ──→ cycle archival + retrospektiv-generering
|
||||
SessionStart ──→ coaching-hook.mjs (proactive coaching)
|
||||
```
|
||||
|
||||
## Communication patterns
|
||||
|
||||
### Linking to local files
|
||||
|
||||
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
|
||||
|
||||
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
|
||||
- Always use absolute paths. Never `~/` or relative paths.
|
||||
- For multiple files, render as a bullet list of named markdown links.
|
||||
|
||||
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
|
||||
|
||||
Example:
|
||||
|
||||
- [Brief](file:///Users/ktg/.../brief.html)
|
||||
- [Research summary](file:///Users/ktg/.../research/summary.md)
|
||||
|
|
|
|||
|
|
@ -1,13 +1,7 @@
|
|||
---
|
||||
name: okr-offentlig-sektor
|
||||
description: |
|
||||
This skill should be used when the user asks about OKR (Objectives and Key Results)
|
||||
in Norwegian public sector context, including writing OKR, reviewing OKR quality,
|
||||
cascading OKR from strategy to team level, tracking OKR progress, running OKR meetings,
|
||||
or translating tildelingsbrev to OKR. Also for CFR, OKR antipatterns, scoring, and Oboard.
|
||||
Triggers on: "OKR", "objectives and key results", "skriv OKR", "vurder OKR",
|
||||
"OKR-scoring", "kaskadere OKR", "OKR-workshop", "tildelingsbrev til OKR",
|
||||
"OKR for offentlig sektor", "Oboard", "CFR", "OKR antipatterns", "OKR kvalitetssjekk".
|
||||
description: >-
|
||||
OKR (Objectives and Key Results) for Norwegian public sector: writing OKR, reviewing OKR quality, cascading OKR from strategy to team, tracking progress, running OKR meetings, translating tildelingsbrev to OKR. Also CFR, OKR antipatterns, scoring, Oboard. Triggers on: "OKR", "skriv OKR", "vurder OKR", "OKR-scoring", "kaskadere OKR", "tildelingsbrev til OKR", "OKR for offentlig sektor".
|
||||
version: "1.0.0"
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@ -22,87 +22,7 @@ Voyage — a contract-driven Claude Code pipeline: brief, research, plan, execut
|
|||
| `/trekcontinue` | Continue — resumes the next session of a multi-session voyage project. Reads `.session-state.local.json` (Handover 7) and immediately begins executing | opus |
|
||||
| `/trekendsession` | End-session — mark the current session complete and write session-state pointing at the next session. Helper for informal multi-session flows | opus |
|
||||
|
||||
### /trekbrief modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile: `economy` / `balanced` / `premium` / `<custom>`. Sets `phase_models` for the brief phase. See `## Profile system` below. |
|
||||
|
||||
Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with max 3 review iterations. After writing the brief, asks the user to choose manual (print commands) or auto (Claude runs research + plan in foreground).
|
||||
|
||||
### /trekresearch modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Interview + research (local + external) + synthesis + brief (foreground) |
| `--project <dir>` | Write brief to `{dir}/research/{NN}-{slug}.md` (auto-incremented) |
| `--quick` | Interview (short) + inline research (no agent swarm) |
| `--local` | Only codebase analysis agents (skip external + Gemini) |
| `--external` | Only external research agents (skip codebase analysis) |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the research phase. See `## Profile system` below. |
|
||||
|
||||
Flags combine: `--project <dir> --local`, `--external --quick`.
|
||||
|
||||
### /trekplan modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| `--project <dir>` | **Required path A**: read `{dir}/brief.md`, auto-discover `{dir}/research/*.md`, write `{dir}/plan.md` |
| `--brief <path>` | **Required path B**: plan from a specific brief file; write to `.claude/plans/trekplan-{date}-{slug}.md` |
| `--research <brief> [brief2]` | Enrich with extra research briefs beyond what is in `{project_dir}/research/` |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--quick` | Plan directly (no agent swarm) |
| `--export <pr\|issue\|markdown\|headless> <plan>` | Generate shareable output from existing plan |
| `--decompose <plan>` | Split plan into self-contained headless sessions |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the plan phase (and others, since plan emits `profile:` to plan.md frontmatter). See `## Profile system` below. |
|
||||
|
||||
**Breaking change (v2.0):** one of `--brief` or `--project` is required. There is no interview inside `/trekplan`. The `--spec` flag has been removed — use `/trekbrief` to produce a brief instead.
|
||||
|
||||
If `{project_dir}/architecture/overview.md` exists (typically produced by an opt-in upstream architect plugin, not bundled), the plan command auto-discovers it and treats `cc_features_proposed` as priors. Missing file is fine — discovery is additive, not required.
|
||||
|
||||
### /trekexecute modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Execute plan; auto-detects Execution Strategy for multi-session |
| `--project <dir>` | Read `{dir}/plan.md`, write `{dir}/progress.json` |
| `--resume` | Resume from last progress checkpoint |
| `--dry-run` | Validate plan structure without executing |
| `--validate` | Schema-only check: parse steps + manifests, report `READY \| FAIL`, no execution |
| `--step N` | Execute only step N |
| `--fg` | Force foreground: run all steps sequentially, ignore Execution Strategy |
| `--session N` | Execute only session N from plan's Execution Strategy |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the execute phase. Inherited from plan.md frontmatter `profile:` if present. See `## Profile system` below. |
|
||||
|
||||
### /trekreview modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Run brief-conformance + code-correctness reviewers in parallel, coordinator dedup + verdict, write `{project_dir}/review.md` |
| `--project <dir>` | **Required.** Path to trekplan project folder containing `brief.md`. Review is written to `{dir}/review.md` |
| `--since <ref>` | Override "before" SHA for the diff range. Validated via `git rev-parse --verify` |
| `--quick` | Skip brief-conformance reviewer; skip coordinator's reasonableness filter (fast correctness-only pass) |
| `--validate` | Schema-only check on existing `{dir}/review.md`. No LLM calls |
| `--dry-run` | Print discovered scope + triage map; skip writes |
| `--fg` | No-op alias (foreground is default) |
| `--profile <name>` | (v4.1.0) Model profile for the review phase. See `## Profile system` below. |
|
||||
|
||||
### /trekcontinue modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Auto-discover active project's `.session-state.local.json` and resume |
| `<project-dir>` | Resume the next session of an explicit project directory |
| `--profile <name>` | (v4.1.0) Model profile for the resumed session. Inherited from the previous session's plan.md frontmatter when absent. See `## Profile system` below. |
|
||||
|
||||
The triage gate is deterministic — path-pattern classifier produces `{file → deep-review|summary-only|skip}`. Hard refuse-with-suggestion above 100 files / 100K diff tokens.
|
||||
Full flag reference for each command (modes, `--gates`, `--profile`, breaking changes): see `docs/command-modes.md`.
|
||||
|
||||
## Agents
|
||||
|
||||
|
|
@ -132,191 +52,11 @@ The triage gate is deterministic — path-pattern classifier produces `{file →
|
|||
| contrarian-researcher | opus | Counter-evidence, overlooked alternatives |
|
||||
| gemini-bridge | opus | Gemini Deep Research second opinion (conditional) |
|
||||
|
||||
## Quality infrastructure (v3.4.0)
|
||||
## Reference docs (read on demand)
|
||||
|
||||
`lib/` contains zero-dep validators, parsers, and autonomy primitives wired into the four commands:
|
||||
|
||||
- `lib/util/{frontmatter,result,atomic-write,autonomy-gate}.mjs` — shared YAML-frontmatter parser + Result helpers + `atomicWriteJson(path, obj)` for tmp+rename writes + autonomy-gate state machine (v3.4.0)
|
||||
- `lib/parsers/{plan-schema,manifest-yaml,project-discovery,arg-parser,bash-normalize,jaccard,finding-id}.mjs` — pure parsers (no I/O), unit-tested. `manifest-yaml` extended in v3.4.0 with additive `skip_commit_check` + `memory_write` flags (forward-compat: unknown keys ignored)
|
||||
- `lib/review/{rule-catalogue,plan-review-dedup}.mjs` — version-pinned rule catalogue (12 keys) + Phase 9 inline dedup helpers (v3.4.0)
|
||||
- `lib/stats/event-emit.mjs` — single-source stats event emitter for autonomy-gate transitions and main-merge-gate (v3.4.0)
|
||||
- `lib/validators/{brief,research,plan,progress,session-state}-validator.mjs` — schema validators with CLI shims (`node lib/validators/X.mjs --json <path>`)
|
||||
- `lib/validators/architecture-discovery.mjs` — drift-WARN external-contract discovery for `architecture/overview.md`
|
||||
|
||||
Wiring points (replaces previous prose-grep instructions):
|
||||
- `/trekbrief` Phase 4g → `brief-validator` (post-write sanity check)
|
||||
- `/trekplan` Phase 1 → `brief-validator --soft`, `research-validator --dir`, `architecture-discovery`
|
||||
- `planning-orchestrator` Phase 5.5 → `plan-validator --strict` (replaces 3 `grep -cE` calls)
|
||||
- `/trekexecute --validate` → `plan-validator --strict` + `progress-validator`
|
||||
|
||||
Tests under `tests/**/*.test.mjs` (~290 tests, 0 deps). `npm test` is the fork-readiness gate. v3.4.0 adds: synthetic determinism fixtures (`tests/synthetic/plan-run-*.md` + `review-run-*.md` + companion `*-determinism.test.mjs` enforcing Jaccard ≥ 0.833 SC7 floor) and hook baseline regression pins (`tests/hooks/{path-guard,bash-guard}.test.mjs` exercising `pre-write-executor.mjs` + `pre-bash-executor.mjs` denylist BLOCK paths).
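
For readers unfamiliar with the Jaccard floor mentioned above, the following sketch shows the arithmetic behind a 0.833 threshold. It is not the code in `lib/parsers/jaccard.mjs`; the function below and the choice to compare step identifiers are assumptions made for illustration.

```javascript
// Jaccard similarity of two collections treated as sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty sets are identical
  let shared = 0;
  for (const item of setA) if (setB.has(item)) shared += 1;
  return shared / (setA.size + setB.size - shared);
}

const SC7_FLOOR = 0.833; // threshold quoted in the text

// Hypothetical step identifiers from two runs of the same plan.
const runA = ["s1", "s2", "s3", "s4", "s5"];
const runB = ["s1", "s2", "s3", "s4", "s5", "s6"];
const score = jaccard(runA, runB); // 5 shared out of 6 distinct items
console.log(score >= SC7_FLOOR); // true
```

With five of six items shared the score is 5/6, which sits just above the floor; dropping to four of six (about 0.667) would fail it.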
|
||||
|
||||
Doc-consistency test at `tests/lib/doc-consistency.test.mjs` pins agent-table count, command-table coverage, plan_version invariant, settings.json scope cleanliness, Handover 7 presence, and `session-state-validator` CLI shim.
|
||||
|
||||
`docs/HANDOVER-CONTRACTS.md` is the single source of truth for the 7 pipeline handovers (brief→research, research→plan, architecture→plan EXTERNAL, plan→execute, progress.json resume, review→plan, `.session-state.local.json`). Read it before changing any artifact format.
|
||||
|
||||
`hooks/scripts/pre-compact-flush.mjs` (PreCompact event, CC v2.1.105+) fixes the documented P0 in `docs/trekexecute-v2-observations-from-config-audit-v4.md`: keeps `progress.json` in sync with git history before context compaction so `--resume` works after long conversations. Atomic write, monotonic only, never blocks compaction.
|
||||
|
||||
`hooks/scripts/session-title.mjs` (UserPromptSubmit, CC v2.1.94+) sets `sessionTitle` to `voyage:<command>:<slug>` for voyage-command invocations. Helps multi-session headless runs identify themselves in process lists.
|
||||
|
||||
`hooks/scripts/post-bash-stats.mjs` (PostToolUse, CC v2.1.97+) appends `duration_ms` for each Bash call into `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`. Useful for finding long-running verify or checkpoint commands.
|
||||
|
||||
`hooks/scripts/post-compact-flush.mjs` (PostCompact event, v3.4.0) re-injects `.session-state.local.json` after context compaction so multi-session work survives a compaction boundary. Companion to `pre-compact-flush.mjs` (which writes the state file before compaction); together they form the rehydrate cycle that keeps `/trekcontinue` reliable across long-running multi-session work.
|
||||
|
||||
## Autonomy mode (`--gates`, v3.4.0)
|
||||
|
||||
All four pipeline commands accept `--gates {open|closed|adaptive}`:
|
||||
|
||||
| Value | Behavior |
|-------|----------|
| `open` | Skip optional checkpoints; trust manifests + verify gates only |
| `closed` | Stop at every autonomy boundary; operator confirms each transition |
| `adaptive` (default) | Stop only at meaningful boundaries (manifest-audit FAIL, plan-critic BLOCKER, main-merge gate) |
|
||||
|
||||
Under the hood: `lib/util/autonomy-gate.mjs` runs the state machine `idle → approved → executing → merge-pending → main-merged`. `lib/stats/event-emit.mjs` records each transition to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`. The main-merge gate is the final autonomy boundary before HEAD lands on `main`.
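
The state sequence above can be pictured as a small forward-only state machine. The sketch below is an assumption about how such a gate might be structured, not the contents of `lib/util/autonomy-gate.mjs`; in particular, the real module may allow additional transitions (for example on failure) and the event shape here is invented.

```javascript
// States in the order given in the text.
const STATES = ["idle", "approved", "executing", "merge-pending", "main-merged"];

// Hypothetical gate: only single forward steps are allowed.
function createGate(onTransition = () => {}) {
  let index = 0;
  return {
    get state() {
      return STATES[index];
    },
    advance(to) {
      const next = STATES.indexOf(to);
      if (next !== index + 1) {
        throw new Error(`Illegal transition ${STATES[index]} -> ${to}`);
      }
      const from = STATES[index];
      index = next;
      // Stand-in for the stats emitter: report every transition.
      onTransition({ ts: new Date().toISOString(), from, to });
    },
  };
}

const events = [];
const gate = createGate((event) => events.push(event));
gate.advance("approved");
gate.advance("executing");
console.log(gate.state, events.length); // executing 2
```

Passing the recorder in as a callback keeps the state machine free of I/O, so the same gate can be unit-tested without touching a stats file.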
|
||||
|
||||
### Path A/B/C decision (v3.4.0; Path C closed 2026-05-05)
|
||||
|
||||
Three architectural options were considered for the speedup work:
|
||||
|
||||
- **Path A — cache-first** (drop `--allowedTools` per child to recover cross-phase cache sharing): REJECTED. Inverts the security model; plugin hooks don't fire reliably in `claude -p` (research/06 GH #36071).
|
||||
- **Path B — sequential `--no-ff` parallel waves with manifest-driven failure recovery**: CHOSEN. Ships in v3.4.0. Phase 2.6 of `/trekexecute` runs the wave executor with hardenings for plugin-in-monorepo + gitignored-state topology.
|
||||
- **Path C — hybrid (cache-warm sentinel + identical-tool parallel)**: **CLOSED 2026-05-05.** Q3 experiment measured median `cache_creation_input_tokens` = 163,903 across 3 fork-children at 186K parent context (CC v2.1.128, Sonnet 4.6). Master-plan thresholds: ≤ 1,500 POSITIVE / ≥ 3,500 NEGATIVE. Result is solidly NEGATIVE — `CLAUDE_CODE_FORK_SUBAGENT` does not preserve cache prefix across identical-tool children at our context size. Path C migration is deferred indefinitely; reassessment is appropriate when CC v2.2.xxx ships fork-cache-relevant features. Harness: `scripts/q3-cache-prefix-experiment.mjs`. Companion analyser: `lib/stats/cache-analyzer.mjs`.
|
||||
|
||||
A revived Path C (post-v2.2.xxx) would require: (1) re-architecting tool-list to be identical across all wave children, (2) cache-telemetry analysis confirming the new fork-cache behaviour holds, (3) prompt-level deny re-enablement to compensate for tool scoping rollback.
|
||||
|
||||
## Profile system (`--profile`, v4.1.0)
|
||||
|
||||
Three built-in model profiles plus operator-defined `<custom>.yaml`. Each profile pins `phase_models` for the six pipeline phases (`brief`, `research`, `plan`, `execute`, `review`, `continue`). Profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` for cost-attribution.
|
||||
|
||||
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks (operator-opt-in via `--profile economy`) |
| `balanced` | sonnet | sonnet | opus | sonnet | opus | sonnet | Mixed: opus where reasoning depth pays off (operator-opt-in via `--profile balanced`) |
| `premium` (default) | opus | opus | opus | opus | opus | opus | Maximum quality: Opus on every phase. Default since 2026-05-13 operator request; also the hardcoded resolver default at `lib/profiles/resolver.mjs:145` |
|
||||
|
||||
### Lookup order
|
||||
|
||||
1. Explicit `--profile <name>` flag passed to the command
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
3. `VOYAGE_PROFILE` environment variable
4. Default `premium` (the hardcoded resolver default; see the profile table)
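
The lookup order can be read as a "first non-empty value wins" chain. The sketch below illustrates that reading only; the function name `resolveProfile` and its parameters are hypothetical and do not describe `lib/profiles/resolver.mjs`. The built-in default is passed in explicitly, using the tier that the profile table marks as default.

```javascript
// Resolve the active profile using the precedence listed above.
// All parameter names are illustrative; `fallback` is the built-in default tier.
function resolveProfile({ flag, planProfile, env = {}, fallback }) {
  const candidates = [flag, planProfile, env.VOYAGE_PROFILE];
  // First non-empty candidate wins; otherwise use the default.
  const chosen = candidates.find((value) => typeof value === "string" && value !== "");
  return chosen ?? fallback;
}

const BUILT_IN_DEFAULT = "premium"; // marked "(default)" in the profile table

// Plan frontmatter beats the environment variable when no flag is given.
console.log(
  resolveProfile({
    planProfile: "economy",
    env: { VOYAGE_PROFILE: "balanced" },
    fallback: BUILT_IN_DEFAULT,
  })
); // economy
```

Taking `env` as a parameter instead of reading `process.env` directly keeps the function pure, which makes each rung of the precedence ladder easy to test in isolation.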
|
||||
|
||||
### Custom profiles
|
||||
|
||||
Create `lib/profiles/<custom>.yaml` to define a new tier. The validator (`lib/validators/profile-validator.mjs`) enforces: every `phase_models[].phase` must be a known phase enum; every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or one of the canonical short names. Custom profiles override built-ins of the same name (lookup is alphabetical with `<custom>` taking precedence).
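
A minimal sketch of the two validation rules, assuming a profile has already been parsed from YAML into a plain object. The phase list and the regular expression are taken from the text; the function name, the error strings, and the omission of the "canonical short names" branch are simplifications of this example, not a description of `profile-validator.mjs`.

```javascript
// The six pipeline phases named in the profile system description.
const KNOWN_PHASES = ["brief", "research", "plan", "execute", "review", "continue"];
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/; // pattern quoted in the text

// Hypothetical validator: returns a list of error strings (empty means valid).
function validateProfile(profile) {
  const errors = [];
  for (const entry of profile.phase_models ?? []) {
    if (!KNOWN_PHASES.includes(entry.phase)) {
      errors.push(`unknown phase: ${entry.phase}`);
    }
    if (!MODEL_PATTERN.test(String(entry.model))) {
      errors.push(`unknown model: ${entry.model}`);
    }
  }
  return errors;
}

const problems = validateProfile({
  phase_models: [
    { phase: "plan", model: "opus" },
    { phase: "deploy", model: "haiku" },
  ],
});
console.log(problems.length); // 2
```

Note that the pattern accepts `opus` and `sonnet-4` but rejects `opus4`, because `\b` requires a word boundary directly after the family name.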
|
||||
|
||||
Drift between plan-frontmatter `profile:` and step-manifest `profile_used:` emits a `MANIFEST_PROFILE_DRIFT` warning from `plan-validator --strict` (Step 20). Plan remains valid; the warning surfaces accidental tier-mismatch.
|
||||
|
||||
## Observability (Stop hook, v4.1.0)
|
||||
|
||||
The `Stop` hook in `hooks/hooks.json` runs `hooks/scripts/otel-export.mjs` at session-end. The hook is **opt-in** — when `VOYAGE_EXPORT_MODE` is unset or `off`, no work is done.
|
||||
|
||||
| Mode | Output | Endpoint env-var |
|------|--------|------------------|
| `off` (default) | _(no export)_ | _(none)_ |
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
|
||||
|
||||
Endpoint validation: `VOYAGE_OTEL_ALLOW_PRIVATE=1` is required to send to loopback or RFC1918 destinations (CWE-918 SSRF mitigation). Allowlist `lib/exporters/field-allowlist.mjs` redacts records before export (CWE-212). Path validation (`lib/exporters/path-validator.mjs`) rejects symlink + traversal (CWE-22).
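
The private-destination rule can be sketched as follows. This is an illustrative approximation, not the exporter's actual check: it only recognises `localhost`, `::1` and IPv4 literals, does not resolve DNS names, and the function names are invented for the example.

```javascript
// Rough check for loopback and RFC 1918 IPv4 literals plus "localhost".
// It does not resolve DNS names or handle IPv6 ranges beyond ::1.
function isPrivateHost(hostname) {
  if (hostname === "localhost" || hostname === "[::1]") return true;
  const parts = hostname.split(".").map(Number);
  const isIPv4 =
    parts.length === 4 && parts.every((n) => Number.isInteger(n) && n >= 0 && n <= 255);
  if (!isIPv4) return false;
  const [a, b] = parts;
  return (
    a === 127 || // loopback 127.0.0.0/8
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

// Hypothetical gate mirroring the VOYAGE_OTEL_ALLOW_PRIVATE=1 opt-in.
function isEndpointAllowed(endpoint, env = {}) {
  const { hostname } = new URL(endpoint);
  return !isPrivateHost(hostname) || env.VOYAGE_OTEL_ALLOW_PRIVATE === "1";
}

console.log(isEndpointAllowed("http://127.0.0.1:4318/v1/metrics")); // false
console.log(
  isEndpointAllowed("http://127.0.0.1:4318/v1/metrics", { VOYAGE_OTEL_ALLOW_PRIVATE: "1" })
); // true
```

A production check would also need to resolve hostnames before classifying them, since a public DNS name can point at a private address.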
|
||||
|
||||
Local Docker Compose stack: `examples/observability/`. Operator docs: `docs/observability.md`. Both pin minimum versions per CVE history (`prom/prometheus:v3.0.1`, `grafana/grafana:11.4.0`, `otel/opentelemetry-collector-contrib:0.115.0`).
|
||||
|
||||
## Architecture
|
||||
|
||||
**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 3.5 per-phase effort dialog (v5.1) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode runs research + plan inline in the main context (v2.4.0).
|
||||
|
||||
**Phase 3.5 (v5.1) — adaptive-depth signals:** Between Phase 3 completeness exit and Phase 4 draft, the operator commits an effort level (`low | standard | high`) and an optional `model` (`sonnet | opus`) per downstream phase (`research`, `plan`, `execute`, `review`) via 4 tier-coupled `AskUserQuestion` calls. The choices land in `brief.md` frontmatter as `phase_signals:` (a list of `{phase, effort?, model?}` entries) when committed, or `phase_signals_partial: true` when the operator force-stops. `brief_version: 2.1` activates the **sequencing gate**: validator emits `BRIEF_V51_MISSING_SIGNALS` if a 2.1-versioned brief lacks both fields. Downstream commands surface a friendly hint pointing back to `/trekbrief` — enforcement is validator-only. Composition is documented prose in each downstream command's `## Composition rule (v5.1)` section: `brief.phase_signals[phase] > profile.phase_models[phase]`. The brief signal wins per-phase when present; the profile fills gaps. `effort == low` activates each command's existing `--quick`-equivalent code-path (`/trekexecute` low-effort = `--gates open` + sequential-only). High-effort behavior is deferred to v5.1.1 per brief Non-Goal.
|
||||
|
||||
**Research:** Foreground workflow (v2.4.0): Parse mode → Interview → Parallel research swarm (5 local + 4 external + 1 bridge, spawned from main context) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`.
|
||||
|
||||
**Plan:** Foreground workflow (v2.4.0): Parse mode (validate brief input) → Codebase sizing → Brief review (`brief-reviewer`) → Parallel exploration (6-8 agents, spawned from main context) → Deep-dives → Synthesis (with architecture-note cross-reference if present) → Planning → Adversarial review (`plan-critic` + `scope-guardian`) → Present/refine → Handoff. With `--project`, writes to `{dir}/plan.md` and auto-detects `{dir}/architecture/overview.md` (produced by an opt-in upstream architect plugin if installed; not bundled).
|
||||
|
||||
**Decompose:** Parse plan → Analyze step dependencies → Group into sessions → Identify parallel waves → Generate session specs + dependency graph + launch script.
|
||||
|
||||
**Execute:** Parse plan → Security scan (Phase 2.4) → Detect Execution Strategy → Single-session (step loop) or multi-session (parallel waves via `claude -p` with scoped `--allowedTools`) → Phase 7.5 manifest audit → Phase 7.6 bounded recovery (if partial) → Phase 8 atomically writes `progress.json` + `.session-state.local.json` (Handover 7) → Report. With `--project`, reads `{dir}/plan.md`. Phase 2.55 (pre-flight stop) and Phase 4 (entry-condition stop) also write `.session-state.local.json` so `/trekcontinue` can surface the stop and prompt for next steps.
|
||||
|
||||
**Continue:** `/trekcontinue` reads `{dir}/.session-state.local.json` (Handover 7), validates schema-v1 via `session-state-validator`, narrates a 3-line summary (project / next-session-label / brief-path), and immediately begins executing the next session. Auto-discovers active project state files under `.claude/projects/*/.session-state.local.json` if no explicit `<project-dir>` argument. Operator-invoked only — never auto-loaded via SessionStart. The `/trekendsession` helper is the informal-flow producer: writes the same state file for ad-hoc multi-session handovers that don't run through `/trekexecute`.
|
||||
|
||||
**Operator-UX guarantee (since v5.0.2):** `/trekbrief`, `/trekplan`, and `/trekreview` MUST always emit (a) a plain `file://<abs path>` URL AND (b) a copy-pasteable `open file://<abs path>` command in the final report block. The file:// URL must use an ABSOLUTE path (not relative or `~/`-prefixed) so terminals with cmd+click support (Ghostty, iTerm2, modern Terminal.app) can resolve it without shell interpretation. This is a non-negotiable operator-UX contract — the doc-consistency test pins both forms in all three commands' final report blocks.
|
||||
|
||||
**Operator-annotation HTML (v5.0.3):** the last step of `/trekbrief`, `/trekplan`, and `/trekreview` runs `scripts/annotate.mjs` against the just-written `.md` and prints the resulting `file://<abs path>` link. The HTML is self-contained (zero npm deps, zero external network, design-system-styled, light + dark + print) and modelled on `~/repos/claude-code-100x/claude-code-100x/build-site.js` (lines 1431–2255). The operator opens the file, the document renders as a proper article (headings / paragraphs / lists / tables / code / quotes — every element gets a stable `data-anchor-id`). In annotation mode (default ON, pencil-toggle in topbar), the operator can **select any text or click any element** → a form popover opens at the cursor with: section context auto-detected from nearest h1/h2, the anchored snippet (selection if any, else element text), **three intent buttons (Fiks / Endre / Spørsmål)**, comment textarea, Save/Cancel. The sidebar (Show annotations button) lists every annotation grouped by section with intent badge + snippet + comment + delete; clicking a card scrolls to and flashes the source element. **Copy Prompt** assembles a structured markdown (`### N. [Intent] Section: <…>` + `Quote: «…»` + `Comment: …`) and copies to clipboard. Persistence: `localStorage` keyed on absolute artifact path (`voyage-annotate:v2:<abs path>`). v5.0.0 removed the v4.2/v4.3 bespoke playground SPA + `/trekrevise` + Handover 8; v5.0.1 pointed at `/playground document-critique` (Claude-leads, wrong direction); v5.0.2 was operator-led but too thin (line-click + freeform note, no intents); v5.0.3 matches the claude-code-100x reference the operator first pointed at, with pencil-toggle / selection capture / intent categories / popover form / structured export. See [CHANGELOG.md](CHANGELOG.md) § v5.0.3.
|
||||
|
||||
**Security:** 4-layer defense-in-depth: plugin hooks (pre-bash-executor, pre-write-executor), prompt-level denylist (works in headless sessions), pre-execution plan scan (Phase 2.4), scoped `--allowedTools` replacing `--dangerously-skip-permissions`. Hard Rules 14-16 enforce verify command security, repo-boundary writes, and sensitive path protection.
|
||||
|
||||
**Pipeline:** `/trekbrief` produces the task brief. `/trekresearch --project <dir>` fills in `{dir}/research/`. `/trekplan --project <dir>` reads brief + research to produce `{dir}/plan.md` (and auto-discovers `{dir}/architecture/overview.md` if an opt-in upstream architect plugin produced one). `/trekexecute --project <dir>` executes and writes `{dir}/progress.json`. `/trekreview --project <dir>` produces `{dir}/review.md`. `/trekbrief`, `/trekplan`, and `/trekreview` each end by running `scripts/annotate.mjs` on the just-written artifact, producing `{dir}/{artifact}.html` — a self-contained operator-annotation surface — and printing the `file://` link. The operator opens it, clicks lines, writes their own notes, copies a structured prompt, pastes back, Claude revises the `.md`. All artifacts live in one project directory.
|
||||
|
||||
**Project-directory contract (v3.0.0):** trekplan owns the directory layout below. The `architecture/` subdirectory is opt-in and produced by an opt-in upstream architect plugin (not bundled) — the architect plugin is no longer publicly distributed, but the `architecture/overview.md` slot remains available for any compatible producer.
|
||||
|
||||
```
|
||||
.claude/projects/{YYYY-MM-DD}-{slug}/
|
||||
brief.md ← trekbrief writes; everyone reads
|
||||
brief.html ← trekbrief annotates (operator-annotation HTML, gitignored, re-buildable from brief.md)
|
||||
research/*.md ← trekresearch writes; plan + architect read
|
||||
architecture/ ← OPT-IN, owned by an opt-in upstream architect plugin (not bundled)
|
||||
overview.md
|
||||
gaps.md
|
||||
plan.md ← trekplan writes; trekexecute reads
|
||||
plan.html ← trekplan annotates
|
||||
progress.json ← trekexecute writes
|
||||
review.md ← trekreview writes; trekplan reads (Handover 6)
|
||||
review.html ← trekreview annotates
|
||||
```
|
||||
|
||||
The `.html` files (`brief.html`, `plan.html`, `review.html`) are produced by `scripts/annotate.mjs` and live alongside their `.md` siblings in the project directory. They are re-buildable from the `.md` source at any time (deterministic, byte-identical output on re-run), so they are conventionally gitignored along with the rest of `.claude/projects/`. Operator annotations live in browser `localStorage` keyed on the absolute artifact path — they survive refresh and browser-close, but are local to the operator's machine.
|
||||
|
||||
No code-level dependency between plugins — the contract is filesystem-level only.
|
||||
|
||||
## State
|
||||
|
||||
All artifacts in one project directory (default):
|
||||
- Project root: `.claude/projects/{YYYY-MM-DD}-{slug}/`
|
||||
- `brief.md` + `brief.html` (task brief from `/trekbrief`; `.html` is the operator-annotation surface from `scripts/annotate.mjs`)
|
||||
- `research/{NN}-{slug}.md` (research briefs from `/trekresearch --project`)
|
||||
- `architecture/overview.md` + `architecture/gaps.md` (opt-in, produced by an opt-in upstream architect plugin, not bundled)
|
||||
- `plan.md` + `plan.html` (from `/trekplan --project`)
|
||||
- `sessions/session-*.md` (from `--decompose`)
|
||||
- `progress.json` (from `/trekexecute --project`)
|
||||
- `review.md` + `review.html` (from `/trekreview --project`)
|
||||
- `.session-state.local.json` (Handover 7 — gitignored via `*.local.json`; written by `/trekexecute` Phase 8/2.55/4 or `/trekendsession`; read by `/trekcontinue`)
|
||||
|
||||
Legacy paths (still work without `--project`):
|
||||
- Research briefs: `.claude/research/trekresearch-{date}-{slug}.md`
|
||||
- Plans: `.claude/plans/trekplan-{date}-{slug}.md`
|
||||
- Sessions: `.claude/trekplan-sessions/{slug}/session-*.md`
|
||||
- Launch scripts: `.claude/trekplan-sessions/{slug}/launch.sh`
|
||||
- Progress: `{plan-dir}/.trekexecute-progress-{slug}.json`
|
||||
|
||||
Stats:
|
||||
- Brief stats: `${CLAUDE_PLUGIN_DATA}/trekbrief-stats.jsonl`
|
||||
- Plan stats: `${CLAUDE_PLUGIN_DATA}/trekplan-stats.jsonl`
|
||||
- Exec stats: `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`
|
||||
- Research stats: `${CLAUDE_PLUGIN_DATA}/trekresearch-stats.jsonl`
|
||||
- Continue stats: `${CLAUDE_PLUGIN_DATA}/trekcontinue-stats.jsonl`
|
||||
|
||||
## Terminology
|
||||
|
||||
- **Task brief** — produced by `/trekbrief`. Declares intent, goal, and research plan. Drives planning.
|
||||
- **Research brief** — produced by `/trekresearch`. Answers a specific research question. Feeds planning.
|
||||
- **Architecture note** — opt-in, produced by an opt-in upstream architect plugin (not bundled; the architect plugin is no longer publicly distributed, but the `architecture/overview.md` filesystem slot remains available for any compatible producer). Proposes which Claude Code features fit the task with brief-anchored rationale + explicit gaps. When present, enriches planning.
|
||||
- **Review** — produced by `/trekreview`. Independent post-hoc review of delivered code against the task brief. **Handover 6 (review → plan)** routes BLOCKER + MAJOR findings into `/trekplan --brief review.md` for a remediation plan. The plan's optional `source_findings:` frontmatter list is the audit trail back to the consumed findings. MINOR + SUGGESTION are skipped for v1.0 plan-input.
|
||||
- **Session state** — `.session-state.local.json` per project. **Handover 7** — produced by any session-end mechanism (`/trekexecute` Phase 8/2.55/4, `/trekendsession` helper, future graceful-handoff v2.2). Consumed by `/trekcontinue` to resume the next session in a fresh chat. Schema v1 is forward-compat (unknown top-level keys ignored). Never committed (gitignored via `*.local.json`).
|
||||
|
||||
A project typically has 1 task brief, 0–N research briefs, 0 or 1 architecture note, 0–N reviews (one per review iteration), and 0 or 1 session-state file (overwritten on every session-end).
|
||||
|
||||
## Communication patterns
|
||||
|
||||
### Linking to local files
|
||||
|
||||
When pointing to local files in responses, always use markdown link syntax with a descriptive name:
|
||||
|
||||
- Use `[Human-friendly name](file:///absolute/path)` — never bare `file:///...` URLs or autolinks `<file://...>`.
|
||||
- Always use absolute paths. Never `~/` or relative paths.
|
||||
- For multiple files, render as a bullet list of named markdown links.
|
||||
|
||||
Why: bare `file://` URLs only render the first as clickable across multiple lines. Named markdown links make each entry independently clickable and look cleaner.
|
||||
|
||||
Example:
|
||||
|
||||
- [Brief](file:///Users/ktg/.../brief.html)
|
||||
- [Research summary](file:///Users/ktg/.../research/summary.md)
|
||||
- **Architecture, workflows, project-directory contract, state, terminology:** `docs/architecture.md`
|
||||
- **Quality infrastructure (`lib/` validators, parsers, autonomy primitives, hooks):** `docs/architecture.md` §Quality infrastructure
|
||||
- **Autonomy gates (`--gates`), Path A/B/C decision:** `docs/operations.md`
|
||||
- **Profile system (`--profile economy/balanced/premium`), lookup order, custom profiles:** `docs/operations.md`
|
||||
- **Observability (Stop hook, OTLP/textfile export, SSRF mitigation):** `docs/operations.md`
|
||||
- **Handover contracts (the 7 pipeline handovers):** `docs/HANDOVER-CONTRACTS.md`
|
||||
|
|
|
|||
116
plugins/voyage/docs/architecture.md
Normal file
|
|
@ -0,0 +1,116 @@
|
|||
# Voyage — Architecture, project layout, state, terminology
|
||||
|
||||
Imported from `CLAUDE.md` via pointer.
|
||||
|
||||
## Quality infrastructure (v3.4.0)
|
||||
|
||||
`lib/` contains zero-dep validators, parsers, and autonomy primitives wired into the four commands:
|
||||
|
||||
- `lib/util/{frontmatter,result,atomic-write,autonomy-gate}.mjs` — shared YAML-frontmatter parser + Result helpers + `atomicWriteJson(path, obj)` for tmp+rename writes + autonomy-gate state machine (v3.4.0)
|
||||
- `lib/parsers/{plan-schema,manifest-yaml,project-discovery,arg-parser,bash-normalize,jaccard,finding-id}.mjs` — pure parsers (no I/O), unit-tested. `manifest-yaml` extended in v3.4.0 with additive `skip_commit_check` + `memory_write` flags (forward-compat: unknown keys ignored)
|
||||
- `lib/review/{rule-catalogue,plan-review-dedup}.mjs` — version-pinned rule catalogue (12 keys) + Phase 9 inline dedup helpers (v3.4.0)
|
||||
- `lib/stats/event-emit.mjs` — single-source stats event emitter for autonomy-gate transitions and main-merge-gate (v3.4.0)
|
||||
- `lib/validators/{brief,research,plan,progress,session-state}-validator.mjs` — schema validators with CLI shims (`node lib/validators/X.mjs --json <path>`)
|
||||
- `lib/validators/architecture-discovery.mjs` — drift-WARN external-contract discovery for `architecture/overview.md`
|
||||
|
||||
Wiring points (replaces previous prose-grep instructions):
|
||||
- `/trekbrief` Phase 4g → `brief-validator` (post-write sanity check)
|
||||
- `/trekplan` Phase 1 → `brief-validator --soft`, `research-validator --dir`, `architecture-discovery`
|
||||
- `planning-orchestrator` Phase 5.5 → `plan-validator --strict` (replaces 3 `grep -cE` calls)
|
||||
- `/trekexecute --validate` → `plan-validator --strict` + `progress-validator`
|
||||
|
||||
Tests under `tests/**/*.test.mjs` (~290 tests, 0 deps). `npm test` is the fork-readiness gate. v3.4.0 adds: synthetic determinism fixtures (`tests/synthetic/plan-run-*.md` + `review-run-*.md` + companion `*-determinism.test.mjs` enforcing Jaccard ≥ 0.833 SC7 floor) and hook baseline regression pins (`tests/hooks/{path-guard,bash-guard}.test.mjs` exercising `pre-write-executor.mjs` + `pre-bash-executor.mjs` denylist BLOCK paths).
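
For readers unfamiliar with the metric, a Jaccard floor check might look like the sketch below. What the determinism fixtures actually compare (step identifiers, finding IDs, or something else) is not stated here, so the inputs are plain string arrays, and `jaccard` is an illustrative stand-in rather than the code in `lib/parsers/jaccard.mjs`.

```javascript
// Jaccard similarity: |A ∩ B| / |A ∪ B| over two collections treated as sets.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  // Convention chosen for this sketch: two empty sets count as identical.
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  for (const item of setA) {
    if (setB.has(item)) intersection += 1;
  }
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

// Floor value quoted from the text above.
const SC7_FLOOR = 0.833;

function meetsFloor(a, b) {
  return jaccard(a, b) >= SC7_FLOOR;
}
```

Two runs that share five of six distinct items score 5/6 ≈ 0.8333, which sits just above the floor; sharing four of six would fall below it.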
|
||||
|
||||
Doc-consistency test at `tests/lib/doc-consistency.test.mjs` pins agent-table count, command-table coverage, plan_version invariant, settings.json scope cleanliness, Handover 7 presence, and `session-state-validator` CLI shim.
|
||||
|
||||
`docs/HANDOVER-CONTRACTS.md` is the single source of truth for the 7 pipeline handovers (brief→research, research→plan, architecture→plan EXTERNAL, plan→execute, progress.json resume, review→plan, `.session-state.local.json`). Read it before changing any artifact format.
|
||||
|
||||
`hooks/scripts/pre-compact-flush.mjs` (PreCompact event, CC v2.1.105+) fixes the documented P0 in `docs/trekexecute-v2-observations-from-config-audit-v4.md`: keeps `progress.json` in sync with git history before context compaction so `--resume` works after long conversations. Atomic write, monotonic only, never blocks compaction.
|
||||
|
||||
`hooks/scripts/session-title.mjs` (UserPromptSubmit, CC v2.1.94+) sets `sessionTitle` to `voyage:<command>:<slug>` for voyage-command invocations. Helps multi-session headless runs identify themselves in process lists.
|
||||
|
||||
`hooks/scripts/post-bash-stats.mjs` (PostToolUse, CC v2.1.97+) appends `duration_ms` for each Bash call into `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`. Useful for finding long-running verify or checkpoint commands.
|
||||
|
||||
`hooks/scripts/post-compact-flush.mjs` (PostCompact event, v3.4.0) re-injects `.session-state.local.json` after context compaction so multi-session work survives a compaction boundary. Companion to `pre-compact-flush.mjs` (which writes the state file before compaction); together they form the rehydrate cycle that keeps `/trekcontinue` reliable across long-running multi-session work.
|
||||
|
||||
## Architecture
|
||||
|
||||
**Brief:** 7-phase workflow: Parse mode → Create project dir → Phase 3 completeness loop (section-driven, no question cap) → Phase 3.5 per-phase effort dialog (v5.1) → Phase 4 draft/review/revise with `brief-reviewer` as stop-gate (max 3 iterations; gate = all dimensions ≥ 4 and research plan = 5) → Finalize (`brief.md` on pass, or `brief_quality: partial` on cap/force-stop) → Manual/auto opt-in → Stats. Always interactive. Auto mode runs research + plan inline in the main context (v2.4.0).
|
||||
|
||||
**Phase 3.5 (v5.1) — adaptive-depth signals:** Between Phase 3 completeness exit and Phase 4 draft, the operator commits an effort level (`low | standard | high`) and an optional `model` (`sonnet | opus`) per downstream phase (`research`, `plan`, `execute`, `review`) via 4 tier-coupled `AskUserQuestion` calls. The choices land in `brief.md` frontmatter as `phase_signals:` (a list of `{phase, effort?, model?}` entries) when committed, or `phase_signals_partial: true` when the operator force-stops. `brief_version: 2.1` activates the **sequencing gate**: validator emits `BRIEF_V51_MISSING_SIGNALS` if a 2.1-versioned brief lacks both fields. Downstream commands surface a friendly hint pointing back to `/trekbrief` — enforcement is validator-only. Composition is documented prose in each downstream command's `## Composition rule (v5.1)` section: `brief.phase_signals[phase] > profile.phase_models[phase]`. The brief signal wins per-phase when present; the profile fills gaps. `effort == low` activates each command's existing `--quick`-equivalent code-path (`/trekexecute` low-effort = `--gates open` + sequential-only). High-effort behavior is deferred to v5.1.1 per brief Non-Goal.
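
The composition rule is documented as prose, but it can be sketched in a few lines. The helper name `resolvePhase` and the in-memory shapes of `brief` and `profile` are assumptions for illustration; only the `phase_signals` and `phase_models` field names come from the text.

```javascript
// Resolve the model and effort for one phase.
// The brief's phase signal wins when it sets a field; the profile fills gaps.
function resolvePhase(phase, brief, profile) {
  const signal = (brief.phase_signals ?? []).find((s) => s.phase === phase) ?? {};
  const fallback = (profile.phase_models ?? []).find((m) => m.phase === phase) ?? {};
  return {
    phase,
    model: signal.model ?? fallback.model ?? null,
    effort: signal.effort ?? null, // no default effort is assumed in this sketch
  };
}
```

A brief that sets only `effort: low` for `execute` therefore keeps the profile's model for that phase, while a brief that names `opus` for `plan` overrides whatever the profile chose.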
|
||||
|
||||
**Research:** Foreground workflow (v2.4.0): Parse mode → Interview → Parallel research swarm (5 local + 4 external + 1 bridge, spawned from main context) → Follow-ups → Triangulation → Synthesis + brief → Stats. With `--project`, writes to `{dir}/research/NN-slug.md`.
|
||||
|
||||
**Plan:** Foreground workflow (v2.4.0): Parse mode (validate brief input) → Codebase sizing → Brief review (`brief-reviewer`) → Parallel exploration (6-8 agents, spawned from main context) → Deep-dives → Synthesis (with architecture-note cross-reference if present) → Planning → Adversarial review (`plan-critic` + `scope-guardian`) → Present/refine → Handoff. With `--project`, writes to `{dir}/plan.md` and auto-detects `{dir}/architecture/overview.md` (produced by an opt-in upstream architect plugin if installed; not bundled).
|
||||
|
||||
**Decompose:** Parse plan → Analyze step dependencies → Group into sessions → Identify parallel waves → Generate session specs + dependency graph + launch script.
|
||||
|
||||
**Execute:** Parse plan → Security scan (Phase 2.4) → Detect Execution Strategy → Single-session (step loop) or multi-session (parallel waves via `claude -p` with scoped `--allowedTools`) → Phase 7.5 manifest audit → Phase 7.6 bounded recovery (if partial) → Phase 8 atomically writes `progress.json` + `.session-state.local.json` (Handover 7) → Report. With `--project`, reads `{dir}/plan.md`. Phase 2.55 (pre-flight stop) and Phase 4 (entry-condition stop) also write `.session-state.local.json` so `/trekcontinue` can surface the stop and prompt for next steps.
|
||||
|
||||
**Continue:** `/trekcontinue` reads `{dir}/.session-state.local.json` (Handover 7), validates schema-v1 via `session-state-validator`, narrates a 3-line summary (project / next-session-label / brief-path), and immediately begins executing the next session. Auto-discovers active project state files under `.claude/projects/*/.session-state.local.json` if no explicit `<project-dir>` argument. Operator-invoked only — never auto-loaded via SessionStart. The `/trekendsession` helper is the informal-flow producer: writes the same state file for ad-hoc multi-session handovers that don't run through `/trekexecute`.
|
||||
|
||||
**Operator-UX guarantee (since v5.0.2):** `/trekbrief`, `/trekplan`, and `/trekreview` MUST always emit (a) a plain `file://<abs path>` URL AND (b) a copy-pasteable `open file://<abs path>` command in the final report block. The file:// URL must use an ABSOLUTE path (not relative or `~/`-prefixed) so terminals with cmd+click support (Ghostty, iTerm2, modern Terminal.app) can resolve it without shell interpretation. This is a non-negotiable operator-UX contract — the doc-consistency test pins both forms in all three commands' final report blocks.
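
As an illustration of the two required report lines, the sketch below builds both forms from an absolute path. It assumes a POSIX-style path and a hypothetical helper name `reportLines`; the commands themselves may assemble these lines differently.

```javascript
// Build the plain file:// URL and the copy-pasteable `open` command.
// Rejects relative and ~/-prefixed paths, as the contract requires an absolute path.
function reportLines(absPath) {
  if (!absPath.startsWith('/')) {
    throw new Error(`absolute path required, got: ${absPath}`);
  }
  // Percent-encode each segment so spaces and similar characters survive in the URL.
  const encoded = absPath.split('/').map(encodeURIComponent).join('/');
  const url = `file://${encoded}`;
  return [url, `open ${url}`];
}
```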
|
||||
|
||||
**Operator-annotation HTML (v5.0.3):** the last step of `/trekbrief`, `/trekplan`, and `/trekreview` runs `scripts/annotate.mjs` against the just-written `.md` and prints the resulting `file://<abs path>` link. The HTML is self-contained (zero npm deps, zero external network, design-system-styled, light + dark + print) and modelled on `~/repos/claude-code-100x/claude-code-100x/build-site.js` (lines 1431–2255). When the operator opens the file, the document renders as a proper article (headings / paragraphs / lists / tables / code / quotes; every element gets a stable `data-anchor-id`). In annotation mode (default ON, pencil-toggle in topbar), the operator can **select any text or click any element** → a form popover opens at the cursor with: section context auto-detected from the nearest h1/h2, the anchored snippet (selection if any, else element text), **three intent buttons (Fiks / Endre / Spørsmål, i.e. Fix / Change / Question)**, a comment textarea, and Save/Cancel.

The sidebar (Show annotations button) lists every annotation grouped by section with intent badge + snippet + comment + delete; clicking a card scrolls to and flashes the source element. **Copy Prompt** assembles a structured markdown prompt (`### N. [Intent] Section: <…>` + `Quote: «…»` + `Comment: …`) and copies it to the clipboard. Persistence: `localStorage` keyed on the absolute artifact path (`voyage-annotate:v2:<abs path>`).

History: v5.0.0 removed the v4.2/v4.3 bespoke playground SPA + `/trekrevise` + Handover 8; v5.0.1 pointed at `/playground document-critique` (Claude-leads, wrong direction); v5.0.2 was operator-led but too thin (line-click + freeform note, no intents); v5.0.3 matches the claude-code-100x reference the operator first pointed at, with pencil-toggle / selection capture / intent categories / popover form / structured export. See [CHANGELOG.md](../CHANGELOG.md) § v5.0.3.
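
The Copy Prompt format and the storage key described above could be produced along these lines. This is a sketch: the annotation field names (`intent`, `section`, `snippet`, `comment`), the helper names, and the exact line breaks between the three parts are assumptions, not the output of `scripts/annotate.mjs`.

```javascript
// localStorage key for one artifact, following the documented prefix.
function storageKey(absPath) {
  return `voyage-annotate:v2:${absPath}`;
}

// Assemble the structured markdown prompt from a list of annotations.
// Each annotation is assumed to look like { intent, section, snippet, comment }.
function buildPrompt(annotations) {
  return annotations
    .map((a, i) =>
      [
        `### ${i + 1}. [${a.intent}] Section: ${a.section}`,
        `Quote: «${a.snippet}»`,
        `Comment: ${a.comment}`,
      ].join('\n')
    )
    .join('\n\n');
}
```

Keying on the absolute path means annotations for `brief.html` and `plan.html` in the same project never collide, but moving the project directory orphans them.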
|
||||
|
||||
**Security:** 4-layer defense-in-depth: plugin hooks (pre-bash-executor, pre-write-executor), prompt-level denylist (works in headless sessions), pre-execution plan scan (Phase 2.4), scoped `--allowedTools` replacing `--dangerously-skip-permissions`. Hard Rules 14-16 enforce verify command security, repo-boundary writes, and sensitive path protection.
|
||||
|
||||
**Pipeline:** `/trekbrief` produces the task brief. `/trekresearch --project <dir>` fills in `{dir}/research/`. `/trekplan --project <dir>` reads brief + research to produce `{dir}/plan.md` (and auto-discovers `{dir}/architecture/overview.md` if an opt-in upstream architect plugin produced one). `/trekexecute --project <dir>` executes and writes `{dir}/progress.json`. `/trekreview --project <dir>` produces `{dir}/review.md`. `/trekbrief`, `/trekplan`, and `/trekreview` each end by running `scripts/annotate.mjs` on the just-written artifact, producing `{dir}/{artifact}.html` (a self-contained operator-annotation surface) and printing the `file://` link. The operator opens it, selects text or clicks elements, writes their own notes, copies a structured prompt, and pastes it back; Claude then revises the `.md`. All artifacts live in one project directory.
|
||||
|
||||
**Project-directory contract (v3.0.0):** trekplan owns the directory layout below. The `architecture/` subdirectory is opt-in and produced by an opt-in upstream architect plugin (not bundled) — the architect plugin is no longer publicly distributed, but the `architecture/overview.md` slot remains available for any compatible producer.
|
||||
|
||||
```
|
||||
.claude/projects/{YYYY-MM-DD}-{slug}/
|
||||
brief.md ← trekbrief writes; everyone reads
|
||||
brief.html ← trekbrief annotates (operator-annotation HTML, gitignored, re-buildable from brief.md)
|
||||
research/*.md ← trekresearch writes; plan + architect read
|
||||
architecture/ ← OPT-IN, owned by an opt-in upstream architect plugin (not bundled)
|
||||
overview.md
|
||||
gaps.md
|
||||
plan.md ← trekplan writes; trekexecute reads
|
||||
plan.html ← trekplan annotates
|
||||
progress.json ← trekexecute writes
|
||||
review.md ← trekreview writes; trekplan reads (Handover 6)
|
||||
review.html ← trekreview annotates
|
||||
```
|
||||
|
||||
The `.html` files (`brief.html`, `plan.html`, `review.html`) are produced by `scripts/annotate.mjs` and live alongside their `.md` siblings in the project directory. They are re-buildable from the `.md` source at any time (deterministic, byte-identical output on re-run), so they are conventionally gitignored along with the rest of `.claude/projects/`. Operator annotations live in browser `localStorage` keyed on the absolute artifact path — they survive refresh and browser-close, but are local to the operator's machine.
|
||||
|
||||
No code-level dependency between plugins — the contract is filesystem-level only.
|
||||
|
||||
## State
|
||||
|
||||
All artifacts in one project directory (default):
|
||||
- Project root: `.claude/projects/{YYYY-MM-DD}-{slug}/`
|
||||
- `brief.md` + `brief.html` (task brief from `/trekbrief`; `.html` is the operator-annotation surface from `scripts/annotate.mjs`)
|
||||
- `research/{NN}-{slug}.md` (research briefs from `/trekresearch --project`)
|
||||
- `architecture/overview.md` + `architecture/gaps.md` (opt-in, produced by an opt-in upstream architect plugin, not bundled)
|
||||
- `plan.md` + `plan.html` (from `/trekplan --project`)
|
||||
- `sessions/session-*.md` (from `--decompose`)
|
||||
- `progress.json` (from `/trekexecute --project`)
|
||||
- `review.md` + `review.html` (from `/trekreview --project`)
|
||||
- `.session-state.local.json` (Handover 7 — gitignored via `*.local.json`; written by `/trekexecute` Phase 8/2.55/4 or `/trekendsession`; read by `/trekcontinue`)
|
||||
|
||||
Legacy paths (still work without `--project`):
|
||||
- Research briefs: `.claude/research/trekresearch-{date}-{slug}.md`
|
||||
- Plans: `.claude/plans/trekplan-{date}-{slug}.md`
|
||||
- Sessions: `.claude/trekplan-sessions/{slug}/session-*.md`
|
||||
- Launch scripts: `.claude/trekplan-sessions/{slug}/launch.sh`
|
||||
- Progress: `{plan-dir}/.trekexecute-progress-{slug}.json`
|
||||
|
||||
Stats:
|
||||
- Brief stats: `${CLAUDE_PLUGIN_DATA}/trekbrief-stats.jsonl`
|
||||
- Plan stats: `${CLAUDE_PLUGIN_DATA}/trekplan-stats.jsonl`
|
||||
- Exec stats: `${CLAUDE_PLUGIN_DATA}/trekexecute-stats.jsonl`
|
||||
- Research stats: `${CLAUDE_PLUGIN_DATA}/trekresearch-stats.jsonl`
|
||||
- Continue stats: `${CLAUDE_PLUGIN_DATA}/trekcontinue-stats.jsonl`
|
||||
|
||||
## Terminology
|
||||
|
||||
- **Task brief** — produced by `/trekbrief`. Declares intent, goal, and research plan. Drives planning.
|
||||
- **Research brief** — produced by `/trekresearch`. Answers a specific research question. Feeds planning.
|
||||
- **Architecture note** — opt-in, produced by an opt-in upstream architect plugin (not bundled; the architect plugin is no longer publicly distributed, but the `architecture/overview.md` filesystem slot remains available for any compatible producer). Proposes which Claude Code features fit the task with brief-anchored rationale + explicit gaps. When present, enriches planning.
|
||||
- **Review** — produced by `/trekreview`. Independent post-hoc review of delivered code against the task brief. **Handover 6 (review → plan)** routes BLOCKER + MAJOR findings into `/trekplan --brief review.md` for a remediation plan. The plan's optional `source_findings:` frontmatter list is the audit trail back to the consumed findings. MINOR + SUGGESTION are skipped for v1.0 plan-input.
|
||||
- **Session state** — `.session-state.local.json` per project. **Handover 7** — produced by any session-end mechanism (`/trekexecute` Phase 8/2.55/4, `/trekendsession` helper, future graceful-handoff v2.2). Consumed by `/trekcontinue` to resume the next session in a fresh chat. Schema v1 is forward-compat (unknown top-level keys ignored). Never committed (gitignored via `*.local.json`).
|
||||
|
||||
A project typically has 1 task brief, 0–N research briefs, 0 or 1 architecture note, 0–N reviews (one per review iteration), and 0 or 1 session-state file (overwritten on every session-end).
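
The Handover 6 routing described in the Review entry above amounts to a severity filter. A minimal sketch, assuming each finding is an object with hypothetical `id` and `severity` fields and that `source_findings` holds the consumed IDs:

```javascript
// Severities that feed a remediation plan; MINOR and SUGGESTION are skipped.
const PLAN_INPUT_SEVERITIES = new Set(['BLOCKER', 'MAJOR']);

// Split review findings into those consumed by the plan and those skipped,
// and derive the audit-trail list for the plan frontmatter.
function selectPlanInput(findings) {
  const consumed = findings.filter((f) => PLAN_INPUT_SEVERITIES.has(f.severity));
  const skipped = findings.filter((f) => !PLAN_INPUT_SEVERITIES.has(f.severity));
  return {
    consumed,
    skipped,
    source_findings: consumed.map((f) => f.id),
  };
}
```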
|
||||
85
plugins/voyage/docs/command-modes.md
Normal file
|
|
@ -0,0 +1,85 @@
|
|||
# Voyage — Command flag reference
|
||||
|
||||
Per-command flag tables, imported from `CLAUDE.md` via pointer.
|
||||
|
||||
## /trekbrief modes
|
||||
|
||||
| Flag | Behavior |
|
||||
|------|----------|
|
||||
| _(default)_ | Dynamic interview until quality gates pass → brief.md with research plan |
|
||||
| `--quick` | Compact start; still escalates if required sections are weak or the brief-review gate fails → brief.md with research plan |
|
||||
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
|
||||
| `--profile <name>` | (v4.1.0) Model profile: `economy` / `balanced` / `premium` / `<custom>`. Sets `phase_models` for the brief phase. See `## Profile system` in `docs/operations.md`. |
|
||||
|
||||
Always interactive. Phase 3 is a section-driven completeness loop (no hard cap on question count); Phase 4 runs a `brief-reviewer` stop-gate with at most 3 review iterations. After writing the brief, the command asks the user to choose manual (print commands) or auto (Claude runs research + plan in the foreground).
|
||||
|
||||
## /trekresearch modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Interview + research (local + external) + synthesis + brief (foreground) |
| `--project <dir>` | Write brief to `{dir}/research/{NN}-{slug}.md` (auto-incremented) |
| `--quick` | Interview (short) + inline research (no agent swarm) |
| `--local` | Only codebase analysis agents (skip external + Gemini) |
| `--external` | Only external research agents (skip codebase analysis) |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the research phase. |
|
||||
|
||||
Flags combine: `--project <dir> --local`, `--external --quick`.
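
A minimal sketch of how the combinable flags above could be parsed from a plain argument array. `parseResearchFlags` is a hypothetical helper written for illustration, not the plugin's actual parser; only the flag names and the `adaptive` default come from the table.

```javascript
// Hypothetical parser for the /trekresearch flags listed in the table above.
const GATE_VALUES = ['open', 'closed', 'adaptive'];

function parseResearchFlags(argv) {
  const flags = {
    project: null,
    quick: false,
    local: false,
    external: false,
    gates: 'adaptive', // documented default
    profile: null,
    rest: [], // anything that is not a flag
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case '--quick': flags.quick = true; break;
      case '--local': flags.local = true; break;
      case '--external': flags.external = true; break;
      case '--fg': break; // documented as a no-op alias
      case '--project':
      case '--gates':
      case '--profile': {
        const value = argv[++i];
        if (value === undefined || value.startsWith('--')) {
          throw new Error(`${arg} requires a value`);
        }
        flags[arg.slice(2)] = value;
        break;
      }
      default:
        if (arg.startsWith('--')) throw new Error(`Unknown flag: ${arg}`);
        flags.rest.push(arg);
    }
  }
  if (!GATE_VALUES.includes(flags.gates)) {
    throw new Error(`--gates must be one of ${GATE_VALUES.join(', ')}`);
  }
  return flags;
}
```

For example, `parseResearchFlags(['--project', 'proj', '--local'])` yields `project: 'proj'`, `local: true`, and `gates: 'adaptive'`.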
|
||||
|
||||
## /trekplan modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| `--project <dir>` | **Required path A**: read `{dir}/brief.md`, auto-discover `{dir}/research/*.md`, write `{dir}/plan.md` |
| `--brief <path>` | **Required path B**: plan from a specific brief file; write to `.claude/plans/trekplan-{date}-{slug}.md` |
| `--research <brief> [brief2]` | Enrich with extra research briefs beyond what is in `{project_dir}/research/` |
| `--fg` | No-op alias (foreground is default since v2.4.0) |
| `--quick` | Plan directly (no agent swarm) |
| `--export <pr\|issue\|markdown\|headless> <plan>` | Generate shareable output from existing plan |
| `--decompose <plan>` | Split plan into self-contained headless sessions |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the plan phase (and others, since plan emits `profile:` to plan.md frontmatter). |
|
||||
|
||||
**Breaking change (v2.0):** one of `--brief` or `--project` is required. There is no interview inside `/trekplan`. The `--spec` flag has been removed — use `/trekbrief` to produce a brief instead.
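
The "one of `--brief` or `--project`" rule and the two output locations can be sketched as follows. `resolvePlanPaths` is a hypothetical helper; how the real command derives `{date}` and `{slug}` is not described here, so both are taken as inputs, and letting `--project` win when both flags are given is an assumption.

```javascript
// Hypothetical helper: maps the two required /trekplan modes to their paths.
function resolvePlanPaths({ project, brief, date, slug } = {}) {
  if (project) {
    // Path A: everything lives inside the project directory.
    const dir = project.replace(/\/+$/, '');
    return {
      briefPath: `${dir}/brief.md`,
      researchGlob: `${dir}/research/*.md`,
      planPath: `${dir}/plan.md`,
    };
  }
  if (brief) {
    // Path B: explicit brief file; the plan goes under .claude/plans/.
    if (!date || !slug) throw new Error('date and slug are needed for path B');
    return {
      briefPath: brief,
      researchGlob: null,
      planPath: `.claude/plans/trekplan-${date}-${slug}.md`,
    };
  }
  throw new Error('One of --brief or --project is required');
}
```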
|
||||
|
||||
If `{project_dir}/architecture/overview.md` exists (typically produced by an opt-in upstream architect plugin, not bundled), the plan command auto-discovers it and treats `cc_features_proposed` as priors. Missing file is fine — discovery is additive, not required.
|
||||
|
||||
## /trekexecute modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Execute plan; auto-detects Execution Strategy for multi-session |
| `--project <dir>` | Read `{dir}/plan.md`, write `{dir}/progress.json` |
| `--resume` | Resume from last progress checkpoint |
| `--dry-run` | Validate plan structure without executing |
| `--validate` | Schema-only check: parse steps + manifests, report `READY \| FAIL`, no execution |
| `--step N` | Execute only step N |
| `--fg` | Force foreground: run all steps sequentially, ignore Execution Strategy |
| `--session N` | Execute only session N from plan's Execution Strategy |
| `--gates {open\|closed\|adaptive}` | (v3.4.0) Autonomy-checkpoint policy. Default `adaptive` |
| `--profile <name>` | (v4.1.0) Model profile for the execute phase. Inherited from plan.md frontmatter `profile:` if present. |
|
||||
|
||||
## /trekreview modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Run brief-conformance + code-correctness reviewers in parallel, coordinator dedup + verdict, write `{project_dir}/review.md` |
| `--project <dir>` | **Required.** Path to trekplan project folder containing `brief.md`. Review is written to `{dir}/review.md` |
| `--since <ref>` | Override "before" SHA for the diff range. Validated via `git rev-parse --verify` |
| `--quick` | Skip brief-conformance reviewer; skip coordinator's reasonableness filter (fast correctness-only pass) |
| `--validate` | Schema-only check on existing `{dir}/review.md`. No LLM calls |
| `--dry-run` | Print discovered scope + triage map; skip writes |
| `--fg` | No-op alias (foreground is default) |
| `--profile <name>` | (v4.1.0) Model profile for the review phase. |
|
||||
|
||||
## /trekcontinue modes
|
||||
|
||||
| Flag | Behavior |
|------|----------|
| _(default)_ | Auto-discover active project's `.session-state.local.json` and resume |
| `<project-dir>` | Resume the next session of an explicit project directory |
| `--profile <name>` | (v4.1.0) Model profile for the resumed session. Inherited from the previous session's plan.md frontmatter when absent. |
|
||||
|
||||
For `/trekreview`, the triage gate is deterministic: a path-pattern classifier produces `{file → deep-review|summary-only|skip}`. The command refuses outright, with a suggestion, above 100 files or 100K diff tokens.
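
A rough sketch of such a deterministic triage gate. The path patterns, the `triage` function, and the wording of the suggestion are illustrative assumptions; only the three verdicts and the 100-file / 100K-token limits come from the text above.

```javascript
// Illustrative rules only; the real classifier's patterns are not shown here.
const RULES = [
  { pattern: /(^|\/)(node_modules|dist)\//, verdict: 'skip' },
  { pattern: /\.(lock|snap)$/, verdict: 'skip' },
  { pattern: /\.(md|txt)$/, verdict: 'summary-only' },
];
const MAX_FILES = 100;
const MAX_DIFF_TOKENS = 100_000;

function triage(files, diffTokens) {
  // Hard refusal above either limit, with a hint instead of a partial review.
  if (files.length > MAX_FILES || diffTokens > MAX_DIFF_TOKENS) {
    return {
      refused: true,
      suggestion: 'Narrow the diff range, for example with --since <ref>',
      map: null,
    };
  }
  const map = {};
  for (const file of files) {
    // First matching rule wins; anything unmatched gets a full review.
    const rule = RULES.find((r) => r.pattern.test(file));
    map[file] = rule ? rule.verdict : 'deep-review';
  }
  return { refused: false, suggestion: null, map };
}
```

Because the result depends only on the file paths and the token count, the same diff always produces the same triage map.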
|
||||
62
plugins/voyage/docs/operations.md
Normal file
|
|
@ -0,0 +1,62 @@
|
|||
# Voyage — Autonomy gates, profile system, observability
|
||||
|
||||
Imported from `CLAUDE.md` via pointer.
|
||||
|
||||
## Autonomy mode (`--gates`, v3.4.0)
|
||||
|
||||
Four pipeline commands (`/trekbrief`, `/trekresearch`, `/trekplan`, `/trekexecute`) accept `--gates {open|closed|adaptive}`:
|
||||
|
||||
| Value | Behavior |
|-------|----------|
| `open` | Skip optional checkpoints; trust manifests + verify gates only |
| `closed` | Stop at every autonomy boundary; operator confirms each transition |
| `adaptive` (default) | Stop only at meaningful boundaries (manifest-audit FAIL, plan-critic BLOCKER, main-merge gate) |
|
||||
|
||||
Under the hood: `lib/util/autonomy-gate.mjs` runs the state machine `idle → approved → executing → merge-pending → main-merged`. `lib/stats/event-emit.mjs` records each transition to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl`. The main-merge gate is the final autonomy boundary before HEAD lands on `main`.
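
The state sequence above can be sketched as a small linear state machine. This is an illustration under stated assumptions, not the API of `lib/util/autonomy-gate.mjs`: the function names, the strictly forward-only transitions, and the shape of the emitted record are all hypothetical.

```javascript
// States in the order given above; assumed strictly linear for this sketch.
const STATES = ['idle', 'approved', 'executing', 'merge-pending', 'main-merged'];

function nextState(current) {
  const index = STATES.indexOf(current);
  if (index === -1) throw new Error(`Unknown state: ${current}`);
  if (index === STATES.length - 1) throw new Error('main-merged is terminal');
  return STATES[index + 1];
}

// Advances one step and hands a transition record to `emit`
// (for example, a function that appends one JSON line to a stats file).
function transition(current, emit) {
  const next = nextState(current);
  emit({ ts: new Date().toISOString(), from: current, to: next });
  return next;
}
```

Walking from `idle` to `main-merged` with this sketch emits four records, the last of which is the `merge-pending` to `main-merged` step guarded by the main-merge gate.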
|
||||
|
||||
### Path A/B/C decision (v3.4.0; Path C closed 2026-05-05)
|
||||
|
||||
Three architectural options were considered for the speedup work:
|
||||
|
||||
- **Path A — cache-first** (drop `--allowedTools` per child to recover cross-phase cache sharing): REJECTED. Inverts the security model; plugin hooks don't fire reliably in `claude -p` (research/06 GH #36071).
|
||||
- **Path B — sequential `--no-ff` parallel waves with manifest-driven failure recovery**: CHOSEN. Ships in v3.4.0. Phase 2.6 of `/trekexecute` runs the wave executor with hardenings for plugin-in-monorepo + gitignored-state topology.
|
||||
- **Path C — hybrid (cache-warm sentinel + identical-tool parallel)**: **CLOSED 2026-05-05.** Q3 experiment measured median `cache_creation_input_tokens` = 163,903 across 3 fork-children at 186K parent context (CC v2.1.128, Sonnet 4.6). Master-plan thresholds: ≤ 1,500 POSITIVE / ≥ 3,500 NEGATIVE. Result is solidly NEGATIVE — `CLAUDE_CODE_FORK_SUBAGENT` does not preserve cache prefix across identical-tool children at our context size. Path C migration is deferred indefinitely; reassessment is appropriate when CC v2.2.xxx ships fork-cache-relevant features. Harness: `scripts/q3-cache-prefix-experiment.mjs`. Companion analyser: `lib/stats/cache-analyzer.mjs`.
|
||||
|
||||
A revived Path C (post-v2.2.xxx) would require: (1) re-architecting tool-list to be identical across all wave children, (2) cache-telemetry analysis confirming the new fork-cache behaviour holds, (3) prompt-level deny re-enablement to compensate for tool scoping rollback.
|
||||
|
||||
## Profile system (`--profile`, v4.1.0)
|
||||
|
||||
Three built-in model profiles plus operator-defined `<custom>.yaml`. Each profile pins `phase_models` for the six pipeline phases (`brief`, `research`, `plan`, `execute`, `review`, `continue`). Profile is recorded in plan.md frontmatter as `profile: <name>` and emitted to `${CLAUDE_PLUGIN_DATA}/trek*-stats.jsonl` for cost-attribution.
|
||||
|
||||
| Profile | Brief | Research | Plan | Execute | Review | Continue | Use case |
|---------|-------|----------|------|---------|--------|----------|----------|
| `economy` | sonnet | sonnet | sonnet | sonnet | sonnet | sonnet | Lowest cost; high-confidence small-scope tasks (operator-opt-in via `--profile economy`) |
| `balanced` | sonnet | sonnet | opus | sonnet | opus | sonnet | Mixed: opus where reasoning depth pays off (operator-opt-in via `--profile balanced`) |
| `premium` (default) | opus | opus | opus | opus | opus | opus | Maximum quality: Opus on every phase. Default since 2026-05-13 operator request; also the hardcoded resolver default at `lib/profiles/resolver.mjs:145` |
|
||||
|
||||
### Lookup order
|
||||
|
||||
1. Explicit `--profile <name>` flag passed to the command
|
||||
2. Plan-file frontmatter `profile:` (when resuming via `/trekexecute --resume` or `/trekcontinue`)
|
||||
3. `VOYAGE_PROFILE` environment variable
|
||||
4. Default `premium` (the hardcoded resolver default; see the profile table above)
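
The lookup order can be read as a first-match-wins chain. The sketch below assumes the inputs are already parsed; `resolveProfile` and its argument names are hypothetical and do not describe the actual `lib/profiles/resolver.mjs` interface. The built-in default is passed in by the caller rather than hard-coded.

```javascript
// Hypothetical resolver: returns the first non-empty profile name and its source.
function resolveProfile({ flag, planFrontmatter, env, fallback }) {
  const candidates = [
    ['flag', flag],                                          // 1. --profile <name>
    ['plan-frontmatter', planFrontmatter && planFrontmatter.profile], // 2. plan.md `profile:`
    ['env', env && env.VOYAGE_PROFILE],                      // 3. VOYAGE_PROFILE
    ['default', fallback],                                   // 4. built-in default
  ];
  for (const [source, name] of candidates) {
    if (typeof name === 'string' && name.trim() !== '') {
      return { name: name.trim(), source };
    }
  }
  throw new Error('No profile could be resolved');
}

// Example: plan frontmatter wins over the environment variable.
const resolved = resolveProfile({
  planFrontmatter: { profile: 'economy' },
  env: { VOYAGE_PROFILE: 'balanced' },
  fallback: 'premium', // marked as the default in the profile table
});
```

Returning the source alongside the name makes it easy to log why a given tier was chosen.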
|
||||
|
||||
### Custom profiles
|
||||
|
||||
Create `lib/profiles/<custom>.yaml` to define a new tier. The validator (`lib/validators/profile-validator.mjs`) enforces: every `phase_models[].phase` must be a known phase enum; every `phase_models[].model` must match `^(opus|sonnet)(\b|-).*` or one of the canonical short names. Custom profiles override built-ins of the same name (lookup is alphabetical with `<custom>` taking precedence).
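
A sketch of the two checks described above, applied to a profile that has already been parsed from YAML into an object. `validateProfile` is a hypothetical stand-in for `lib/validators/profile-validator.mjs`; the phase list is taken from the profile system description, and the "canonical short names" branch is omitted because those names are not listed here.

```javascript
// Phase enum as listed for the six pipeline phases.
const PHASES = ['brief', 'research', 'plan', 'execute', 'review', 'continue'];
// Model pattern quoted in the text above.
const MODEL_PATTERN = /^(opus|sonnet)(\b|-).*/;

function validateProfile(profile) {
  if (!profile || !Array.isArray(profile.phase_models)) {
    return ['phase_models must be a list'];
  }
  const errors = [];
  profile.phase_models.forEach((entry, i) => {
    if (!PHASES.includes(entry.phase)) {
      errors.push(`phase_models[${i}].phase is not a known phase: ${entry.phase}`);
    }
    if (typeof entry.model !== 'string' || !MODEL_PATTERN.test(entry.model)) {
      errors.push(`phase_models[${i}].model is not an accepted model: ${entry.model}`);
    }
  });
  return errors; // an empty list means the profile passed both checks
}
```

Under this pattern, `opus` and `sonnet-4.6` are accepted, while `opus4` and `haiku` are rejected.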
|
||||
|
||||
Drift between plan-frontmatter `profile:` and step-manifest `profile_used:` emits a `MANIFEST_PROFILE_DRIFT` warning from `plan-validator --strict` (Step 20). Plan remains valid; the warning surfaces accidental tier-mismatch.
|
||||
|
||||
## Observability (Stop hook, v4.1.0)
|
||||
|
||||
The `Stop` hook in `hooks/hooks.json` runs `hooks/scripts/otel-export.mjs` at session-end. The hook is **opt-in** — when `VOYAGE_EXPORT_MODE` is unset or `off`, no work is done.
|
||||
|
||||
| Mode | Output | Endpoint env-var |
|------|--------|------------------|
| `off` (default) | _(no export)_ | _(none)_ |
| `textfile` | `voyage.prom` (Prometheus exposition format) | `VOYAGE_TEXTFILE_DIR` |
| `otlp` | OTLP/JSON POST | `VOYAGE_OTEL_ENDPOINT` |
|
||||
|
||||
Endpoint validation: `VOYAGE_OTEL_ALLOW_PRIVATE=1` is required to send to loopback or RFC1918 destinations (CWE-918 SSRF mitigation). Allowlist `lib/exporters/field-allowlist.mjs` redacts records before export (CWE-212). Path validation (`lib/exporters/path-validator.mjs`) rejects symlink + traversal (CWE-22).
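
The private-destination rule might look roughly like the sketch below. It is an assumption-laden illustration, not the exporter's real validator: `isPrivateHost` and `checkOtelEndpoint` are hypothetical names, the http/https restriction is an added assumption, and only the literal hostname is inspected (a hostname that resolves to a private address via DNS would not be caught).

```javascript
// True for loopback names/addresses and RFC1918 IPv4 ranges.
function isPrivateHost(hostname) {
  const host = hostname.replace(/^\[|\]$/g, '').toLowerCase(); // strip IPv6 brackets
  if (host === 'localhost' || host === '::1') return true;
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 ||                          // 127.0.0.0/8 loopback
    a === 10 ||                           // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

function checkOtelEndpoint(endpoint, env) {
  const url = new URL(endpoint); // throws on a malformed URL
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return { ok: false, reason: 'unsupported protocol' };
  }
  if (isPrivateHost(url.hostname) && env.VOYAGE_OTEL_ALLOW_PRIVATE !== '1') {
    return { ok: false, reason: 'private destination requires VOYAGE_OTEL_ALLOW_PRIVATE=1' };
  }
  return { ok: true, reason: null };
}
```

A local collector at `http://127.0.0.1:4318` is therefore rejected unless the operator explicitly sets `VOYAGE_OTEL_ALLOW_PRIVATE=1`.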
|
||||
|
||||
Local Docker Compose stack: `examples/observability/`. Operator docs: `docs/observability.md`. Both pin minimum versions per CVE history (`prom/prometheus:v3.0.1`, `grafana/grafana:11.4.0`, `otel/opentelemetry-collector-contrib:0.115.0`).
|
||||