Step 8 of v5.1.0 humanizer Wave 4. Adds tests/lint-default-output.mjs
runner and tests/scanners/lint-default-output.test.mjs wrapper that
exercise SC-3 against the 6 prose CLIs (scan-orchestrator, posture,
token-hotspots-cli, plugin-health-scanner, drift-cli, fix-cli) running
in default (humanized) mode against tests/fixtures/marketplace-medium.
Lint scope is stderr only — JSON envelope keys ("scanner", "severity")
are structural, not prose. Humanized prose fields embedded inside JSON
are already covered by tests/lib/humanizer-data.test.mjs tier1/tier3
checks. Code references inside backticks pass the lint
(stripBacktickSpans) so technical identifiers can appear when wrapped.
Default-mode prose fixes to land lint at zero violations:
- scan-orchestrator: top banner switches to "Config-Audit v2.2.0" and
per-scanner progress wraps "[XXX] Label" in backticks. --raw and
--json paths preserve the v5.0.0 verbatim banner via new
opts.humanizedProgress flag on runAllScanners.
- plugin-health-scanner: top banner switches to "Plugin Health v2.1.0"
in default mode; --raw/--json keep "Plugin Health Scanner v2.1.0".
- scoring.mjs generateHealthScorecard humanized branch: area names
(CLAUDE.md, Hooks, MCP, Settings, Rules, Imports, Conflicts, Token
Efficiency, Plugin Hygiene) are wrapped in backticks; dot-padding
compensates so column alignment matches v5.0.0 layout.
- posture / drift-cli / fix-cli: thread humanizedProgress flag through
their runAllScanners calls so default mode emits humanized progress
and --raw/--json preserve the v5.0.0 stderr snapshot.
Test infrastructure only — user-facing docs land in Wave 5/6 once
commands and agents consume the humanized payload.
Tests: 735 to 736 (+1 SC-3 wrapper). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds --raw flag to all 6 remaining CLIs and wires humanization into the
default rendering path. --json and --raw both bypass humanization for
v5.0.0 byte-equal output; default mode humanizes findings/diff/prose.
token-hotspots-cli: humanizes payload.findings before stdout JSON write.
plugin-health-scanner: humanizes finding titles in stderr brief summary;
--json/--raw write byte-identical v5.0.0-shape result to stdout.
drift-cli: humanizes diff.{newFindings,resolvedFindings,unchangedFindings,
movedFindings} before formatDiffReport; --raw applies to save and list
modes too. Baselines remain raw v5.0.0 on disk.
fix-cli: humanizes manual-finding titles in stderr fix-plan prose; both
--json and --raw produce identical machine-readable JSON to stdout.
manifest, whats-active: --raw is a no-op (no findings, inventory only)
but parsed for CLI surface consistency.
Decision on missing --output-file flag for drift-cli/fix-cli/plugin-health:
deferred. SC-6/SC-7 tests in Wave 4 will use stdout-redirect (the simpler
Alt B path) since these CLIs already write JSON to stdout in machine modes.
Test cli-humanizer.test.mjs covers all 6 CLIs. Three CLIs that read
environment state (plugin-health, manifest, whats-active) verify
mode-equivalence (--json == --raw) instead of frozen-snapshot byte-equal,
because their output reflects current marketplace state which drifts as
plugins are added since the Wave 0 capture.
Wave 3 / Step 7 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
generateHealthScorecard signature: 2-arg → 3-arg (areaScores, opportunityCount,
options = {}). options.humanized=true renders friendlier title, grade-context
line per overall grade, and rephrased opportunity line. options.humanized=false
(or 2-arg call) preserves v5.0.0 verbatim output for backwards-compat.
topActions also gets an optional options.humanized that swaps recommendations
through humanizeFinding lookup.
posture.mjs main():
--json → write JSON to stdout, suppress stderr scorecard
--raw → write JSON to stdout (byte-identical to --json), write v5.0.0
verbatim scorecard to stderr
default → humanized scorecard to stderr, no stdout
posture.test.mjs scorecard-prose assertions re-anchored to --raw mode (the
explicit v5.0.0 path) — Wave 0 audit only covered finding-title strings;
scorecard prose surfaces here for the first time.
Wave 3 / Step 6 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds --json and --raw flags to scan-orchestrator CLI main(). Default mode
runs humanizeEnvelope(env) before serialization; --json and --raw bypass
the humanizer for v5.0.0 byte-equal output (SC-6 / SC-7 paths).
Save-baseline path always writes the raw v5.0.0-shape envelope so future
humanizer-data updates do not trigger false-positive drift findings.
runAllScanners() unchanged — it remains the v5.0.0-shape source of truth
for in-process callers (posture, scoring, drift, etc.).
Wave 3 / Step 5 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave 1 / Step 3 of v5.1.0 plain-language UX humanizer.
scanners/lib/humanizer.mjs exports three pure functions:
- humanizeFinding(f) -> new finding object with translated
title/description/recommendation + three new fields
(userImpactCategory, userActionLanguage, relevanceContext).
- humanizeFindings(findings) -> mapped array.
- humanizeEnvelope(env) -> walks env.scanners[].findings.
Plus computeRelevanceContext(filePath) as a named export for
unit testing.
Field semantics:
- userImpactCategory: from scanner prefix per research/02 line 124
(Configuration mistake / Conflict / Wasted tokens / Dead config /
Missed opportunity / Other).
- userActionLanguage: from severity per research/02 line 134
(Fix this now / Fix soon / Fix when convenient / Optional cleanup
/ FYI).
- relevanceContext: deterministic file-path heuristic — looks for
/tests/fixtures/ or /test/fixtures/ substring (test-fixture-no-impact),
*.local.* basename (affects-this-machine-only), defaults to
affects-everyone. No subprocess, no network.
Lookup order per scanner: static[title] -> patterns regex match ->
_default -> fall through to original strings (when scanner prefix
absent).
Original id, scanner, severity, file, line, evidence, category,
autoFixable, and optional details are preserved exactly. Pure —
verified by deepEqual of input before/after.
Test (32 cases): purity, field preservation across all paths,
known/unknown scanner handling, all 5 severities, all 6 categories,
relevance heuristic for 4 path types, envelope walking, ANSI-free
guarantee. All pass.
Regression: 689/689 tests (657 + 32 new = 54 new across Wave 1).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
Wave 1 / Step 2 of v5.1.0 plain-language UX humanizer.
scanners/lib/humanizer-data.mjs exports TRANSLATIONS keyed by
scanner prefix (CML, SET, HKV, RUL, MCP, IMP, CNF, GAP, TOK, CPS,
DIS, COL, PLH). Each scanner has:
- static: exact-title -> {title, description, recommendation}
- patterns: array of {regex, translation} for template-literal titles
- _default: graceful fallback for unknown findings
Architectural change vs. plan: keys translations by exact scanner
title (not finding ID). Reason: finding IDs are sequence-based
(global counter in lib/output.mjs:34), not stable per finding-type
— two runs can produce different IDs for the same logical issue.
Title strings ARE stable (defined as string literals or template
patterns in the scanner source).
Translations follow research/03 SR-1..SR-17:
- active voice, second person, present tense
- sentences <= 25 words
- tier1 absolute prohibitions and tier3 domain jargon are kept out
of prose
- tier1/tier3 terms are permitted inside `backtick spans` (code
references like filenames and field names) — established
technical-doc convention
Test (12 cases): all 13 scanners covered; every static and pattern
entry has the 3 required fields; tier1 and tier3 forbidden-word
checks pass (with backtick-span exclusion); reference-stable
imports. All pass.
Regression: 657/657 tests (645 + 12 new).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
The v5.0.0-rc.1 N5 implementation looked up hotspot.path in
calibrateAgainstApi() but token-hotspots.mjs only emitted hotspot.source —
calibration silently produced 0 actual_tokens because every iteration hit
the `if (!hotspot?.path) continue` guard.
Fix: file-backed hotspots now expose `path: h.absPath` in the JSON output.
MCP-server hotspots intentionally leave path unset — their tokens are
runtime tool-schema (formula-based: 500 + toolCount × 200), not file
content readable by count_tokens.
SC-6b release-gate verified against tests/fixtures/marketplace-large:
- Actual (count_tokens, claude-opus-4-7): 589 tokens for CLAUDE.md
- Estimated (4-bytes/token byte heuristic): 594 tokens
- Delta: -5 tokens / -0.85% — well within ±5% gate. PASS.
CHANGELOG: documented the fix + SC-6b result inline under [5.0.0].
All 635 tests still green. No estimateTokens tuning required for v5.0.0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New COL scanner detects skill-name collisions across plugins and
between user-level skills (~/.claude/skills/) and plugin-bundled
skills. Skill identity is the directory basename — matches how
enumerateSkills resolves names.
Detection rules (per docs/v5-namespace-research.md, confidence: medium):
- Plugin-vs-plugin same skill name → severity low (CA-COL-001)
- User-vs-plugin same skill name → severity medium (CA-COL-001)
- Plugin-vs-built-in collisions: out of scope for v5.0.0 (insufficient
verification — recorded for v5.0.1 follow-up).
Findings carry details.namespaces array with {source, name, path} for
every conflicting source — supports per-collision reporting downstream.
output.mjs: finding() helper now passes through optional `details`
field (scanner-specific structured payload).
scoring.mjs: COL → "Plugin Hygiene" (new area, 10 total). Posture test
updated from 9 → 10 area scores.
.gitignore: docs/v5-namespace-research.md is local-only (Step 22a
research output, gitignored per plan).
Fixture collision-plugins/fake-home/ has user skill `review` colliding
with plugin-a + plugin-b's `review` (medium severity), plus plugin-c's
unique `summarize` (no collision).
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag.
Tests: 617 → 625 (+8).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New DIS scanner detects tools that appear in BOTH permissions.deny
and permissions.allow within the same settings.json file. The deny
list wins, so allow entries are dead config but still load on every
turn and confuse intent.
Tool identity = bare name (everything before "("). `Bash(npm:*)` and
`Bash` are treated as the same tool, so a deny on `Bash` flags any
`Bash(...)` allow entry.
Severity: low. Wired into scan-orchestrator + scoring (area: Settings).
Fixture denied-tools-in-schema has Bash in both arrays; healthy-project
serves as the negative case.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag.
Tests: 611 → 617 (+6).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New CPS scanner walks CLAUDE.md cascade and flags volatile content
between lines 31 and 150 — the cache-prefix window beyond TOK Pattern
A's top-30 territory. Volatile content anywhere in the cached prefix
forces a fresh cache write from that line down on every turn.
Volatile-pattern set extends TOK Pattern A with:
- shell-exec lines (! prefix) — common in CLAUDE.md to inject git/date
- ${VAR} substitutions — vary per-shell, defeat cache reuse
Severity: medium per finding. Skips lines 1-30 to avoid duplicating
Pattern A's range; CPS' value is in the 31-150 zone.
Wired into scan-orchestrator + scoring SCANNER_AREA_MAP. CPS shares
the "Token Efficiency" area with TOK; scoreByArea now deduplicates by
area name and combines counts across scanners contributing to the
same area, so the 9-area scorecard contract holds.
Fixtures volatile-mid-section/{volatile-line-60, volatile-line-200}
verify both positive (line 60) and out-of-window (line 200) cases.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag.
Tests: 604 → 611 (+7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New scanners/manifest.mjs CLI + commands/manifest.md slash command.
Reads activeConfig and produces a flat, ranked list of every token
source (CLAUDE.md cascade entries, plugins, skills, MCP servers, hooks)
sorted DESC by estimated_tokens.
CLAUDE.md per-file tokens are derived by distributing
claudeMd.estimatedTokens across the cascade proportional to bytes.
Tests cover both real-config (plugin root) and fixture (rich-repo with
patched HOME containing 2 plugins + 3 skills + .mcp.json) paths, plus
error handling (nonexistent path → exit 3, --output-file).
Builds on readActiveConfig from M1 (v5 alpha.2).
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag on
feat commits without doc changes.
Tests: 593 → 604 (+11).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds detectMcpToolBudget detection block in TOK scanner. Tiered severity
per project-local .mcp.json server based on toolCount:
- < 20: no finding
- 20-49: low
- 50-99: medium
- 100+: high
- null (manifest unparseable): low + "tool count unknown" message
Scoped to source==='.mcp.json' to keep findings actionable for the
audited path; plugin/user-level MCP servers are surfaced by the
manifest scanner (Step 19 / N2).
5 fixtures (mcp-budget/{14,25,60,120,unknown}-tools) use inline `tools`
arrays in .mcp.json — no node_modules needed for these tests.
Tests assert title+severity (not exact ID) since TOK IDs are sequential
per scan, not semantic per pattern.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag on
feat commits without doc changes.
Tests: 586 → 593 (+7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Filesystem counts are the source of truth; README badges parsed via
line-anchored substring (badge/<kind>-<N>-...). Emits readmeCheck object
with counts/badges/mismatches.
CLI: node scanners/self-audit.mjs --check-readme [--json]
API: runSelfAudit({ checkReadme: true }) → result.readmeCheck
Helper: checkReadmeBadges(pluginDir) for per-fixture testing
New fixture: readme-desynced/ (commands/foo + bar, README claims 1).
Note: alpha phase does NOT require result.readmeCheck.passed === true.
Self-test of real plugin currently fails (scanners 10 vs 9, tests 31 vs 543);
will be reconciled in Session 5 Step 28 (README sync).
582 → 586 tests, all green.
- New Pattern F in TOK: low-severity finding when SKILL.md description > 500 chars
- Scoped to discovery.files (project-local) — activeConfig.skills walk would
pull in user/plugin skills out of project scope
- New fixtures: skill-bloated (594-char desc) + skill-tight (46-char baseline)
574 → 576 tests, all green.
- New Pattern E in TOK: emits medium finding when activeConfig.claudeMd.estimatedTokens > 10_000
- Uses cascade tokens, file count, and calibration note as evidence
- New fixtures: large-cascade (37k bytes / 14475 cascade tokens) + small-cascade (5k baseline)
572 → 574 tests, all green.
- Pattern A (cache-breaking volatile top): medium → high
- Pattern B (redundant permissions): low → medium
- Pattern C (deep @import chain): medium → low
- Add calibration_note evidence on every TOK finding
- Table-driven severity tests (identify by title, IDs are sequential)
563 → 569 tests, all green. Doc sweep deferred to Session 5 (Step 28).
Pattern D was the v4 sonnet-era signature: 'config is structurally
clean but uses no Opus-4.7-specific features'. Two problems:
- It triggered on any minimal config that happened to lack skills/MCP
- The advice was generic and not actionable
The hotspots ranking and per-pattern findings (A/B/C) cover the same
ground with concrete, file-anchored signal. Dropping the noise.
BREAKING (intentional): scanners no longer emit the sonnet-era info
finding. Suppression entries and downstream tooling that reference
the v4 finding ID should be updated. Doc sweep follows in Step 8b.
Tests: sonnet-era fixture now asserts zero findings.
The buildHotspots padding loop and unused 'take' variable were dead
code from the v3 hotspots-min contract. Replaced with a clean
ranked.slice(0, HOTSPOTS_MAX). Tiny fixtures may now return fewer
than 3 hotspots, which is the honest answer; the contract now only
asserts <= 10.
Tests: +2 cases — every hotspot.source is unique (no padding); length
never exceeds HOTSPOTS_MAX.
Removes the v4 'void readActiveConfig' placeholder and wires the
active-config snapshot into the TOK scanner.
Per-turn behavior changes:
- Each enabled MCP server becomes its own hotspot entry (richer than
the parent .mcp.json file alone)
- total_estimated_tokens now includes MCP server cost
- result.activeConfig exposes a small summary
(claudeMdEstimatedTokens, mcpServerCount, pluginCount, skillCount)
Failures of readActiveConfig are non-fatal — the scanner falls back
to the discovery-only path used in v4.
Tests: +3 cases on the new tok-active-config fixture
(.mcp.json with 2 servers, CLAUDE.md, plugin skeleton).
Two MCP enumeration paths in readActiveMcpServers now pass kind='mcp'
to estimateTokens with optional toolCount derived from def.tools array
(populated when callers cache MCP discovery — Step 14 wires that up).
Hook callers keep kind='item' (no schema overhead).
Visible effect: every active MCP server jumps from estimatedTokens=15
to >= 500 (or higher when toolCount is known). The whats-active output
and TOK hotspots now reflect actual MCP cost.
Tests: assert mcpServers[].estimatedTokens >= 500 in fixture.
Replace count-based pass-rate with severity-weighted penalty:
- penalty = sum(count[s] * WEIGHTS[s])
- maxBudget = max(10, findingCount * 4)
- passRate = max(0, 100 - penalty / maxBudget * 100)
A few lows no longer crater an area's grade; a single high or critical
consumes a large fraction of budget. Mirrors the operator intuition that
severity, not count, is the signal.
BREAKING (intentional): scoring semantics differ from v4 for non-clean
configs. Add scoringVersion: 'v5' to the returned struct so consumers
can detect the version. baseline-all-a remains all-A (no critical/high
on that fixture).
Tests: +6 cases for severity weighting; existing "many findings" test
updated to use highs (where v5 still drops the grade as expected).
Promote WEIGHTS const to named export with Object.freeze for downstream
use in scoring.mjs (severity-weighted scoreByArea, F3).
Tests: +2 cases asserting WEIGHTS shape.
New read-only command that shows everything Claude Code actually loads for a
given repo — plugins, skills, MCP servers, hooks, CLAUDE.md cascade — with
source attribution (user/project/plugin) and rough token estimates. Helps
identify candidates for disabling without guessing.
Added:
- scanners/lib/active-config-reader.mjs — pure async helper: readActiveConfig,
detectGitRoot, walkClaudeMdCascade, readClaudeJsonProjectSlice (longest-prefix
matching for .claude.json projects), enumeratePlugins, enumerateSkills,
readActiveHooks, readActiveMcpServers, estimateTokens (markdown 4 c/tok,
json 3.5 c/tok, frontmatter cap 150 tokens, item flat 15)
- scanners/whats-active.mjs — thin CLI shim: --json, --output-file, --verbose,
--suggest-disables
- commands/whats-active.md — renders tables via Read tool; honors UX rules
- tests/lib/active-config-reader.test.mjs — 36 tests, all green (integration
fixture built in tmpdir with fake HOME, .claude.json prefix matching,
plugin discovery, hook/MCP merge from all scopes)
Verified:
- Performance budget: <2s wall-clock (smoke test: 102ms on real repo)
- Token estimates within ±20% of hand-computed values
- Read-only: no writeFile/mkdir/unlink in production code
- Self-audit: Plugin Health scanner reports 0 findings (Grade A)
- Full test suite: 522 tests, 512 pass (10 pre-existing conflict-detector
failures on main — unrelated to this change, reproducible on clean HEAD)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>