feat(humanizer): update audit/analysis command templates group A [skip-docs]

Wave 5 Step 13. Threads the humanizer vocabulary through five audit/
analysis command templates and adds a shape test that locks the
structure in place.

- commands/posture.md, tokens.md, feature-gap.md (findings-renderers):
  reference userImpactCategory/userActionLanguage/relevanceContext;
  remove hardcoded A/B/C/D/F-to-prose tables (humanizer owns the
  grade-context vocabulary now via the stderr scorecard headline).
- commands/manifest.md, whats-active.md (inventory CLIs): add --raw
  pass-through for CLI-surface consistency. --raw is a no-op in these
  CLIs, but the flag is threaded through so users get uniform behaviour.
- All five files: --raw flag parsed from $ARGUMENTS and passed verbatim
  to the underlying scanner CLI when present.

tests/commands/group-a-shape.test.mjs (new, +5 tests, 767 → 772):
  - structural: every file has a bash invocation block, Read tool
    reference, and --raw/$ARGUMENTS plumbing
  - findings-renderers only: at least one humanized field referenced;
    no hardcoded "[grade] grade is..." prose tables
Kjell Tore Guttormsen 2026-05-01 19:41:08 +02:00
commit 79b6e29073
6 changed files with 174 additions and 62 deletions

commands/feature-gap.md

@@ -20,9 +20,11 @@ Context-aware analysis of Claude Code features that could benefit your specific
## Implementation
### Step 1: Determine target and greet
### Step 1: Determine target and flags
Parse `$ARGUMENTS` for a path (default: current working directory).
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument (default: current working directory). Recognized flags:
- `--raw` — pass-through to the scanner; produces v5.0.0 verbatim envelope (bypasses the humanizer). When `--raw` is set, render with v5.0.0 finding-field shape only — humanizer fields are absent in raw output.
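A minimal sketch of the split, assuming flags are always `--`-prefixed (illustrative, not part of the template):
```bash
# Illustrative split of $ARGUMENTS into a target path and a raw-mode toggle.
TARGET_PATH="."
RAW=false
for arg in $ARGUMENTS; do            # unquoted on purpose: rely on word splitting
  case "$arg" in
    --raw) RAW=true ;;               # recognized flag
    --*)   ;;                        # ignore other flags here
    *)     [ "$TARGET_PATH" = "." ] && TARGET_PATH="$arg" ;;  # first non-flag wins
  esac
done
```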
Tell the user:
@@ -38,7 +40,9 @@ Generate session ID (`YYYYMMDD_HHmmss`) if no active session exists.
```bash
mkdir -p ~/.claude/config-audit/sessions/{session-id}/findings 2>/dev/null
node ${CLAUDE_PLUGIN_ROOT}/scanners/posture.mjs <target-path> --json --output-file ~/.claude/config-audit/sessions/{session-id}/posture.json 2>/dev/null; echo $?
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
node ${CLAUDE_PLUGIN_ROOT}/scanners/posture.mjs <target-path> --output-file ~/.claude/config-audit/sessions/{session-id}/posture.json $RAW_FLAG 2>/dev/null; echo $?
```
If exit code is non-zero: "Assessment couldn't run. Check that the path exists and contains configuration files."
@@ -59,49 +63,51 @@ ls <target-path>/*.py <target-path>/requirements.txt <target-path>/pyproject.tom
Read `${CLAUDE_PLUGIN_ROOT}/knowledge/gap-closure-templates.md` for implementation templates.
Group GAP findings into three sections. Number them sequentially across sections:
Group GAP findings by their humanized fields rather than re-deriving tier-to-prose mappings. In default mode (no `--raw`) each finding carries:
- `userImpactCategory` (e.g., "Missed opportunity") — the impact bucket
- `userActionLanguage` (e.g., "Fix soon", "Fix when convenient", "Optional cleanup", "FYI") — the urgency phrasing the rest of the toolchain uses
- `relevanceContext` ("affects-everyone" / "affects-this-machine-only" / "test-fixture-no-impact") — the scope so the user knows whether the change touches shared config or just their own machine
Group findings into three sections by `userActionLanguage`: "Fix this now" + "Fix soon" → **High Impact**, "Fix when convenient" → **Worth Considering**, "Optional cleanup" + "FYI" → **Explore When Ready**. Number sequentially across sections. Skip findings whose `relevanceContext === "test-fixture-no-impact"` unless the user explicitly asked to include fixtures.
The humanizer has already replaced jargon-heavy strings with plain-language equivalents in `title`, `description`, and `recommendation` — render those verbatim. Do not paraphrase. Do not introduce inline tier-to-prose tables ("Tier 1 means…"); the categories are pre-translated.
If `--raw` was passed, the v5.0.0 envelope is in effect — humanizer fields are absent. Fall back to grouping by `category` ("t1"/"t2"/"t3"/"t4") and render `title` + `recommendation` directly.
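A jq sketch of the default-mode grouping, assuming the GAP findings sit in a top-level `findings` array of the session JSON (illustrative):
```bash
# Illustrative three-way grouping; fixtures are skipped per the default rule.
jq '[.findings[] | select(.relevanceContext != "test-fixture-no-impact")]
    | {high:    map(select(.userActionLanguage == "Fix this now"
                           or .userActionLanguage == "Fix soon")),
       worth:   map(select(.userActionLanguage == "Fix when convenient")),
       explore: map(select(.userActionLanguage == "Optional cleanup"
                           or .userActionLanguage == "FYI"))}' \
   ~/.claude/config-audit/sessions/{session-id}/posture.json
```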
Render shape (default mode):
```markdown
### High Impact
These address correctness or safety — consider them seriously.
{For each finding where userActionLanguage is "Fix this now" or "Fix soon":}
**1.** Add permissions.deny for sensitive paths
→ Settings enforcement is stronger than CLAUDE.md instructions.
→ Effort: Low (5 min)
**2.** Configure at least one hook for safety automation
→ Hooks guarantee the action happens. CLAUDE.md instructions are advisory.
→ Effort: Medium (15 min)
**{N}.** {title}
→ {description}
→ {recommendation}
→ Effort: {from gap-closure-templates.md}
### Worth Considering
These improve workflow efficiency for projects like yours.
{For each finding where userActionLanguage is "Fix when convenient":}
**3.** Split CLAUDE.md into focused modules with @imports
→ Files over 200 lines degrade Claude's adherence to instructions.
→ Effort: Low (10 min)
**4.** Add path-scoped rules for different file types
→ Unscoped rules load every session regardless of relevance.
→ Effort: Low (10 min)
**{N}.** {title}
→ {description}
→ {recommendation}
### Explore When Ready
Nice-to-have. Skip if your current setup works well.
{For each finding where userActionLanguage is "Optional cleanup" or "FYI":}
**5.** Custom keybindings (Shift+Enter for newline)
→ Effort: Low (2 min)
**6.** Status line configuration
→ Effort: Low (2 min)
**{N}.** {title}
→ {recommendation}
```
Each recommendation MUST have:
- A number
- A one-line description
- A "Why" with evidence
- An effort estimate from the templates
- The humanizer-provided `title`
- The humanizer-provided `description` (where shown)
- An effort estimate looked up from the templates
### Step 5: Ask what to implement

commands/manifest.md

@@ -24,6 +24,7 @@ Produce a ranked, single-table view of every token source loaded for a given rep
First non-flag argument is the path (default `.`). Recognized flags:
- `--json` — emit raw JSON instead of the rendered table.
- `--raw` — pass-through to the scanner; accepted for CLI surface consistency with the other config-audit commands. The manifest CLI is data-table only (no findings prose), so `--raw` is a no-op here, but the flag is still threaded through so users get uniform `--raw` behaviour across the toolchain.
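An illustrative local check of the no-op claim (not part of the template):
```bash
# With --raw a no-op, output should be byte-identical with and without the flag.
node ${CLAUDE_PLUGIN_ROOT}/scanners/manifest.mjs . --output-file /tmp/ca-manifest-a.json 2>/dev/null
node ${CLAUDE_PLUGIN_ROOT}/scanners/manifest.mjs . --output-file /tmp/ca-manifest-b.json --raw 2>/dev/null
diff /tmp/ca-manifest-a.json /tmp/ca-manifest-b.json && echo "--raw is a no-op here"
```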
### Step 2: Run the CLI silently
@@ -31,7 +32,9 @@ Tell the user: **"Building token-source manifest for `<path>`..."**
```bash
TMPFILE="/tmp/ca-manifest-$$.json"
node ${CLAUDE_PLUGIN_ROOT}/scanners/manifest.mjs <path> --output-file "$TMPFILE" 2>/dev/null; echo $?
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
node ${CLAUDE_PLUGIN_ROOT}/scanners/manifest.mjs <path> --output-file "$TMPFILE" $RAW_FLAG 2>/dev/null; echo $?
```
**Exit code handling:**

commands/posture.md

@@ -19,9 +19,13 @@ Quick, deterministic configuration health scorecard. No agents needed — runs a
## Implementation
### Step 1: Determine target
### Step 1: Determine target and flags
Parse `$ARGUMENTS` for a path (default: current working directory). Resolve relative paths.
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument (default: current working directory). Resolve relative paths. Recognized flags:
- `--raw` — pass-through to the scanner; produces v5.0.0 verbatim output (bypasses the humanizer). Power-user mode for byte-stable diffs and machine consumption.
- `--drift` — append a "Configuration Drift" section (see Step 5).
- `--plugin-health` — append a "Plugin Health" section (see Step 5).
Tell the user:
@@ -33,32 +37,34 @@ Running quick assessment{if path != cwd: " on `{path}`"}...
### Step 2: Run posture scanner
Run silently — all output goes to a file:
Run silently — JSON goes to a file, the humanized scorecard prints to stderr (default mode). The humanized stderr scorecard already includes the grade headline and area-score lines in plain language, so render those directly rather than re-deriving prose tables.
```bash
node ${CLAUDE_PLUGIN_ROOT}/scanners/posture.mjs <target-path> --json --output-file /tmp/config-audit-posture-$$.json 2>/dev/null; echo $?
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
node ${CLAUDE_PLUGIN_ROOT}/scanners/posture.mjs <target-path> --output-file /tmp/config-audit-posture-$$.json $RAW_FLAG 2>/tmp/config-audit-posture-stderr-$$.txt; echo $?
```
If exit code is non-zero, tell the user: "Assessment couldn't complete. Check that the path exists and contains Claude Code configuration files."
If `--raw` was passed, treat the captured stderr as v5.0.0-shape verbatim text and present it as-is in a code block; skip the humanized rendering steps below.
### Step 3: Read and interpret results
Read the JSON output file using the Read tool. Extract:
- `overallGrade`, `opportunityCount`
- `areas[]` — each with `name`, `grade`, `score`, `findingCount`
- `scannerEnvelope.scanners[].findings[]` — when surfacing individual findings, prefer the humanizer-provided fields: `userImpactCategory` (e.g., "Configuration mistake", "Wasted tokens"), `userActionLanguage` (e.g., "Fix this now", "Fix soon", "Optional cleanup"), and `relevanceContext` ("affects-everyone", "affects-this-machine-only", "test-fixture-no-impact"). These let you group and prioritize without hardcoded severity-to-prose mappings.
Also Read the captured stderr file — its body is the humanized scorecard (grade headline, area-score block, opportunity hint). You can present it verbatim or interleave its lines with the JSON-driven table.
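For orientation, a jq sketch of that extraction (illustrative; field names as listed above):
```bash
# Illustrative pull of the headline numbers and per-area lines.
jq -r '"Overall: \(.overallGrade) (\(.opportunityCount) opportunities)",
       (.areas[] | "\(.name): grade \(.grade), score \(.score), \(.findingCount) findings")' \
   /tmp/config-audit-posture-$$.json
cat /tmp/config-audit-posture-stderr-$$.txt   # humanized scorecard, render verbatim
```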
### Step 4: Present the scorecard
```markdown
**Health: {overallGrade}** | {qualityAreaCount} areas scanned
{grade-based context — pick ONE:}
- A: "Your configuration is correct and well-maintained."
- B: "Solid configuration with minor improvements available."
- C: "Working configuration with some issues worth addressing."
- D: "Configuration needs attention in several areas."
- F: "Significant issues found — addressing these will improve your experience."
{Use the headline line from the humanized stderr scorecard — it carries grade-context prose already (e.g., " Health: A (97/100) — Healthy setup, only minor polish needed"). Do not re-derive an A/B/C/D prose table here; the humanizer owns that vocabulary.}
### Area Scores
@@ -73,22 +79,13 @@ Read the JSON output file using the Read tool. Extract:
### What's next
```
**Grade A or B:**
```
Your configuration health is strong. Re-run after major changes to catch regressions.
For feature recommendations: `/config-audit feature-gap`
```
Group "what's next" suggestions by `userActionLanguage` from the humanized findings:
**Grade C:**
```
Run `/config-audit fix` to auto-fix what's possible, then `/config-audit plan` for a prioritized improvement path.
```
- Findings tagged "Fix this now" / "Fix soon" → suggest `/config-audit fix` first, then `/config-audit plan`.
- Findings tagged "Fix when convenient" / "Optional cleanup" → suggest `/config-audit feature-gap` and routine maintenance.
- No high-urgency findings → suggest `/config-audit feature-gap` for opportunities and re-running posture after major config changes.
**Grade D or F:**
```
Start with `/config-audit fix` — it handles the most impactful issues automatically with backup and rollback.
Then run `/config-audit plan` for a step-by-step path to a better configuration.
```
Avoid hardcoded grade-to-prose ladders here — the humanized scorecard headline already supplies grade context, and `userActionLanguage` supplies finding-level urgency.
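A sketch of that routing, using the envelope path from Step 3 (illustrative):
```bash
# Count high-urgency findings to pick the "what's next" suggestion.
URGENT=$(jq '[.scannerEnvelope.scanners[].findings[]
              | select(.userActionLanguage == "Fix this now"
                       or .userActionLanguage == "Fix soon")] | length' \
         /tmp/config-audit-posture-$$.json)
if [ "$URGENT" -gt 0 ]; then
  echo "Run /config-audit fix first, then /config-audit plan."
else
  echo "Try /config-audit feature-gap for new opportunities."
fi
```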
### Step 5: Optional sections

commands/tokens.md

@@ -28,16 +28,21 @@ Complementary to `/config-audit whats-active`:
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument. Default to `.` (current working directory). Recognized flags:
- `--global` — also include the user-level `~/.claude/` cascade
- `--json` — emit raw JSON instead of rendered tables (power-user mode)
- `--json` — emit raw JSON instead of rendered tables (power-user mode; bypasses the humanizer for byte-stable v5.0.0 output)
- `--raw` — pass-through to the scanner; produces v5.0.0 verbatim JSON (bypasses the humanizer). Use when piping into v5.0.0-baseline diff tooling.
- `--with-telemetry-recipe` — include `telemetry_recipe_path` in the JSON output, pointing to `knowledge/cache-telemetry-recipe.md`. Use this when you want to verify a structural fix actually improved cache hit rate (manual jq recipe, opt-in)
### Step 2: Run the CLI silently
Tell the user: **"Analysing token hotspots for `<path>`..."**
Default mode (no `--json`, no `--raw`) emits a humanized JSON envelope: each finding carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` in addition to the v5.0.0 fields. Pass `--raw` through verbatim if the user requested it.
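For reference, a single humanized finding might look like this (shape per the fields above; the concrete values are invented):
```json
{
  "id": "CA-TOK-001",
  "title": "plain-language title (pre-translated by the humanizer)",
  "description": "plain-language description",
  "recommendation": "plain-language fix",
  "userImpactCategory": "Wasted tokens",
  "userActionLanguage": "Fix soon",
  "relevanceContext": "affects-everyone"
}
```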
```bash
TMPFILE="/tmp/config-audit-tokens-$$.json"
node ${CLAUDE_PLUGIN_ROOT}/scanners/token-hotspots-cli.mjs <path> --output-file "$TMPFILE" [--global] 2>/dev/null; echo $?
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
node ${CLAUDE_PLUGIN_ROOT}/scanners/token-hotspots-cli.mjs <path> --output-file "$TMPFILE" [--global] $RAW_FLAG 2>/dev/null; echo $?
```
**Exit code handling:**
@@ -58,10 +63,10 @@ Use the Read tool on `$TMPFILE`. Extract:
- `total_estimated_tokens` — top-line number
- `hotspots[]` — top 10 ranked sources
- `findings[]` — Opus 4.7 pattern findings (CA-TOK-001..003)
- `findings[]` — Opus 4.7 pattern findings (CA-TOK-001..003); each finding in default mode carries humanizer fields (`userImpactCategory`, `userActionLanguage`, `relevanceContext`) alongside the v5.0.0 fields
- `counts` — severity breakdown
Render as markdown:
Render as markdown. Group findings by `userImpactCategory` (e.g., "Wasted tokens" vs "Configuration mistake") rather than re-deriving severity prose; lead each line with `userActionLanguage` ("Fix this now", "Fix soon", "Optional cleanup", etc.) so the urgency phrasing stays consistent with the rest of the toolchain. The humanizer already replaced jargon-heavy `title`/`description`/`recommendation` strings with plain-language equivalents — render them verbatim.
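A jq sketch of that ordering (illustrative):
```bash
# Group by impact category, then order each group by urgency.
jq '.findings
    | group_by(.userImpactCategory)
    | map(sort_by(.userActionLanguage as $u
        | ["Fix this now","Fix soon","Fix when convenient","Optional cleanup","FYI"]
        | index($u)))' "$TMPFILE"
```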
```markdown
**Token hotspots for `<path>`** — ~{total_estimated_tokens} estimated tokens loaded per turn
@@ -72,13 +77,14 @@ Render as markdown:
|------|--------|--------|-----------------|
| {rank} | `{source}` | ~{estimated_tokens} | {recommendations joined as `· ` bullets} |
### Opus 4.7 pattern findings
### Findings, grouped by impact
{For each finding, render:}
{Group findings[] by their userImpactCategory. Within each group, sort by userActionLanguage urgency (Fix this now → Fix soon → Fix when convenient → Optional cleanup → FYI), then render:}
- **{id}** ({severity}) — {title}
- **{userActionLanguage}** — {title} ({id})
- {description}
- **Fix:** {recommendation}
- _{relevanceContext}_ when not "affects-everyone" (mention the scope so the user knows whether a fix touches shared config or just their machine)
### Severity summary

commands/whats-active.md

@@ -24,6 +24,7 @@ Show a complete, read-only inventory of everything Claude Code loads for a given
Split `$ARGUMENTS` into a path and flags. Path is the first non-flag argument. Default to `.` (current working directory). Recognized flags:
- `--json` — emit raw JSON instead of rendered tables (power-user mode)
- `--raw` — pass-through to the scanner; accepted for CLI surface consistency. `whats-active` is an inventory-only output (no findings prose), so `--raw` is a no-op here, but the flag is still threaded through for uniform behaviour across the toolchain.
- `--verbose` — include per-file byte/line detail
- `--suggest-disables` — append deterministic disable-candidates + LLM-judgment pass
@@ -33,7 +34,9 @@ Tell the user: **"Reading active configuration for `<path>`..."**
```bash
TMPFILE="/tmp/ca-whats-active-$$.json"
node ${CLAUDE_PLUGIN_ROOT}/scanners/whats-active.mjs <path> --output-file "$TMPFILE" [--verbose] [--suggest-disables] 2>/dev/null; echo $?
RAW_FLAG=""
if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
node ${CLAUDE_PLUGIN_ROOT}/scanners/whats-active.mjs <path> --output-file "$TMPFILE" [--verbose] [--suggest-disables] $RAW_FLAG 2>/dev/null; echo $?
```
**Exit code handling:**

tests/commands/group-a-shape.test.mjs

@@ -0,0 +1,97 @@
/**
* Wave 5 Step 13 Group A command-template shape tests.
*
* Verifies that the 5 audit/analysis command templates have the correct
* structural shape after the humanizer integration:
*
* - All 5 files: contain a Bash invocation block, reference the Read tool,
* and contain the `--raw` flag (or the literal `"$ARGUMENTS"` string).
*
* - Findings-rendering files (posture.md, tokens.md, feature-gap.md):
* reference at least one of `userImpactCategory|userActionLanguage|
* relevanceContext`, and do NOT contain hardcoded grade-prose tables
* of the form `[ABCDF]\s+grade\s+is...`.
*
* - Inventory/data-only files (manifest.md, whats-active.md): structural
* checks only (Bash + Read + --raw pass-through). No humanized-field
* reference required because these CLIs emit data tables, not findings.
*/
import { test } from 'node:test';
import { strict as assert } from 'node:assert';
import { readFile } from 'node:fs/promises';
import { resolve, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';
const __dirname = dirname(fileURLToPath(import.meta.url));
const COMMANDS_DIR = resolve(__dirname, '..', '..', 'commands');
const GROUP_A_FILES = [
'posture.md',
'tokens.md',
'manifest.md',
'whats-active.md',
'feature-gap.md',
];
const FINDINGS_RENDERING_FILES = [
'posture.md',
'tokens.md',
'feature-gap.md',
];
const HUMANIZED_FIELD_REGEX = /userImpactCategory|userActionLanguage|relevanceContext/;
const RAW_OR_ARGUMENTS_REGEX = /--raw|"\$ARGUMENTS"/;
const HARDCODED_GRADE_PROSE_REGEX = /[ABCDF]\s+grade\s+is/;
// A Bash invocation block in markdown is a fenced ``` block tagged with bash.
const BASH_BLOCK_REGEX = /```bash\b/;
// Read tool reference: either explicit "Read tool" prose or the frontmatter
// "allowed-tools" list mentioning Read.
const READ_TOOL_REGEX = /\bRead\s+tool\b|allowed-tools:.*\bRead\b/;
async function readCommand(name) {
return await readFile(resolve(COMMANDS_DIR, name), 'utf-8');
}
test('Group A: every file contains a Bash invocation block', async () => {
for (const name of GROUP_A_FILES) {
const content = await readCommand(name);
assert.match(content, BASH_BLOCK_REGEX, `${name} missing bash block`);
}
});
test('Group A: every file references the Read tool', async () => {
for (const name of GROUP_A_FILES) {
const content = await readCommand(name);
assert.match(content, READ_TOOL_REGEX, `${name} missing Read tool reference`);
}
});
test('Group A: every file contains --raw or "$ARGUMENTS" (pass-through plumbing)', async () => {
for (const name of GROUP_A_FILES) {
const content = await readCommand(name);
assert.match(content, RAW_OR_ARGUMENTS_REGEX, `${name} missing --raw / $ARGUMENTS plumbing`);
}
});
test('Group A findings-renderers: reference at least one humanized field', async () => {
for (const name of FINDINGS_RENDERING_FILES) {
const content = await readCommand(name);
assert.match(
content,
HUMANIZED_FIELD_REGEX,
`${name} must reference userImpactCategory, userActionLanguage, or relevanceContext`,
);
}
});
test('Group A findings-renderers: no hardcoded grade-prose tables', async () => {
for (const name of FINDINGS_RENDERING_FILES) {
const content = await readCommand(name);
assert.doesNotMatch(
content,
HARDCODED_GRADE_PROSE_REGEX,
`${name} contains a hardcoded "[grade] grade is..." prose table — humanizer owns grade vocabulary now`,
);
}
});
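The suite runs under Node's built-in test runner, e.g. (from the repo root):
```bash
node --test tests/commands/group-a-shape.test.mjs
```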