feat(humanizer): update audit/analysis command templates group B [skip-docs]

Wave 5 Step 14. Threads the humanizer vocabulary through the remaining six audit/analysis command templates and adds a shape test that locks the structure plus a pair of anchor must-contains.

- commands/drift.md: --raw pass-through; new/resolved/changed-finding rendering instructions reference userActionLanguage and relevanceContext rather than raw severity.
- commands/plugin-health.md: --raw pass-through; finding rendering groups by userImpactCategory and leads with userActionLanguage.
- commands/config-audit.md (router): replaces the 25-line A/B/C/D/F prose ladder with a humanized stderr-scorecard reference + three userActionLanguage-grouped "What you can do next" branches; --raw threaded through both scan-orchestrator and posture invocations.
- commands/discover.md: --raw pass-through; finding-summary rendering groups by userImpactCategory.
- commands/analyze.md: --raw pass-through; analyzer-agent prompt now instructs grouping by userImpactCategory and leading with userActionLanguage; humanized title/description/recommendation strings rendered verbatim, no paraphrasing.
- commands/status.md: phase-label humanization table. The current_phase machine field name is preserved and the user-facing labels are translated ("Looking at your config files", "Working out what to recommend", "Asking what you'd like to focus on", "Putting together your action plan", "Making the changes", "Double-checking everything worked"); --raw preserves verbatim machine field values.

tests/commands/group-b-shape.test.mjs (new, +8 tests, 772 → 780):
- structural: bash block + Read tool + --raw/$ARGUMENTS plumbing across all 6 files
- findings-renderers: humanized field reference + no grade-prose
- anchor must-contains per plan: config-audit.md ⊇ userImpactCategory|userActionLanguage; drift.md ⊇ --raw|humanized
- status.md: current_phase preserved + ≥3 humanized phase labels
2026-05-01 19:45:55 +02:00
commit 6f38a6340e
parent 79b6e29073
7 changed files with 234 additions and 49 deletions
--- a/plugins/config-audit/commands/analyze.md
+++ b/plugins/config-audit/commands/analyze.md
@@ -14,11 +14,15 @@ Generate comprehensive analysis report from discovery findings.
 - Must have completed Phase 1 (discovery)
 - Findings must exist in `~/.claude/config-audit/sessions/{session-id}/findings/`

+## Arguments
+
+- `$ARGUMENTS` may contain `--raw`, which is forwarded to the analyzer agent's instructions; in `--raw` mode the agent renders the verbatim v5.0.0 severity prefixes instead of the humanized `userActionLanguage` urgency phrasing.
+
 ## Implementation

 ### Step 1: Verify session state

-Read `~/.claude/config-audit/sessions/{session-id}/state.yaml` and verify discovery phase completed. If not, tell the user: "Discovery hasn't been run yet. Start with `/config-audit discover` or just run `/config-audit` for a full audit."
+Read `~/.claude/config-audit/sessions/{session-id}/state.yaml` using the Read tool and verify discovery phase completed. If not, tell the user: "Discovery hasn't been run yet. Start with `/config-audit discover` or just run `/config-audit` for a full audit."

 ### Step 2: Tell the user what's happening

@@ -33,18 +37,29 @@ This includes hierarchy mapping, conflict detection, and prioritized recommendat

 Tell the user: **"Generating analysis (this takes about 30 seconds)..."**

+```bash
+RAW_FLAG=""
+if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
+```
+
 ```
 Agent(subagent_type: "config-audit:analyzer-agent")
  model: sonnet
  prompt: |
    Analyze all findings in: ~/.claude/config-audit/sessions/{session-id}/findings/
+    Mode: $RAW_FLAG (empty = humanized; "--raw" = v5.0.0 verbatim severity prefixes)
    Generate comprehensive report covering:
-    1. Executive summary with key metrics
+    1. Executive summary with key metrics, grouped by userImpactCategory
    2. Hierarchy map visualization
    3. Conflict detection across config layers
    4. CLAUDE.md quality assessment
    5. Security issues (secrets, permissions)
-    6. Top 10 prioritized recommendations
+    6. Top 10 prioritized recommendations — lead each item with the
+       finding's userActionLanguage ("Fix this now," "Fix soon,"
+       "Fix when convenient," "Optional cleanup," "FYI") rather than
+       raw severity. The humanizer already replaced jargon-heavy
+       title/description/recommendation strings with plain-language
+       equivalents — render them verbatim, do not paraphrase.
    Output to: ~/.claude/config-audit/sessions/{session-id}/analysis-report.md
 ```
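
The `grep -q -- "--raw"` check used in these templates matches substrings, so an argument that merely starts with `--raw` would also switch on raw mode. As a minimal sketch of a stricter, token-based check (assuming arguments are whitespace-separated; `hasRawFlag` and the `--raw-output` example are hypothetical and not part of the plugin):

```javascript
// Hypothetical helper: true only when "--raw" appears as its own
// whitespace-separated token in the argument string.
function hasRawFlag(argumentsString) {
  return argumentsString
    .split(/\s+/)      // split on runs of whitespace
    .filter(Boolean)   // drop empty tokens from leading/trailing spaces
    .includes('--raw');
}

console.log(hasRawFlag('./my-project --raw'));        // true
console.log(hasRawFlag('./my-project --raw-output')); // false
console.log(hasRawFlag(''));                          // false
```

Whether the stricter match is wanted depends on which other flags the commands accept; the substring check is simpler and may be sufficient in practice.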

--- a/plugins/config-audit/commands/config-audit.md
+++ b/plugins/config-audit/commands/config-audit.md
@@ -82,10 +82,12 @@ This is a silent infrastructure step — do NOT show output to the user.

 Tell the user: **"Running 12 configuration scanners..."**

-Run both scanners and posture in a single Bash command:
+Run both scanners and posture in a single Bash command. Default mode runs the humanizer, so each finding in `scan-results.json` carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` alongside the v5.0.0 fields. If the user passed `--raw`, thread it through to both CLIs to get v5.0.0 verbatim output.

 ```bash
-node ${CLAUDE_PLUGIN_ROOT}/scanners/scan-orchestrator.mjs <target-path> --output-file ~/.claude/config-audit/sessions/{session-id}/findings/scan-results.json [--full-machine] [--global] 2>/dev/null; node ${CLAUDE_PLUGIN_ROOT}/scanners/posture.mjs <target-path> --json --output-file ~/.claude/config-audit/sessions/{session-id}/posture.json [--full-machine] [--global] 2>/dev/null; echo $?
+RAW_FLAG=""
+if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
+node ${CLAUDE_PLUGIN_ROOT}/scanners/scan-orchestrator.mjs <target-path> --output-file ~/.claude/config-audit/sessions/{session-id}/findings/scan-results.json [--full-machine] [--global] $RAW_FLAG 2>/dev/null; node ${CLAUDE_PLUGIN_ROOT}/scanners/posture.mjs <target-path> --output-file ~/.claude/config-audit/sessions/{session-id}/posture.json [--full-machine] [--global] $RAW_FLAG 2>/dev/null; echo $?
 ```

 Use `--full-machine` for `full` scope, `--global` for `home` scope. For `repo` and `current`, pass the resolved path directly.
@@ -134,19 +136,14 @@ Write to: `~/.claude/config-audit/sessions/{session-id}/state.yaml`

 ### Step 6: Display results

-Present results using this template. Replace all placeholders with actual values. **Adapt the summary sentence based on grade.**
+Present results using this template. The humanizer has already replaced jargon-heavy `title`/`description`/`recommendation` strings on every finding with plain-language equivalents — render them verbatim. Lead urgency phrasing with `userActionLanguage` ("Fix this now", "Fix soon", "Fix when convenient", "Optional cleanup", "FYI") and group "What you can do next" suggestions by that field. Do not re-derive an A/B/C/D/F-to-prose ladder here; the humanized stderr scorecard headline already supplies the grade context, and `userActionLanguage` supplies finding-level urgency.

 ```markdown
 ### Results

 **Health: {overallGrade}** | {qualityAreaCount} areas scanned

-{grade-based summary — pick ONE:}
- Grade A: "Excellent — your configuration is correct and well-maintained."
- Grade B: "Strong — your configuration is solid with minor improvements available."
- Grade C: "Decent — your configuration works but has some issues worth addressing."
- Grade D: "Needs work — several configuration issues could affect your Claude Code experience."
- Grade F: "Significant issues found — addressing these will meaningfully improve your workflow."
+{Use the headline line from the humanized stderr scorecard — it carries grade-context prose already. Avoid hardcoding a separate per-grade prose ladder.}

 Scanned {files_scanned} files | {real_finding_count} findings ({severity_breakdown})
 {If test_fixture_count > 0: "({test_fixture_count} additional findings in test fixtures were excluded.)"}
@@ -164,26 +161,25 @@ Scanned {files_scanned} files | {real_finding_count} findings ({severity_breakdo
 | Imports | {grade} | {count} | {status} |
 | Conflicts | {grade} | {count} | {status} |

-{For the status column, use plain language like: "Well structured", "2 minor issues", "Missing trust levels", "No issues", etc.}
+{For the status column, use the humanized title from the most-severe finding in that area, or a one-phrase plain-language summary. Findings carry userImpactCategory which already groups by impact bucket — use that vocabulary, not raw scanner names.}

 {If opportunityCount > 0:}
 {opportunityCount} feature opportunities available — run `/config-audit feature-gap` for context-aware recommendations.

 ### What you can do next

-{Include only relevant options based on findings. Explain each one:}
+Group suggestions by `userActionLanguage` from the humanized findings:

-{If fixable_count > 0:}
- **`/config-audit fix`** — Automatically fix {fixable_count} issues. Creates a backup first so you can roll back with one command.
+{If any finding has userActionLanguage "Fix this now" or "Fix soon":}
+- **`/config-audit fix`** — auto-fix what's possible (backup created first, one-command rollback). The remaining items go into a prioritized plan.
+- **`/config-audit plan`** — produce a prioritized action plan for the items that need manual attention.

-{If real findings > fixable_count:}
- **`/config-audit plan`** — Get a prioritized action plan for the {remaining} issues that need manual attention.
+{If most findings are "Fix when convenient" or "Optional cleanup":}
+- **`/config-audit feature-gap`** — see which features could enhance your setup; pick what you want and implement on the spot.
+- **`/config-audit fix`** — auto-fix anything deterministic; the rest is genuinely optional.

-{If grade is C or better:}
- **`/config-audit feature-gap`** — See which features could help your project, and implement the ones you want on the spot.
-
-{If grade is D or F:}
- **`/config-audit fix`** should be your first step — it handles the most impactful issues automatically.
+{If only "FYI" findings:}
+- **`/config-audit feature-gap`** — explore opportunities; nothing is urgent.

 Session saved to: `~/.claude/config-audit/sessions/{session-id}/`
 ```
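
The three "What you can do next" branches above can be read as a small decision function over the `userActionLanguage` values. The sketch below is one possible reading, not the plugin's logic: `nextStepCommands` is a hypothetical name, and the template's "most findings" condition is simplified to "no urgent finding, but at least one low-priority one".

```javascript
const URGENT = ['Fix this now', 'Fix soon'];
const LOW_PRIORITY = ['Fix when convenient', 'Optional cleanup'];

// Hypothetical helper: choose which commands to suggest from the
// userActionLanguage values carried by the humanized findings.
function nextStepCommands(findings) {
  const labels = findings.map((f) => f.userActionLanguage);
  if (labels.some((l) => URGENT.includes(l))) {
    return ['/config-audit fix', '/config-audit plan'];
  }
  if (labels.some((l) => LOW_PRIORITY.includes(l))) {
    return ['/config-audit feature-gap', '/config-audit fix'];
  }
  // Only "FYI" findings (or none at all, an assumption): nothing is urgent.
  return ['/config-audit feature-gap'];
}

console.log(nextStepCommands([{ userActionLanguage: 'Fix soon' }, { userActionLanguage: 'FYI' }]));
console.log(nextStepCommands([{ userActionLanguage: 'Optional cleanup' }]));
console.log(nextStepCommands([{ userActionLanguage: 'FYI' }]));
```
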
--- a/plugins/config-audit/commands/discover.md
+++ b/plugins/config-audit/commands/discover.md
@@ -67,10 +67,12 @@ If `--delta` flag:

 ### Step 5: Run discovery

-Run the scan orchestrator silently to discover and scan files:
+Run the scan orchestrator silently to discover and scan files. Default mode emits humanized JSON — each finding in `scan-results.json` carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` alongside the v5.0.0 fields. Pass `--raw` through if the user requested it (produces v5.0.0 verbatim envelope; humanizer fields absent).

 ```bash
-node ${CLAUDE_PLUGIN_ROOT}/scanners/scan-orchestrator.mjs <target-path> --output-file ~/.claude/config-audit/sessions/{session-id}/findings/scan-results.json [--full-machine] [--global] 2>/dev/null; echo $?
+RAW_FLAG=""
+if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
+node ${CLAUDE_PLUGIN_ROOT}/scanners/scan-orchestrator.mjs <target-path> --output-file ~/.claude/config-audit/sessions/{session-id}/findings/scan-results.json [--full-machine] [--global] $RAW_FLAG 2>/dev/null; echo $?
 ```

 Check exit code: 0/1/2 → normal. 3 → "Discovery encountered an error. Try a narrower scope."
@@ -81,7 +83,7 @@ Write `scope.yaml` and `state.yaml` to session directory. Update state with `cur

 ### Step 7: Present summary

-Read the scan results file to count files and findings:
+Read the scan results file using the Read tool. When you surface initial findings, group them by `userImpactCategory` and lead each line with `userActionLanguage` rather than raw severity prefixes. The humanizer already mapped severity to plain-language phrasing ("Fix this now", "Fix soon", "Fix when convenient", "Optional cleanup", "FYI") so the rest of the toolchain sees consistent wording.

 **Full scan:**
 ```markdown
@@ -98,7 +100,7 @@ Read the scan results file to count files and findings:
 | Hooks | {n} |
 | Other | {n} |

-Initial scan found {finding_count} items to review.
+Initial scan found {finding_count} items to review (grouped by impact: {comma-separated counts per userImpactCategory}).

 **Next:** Run `/config-audit analyze` to generate your analysis report.
 ```
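
The `{comma-separated counts per userImpactCategory}` placeholder in the summary line could be filled along the following lines. This is a sketch under assumptions: `summarizeByImpact` is a hypothetical helper, the `'Uncategorized'` fallback is invented for findings that lack the field, and the `count category` wording is only one possible format.

```javascript
// Hypothetical helper: count findings per userImpactCategory and join the
// counts into a single comma-separated string, in first-seen order.
function summarizeByImpact(findings) {
  const counts = new Map();
  for (const f of findings) {
    const key = f.userImpactCategory ?? 'Uncategorized'; // fallback is an assumption
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([category, n]) => `${n} ${category}`).join(', ');
}

const sample = [
  { userImpactCategory: 'Configuration mistake' },
  { userImpactCategory: 'Conflict' },
  { userImpactCategory: 'Configuration mistake' },
];
console.log(summarizeByImpact(sample)); // 2 Configuration mistake, 1 Conflict
```
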
--- a/plugins/config-audit/commands/drift.md
+++ b/plugins/config-audit/commands/drift.md
@@ -16,6 +16,7 @@ Compare current configuration against a saved baseline to see what changed.
  - A target path (default: current working directory)
  - `--save`: Save current state as baseline
  - `--baseline <name>`: Compare against a specific named baseline (default: "default")
+  - `--raw`: Pass-through to the scanner; produces v5.0.0 verbatim diff output (bypasses the humanizer). Use when piping into v5.0.0-baseline diff tooling that depends on byte-stable output.

 ## Implementation

@@ -26,7 +27,9 @@ If `--save` is present:
 Tell the user: **"Saving current configuration as baseline..."**

 ```bash
-node ${CLAUDE_PLUGIN_ROOT}/scanners/drift-cli.mjs <path> --save --name <baseline-name> 2>/dev/null
+RAW_FLAG=""
+if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
+node ${CLAUDE_PLUGIN_ROOT}/scanners/drift-cli.mjs <path> --save --name <baseline-name> $RAW_FLAG 2>/dev/null
 ```

 Read stdout for confirmation. Tell the user:
@@ -45,17 +48,21 @@ Without `--save`:
 Tell the user: **"Comparing current configuration against baseline..."**

 ```bash
-node ${CLAUDE_PLUGIN_ROOT}/scanners/drift-cli.mjs <path> --baseline <name> 2>/dev/null
+RAW_FLAG=""
+if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
+node ${CLAUDE_PLUGIN_ROOT}/scanners/drift-cli.mjs <path> --baseline <name> $RAW_FLAG 2>/dev/null
 ```

-Read stdout. If baseline not found, tell the user:
+Read stdout. In default mode the diff sections are humanized — finding titles, descriptions, and recommendations have already been replaced with plain-language equivalents. New/resolved/changed finding lists carry `userImpactCategory`, `userActionLanguage`, and `relevanceContext` so you can group and prioritize without re-deriving severity prose. If `--raw` was passed, the v5.0.0 diff is verbatim — present it in a code block as-is.
+
+If baseline not found, tell the user:

 ```
 No baseline found. Save one first with:
  /config-audit drift --save
 ```

-Otherwise, parse and present the drift report:
+Otherwise, parse and present the drift report. Use the Read tool on the captured stdout (or pipe it into a tmpfile first if you prefer):

 ```markdown
 ### Configuration Drift
@@ -65,15 +72,15 @@ Otherwise, parse and present the drift report:

 {If new findings:}
 #### New Issues ({count})
-| ID | Severity | Description |
-|----|----------|-------------|
-| ... | ... | ... |
+| ID | Action | Description |
+|----|--------|-------------|
+| {id} | {userActionLanguage — "Fix this now", "Fix soon", etc.} | {humanized title} |

 {If resolved findings:}
 #### Resolved ({count})
 | ID | Description |
 |----|-------------|
-| ... | ... |
+| {id} | {humanized title} |

 {If area changes:}
 #### Area Changes
@@ -82,6 +89,8 @@ Otherwise, parse and present the drift report:
 | ... | ... | ... | ... |
 ```

+When iterating new/resolved findings, prefer `userActionLanguage` over raw `severity` for the "Action" column — the humanizer already mapped severity to plain-language phrasing, and surfacing it consistently keeps the toolchain coherent. Mention `relevanceContext` when it isn't `affects-everyone` (the user wants to know if a fix touches shared config or just their machine).
+
 ### List baselines

 If `$ARGUMENTS` contains `--list`:
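
The "New Issues" table row and the `relevanceContext` guidance in this file can be sketched as a single formatting function. The names and sample values here are hypothetical: `newIssueRow`, the `EX-` ids, the titles, and the `example-context` value are placeholders, and appending the context in parentheses is an assumed presentation.

```javascript
// Hypothetical helper: one row of the "New Issues" table. The Action column
// comes from userActionLanguage; relevanceContext is mentioned only when it
// is something other than "affects-everyone".
function newIssueRow(finding) {
  const note =
    finding.relevanceContext && finding.relevanceContext !== 'affects-everyone'
      ? ` (${finding.relevanceContext})`
      : '';
  return `| ${finding.id} | ${finding.userActionLanguage} | ${finding.title}${note} |`;
}

console.log(newIssueRow({
  id: 'EX-001',
  userActionLanguage: 'Fix soon',
  title: 'Example humanized title',
  relevanceContext: 'affects-everyone',
}));
// | EX-001 | Fix soon | Example humanized title |

console.log(newIssueRow({
  id: 'EX-002',
  userActionLanguage: 'Fix this now',
  title: 'Another example title',
  relevanceContext: 'example-context', // placeholder value
}));
// | EX-002 | Fix this now | Another example title (example-context) |
```
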
--- a/plugins/config-audit/commands/plugin-health.md
+++ b/plugins/config-audit/commands/plugin-health.md
@@ -14,6 +14,7 @@ Audit Claude Code plugin structure and quality — validates plugin.json, CLAUDE

 - `$ARGUMENTS` may contain a path to a specific plugin directory
 - If omitted: scans all plugins in the marketplace root
+- `--raw`: pass-through to the scanner; produces v5.0.0 verbatim envelope (bypasses the humanizer) for byte-stable diff tooling

 ## Implementation

@@ -31,13 +32,15 @@ Auditing {N} plugin(s) for structure, frontmatter quality, and cross-plugin conf

 ### Step 2: Run scanner

-Run silently for each plugin:
+Run silently for each plugin. Default mode emits a humanized JSON envelope where each PLH finding carries `userImpactCategory`, `userActionLanguage`, and `relevanceContext` alongside the v5.0.0 fields. `--raw` is passed through verbatim when present.

 ```bash
-node ${CLAUDE_PLUGIN_ROOT}/scanners/plugin-health-scanner.mjs <path> 2>/dev/null
+RAW_FLAG=""
+if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
+node ${CLAUDE_PLUGIN_ROOT}/scanners/plugin-health-scanner.mjs <path> $RAW_FLAG 2>/dev/null
 ```

-Read stdout output (JSON). Parse findings.
+Read stdout output (JSON) using the Read tool. Parse findings.

 ### Step 3: Present results

@@ -59,10 +62,12 @@ Read stdout output (JSON). Parse findings.
 #### Findings by Plugin

 **{plugin-name}** ({finding_count} findings):
-1. [{id}] {title} — {recommendation}
+1. [{userActionLanguage}] {humanized title} ({id}) — {humanized recommendation}
 2. ...
 ```

+Group findings within each plugin by `userImpactCategory` (e.g., "Configuration mistake", "Conflict") and lead each line with `userActionLanguage` ("Fix this now", "Fix soon", "Optional cleanup"). The humanizer already produced the plain-language `title`/`recommendation` strings — render them verbatim, do not paraphrase.
+
 ### Step 4: Suggest next steps

 ```
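
The per-plugin rendering described in this file (group by `userImpactCategory`, lead each line with `userActionLanguage`, keep the humanized strings verbatim) might look like the sketch below. `renderPluginFindings` and the sample findings are hypothetical, and the separator before the recommendation is a plain hyphen here for simplicity.

```javascript
// Hypothetical helper: group findings by userImpactCategory (first-seen
// order) and emit one numbered line per finding within each group.
function renderPluginFindings(findings) {
  const groups = new Map();
  for (const f of findings) {
    if (!groups.has(f.userImpactCategory)) groups.set(f.userImpactCategory, []);
    groups.get(f.userImpactCategory).push(f);
  }
  const lines = [];
  for (const [category, items] of groups) {
    lines.push(`${category}:`);
    items.forEach((f, i) => {
      // title and recommendation are rendered verbatim, never paraphrased
      lines.push(`${i + 1}. [${f.userActionLanguage}] ${f.title} (${f.id}) - ${f.recommendation}`);
    });
  }
  return lines.join('\n');
}

const pluginFindings = [
  { id: 'EX-1', userImpactCategory: 'Configuration mistake', userActionLanguage: 'Fix soon', title: 'Example title one', recommendation: 'Example fix one' },
  { id: 'EX-2', userImpactCategory: 'Conflict', userActionLanguage: 'Fix this now', title: 'Example title two', recommendation: 'Example fix two' },
  { id: 'EX-3', userImpactCategory: 'Configuration mistake', userActionLanguage: 'Optional cleanup', title: 'Example title three', recommendation: 'Example fix three' },
];
console.log(renderPluginFindings(pluginFindings));
```
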
--- a/plugins/config-audit/commands/status.md
+++ b/plugins/config-audit/commands/status.md
@@ -1,7 +1,7 @@
 ---
 name: config-audit:status
 description: Show current session state and available actions
-allowed-tools: Read, Glob
+allowed-tools: Read, Glob, Bash
 model: sonnet
 ---

@@ -13,18 +13,40 @@ Display current session state and guide next actions.

 ```
 /config-audit status
+/config-audit status --raw   # show the raw v5.0.0 phase identifiers (current_phase: "discover", etc.) instead of humanized labels
 ```

+## Phase-label translation
+
+The `state.yaml` field `current_phase` is the machine contract — never rename it. The user-facing label is humanized. Map the field value to a plain-language label when rendering (default mode):
+
+| `current_phase` (machine field, unchanged) | User-facing label |
+|--------------------------------------------|-------------------|
+| `discover`  | Looking at your config files |
+| `analyze`   | Working out what to recommend |
+| `interview` | Asking what you'd like to focus on |
+| `plan`      | Putting together your action plan |
+| `implement` | Making the changes |
+| `verify`    | Double-checking everything worked |
+
+When `--raw` is in `$ARGUMENTS`, render the raw `current_phase` field value verbatim (no humanization).
+
 ## Implementation

-1. **Find active session**:
+1. **Parse flags**:
+   ```bash
+   RAW_FLAG=""
+   if echo "$ARGUMENTS" | grep -q -- "--raw"; then RAW_FLAG="--raw"; fi
+   ```
+
+2. **Find active session**:
   ```
   Glob: ~/.claude/config-audit/sessions/*/state.yaml
   Sort by modification time
   Use most recent
   ```

-2. **Read session state**:
+3. **Read session state** with the Read tool:
   ```yaml
   session_id: "20250126_143022"
   current_phase: "analyze"
@@ -33,7 +55,7 @@ Display current session state and guide next actions.
   ...
   ```

-3. **Display status**:
+4. **Display status** (default mode — humanized phase labels):
   ```
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Config-Audit Session Status
@@ -44,11 +66,11 @@ Display current session state and guide next actions.

   PHASE PROGRESS
   ──────────────
-   ✓ Phase 1: Discover   - 15 files found (current directory)
-   ✓ Phase 2: Analyze    - report generated
-   ○ Phase 3: Interview  - not started (optional)
-   ○ Phase 4: Plan       - not started
-   ○ Phase 5: Implement  - not started
+   ✓ Phase 1: Looking at your config files       - 15 files found (current directory)
+   ✓ Phase 2: Working out what to recommend      - report generated
+   ○ Phase 3: Asking what you'd like to focus on - not started (optional)
+   ○ Phase 4: Putting together your action plan  - not started
+   ○ Phase 5: Making the changes                 - not started

   NEXT ACTION
   ───────────
@@ -64,7 +86,9 @@ Display current session state and guide next actions.
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   ```

-4. **If no session found**:
+   In `--raw` mode, replace the humanized phase labels with the verbatim machine field values (`Phase 1: discover`, `Phase 2: analyze`, etc.).
+
+5. **If no session found**:
   ```
   No active config-audit session found.
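
The phase-label table and the `--raw` rule from this file translate directly into a lookup. In the sketch below the six mappings are copied from the table; `phaseLabel` is a hypothetical name, and falling back to the machine value for an unknown phase is an assumption rather than documented behavior.

```javascript
// Mapping copied from the phase-label table in status.md.
const PHASE_LABELS = {
  discover: 'Looking at your config files',
  analyze: 'Working out what to recommend',
  interview: "Asking what you'd like to focus on",
  plan: 'Putting together your action plan',
  implement: 'Making the changes',
  verify: 'Double-checking everything worked',
};

// Hypothetical helper: humanize current_phase unless raw mode is requested.
function phaseLabel(currentPhase, raw = false) {
  if (raw) return currentPhase; // --raw: verbatim machine field value
  return Object.prototype.hasOwnProperty.call(PHASE_LABELS, currentPhase)
    ? PHASE_LABELS[currentPhase]
    : currentPhase; // unknown phase: fall back to the machine value (assumption)
}

console.log(phaseLabel('analyze'));       // Working out what to recommend
console.log(phaseLabel('analyze', true)); // analyze
```

The `current_phase` key itself is never rewritten; only the string shown to the user changes.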

--- a/plugins/config-audit/tests/commands/group-b-shape.test.mjs
+++ b/plugins/config-audit/tests/commands/group-b-shape.test.mjs
@@ -0,0 +1,134 @@
+/**
+ * Wave 5 Step 14 — Group B command-template shape tests.
+ *
+ * Verifies that the 6 audit/analysis command templates in Group B have the
+ * correct structural shape after the humanizer integration:
+ *
+ *   - All 6 files: contain a Bash invocation block, reference the Read tool,
+ *     and contain the `--raw` flag (or the literal `"$ARGUMENTS"` string).
+ *
+ *   - Findings-rendering files (drift.md, plugin-health.md, config-audit.md,
+ *     discover.md, analyze.md): reference at least one of
+ *     `userImpactCategory|userActionLanguage|relevanceContext`, and do NOT
+ *     contain hardcoded grade-prose tables of the form `[ABCDF]\s+grade\s+is`.
+ *
+ *   - status.md: phase-label table is present, the machine field name
+ *     `current_phase` is preserved (machine contract), and at least three of
+ *     the six labels in HUMANIZED_PHASE_LABELS appear (for example
+ *     "Looking at your config files", "Working out what to recommend",
+ *     "Making the changes").
+ *
+ *   - Anchor must-contains from plan lines 575–579:
+ *     - config-audit.md: contains userImpactCategory|userActionLanguage
+ *     - drift.md: contains --raw OR humanized
+ */
+
+import { test } from 'node:test';
+import { strict as assert } from 'node:assert';
+import { readFile } from 'node:fs/promises';
+import { resolve, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const COMMANDS_DIR = resolve(__dirname, '..', '..', 'commands');
+
+const GROUP_B_FILES = [
+  'drift.md',
+  'plugin-health.md',
+  'config-audit.md',
+  'discover.md',
+  'analyze.md',
+  'status.md',
+];
+
+const FINDINGS_RENDERING_FILES = [
+  'drift.md',
+  'plugin-health.md',
+  'config-audit.md',
+  'discover.md',
+  'analyze.md',
+];
+
+const HUMANIZED_FIELD_REGEX = /userImpactCategory|userActionLanguage|relevanceContext/;
+const RAW_OR_ARGUMENTS_REGEX = /--raw|"\$ARGUMENTS"/;
+const HARDCODED_GRADE_PROSE_REGEX = /[ABCDF]\s+grade\s+is/;
+const BASH_BLOCK_REGEX = /```bash\b/;
+const READ_TOOL_REGEX = /\bRead\s+tool\b|allowed-tools:.*\bRead\b/;
+
+const HUMANIZED_PHASE_LABELS = [
+  'Looking at your config files',
+  'Working out what to recommend',
+  'Asking what you',
+  'Putting together your action plan',
+  'Making the changes',
+  'Double-checking everything worked',
+];
+
+async function readCommand(name) {
+  return await readFile(resolve(COMMANDS_DIR, name), 'utf-8');
+}
+
+test('Group B: every file contains a Bash invocation block', async () => {
+  for (const name of GROUP_B_FILES) {
+    const content = await readCommand(name);
+    assert.match(content, BASH_BLOCK_REGEX, `${name} missing bash block`);
+  }
+});
+
+test('Group B: every file references the Read tool', async () => {
+  for (const name of GROUP_B_FILES) {
+    const content = await readCommand(name);
+    assert.match(content, READ_TOOL_REGEX, `${name} missing Read tool reference`);
+  }
+});
+
+test('Group B: every file contains --raw or "$ARGUMENTS" (pass-through plumbing)', async () => {
+  for (const name of GROUP_B_FILES) {
+    const content = await readCommand(name);
+    assert.match(content, RAW_OR_ARGUMENTS_REGEX, `${name} missing --raw / $ARGUMENTS plumbing`);
+  }
+});
+
+test('Group B findings-renderers: reference at least one humanized field', async () => {
+  for (const name of FINDINGS_RENDERING_FILES) {
+    const content = await readCommand(name);
+    assert.match(
+      content,
+      HUMANIZED_FIELD_REGEX,
+      `${name} must reference userImpactCategory, userActionLanguage, or relevanceContext`,
+    );
+  }
+});
+
+test('Group B findings-renderers: no hardcoded grade-prose tables', async () => {
+  for (const name of FINDINGS_RENDERING_FILES) {
+    const content = await readCommand(name);
+    assert.doesNotMatch(
+      content,
+      HARDCODED_GRADE_PROSE_REGEX,
+      `${name} contains a hardcoded "[grade] grade is..." prose table — humanizer owns grade vocabulary now`,
+    );
+  }
+});
+
+test('Group B anchor: config-audit.md references userImpactCategory|userActionLanguage', async () => {
+  const content = await readCommand('config-audit.md');
+  assert.match(content, /userImpactCategory|userActionLanguage/);
+});
+
+test('Group B anchor: drift.md references --raw or humanized', async () => {
+  const content = await readCommand('drift.md');
+  assert.match(content, /--raw|humanized/);
+});
+
+test('status.md: preserves current_phase machine field and adds humanized phase labels', async () => {
+  const content = await readCommand('status.md');
+  // Machine contract preserved
+  assert.match(content, /\bcurrent_phase\b/, 'status.md must keep current_phase as machine field');
+  // At least 3 of the 6 humanized phase labels appear
+  const present = HUMANIZED_PHASE_LABELS.filter(label => content.includes(label));
+  assert.ok(
+    present.length >= 3,
+    `status.md must include at least 3 humanized phase labels; found ${present.length}: ${present.join(', ')}`,
+  );
+});
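
As a quick check of what the grade-prose guard covers, the snippet below runs `HARDCODED_GRADE_PROSE_REGEX` against a few strings. It matches phrasing of the form "A grade is", but not the `Grade A: "..."` wording of the ladder removed from config-audit.md. The second pattern, `GRADE_LADDER_REGEX`, is a hypothetical addition shown only to illustrate how that older wording could be caught if that is the intent.

```javascript
// Same pattern as in group-b-shape.test.mjs.
const HARDCODED_GRADE_PROSE_REGEX = /[ABCDF]\s+grade\s+is/;
// Hypothetical extra pattern (not in the test file) for "Grade A:" style lines.
const GRADE_LADDER_REGEX = /Grade\s+[ABCDF]:/;

const samples = [
  'A grade is excellent',
  'Grade A: "Excellent"',
  '{If grade is D or F:}',
];

for (const s of samples) {
  console.log(HARDCODED_GRADE_PROSE_REGEX.test(s), GRADE_LADDER_REGEX.test(s), s);
}
// true false A grade is excellent
// false true Grade A: "Excellent"
// false false {If grade is D or F:}
```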