Wave 5 Step 13. Threads the humanizer vocabulary through five audit/
analysis command templates and adds a shape test that locks the
structure in place.
- commands/posture.md, tokens.md, feature-gap.md (findings-renderers):
reference userImpactCategory/userActionLanguage/relevanceContext;
remove hardcoded A/B/C/D/F-to-prose tables (humanizer owns the
grade-context vocabulary now via the stderr scorecard headline).
- commands/manifest.md, whats-active.md (inventory CLIs): add --raw
pass-through for CLI-surface consistency. --raw is a no-op in these
CLIs, but the flag is threaded through so users get uniform behaviour.
- All five files: --raw flag parsed from $ARGUMENTS and passed verbatim
to the underlying scanner CLI when present.
tests/commands/group-a-shape.test.mjs (new, +5 tests, 767 → 772):
- structural: every file has a bash invocation block, Read tool
reference, and --raw/$ARGUMENTS plumbing
- findings-renderers only: at least one humanized field referenced;
no hardcoded "[grade] grade is..." prose tables
Step 12 of v5.1.0 humanizer Wave 4. Adds tests/snapshot-default-output
.test.mjs and seeds three snapshots in tests/snapshots/default-output/
that capture humanized default-mode output for representative CLIs.
Coverage:
- scan-orchestrator: stdout JSON envelope (humanized findings); time
fields normalized.
- token-hotspots-cli: stdout JSON envelope (humanized payload.findings);
duration_ms normalized.
- posture: stderr humanized scorecard; (Xms) durations normalized.
Snapshot envelope is uniform on disk: { kind: 'json', payload: ... }
for JSON streams and { kind: 'text', payload: '...' } for stderr text.
This keeps the snapshot files self-describing and easy to read.
Re-seeding requires UPDATE_SNAPSHOT=1 — drift fails the test by design,
so any humanizer prose change is intentional and re-approved.
Tests: 764 to 767 (+3 SC-5 cases). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 11 of v5.1.0 humanizer Wave 4. Adds tests/raw-backcompat.test.mjs
mirroring the SC-6 contract for the --raw flag — the explicit "v5.0.0
verbatim" escape hatch.
- 4 fixture-deterministic CLIs (scan-orchestrator, posture,
token-hotspots-cli, fix-cli) get strict byte-equal against
tests/snapshots/v5.0.0/<cli>.json with time fields normalized.
- drift-cli is checked under the same contract guarded by
ensureDriftBaseline.
- 3 environment-aware CLIs (plugin-health, manifest, whats-active) are
checked for mode-equivalence (--raw equals --json).
- Posture additionally asserts its --raw stderr scorecard reproduces
tests/snapshots/v5.0.0-stderr/posture.txt verbatim, modulo (Xms)
duration markers normalized to (0ms).
- Cross-cutting suite asserts --raw findings carry no humanizer fields
on any CLI.
Tests: 751 to 764 (+13 SC-7 cases). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 10 of v5.1.0 humanizer Wave 4. Adds tests/json-backcompat.test.mjs
asserting that --json output of every CLI remains backwards-compatible
with the v5.0.0 contract.
Coverage strategy mirrors Wave 3 cli-humanizer test discovery:
- 4 fixture-deterministic CLIs (scan-orchestrator, posture,
token-hotspots-cli, fix-cli) get strict byte-equal byte-equal --json
vs frozen tests/snapshots/v5.0.0/ snapshot, with time-varying fields
(timestamp, target path, duration_ms, generatedAt, durationMs)
normalized.
- drift-cli is checked with the same byte-equal contract guarded by an
ensureDriftBaseline precondition; the test silently skips when the
baseline cannot be created.
- 3 environment-aware CLIs (plugin-health-scanner, manifest,
whats-active) read live config-cascade state, so frozen snapshots
drift as the marketplace evolves. They are verified by mode-
equivalence (--json equals --raw) instead — the same approach
established in Wave 3 cli-humanizer.test.mjs.
A cross-cutting suite asserts --json output of the 4 deterministic
CLIs never carries humanizer fields (userImpactCategory,
userActionLanguage, relevanceContext) on any finding, walking both
top-level findings arrays and scanners[].findings paths.
Tests: 739 to 751 (+12 SC-6 cases). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 9 of v5.1.0 humanizer Wave 4. Adds tests/scenario-read-test.mjs
runner, tests/scenario-read-test.test.mjs wrapper, and 5 scenario
fixtures in tests/scenarios/ that feed deterministic raw findings
through humanizeFinding and assert the humanized
title/description/recommendation match brief-owner-approved regex
patterns encoding the ground-truth what/why/whatNext answers.
Corpus selection (per brief criteria):
- 01-tok-cascade.json - TOK/CPS category (token efficiency)
- 02-cps-volatile.json - TOK/CPS category (cache prefix stability)
- 03-cnf-conflict.json - CNF category (conflicts)
- 04-gap-no-claude-md.json - GAP category (feature gap)
- 05-set-invalid-json.json - SET category, AND its v5.0.0 title +
description carry tier1 'invalid' (the brief criterion 'one finding
whose v5.0.0 description uses a forbidden word').
Runner mechanics:
- Loads scenarios matching ^\\d{2}-[a-z0-9-]+\\.json$ in sorted order.
- Calls humanizeFinding(scannerInput) and matches each humanized field
against its declared pattern (case-insensitive regex).
- Verifies humanizer-added structural fields (userImpactCategory,
userActionLanguage, relevanceContext) are non-empty strings.
- Per session decision (1a) acceptance is deterministic regex matching
without a runtime human approval gate.
Wrapper adds 3 tests: scenario-match (binds runner to node --test),
category-coverage (TOK/CPS, CNF, GAP, SET all present), and
tier1-presence (at least one v5.0.0 title or description contains a
tier1 forbidden word).
Tests: 736 to 739 (+3 SC-4 tests). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 8 of v5.1.0 humanizer Wave 4. Adds tests/lint-default-output.mjs
runner and tests/scanners/lint-default-output.test.mjs wrapper that
exercise SC-3 against the 6 prose CLIs (scan-orchestrator, posture,
token-hotspots-cli, plugin-health-scanner, drift-cli, fix-cli) running
in default (humanized) mode against tests/fixtures/marketplace-medium.
Lint scope is stderr only — JSON envelope keys ("scanner", "severity")
are structural, not prose. Humanized prose fields embedded inside JSON
are already covered by tests/lib/humanizer-data.test.mjs tier1/tier3
checks. Code references inside backticks pass the lint
(stripBacktickSpans) so technical identifiers can appear when wrapped.
Default-mode prose fixes to land lint at zero violations:
- scan-orchestrator: top banner switches to "Config-Audit v2.2.0" and
per-scanner progress wraps "[XXX] Label" in backticks. --raw and
--json paths preserve the v5.0.0 verbatim banner via new
opts.humanizedProgress flag on runAllScanners.
- plugin-health-scanner: top banner switches to "Plugin Health v2.1.0"
in default mode; --raw/--json keep "Plugin Health Scanner v2.1.0".
- scoring.mjs generateHealthScorecard humanized branch: area names
(CLAUDE.md, Hooks, MCP, Settings, Rules, Imports, Conflicts, Token
Efficiency, Plugin Hygiene) are wrapped in backticks; dot-padding
compensates so column alignment matches v5.0.0 layout.
- posture / drift-cli / fix-cli: thread humanizedProgress flag through
their runAllScanners calls so default mode emits humanized progress
and --raw/--json preserve the v5.0.0 stderr snapshot.
Test infrastructure only — user-facing docs land in Wave 5/6 once
commands and agents consume the humanized payload.
Tests: 735 to 736 (+1 SC-3 wrapper). Full suite passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds --raw flag to all 6 remaining CLIs and wires humanization into the
default rendering path. --json and --raw both bypass humanization for
v5.0.0 byte-equal output; default mode humanizes findings/diff/prose.
token-hotspots-cli: humanizes payload.findings before stdout JSON write.
plugin-health-scanner: humanizes finding titles in stderr brief summary;
--json/--raw write byte-identical v5.0.0-shape result to stdout.
drift-cli: humanizes diff.{newFindings,resolvedFindings,unchangedFindings,
movedFindings} before formatDiffReport; --raw applies to save and list
modes too. Baselines remain raw v5.0.0 on disk.
fix-cli: humanizes manual-finding titles in stderr fix-plan prose; both
--json and --raw produce identical machine-readable JSON to stdout.
manifest, whats-active: --raw is a no-op (no findings, inventory only)
but parsed for CLI surface consistency.
Decision on missing --output-file flag for drift-cli/fix-cli/plugin-health:
deferred. SC-6/SC-7 tests in Wave 4 will use stdout-redirect (the simpler
Alt B path) since these CLIs already write JSON to stdout in machine modes.
Test cli-humanizer.test.mjs covers all 6 CLIs. Three CLIs that read
environment state (plugin-health, manifest, whats-active) verify
mode-equivalence (--json == --raw) instead of frozen-snapshot byte-equal,
because their output reflects current marketplace state which drifts as
plugins are added since the Wave 0 capture.
Wave 3 / Step 7 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
generateHealthScorecard signature: 2-arg → 3-arg (areaScores, opportunityCount,
options = {}). options.humanized=true renders friendlier title, grade-context
line per overall grade, and rephrased opportunity line. options.humanized=false
(or 2-arg call) preserves v5.0.0 verbatim output for backwards-compat.
topActions also gets an optional options.humanized that swaps recommendations
through humanizeFinding lookup.
posture.mjs main():
--json → write JSON to stdout, suppress stderr scorecard
--raw → write JSON to stdout (byte-identical to --json), write v5.0.0
verbatim scorecard to stderr
default → humanized scorecard to stderr, no stdout
posture.test.mjs scorecard-prose assertions re-anchored to --raw mode (the
explicit v5.0.0 path) — Wave 0 audit only covered finding-title strings;
scorecard prose surfaces here for the first time.
Wave 3 / Step 6 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds --json and --raw flags to scan-orchestrator CLI main(). Default mode
runs humanizeEnvelope(env) before serialization; --json and --raw bypass
the humanizer for v5.0.0 byte-equal output (SC-6 / SC-7 paths).
Save-baseline path always writes the raw v5.0.0-shape envelope so future
humanizer-data updates do not trigger false-positive drift findings.
runAllScanners() unchanged — it remains the v5.0.0-shape source of truth
for in-process callers (posture, scoring, drift, etc.).
Wave 3 / Step 5 of v5.1.0 humanizer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Synthetic plan.md fixture with source_findings: block-style YAML list of 3
40-char hex IDs in frontmatter, plus minimal plan structure (Title +
Implementation Plan + 1 Step + Manifest). 3 tests verify:
1. plan-validator accepts a plan with source_findings (additive optional field)
2. frontmatter parser extracts source_findings as array of strings
3. each ID matches the 40-char lowercase hex format from finding-id.mjs
Closes the SC3(b) gap flagged by adversarial review (scope-guardian Gap 2).
LLM-level behavior (planner emitting source_findings) remains non-testable
without live invocation; this covers the structural contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave 2 / Step 4 of v5.1.0 plain-language UX humanizer rollout. Re-anchors
34 title-string assertions across 4 test files so they survive Wave 3's
title/description/recommendation rewriting at the CLI layer.
Anchoring strategy per scanner:
- GAP findings: scanner + category + recommendation substring (humanizer
preserves stable identifiers like CLAUDE.md, .mcp.json, hook in rec).
Hardcoded CA-GAP-NNN IDs for positive checks.
- HKV findings: scanner + evidence regex (evidence preserved verbatim).
- SET findings: scanner + evidence regex (evidence preserved verbatim).
- PLH findings: scanner + hardcoded CA-PLH-NNN IDs (no evidence on most
PLH findings, so ID is the only stable anchor for specific cases;
negative checks use scanner + title-substring spanning raw + humanized).
Per docs/v5.1.0-test-audit.md classification: only (b) WILL BREAK
assertions modified. (a) shape-only assertions (error-message formatting,
pure existence checks) untouched. tests/lib/output.test.mjs and
tests/lib/diff-engine.test.mjs and tests/scanners/fix-engine.test.mjs
unchanged (synthetic test inputs, not scanner output).
Test count unchanged: 689/689 pass. IDs harvested via deterministic
runtime dump per fixture (resetCounter + scan).
3 integration tests using the run-A/run-B fixtures:
- Jaccard(A, B) ≥ 0.70 (SC4 brief threshold)
- IDs match 40-char hex shape (lib/parsers/finding-id.mjs format)
- no duplicate IDs within a single run
Tests the Jaccard PIPELINE; real-LLM determinism deferred to v1.1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
review-run-A.md (5 findings) and review-run-B.md (6 findings, A ⊂ B) form a
known-Jaccard fixture pair: |A ∩ B| = 5, |A ∪ B| = 6, Jaccard = 5/6 = 0.833,
above the SC4 threshold of 0.70. IDs are real 40-char SHA1s computed via
lib/parsers/finding-id.mjs from realistic (file, line, rule_key) triplets.
Both fixtures pass review-validator --strict (frontmatter + body sections +
findings shape). Real-LLM determinism measurement deferred to v1.1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Modify "all four pipeline commands" → "all five" (adds /ultrareview-local).
Add 3 new pins: Handover 6 section in HANDOVER-CONTRACTS.md,
review-validator CLI shim, rule-catalogue 12-key size invariant.
11/11 doc-consistency tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes the iteration loop: review.md → plan via source_findings audit trail.
Adds versioning row, validator-map entry, full Handover 6 section, and
stability summary row mirroring the shape of Handovers 1-5.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave 1 / Step 3 of v5.1.0 plain-language UX humanizer.
scanners/lib/humanizer.mjs exports three pure functions:
- humanizeFinding(f) -> new finding object with translated
title/description/recommendation + three new fields
(userImpactCategory, userActionLanguage, relevanceContext).
- humanizeFindings(findings) -> mapped array.
- humanizeEnvelope(env) -> walks env.scanners[].findings.
Plus computeRelevanceContext(filePath) as a named export for
unit testing.
Field semantics:
- userImpactCategory: from scanner prefix per research/02 line 124
(Configuration mistake / Conflict / Wasted tokens / Dead config /
Missed opportunity / Other).
- userActionLanguage: from severity per research/02 line 134
(Fix this now / Fix soon / Fix when convenient / Optional cleanup
/ FYI).
- relevanceContext: deterministic file-path heuristic — looks for
/tests/fixtures/ or /test/fixtures/ substring (test-fixture-no-impact),
*.local.* basename (affects-this-machine-only), defaults to
affects-everyone. No subprocess, no network.
Lookup order per scanner: static[title] -> patterns regex match ->
_default -> fall through to original strings (when scanner prefix
absent).
Original id, scanner, severity, file, line, evidence, category,
autoFixable, and optional details are preserved exactly. Pure —
verified by deepEqual of input before/after.
Test (32 cases): purity, field preservation across all paths,
known/unknown scanner handling, all 5 severities, all 6 categories,
relevance heuristic for 4 path types, envelope walking, ANSI-free
guarantee. All pass.
Regression: 689/689 tests (657 + 32 new = 54 new across Wave 1).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
Wave 1 / Step 2 of v5.1.0 plain-language UX humanizer.
scanners/lib/humanizer-data.mjs exports TRANSLATIONS keyed by
scanner prefix (CML, SET, HKV, RUL, MCP, IMP, CNF, GAP, TOK, CPS,
DIS, COL, PLH). Each scanner has:
- static: exact-title -> {title, description, recommendation}
- patterns: array of {regex, translation} for template-literal titles
- _default: graceful fallback for unknown findings
Architectural change vs. plan: keys translations by exact scanner
title (not finding ID). Reason: finding IDs are sequence-based
(global counter in lib/output.mjs:34), not stable per finding-type
— two runs can produce different IDs for the same logical issue.
Title strings ARE stable (defined as string literals or template
patterns in the scanner source).
Translations follow research/03 SR-1..SR-17:
- active voice, second person, present tense
- sentences <= 25 words
- tier1 absolute prohibitions and tier3 domain jargon are kept out
of prose
- tier1/tier3 terms are permitted inside `backtick spans` (code
references like filenames and field names) — established
technical-doc convention
Test (12 cases): all 13 scanners covered; every static and pattern
entry has the 3 required fields; tier1 and tier3 forbidden-word
checks pass (with backtick-span exclusion); reference-stable
imports. All pass.
Regression: 657/657 tests (645 + 12 new).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
Wave 1 / Step 1 of v5.1.0 plain-language UX humanizer.
tests/lint-forbidden-words.json defines the SC-3 forbidden-words
vocabulary used by the lint runner (Wave 4 / Step 8) and the
humanizer-data translation guard (Wave 1 / Step 2).
- Tier 1: 19 absolute prohibitions (failure if matched in default
output) — sourced from Microsoft Writing Style Guide, Federal
Plain Language, GOV.UK, Google Developer Style, Apple HIG.
- Tier 2: 24 strong-avoidance terms (warning if matched) — same
sources plus Mailchimp.
- Tier 3: 12 domain-specific jargon terms (failure if matched in
default output, allowed in --raw and --json paths) — sourced
from research/03 jargon table.
Counts diverge from plan.md (18/21/11) — JSON tracks the brief's
verbatim lists at research/03 lines 200-202 plus tier3 hook entry
from the brief's table. Plan revision noted in audit-doc.
Test: 10 cases verifying parse, count, schema completeness, spot
checks per tier, no cross-tier duplicates. All pass.
Regression: 645/645 tests (635 + 10 new).
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
Wave 0 / Step 0 of the v5.1.0 plain-language UX humanizer plan.
Captures v5.0.0 baseline output for all 8 CLIs at
tests/snapshots/v5.0.0/ — these snapshots are immutable references
for SC-6 (--json byte-equal) and SC-7 (--raw byte-equal) tests in
later waves.
- 5 CLIs captured via --output-file: scan-orchestrator, posture,
token-hotspots-cli, manifest, whats-active
- 3 CLIs captured via stdout redirect (no --output-file support):
drift-cli (after baseline seed), fix-cli, plugin-health-scanner
- Posture stderr scorecard captured separately for SC-7 stderr-mode
comparison
docs/v5.1.0-test-audit.md classifies all 42 .title references in
7 known test files: 34 will break under humanization (literal
string equality / substring), 8 are safe (test fixtures or error
formatting). This document is the change list for Step 4.
Project: .claude/projects/2026-05-01-config-audit-ux-redesign/
These were committed in b37b938 by mistake — KTG's convention is that
planning docs in plugins/ultraplan-local/docs/ are local working files
and never pushed to the public marketplace.
- git rm --cached on both files (kept on disk, just untracked)
- .gitignore extended with explicit entries for the two filenames
Existing tracked docs in plugins/ultraplan-local/docs/ predate this rule
and are left alone (separate decision).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two sibling files in plugins/ultraplan-local/docs/ that together
specify a new /ultracontinue command for zero-friction multi-session
resumption — drafted from design dialogue at the end of the config-audit
v5.0.0 release session (5 sessions, ~10 manual NEXT-SESSION-PROMPT
context-handovers — friction this work removes).
ultracontinue-brief.md (159 lines):
- Follows the /ultrabrief-local template (frontmatter brief_version: 2.0)
so /ultraplan-local can consume it directly
- Defines per-project state-file convention .claude/projects/<project>/
.session-state.local.json as the contract; /ultracontinue is read-only,
multiple writers may update
- 10 falsifiable success criteria including cross-project consistency,
no-new-deps, validator + helper command, docs sweep across plugin
README + CLAUDE.md + marketplace root README
- 3 research topics: ultraexecute end-of-session integration depth,
graceful-handoff alignment (no hard dep), Claude Code slash-command
conventions for read+execute commands
- Explicit non-goals: not replacing /ultraexecute-local --resume, not
replacing graceful-handoff, not auto-orchestrating N sessions
- Open questions and assumptions flagged for plan-critic / scope-guardian
ultracontinue-design-notes.md (117 lines):
- Captures the dialogue rationale that shaped the brief, so the
implementing session has full context without needing to read this
conversation's transcript
- Origin (config-audit v5 release pain point), key design insight
("state-fil ER kontrakten, ikke verktøyet"), 6 design decisions with
alternatives considered, anti-patterns from KTG auto-memory to respect,
recommended reading order, expected scope (1-2 execution sessions)
No code changes. Brief is ready for /ultraplan-local --brief
plugins/ultraplan-local/docs/ultracontinue-brief.md (light path) or
/ultraresearch-local for full research path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v5.0.0 SHIPPED 2026-05-01. Tag config-audit/v5.0.0 pushed to Forgejo.
SC-6b release-gate PASS at -0.85% delta (CLAUDE.md actual 589 vs
estimated 594, well within ±5% gate).
Per-step:
- Step 28: README/CLAUDE.md straggler-sweep + self-audit counter alignment
- Step 29: version bump 4.0.0 → 5.0.0 + consolidated CHANGELOG
- Step 30: full audit + live SC-6b gate + tag (incl. one in-step bug fix
for hotspot.path exposure, required to make calibration measurable)
635 tests still green throughout. No blockers carried forward.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>