Kjell Tore Guttormsen 2f90880f7a docs(linkedin-studio): hardening-phase foundation (brief + adversarially-reviewed plan)

New phase after the baseline-audit remediation (S1-S17, 2633d32, complete):
a command-hardening pass that simulates each of the 29 commands and tightens
it to its stated intention (intention-fidelity + prompt-quality only — no
structural redesign, no new features, no GUI/M0). Runs over ~8 journey-grouped
Voyage sessions, gated by lint + /trekreview ALLOW before push.

Foundation laid this session (execution starts next session):
- docs/hardening/brief.md (valid; 3 locked forks: hybrid simulation,
  intention-fidelity+prompt-quality, per-journey cadence; research skipped —
  the 2026 bar is frozen in algorithm-signals-reference.md)
- docs/hardening/plan.md (8 sessions, Grade B+ 87)

Adversarial review: brief-reviewer PROCEED_WITH_RISKS (3 findings folded:
self-graded quality axis, SC-H deferral contradiction, no stopping rule);
plan-critic REVISE (2 blockers + 10 major — all addressed: anchored coverage
predicate kills substring false-positives, full-session cold-reviewer oracle
on concrete before/after output, per-type mechanical predicate, Failed:0 gate
not literal-74, S1 HALT circuit-breaker); scope-guardian ALIGNED.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-31 05:31:52 +02:00


Field	Value
type	trekbrief
brief_version	2.1
created	2026-05-31
task	Command hardening pass — simulate every linkedin-studio command and harden it to its stated intention (intention-fidelity + prompt-quality), one command at a time, across ~8 journey-grouped sessions, before the GUI/M0 track
slug	hardening
project_dir	docs/hardening/
research_topics	
research_status	skipped
auto_research	false
interview_turns	
source	interview
phase_signals	(see table below)

phase	effort	model
plan	high	opus
execute	high	opus
review	high	opus

Task: linkedin-studio command hardening

Generated 2026-05-31 (operator-driven, one clarification turn — three forks locked via AskUserQuestion). This brief is the contract between requirements and planning. /trekplan reads it to produce the multi-session plan. Every decision in the plan must trace back to content here.

Predecessor of record: the baseline-audit remediation (docs/remediation/) is complete (S1–S17, last commit 2633d32, clean ALLOW). That phase fixed structure, correctness, and honesty. This phase is the next, distinct layer: does each command actually deliver what it promises when run?

Intent

The remediation made every claim honest, wired every orphan agent, rebuilt the lint, and reconciled the algorithm bar. What it did not do is exercise each command's workflow end-to-end and judge the quality of what it produces against the command's own stated intention. A command can be structurally correct (right frontmatter, right agent wired, lint-green) and still under-deliver: a step that under-determines the next move, a question that yields a weak answer, a prompt that produces generic output, a missing graceful-degradation path.

This is a hardening phase: a deliberate, per-command pass that simulates a realistic invocation, judges the result against intention + the 2026 algorithm bar + the content-quality rules, and tightens the command definition where it falls short. It is the last quality gate before the GUI/M0 track (the two briefs docs/linkedin-studio-ui-brief.md + docs/linkedin-studio-persona-brief.md) — hardening the engine before building a dashboard on top of it.

Division of labour (explicit): the operator tests the commands live over the coming weeks (real inputs, real friction). This phase is the complementary intention-fidelity pass from the definition side. The two converge through a field-notes inbox (below): the operator's real-world findings outrank the simulated ones and steer per-session priority.

Goal

For every one of the 29 command surfaces, produce: (1) a recorded intention, (2) a simulated run with a concrete persona, (3) an evaluation against four axes, (4) a hardened command definition where gaps were found, (5) a green verification. Leave the command set structurally identical (29/19/25/6) and the plugin in a state where each command reliably delivers its description's promise.

Non-Goals

No structural redesign. No command merged, split, added, or removed; no new command; no journey re-tiering. 14a already proved zero redundancy; the surface count (29) and journey layer (v4.1.0) are fixed. *(Locked fork: "intention-fidelity + prompt-quality", not "structural".)*
No new features / capability accretion. Hardening tightens existing intention; it does not add surfaces the audit/remediation deliberately scoped out (no auto-publish, no dwell measurement, no analytics-API integration, no new agents).
No GUI/M0 work. The dashboard, the per-user data-dir migration (M0), and the provider register are the next track and require their own mandate + brief. This phase only hardens the command engine they will sit on.
No re-litigation of the algorithm bar. references/algorithm-signals-reference.md is the single source of truth (settled in remediation). Hardening applies it; it does not re-research or re-open it. No /trekresearch is run.
No version/count churn per session. Prompt/spec refinement is not a surface change; sessions do not bump the version or touch counts (S11–S16 refinement precedent). A single optional minor bump (v4.2.0 "command hardening pass") may be taken at phase end.

Constraints

Opus on everything (sub-agents + orchestration). Max-discipline default.
No hidden costs. /trekplan, /trekcontinue, /trekreview and any Workflow fan-out are cost-warned in plain text before running; the operator's standing yes (2026-05-30) covers the routine /trekcontinue·/trekreview gate — run, don't block.
Gate before push (no WARN-override): per session, bash scripts/test-runner.sh exit 0 + node --test green where touched + /trekreview ALLOW → commit (own files only, explicit staging, never git add -u/-A) → push to Forgejo. In-session fix of the session's own misses = completion; genuine pre-existing design findings → next session.
Three-doc rule applies only if a hardening edit changes a feature/surface/count/version (most will not; they refine prompt text). Pure prompt-quality edits are fix:/refactor: and do not trigger the feat-gate.
Platform: bash 3.2, Node-only hooks, no npm deps in hooks/scanners.
Untracked NOT-mine (never commit): docs/linkedin-studio-persona-brief.md, docs/linkedin-studio-ui-brief.md, docs/voyage-build/progress.json. *.local.md + .session-state.local.json + review.html + STATE.md are gitignored.
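
The gate order in the constraints above can be sketched as a small shell function. This is a minimal sketch, not the plugin's actual tooling: `run_gate` and its argument convention are hypothetical, and the /trekreview verdict is assumed to be passed in by hand because the brief does not say where it is recorded. It sticks to constructs that work in bash 3.2.

```shell
#!/usr/bin/env bash
# Hypothetical pre-push gate sketch; the function name and arguments are illustrative.

# run_gate VERDICT CHECK...
#   VERDICT  the /trekreview result, supplied by the operator (assumption)
#   CHECK    one shell command per argument, run in the order given
run_gate() {
  local verdict=$1 check
  shift
  for check in "$@"; do
    # Stop at the first failing check so later steps never run on a red state.
    if ! bash -c "$check"; then
      echo "gate: FAILED: $check" >&2
      return 1
    fi
  done
  # No WARN-override: anything other than ALLOW blocks the commit and push.
  if [ "$verdict" != "ALLOW" ]; then
    echo "gate: review verdict is '$verdict', not ALLOW" >&2
    return 1
  fi
  echo "gate: OK"
}
```

A session could then call `run_gate "$VERDICT" "bash scripts/test-runner.sh" "node --test"` and only stage its own files explicitly if the function returns 0.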

Preferences

Simulation = hybrid (locked fork). Per command: role-play a realistic invocation (concrete persona + scenario, mock answers), produce/sketch the actual output the command would generate, and cold-read the spec against intention. Harden from both signals — catches "spec is broken" and "output is mediocre".
Personas: drawn from the plugin's real ICP and state — a solo Norwegian/English AI-advisor creator (the author's own profile: ~1048 followers, "Validation" phase, 5 expertise pillars), plus at least one fresh adopter (no voice samples, no analytics yet) to exercise progressive-onboarding + graceful-degradation paths. The persona-brief may be consulted (read-only) for richer personas; it is optional, not a hard dependency.
Session granularity (locked fork): one journey per session where small (Start/Engage/Measure/Grow); split Create (8 commands incl. the 16-phase newsletter) across 2–3 sessions. ~8 sessions total.
Field-notes inbox: before hardening a journey, check docs/hardening/field-notes.local.md for the operator's live-test findings on those commands and let them steer priority; the operator's real friction outranks the simulated friction.
Per-command log: record intention → simulation (inputs + output sketch + friction) → evaluation (4 axes) → harden diff → verify, in docs/hardening/log.md, so the operator's parallel live-testing can cross-reference every change and its why.
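
The per-command log entry described above could start from a fixed skeleton. The sketch below is an assumption about layout only: `log_entry_skeleton`, the `##` heading, and the field labels are hypothetical, chosen to mirror the intention → simulation → evaluation → harden diff → verify sequence; the brief does not prescribe a format for docs/hardening/log.md.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an empty log entry for one command.
# The labels follow the five steps named in the brief; the layout is an assumption.
log_entry_skeleton() {
  local command_name=$1
  printf '%s\n' \
    "## $command_name" \
    "Intention: " \
    "Simulation (persona, inputs, output sketch, friction): " \
    "Evaluation:" \
    "  (a) intention fidelity: " \
    "  (b) algorithm bar: " \
    "  (c) content-quality rules: " \
    "  (d) agent-wiring + graceful degradation: " \
    "Harden diff: " \
    "Verify: "
}
```

For example, `log_entry_skeleton quick >> docs/hardening/log.md` would append an empty entry for the S1 calibration command.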

Non-Functional Requirements

No regression. After each session: lint exits 0 with unchanged counts (Commands 29 · Agents 19 · Reference docs 25 · Skills 6); hooks node --test 98/98 and analytics 116/116 stay green if those surfaces are touched; cross-references (router ↔ commands ↔ agents ↔ skills) stay consistent.
Determinism. Every success criterion is falsifiable by a command or a recorded observation; the hardening log is the audit trail.
Independence. The per-session /trekreview runs cold (independent reviewers), same as the remediation gate.

Success Criteria

Falsifiable per command and per phase.

Per command (the hardening unit):

SC-A (intention recorded): docs/hardening/log.md has, for the command, a one-paragraph intention distilled from its frontmatter description + journey role + the content-quality rules + the algorithm signals it must honor.
SC-B (simulated): a concrete persona + scenario, the mock inputs, and the produced/sketched output are recorded, with a friction log of every ambiguity, dead-end, or under-determined step encountered walking the workflow as written.
SC-C (evaluated on 4 axes): the simulated output + spec carry an explicit pass/gap verdict against (a) intention fidelity (does it deliver the description's promise?), (b) algorithm bar (algorithm-signals-reference.md), (c) content-quality rules (hook 110-140, length bands, no body links, no buzzwords, topic-relevance, topic-rotation), (d) agent-wiring + graceful degradation (right subagent_type; sensible fallback when an agent/tool/mcp/CLI is unavailable).
SC-D (hardened or deferred): every gap is either fixed (diff recorded in the log) or explicitly deferred with a rationale — no gap left unrecorded.
SC-E (verified): after hardening, test-runner.sh exits 0 with unchanged counts, node --test is green where touched, and a re-read/re-simulation confirms the previously-failing axis now passes.
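
SC-C and SC-D together imply a mechanical check: every axis marked as a gap needs a recorded resolution. A minimal sketch follows, assuming a hypothetical one-line-per-axis input of the form `AXIS VERDICT [RESOLUTION]`; neither that format nor the name `unresolved_gaps` comes from the brief.

```shell
#!/usr/bin/env bash
# Hypothetical SC-D check. Reads lines "AXIS VERDICT [RESOLUTION]" on stdin and
# prints every axis whose verdict is "gap" without a "fixed" or "deferred" resolution.
# It does not check that a deferral carries a rationale; that stays a manual review step.
unresolved_gaps() {
  local axis verdict resolution
  while read -r axis verdict resolution; do
    # Skip blank lines.
    if [ -z "$axis" ]; then
      continue
    fi
    if [ "$verdict" = "gap" ]; then
      case "$resolution" in
        fixed|deferred) ;;        # gap is accounted for
        *) echo "$axis" ;;        # gap left unrecorded
      esac
    fi
  done
  return 0
}
```

An empty result means no gap was left unrecorded for that command; any printed axis name points at an SC-D violation.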

Per phase:

SC-F (coverage): all 29 command surfaces (27 atomic + the 2 front-doors create/measure) have a complete log.md entry through SC-A…SC-E.
SC-G (no structural drift): ls commands/*.md | wc -l == 29 throughout; no merge/split/add/remove; the lint's count/version guards stay green; grep for any stale count returns 0.
SC-H (clean gate): every session was pushed on /trekreview ALLOW (no WARN-override); the per-session review.md shows 0 open findings.
SC-I (operator findings honored): where field-notes.local.md carries a live finding for a session's commands, it is addressed or explicitly triaged in that session's log.
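
The SC-G file-count check can be made into a repeatable command. In the sketch below only the `commands/*.md` glob and the expected count of 29 come from the brief; `count_command_files` and `check_surface_count` are hypothetical names, and the lint's own count/version guards remain the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical SC-G drift check (bash 3.2 compatible).

# Print the number of *.md files directly inside DIR.
count_command_files() {
  local dir=$1 n=0 f
  for f in "$dir"/*.md; do
    # An unmatched glob stays literal, so confirm the path exists before counting.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Return non-zero when the count in DIR differs from EXPECTED.
check_surface_count() {
  local dir=$1 expected=$2 actual
  actual=$(count_command_files "$dir")
  if [ "$actual" -ne "$expected" ]; then
    echo "SC-G drift: expected $expected command files, found $actual" >&2
    return 1
  fi
}
```

Run from the plugin root as `check_surface_count commands 29`; a non-zero exit signals a merge, split, add, or remove that this phase rules out.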

Hardening method (the per-command contract `/trekplan` must encode)

1. INTENT     distil the command's promise (description + journey role + quality
              rules + algorithm signals it must honor) → one paragraph
2. SIMULATE   pick a realistic persona + scenario; walk the workflow exactly as
              written; answer its questions; produce/sketch the real output; log
              every friction / ambiguity / dead-end / under-determined step
3. EVALUATE   judge output + spec on 4 axes: intention · algorithm bar · quality
              rules · agent-wiring + graceful degradation → pass/gap per axis
4. HARDEN     edit the command .md (surgical; agent/reference only if the intention
              requires it): fix dead-ends/contradictions/wrong-wiring + strengthen
              prompts/questions/steps so output hits the bar. NO structural redesign.
5. VERIFY     lint exit 0 + counts unchanged + node --test green where touched +
              re-simulation passes the failing axis; record before/after in log.md

Proposed session decomposition (to be confirmed/refined by `/trekplan`)

Session	Scope	Notes
S1	Method calibration on one command (`quick` — simplest, highest-volume) → show output + harden diff → lock the method. Then Start: `onboarding`, `first-post`, `setup`	de-risks the method before scaling
S2	Create I (atomic short-form): `post`, `react` (+ `quick` if not finished in S1)
S3	Create II (visual/video): `carousel`, `video`, `multiplatform`	mcp-image / aspect-ratio / SRT graceful degradation in focus
S4	Create III: `batch`, `newsletter`	newsletter = orchestration only (16-phase; gate-agents already reviewed in remediation); may spill into its own session
S5	Engage: `firsthour`, `calendar`, `pipeline`	state-mutation + publish + first-hour plan paths
S6	Measure: `import`, `report`, `analyze`, `audit`, `ab-test`	analytics CLI graceful degradation + directional A/B framing
S7	Grow: `strategy`, `competitive`, `monetize`, `outreach`, `profile`	~1K soft-gating + tracked-pipeline + SEO paths
S8	Front-doors + router: `create`, `measure`, `linkedin`	delegation/routing surfaces — verify they route correctly to the now-hardened commands

Research Plan

None. /trekresearch is deliberately skipped. The external 2026 bar (algorithm signals, analytics/publish boundaries, coverage-gap specs) was fully researched and triangulated in the remediation phase and is frozen in references/algorithm-signals-reference.md (the single source of truth) + docs/remediation/research/01-03. Hardening is an internal intention-fidelity exercise: it applies that established bar, it does not extend it. If a simulation surfaces a genuinely new external question, it is logged as an Open Question for the operator — not researched mid-phase.

Open Questions / Assumptions

[ASSUMPTION] /trekplan orders work as the 8 sessions above, one (or part of a) journey per Voyage session, with /trekreview as the per-session release gate.
[ASSUMPTION] quick is the S1 calibration command (operator did not object; overridable next session).
[ASSUMPTION] No /trekresearch (operator proposed-and-agreed to skip — the bar is settled).
[ASSUMPTION] Version stays v4.1.0 through the phase; an optional v4.2.0 "command hardening pass" minor bump is a phase-end decision, not pre-committed.
[OPEN] Whether newsletter (16-phase) needs its own dedicated session — decide during S4 once its orchestration simulation is scoped.
[OPEN] Whether the operator wants the simulated outputs themselves preserved (full text) in the log, or only the friction/verdict summary (token cost vs. reviewability trade-off) — default: summary + output sketch, full output only when it's the evidence for a gap.

Prior Attempts

The baseline-audit remediation (docs/remediation/, S1–S17, complete 2026-05-31, commit 2633d32) is the immediate predecessor. It closed every still-real audit finding (correctness, honesty, orphan-agent wiring, dead lint, generalization) and its S17 triage confirmed 0 still-real findings remain. No command has yet been exercised end-to-end for output quality against intention — that gap is exactly what this phase fills. The plugin is at v4.1.0, 29 commands / 19 agents, stable.

Metadata

Created: 2026-05-31
Interview turns: 1 (three forks locked: simulation depth = hybrid; change appetite = intention-fidelity + prompt-quality; cadence = per-journey, large groups split)
Auto-research opted in: no (bar already settled in remediation)
Source: operator-driven foundation session; execution begins next session

How to continue

# Plan (this session — lays the foundation):
/trekplan --project docs/hardening/

# Then, from next session onward, one journey-group per session:
/trekcontinue --project docs/hardening/     # S1: method calibration (quick) + Start
# … gate each: test-runner.sh + node --test + /trekreview ALLOW → commit (own files) → push

Driven per the operating model: the operator types «Les STATE.md og følg instruksjonene.» (Norwegian for "Read STATE.md and follow the instructions."); Claude invokes the /trek* skills itself. STATE.md points at the next session; this brief is the contract /trekplan consumes now.
