Kjell Tore Guttormsen 019fe4ed0b docs(linkedin-studio): replace v1 autonomous hardening plan with v2 interactive

v1 (autonomous 5-step + reviewer swarm + /trekreview gate) produced confident
fabrications and burned context for what are usually 0-1 line edits. v2 is a
slow, dialogic, one-command-per-session gate with the operator as truth source
and every mechanical claim tool-grounded. Method, command-class predicates,
session queue, and STATE.md end-of-session handoff are documented in plan.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-31 07:47:17 +02:00

6.9 KiB

Raw Blame History

LinkedIn Studio — Command Hardening Plan v2 (interactive quality gate)

Supersedes the v1 autonomous plan (recoverable in git history: commit 2f90880). Motivated by the 2026-05-31 S2 fabrication incident: the v1 method (autonomous, prose-heavy 5-step per command, per-command Opus reviewer swarm, /trekreview gate) produced confident fabrications (Claude wrote SIMULATE/EVALUATE/HARDEN narratives — char counts, line numbers, before/after — for files it had not actually observed) and burned context without value (full-file re-reads + long logs + reviewer agents for what are usually 0–1-line edits). A large parallel tool-batch also destabilized the tool-result stream (out-of-order / duplicated / truncated returns). Nothing false shipped — the failed edits + empty greps + real reads caught it, and it was reverted — but the method had to change.

v2 is a slow, dialogic, human-in-the-loop gate: ONE command per session, the operator as the truth source, every mechanical claim tool-grounded.

Intent

Harden each of the 29 commands to its own intent — one command per session — with the operator in the loop. Last quality gate before GUI/M0. Brief: brief.md.

The locked interactive method (identical every session)

Per command, 8 steps. Claude STOPS and waits at each "agree" point — no autonomous run-ahead:

Agree intention — show the command's own description + its journey role; agree in one line.
Agree test-method — the class predicates (below) + what the simulation must prove.
Agree persona — who invokes + with what input.
Run the test — tool-grounded mechanical facts (each with file:line) + a GROUNDED persona simulation (a real invocation → concrete output, anchored to the shown file region).
Talk — the operator's judgment is the truth source.
Agree improvements — surgical, or "no change".
Implement — Edit → grep-confirm landed → show the confirmed diff.
Move on — run the end-of-session ritual; the next command is its own session.

Two guards (non-negotiable)

Read-and-show BEFORE simulate — the simulation is anchored to observed text, shown to the operator.
Every mechanical claim from a tool — char count, ✓/✗, line number: shown from a tool before stated.

Anti-drift through dialogue

The per-command dialogue IS the drift-reduction mechanism: Claude proposes at each agree-point (intention / test-method / persona / improvement) and waits for the operator to confirm or correct before proceeding. Frequent grounding checkpoints are what keep Claude from confabulating — never race past an agree-point.

Execution discipline (the S1+S2 incident lessons)

ONE command per session. Never hold multiple command files in context at once.
Never batch dependent tool calls in parallel. Read→Edit→grep→log is strictly sequential. Large parallel batches destabilize the tool-result stream.
Guard greps: grep/grep -c exit non-zero on zero matches → append || echo NONE so a batch sibling isn't torn down.
Never write a HARDEN/after claim before the edit is grep-confirmed landed.
If the tool layer degrades (slow / truncated / empty / replayed returns): PAUSE and resume in a fresh session. Do not push through a degraded window.

Command classes & mechanical predicates (the testmethod per class)

post-emitting — hook 110–140 present · length band present · no-body-link present · buzzword check present · topic→5 pillars present · every subagent_type: linkedin-studio:X resolves to agents/X.md.
routing — every emitted /linkedin:Y resolves to commands/Y.md.
guided/stateful — primary promised artifact actually produced · subagent targets resolve · graceful degradation present.
analytics — graceful degradation present · saves/dwell honesty intact.

Synthetic test data (fixtures) — for data-dependent commands

Some commands cannot be meaningfully tested without input data — analytics (import, report, analyze, audit, ab-test) need a CSV/JSON; stateful commands (calendar, firsthour) need a queue/state. For these, step 2 (agree test-method) includes agreeing a small fictitious dataset that exercises the command's real path:

Isolated + throwaway — built in a temp/scratch location (e.g. /tmp/… or a gitignored path), NEVER the real analytics/state dirs, NEVER committed.
Minimal but realistic — just enough rows/fields to trigger the behavior under test (e.g. a 3–5-row CSV with the real column names; a 2-experiment ab-test file).
Clearly fictitious — obviously-fake values, impossible to mistake for real data.
Discarded after the test — the fixture proves the command works; it is not a deliverable.

Session queue (ONE command each)

S1 ✅ quick · onboarding · first-post · setup — DONE under v1; do NOT re-harden.


S2 post	S9 newsletter*	S16 analyze	S23 profile
S3 react	S10 headless-review	S17 audit	S24 create
S4 multiplatform	S11 pivot	S18 ab-test	S25 measure
S5 carousel	S12 firsthour	S19 strategy	S26 linkedin
S6 video	S13 calendar	S20 competitive
S7 batch	S14 import	S21 monetize
S8 pipeline	S15 report	S22 outreach

*S9 newsletter (16-phase) may split into S9a/S9b. Otherwise one command = one session.

End-of-session ritual (every session — STATE.md handoff baked in)

bash scripts/test-runner.sh → Failed: 0 + counts 29/19/25/6.
Terse, tool-grounded log.md line for the command (ONLY after verify): PASS / edit + file:line.
Commit own file(s) only, LOCALLY (fix: for command edits; explicit git add <path>, never -u/-A). Do NOT push hardening work — see Push policy.
Overwrite STATE.md: mark this command ✅; point at the NEXT single command + its class; carry the method + discipline rules → ready for the next "Les STATE.md og følg instruksjonene".
(Optional) update Voyage .session-state.local.json next-label for /trekcontinue coherence.

Defaults (adjustable between sessions)

Log: keep, terse, tool-grounded, after-verify only.
Persona: ICP author default; fresh-adopter for onboarding/first-post/setup; deviate explicitly elsewhere.
Gate/commit: once per session = per command.

Push policy (operator rule, 2026-05-31)

Hardening work — "det vi GJØR": command edits + log.md entries — is committed locally only, never pushed. Only method / process / tooling improvements (this plan, a future checker, STATE conventions) may be pushed, and only on explicit operator OK ("eventuelle forbedringer").

Amendment policy

Improve the method/plan between sessions if a session surfaces a gap (operator-approved). This plan is a living contract, not frozen.

6.9 KiB Raw Blame History Unescape Escape