ktg-plugin-marketplace/plugins/linkedin-studio/docs/hardening/plan.md
Kjell Tore Guttormsen 019fe4ed0b docs(linkedin-studio): replace v1 autonomous hardening plan with v2 interactive
v1 (autonomous 5-step + reviewer swarm + /trekreview gate) produced confident
fabrications and burned context for what are usually 0-1 line edits. v2 is a
slow, dialogic, one-command-per-session gate with the operator as truth source
and every mechanical claim tool-grounded. Method, command-class predicates,
session queue, and STATE.md end-of-session handoff are documented in plan.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 07:47:17 +02:00

6.9 KiB
Raw Permalink Blame History

LinkedIn Studio — Command Hardening Plan v2 (interactive quality gate)

Supersedes the v1 autonomous plan (recoverable in git history: commit 2f90880). Motivated by the 2026-05-31 S2 fabrication incident: the v1 method (autonomous, prose-heavy 5-step per command, per-command Opus reviewer swarm, /trekreview gate) produced confident fabrications (Claude wrote SIMULATE/EVALUATE/HARDEN narratives — char counts, line numbers, before/after — for files it had not actually observed) and burned context without value (full-file re-reads + long logs + reviewer agents for what are usually 01-line edits). A large parallel tool-batch also destabilized the tool-result stream (out-of-order / duplicated / truncated returns). Nothing false shipped — the failed edits + empty greps + real reads caught it, and it was reverted — but the method had to change.

v2 is a slow, dialogic, human-in-the-loop gate: ONE command per session, the operator as the truth source, every mechanical claim tool-grounded.

Intent

Harden each of the 29 commands to its own intent — one command per session — with the operator in the loop. Last quality gate before GUI/M0. Brief: brief.md.

The locked interactive method (identical every session)

Per command, 8 steps. Claude STOPS and waits at each "agree" point — no autonomous run-ahead:

  1. Agree intention — show the command's own description + its journey role; agree in one line.
  2. Agree test-method — the class predicates (below) + what the simulation must prove.
  3. Agree persona — who invokes + with what input.
  4. Run the test — tool-grounded mechanical facts (each with file:line) + a GROUNDED persona simulation (a real invocation → concrete output, anchored to the shown file region).
  5. Talk — the operator's judgment is the truth source.
  6. Agree improvements — surgical, or "no change".
  7. Implement — Edit → grep-confirm landed → show the confirmed diff.
  8. Move on — run the end-of-session ritual; the next command is its own session.

Two guards (non-negotiable)

  • Read-and-show BEFORE simulate — the simulation is anchored to observed text, shown to the operator.
  • Every mechanical claim from a tool — char count, ✓/✗, line number: shown from a tool before stated.

Anti-drift through dialogue

The per-command dialogue IS the drift-reduction mechanism: Claude proposes at each agree-point (intention / test-method / persona / improvement) and waits for the operator to confirm or correct before proceeding. Frequent grounding checkpoints are what keep Claude from confabulating — never race past an agree-point.

Execution discipline (the S1+S2 incident lessons)

  • ONE command per session. Never hold multiple command files in context at once.
  • Never batch dependent tool calls in parallel. Read→Edit→grep→log is strictly sequential. Large parallel batches destabilize the tool-result stream.
  • Guard greps: grep/grep -c exit non-zero on zero matches → append || echo NONE so a batch sibling isn't torn down.
  • Never write a HARDEN/after claim before the edit is grep-confirmed landed.
  • If the tool layer degrades (slow / truncated / empty / replayed returns): PAUSE and resume in a fresh session. Do not push through a degraded window.

Command classes & mechanical predicates (the testmethod per class)

  • post-emitting — hook 110140 present · length band present · no-body-link present · buzzword check present · topic→5 pillars present · every subagent_type: linkedin-studio:X resolves to agents/X.md.
  • routing — every emitted /linkedin:Y resolves to commands/Y.md.
  • guided/stateful — primary promised artifact actually produced · subagent targets resolve · graceful degradation present.
  • analytics — graceful degradation present · saves/dwell honesty intact.

Synthetic test data (fixtures) — for data-dependent commands

Some commands cannot be meaningfully tested without input data — analytics (import, report, analyze, audit, ab-test) need a CSV/JSON; stateful commands (calendar, firsthour) need a queue/state. For these, step 2 (agree test-method) includes agreeing a small fictitious dataset that exercises the command's real path:

  • Isolated + throwaway — built in a temp/scratch location (e.g. /tmp/… or a gitignored path), NEVER the real analytics/state dirs, NEVER committed.
  • Minimal but realistic — just enough rows/fields to trigger the behavior under test (e.g. a 35-row CSV with the real column names; a 2-experiment ab-test file).
  • Clearly fictitious — obviously-fake values, impossible to mistake for real data.
  • Discarded after the test — the fixture proves the command works; it is not a deliverable.

Session queue (ONE command each)

S1 ✅ quick · onboarding · first-post · setup — DONE under v1; do NOT re-harden.

S2 post S9 newsletter* S16 analyze S23 profile
S3 react S10 headless-review S17 audit S24 create
S4 multiplatform S11 pivot S18 ab-test S25 measure
S5 carousel S12 firsthour S19 strategy S26 linkedin
S6 video S13 calendar S20 competitive
S7 batch S14 import S21 monetize
S8 pipeline S15 report S22 outreach

*S9 newsletter (16-phase) may split into S9a/S9b. Otherwise one command = one session.

End-of-session ritual (every session — STATE.md handoff baked in)

  1. bash scripts/test-runner.shFailed: 0 + counts 29/19/25/6.
  2. Terse, tool-grounded log.md line for the command (ONLY after verify): PASS / edit + file:line.
  3. Commit own file(s) only, LOCALLY (fix: for command edits; explicit git add <path>, never -u/-A). Do NOT push hardening work — see Push policy.
  4. Overwrite STATE.md: mark this command ; point at the NEXT single command + its class; carry the method + discipline rules → ready for the next "Les STATE.md og følg instruksjonene".
  5. (Optional) update Voyage .session-state.local.json next-label for /trekcontinue coherence.

Defaults (adjustable between sessions)

  1. Log: keep, terse, tool-grounded, after-verify only.
  2. Persona: ICP author default; fresh-adopter for onboarding/first-post/setup; deviate explicitly elsewhere.
  3. Gate/commit: once per session = per command.

Push policy (operator rule, 2026-05-31)

Hardening work — "det vi GJØR": command edits + log.md entries — is committed locally only, never pushed. Only method / process / tooling improvements (this plan, a future checker, STATE conventions) may be pushed, and only on explicit operator OK ("eventuelle forbedringer").

Amendment policy

Improve the method/plan between sessions if a session surfaces a gap (operator-approved). This plan is a living contract, not frozen.