ktg-plugin-marketplace/plugins/linkedin-studio/docs/hardening/plan.md
Kjell Tore Guttormsen 019fe4ed0b docs(linkedin-studio): replace v1 autonomous hardening plan with v2 interactive
v1 (autonomous 5-step + reviewer swarm + /trekreview gate) produced confident
fabrications and burned context for what are usually 0-1 line edits. v2 is a
slow, dialogic, one-command-per-session gate with the operator as truth source
and every mechanical claim tool-grounded. Method, command-class predicates,
session queue, and STATE.md end-of-session handoff are documented in plan.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 07:47:17 +02:00

123 lines
6.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# LinkedIn Studio — Command Hardening Plan v2 (interactive quality gate)
> **Supersedes the v1 autonomous plan** (recoverable in git history: commit `2f90880`).
> Motivated by the 2026-05-31 S2 fabrication incident: the v1 method (autonomous, prose-heavy
> 5-step per command, per-command Opus reviewer swarm, `/trekreview` gate) produced **confident
> fabrications** (Claude wrote SIMULATE/EVALUATE/HARDEN narratives — char counts, line numbers,
> before/after — for files it had not actually observed) and **burned context without value**
> (full-file re-reads + long logs + reviewer agents for what are usually 01-line edits). A large
> parallel tool-batch also destabilized the tool-result stream (out-of-order / duplicated /
> truncated returns). Nothing false shipped — the failed edits + empty greps + real reads caught
> it, and it was reverted — but the method had to change.
>
> **v2 is a slow, dialogic, human-in-the-loop gate: ONE command per session, the operator as the
> truth source, every mechanical claim tool-grounded.**
## Intent
Harden each of the 29 commands to its own intent — **one command per session** — with the
operator in the loop. Last quality gate before GUI/M0. Brief: `brief.md`.
## The locked interactive method (identical every session)
Per command, 8 steps. **Claude STOPS and waits at each "agree" point — no autonomous run-ahead:**
1. **Agree intention** — show the command's own description + its journey role; agree in one line.
2. **Agree test-method** — the class predicates (below) + what the simulation must prove.
3. **Agree persona** — who invokes + with what input.
4. **Run the test** — tool-grounded mechanical facts (each with `file:line`) + a GROUNDED persona
simulation (a real invocation → concrete output, anchored to the shown file region).
5. **Talk** — the operator's judgment is the truth source.
6. **Agree improvements** — surgical, or "no change".
7. **Implement** — Edit → grep-confirm landed → show the confirmed diff.
8. **Move on** — run the end-of-session ritual; the next command is its own session.
### Two guards (non-negotiable)
- **Read-and-show BEFORE simulate** — the simulation is anchored to observed text, shown to the operator.
- **Every mechanical claim from a tool** — char count, ✓/✗, line number: shown from a tool before stated.
### Anti-drift through dialogue
The per-command dialogue IS the drift-reduction mechanism: Claude **proposes** at each agree-point
(intention / test-method / persona / improvement) and **waits** for the operator to confirm or
correct before proceeding. Frequent grounding checkpoints are what keep Claude from confabulating —
never race past an agree-point.
### Execution discipline (the S1+S2 incident lessons)
- **ONE command per session.** Never hold multiple command files in context at once.
- **Never batch dependent tool calls in parallel.** Read→Edit→grep→log is strictly sequential.
Large parallel batches destabilize the tool-result stream.
- **Guard greps:** `grep`/`grep -c` exit non-zero on zero matches → append `|| echo NONE` so a
batch sibling isn't torn down.
- **Never write a HARDEN/after claim before the edit is grep-confirmed landed.**
- **If the tool layer degrades** (slow / truncated / empty / replayed returns): PAUSE and resume
in a fresh session. Do not push through a degraded window.
## Command classes & mechanical predicates (the testmethod per class)
- **post-emitting** — hook 110140 present · length band present · no-body-link present ·
buzzword check present · topic→5 pillars present · every `subagent_type: linkedin-studio:X`
resolves to `agents/X.md`.
- **routing** — every emitted `/linkedin:Y` resolves to `commands/Y.md`.
- **guided/stateful** — primary promised artifact actually produced · subagent targets resolve ·
graceful degradation present.
- **analytics** — graceful degradation present · saves/dwell honesty intact.
## Synthetic test data (fixtures) — for data-dependent commands
Some commands cannot be meaningfully tested without input data — analytics (`import`, `report`,
`analyze`, `audit`, `ab-test`) need a CSV/JSON; stateful commands (`calendar`, `firsthour`) need a
queue/state. For these, **step 2 (agree test-method) includes agreeing a small fictitious dataset**
that exercises the command's real path:
- **Isolated + throwaway** — built in a temp/scratch location (e.g. `/tmp/…` or a gitignored path),
**NEVER** the real analytics/state dirs, **NEVER** committed.
- **Minimal but realistic** — just enough rows/fields to trigger the behavior under test (e.g. a
35-row CSV with the real column names; a 2-experiment ab-test file).
- **Clearly fictitious** — obviously-fake values, impossible to mistake for real data.
- **Discarded after the test** — the fixture proves the command works; it is not a deliverable.
## Session queue (ONE command each)
`S1 ✅` quick · onboarding · first-post · setup — **DONE under v1; do NOT re-harden.**
| | | | |
|---|---|---|---|
| S2 post | S9 newsletter* | S16 analyze | S23 profile |
| S3 react | S10 headless-review | S17 audit | S24 create |
| S4 multiplatform | S11 pivot | S18 ab-test | S25 measure |
| S5 carousel | S12 firsthour | S19 strategy | S26 linkedin |
| S6 video | S13 calendar | S20 competitive | |
| S7 batch | S14 import | S21 monetize | |
| S8 pipeline | S15 report | S22 outreach | |
*S9 newsletter (16-phase) may split into S9a/S9b. Otherwise one command = one session.
## End-of-session ritual (every session — STATE.md handoff baked in)
1. `bash scripts/test-runner.sh``Failed: 0` + counts 29/19/25/6.
2. Terse, tool-grounded `log.md` line for the command (ONLY after verify): `PASS` / edit + `file:line`.
3. Commit own file(s) only, **LOCALLY** (`fix:` for command edits; explicit `git add <path>`, never `-u`/`-A`). **Do NOT push hardening work** — see Push policy.
4. **Overwrite STATE.md:** mark this command ✅; point at the NEXT single command + its class;
carry the method + discipline rules → ready for the next "Les STATE.md og følg instruksjonene".
5. (Optional) update Voyage `.session-state.local.json` next-label for `/trekcontinue` coherence.
## Defaults (adjustable between sessions)
1. **Log:** keep, terse, tool-grounded, after-verify only.
2. **Persona:** ICP author default; fresh-adopter for onboarding/first-post/setup; deviate explicitly elsewhere.
3. **Gate/commit:** once per session = per command.
## Push policy (operator rule, 2026-05-31)
Hardening work — *"det vi GJØR"*: command edits + `log.md` entries — is committed **locally
only, never pushed**. Only **method / process / tooling improvements** (this plan, a future
checker, STATE conventions) may be pushed, and only on **explicit operator OK**
(*"eventuelle forbedringer"*).
## Amendment policy
Improve the method/plan **between sessions** if a session surfaces a gap (operator-approved).
This plan is a living contract, not frozen.