v1 (autonomous 5-step + reviewer swarm + /trekreview gate) produced confident fabrications and burned context for what are usually 0-1 line edits. v2 is a slow, dialogic, one-command-per-session gate with the operator as truth source and every mechanical claim tool-grounded. Method, command-class predicates, session queue, and STATE.md end-of-session handoff are documented in plan.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
123 lines
6.9 KiB
Markdown
123 lines
6.9 KiB
Markdown
# LinkedIn Studio — Command Hardening Plan v2 (interactive quality gate)
|
||
|
||
> **Supersedes the v1 autonomous plan** (recoverable in git history: commit `2f90880`).
|
||
> Motivated by the 2026-05-31 S2 fabrication incident: the v1 method (autonomous, prose-heavy
|
||
> 5-step per command, per-command Opus reviewer swarm, `/trekreview` gate) produced **confident
|
||
> fabrications** (Claude wrote SIMULATE/EVALUATE/HARDEN narratives — char counts, line numbers,
|
||
> before/after — for files it had not actually observed) and **burned context without value**
|
||
> (full-file re-reads + long logs + reviewer agents for what are usually 0–1-line edits). A large
|
||
> parallel tool-batch also destabilized the tool-result stream (out-of-order / duplicated /
|
||
> truncated returns). Nothing false shipped — the failed edits + empty greps + real reads caught
|
||
> it, and it was reverted — but the method had to change.
|
||
>
|
||
> **v2 is a slow, dialogic, human-in-the-loop gate: ONE command per session, the operator as the
|
||
> truth source, every mechanical claim tool-grounded.**
|
||
|
||
## Intent
|
||
|
||
Harden each of the 29 commands to its own intent — **one command per session** — with the
|
||
operator in the loop. Last quality gate before GUI/M0. Brief: `brief.md`.
|
||
|
||
## The locked interactive method (identical every session)
|
||
|
||
Per command, 8 steps. **Claude STOPS and waits at each "agree" point — no autonomous run-ahead:**
|
||
|
||
1. **Agree intention** — show the command's own description + its journey role; agree in one line.
|
||
2. **Agree test-method** — the class predicates (below) + what the simulation must prove.
|
||
3. **Agree persona** — who invokes + with what input.
|
||
4. **Run the test** — tool-grounded mechanical facts (each with `file:line`) + a GROUNDED persona
|
||
simulation (a real invocation → concrete output, anchored to the shown file region).
|
||
5. **Talk** — the operator's judgment is the truth source.
|
||
6. **Agree improvements** — surgical, or "no change".
|
||
7. **Implement** — Edit → grep-confirm landed → show the confirmed diff.
|
||
8. **Move on** — run the end-of-session ritual; the next command is its own session.
|
||
|
||
### Two guards (non-negotiable)
|
||
|
||
- **Read-and-show BEFORE simulate** — the simulation is anchored to observed text, shown to the operator.
|
||
- **Every mechanical claim from a tool** — char count, ✓/✗, line number: shown from a tool before stated.
|
||
|
||
### Anti-drift through dialogue
|
||
|
||
The per-command dialogue IS the drift-reduction mechanism: Claude **proposes** at each agree-point
|
||
(intention / test-method / persona / improvement) and **waits** for the operator to confirm or
|
||
correct before proceeding. Frequent grounding checkpoints are what keep Claude from confabulating —
|
||
never race past an agree-point.
|
||
|
||
### Execution discipline (the S1+S2 incident lessons)
|
||
|
||
- **ONE command per session.** Never hold multiple command files in context at once.
|
||
- **Never batch dependent tool calls in parallel.** Read→Edit→grep→log is strictly sequential.
|
||
Large parallel batches destabilize the tool-result stream.
|
||
- **Guard greps:** `grep`/`grep -c` exit non-zero on zero matches → append `|| echo NONE` so a
|
||
batch sibling isn't torn down.
|
||
- **Never write a HARDEN/after claim before the edit is grep-confirmed landed.**
|
||
- **If the tool layer degrades** (slow / truncated / empty / replayed returns): PAUSE and resume
|
||
in a fresh session. Do not push through a degraded window.
|
||
|
||
## Command classes & mechanical predicates (the testmethod per class)
|
||
|
||
- **post-emitting** — hook 110–140 present · length band present · no-body-link present ·
|
||
buzzword check present · topic→5 pillars present · every `subagent_type: linkedin-studio:X`
|
||
resolves to `agents/X.md`.
|
||
- **routing** — every emitted `/linkedin:Y` resolves to `commands/Y.md`.
|
||
- **guided/stateful** — primary promised artifact actually produced · subagent targets resolve ·
|
||
graceful degradation present.
|
||
- **analytics** — graceful degradation present · saves/dwell honesty intact.
|
||
|
||
## Synthetic test data (fixtures) — for data-dependent commands
|
||
|
||
Some commands cannot be meaningfully tested without input data — analytics (`import`, `report`,
|
||
`analyze`, `audit`, `ab-test`) need a CSV/JSON; stateful commands (`calendar`, `firsthour`) need a
|
||
queue/state. For these, **step 2 (agree test-method) includes agreeing a small fictitious dataset**
|
||
that exercises the command's real path:
|
||
|
||
- **Isolated + throwaway** — built in a temp/scratch location (e.g. `/tmp/…` or a gitignored path),
|
||
**NEVER** the real analytics/state dirs, **NEVER** committed.
|
||
- **Minimal but realistic** — just enough rows/fields to trigger the behavior under test (e.g. a
|
||
3–5-row CSV with the real column names; a 2-experiment ab-test file).
|
||
- **Clearly fictitious** — obviously-fake values, impossible to mistake for real data.
|
||
- **Discarded after the test** — the fixture proves the command works; it is not a deliverable.
|
||
|
||
## Session queue (ONE command each)
|
||
|
||
`S1 ✅` quick · onboarding · first-post · setup — **DONE under v1; do NOT re-harden.**
|
||
|
||
| | | | |
|
||
|---|---|---|---|
|
||
| S2 post | S9 newsletter* | S16 analyze | S23 profile |
|
||
| S3 react | S10 headless-review | S17 audit | S24 create |
|
||
| S4 multiplatform | S11 pivot | S18 ab-test | S25 measure |
|
||
| S5 carousel | S12 firsthour | S19 strategy | S26 linkedin |
|
||
| S6 video | S13 calendar | S20 competitive | |
|
||
| S7 batch | S14 import | S21 monetize | |
|
||
| S8 pipeline | S15 report | S22 outreach | |
|
||
|
||
*S9 newsletter (16-phase) may split into S9a/S9b. Otherwise one command = one session.
|
||
|
||
## End-of-session ritual (every session — STATE.md handoff baked in)
|
||
|
||
1. `bash scripts/test-runner.sh` → `Failed: 0` + counts 29/19/25/6.
|
||
2. Terse, tool-grounded `log.md` line for the command (ONLY after verify): `PASS` / edit + `file:line`.
|
||
3. Commit own file(s) only, **LOCALLY** (`fix:` for command edits; explicit `git add <path>`, never `-u`/`-A`). **Do NOT push hardening work** — see Push policy.
|
||
4. **Overwrite STATE.md:** mark this command ✅; point at the NEXT single command + its class;
|
||
carry the method + discipline rules → ready for the next "Les STATE.md og følg instruksjonene".
|
||
5. (Optional) update Voyage `.session-state.local.json` next-label for `/trekcontinue` coherence.
|
||
|
||
## Defaults (adjustable between sessions)
|
||
|
||
1. **Log:** keep, terse, tool-grounded, after-verify only.
|
||
2. **Persona:** ICP author default; fresh-adopter for onboarding/first-post/setup; deviate explicitly elsewhere.
|
||
3. **Gate/commit:** once per session = per command.
|
||
|
||
## Push policy (operator rule, 2026-05-31)
|
||
|
||
Hardening work — *"det vi GJØR"*: command edits + `log.md` entries — is committed **locally
|
||
only, never pushed**. Only **method / process / tooling improvements** (this plan, a future
|
||
checker, STATE conventions) may be pushed, and only on **explicit operator OK**
|
||
(*"eventuelle forbedringer"*).
|
||
|
||
## Amendment policy
|
||
|
||
Improve the method/plan **between sessions** if a session surfaces a gap (operator-approved).
|
||
This plan is a living contract, not frozen.
|