ktg-plugin-marketplace/plugins/linkedin-studio/agents/fixtures/content-reviewer-cases.md
Kjell Tore Guttormsen e69ea1f4c9 feat(linkedin-studio): v3.1.0 — Endring 9 adversarial review-pakke + per-artefakt personas
Cold, adversarial review package for the long-form pipeline + configurable
per-edition personas. Motivated by Del 4 (Security Champions pivot): the
in-session editor + persona sweep shared the drafting session's framing-bias,
so the shipped version was never independently re-reviewed.

Headless package (9a/9b):
- New Step 6.5 (headless-review) in /linkedin:newsletter, after the persona
  sweep, before lock — the independence layer the in-session gates can't be.
- New standalone /linkedin:headless-review command (run in a fresh session for
  maximum isolation; reconstructs frozen draft + contract + personas from disk).
- 3 new Opus archetypes, each with a cardinal context-isolation block that
  refuses drafting-session framing as "context pollution":
  - content-reviewer (argument integrity C1–C5, ≤8 flags)
  - language-reviewer (Norwegian language L1–L5, ≤10 flags)
  - fact-reviewer (cold re-verification F1–F4, risk-sort + pivot-risk, WebSearch)
- Deliberate redundancy with fact-checker / editorial-reviewer documented so
  the pairs are never de-duplicated.

Pivot-reopen (9c):
- New /linkedin:pivot command: logs articles.NN.pivots[], resets currentPhase,
  un-locks, marks gates to re-run.
- Pivot-detection gate in Step 8 lock precondition (>20% word-count change or
  >2 new sections re-opens cleared gates). Del 4 v8→v11 worked example.

Per-artifact personas (new requirement):
- articles.NN.personas with resolution order (edition-state → series file →
  plugin library → interactive). One or more readers configurable per edition.

Schema/docs:
- edition-state.template.json: additive personas[], pivots[], headlessReview,
  headless-review phase (16 phases); personaSweep.resonance.wordCount baseline.
- 3 fasit fixtures + 3 structural lint tests (Del 4 worked cases).
- Counts: 24→26 commands, 16→19 agents, 15→16 newsletter phases.
- README + CLAUDE.md (plugin + root) + CHANGELOG synced.

Verification: 35 agent-fixture + 59 hook + 20 render tests green. Backward-
compatible (additive state); reload required before the 3 new agents resolve.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 13:01:24 +02:00


Content-Reviewer Fasit Fixture

The Del 4 production round (Security Champions, Maskinrommet, 2026-05-29) as the gold standard for the content-reviewer agent. Late in the round the draft took a Security Champions pivot: a new ~260-word section introducing the Champions model and a new ~270-word role-description section were added after the in-session gates had already formed their reading. The in-session gates (fact-check Step 5, editorial Step 5.5, persona sweep Step 6) all read the draft through the drafting session's framing — they knew why the pivot happened and what it was meant to argue, so they silently supplied the missing argumentative steps for free. A cold, adversarial reviewer — handed only the frozen page — cannot supply them, and that is exactly the point: the cold read catches the argument holes the framing hid.

The six cases below are the fasit (Norwegian for "answer key"): a correct content-reviewer run on the frozen Del 4 draft should surface comparable flags, mapped to the one axis with consistent severities.

This file is a fasit, not a test harness. The structural lint lives in agents/__tests__/content-reviewer-fixture.test.mjs. Whether the agent's live flags actually reproduce these directions is [GATE]/[OPERATØR] — it is not self-certified here.

The jury judges; the writer writes. Every expected output below is direction, not rewritten copy. A correct agent run hands back flags and a severity, never edited prose. (Foreslått retning, i.e. a suggested direction, not a new sentence.)

Why this gate exists. The in-session gates shared the drafting session's framing-bias: they read the pivot knowing its intent, so they bridged the argument's gaps without noticing the gaps were there. A cold reader — run in an isolated context with no history, no changelog, no "out of scope" list, no pivot narrative — reads the frozen page as a first-time skeptic and finds the argument holes the framing hid. Any such framing that reaches the agent is context pollution: it is named and ignored, never acted on. This is a distinct failure surface from craft (editorial), language (language-reviewer), truth (fact-reviewer), and response (persona) — those gates can all pass while the argument itself does not hold.


The axis (the agent judges on exactly this)

The agent judges on one axis — argument-integritet (argument & logical integrity): does the reasoning hold? It does not judge craft, language, factual truth, or reader response — those are editorial-reviewer, language-reviewer, fact-reviewer, and persona-reviewer respectively. The axis decomposes into exactly five checks:

  • C1 — logiske hull (logical holes): a step in the argument chain is missing; the text jumps from A to C without B, and the reader cannot reconstruct why the conclusion follows.
  • C2 — ubegrunnede antakelser (unsupported assumptions): the argument leans on a premise it never establishes, asserted as self-evident when a thoughtful reader would not simply grant it.
  • C3 — argument-motsigelser (argument-level contradiction): the recommendation, premise, and payoff are not mutually consistent. This is distinct from editorial-reviewer's P4 (a prose-level contradiction between two passages); C3 is a contradiction in the logic of the argument itself.
  • C4 — manglende konkretisering der argumentet trenger det (missing argumentative concretization): a load-bearing claim a skeptic would only believe with a concrete instance stays abstract — not for vividness (editorial A1) but because the argument needs the instance to carry weight.
  • C5 — ubesvart «what about X?» (the unanswered obvious objection): the strongest obvious objection a thoughtful reader raises is never acknowledged or answered — the argument wins only because it never met its best counter.

Severity (every flag carries exactly one)

  • BLOCK — a defect that breaks the argument: an argument-level contradiction (C3), or an unanswered objection (C5) that, once raised, collapses the recommendation.
  • REWORK — a real gap that should be filled, not load-bearing-fatal: a logical hole (C1), an unsupported load-bearing assumption (C2), a claim that needs concretization (C4).
  • NICE — a minor reasoning soft spot worth tightening if cheap.
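
As a rough illustration of the vocabulary above, a flag can be modelled as a small record that rejects anything off-axis. This is only a sketch: the fixture defines the axis, the checks C1 to C5, and the three severities, but the `Flag` class, its field names, and the validation messages are hypothetical and not taken from the agent or the lint test.

```python
from dataclasses import dataclass

# Vocabulary from the fixture; the data shape itself is an assumption.
AXIS = "argument-integritet"
CHECKS = ("C1", "C2", "C3", "C4", "C5")
SEVERITIES = ("BLOCK", "REWORK", "NICE")


@dataclass(frozen=True)
class Flag:
    check: str       # exactly one of C1..C5
    severity: str    # exactly one of BLOCK / REWORK / NICE
    direction: str   # direction for the writer, never rewritten copy
    axis: str = AXIS

    def __post_init__(self):
        # Reject flags that belong to another gate (craft, language, facts, response).
        if self.axis != AXIS:
            raise ValueError(f"off-axis flag: {self.axis!r}")
        if self.check not in CHECKS:
            raise ValueError(f"unknown check: {self.check!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not self.direction.strip():
            raise ValueError("a flag must carry a direction")


# Hypothetical example: an argument-level contradiction flagged as BLOCK.
example = Flag(
    check="C3",
    severity="BLOCK",
    direction="Resolve the premise/recommendation contradiction.",
)
```

A prose-level code such as editorial-reviewer's P4 would raise `ValueError` here, which mirrors the boundary the fixture draws between the gates.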

Sort BLOCK → REWORK → NICE; cap at eight flags (argument defects are coarser than prose nits); if any are suppressed, say how many and of what severity — never silently truncate.
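
The ordering and cap rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: flags are plain dicts with a `severity` key, and the function name `sort_and_cap` and the wording of the suppression note are hypothetical. Only the sort order, the cap of eight, and the requirement to report suppressed flags by count and severity come from the text.

```python
from collections import Counter

SEVERITY_ORDER = {"BLOCK": 0, "REWORK": 1, "NICE": 2}
MAX_FLAGS = 8  # the cap stated in the fixture


def sort_and_cap(flags, cap=MAX_FLAGS):
    """Return (kept, note): flags sorted BLOCK -> REWORK -> NICE and capped,
    plus a note describing what was suppressed (None if nothing was)."""
    # sorted() is stable, so flags of equal severity keep their input order.
    ordered = sorted(flags, key=lambda f: SEVERITY_ORDER[f["severity"]])
    kept, dropped = ordered[:cap], ordered[cap:]
    if not dropped:
        return kept, None
    counts = Counter(f["severity"] for f in dropped)
    parts = [f"{counts[s]} {s}" for s in SEVERITY_ORDER if counts[s]]
    return kept, f"{len(dropped)} flag(s) suppressed: " + ", ".join(parts)


# Hypothetical input: ten flags, two more than the cap allows.
flags = (
    [{"check": "C1", "severity": "NICE"}] * 4
    + [{"check": "C2", "severity": "REWORK"}] * 4
    + [{"check": "C5", "severity": "BLOCK"}] * 2
)
kept, note = sort_and_cap(flags)
print(len(kept), note)
```

Because the list is sorted before it is cut, the cap can only ever drop the lowest-severity flags, and the note keeps the truncation visible rather than silent.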


The six Del 4 argument points (fasit)

Each case states the argument defect a cold read would catch on the frozen Del 4 draft, the check (C1–C5) it belongs to, the expected severity, and the direction a correct agent run returns. Every case is an argument blind spot, distinct from craft (what editorial-reviewer would catch) and response (what persona-reviewer would catch). The in-session gates passed the draft; the cold read does not, because the framing they shared is gone.

Case 1 — pivot-premisset asserted uten støtte (unsupported pivot premise)

  • Axis: argument-integritet · Check: C2 · Severity: BLOCK
  • Cold-read defect: The new ~260-word Security Champions section opens by treating "Security Champions er svaret" ("Security Champions are the answer") as an established premise the rest of the part builds on, but the frozen page never establishes why the Champions model is the right response rather than one option among several. The drafting session knew the rationale; the cold reader is handed only the assertion.
  • Fasit / direction: The pivot's load-bearing premise is asserted as self-evident with no support a first-time skeptic would grant. Direction: establish why the Champions model follows from the part's problem, or hedge it as one option — do not let the whole section rest on an un-earned premise. (An unsupported load-bearing premise that the section depends on is BLOCK: the argument has not earned the right to make its central move.)

Case 2 — ubesvart «hva med små organisasjoner?» (unanswered obvious objection)

  • Axis: argument-integritet · Check: C5 · Severity: BLOCK
  • Cold-read defect: The strongest obvious objection a thoughtful reader raises on first reading the Champions pivot — "what about small organisations that cannot staff a dedicated Champion?" — is never acknowledged or answered. The recommendation effectively assumes an org large enough to nominate a Champion, and the argument wins only because it never meets this counter.
  • Fasit / direction: Name the objection and answer it (a small-org variant, an explicit scope boundary, or a rule of thumb) — an unanswered objection that, once raised, collapses the recommendation for a whole class of readers is BLOCK. Direction only; the agent does not write the answer.

Case 3 — sprang fra «Champions finnes» til «dømmekraft bevart» (logical hole)

  • Axis: argument-integritet · Check: C1 · Severity: REWORK
  • Cold-read defect: The text jumps from "Security Champions finnes i organisasjonen" ("Security Champions exist in the organisation") (A) to "dermed er dømmekraften bevart" ("so judgment is preserved") (C) with no connecting step (B): the existence of a role does not on its own establish that judgment is preserved. The reader cannot reconstruct why the conclusion follows. The session carried the missing step in its head; the page does not state it.
  • Fasit / direction: Supply the missing step — how the Champion's presence translates into preserved judgment (mechanism, mandate, practice) — or soften the conclusion to a hypothesis. A bridgeable-but-unbridged jump on a supporting line is REWORK.

Case 4 — rolle-seksjonen aldri forankret i én konkret org (missing concretization)

  • Axis: argument-integritet · Check: C4 · Severity: REWORK
  • Cold-read defect: The new ~270-word role-description section describes what a Champion does entirely in the abstract and never grounds it in one concrete organisation where this role actually operates. This is not a vividness nit (that would be editorial A1) — the argument that the role works needs one real instance to be believed; a skeptic will not grant an abstract job description as evidence the model functions.
  • Fasit / direction: Anchor the role in a single concrete (preferably Norwegian) org where a Champion operates, so the load-bearing claim "this role works" carries weight. Flag the absence of the argument-bearing instance; do not supply the org. (Boundary: route any pure craft/vividness face to editorial A1; this flag is the argument face — the claim cannot be believed abstractly.)

Case 5 — anbefaling delegerer den dømmekraften serien sier ikke kan settes ut (argument contradiction)

  • Axis: argument-integritet · Check: C3 · Severity: BLOCK
  • Cold-read defect: The series premise is "du kan ikke sette ut dømmekraft" (you cannot outsource judgment). The Champions recommendation, read cold on the frozen page, effectively delegates that judgment to the Champion — the close recommends the very move the premise rules out. Premise, recommendation, and payoff are not mutually consistent. This is an argument-level contradiction (C3), not a prose-level one between two passages (that would be editorial P4): the logic defeats itself.
  • Fasit / direction: Hold premise, recommendation, and gevinst (payoff) side by side and resolve one side: either reframe the Champion as supporting judgment that stays distributed (not a delegate it is outsourced to), or qualify the series premise. A recommendation that defeats the series premise is BLOCK.

Case 6 — gevinst-leddet antar utbredt modenhet (unsupported assumption)

  • Axis: argument-integritet · Check: C2 · Severity: REWORK
  • Cold-read defect: The promised payoff of the Champions model leans on an unstated assumption that the surrounding organisation is mature enough to use a Champion well (clear mandate, time allocation, leadership backing). The frozen page asserts the payoff as if it follows automatically; the cold reader sees an un-earned premise standing between the model and its benefit.
  • Fasit / direction: Establish or hedge the maturity assumption the payoff depends on: name the conditions under which the payoff holds, or mark it conditional. A load-bearing assumption left unstated under the payoff is REWORK (it weakens the case rather than defeating it outright).

Expected aggregate (what a correct run looks like)

  • Total flags: 6 (well within the ≤8 cap — no suppression needed).
  • By check: C1 = 1 (Case 3) · C2 = 2 (Cases 1, 6) · C3 = 1 (Case 5) · C4 = 1 (Case 4) · C5 = 1 (Case 2).
  • By severity: BLOCK = 3 (Cases 1, 2, 5 — unsupported pivot premise, unanswered small-org objection, premise/recommendation contradiction) · REWORK = 3 (Cases 3, 4, 6) · NICE = 0.
  • All six are argument blind spots: none is a craft defect (editorial-reviewer's domain), a language defect (language-reviewer), a factual error (fact-reviewer), or a resonance miss (persona-reviewer). The in-session gates passed the draft on every one of those axes, and still the argument did not hold, because they read it through the session's framing. The cold read is the quantified case for the gate.

A run that reproduces ~these six directions, on the one argument-integritet axis, with ~these severities, is comparable to the cold adversarial read the gate is built to deliver. Exact wording is the editor's; the agent returns direction, not rewritten copy.
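
The tallies in the expected aggregate can be recomputed directly from the six cases. The sketch below transcribes each case's check and severity from this fixture into a hypothetical `CASES` list and recounts them. It checks only the arithmetic of the aggregate, not whether a live agent run matches the fasit.

```python
from collections import Counter

# (case number, check, severity), transcribed from the six cases above.
CASES = [
    (1, "C2", "BLOCK"),
    (2, "C5", "BLOCK"),
    (3, "C1", "REWORK"),
    (4, "C4", "REWORK"),
    (5, "C3", "BLOCK"),
    (6, "C2", "REWORK"),
]

by_check = Counter(check for _, check, _ in CASES)
by_severity = Counter(severity for _, _, severity in CASES)
block_cases = [n for n, _, severity in CASES if severity == "BLOCK"]

# The aggregate stated in the fixture.
assert len(CASES) == 6 and len(CASES) <= 8          # within the cap
assert by_check == {"C1": 1, "C2": 2, "C3": 1, "C4": 1, "C5": 1}
assert by_severity == {"BLOCK": 3, "REWORK": 3}     # NICE = 0
assert block_cases == [1, 2, 5]
```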

Calibration boundary

Whether the agent's live flags truly match this fasit is judged by the operator ([OPERATØR]), not self-certified here. This fixture is the calibration target, the same way editorial-reviewer-cases.md, persona-reviewer-cases.md, and fact-checker-cases.md are fasits for their agents.

Live-run note. A live run on the frozen Del 4 draft requires (a) a Claude Code session reload, because a freshly added agent is not invokable until the plugin agent set is rebuilt at session start, and (b) a genuinely cold invocation (an isolated context with no drafting-session history, changelog, scope list, or pivot narrative reaching the agent). Until both hold, this fixture is the gold standard of record.