Kjell Tore Guttormsen e69ea1f4c9 feat(linkedin-studio): v3.1.0 — Endring 9 adversarial review-pakke + per-artefakt personas

Cold, adversarial review package for the long-form pipeline + configurable
per-edition personas. Motivated by Del 4 (Security Champions pivot): the
in-session editor + persona sweep shared the drafting session's framing-bias,
so the shipped version was never independently re-reviewed.

Headless package (9a/9b):
- New Step 6.5 (headless-review) in /linkedin:newsletter, after the persona
  sweep, before lock — the independence layer the in-session gates can't be.
- New standalone /linkedin:headless-review command (run in a fresh session for
  maximum isolation; reconstructs frozen draft + contract + personas from disk).
- 3 new Opus archetypes, each with a cardinal context-isolation block that
  refuses drafting-session framing as "context pollution":
  - content-reviewer (argument integrity C1–C5, ≤8 flags)
  - language-reviewer (Norwegian language L1–L5, ≤10 flags)
  - fact-reviewer (cold re-verification F1–F4, risk-sort + pivot-risk, WebSearch)
- Deliberate redundancy with fact-checker / editorial-reviewer documented so
  the pairs are never de-duplicated.

Pivot-reopen (9c):
- New /linkedin:pivot command: logs articles.NN.pivots[], resets currentPhase,
  un-locks, marks gates to re-run.
- Pivot-detection gate in Step 8 lock precondition (>20% word-count change or
  >2 new sections re-opens cleared gates). Del 4 v8→v11 worked example.

Per-artifact personas (new requirement):
- articles.NN.personas with resolution order (edition-state → series file →
  plugin library → interactive). One or more readers configurable per edition.

Schema/docs:
- edition-state.template.json: additive personas[], pivots[], headlessReview,
  headless-review phase (16 phases); personaSweep.resonance.wordCount baseline.
- 3 fasit fixtures + 3 structural lint tests (Del 4 worked cases).
- Counts: 24→26 commands, 16→19 agents, 15→16 newsletter phases.
- README + CLAUDE.md (plugin + root) + CHANGELOG synced.

Verification: 35 agent-fixture + 59 hook + 20 render tests green. Backward-
compatible (additive state); reload required before the 3 new agents resolve.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-29 13:01:24 +02:00


Fact-Reviewer Fasit Fixture

The Del 4 production round (Security Champions, Maskinrommet, 2026-05-29) serves as the gold standard for the fact-reviewer agent. The in-session fact-checker (Step 5) ran on a still-moving draft. A late Security Champions pivot (a new argument anchor) arrived after that Step 5 sweep, so the pivot's premise was never fact-checked. The pivot then went through the in-session persona sweep (Step 6) and reached the frozen, publish-ready version with an unverified premise intact. KTG's cold re-reading caught six points in all: the unverified pivot premise, a misattribution, a quote-precision error, a postulated number with no provenance, a "settled standard" that actually varies, and a secondary source trusted for a precise figure. Those six points are the fasit below: a correct fact-reviewer run on the frozen/pivoted Del 4 should surface comparable verdicts, mapped to the four checks with consistent risk verdicts and the pivot-risk flag.

This file is a fasit, not a test harness. The structural lint lives in agents/__tests__/fact-reviewer-fixture.test.mjs. Whether the agent's live verdicts actually reproduce these is a [GATE]/[OPERATØR] call; it is not self-certified here.

The jury judges; the writer writes. Every expected output below is direction, not rewritten copy. A correct agent run hands back a verdict + a source (or "none found") + a fix-as-direction (source it / hedge it / cut it) — never edited prose.

Why this gate exists. In Del 4 the pivot was post-hoc relative to fact-checking: the in-session Step 5 sweep ran before the Security Champions pivot was added, so the pivot premise never met it. A cold re-verification on the frozen/pivoted version closes that gap: it re-checks every claim with equal suspicion, with no knowledge of which passages were pivot-fresh, and so catches exactly the premise Step 5 missed.

The axis and the four checks (the agent judges on exactly these)

The single axis is faktisk-korrekthet — factual correctness, re-verified COLD on the frozen/pivoted version. It is checked through four lenses:

F1 — Verifiserbare påstander (verifiable claims): every checkable assertion (numbers, dates, named examples, attributions, causal claims) searched against primary/credible sources; opinions and predictions skipped.
F2 — Sitat-presisjon (quote precision): any quotation must match the source verbatim — wording, attribution, and who said it. «Vi» vs «Vi i Nav» is a precision failure even when the gist is right.
F3 — Tall-attribusjon (number attribution): every figure must trace to a named source; a postulated number with no provenance is 🟡/🔴. Here provenance is VERIFIED (distinct from editorial-reviewer's P3, which only flags the absence of a source/hedge without searching).
F4 — Kilde-kvalitet (source quality): primary over secondary; a source supporting "around a third" does not verify "exactly 37 %"; post-cutoff claims must be web-searched.

Context isolation — cold reader (the agent's cardinal rule)

The agent runs in a cold context: its only inputs are this prompt, the frozen draft, and the writing contract. Any pivot narrative, changelog, omission list, or "what the author intended" is context pollution; the agent states that it is ignoring it and judges only the text. That independence (no main-session framing-bias) is the whole reason a defect that survived the in-session gates can still be caught here.
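The cold-context rule can be pictured as an allowlist over whatever the calling session offers. The following is a minimal Python sketch under stated assumptions: the key names (`prompt`, `frozen_draft`, `writing_contract`) and the function `build_cold_context` are hypothetical illustrations, not part of the plugin.

```python
# Hypothetical input keys; the fixture only names the three allowed inputs in prose.
ALLOWED_INPUTS = {"prompt", "frozen_draft", "writing_contract"}


def build_cold_context(candidate_inputs):
    """Keep only the allowed inputs and report what was ignored.

    Returns (kept, ignored): `kept` is what the cold reader may see,
    `ignored` is the sorted list of keys treated as context pollution.
    """
    kept = {key: value for key, value in candidate_inputs.items() if key in ALLOWED_INPUTS}
    ignored = sorted(set(candidate_inputs) - ALLOWED_INPUTS)
    return kept, ignored


kept, ignored = build_cold_context({
    "prompt": "agent prompt text",
    "frozen_draft": "draft text",
    "writing_contract": "contract text",
    "pivot_narrative": "why the anchor changed",  # pollution: dropped and reported
    "changelog": "v8 to v11 notes",               # pollution: dropped and reported
})
print(sorted(kept))
print(ignored)
```

Reporting the ignored keys, rather than dropping them silently, mirrors the rule that the agent states what it is ignoring.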

Pivot-premise risk (the design feature). Because the agent does not know which passages were added in a late pivot, it re-checks every claim with equal suspicion — a claim's age in the draft buys it no trust. A claim that looks freshly bolted on (new anchor, new section topic, a 2025/2026 figure) is surfaced in a pivot-risk subsection. A pivot-risk claim that fails verification is the headline catch: the pivot premise that never met Step 5.

Risk sort and gate (every claim carries exactly one verdict)

🔴 høy risiko (high risk) → BLOCK — contradicted by evidence, or a precise claim with no usable source.
🟡 uverifisert (unverified) → REWORK — cannot be confirmed, or only weak sources found; a claim asserted as fact must be hedged, sourced, or cut.
🟢 verifisert (verified) → keep — confirmed by a primary/credible source matching the claim.
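The risk sort above maps each verdict to one action. A minimal Python sketch of that mapping follows. The roll-up into a single recommendation ("worst verdict wins") is an assumption made for illustration; the fixture states the per-claim actions and the three possible recommendations, but not how the agent combines them.

```python
from enum import Enum


class Verdict(Enum):
    HIGH_RISK = "🔴"   # høy risiko
    UNVERIFIED = "🟡"  # uverifisert
    VERIFIED = "🟢"    # verifisert


# Per-claim action, as stated in the risk sort.
ACTION = {
    Verdict.HIGH_RISK: "BLOCK",
    Verdict.UNVERIFIED: "REWORK",
    Verdict.VERIFIED: "keep",
}


def recommend(verdicts):
    """Assumed roll-up: the worst verdict in the run decides the recommendation."""
    verdicts = list(verdicts)
    if Verdict.HIGH_RISK in verdicts:
        return "BLOCK"
    if Verdict.UNVERIFIED in verdicts:
        return "REWORK"
    return "PASS"


print(recommend([Verdict.VERIFIED, Verdict.UNVERIFIED, Verdict.HIGH_RISK]))
```

Whatever the function returns is only a recommendation; the gate decision stays with the operator.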

The agent recommends PASS / REWORK / BLOCK; the operator ([OPERATØR]) holds the gate. Each case block below carries exactly one verdict emoji in its Verdict field; the surrounding prose deliberately avoids the emoji so the structural lint can read a single unambiguous verdict per case.
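The one-emoji-per-case rule is the kind of property a structural lint can check mechanically. Below is a Python sketch of such a check. It is not the contents of fact-reviewer-fixture.test.mjs, and the function names and the sample text are hypothetical. It assumes case blocks start with a line beginning "Case N" and end at the "Expected aggregate" heading.

```python
import re

VERDICT_EMOJI = ("🔴", "🟡", "🟢")
CASE_HEADING = re.compile(r"^Case (\d+)\b")


def split_cases(text):
    """Group lines under each 'Case N' heading; stop collecting at the aggregate heading."""
    cases = {}
    current = None
    for line in text.splitlines():
        match = CASE_HEADING.match(line)
        if match:
            current = int(match.group(1))
            cases[current] = [line]
        elif line.startswith("Expected aggregate"):
            current = None
        elif current is not None:
            cases[current].append(line)
    return cases


def lint_case(lines):
    """Return a list of problems; an empty list means the case block is well-formed."""
    problems = []
    emoji_count = sum(line.count(emoji) for line in lines for emoji in VERDICT_EMOJI)
    if emoji_count != 1:
        problems.append(f"expected exactly one verdict emoji, found {emoji_count}")
    verdict_lines = [line for line in lines if "Verdict:" in line]
    if len(verdict_lines) != 1 or not any(e in verdict_lines[0] for e in VERDICT_EMOJI):
        problems.append("verdict emoji must sit in a single Verdict field")
    return problems


# Hypothetical sample: case 2 repeats the emoji in its prose and should be flagged.
sample = """Case 1 - example
Check: F1 · Verdict: 🔴
Pivot-risk: YES
Case 2 - example
Check: F2 · Verdict: 🔴
Fasit / direction: a contradicted claim is 🔴 regardless.
Expected aggregate
By risk verdict: 🔴 = 2
"""

for number, lines in split_cases(sample).items():
    print(number, lint_case(lines) or "ok")
```

Counting emoji across the whole block, not just the Verdict field, is what catches a stray emoji in the direction text.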

The six Del 4 worked cases (fasit)

Each case states the point, the check it belongs to (F1–F4), the verdict, whether it is a pivot-risk claim, and the direction a correct cold run returns.

Case 1 — pivot-premissen aldri faktasjekket (pivot premise never met Step 5) — PIVOT-RISK

Check: F1 (verifiable claim — the pivot's anchor assertion) · Verdict: 🔴
Pivot-risk: YES — this is the late Security Champions pivot, added after the Step 5 fact-check; its premise was never verified in-session.
Fasit / direction: The Security Champions pivot rests on a premise asserted as established fact, but no primary source confirms it as stated. Because the pivot arrived after Step 5, the in-session sweep never touched it. The cold re-check — applying equal suspicion to a claim it does not know is pivot-fresh — searches primary sources, finds none that confirm the premise as worded, and returns high risk. This is the exact catch the gate exists for: the cold pass on the frozen version surfaces the pivot premise that Step 5 missed. Direction: source the premise to a primary record or recast it as a hedged hypothesis; do not assert. The agent returns the verdict + "none found", not a rewritten premise.

Case 2 — feilattribusjon (misattribution) — editor caught on cold read

Check: F1 (attribution) + F2 (who said it) · Verdict: 🔴
Pivot-risk: no.
Fasit / direction: A statement is attributed to the wrong source/originator: the named party is not who said or published it. The primary record contradicts it, so it is high risk (a contradicted claim is high risk regardless of any partial score). Direction: correct the attribution to the verified originator, or cut the attribution; the agent names the contradicting source, never invents one.

Case 3 — sitat-presisjon «Vi» vs «Vi i Nav» (quote precision) — editor caught

Check: F2 (quote precision) · Verdict: 🟡
Pivot-risk: no.
Fasit / direction: A quotation's gist is right but the wording/attribution is not verbatim: the source said «Vi i Nav», the draft renders «Vi». Changing who the «vi» refers to is a precision failure even though the meaning is close. The source exists, so this is unverified-as-worded rather than contradicted. Direction: match the source verbatim — restore «Vi i Nav» — or mark it as a paraphrase, not a quote. The agent flags the precision gap; it does not supply the corrected line.

Case 4 — postulert tall uten proveniens (postulated number, no provenance)

Check: F3 (number attribution) · Verdict: 🟡
Pivot-risk: plausible — a figure of this kind often arrives with a late anchor; surface it in the pivot-risk subsection if it reads freshly bolted on.
Fasit / direction: A specific figure is stated as fact with no named source. Distinct from editorial-reviewer's P3, which would only flag the absence of a source/hedge: here the agent searches for the provenance, finds none that supports the exact figure, and returns unverified. Direction: trace it to a named source or hedge it ("anslagsvis"); else cut. The agent never invents a provenance to promote it to verified.

Case 5 — «Security Champions» som settet standard (a settled standard that varies)

Check: F1 (claim) + source-scope · Verdict: 🔴
Pivot-risk: YES — part of the same Security Champions pivot.
Fasit / direction: The "Security Champions" practice is presented as a settled, universal standard, but in reality it is a practice that varies per organization — implementations, scope, and definitions differ. A local convention dressed as a universal standard is a source-scope failure; asserting it as settled with no source that supports the universal framing is high risk. Direction: scope the claim to where it actually holds ("varies; in some orgs…") or source the universal framing to a standard that does establish it. The agent flags the over-broad scope; it does not rewrite the passage.

Case 6 — sekundærkilde brukt for et presist tall (secondary source for a precise figure)

Check: F4 (source quality) + F3 (number) · Verdict: 🟡
Pivot-risk: no.
Fasit / direction: A precise figure is backed only by a secondary source that summarizes the number, and that source supports a directional claim ("around a third"), not the precise figure ("37 %") the draft asserts. A source supporting "around a third" does not verify "exactly 37 %". Direction: trace to the primary source and confirm the exact figure, soften the draft to the directional claim the secondary source actually supports, or hedge. The agent records the secondary source found and the precision gap, not a corrected number.

Expected aggregate (what a correct cold run looks like)

Total verdicts surfaced: 6 (within a reasonable verification-log cap; no 🔴 silently dropped).
By check: F1 = 3 (Cases 1, 2, 5) · F2 = 2 (Cases 2, 3) · F3 = 2 (Cases 4, 6) · F4 = 1 (Case 6). (Cases overlap checks; the headline check is listed first.)
By risk verdict: 🔴 høy risiko = 3 (Cases 1, 2, 5 → BLOCK) · 🟡 uverifisert = 3 (Cases 3, 4, 6 → REWORK) · 🟢 verifisert = 0 among the flagged points (the rest of the draft's claims verify clean and are not listed here).
Pivot-risk: Cases 1 and 5 are the Security Champions pivot; Case 4 is a plausible pivot-risk. Case 1 is the headline catch — the pivot premise that was never fact-checked in-session, caught only because the cold pass re-checks every claim with equal suspicion.
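The aggregate counts can be re-derived from the six case blocks. The short Python sketch below does that tally, with the case data transcribed from the Check, Verdict, and Pivot-risk fields above. The tuple layout is an illustrative choice, and Case 5's "source-scope" tag is left out of the per-check tally because it is not one of F1–F4.

```python
from collections import Counter

# (case number, checks with the headline check first, verdict, pivot-risk)
CASES = [
    (1, ("F1",), "🔴", "yes"),
    (2, ("F1", "F2"), "🔴", "no"),
    (3, ("F2",), "🟡", "no"),
    (4, ("F3",), "🟡", "plausible"),
    (5, ("F1",), "🔴", "yes"),  # also tagged "source-scope", which is not an F-check
    (6, ("F4", "F3"), "🟡", "no"),
]

# A case that spans two checks is counted once under each.
by_check = Counter(check for _, checks, _, _ in CASES for check in checks)
by_verdict = Counter(verdict for _, _, verdict, _ in CASES)
pivot_cases = [number for number, _, _, pivot in CASES if pivot == "yes"]

print(dict(sorted(by_check.items())))  # {'F1': 3, 'F2': 2, 'F3': 2, 'F4': 1}
print(by_verdict["🔴"], by_verdict["🟡"], by_verdict["🟢"])  # 3 3 0
print(pivot_cases)  # [1, 5]
```

Because cases overlap checks, the per-check counts sum to eight while the verdict counts sum to six.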

A run that reproduces ~these six verdicts, on ~these checks, with ~these risk levels — and that surfaces the Security Champions pivot premise as a pivot-risk 🔴 — is comparable to KTG's actual cold re-reading. Exact wording is the editor's; the agent returns direction, not rewritten copy.

Calibration boundary

Whether the agent's live verdicts truly match this fasit is judged by the operator ([OPERATØR]), not self-certified here. This fixture is the calibration target, the same way fact-checker-cases.md, editorial-reviewer-cases.md, and persona-reviewer-cases.md are fasits for their agents.

Live-run note. A live cold run on the frozen Del 4 requires (a) a Claude Code session reload, because a freshly added agent is not invokable until the plugin agent set is rebuilt at session start, and (b) the agent run in a genuinely cold context (no drafting-session history, no pivot narrative) with read access to the frozen draft and web search. Until both hold, this fixture is the gold standard of record.
