ktg-plugin-marketplace/plugins/linkedin-studio/agents/fixtures/fact-reviewer-cases.md
Kjell Tore Guttormsen e69ea1f4c9 feat(linkedin-studio): v3.1.0 — Endring 9 adversarial review-pakke + per-artefakt personas
Cold, adversarial review package for the long-form pipeline + configurable
per-edition personas. Motivated by Del 4 (Security Champions pivot): the
in-session editor + persona sweep shared the drafting session's framing-bias,
so the shipped version was never independently re-reviewed.

Headless package (9a/9b):
- New Step 6.5 (headless-review) in /linkedin:newsletter, after the persona
  sweep, before lock — the independence layer the in-session gates can't be.
- New standalone /linkedin:headless-review command (run in a fresh session for
  maximum isolation; reconstructs frozen draft + contract + personas from disk).
- 3 new Opus archetypes, each with a cardinal context-isolation block that
  refuses drafting-session framing as "context pollution":
  - content-reviewer (argument integrity C1–C5, ≤8 flags)
  - language-reviewer (Norwegian language L1–L5, ≤10 flags)
  - fact-reviewer (cold re-verification F1–F4, risk-sort + pivot-risk, WebSearch)
- Deliberate redundancy with fact-checker / editorial-reviewer documented so
  the pairs are never de-duplicated.

Pivot-reopen (9c):
- New /linkedin:pivot command: logs articles.NN.pivots[], resets currentPhase,
  un-locks, marks gates to re-run.
- Pivot-detection gate in Step 8 lock precondition (>20% word-count change or
  >2 new sections re-opens cleared gates). Del 4 v8→v11 worked example.

Per-artifact personas (new requirement):
- articles.NN.personas with resolution order (edition-state → series file →
  plugin library → interactive). One or more readers configurable per edition.

Schema/docs:
- edition-state.template.json: additive personas[], pivots[], headlessReview,
  headless-review phase (16 phases); personaSweep.resonance.wordCount baseline.
- 3 fasit fixtures + 3 structural lint tests (Del 4 worked cases).
- Counts: 24→26 commands, 16→19 agents, 15→16 newsletter phases.
- README + CLAUDE.md (plugin + root) + CHANGELOG synced.

Verification: 35 agent-fixture + 59 hook + 20 render tests green. Backward-
compatible (additive state); reload required before the 3 new agents resolve.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 13:01:24 +02:00

196 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Fact-Reviewer Fasit Fixture
The Del 4 production round (Security Champions, Maskinrommet, 2026-05-29) as the
gold standard for the `fact-reviewer` agent. The in-session `fact-checker`
(Step 5) ran on a still-moving draft. A **late Security Champions pivot** — a new
argument anchor — arrived **after** that Step 5 sweep, so the pivot's premise was
**never fact-checked**. The pivot then went through the in-session persona sweep
(Step 6) and reached the frozen, publish-ready version with an unverified premise
intact. KTG's cold re-reading caught a misattribution, a quote-precision error, a
postulated number with no provenance, a "settled standard" that actually varies,
and a secondary source trusted for a precise figure — six points in all. Those six
points are the fasit below: a correct `fact-reviewer` run on the frozen/pivoted Del
4 should surface **comparable verdicts**, mapped to the four checks with consistent
risk verdicts and the pivot-risk flag.
This file is a *fasit*, not a test harness. The structural lint lives in
`agents/__tests__/fact-reviewer-fixture.test.mjs`. Whether the agent's live
verdicts actually reproduce these is `[GATE]`/`[OPERATØR]` — it is **not**
self-certified here.
> **The jury judges; the writer writes.** Every expected output below is
> **direction, not rewritten copy.** A correct agent run hands back a verdict + a
> source (or "none found") + a fix-as-direction (source it / hedge it / cut it) —
> **never** edited prose.
> **Why this gate exists.** Fact-checking was **post-hoc relative to the pivot** in
> Del 4: the in-session Step 5 sweep ran *before* the Security Champions pivot was
> added, so the pivot premise never met it. A **cold re-verification on the
> frozen/pivoted version** closes that gap — it re-checks every claim with equal
> suspicion, with no knowledge of which passages were pivot-fresh, and so catches
> exactly the premise Step 5 missed.
---
## The axis and the four checks (the agent judges on exactly these)
The single axis is **faktisk-korrekthet** — factual correctness, re-verified COLD
on the frozen/pivoted version. It is checked through four lenses:
- **F1 — Verifiserbare påstander** (verifiable claims): every checkable assertion
(numbers, dates, named examples, attributions, causal claims) searched against
primary/credible sources; opinions and predictions skipped.
- **F2 — Sitat-presisjon** (quote precision): any quotation must match the source
verbatim — wording, attribution, and who said it. «Vi» vs «Vi i Nav» is a
precision failure even when the gist is right.
- **F3 — Tall-attribusjon** (number attribution): every figure must trace to a
named source; a postulated number with no provenance is 🟡/🔴. Here provenance is
VERIFIED (distinct from `editorial-reviewer`'s P3, which only flags the absence
of a source/hedge without searching).
- **F4 — Kilde-kvalitet** (source quality): primary over secondary; a source
supporting "around a third" does not verify "exactly 37 %"; post-cutoff claims
must be web-searched.
## Context isolation — cold reader (the agent's cardinal rule)
The agent runs in a **cold context**: its only input is this prompt, the frozen
draft, and the writing contract. Any pivot narrative, changelog, omission list, or
"what the author intended" is **context pollution** — the agent states it is
ignoring it and judges only the text. That independence (no main-session
**framing-bias**) is the whole reason a defect that survived the in-session gates
can still be caught here.
**Pivot-premise risk (the design feature).** Because the agent does **not** know
which passages were added in a late pivot, it re-checks **every** claim with equal
suspicion — a claim's age in the draft buys it no trust. A claim that looks freshly
bolted on (new anchor, new section topic, a 2025/2026 figure) is surfaced in a
**pivot-risk** subsection. A pivot-risk claim that fails verification is the
headline catch: the pivot premise that never met Step 5.
## Risk sort and gate (every claim carries exactly one verdict)
- 🔴 **høy risiko** (high risk) → **BLOCK** — contradicted by evidence, or a precise
claim with no usable source.
- 🟡 **uverifisert** (unverified) → **REWORK** — cannot be confirmed / weak sources;
asserted as fact must be hedged, sourced, or cut.
- 🟢 **verifisert** (verified) → keep — confirmed by a primary/credible source
matching the claim.
The agent recommends PASS / REWORK / BLOCK; the operator (`[OPERATØR]`) holds the
gate. Each case block below carries exactly one verdict emoji in its **Verdict**
field; the surrounding prose deliberately avoids the emoji so the structural lint
can read a single unambiguous verdict per case.
---
## The six Del 4 worked cases (fasit)
Each case states the point, the check it belongs to (F1F4), the verdict, whether
it is a pivot-risk claim, and the direction a correct cold run returns.
### Case 1 — pivot-premissen aldri faktasjekket (pivot premise never met Step 5) — PIVOT-RISK
- **Check:** F1 (verifiable claim — the pivot's anchor assertion) · **Verdict:** 🔴
- **Pivot-risk:** YES — this is the late Security Champions pivot, added *after* the
Step 5 fact-check; its premise was never verified in-session.
- **Fasit / direction:** The Security Champions pivot rests on a premise asserted as
established fact, but no primary source confirms it as stated. Because the pivot
arrived after Step 5, the in-session sweep never touched it. The cold re-check
— applying equal suspicion to a claim it does not know is pivot-fresh — searches
primary sources, finds none that confirm the premise as worded, and returns high
risk. **This is the exact catch the gate exists for:** the cold pass on the frozen
version surfaces the pivot premise that Step 5 missed. Direction: source the
premise to a primary record or recast it as a hedged hypothesis; do not assert.
The agent returns the verdict + "none found", not a rewritten premise.
### Case 2 — feilattribusjon (misattribution) — editor caught on cold read
- **Check:** F1 (attribution) + F2 (who said it) · **Verdict:** 🔴
- **Pivot-risk:** no.
- **Fasit / direction:** A statement is attributed to the wrong source/originator —
the named party is not who said or published it. Contradicted by the primary
record, so high risk (a contradicted claim is 🔴 regardless of partial score).
Direction: correct the attribution to the verified originator, or cut the
attribution; the agent names the contradicting source, never invents one.
### Case 3 — sitat-presisjon «Vi» vs «Vi i Nav» (quote precision) — editor caught
- **Check:** F2 (quote precision) · **Verdict:** 🟡
- **Pivot-risk:** no.
- **Fasit / direction:** A quotation's gist is right but the wording/attribution is
not verbatim: the source said «Vi i Nav», the draft renders «Vi». Changing who the
«vi» refers to is a precision failure even though the meaning is close. The source
exists, so this is unverified-as-worded rather than contradicted. Direction: match
the source verbatim — restore «Vi i Nav» — or mark it as a paraphrase, not a
quote. The agent flags the precision gap; it does not supply the corrected line.
### Case 4 — postulert tall uten proveniens (postulated number, no provenance)
- **Check:** F3 (number attribution) · **Verdict:** 🟡
- **Pivot-risk:** plausible — a figure of this kind often arrives with a late anchor;
surface it in the pivot-risk subsection if it reads freshly bolted on.
- **Fasit / direction:** A specific figure is stated as fact with **no named
source**. Distinct from `editorial-reviewer`'s P3, which would only flag the
*absence* of a source/hedge: here the agent **searches for the provenance**, finds
none that supports the exact figure, and returns unverified. Direction: trace it to
a named source or hedge it ("anslagsvis"); else cut. The agent never invents a
provenance to promote it to verified.
### Case 5 — «Security Champions» som settet standard (a settled standard that varies)
- **Check:** F1 (claim) + source-scope · **Verdict:** 🔴
- **Pivot-risk:** YES — part of the same Security Champions pivot.
- **Fasit / direction:** The "Security Champions" practice is presented as a settled,
universal standard, but in reality it is a practice that **varies per
organization** — implementations, scope, and definitions differ. A local
convention dressed as a universal standard is a source-scope failure; asserting it
as settled with no source that supports the universal framing is high risk.
Direction: scope the claim to where it actually holds ("varies; in some orgs…") or
source the universal framing to a standard that does establish it. The agent flags
the over-broad scope; it does not rewrite the passage.
### Case 6 — sekundærkilde brukt for et presist tall (secondary source for a precise figure)
- **Check:** F4 (source quality) + F3 (number) · **Verdict:** 🟡
- **Pivot-risk:** no.
- **Fasit / direction:** A precise figure is backed only by a **secondary source**
that summarizes the number — the primary record supports a *directional* claim
("around a third"), not the precise figure ("37 %") the draft asserts. A source
supporting "around a third" does not verify "exactly 37 %". Direction: trace to the
primary source and confirm the exact figure, soften the draft to the directional
claim the secondary source actually supports, or hedge. The agent records the
secondary source found and the precision gap, not a corrected number.
---
## Expected aggregate (what a correct cold run looks like)
- **Total verdicts surfaced:** 6 (within a reasonable verification-log cap; no 🔴
silently dropped).
- **By check:** F1 = 3 (Cases 1, 2, 5) · F2 = 2 (Cases 2, 3) · F3 = 2 (Cases 4, 6) ·
F4 = 1 (Case 6). (Cases overlap checks; the headline check is listed first.)
- **By risk verdict:** 🔴 høy risiko = 3 (Cases 1, 2, 5 → BLOCK) · 🟡 uverifisert = 3
(Cases 3, 4, 6 → REWORK) · 🟢 verifisert = 0 among the flagged points (the rest of
the draft's claims verify clean and are not listed here).
- **Pivot-risk:** Cases 1 and 5 are the Security Champions pivot; Case 4 is a
plausible pivot-risk. **Case 1 is the headline catch** — the pivot premise that
was never fact-checked in-session, caught only because the cold pass re-checks
every claim with equal suspicion.
A run that reproduces ~these six verdicts, on ~these checks, with ~these risk
levels — and that surfaces the Security Champions pivot premise as a pivot-risk 🔴
— is **comparable** to KTG's actual cold re-reading. Exact wording is the editor's;
the agent returns **direction, not rewritten copy**.
## Calibration boundary
Whether the agent's live verdicts truly match this fasit is judged by the operator
(`[OPERATØR]`), not self-certified here. This fixture is the calibration target, the
same way `fact-checker-cases.md`, `editorial-reviewer-cases.md`, and
`persona-reviewer-cases.md` are fasits for their agents.
> **Live-run note.** A live cold run on the frozen Del 4 requires (a) a Claude Code
> session reload — a freshly added agent is not invokable until the plugin agent set
> is rebuilt at session start — and (b) the agent run in a genuinely cold context
> (no drafting-session history, no pivot narrative) with read access to the frozen
> draft and web search. Until both hold, this fixture is the gold-standard of record.