Cold, adversarial review package for the long-form pipeline + configurable per-edition personas. Motivated by Del 4 (Security Champions pivot): the in-session editor + persona sweep shared the drafting session's framing-bias, so the shipped version was never independently re-reviewed. Headless package (9a/9b): - New Step 6.5 (headless-review) in /linkedin:newsletter, after the persona sweep, before lock — the independence layer the in-session gates can't be. - New standalone /linkedin:headless-review command (run in a fresh session for maximum isolation; reconstructs frozen draft + contract + personas from disk). - 3 new Opus archetypes, each with a cardinal context-isolation block that refuses drafting-session framing as "context pollution": - content-reviewer (argument integrity C1–C5, ≤8 flags) - language-reviewer (Norwegian language L1–L5, ≤10 flags) - fact-reviewer (cold re-verification F1–F4, risk-sort + pivot-risk, WebSearch) - Deliberate redundancy with fact-checker / editorial-reviewer documented so the pairs are never de-duplicated. Pivot-reopen (9c): - New /linkedin:pivot command: logs articles.NN.pivots[], resets currentPhase, un-locks, marks gates to re-run. - Pivot-detection gate in Step 8 lock precondition (>20% word-count change or >2 new sections re-opens cleared gates). Del 4 v8→v11 worked example. Per-artifact personas (new requirement): - articles.NN.personas with resolution order (edition-state → series file → plugin library → interactive). One or more readers configurable per edition. Schema/docs: - edition-state.template.json: additive personas[], pivots[], headlessReview, headless-review phase (16 phases); personaSweep.resonance.wordCount baseline. - 3 fasit fixtures + 3 structural lint tests (Del 4 worked cases). - Counts: 24→26 commands, 16→19 agents, 15→16 newsletter phases. - README + CLAUDE.md (plugin + root) + CHANGELOG synced. Verification: 35 agent-fixture + 59 hook + 20 render tests green. Backward- compatible (additive state); reload required before the 3 new agents resolve. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
196 lines
11 KiB
Markdown
196 lines
11 KiB
Markdown
# Fact-Reviewer Fasit Fixture
|
||
|
||
The Del 4 production round (Security Champions, Maskinrommet, 2026-05-29) as the
|
||
gold standard for the `fact-reviewer` agent. The in-session `fact-checker`
|
||
(Step 5) ran on a still-moving draft. A **late Security Champions pivot** — a new
|
||
argument anchor — arrived **after** that Step 5 sweep, so the pivot's premise was
|
||
**never fact-checked**. The pivot then went through the in-session persona sweep
|
||
(Step 6) and reached the frozen, publish-ready version with an unverified premise
|
||
intact. KTG's cold re-reading caught a misattribution, a quote-precision error, a
|
||
postulated number with no provenance, a "settled standard" that actually varies,
|
||
and a secondary source trusted for a precise figure — six points in all. Those six
|
||
points are the fasit below: a correct `fact-reviewer` run on the frozen/pivoted Del
|
||
4 should surface **comparable verdicts**, mapped to the four checks with consistent
|
||
risk verdicts and the pivot-risk flag.
|
||
|
||
This file is a *fasit*, not a test harness. The structural lint lives in
|
||
`agents/__tests__/fact-reviewer-fixture.test.mjs`. Whether the agent's live
|
||
verdicts actually reproduce these is `[GATE]`/`[OPERATØR]` — it is **not**
|
||
self-certified here.
|
||
|
||
> **The jury judges; the writer writes.** Every expected output below is
|
||
> **direction, not rewritten copy.** A correct agent run hands back a verdict + a
|
||
> source (or "none found") + a fix-as-direction (source it / hedge it / cut it) —
|
||
> **never** edited prose.
|
||
|
||
> **Why this gate exists.** Fact-checking was **post-hoc relative to the pivot** in
|
||
> Del 4: the in-session Step 5 sweep ran *before* the Security Champions pivot was
|
||
> added, so the pivot premise never met it. A **cold re-verification on the
|
||
> frozen/pivoted version** closes that gap — it re-checks every claim with equal
|
||
> suspicion, with no knowledge of which passages were pivot-fresh, and so catches
|
||
> exactly the premise Step 5 missed.
|
||
|
||
---
|
||
|
||
## The axis and the four checks (the agent judges on exactly these)
|
||
|
||
The single axis is **faktisk-korrekthet** — factual correctness, re-verified COLD
|
||
on the frozen/pivoted version. It is checked through four lenses:
|
||
|
||
- **F1 — Verifiserbare påstander** (verifiable claims): every checkable assertion
|
||
(numbers, dates, named examples, attributions, causal claims) searched against
|
||
primary/credible sources; opinions and predictions skipped.
|
||
- **F2 — Sitat-presisjon** (quote precision): any quotation must match the source
|
||
verbatim — wording, attribution, and who said it. «Vi» vs «Vi i Nav» is a
|
||
precision failure even when the gist is right.
|
||
- **F3 — Tall-attribusjon** (number attribution): every figure must trace to a
|
||
named source; a postulated number with no provenance is 🟡/🔴. Here provenance is
|
||
VERIFIED (distinct from `editorial-reviewer`'s P3, which only flags the absence
|
||
of a source/hedge without searching).
|
||
- **F4 — Kilde-kvalitet** (source quality): primary over secondary; a source
|
||
supporting "around a third" does not verify "exactly 37 %"; post-cutoff claims
|
||
must be web-searched.
|
||
|
||
## Context isolation — cold reader (the agent's cardinal rule)
|
||
|
||
The agent runs in a **cold context**: its only input is this prompt, the frozen
|
||
draft, and the writing contract. Any pivot narrative, changelog, omission list, or
|
||
"what the author intended" is **context pollution** — the agent states it is
|
||
ignoring it and judges only the text. That independence (no main-session
|
||
**framing-bias**) is the whole reason a defect that survived the in-session gates
|
||
can still be caught here.
|
||
|
||
**Pivot-premise risk (the design feature).** Because the agent does **not** know
|
||
which passages were added in a late pivot, it re-checks **every** claim with equal
|
||
suspicion — a claim's age in the draft buys it no trust. A claim that looks freshly
|
||
bolted on (new anchor, new section topic, a 2025/2026 figure) is surfaced in a
|
||
**pivot-risk** subsection. A pivot-risk claim that fails verification is the
|
||
headline catch: the pivot premise that never met Step 5.
|
||
|
||
## Risk sort and gate (every claim carries exactly one verdict)
|
||
|
||
- 🔴 **høy risiko** (high risk) → **BLOCK** — contradicted by evidence, or a precise
|
||
claim with no usable source.
|
||
- 🟡 **uverifisert** (unverified) → **REWORK** — cannot be confirmed / weak sources;
|
||
asserted as fact must be hedged, sourced, or cut.
|
||
- 🟢 **verifisert** (verified) → keep — confirmed by a primary/credible source
|
||
matching the claim.
|
||
|
||
The agent recommends PASS / REWORK / BLOCK; the operator (`[OPERATØR]`) holds the
|
||
gate. Each case block below carries exactly one verdict emoji in its **Verdict**
|
||
field; the surrounding prose deliberately avoids the emoji so the structural lint
|
||
can read a single unambiguous verdict per case.
|
||
|
||
---
|
||
|
||
## The six Del 4 worked cases (fasit)
|
||
|
||
Each case states the point, the check it belongs to (F1–F4), the verdict, whether
|
||
it is a pivot-risk claim, and the direction a correct cold run returns.
|
||
|
||
### Case 1 — pivot-premissen aldri faktasjekket (pivot premise never met Step 5) — PIVOT-RISK
|
||
|
||
- **Check:** F1 (verifiable claim — the pivot's anchor assertion) · **Verdict:** 🔴
|
||
- **Pivot-risk:** YES — this is the late Security Champions pivot, added *after* the
|
||
Step 5 fact-check; its premise was never verified in-session.
|
||
- **Fasit / direction:** The Security Champions pivot rests on a premise asserted as
|
||
established fact, but no primary source confirms it as stated. Because the pivot
|
||
arrived after Step 5, the in-session sweep never touched it. The cold re-check
|
||
— applying equal suspicion to a claim it does not know is pivot-fresh — searches
|
||
primary sources, finds none that confirm the premise as worded, and returns high
|
||
risk. **This is the exact catch the gate exists for:** the cold pass on the frozen
|
||
version surfaces the pivot premise that Step 5 missed. Direction: source the
|
||
premise to a primary record or recast it as a hedged hypothesis; do not assert.
|
||
The agent returns the verdict + "none found", not a rewritten premise.
|
||
|
||
### Case 2 — feilattribusjon (misattribution) — editor caught on cold read
|
||
|
||
- **Check:** F1 (attribution) + F2 (who said it) · **Verdict:** 🔴
|
||
- **Pivot-risk:** no.
|
||
- **Fasit / direction:** A statement is attributed to the wrong source/originator —
|
||
the named party is not who said or published it. Contradicted by the primary
|
||
record, so high risk (a contradicted claim is 🔴 regardless of partial score).
|
||
Direction: correct the attribution to the verified originator, or cut the
|
||
attribution; the agent names the contradicting source, never invents one.
|
||
|
||
### Case 3 — sitat-presisjon «Vi» vs «Vi i Nav» (quote precision) — editor caught
|
||
|
||
- **Check:** F2 (quote precision) · **Verdict:** 🟡
|
||
- **Pivot-risk:** no.
|
||
- **Fasit / direction:** A quotation's gist is right but the wording/attribution is
|
||
not verbatim: the source said «Vi i Nav», the draft renders «Vi». Changing who the
|
||
«vi» refers to is a precision failure even though the meaning is close. The source
|
||
exists, so this is unverified-as-worded rather than contradicted. Direction: match
|
||
the source verbatim — restore «Vi i Nav» — or mark it as a paraphrase, not a
|
||
quote. The agent flags the precision gap; it does not supply the corrected line.
|
||
|
||
### Case 4 — postulert tall uten proveniens (postulated number, no provenance)
|
||
|
||
- **Check:** F3 (number attribution) · **Verdict:** 🟡
|
||
- **Pivot-risk:** plausible — a figure of this kind often arrives with a late anchor;
|
||
surface it in the pivot-risk subsection if it reads freshly bolted on.
|
||
- **Fasit / direction:** A specific figure is stated as fact with **no named
|
||
source**. Distinct from `editorial-reviewer`'s P3, which would only flag the
|
||
*absence* of a source/hedge: here the agent **searches for the provenance**, finds
|
||
none that supports the exact figure, and returns unverified. Direction: trace it to
|
||
a named source or hedge it ("anslagsvis"); else cut. The agent never invents a
|
||
provenance to promote it to verified.
|
||
|
||
### Case 5 — «Security Champions» som settet standard (a settled standard that varies)
|
||
|
||
- **Check:** F1 (claim) + source-scope · **Verdict:** 🔴
|
||
- **Pivot-risk:** YES — part of the same Security Champions pivot.
|
||
- **Fasit / direction:** The "Security Champions" practice is presented as a settled,
|
||
universal standard, but in reality it is a practice that **varies per
|
||
organization** — implementations, scope, and definitions differ. A local
|
||
convention dressed as a universal standard is a source-scope failure; asserting it
|
||
as settled with no source that supports the universal framing is high risk.
|
||
Direction: scope the claim to where it actually holds ("varies; in some orgs…") or
|
||
source the universal framing to a standard that does establish it. The agent flags
|
||
the over-broad scope; it does not rewrite the passage.
|
||
|
||
### Case 6 — sekundærkilde brukt for et presist tall (secondary source for a precise figure)
|
||
|
||
- **Check:** F4 (source quality) + F3 (number) · **Verdict:** 🟡
|
||
- **Pivot-risk:** no.
|
||
- **Fasit / direction:** A precise figure is backed only by a **secondary source**
|
||
that summarizes the number — the primary record supports a *directional* claim
|
||
("around a third"), not the precise figure ("37 %") the draft asserts. A source
|
||
supporting "around a third" does not verify "exactly 37 %". Direction: trace to the
|
||
primary source and confirm the exact figure, soften the draft to the directional
|
||
claim the secondary source actually supports, or hedge. The agent records the
|
||
secondary source found and the precision gap, not a corrected number.
|
||
|
||
---
|
||
|
||
## Expected aggregate (what a correct cold run looks like)
|
||
|
||
- **Total verdicts surfaced:** 6 (within a reasonable verification-log cap; no 🔴
|
||
silently dropped).
|
||
- **By check:** F1 = 3 (Cases 1, 2, 5) · F2 = 2 (Cases 2, 3) · F3 = 2 (Cases 4, 6) ·
|
||
F4 = 1 (Case 6). (Cases overlap checks; the headline check is listed first.)
|
||
- **By risk verdict:** 🔴 høy risiko = 3 (Cases 1, 2, 5 → BLOCK) · 🟡 uverifisert = 3
|
||
(Cases 3, 4, 6 → REWORK) · 🟢 verifisert = 0 among the flagged points (the rest of
|
||
the draft's claims verify clean and are not listed here).
|
||
- **Pivot-risk:** Cases 1 and 5 are the Security Champions pivot; Case 4 is a
|
||
plausible pivot-risk. **Case 1 is the headline catch** — the pivot premise that
|
||
was never fact-checked in-session, caught only because the cold pass re-checks
|
||
every claim with equal suspicion.
|
||
|
||
A run that reproduces ~these six verdicts, on ~these checks, with ~these risk
|
||
levels — and that surfaces the Security Champions pivot premise as a pivot-risk 🔴
|
||
— is **comparable** to KTG's actual cold re-reading. Exact wording is the editor's;
|
||
the agent returns **direction, not rewritten copy**.
|
||
|
||
## Calibration boundary
|
||
|
||
Whether the agent's live verdicts truly match this fasit is judged by the operator
|
||
(`[OPERATØR]`), not self-certified here. This fixture is the calibration target, the
|
||
same way `fact-checker-cases.md`, `editorial-reviewer-cases.md`, and
|
||
`persona-reviewer-cases.md` are fasits for their agents.
|
||
|
||
> **Live-run note.** A live cold run on the frozen Del 4 requires (a) a Claude Code
|
||
> session reload — a freshly added agent is not invokable until the plugin agent set
|
||
> is rebuilt at session start — and (b) the agent run in a genuinely cold context
|
||
> (no drafting-session history, no pivot narrative) with read access to the frozen
|
||
> draft and web search. Until both hold, this fixture is the gold-standard of record.
|