feat(linkedin-studio): v3.1.0 — Endring 9 adversarial review-pakke + per-artefakt personas
Cold, adversarial review package for the long-form pipeline + configurable per-edition personas. Motivated by Del 4 (Security Champions pivot): the in-session editor + persona sweep shared the drafting session's framing-bias, so the shipped version was never independently re-reviewed. Headless package (9a/9b): - New Step 6.5 (headless-review) in /linkedin:newsletter, after the persona sweep, before lock — the independence layer the in-session gates can't be. - New standalone /linkedin:headless-review command (run in a fresh session for maximum isolation; reconstructs frozen draft + contract + personas from disk). - 3 new Opus archetypes, each with a cardinal context-isolation block that refuses drafting-session framing as "context pollution": - content-reviewer (argument integrity C1–C5, ≤8 flags) - language-reviewer (Norwegian language L1–L5, ≤10 flags) - fact-reviewer (cold re-verification F1–F4, risk-sort + pivot-risk, WebSearch) - Deliberate redundancy with fact-checker / editorial-reviewer documented so the pairs are never de-duplicated. Pivot-reopen (9c): - New /linkedin:pivot command: logs articles.NN.pivots[], resets currentPhase, un-locks, marks gates to re-run. - Pivot-detection gate in Step 8 lock precondition (>20% word-count change or >2 new sections re-opens cleared gates). Del 4 v8→v11 worked example. Per-artifact personas (new requirement): - articles.NN.personas with resolution order (edition-state → series file → plugin library → interactive). One or more readers configurable per edition. Schema/docs: - edition-state.template.json: additive personas[], pivots[], headlessReview, headless-review phase (16 phases); personaSweep.resonance.wordCount baseline. - 3 fasit fixtures + 3 structural lint tests (Del 4 worked cases). - Counts: 24→26 commands, 16→19 agents, 15→16 newsletter phases. - README + CLAUDE.md (plugin + root) + CHANGELOG synced. Verification: 35 agent-fixture + 59 hook + 20 render tests green. Backward- compatible (additive state); reload required before the 3 new agents resolve. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
e162cdce38
commit
e69ea1f4c9
20 changed files with 2520 additions and 59 deletions
|
|
@ -0,0 +1,210 @@
|
|||
# Content-Reviewer Fasit Fixture
|
||||
|
||||
The Del 4 production round (Security Champions, Maskinrommet, 2026-05-29) as the
|
||||
gold standard for the `content-reviewer` agent. Late in the round the draft took
|
||||
a **Security Champions pivot**: a new ~260-word section introducing the Champions
|
||||
model and a new ~270-word role-description section were added after the
|
||||
in-session gates had already formed their reading. The in-session gates
|
||||
(fact-check Step 5, editorial Step 5.5, persona sweep Step 6) all read the draft
|
||||
through the drafting session's framing — they knew *why* the pivot happened and
|
||||
*what* it was meant to argue, so they silently supplied the missing argumentative
|
||||
steps for free. A **cold, adversarial reviewer** — handed only the frozen page —
|
||||
cannot supply them, and that is exactly the point: the cold read catches the
|
||||
argument holes the framing hid.
|
||||
|
||||
The six cases below are the fasit: a correct `content-reviewer` run on the frozen
|
||||
Del 4 draft should surface **comparable flags**, mapped to the one axis with
|
||||
consistent severities.
|
||||
|
||||
This file is a *fasit*, not a test harness. The structural lint lives in
|
||||
`agents/__tests__/content-reviewer-fixture.test.mjs`. Whether the agent's live
|
||||
flags actually reproduce these directions is `[GATE]`/`[OPERATØR]` — it is **not**
|
||||
self-certified here.
|
||||
|
||||
> **The jury judges; the writer writes.** Every expected output below is
|
||||
> **direction, not rewritten copy.** A correct agent run hands back flags + a
|
||||
> severity — never edited prose. (`Foreslått retning`, not a new sentence.)
|
||||
|
||||
> **Why this gate exists.** The in-session gates shared the drafting session's
|
||||
> framing-bias: they read the pivot knowing its intent, so they bridged the
|
||||
> argument's gaps without noticing the gaps were there. A **cold reader** — run in
|
||||
> an isolated context with no history, no changelog, no "out of scope" list, no
|
||||
> pivot narrative — reads the frozen page as a first-time skeptic and finds the
|
||||
> argument holes the framing hid. Any such framing that reaches the agent is
|
||||
> **context pollution**: it is named and ignored, never acted on. This is a
|
||||
> distinct failure surface from craft (editorial), language (language-reviewer),
|
||||
> truth (fact-reviewer), and response (persona) — those gates can all pass while
|
||||
> the argument itself does not hold.
|
||||
|
||||
---
|
||||
|
||||
## The axis (the agent judges on exactly this)
|
||||
|
||||
The agent judges on **one axis — argument-integritet** (argument & logical
|
||||
integrity): *does the reasoning hold?* It does **not** judge craft, language,
|
||||
factual truth, or reader response — those are `editorial-reviewer`,
|
||||
`language-reviewer`, `fact-reviewer`, and `persona-reviewer` respectively. The
|
||||
axis decomposes into exactly five checks:
|
||||
|
||||
- **C1 — logiske hull** (logical holes): a step in the argument chain is missing;
|
||||
the text jumps from A to C without B, and the reader cannot reconstruct why the
|
||||
conclusion follows.
|
||||
- **C2 — ubegrunnede antakelser** (unsupported assumptions): the argument leans on
|
||||
a premise it never establishes, asserted as self-evident when a thoughtful
|
||||
reader would not simply grant it.
|
||||
- **C3 — argument-motsigelser** (argument-level contradiction): the recommendation,
|
||||
premise, and payoff are not mutually consistent — distinct from editorial-
|
||||
reviewer's P4 (a *prose-level* contradiction between two passages); C3 is a
|
||||
contradiction in the *logic of the argument itself*.
|
||||
- **C4 — manglende konkretisering der argumentet trenger det** (missing
|
||||
argumentative concretization): a load-bearing claim a skeptic would only believe
|
||||
with a concrete instance stays abstract — not for vividness (editorial A1) but
|
||||
because the argument *needs* the instance to carry weight.
|
||||
- **C5 — ubesvart «what about X?»** (the unanswered obvious objection): the
|
||||
strongest obvious objection a thoughtful reader raises is never acknowledged or
|
||||
answered — the argument wins only because it never met its best counter.
|
||||
|
||||
## Severity (every flag carries exactly one)
|
||||
|
||||
- **BLOCK** — a defect that breaks the argument: an argument-level contradiction
|
||||
(C3), or an unanswered objection (C5) that, once raised, collapses the
|
||||
recommendation.
|
||||
- **REWORK** — a real gap that should be filled, not load-bearing-fatal: a logical
|
||||
hole (C1), an unsupported load-bearing assumption (C2), a claim that needs
|
||||
concretization (C4).
|
||||
- **NICE** — a minor reasoning soft spot worth tightening if cheap.
|
||||
|
||||
Sort BLOCK → REWORK → NICE; cap at **eight** flags (argument defects are coarser
|
||||
than prose nits); if any are suppressed, say how many and of what severity —
|
||||
never silently truncate.
|
||||
|
||||
---
|
||||
|
||||
## The six Del 4 argument points (fasit)
|
||||
|
||||
Each case states the argument defect a cold read would catch on the frozen Del 4
|
||||
draft, the check (C1–C5) it belongs to, the expected severity, and the direction a
|
||||
correct agent run returns. Every case is an **argument blind spot** — distinct
|
||||
from craft (what `editorial-reviewer` would catch) and response (what
|
||||
`persona-reviewer` would catch). The in-session gates passed the draft; the cold
|
||||
read does not, because the framing they shared is gone.
|
||||
|
||||
### Case 1 — pivot-premisset asserted uten støtte (unsupported pivot premise)
|
||||
|
||||
- **Axis:** argument-integritet · **Check:** C2 · **Severity:** BLOCK
|
||||
- **Cold-read defect:** The new ~260-word Security Champions section opens by
|
||||
treating "Security Champions er svaret" as an established premise the rest of
|
||||
the part builds on — but the frozen page never establishes *why* the Champions
|
||||
model is the right response rather than one option among several. The drafting
|
||||
session knew the rationale; the cold reader is handed only the assertion.
|
||||
- **Fasit / direction:** The pivot's load-bearing premise is asserted as
|
||||
self-evident with no support a first-time skeptic would grant. Direction:
|
||||
establish why the Champions model follows from the part's problem, or hedge it
|
||||
as one option — do not let the whole section rest on an un-earned premise. (An
|
||||
unsupported *load-bearing* premise that the section depends on is BLOCK: the
|
||||
argument has not earned the right to make its central move.)
|
||||
|
||||
### Case 2 — ubesvart «hva med små organisasjoner?» (unanswered obvious objection)
|
||||
|
||||
- **Axis:** argument-integritet · **Check:** C5 · **Severity:** BLOCK
|
||||
- **Cold-read defect:** The strongest obvious objection a thoughtful reader raises
|
||||
on first reading the Champions pivot — *"what about small organisations that
|
||||
cannot staff a dedicated Champion?"* — is never acknowledged or answered. The
|
||||
recommendation effectively assumes an org large enough to nominate a Champion,
|
||||
and the argument wins only because it never meets this counter.
|
||||
- **Fasit / direction:** Name the objection and answer it (a small-org variant, an
|
||||
explicit scope boundary, or a rule of thumb) — an unanswered objection that, once
|
||||
raised, collapses the recommendation for a whole class of readers is BLOCK.
|
||||
Direction only; the agent does not write the answer.
|
||||
|
||||
### Case 3 — sprang fra «Champions finnes» til «dømmekraft bevart» (logical hole)
|
||||
|
||||
- **Axis:** argument-integritet · **Check:** C1 · **Severity:** REWORK
|
||||
- **Cold-read defect:** The text jumps from *"Security Champions finnes i
|
||||
organisasjonen"* (A) to *"dermed er dømmekraften bevart"* (C) with no connecting
|
||||
step (B): existence of a role does not on its own establish that judgment is
|
||||
preserved. The reader cannot reconstruct why the conclusion follows. The session
|
||||
carried the missing step in its head; the page does not state it.
|
||||
- **Fasit / direction:** Supply the missing step — *how* the Champion's presence
|
||||
translates into preserved judgment (mechanism, mandate, practice) — or soften the
|
||||
conclusion to a hypothesis. A bridgeable-but-unbridged jump on a supporting line
|
||||
is REWORK.
|
||||
|
||||
### Case 4 — rolle-seksjonen aldri forankret i én konkret org (missing concretization)
|
||||
|
||||
- **Axis:** argument-integritet · **Check:** C4 · **Severity:** REWORK
|
||||
- **Cold-read defect:** The new ~270-word role-description section describes what a
|
||||
Champion *does* entirely in the abstract and never grounds it in **one concrete
|
||||
organisation** where this role actually operates. This is not a vividness nit
|
||||
(that would be editorial A1) — the *argument* that the role works needs one real
|
||||
instance to be believed; a skeptic will not grant an abstract job description as
|
||||
evidence the model functions.
|
||||
- **Fasit / direction:** Anchor the role in a single concrete (preferably
|
||||
Norwegian) org where a Champion operates, so the load-bearing claim "this role
|
||||
works" carries weight. Flag the *absence of the argument-bearing instance*; do
|
||||
not supply the org. (Boundary: route any pure craft/vividness face to editorial
|
||||
A1; this flag is the argument face — the claim cannot be believed abstractly.)
|
||||
|
||||
### Case 5 — anbefaling delegerer den dømmekraften serien sier ikke kan settes ut (argument contradiction)
|
||||
|
||||
- **Axis:** argument-integritet · **Check:** C3 · **Severity:** BLOCK
|
||||
- **Cold-read defect:** The series premise is *"du kan ikke sette ut dømmekraft"*
|
||||
(you cannot outsource judgment). The Champions recommendation, read cold on the
|
||||
frozen page, effectively **delegates that judgment** to the Champion — the close
|
||||
recommends the very move the premise rules out. Premise, recommendation, and
|
||||
payoff are not mutually consistent. This is an argument-level contradiction (C3),
|
||||
not a prose-level one between two passages (that would be editorial P4): the
|
||||
*logic* defeats itself.
|
||||
- **Fasit / direction:** Hold premise, recommendation, and gevinst side by side and
|
||||
resolve one side — either reframe the Champion as *supporting* judgment that
|
||||
stays distributed (not a delegate it is outsourced to), or qualify the series
|
||||
premise. A recommendation that defeats the series premise is BLOCK.
|
||||
|
||||
### Case 6 — gevinst-leddet antar utbredt modenhet (unsupported assumption)
|
||||
|
||||
- **Axis:** argument-integritet · **Check:** C2 · **Severity:** REWORK
|
||||
- **Cold-read defect:** The promised payoff of the Champions model leans on an
|
||||
unstated assumption that the surrounding organisation is mature enough to use a
|
||||
Champion well (clear mandate, time allocation, leadership backing). The frozen
|
||||
page asserts the gevinst as if it follows automatically; the cold reader sees an
|
||||
un-earned premise standing between the model and its benefit.
|
||||
- **Fasit / direction:** Establish or hedge the maturity assumption the payoff
|
||||
depends on — name the conditions under which the gevinst holds, or mark it
|
||||
conditional. A load-bearing assumption left unstated under the payoff is REWORK
|
||||
(it weakens the case rather than defeating it outright).
|
||||
|
||||
---
|
||||
|
||||
## Expected aggregate (what a correct run looks like)
|
||||
|
||||
- **Total flags:** 6 (well within the ≤8 cap — no suppression needed).
|
||||
- **By check:** C1 = 1 (Case 3) · C2 = 2 (Cases 1, 6) · C3 = 1 (Case 5) ·
|
||||
C4 = 1 (Case 4) · C5 = 1 (Case 2).
|
||||
- **By severity:** BLOCK = 3 (Cases 1, 2, 5 — unsupported pivot premise,
|
||||
unanswered small-org objection, premise/recommendation contradiction) ·
|
||||
REWORK = 3 (Cases 3, 4, 6) · NICE = 0.
|
||||
- **All six are argument blind spots:** none is a craft defect (`editorial-
|
||||
reviewer`'s domain), a language defect (`language-reviewer`), a factual error
|
||||
(`fact-reviewer`), or a resonance miss (`persona-reviewer`). The in-session
|
||||
gates passed the draft on every one of those axes — and still the argument did
|
||||
not hold, because they read it through the session's framing. The cold read is
|
||||
the quantified case for the gate.
|
||||
|
||||
A run that reproduces ~these six directions, on the one argument-integritet axis,
|
||||
with ~these severities, is **comparable** to the cold adversarial read the gate is
|
||||
built to deliver. Exact wording is the editor's; the agent returns
|
||||
**direction, not rewritten copy.**
|
||||
|
||||
## Calibration boundary
|
||||
|
||||
Whether the agent's live flags truly match this fasit is judged by the operator
|
||||
(`[OPERATØR]`), not self-certified here. This fixture is the calibration target,
|
||||
the same way `editorial-reviewer-cases.md`, `persona-reviewer-cases.md`, and
|
||||
`fact-checker-cases.md` are fasits for their agents.
|
||||
|
||||
> **Live-run note.** A live run on the frozen Del 4 draft requires (a) a Claude
|
||||
> Code session reload — a freshly added agent is not invokable until the plugin
|
||||
> agent set is rebuilt at session start — and (b) a genuinely **cold** invocation
|
||||
> (an isolated context with no drafting-session history, changelog, scope list, or
|
||||
> pivot narrative reaching the agent). Until both hold, this fixture is the
|
||||
> gold-standard of record.
|
||||
196
plugins/linkedin-studio/agents/fixtures/fact-reviewer-cases.md
Normal file
196
plugins/linkedin-studio/agents/fixtures/fact-reviewer-cases.md
Normal file
|
|
@ -0,0 +1,196 @@
|
|||
# Fact-Reviewer Fasit Fixture
|
||||
|
||||
The Del 4 production round (Security Champions, Maskinrommet, 2026-05-29) as the
|
||||
gold standard for the `fact-reviewer` agent. The in-session `fact-checker`
|
||||
(Step 5) ran on a still-moving draft. A **late Security Champions pivot** — a new
|
||||
argument anchor — arrived **after** that Step 5 sweep, so the pivot's premise was
|
||||
**never fact-checked**. The pivot then went through the in-session persona sweep
|
||||
(Step 6) and reached the frozen, publish-ready version with an unverified premise
|
||||
intact. KTG's cold re-reading caught a misattribution, a quote-precision error, a
|
||||
postulated number with no provenance, a "settled standard" that actually varies,
|
||||
and a secondary source trusted for a precise figure — six points in all. Those six
|
||||
points are the fasit below: a correct `fact-reviewer` run on the frozen/pivoted Del
|
||||
4 should surface **comparable verdicts**, mapped to the four checks with consistent
|
||||
risk verdicts and the pivot-risk flag.
|
||||
|
||||
This file is a *fasit*, not a test harness. The structural lint lives in
|
||||
`agents/__tests__/fact-reviewer-fixture.test.mjs`. Whether the agent's live
|
||||
verdicts actually reproduce these is `[GATE]`/`[OPERATØR]` — it is **not**
|
||||
self-certified here.
|
||||
|
||||
> **The jury judges; the writer writes.** Every expected output below is
|
||||
> **direction, not rewritten copy.** A correct agent run hands back a verdict + a
|
||||
> source (or "none found") + a fix-as-direction (source it / hedge it / cut it) —
|
||||
> **never** edited prose.
|
||||
|
||||
> **Why this gate exists.** Fact-checking was **post-hoc relative to the pivot** in
|
||||
> Del 4: the in-session Step 5 sweep ran *before* the Security Champions pivot was
|
||||
> added, so the pivot premise never met it. A **cold re-verification on the
|
||||
> frozen/pivoted version** closes that gap — it re-checks every claim with equal
|
||||
> suspicion, with no knowledge of which passages were pivot-fresh, and so catches
|
||||
> exactly the premise Step 5 missed.
|
||||
|
||||
---
|
||||
|
||||
## The axis and the four checks (the agent judges on exactly these)
|
||||
|
||||
The single axis is **faktisk-korrekthet** — factual correctness, re-verified COLD
|
||||
on the frozen/pivoted version. It is checked through four lenses:
|
||||
|
||||
- **F1 — Verifiserbare påstander** (verifiable claims): every checkable assertion
|
||||
(numbers, dates, named examples, attributions, causal claims) searched against
|
||||
primary/credible sources; opinions and predictions skipped.
|
||||
- **F2 — Sitat-presisjon** (quote precision): any quotation must match the source
|
||||
verbatim — wording, attribution, and who said it. «Vi» vs «Vi i Nav» is a
|
||||
precision failure even when the gist is right.
|
||||
- **F3 — Tall-attribusjon** (number attribution): every figure must trace to a
|
||||
named source; a postulated number with no provenance is 🟡/🔴. Here provenance is
|
||||
VERIFIED (distinct from `editorial-reviewer`'s P3, which only flags the absence
|
||||
of a source/hedge without searching).
|
||||
- **F4 — Kilde-kvalitet** (source quality): primary over secondary; a source
|
||||
supporting "around a third" does not verify "exactly 37 %"; post-cutoff claims
|
||||
must be web-searched.
|
||||
|
||||
## Context isolation — cold reader (the agent's cardinal rule)
|
||||
|
||||
The agent runs in a **cold context**: its only input is this prompt, the frozen
|
||||
draft, and the writing contract. Any pivot narrative, changelog, omission list, or
|
||||
"what the author intended" is **context pollution** — the agent states it is
|
||||
ignoring it and judges only the text. That independence (no main-session
|
||||
**framing-bias**) is the whole reason a defect that survived the in-session gates
|
||||
can still be caught here.
|
||||
|
||||
**Pivot-premise risk (the design feature).** Because the agent does **not** know
|
||||
which passages were added in a late pivot, it re-checks **every** claim with equal
|
||||
suspicion — a claim's age in the draft buys it no trust. A claim that looks freshly
|
||||
bolted on (new anchor, new section topic, a 2025/2026 figure) is surfaced in a
|
||||
**pivot-risk** subsection. A pivot-risk claim that fails verification is the
|
||||
headline catch: the pivot premise that never met Step 5.
|
||||
|
||||
## Risk sort and gate (every claim carries exactly one verdict)
|
||||
|
||||
- 🔴 **høy risiko** (high risk) → **BLOCK** — contradicted by evidence, or a precise
|
||||
claim with no usable source.
|
||||
- 🟡 **uverifisert** (unverified) → **REWORK** — cannot be confirmed / weak sources;
|
||||
asserted as fact must be hedged, sourced, or cut.
|
||||
- 🟢 **verifisert** (verified) → keep — confirmed by a primary/credible source
|
||||
matching the claim.
|
||||
|
||||
The agent recommends PASS / REWORK / BLOCK; the operator (`[OPERATØR]`) holds the
|
||||
gate. Each case block below carries exactly one verdict emoji in its **Verdict**
|
||||
field; the surrounding prose deliberately avoids the emoji so the structural lint
|
||||
can read a single unambiguous verdict per case.
|
||||
|
||||
---
|
||||
|
||||
## The six Del 4 worked cases (fasit)
|
||||
|
||||
Each case states the point, the check it belongs to (F1–F4), the verdict, whether
|
||||
it is a pivot-risk claim, and the direction a correct cold run returns.
|
||||
|
||||
### Case 1 — pivot-premissen aldri faktasjekket (pivot premise never met Step 5) — PIVOT-RISK
|
||||
|
||||
- **Check:** F1 (verifiable claim — the pivot's anchor assertion) · **Verdict:** 🔴
|
||||
- **Pivot-risk:** YES — this is the late Security Champions pivot, added *after* the
|
||||
Step 5 fact-check; its premise was never verified in-session.
|
||||
- **Fasit / direction:** The Security Champions pivot rests on a premise asserted as
|
||||
established fact, but no primary source confirms it as stated. Because the pivot
|
||||
arrived after Step 5, the in-session sweep never touched it. The cold re-check
|
||||
— applying equal suspicion to a claim it does not know is pivot-fresh — searches
|
||||
primary sources, finds none that confirm the premise as worded, and returns high
|
||||
risk. **This is the exact catch the gate exists for:** the cold pass on the frozen
|
||||
version surfaces the pivot premise that Step 5 missed. Direction: source the
|
||||
premise to a primary record or recast it as a hedged hypothesis; do not assert.
|
||||
The agent returns the verdict + "none found", not a rewritten premise.
|
||||
|
||||
### Case 2 — feilattribusjon (misattribution) — editor caught on cold read
|
||||
|
||||
- **Check:** F1 (attribution) + F2 (who said it) · **Verdict:** 🔴
|
||||
- **Pivot-risk:** no.
|
||||
- **Fasit / direction:** A statement is attributed to the wrong source/originator —
|
||||
the named party is not who said or published it. Contradicted by the primary
|
||||
record, so high risk (a contradicted claim is 🔴 regardless of partial score).
|
||||
Direction: correct the attribution to the verified originator, or cut the
|
||||
attribution; the agent names the contradicting source, never invents one.
|
||||
|
||||
### Case 3 — sitat-presisjon «Vi» vs «Vi i Nav» (quote precision) — editor caught
|
||||
|
||||
- **Check:** F2 (quote precision) · **Verdict:** 🟡
|
||||
- **Pivot-risk:** no.
|
||||
- **Fasit / direction:** A quotation's gist is right but the wording/attribution is
|
||||
not verbatim: the source said «Vi i Nav», the draft renders «Vi». Changing who the
|
||||
«vi» refers to is a precision failure even though the meaning is close. The source
|
||||
exists, so this is unverified-as-worded rather than contradicted. Direction: match
|
||||
the source verbatim — restore «Vi i Nav» — or mark it as a paraphrase, not a
|
||||
quote. The agent flags the precision gap; it does not supply the corrected line.
|
||||
|
||||
### Case 4 — postulert tall uten proveniens (postulated number, no provenance)
|
||||
|
||||
- **Check:** F3 (number attribution) · **Verdict:** 🟡
|
||||
- **Pivot-risk:** plausible — a figure of this kind often arrives with a late anchor;
|
||||
surface it in the pivot-risk subsection if it reads freshly bolted on.
|
||||
- **Fasit / direction:** A specific figure is stated as fact with **no named
|
||||
source**. Distinct from `editorial-reviewer`'s P3, which would only flag the
|
||||
*absence* of a source/hedge: here the agent **searches for the provenance**, finds
|
||||
none that supports the exact figure, and returns unverified. Direction: trace it to
|
||||
a named source or hedge it ("anslagsvis"); else cut. The agent never invents a
|
||||
provenance to promote it to verified.
|
||||
|
||||
### Case 5 — «Security Champions» som settet standard (a settled standard that varies)
|
||||
|
||||
- **Check:** F1 (claim) + source-scope · **Verdict:** 🔴
|
||||
- **Pivot-risk:** YES — part of the same Security Champions pivot.
|
||||
- **Fasit / direction:** The "Security Champions" practice is presented as a settled,
|
||||
universal standard, but in reality it is a practice that **varies per
|
||||
organization** — implementations, scope, and definitions differ. A local
|
||||
convention dressed as a universal standard is a source-scope failure; asserting it
|
||||
as settled with no source that supports the universal framing is high risk.
|
||||
Direction: scope the claim to where it actually holds ("varies; in some orgs…") or
|
||||
source the universal framing to a standard that does establish it. The agent flags
|
||||
the over-broad scope; it does not rewrite the passage.
|
||||
|
||||
### Case 6 — sekundærkilde brukt for et presist tall (secondary source for a precise figure)
|
||||
|
||||
- **Check:** F4 (source quality) + F3 (number) · **Verdict:** 🟡
|
||||
- **Pivot-risk:** no.
|
||||
- **Fasit / direction:** A precise figure is backed only by a **secondary source**
|
||||
that summarizes the number — the primary record supports a *directional* claim
|
||||
("around a third"), not the precise figure ("37 %") the draft asserts. A source
|
||||
supporting "around a third" does not verify "exactly 37 %". Direction: trace to the
|
||||
primary source and confirm the exact figure, soften the draft to the directional
|
||||
claim the secondary source actually supports, or hedge. The agent records the
|
||||
secondary source found and the precision gap, not a corrected number.
|
||||
|
||||
---
|
||||
|
||||
## Expected aggregate (what a correct cold run looks like)
|
||||
|
||||
- **Total verdicts surfaced:** 6 (within a reasonable verification-log cap; no 🔴
|
||||
silently dropped).
|
||||
- **By check:** F1 = 3 (Cases 1, 2, 5) · F2 = 2 (Cases 2, 3) · F3 = 2 (Cases 4, 6) ·
|
||||
F4 = 1 (Case 6). (Cases overlap checks; the headline check is listed first.)
|
||||
- **By risk verdict:** 🔴 høy risiko = 3 (Cases 1, 2, 5 → BLOCK) · 🟡 uverifisert = 3
|
||||
(Cases 3, 4, 6 → REWORK) · 🟢 verifisert = 0 among the flagged points (the rest of
|
||||
the draft's claims verify clean and are not listed here).
|
||||
- **Pivot-risk:** Cases 1 and 5 are the Security Champions pivot; Case 4 is a
|
||||
plausible pivot-risk. **Case 1 is the headline catch** — the pivot premise that
|
||||
was never fact-checked in-session, caught only because the cold pass re-checks
|
||||
every claim with equal suspicion.
|
||||
|
||||
A run that reproduces ~these six verdicts, on ~these checks, with ~these risk
|
||||
levels — and that surfaces the Security Champions pivot premise as a pivot-risk 🔴
|
||||
— is **comparable** to KTG's actual cold re-reading. Exact wording is the editor's;
|
||||
the agent returns **direction, not rewritten copy**.
|
||||
|
||||
## Calibration boundary
|
||||
|
||||
Whether the agent's live verdicts truly match this fasit is judged by the operator
|
||||
(`[OPERATØR]`), not self-certified here. This fixture is the calibration target, the
|
||||
same way `fact-checker-cases.md`, `editorial-reviewer-cases.md`, and
|
||||
`persona-reviewer-cases.md` are fasits for their agents.
|
||||
|
||||
> **Live-run note.** A live cold run on the frozen Del 4 requires (a) a Claude Code
|
||||
> session reload — a freshly added agent is not invokable until the plugin agent set
|
||||
> is rebuilt at session start — and (b) the agent run in a genuinely cold context
|
||||
> (no drafting-session history, no pivot narrative) with read access to the frozen
|
||||
> draft and web search. Until both hold, this fixture is the gold-standard of record.
|
||||
|
|
@ -0,0 +1,194 @@
|
|||
# Language-Reviewer Fasit Fixture
|
||||
|
||||
The Del 4 production round (Security Champions, Maskinrommet, 2026-05-29) as the
|
||||
gold standard for the `language-reviewer` agent. By Step 6 the in-session persona
|
||||
resonance sweep had returned PASS across the personas and the in-session craft
|
||||
gate (`editorial-reviewer`, Step 5.5) had run — both *inside* the drafting session,
|
||||
both sharing its framing-bias. On a **cold, first-time reading of the frozen
|
||||
draft** (the F5 finding), the editor then caught Norwegian-language defects the
|
||||
in-session passes had all read straight past: a verbatim **quote error** («Vi»
|
||||
where the source said «Vi i Nav»), anglicisms, and verbatim repetitions across
|
||||
sections. Those are the fasit below: a correct `language-reviewer` run on the
|
||||
Del 4 frozen draft should surface **comparable flags**, mapped to the one axis
|
||||
with consistent severities.
|
||||
|
||||
This file is a *fasit*, not a test harness. The structural lint lives in
|
||||
`agents/__tests__/language-reviewer-fixture.test.mjs`. Whether the agent's live
|
||||
flags actually reproduce these directions is `[GATE]`/`[OPERATØR]` — it is **not**
|
||||
self-certified here.
|
||||
|
||||
> **The jury judges; the writer writes.** Every expected output below is
|
||||
> **direction, not rewritten copy.** A correct agent run hands back flags + a
|
||||
> severity — never edited prose. (`Foreslått retning`, not a new sentence.)
|
||||
|
||||
> **Why this gate exists — the cold re-read.** The in-session gates (fact-check,
|
||||
> craft, persona) all ran while the drafting session's framing-bias was still in
|
||||
> the room: the same blind spots that let the author miss «Vi» vs «Vi i Nav» let
|
||||
> those gates miss it too. `language-reviewer` is run in a **cold context** with
|
||||
> no access to version history, intent, pivots, or how any gate voted — exactly so
|
||||
> it carries none of that bias. Any such framing that reaches it is **context
|
||||
> pollution** to be named and ignored. A cold Norwegian re-read catches what the
|
||||
> bias hid. That is the F5 finding made into a gate.
|
||||
|
||||
---
|
||||
|
||||
## The axis (the agent judges on exactly this)
|
||||
|
||||
**Axis: norsk-språkkvalitet** (Norwegian language quality) — one axis, five
|
||||
checks. L1, L2, L5 start grep-able; L3, L4 need a read. The voice under judgment
|
||||
is a **personal chronicle**, not a saksframlegg.
|
||||
|
||||
- **L1 — Ordrette gjentakelser** (verbatim repetition): the same distinctive
|
||||
phrase or sentence-opening repeats mechanically across the draft (grep 3–6-word
|
||||
phrases, then read in context).
|
||||
- **L2 — Anglisismer** (anglicisms): English calques / loan-constructions where
|
||||
idiomatic Norwegian exists («adressere et problem», «på en daglig basis», «i
|
||||
terms av»). Flag the calque **and name the Norwegian idiom direction.**
|
||||
- **L3 — Stivt tjenesteskriftspråk** (stiff bureaucratic register): «kanselli-stil»
|
||||
— nominalisations, passive overload, «det vises til», agentless sentences that
|
||||
drain the chronicle voice.
|
||||
- **L4 — Indre språklige selvmotsigelser** (language-level self-contradiction): a
|
||||
sentence/phrase that undercuts itself, or two phrasings that cannot both be the
|
||||
intended register/meaning. The *wording* contradicting itself — **not** the
|
||||
argument-level logic (that is `content-reviewer`).
|
||||
- **L5 — Klang / rytme** (clang & rhythm): sentences that read badly aloud —
|
||||
monotone cadence, every sentence the same length, a jarring word, run-ons that
|
||||
lose the breath.
|
||||
|
||||
## Severity (every flag carries exactly one)
|
||||
|
||||
- **BLOCK** — misrepresents or embarrasses: a quote rendered wrong (a verbatim
|
||||
error inside a quotation — «Vi» vs «Vi i Nav»), or a self-contradicting phrasing
|
||||
(L4) that changes the meaning.
|
||||
- **REWORK** — a real language weakness a reader notices: a repeated phrase (L1),
|
||||
an anglicism (L2), a bureaucratic passage (L3), a rhythm stumble (L5).
|
||||
- **NICE** — cheap polish: a single mild repetition, one slightly stiff sentence.
|
||||
|
||||
## Direction, not copy (the boundary)
|
||||
|
||||
Every expected output is **direction, not rewritten copy**: "§3 'adressere' —
|
||||
anglicism; use the Norwegian idiom («ta tak i»)" is the agent's job; supplying the
|
||||
rewritten sentence is not. Each flag carries a **quote or line reference.**
|
||||
|
||||
---
|
||||
|
||||
## The six Del 4 language points (fasit)
|
||||
|
||||
Each case states the point the editor raised on the cold reading, the check it
|
||||
belongs to, the expected severity, and the direction a correct agent run returns.
|
||||
These are **language blind spots** — distinct from craft (`editorial-reviewer`),
|
||||
de-AI / voice (`voice-scrubber`), and reader response (`persona-reviewer`). They
|
||||
survived to the cold pass precisely because the in-session gates shared the
|
||||
author's framing-bias.
|
||||
|
||||
### Case 1 — sitat gjengitt feil: «Vi» i stedet for «Vi i Nav» (verbatim quote error)
|
||||
|
||||
- **Check:** L4 (language-level self-contradiction / verbatim quotation error)
|
||||
· **Severity:** BLOCK
|
||||
- **Cold-read finding:** A quotation in the chronicle is rendered «Vi …» where the
|
||||
source said «Vi i Nav …». The clipped quote changes who "vi" refers to — the
|
||||
wording now misrepresents the source. (Maps to L4 as a wording-level
|
||||
self-contradiction; the same defect could be filed under L1 as a near-verbatim
|
||||
repetition of the source gone wrong — the agent files it once, as the BLOCK it
|
||||
is.)
|
||||
- **Fasit / direction:** Quote misrenders «Vi i Nav» as «Vi»; restore the source
|
||||
wording. A misquote misrepresents the piece, so BLOCK. The agent flags the
|
||||
*wrong rendering*; it does not supply the corrected sentence.
|
||||
- **Why blind to the in-session gates:** the persona sweep measured whether the
|
||||
passage *landed* (it did — PASS); none of the in-session gates re-checked the
|
||||
quote against the source on a cold reading. This is the canonical F5 finding.
|
||||
|
||||
### Case 2 — anglisisme: «adressere problemet» (anglicism)
|
||||
|
||||
- **Check:** L2 (anglicisms) · **Severity:** REWORK
|
||||
- **Cold-read finding:** «adressere et problem» is an English calque (to *address*
|
||||
a problem) where idiomatic Norwegian reads «ta tak i / håndtere / ta opp».
|
||||
- **Fasit / direction:** Anglicism; use the Norwegian idiom («ta tak i» /
|
||||
«håndtere»). Name the idiom direction, do not write the sentence.
|
||||
- **Why blind:** an anglicism reads fluently to a reader inside the drafting
|
||||
session — the calque *sounds* like normal prose until a cold ear hits it.
|
||||
|
||||
### Case 3 — anglisisme: «på en daglig basis» (anglicism)
|
||||
|
||||
- **Check:** L2 (anglicisms) · **Severity:** REWORK
|
||||
- **Cold-read finding:** «på en daglig basis» is a calque of *on a daily basis*;
|
||||
idiomatic Norwegian is «daglig» / «til daglig».
|
||||
- **Fasit / direction:** Anglicism; collapse to the Norwegian adverb («daglig»).
|
||||
Direction only.
|
||||
- **Why blind:** same mechanism as Case 2 — a second calque the in-session passes
|
||||
read straight through. Two L2 flags is itself a signal the draft drifted into
|
||||
English construction.
|
||||
|
||||
### Case 4 — ordrette gjentakelser: samme frase 3× på tvers av seksjoner (verbatim repetition)
|
||||
|
||||
- **Check:** L1 (verbatim repetition) · **Severity:** REWORK
|
||||
- **Cold-read finding:** A distinctive phrase recurs three times across §1, §4 and
|
||||
§6 — mechanical, not load-bearing. `grep`-findable as a repeated 3–6-word
|
||||
string.
|
||||
- **Fasit / direction:** Vary or cut the repeats; keep at most the one
|
||||
load-bearing use. Report the count (3×).
|
||||
- **Why blind:** a reader inside the session sees each section in isolation; the
|
||||
repetition only shows when a cold reader takes the whole draft at once. This is
|
||||
the verbatim-repetition half of the F5 finding.
|
||||
|
||||
### Case 5 — stivt tjenesteskriftspråk: «det vises til»-passasje i en personlig krønike (stiff bureaucratic register)
|
||||
|
||||
- **Check:** L3 (stiff bureaucratic register / «kanselli-stil») · **Severity:**
|
||||
REWORK
|
||||
- **Cold-read finding:** A passage slides into saksframlegg register — «det vises
|
||||
til», nominalised, agentless, passive-stacked — inside a piece whose voice is a
|
||||
personal chronicle. The register break drains the chronicle voice.
|
||||
- **Fasit / direction:** Kanselli-stil in a personal chronicle; restore an agent
|
||||
and an active verb so the passage reads as the chronicle, not a memo. Direction
|
||||
only. (This is a *language-register* defect, distinct from `voice-scrubber`'s
|
||||
de-AI tells and from `editorial-reviewer`'s craft.)
|
||||
- **Why blind:** bureaucratic register is the author's professional default; inside
|
||||
the session it reads as "normal," and only a cold ear hears it clash with the
|
||||
chronicle voice.
|
||||
|
||||
### Case 6 — klang / rytme: fem like lange setninger på rad (monotone cadence)
|
||||
|
||||
- **Check:** L5 (clang & rhythm) · **Severity:** NICE
|
||||
- **Cold-read finding:** A run of five sentences shares the same length and a
|
||||
near-identical opening — a monotone cadence that reads flat aloud. Chronicle
|
||||
prose has a varied cadence; this passage loses it.
|
||||
- **Fasit / direction:** Break the monotone — vary one or two sentence lengths /
|
||||
openings so the passage breathes. NICE: noticeable on a read-aloud, not
|
||||
load-bearing. `grep`/scan-findable (same-length run, repeated opening).
|
||||
- **Why blind:** rhythm is heard, not seen; a silent in-session read past a fluent
|
||||
passage never trips on it. A cold read-aloud does.
|
||||
|
||||
---
|
||||
|
||||
## Expected aggregate (what a correct run looks like)
|
||||
|
||||
- **Total flags:** 6 (well within the ≤10 cap — no suppression needed).
|
||||
- **By check:** L1 = 1 (Case 4) · L2 = 2 (Cases 2 + 3) · L3 = 1 (Case 5) ·
|
||||
L4 = 1 (Case 1) · L5 = 1 (Case 6).
|
||||
- **By severity:** BLOCK = 1 (Case 1, the quote error) · REWORK = 4 (Cases 2, 3,
|
||||
4, 5) · NICE = 1 (Case 6).
|
||||
- **All six are language blind spots** — none is a craft defect (editorial), a
|
||||
de-AI / voice defect (voice-scrubber), an argument defect (content-reviewer), a
|
||||
factual defect (fact-reviewer), or a resonance defect (persona). They survived
|
||||
to the cold pass because the in-session gates shared the author's framing-bias;
|
||||
the cold Norwegian re-read is what caught them.
|
||||
|
||||
A run that reproduces ~these six directions, on ~these checks, with ~these
|
||||
severities, is **comparable** to the editor's actual cold reading of Del 4 — the
|
||||
acceptance bar. Exact wording is the editor's; the agent returns direction, never
|
||||
copy.
|
||||
|
||||
## Calibration boundary
|
||||
|
||||
Whether the agent's live flags truly match this fasit is judged by the operator
|
||||
(`[OPERATØR]`), not self-certified here. This fixture is the calibration target,
|
||||
the same way `editorial-reviewer-cases.md`, `persona-reviewer-cases.md` and
|
||||
`fact-checker-cases.md` are fasits for their agents.
|
||||
|
||||
> **Live-run note.** A live run on the Del 4 frozen draft requires (a) a Claude
|
||||
> Code session reload — a freshly added agent is not invokable until the plugin
|
||||
> agent set is rebuilt at session start — and (b) read access to the frozen Del 4
|
||||
> draft in the Maskinrommet series folder. Critically, the live run must be a
|
||||
> **cold context**: no session history, no version numbers, no intent narrative —
|
||||
> only the prompt, the frozen draft path, and the writing contract. Until both
|
||||
> hold, this fixture is the gold-standard of record.
|
||||
Loading…
Add table
Add a link
Reference in a new issue