ktg-plugin-marketplace/plugins/linkedin-studio/docs/remediation/overlap-measurement.md
Kjell Tore Guttormsen 0d3da7828d docs(linkedin-studio): measure long-form review-pass overlap, trim where unjustified
Steg 20 (remediation Wave 4 / S5, SOLO): measure whether the 7-agent long-form
review stack carries redundant gates. Method: cross-reference each agent's check
taxonomy against its in-repo fasit fixture; four fixtures (editorial, content,
language, fact-reviewer) target the SAME Del 4 edition, enabling a real
cross-gate overlap comparison on one piece (not a live run — fixtures' own
live-run notes require a reload + cross-repo Maskinrommet access, out of scope).

Finding: every gate has >=1 unique catch on Del 4. The four genuine overlaps
(verbatim repetition, the Vi/Vi-i-Nav quote, the postulated number, the
small-orgs thread) are each justified — a cold re-take (Endring 9's reason to
exist), the same symptom via a different operation (flag-absence vs web-verify),
or two distinct defects sharing a surface topic — with no subsumption either way.
The fact-checker <-> fact-reviewer overlap is load-bearing (the pivot premise
arrived after Step 5, so only the cold re-run caught it).

Decision: NO TRIM. voice-scrubber has no fixture -> inconclusive; redundancy
retained (Step 20 On-failure = skip). Counts unchanged 19 agents / 27 commands;
count contract (EXPECT_AGENTS=19) untouched. test-runner 62/62 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 07:17:55 +02:00


Long-form Review-Pass Overlap Measurement — Steg 20

Remediation Voyage, Wave 4 / S5. Measures whether the long-form review stack carries redundant gates, and trims only where a gate catches nothing the others don't. Written 2026-05-30, SOLO (no subagent fan-out).

The question and the trim rule

The long-form pipeline runs seven review agents. Endring 9 (v3.1.0) added a cold/headless package (content-reviewer, language-reviewer, fact-reviewer) whose agent prompts argue, in their own words, that they overlap the in-session gates on purpose (fact-reviewer: «the redundancy is load-bearing, not waste»; language-reviewer anti-pattern: «'De-duplicate' yourself against editorial-reviewer — the overlap is the cold re-take»). Steg 20 tests that claim against evidence instead of taking it on faith:

Trim a gate ONLY where it catches nothing the others don't (then merge/remove it + update the count contract). If the redundancy is justified, record that and keep it. If the fixture is insufficient to decide, record «inconclusive; redundancy retained» and do NOT trim. (Step 20 On-failure = skip the trim.)
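The rule above can be read as a three-way decision over catch-sets. What follows is a minimal Python sketch under stated assumptions: `Decision`, `trim_decision`, and the dict-of-sets input shape are hypothetical names introduced for illustration and are not part of the plugin. A gate without a fixture is represented as `None`.

```python
from enum import Enum


class Decision(Enum):
    TRIM = "trim"
    KEEP = "redundancy justified; keep"
    INCONCLUSIVE = "inconclusive; redundancy retained"


def trim_decision(gate, catch_sets):
    """Apply the trim rule to one gate.

    catch_sets maps a gate name to the set of defect ids it caught,
    or to None when the gate has no fixture (hypothetical data shape).
    """
    own = catch_sets.get(gate)
    if own is None:
        # No fixture: the overlap cannot be measured, so do not trim.
        return Decision.INCONCLUSIVE
    # Everything the other measurable gates caught, taken together.
    others = set()
    for name, catches in catch_sets.items():
        if name != gate and catches is not None:
            others |= catches
    # Trim only when the gate catches nothing the others don't.
    return Decision.TRIM if own <= others else Decision.KEEP
```

With toy data, a gate whose catches all appear elsewhere comes back as `TRIM`, a gate with at least one unique catch as `KEEP`, and a gate with no fixture as `INCONCLUSIVE`.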

Method — and its honest limit

I measured the documented catch-sets: each agent's check taxonomy (the agent .md) cross-referenced against its in-repo fasit fixture (agents/fixtures/*-cases.md). I did not run the agents live: every fixture's own live-run note states a live cold run needs (a) a session reload and (b) read access to the frozen Del 4 draft in the Maskinrommet series folder — cross-repo, explicitly out of scope this session. By each fixture's own declaration the fasit is «the gold-standard of record» until both hold, so the fasit catch-sets are the legitimate measurement surface.

The lucky break that makes this more than taxonomy-reasoning: four of the six fixtures target the same edition — Del 4 (Security Champions, Maskinrommet). editorial-reviewer reviewed v5 (2026-05-28, in-session); the cold trio (content/language/fact-reviewer) re-read the frozen/pivoted version (2026-05-29). That shared edition lets me compare what each gate actually caught on one piece — a real cross-gate overlap measurement, not just a boundary restatement.

| Fixture | Edition under review | Cases | Enables shared-edition compare? |
|---|---|---|---|
| editorial-reviewer-cases.md | Del 4 v5 (28.05, in-session) | 8 | yes |
| content-reviewer-cases.md | Del 4 frozen/pivoted (29.05, cold) | 6 | yes |
| language-reviewer-cases.md | Del 4 frozen (29.05, cold) | 6 | yes |
| fact-reviewer-cases.md | Del 4 frozen/pivoted (29.05, cold) | 6 | yes |
| persona-reviewer-cases.md | separate jargon-wall sample (+ documented Del 4 behaviour) | 6 axes | partial |
| fact-checker-cases.md | 3 generic reference claims (not Del 4) | 3 | role only |
| voice-scrubber | NO FIXTURE | | inconclusive |

The seven agents — axis map

| Agent | Step | Axis (the one question it answers) | When | Fixture |
|---|---|---|---|---|
| fact-checker | 5 | factual truth — is it true? | in-session, moving draft | generic (3 claims) |
| editorial-reviewer | 5.5 | prose craft + narrative architecture — is it well-made? | in-session | Del 4 v5 |
| persona-reviewer | 2.5/6/9 | reader response — does it land? | in-session | sample + Del 4 behaviour |
| voice-scrubber | 4 | de-AI + chronicle voice drift — does it sound like the author? | in-session (applies edits) | none |
| content-reviewer | 6.5 | argument integrity — does the reasoning hold? | cold/frozen | Del 4 frozen |
| language-reviewer | 6.5 | Norwegian language — does it read clean? | cold/frozen | Del 4 frozen |
| fact-reviewer | 6.5 | factual truth, re-verified — is every claim, incl. pivot, true? | cold/frozen+pivoted | Del 4 frozen |

Per-reviewer catch table (what each gate caught on the fixtures)

Legend: U = unique catch (no other gate's fixture surfaces this defect) · O = overlaps another gate's catch (overlap analysed in the matrix below).

editorial-reviewer — Del 4 v5 (8 catches)

| # | Check | Defect caught | Sev | U/O |
|---|---|---|---|---|
| 1 | A1 | abstract figure never instantiated (craft/vividness) | REWORK | O → content C4 (adjacent) |
| 2 | P3 | postulated number, no source/hedge — flags absence, no search | REWORK | O → fact-reviewer F3 |
| 3 | A2 | trust-effect hypothesis with no SDT/theory anchor | BLOCK | U |
| 4 | A3 | broken series-title symmetry (part floats free) | REWORK | U |
| 5 | A4 | small-business addressee stranded — no usable action | BLOCK | O → content C5 (adjacent) |
| 6 | P2 | verbatim repetition | REWORK | O → language L1 |
| 7 | P1 | em-dash over-density | REWORK | U |
| 8 | P4 | prose-level internal contradiction (two passages) | BLOCK | O → content C3 (adjacent) |

content-reviewer — Del 4 frozen (6 catches) — argument-integritet

| # | Check | Defect caught | Sev | U/O |
|---|---|---|---|---|
| 1 | C2 | Security-Champions pivot premise asserted unsupported | BLOCK | U |
| 2 | C5 | unanswered «what about small orgs?» objection | BLOCK | O → editorial A4 (adjacent) |
| 3 | C1 | logical hole «Champions finnes» → «dømmekraft bevart» | REWORK | U |
| 4 | C4 | role section needs one concrete org for the argument | REWORK | O → editorial A1 (adjacent) |
| 5 | C3 | recommendation delegates the judgment the series premise rules out | BLOCK | U |
| 6 | C2 | gevinst assumes widespread org maturity | REWORK | U |

language-reviewer — Del 4 frozen (6 catches) — norsk-språkkvalitet

| # | Check | Defect caught | Sev | U/O |
|---|---|---|---|---|
| 1 | L4 | quote error «Vi» vs «Vi i Nav» (wording misrepresents source) | BLOCK | O → fact-reviewer F2 |
| 2 | L2 | anglicism «adressere problemet» | REWORK | U |
| 3 | L2 | anglicism «på en daglig basis» | REWORK | U |
| 4 | L1 | verbatim repetition 3× across §1/§4/§6 | REWORK | O → editorial P2 |
| 5 | L3 | «det vises til» kanselli-stil in a personal chronicle | REWORK | U |
| 6 | L5 | monotone cadence (5 same-length sentences) | NICE | U |

fact-reviewer — Del 4 frozen/pivoted (6 catches) — faktisk-korrekthet (cold)

| # | Check | Defect caught | Verdict | U/O |
|---|---|---|---|---|
| 1 | F1 | pivot premise never met Step 5 (PIVOT-RISK headline) | 🔴 | U |
| 2 | F1+F2 | misattribution to wrong originator | 🔴 | U |
| 3 | F2 | quote precision «Vi» vs «Vi i Nav» (vs source) | 🟡 | O → language L4 |
| 4 | F3 | postulated number, no provenance — searches, finds none | 🟡 | O → editorial P3 |
| 5 | F1 | «Security Champions» as a settled standard that varies per org (PIVOT-RISK) | 🔴 | U |
| 6 | F4+F3 | secondary source for a precise figure («~a third» ≠ «37 %») | 🟡 | U |

fact-checker — role on Del 4 (generic fixture, 3 claims)

Catches truth defects cheaply and early, on the moving draft (Step 5). Its fixture is 3 generic ground-truth claims (EU AI Act 🟢 / GPT-4-by-Anthropic 🔴 / unverifiable 37 % 🟡), not Del 4. Its measured role on Del 4 is documented by the fact-reviewer fixture: the Security-Champions pivot arrived after the Step 5 sweep, so fact-checker structurally never saw the pivot premise. It is necessary (early/cheap truth gate) but provably insufficient — which is the entire reason fact-reviewer exists. U by pipeline position.
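The structural blind spot can be sketched as a visibility rule over pipeline steps: a gate only sees claims that existed when it ran. This is an illustrative Python sketch, not the plugin's logic. The step numbers 5 and 6.5 come from the axis map; the arrival position 6.0 for the pivot premise is an assumption (the fixture only records that it arrived after Step 5), and `visible_claims` and the "pre-pivot claim" entry are hypothetical.

```python
FACT_CHECKER_STEP = 5.0    # in-session sweep on the moving draft
FACT_REVIEWER_STEP = 6.5   # cold re-run on the frozen/pivoted draft

# Claim -> step at which it entered the draft.
# Both positions are assumptions chosen for illustration.
CLAIMS = {
    "pre-pivot claim": 3.0,
    "pivot premise": 6.0,  # only known to be later than Step 5
}


def visible_claims(gate_step, claims):
    """Return the claims that already existed when the gate ran."""
    return {name for name, introduced_at in claims.items()
            if introduced_at <= gate_step}


if __name__ == "__main__":
    print("fact-checker sees: ", sorted(visible_claims(FACT_CHECKER_STEP, CLAIMS)))
    print("fact-reviewer sees:", sorted(visible_claims(FACT_REVIEWER_STEP, CLAIMS)))
```

Under these assumptions the Step 5 sweep never has the pivot premise in view, whatever its search quality, while the Step 6.5 re-run does.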

persona-reviewer — resonance/response

On Del 4 the persona sweep returned 15 flags across 3 personas, yet every persona verdict was PASS / ready-to-publish (per the editorial fixture). Its own fixture (jargon-wall sample) shows the 6 response axes (Krok IKKE, Leder-takeaway IKKE, …). It catches reader-response defects no other gate measures. U by axis.

voice-scrubber — de-AI + chronicle voice drift

No fixture exists. Its axis (mechanical AI-tells + Norwegian-chronicle voice drift, judged against approved Norwegian editions) is measured by no other gate, and uniquely it applies edits (Pass 1) and maintains a drift-log — it is not even part of the review-report package. Overlap inconclusive from in-repo fixtures; see decision below.

Cross-gate overlap matrix (the shared Del 4 edition)

Four genuine overlaps surface on Del 4. The decisive test for each: does either gate's catch-set subsume the other's? In every case — no.

| # | Defect | Gates that catch it | Same defect or same symptom? | Subsumption? | Justification |
|---|---|---|---|---|---|
| O1 | verbatim repetition | editorial P2 (in-session, v5) ↔ language L1 (cold, frozen) | same defect | neither | Cold re-take. Editorial caught it in-session sharing the author's framing; language re-caught it cold on the frozen version. The agent prompts mandate this overlap explicitly. The value is the independent reading, not a second checklist. |
| O2 | quote «Vi» vs «Vi i Nav» | language L4 (BLOCK) ↔ fact-reviewer F2 (🟡) | same defect, two operations | neither | language flags the wording misrepresenting the source without web access; fact-reviewer verifies against the actual source via web search. Different tools, different severities — one catches it if the source is unreachable, the other if the wording reads clean but the source differs. |
| O3 | postulated number | editorial P3 (REWORK) ↔ fact-reviewer F3 (🟡) | same symptom, two operations | neither | editorial flags the absence of a source/hedge (no search); fact-reviewer searches for provenance and finds none. The prompts draw this boundary by hand. A bare number with a findable source still trips editorial (it has none inline) but is exactly what fact-reviewer's search resolves. |
| O4 | small-orgs thread | editorial A4 (stranded addressee) ↔ content C5 (unanswered objection) | adjacent — different defects | n/a | Same surface topic (small orgs) decomposes into two genuinely different defects: A4 = «the small-business reader leaves with no action» (architecture); C5 = «the argument never meets the obvious counter and collapses for that class» (logic). Not redundancy — two gates needed to see both faces. |

Plus the fact-checker ↔ fact-reviewer time-axis overlap (deliberate, not in the matrix because it spans pipeline stages, not one defect): Step 5 runs in-session on the moving draft; Step 6.5 re-runs cold on the frozen/pivoted draft. Case 1 (pivot premise) is the proof it's load-bearing — the pivot arrived after Step 5, so only the cold re-run could catch it. Collapsing the two would re-open the exact gap that motivated Endring 9.

Adjacent (not overlap) pairs the prompts separate by design and the Del 4 cases confirm as distinct defects: editorial P4 (prose contradiction) vs content C3 (argument-logic contradiction); editorial A1 (vividness) vs content C4 (a load-bearing claim a skeptic won't believe abstractly).
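The subsumption test used in the matrix can be sketched as plain set operations over the four Del 4 catch tables. This is a minimal Python illustration, not tooling that exists in the repo: the defect labels are hypothetical keys paraphrased from the tables, O1 to O3 are modelled as a shared key, and the adjacent pairs (including O4) are kept as distinct keys because the analysis treats them as different defects.

```python
from itertools import combinations

# Catch-sets per gate; labels are illustrative keys, not fixture ids.
CATCHES = {
    "editorial-reviewer": {
        "abstract-figure", "postulated-number", "no-theory-anchor",
        "series-title-symmetry", "small-business-addressee-stranded",
        "verbatim-repetition", "em-dash-density", "prose-contradiction",
    },
    "content-reviewer": {
        "pivot-premise-unsupported", "small-orgs-unanswered-objection",
        "logical-hole", "needs-concrete-org", "delegated-judgment",
        "assumed-org-maturity",
    },
    "language-reviewer": {
        "quote-vi-vs-vi-i-nav", "anglicism-adressere",
        "anglicism-daglig-basis", "verbatim-repetition",
        "kanselli-stil", "monotone-cadence",
    },
    "fact-reviewer": {
        "pivot-premise-never-checked", "misattribution",
        "quote-vi-vs-vi-i-nav", "postulated-number",
        "champions-as-settled-standard", "secondary-source-figure",
    },
}


def shared_catches(catches):
    """Map each unordered gate pair to the defects both gates caught."""
    pairs = {}
    for a, b in combinations(sorted(catches), 2):
        common = catches[a] & catches[b]
        if common:
            pairs[(a, b)] = common
    return pairs


def subsumed_pairs(catches):
    """List (inner, outer) pairs where inner's catch-set is a subset of outer's."""
    return [(a, b) for a in catches for b in catches
            if a != b and catches[a] <= catches[b]]


if __name__ == "__main__":
    for pair, common in shared_catches(CATCHES).items():
        print(pair, sorted(common))
    print("subsumed:", subsumed_pairs(CATCHES))
```

With these labels the sketch reports three shared defects and an empty subsumption list. How defects are keyed decides the outcome, so the labels carry the judgment calls made in the matrix rather than replacing them.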

Unique catch per gate — none is a subset of another

Every one of the seven has ≥1 catch no other gate's fixture surfaces:

  • fact-checker — early/cheap truth on the moving draft; provably insufficient alone (never saw the pivot), which is the case for keeping fact-reviewer.
  • editorial-reviewer — A2 theory-anchor and A3 series-title symmetry are pure blind spots no other gate measures (and were persona-blind on Del 4).
  • persona-reviewer — reader response (Krok/resonans/takeaway); the only gate on that axis. The «PASS yet 8 editorial + 6 argument + 6 language points» result is the whole motivation for the stack.
  • content-reviewer — argument logic (C1/C2/C3/C5 all unique); the only gate that asks does the reasoning hold?
  • language-reviewer — anglicisms, kanselli-stil, cadence; the only gate on Norwegian idiom/register/rhythm.
  • fact-reviewer — the pivot-risk catches (Cases 1, 5); the only cold post-pivot truth re-run.
  • voice-scrubber — de-AI tells + chronicle voice drift; the only gate that applies edits and keeps a drift-log.

Trim decision — NO TRIM

No gate meets the trim condition: each one catches something the others don't. Every gate has ≥1 unique catch on the fixtures, and every one of the four genuine overlaps (O1–O4) is justified: a cold re-take (O1), the same symptom via a different operation (O2, O3), or two distinct defects sharing a surface topic (O4), with no subsumption in any direction. The fact-checker ↔ fact-reviewer overlap is load-bearing by construction (proven by the pivot-premise catch). Per the Steg 20 rule this is the «redundancy is justified: record and keep» case for all measurable gates.

voice-scrubber specifically: no in-repo fixture, so its overlap cannot be measured here → «measurement inconclusive; redundancy retained pending a real edition» (Step 20 On-failure = skip the trim). Its axis is orthogonal by design and it is not part of the review-report package, so there is no redundancy claim to adjudicate even in principle.

Consequence for the count contract: no gate removed → counts unchanged.

| Count | Value | Touched? |
|---|---|---|
| Agents | 19 | no |
| Commands | 27 | no |

The count contract (EXPECT_AGENTS=19, the CLAUDE.md/README agent tables) is not modified this step — there is nothing to update because nothing was trimmed. Steg 21 (version bump + count recompute) inherits an unchanged 19/27 baseline.
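A count-contract guard of this kind can be sketched in a few lines of Python. This is an assumption-heavy illustration, not how `test-runner.sh` actually counts: it assumes one top-level `.md` file per agent in a single directory, with fixtures living in a subdirectory that is not counted. `count_agents` and `contract_holds` are hypothetical helpers; only the value 19 comes from this document.

```python
from pathlib import Path

EXPECT_AGENTS = 19  # value stated in this document


def count_agents(agents_dir: Path) -> int:
    """Count top-level *.md files in agents_dir.

    Assumes one file per agent; files in subdirectories
    (for example a fixtures folder) are not matched by this glob.
    """
    return sum(1 for p in agents_dir.glob("*.md") if p.is_file())


def contract_holds(agents_dir: Path, expected: int = EXPECT_AGENTS) -> bool:
    """True when the counted agents match the expected count."""
    return count_agents(agents_dir) == expected
```

Because nothing was trimmed in this step, such a check would be expected to pass against the unchanged baseline; it only becomes interesting in the trim branch, where the count and the expectation must change in the same commit.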

Verification

  • test -f docs/remediation/overlap-measurement.md → present (this file).
  • Per-reviewer catch table present (one per gate) + cross-gate overlap matrix.
  • No gate removed → count contract untouched; EXPECT_AGENTS stays 19. (The trim branch's test-runner.sh exit 0 + same-commit count update is N/A — no trim.)
  • bash scripts/test-runner.sh run for hygiene regardless → expect exit 0 (repo green, nothing changed but a doc).