52 lines
2.4 KiB
Markdown
52 lines
2.4 KiB
Markdown
# Fact-Checker Fasit Fixture
|
|
|
|
Three reference claims with known ground truth, used to sanity-check the
|
|
`fact-checker` agent. Each case states the claim, the **fasit** (the correct
|
|
answer + why), and the expected risk verdict.
|
|
|
|
- 🟢 = verified true against a primary/credible source
|
|
- 🔴 = contradicted by evidence (false), or a high-risk claim asserted without support
|
|
- 🟡 = unverifiable from available sources — flagged, never guessed
|
|
|
|
This file is a *fasit*, not a test harness. The structural lint lives in
|
|
`agents/__tests__/fact-checker-fixture.test.mjs`. Whether the agent's live
|
|
output actually reproduces these verdicts is `[GATE]`/`[OPERATØR]` — it is
|
|
not self-certified.
|
|
|
|
Each case block below carries exactly one verdict emoji (in its **Verdict**
|
|
field); the prose deliberately avoids emoji so the structural lint can read a
|
|
single, unambiguous verdict per case.
|
|
|
|
---
|
|
|
|
## Case 1 — verifiable true
|
|
|
|
- **Claim:** The EU AI Act entered into force on 1 August 2024.
|
|
- **Verdict:** 🟢
|
|
- **Fasit:** True. Regulation (EU) 2024/1689 was published in the Official
|
|
Journal on 12 July 2024 and entered into force 20 days later, on
|
|
1 August 2024. This is confirmable against the primary source (EUR-Lex)
|
|
and the European Commission's own communications. A correct agent run
|
|
returns the verified verdict with a primary-source citation.
|
|
|
|
## Case 2 — verifiable false
|
|
|
|
- **Claim:** GPT-4 was developed and released by Anthropic.
|
|
- **Verdict:** 🔴
|
|
- **Fasit:** False. GPT-4 was released by OpenAI (March 2023). Anthropic
|
|
develops the Claude model family. The claim is contradicted by both
|
|
vendors' primary documentation. A correct agent run returns the high-risk
|
|
verdict and names the contradicting source — it must not soften a
|
|
contradicted claim to the unverified tier.
|
|
|
|
## Case 3 — unverifiable
|
|
|
|
- **Claim:** A Norwegian public-sector agency cut its case-handling time by
|
|
exactly 37% in Q3 2025 after deploying an internal AI assistant.
|
|
- **Verdict:** 🟡
|
|
- **Fasit:** Unverifiable. No named agency, no published report, and no
|
|
primary source exists for this precise figure; an internal operational
|
|
metric of this kind is not independently confirmable from open sources.
|
|
A correct agent run returns the unverified verdict and states explicitly
|
|
that the claim cannot be verified — it must not fill the gap by inventing
|
|
a plausible source or promoting the claim to the verified tier.
|