ktg-plugin-marketplace/plugins/ultraplan-local/tests/fixtures/skill-factory
Kjell Tore Guttormsen 486f544d39 test(ultraplan-local): add skill-factory calibration fixtures
3 source/draft pairs that pin n-gram-overlap verdicts to representative prose:
- accepted (containment 0.014 / longestRun 3)
- needs-review (containment 0.211 / longestRun 12)
- rejected (containment 0.676 / longestRun 74)

Topics: session-start hooks, subagent delegation, output styles — proximate to
the production source material the skill-factory will ingest.

Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 5)
2026-04-18 15:16:28 +02:00

Skill-factory calibration fixtures

These fixtures calibrate the IP-hygiene thresholds used by scripts/ngram-overlap.mjs. Each pair (source-*, draft-*) is hand-tuned so that the n-gram containment verdict lands in a specific band, anchoring the empirical thresholds against representative prose.
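
To make the two metrics concrete, here is a minimal sketch of how containment and longest run could be computed. It is not the code in scripts/ngram-overlap.mjs: the n-gram size (N = 5), the tokenizer, and all function names are assumptions chosen for illustration, so its numbers will not necessarily match the values recorded for the fixtures.

```javascript
// Illustrative sketch only. The real scripts/ngram-overlap.mjs may tokenize,
// normalize, and choose n differently. N = 5 is an assumption.
const N = 5;

// Lowercase, then split on anything that is not a letter, digit, or apostrophe.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}']+/u).filter(Boolean);
}

// Set of all contiguous word n-grams, each joined into a single string key.
function ngrams(tokens, n = N) {
  const grams = new Set();
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.add(tokens.slice(i, i + n).join(" "));
  }
  return grams;
}

// Fraction of the draft's distinct n-grams that also occur in the source.
function containment(draftTokens, sourceTokens, n = N) {
  const draft = ngrams(draftTokens, n);
  if (draft.size === 0) return 0;
  const source = ngrams(sourceTokens, n);
  let shared = 0;
  for (const gram of draft) {
    if (source.has(gram)) shared++;
  }
  return shared / draft.size;
}

// Length in tokens of the longest contiguous run present in both texts
// (longest-common-substring dynamic programme, keeping two rows).
function longestRun(draftTokens, sourceTokens) {
  let best = 0;
  let prev = new Array(sourceTokens.length + 1).fill(0);
  for (let i = 1; i <= draftTokens.length; i++) {
    const curr = new Array(sourceTokens.length + 1).fill(0);
    for (let j = 1; j <= sourceTokens.length; j++) {
      if (draftTokens[i - 1] === sourceTokens[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > best) best = curr[j];
      }
    }
    prev = curr;
  }
  return best;
}

function overlap(draftText, sourceText) {
  const draft = tokenize(draftText);
  const source = tokenize(sourceText);
  return {
    containment: containment(draft, source),
    longestRun: longestRun(draft, source),
  };
}

console.log(
  overlap(
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox jumps over a sleeping cat"
  )
); // { containment: 0.2, longestRun: 5 }
```

In the toy example, one of the draft's five distinct 5-grams ("quick brown fox jumps over") also appears in the source, which gives a containment of 0.2, and that same phrase is the longest shared run at five tokens.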

Pairs

| Pair | Target verdict | Containment | Longest run | Notes |
|---|---|---|---|---|
| source-accepted.md / draft-accepted.md | accepted | 0.014 | 3 | Heavy paraphrase; concept-equivalent without phrase reuse |
| source-needs-review.md / draft-needs-review.md | needs-review | 0.211 | 12 | Mixed: paraphrased frame, retained domain phrasing |
| source-rejected.md / draft-rejected.md | rejected | 0.676 | 74 | Light edit on top of source; verbatim runs survive |

Verdict bands

The verdict bands match the constants in scripts/ngram-overlap.mjs:

  • accepted — containment < 0.15 AND longestRun < 8
  • needs-review — between accepted and rejected
  • rejected — containment ≥ 0.35 OR longestRun ≥ 15

If you change the thresholds in ngram-overlap.mjs, re-verify each fixture pair to confirm the calibration still holds. The fixtures are content-stable; the thresholds are the variable.
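
The bands above can be written as a small decision function, which also makes it easy to check the recorded fixture values against any new thresholds. This is a sketch under assumptions: the constant and function names are hypothetical and need not match those in ngram-overlap.mjs; only the numeric thresholds and fixture values are taken from this README.

```javascript
// Thresholds copied from the verdict bands in this README. The names are
// hypothetical and may differ from the constants in scripts/ngram-overlap.mjs.
const ACCEPT_CONTAINMENT = 0.15;
const ACCEPT_LONGEST_RUN = 8;
const REJECT_CONTAINMENT = 0.35;
const REJECT_LONGEST_RUN = 15;

function verdict({ containment, longestRun }) {
  // Rejected: either metric crosses its upper threshold.
  if (containment >= REJECT_CONTAINMENT || longestRun >= REJECT_LONGEST_RUN) {
    return "rejected";
  }
  // Accepted: both metrics stay under their lower thresholds.
  if (containment < ACCEPT_CONTAINMENT && longestRun < ACCEPT_LONGEST_RUN) {
    return "accepted";
  }
  // Everything else falls in the middle band.
  return "needs-review";
}

// Metric values recorded for the three fixture pairs.
const fixtures = [
  { pair: "accepted", containment: 0.014, longestRun: 3 },
  { pair: "needs-review", containment: 0.211, longestRun: 12 },
  { pair: "rejected", containment: 0.676, longestRun: 74 },
];

for (const fixture of fixtures) {
  const status = verdict(fixture) === fixture.pair ? "ok" : "MISMATCH";
  console.log(`${fixture.pair}: ${status}`);
}
// accepted: ok
// needs-review: ok
// rejected: ok
```

Because the accepted and rejected conditions cannot both hold with these thresholds, the order of the two checks does not change the result. After editing a threshold, a MISMATCH line would point to the fixture pair that no longer lands in its intended band.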

Why these topics

The fixtures use Claude Code reference prose (session-start hooks, subagent delegation, output styles) so they live near the kind of source material the skill-factory will actually paraphrase in production. Drift between fixture domain and production domain would weaken the calibration signal.

Regeneration

These files are committed to the repo as ground-truth fixtures. Do not regenerate them ad-hoc — edit deliberately, re-run the verification commands listed in plan.md Step 5, and commit intentionally.

node scripts/ngram-overlap.mjs tests/fixtures/skill-factory/draft-accepted.md \
  tests/fixtures/skill-factory/source-accepted.md
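
The same command shape applies to the other two pairs. As a convenience, the sketch below prints the command for every fixture pair, following the argument order of the example above (draft first, then source). The helper name is hypothetical, and the script's output format is not described here, so the sketch only builds the command lines and does not run or parse anything.

```javascript
// Builds the verification command for each fixture pair, mirroring the
// example above. fixtureCommand is a hypothetical helper, not part of the repo.
const FIXTURE_DIR = "tests/fixtures/skill-factory";
const VERDICTS = ["accepted", "needs-review", "rejected"];

function fixtureCommand(verdictName) {
  return [
    "node",
    "scripts/ngram-overlap.mjs",
    `${FIXTURE_DIR}/draft-${verdictName}.md`,
    `${FIXTURE_DIR}/source-${verdictName}.md`,
  ];
}

for (const verdictName of VERDICTS) {
  console.log(fixtureCommand(verdictName).join(" "));
}
```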