test(ultraplan-local): add skill-factory calibration fixtures
3 source/draft pairs that pin n-gram-overlap verdicts to representative prose:

- accepted (containment 0.014 / longestRun 3)
- needs-review (containment 0.211 / longestRun 12)
- rejected (containment 0.676 / longestRun 74)

Topics: session-start hooks, subagent delegation, output styles. These are proximate to the production source material the skill-factory will ingest.

Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 5)
parent 491711119a
commit 486f544d39
7 changed files with 280 additions and 0 deletions
42
plugins/ultraplan-local/tests/fixtures/skill-factory/README.md
vendored
Normal file
@@ -0,0 +1,42 @@
# Skill-factory calibration fixtures

These fixtures calibrate the IP-hygiene thresholds used by `scripts/ngram-overlap.mjs`. Each
pair (`source-*`, `draft-*`) is hand-tuned so that the n-gram containment verdict lands in a
specific band, anchoring the empirical thresholds against representative prose.

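As a rough illustration of what the two metrics measure, the sketch below computes containment and longest run for a draft/source pair. The tokenizer, the n-gram size `N = 3`, and both metric definitions are assumptions made for illustration; the actual logic in `scripts/ngram-overlap.mjs` may differ.

```javascript
// Sketch only: tokenizer, n-gram size, and metric definitions are assumptions,
// not the implementation in scripts/ngram-overlap.mjs.
const N = 3; // hypothetical n-gram size

// Lowercase the text and split it into alphanumeric word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Collect the distinct word n-grams of a token list.
function ngrams(tokens, n = N) {
  const grams = new Set();
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.add(tokens.slice(i, i + n).join(" "));
  }
  return grams;
}

// Containment: share of the draft's distinct n-grams that also occur in the source.
function containment(draft, source, n = N) {
  const draftGrams = ngrams(tokenize(draft), n);
  const sourceGrams = ngrams(tokenize(source), n);
  if (draftGrams.size === 0) return 0;
  let shared = 0;
  for (const gram of draftGrams) {
    if (sourceGrams.has(gram)) shared++;
  }
  return shared / draftGrams.size;
}

// Longest run: length, in tokens, of the longest contiguous token sequence
// found in both texts (longest common substring over tokens).
function longestRun(draft, source) {
  const d = tokenize(draft);
  const s = tokenize(source);
  let best = 0;
  let prev = new Array(s.length + 1).fill(0);
  for (let i = 1; i <= d.length; i++) {
    const curr = new Array(s.length + 1).fill(0);
    for (let j = 1; j <= s.length; j++) {
      if (d[i - 1] === s[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > best) best = curr[j];
      }
    }
    prev = curr;
  }
  return best;
}
```

Under these definitions, a draft that reuses long verbatim passages scores high on both metrics, while a heavy paraphrase scores near zero on both.
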
## Pairs

| Pair | Target verdict | Containment | Longest run | Notes |
|------|----------------|-------------|-------------|-------|
| `source-accepted.md` ↔ `draft-accepted.md` | **accepted** | 0.014 | 3 | Heavy paraphrase; concept-equivalent without phrase reuse |
| `source-needs-review.md` ↔ `draft-needs-review.md` | **needs-review** | 0.211 | 12 | Mixed: paraphrased frame, retained domain phrasing |
| `source-rejected.md` ↔ `draft-rejected.md` | **rejected** | 0.676 | 74 | Light edit on top of source; verbatim runs survive |

## Verdict bands

The verdict bands match the constants in `scripts/ngram-overlap.mjs`:

- **accepted**: containment < 0.15 AND longestRun < 8
- **needs-review**: everything between accepted and rejected, i.e. containment < 0.35 AND longestRun < 15, with containment ≥ 0.15 OR longestRun ≥ 8
- **rejected**: containment ≥ 0.35 OR longestRun ≥ 15

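The bands above can be read as a small decision function. The following is a minimal sketch, assuming the rejected check takes precedence; the constant and function names are hypothetical and may not match those in `scripts/ngram-overlap.mjs`.

```javascript
// Band edges copied from the list above. The names are hypothetical and may
// not match the constants in scripts/ngram-overlap.mjs.
const ACCEPT_MAX_CONTAINMENT = 0.15;
const ACCEPT_MAX_RUN = 8;
const REJECT_MIN_CONTAINMENT = 0.35;
const REJECT_MIN_RUN = 15;

// Map a (containment, longestRun) measurement to a verdict band.
function verdict(containment, longestRun) {
  if (containment >= REJECT_MIN_CONTAINMENT || longestRun >= REJECT_MIN_RUN) {
    return "rejected";
  }
  if (containment < ACCEPT_MAX_CONTAINMENT && longestRun < ACCEPT_MAX_RUN) {
    return "accepted";
  }
  return "needs-review";
}

// The three fixture measurements from the Pairs table.
console.log(verdict(0.014, 3));  // accepted
console.log(verdict(0.211, 12)); // needs-review
console.log(verdict(0.676, 74)); // rejected
```

Each fixture sits well inside its band rather than on an edge, so small threshold adjustments should not flip a verdict.
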
If you change the thresholds in `ngram-overlap.mjs`, re-verify each fixture pair to confirm
the calibration still holds. The fixtures are content-stable; the thresholds are the variable.

## Why these topics

The fixtures use Claude Code reference prose (session-start hooks, subagent delegation, output
styles) so they live near the kind of source material the skill-factory will actually paraphrase
in production. Drift between fixture domain and production domain would weaken the calibration
signal.

## Regeneration

These files are committed to the repo as ground-truth fixtures. Do not regenerate them ad hoc:
edit deliberately, re-run the verification commands listed in `plan.md` Step 5, and commit
intentionally.

For example, to check the accepted pair (the paths appear to be relative to `plugins/ultraplan-local`):

```bash
node scripts/ngram-overlap.mjs tests/fixtures/skill-factory/draft-accepted.md \
  tests/fixtures/skill-factory/source-accepted.md
```