feat(ultraplan-local): Spor 3 — semantic plan-critic, examples, CC features, security docs

- agents/plan-critic.md: rule #7 split into literal blockers (TBD/TODO/FIXME)
  + semantic rubric with 8 deferred-decision tests; calibrated against the
  5-phrase corpus from the v3.1.0 quality brief
- hooks/hooks.json: rebuilt from corrupted state; valid JSON, registers
  PreToolUse(Bash,Write), UserPromptSubmit, PostToolUse(Bash), PreCompact
- hooks/scripts/session-title.mjs: NEW — sets ultra:<cmd>:<slug> session
  title for ultra commands (CC v2.1.94+)
- hooks/scripts/post-bash-stats.mjs: NEW — appends duration_ms per Bash
  call to ultraexecute-stats.jsonl (CC v2.1.97+)
- SECURITY.md: NEW — Forgejo private-issue reporting, supported = current
  minor only, scope = 4 hooks + denylist, hardening recommendations
- docs/architect-bridge-test.md: NEW — manual smoke checklist for the
  ultraplan ↔ ultra-cc-architect bridge
- examples/01-add-verbose-flag/: NEW — calibrated end-to-end (brief +
  research + plan + progress.json) for fork-er onramp; all four artifacts
  pass their validators
- README.md: + Extending the plugin, + Headless multi-session tuning
  (MCP_CONNECTION_NONBLOCKING), + Session titles, + Per-step timing,
  + disableSkillShellExecution recommendation
- CLAUDE.md: documents session-title.mjs and post-bash-stats.mjs
- root README.md: v3.1.0 entry expanded with Spor 2+3 deliverables

CC features adopted: F8, F9, F12 implemented; F3 implemented as Bash
PostToolUse logger; F2 (hook 'if'-field scoping) deferred — universal
protection beats reduced-scope protection for blocked commands.

Tests: 109/109 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Kjell Tore Guttormsen 2026-05-01 06:28:44 +02:00
commit 9ecd225018
15 changed files with 1170 additions and 59 deletions

View file

@ -80,17 +80,70 @@ You find what is wrong, what is missing, and what will break.
### 7. No-placeholder rule (BLOCKER-level)
Flag as **blocker** if ANY of these are found in the plan:
- "TBD", "TODO", "FIXME" as actual plan content (not in code quotes)
- "add appropriate error handling" or similar delegated decisions
- "update as needed", "adjust accordingly", "configure appropriately"
- File paths that do not exist and are not marked "(new file)"
- "Similar to step N" without repeating the specific content
- Steps that mention >2 files without specifying the change per file
- Steps with >3 change points (too complex — should be decomposed)
This rule has two parts: a **literal blockers** list (exact-string matches
that always fire) and a **semantic rubric** (instruction-shaped detection
that catches paraphrased deferrals).
These are unconditional blockers. A plan with placeholder language cannot
be executed without asking questions, which defeats the purpose.
#### 7a. Literal blockers (exact-string)
Flag as **blocker** if any of these strings appear in the plan as actual
content (not inside code quotes or examples):
- `TBD`
- `TODO`
- `FIXME`
- `XXX` (when used as a placeholder marker)
These are unconditional. If the planner had to write a placeholder marker,
the decision was deferred.
#### 7b. Semantic rubric (deferred-decision detection)
Flag as **blocker** any clause that **defers a decision to the executor**.
A clause defers a decision if executing the step requires the executor to
choose something the plan did not specify.
Apply this test to each step body, including verify/checkpoint/failure
clauses. A clause defers a decision if any of these are true:
1. **Vague modifier without referent.** The step uses "appropriate",
"necessary", "as needed", "where appropriate", "if relevant", "as
required", "suitable", "reasonable" — and the plan does not separately
define what counts as appropriate/necessary/etc.
2. **Imperative without target.** The step says to do something
("implement", "add", "wire up", "handle", "make production-ready",
"configure", "set up", "integrate") without naming the specific files,
functions, edits, or values involved.
3. **Forward reference without expansion.** The step says "similar to step
N" or "follow the same pattern" without restating the specific changes
for this step's files.
4. **Volume/quality without spec.** The step says "add tests" or "improve
coverage" without naming what to test or what coverage threshold counts
as success.
5. **Edge cases delegated.** The step says "handle edge cases" or
"add error handling" without enumerating the cases or the handling
strategy.
6. **Production-readiness delegated.** The step says "make this
production-ready", "harden it", "polish it" without listing the
concrete changes that constitute production-ready/hardened/polished.
7. **Path mismatch.** File paths that do not exist and are not marked
`(new file)`.
8. **Too many edits per step.** Steps that mention >2 files without
specifying the change per file, or steps with >3 distinct change
points (decompose).
Calibration corpus (plan-critic must catch all five — these are paraphrased
deferrals that the v3.0 exact-string blacklist missed):
- "implement as needed" → vague modifier without referent (rule 1)
- "wire it up" → imperative without target (rule 2)
- "make it production-ready" → production-readiness delegated (rule 6)
- "add tests where appropriate" → volume/quality without spec + vague
modifier (rules 1 + 4)
- "handle edge cases" → edge cases delegated (rule 5)
A plan with deferred decisions cannot be executed without asking
questions, which defeats the purpose.
### 8. Verification gaps