feat(ultraplan-local): Spor 3 — semantic plan-critic, examples, CC features, security docs

- agents/plan-critic.md: rule #7 split into literal blockers (TBD/TODO/FIXME) + semantic rubric with 8 deferred-decision tests; calibrated against the 5-phrase corpus from the v3.1.0 quality brief - hooks/hooks.json: rebuilt from corrupted state; valid JSON, registers PreToolUse(Bash,Write), UserPromptSubmit, PostToolUse(Bash), PreCompact - hooks/scripts/session-title.mjs: NEW — sets ultra:<cmd>:<slug> session title for ultra commands (CC v2.1.94+) - hooks/scripts/post-bash-stats.mjs: NEW — appends duration_ms per Bash call to ultraexecute-stats.jsonl (CC v2.1.97+) - SECURITY.md: NEW — Forgejo private-issue reporting, supported = current minor only, scope = 4 hooks + denylist, hardening recommendations - docs/architect-bridge-test.md: NEW — manual smoke checklist for the ultraplan ↔ ultra-cc-architect bridge - examples/01-add-verbose-flag/: NEW — calibrated end-to-end (brief + research + plan + progress.json) for fork-er onramp; all four artifacts pass their validators - README.md: + Extending the plugin, + Headless multi-session tuning (MCP_CONNECTION_NONBLOCKING), + Session titles, + Per-step timing, + disableSkillShellExecution recommendation - CLAUDE.md: documents session-title.mjs and post-bash-stats.mjs - root README.md: v3.1.0 entry expanded with Spor 2+3 deliverables CC features adopted: F8, F9, F12 implemented; F3 implemented as Bash PostToolUse logger; F2 (hook 'if'-field scoping) deferred — universal protection beats reduced-scope protection for blocked commands. Tests: 109/109 green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 06:28:44 +02:00 · 2026-05-01 06:28:44 +02:00 · 9ecd225018
commit 9ecd225018
parent 34669d596c
15 changed files with 1170 additions and 59 deletions
--- a/plugins/ultraplan-local/agents/plan-critic.md
+++ b/plugins/ultraplan-local/agents/plan-critic.md
@ -80,17 +80,70 @@ You find what is wrong, what is missing, and what will break.

 ### 7. No-placeholder rule (BLOCKER-level)

-Flag as **blocker** if ANY of these are found in the plan:
- "TBD", "TODO", "FIXME" as actual plan content (not in code quotes)
- "add appropriate error handling" or similar delegated decisions
- "update as needed", "adjust accordingly", "configure appropriately"
- File paths that do not exist and are not marked "(new file)"
- "Similar to step N" without repeating the specific content
- Steps that mention >2 files without specifying the change per file
- Steps with >3 change points (too complex — should be decomposed)
+This rule has two parts: a **literal blockers** list (exact-string matches
+that always fire) and a **semantic rubric** (instruction-shaped detection
+that catches paraphrased deferrals).

-These are unconditional blockers. A plan with placeholder language cannot
-be executed without asking questions, which defeats the purpose.
+#### 7a. Literal blockers (exact-string)
+
+Flag as **blocker** if any of these strings appear in the plan as actual
+content (not inside code quotes or examples):
+
+- `TBD`
+- `TODO`
+- `FIXME`
+- `XXX` (when used as a placeholder marker)
+
+These are unconditional. If the planner had to write a placeholder marker,
+the decision was deferred.
+
+#### 7b. Semantic rubric (deferred-decision detection)
+
+Flag as **blocker** any clause that **defers a decision to the executor**.
+A clause defers a decision if executing the step requires the executor to
+choose something the plan did not specify.
+
+Apply this test to each step body, including verify/checkpoint/failure
+clauses. A clause defers a decision if any of these are true:
+
+1. **Vague modifier without referent.** The step uses "appropriate",
+   "necessary", "as needed", "where appropriate", "if relevant", "as
+   required", "suitable", "reasonable" — and the plan does not separately
+   define what counts as appropriate/necessary/etc.
+2. **Imperative without target.** The step says to do something
+   ("implement", "add", "wire up", "handle", "make production-ready",
+   "configure", "set up", "integrate") without naming the specific files,
+   functions, edits, or values involved.
+3. **Forward reference without expansion.** The step says "similar to step
+   N" or "follow the same pattern" without restating the specific changes
+   for this step's files.
+4. **Volume/quality without spec.** The step says "add tests" or "improve
+   coverage" without naming what to test or what coverage threshold counts
+   as success.
+5. **Edge cases delegated.** The step says "handle edge cases" or
+   "add error handling" without enumerating the cases or the handling
+   strategy.
+6. **Production-readiness delegated.** The step says "make this
+   production-ready", "harden it", "polish it" without listing the
+   concrete changes that constitute production-ready/hardened/polished.
+7. **Path mismatch.** File paths that do not exist and are not marked
+   `(new file)`.
+8. **Too many edits per step.** Steps that mention >2 files without
+   specifying the change per file, or steps with >3 distinct change
+   points (decompose).
+
+Calibration corpus (plan-critic must catch all five — these are paraphrased
+deferrals that the v3.0 exact-string blacklist missed):
+
+- "implement as needed" → vague modifier without referent (rule 1)
+- "wire it up" → imperative without target (rule 2)
+- "make it production-ready" → production-readiness delegated (rule 6)
+- "add tests where appropriate" → volume/quality without spec + vague
+  modifier (rules 1 + 4)
+- "handle edge cases" → edge cases delegated (rule 5)
+
+A plan with deferred decisions cannot be executed without asking
+questions, which defeats the purpose.

 ### 8. Verification gaps