feat(ultraplan-local): Spor 3 — semantic plan-critic, examples, CC features, security docs
- agents/plan-critic.md: rule #7 split into literal blockers (TBD/TODO/FIXME) + semantic rubric with 8 deferred-decision tests; calibrated against the 5-phrase corpus from the v3.1.0 quality brief
- hooks/hooks.json: rebuilt from corrupted state; valid JSON, registers PreToolUse(Bash,Write), UserPromptSubmit, PostToolUse(Bash), PreCompact
- hooks/scripts/session-title.mjs (NEW): sets ultra:<cmd>:<slug> session title for ultra commands (CC v2.1.94+)
- hooks/scripts/post-bash-stats.mjs (NEW): appends duration_ms per Bash call to ultraexecute-stats.jsonl (CC v2.1.97+)
- SECURITY.md (NEW): Forgejo private-issue reporting, supported = current minor only, scope = 4 hooks + denylist, hardening recommendations
- docs/architect-bridge-test.md (NEW): manual smoke checklist for the ultraplan ↔ ultra-cc-architect bridge
- examples/01-add-verbose-flag/ (NEW): calibrated end-to-end (brief + research + plan + progress.json) for fork-er onramp; all four artifacts pass their validators
- README.md: + Extending the plugin, + Headless multi-session tuning (MCP_CONNECTION_NONBLOCKING), + Session titles, + Per-step timing, + disableSkillShellExecution recommendation
- CLAUDE.md: documents session-title.mjs and post-bash-stats.mjs
- root README.md: v3.1.0 entry expanded with Spor 2+3 deliverables

CC features adopted: F8, F9, F12 implemented; F3 implemented as Bash PostToolUse logger; F2 (hook 'if'-field scoping) deferred, because universal protection beats reduced-scope protection for blocked commands.

Tests: 109/109 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 34669d596c
commit 9ecd225018
15 changed files with 1170 additions and 59 deletions
@@ -80,17 +80,70 @@ You find what is wrong, what is missing, and what will break.
### 7. No-placeholder rule (BLOCKER-level)

Flag as **blocker** if ANY of these are found in the plan:
- "TBD", "TODO", "FIXME" as actual plan content (not in code quotes)
- "add appropriate error handling" or similar delegated decisions
- "update as needed", "adjust accordingly", "configure appropriately"
- File paths that do not exist and are not marked "(new file)"
- "Similar to step N" without repeating the specific content
- Steps that mention >2 files without specifying the change per file
- Steps with >3 change points (too complex — should be decomposed)
This rule has two parts: a **literal blockers** list (exact-string matches
that always fire) and a **semantic rubric** (instruction-shaped detection
that catches paraphrased deferrals).

These are unconditional blockers. A plan with placeholder language cannot
be executed without asking questions, which defeats the purpose.
#### 7a. Literal blockers (exact-string)

Flag as **blocker** if any of these strings appear in the plan as actual
content (not inside code quotes or examples):

- `TBD`
- `TODO`
- `FIXME`
- `XXX` (when used as a placeholder marker)

These are unconditional. If the planner had to write a placeholder marker,
the decision was deferred.

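The literal check in 7a is mechanical enough to sketch in code. The helper below is hypothetical and not part of this commit: `findLiteralBlockers` and `stripCode` are illustrative names, and the sketch assumes that "code quotes" means Markdown fenced blocks and inline code spans. It cannot tell whether `XXX` is being used as a placeholder marker, so it reports every whole-word occurrence.

```javascript
// Hypothetical helper (not from the commit): approximates rule 7a mechanically.
const LITERAL_BLOCKERS = ["TBD", "TODO", "FIXME", "XXX"];

// Blank out fenced blocks and inline code spans while keeping newlines,
// so that reported line numbers still match the original plan text.
function stripCode(markdown) {
  const blank = (match) => match.replace(/[^\n]/g, " ");
  return markdown
    .replace(/```[\s\S]*?```/g, blank) // fenced code blocks
    .replace(/`[^`\n]*`/g, blank); // inline code spans
}

// Return one finding per (marker, line) pair found outside code.
// Matching is case-sensitive and whole-word, mirroring "exact-string".
function findLiteralBlockers(planText) {
  const findings = [];
  stripCode(planText)
    .split("\n")
    .forEach((line, index) => {
      for (const marker of LITERAL_BLOCKERS) {
        if (new RegExp(`\\b${marker}\\b`).test(line)) {
          findings.push({ marker, line: index + 1 });
        }
      }
    });
  return findings;
}
```

Under these assumptions, a plan line such as `Step 1: TBD` is reported, while a `TODO` that appears only inside an inline code span or a fenced block is ignored.
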
#### 7b. Semantic rubric (deferred-decision detection)

Flag as **blocker** any clause that **defers a decision to the executor**.
A clause defers a decision if executing the step requires the executor to
choose something the plan did not specify.

Apply this test to each step body, including verify/checkpoint/failure
clauses. A clause defers a decision if any of these are true:

1. **Vague modifier without referent.** The step uses "appropriate",
"necessary", "as needed", "where appropriate", "if relevant", "as
required", "suitable", "reasonable" — and the plan does not separately
define what counts as appropriate/necessary/etc.
2. **Imperative without target.** The step says to do something
("implement", "add", "wire up", "handle", "make production-ready",
"configure", "set up", "integrate") without naming the specific files,
functions, edits, or values involved.
3. **Forward reference without expansion.** The step says "similar to step
N" or "follow the same pattern" without restating the specific changes
for this step's files.
4. **Volume/quality without spec.** The step says "add tests" or "improve
coverage" without naming what to test or what coverage threshold counts
as success.
5. **Edge cases delegated.** The step says "handle edge cases" or
"add error handling" without enumerating the cases or the handling
strategy.
6. **Production-readiness delegated.** The step says "make this
production-ready", "harden it", "polish it" without listing the
concrete changes that constitute production-ready/hardened/polished.
7. **Path mismatch.** File paths that do not exist and are not marked
`(new file)`.
8. **Too many edits per step.** Steps that mention >2 files without
specifying the change per file, or steps with >3 distinct change
points (decompose).

Calibration corpus (plan-critic must catch all five — these are paraphrased
deferrals that the v3.0 exact-string blacklist missed):

- "implement as needed" → vague modifier without referent (rule 1)
- "wire it up" → imperative without target (rule 2)
- "make it production-ready" → production-readiness delegated (rule 6)
- "add tests where appropriate" → volume/quality without spec + vague
modifier (rules 1 + 4)
- "handle edge cases" → edge cases delegated (rule 5)

A plan with deferred decisions cannot be executed without asking
questions, which defeats the purpose.

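The calibration corpus above can double as a regression check. The sketch below is a toy approximation and not part of this commit: the real plan-critic is an instruction-driven agent, whereas `deferredDecisionTests`, `missedCorpusEntries` and every regular expression here are hypothetical stand-ins covering only tests 1, 2, 4, 5 and 6. It does not check whether the plan defines a vague modifier elsewhere or enumerates the edge cases, so it over-flags compared with the rubric as written.

```javascript
// Hypothetical approximation of rubric tests 1, 2, 4, 5 and 6.
// The patterns are illustrative assumptions and will miss many paraphrases.
const VAGUE = /\b(appropriate|necessary|as needed|if relevant|as required|suitable|reasonable)\b/i;
const IMPERATIVE = /\b(implement|add|wire (it )?up|handle|configure|set up|integrate)\b/i;
// A "target" here is an inline code span, a file-like token, or a call like foo().
const TARGET = /`[^`]+`|[\w./-]+\.\w+|\w+\(\)/;
const VOLUME = /\b(add tests|improve coverage)\b/i;
const EDGE = /\b(handle edge cases|add error handling)\b/i;
const PROD = /\b(production-ready|harden it|polish it)\b/i;

// Return the rubric test numbers that a single clause trips.
function deferredDecisionTests(clause) {
  const hits = [];
  if (VAGUE.test(clause)) hits.push(1);
  if (IMPERATIVE.test(clause) && !TARGET.test(clause)) hits.push(2);
  if (VOLUME.test(clause)) hits.push(4);
  if (EDGE.test(clause)) hits.push(5);
  if (PROD.test(clause)) hits.push(6);
  return hits;
}

// The five corpus phrases with the rule numbers listed for each.
const CORPUS = [
  ["implement as needed", [1]],
  ["wire it up", [2]],
  ["make it production-ready", [6]],
  ["add tests where appropriate", [1, 4]],
  ["handle edge cases", [5]],
];

// Return the corpus phrases for which `check` misses an expected rule.
function missedCorpusEntries(check = deferredDecisionTests) {
  return CORPUS.filter(([phrase, expected]) => {
    const hits = check(phrase);
    return !expected.every((rule) => hits.includes(rule));
  }).map(([phrase]) => phrase);
}
```

An empty result from `missedCorpusEntries()` means every corpus phrase tripped at least the rules listed for it. A clause that names its target, such as one that mentions `src/cli.js` in a code span, is not flagged by test 2 in this sketch.
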
### 8. Verification gaps