feat(templates): add feedback loop and performance scoring templates

Session 5 step 20: templates for recurring feedback patterns with VFM-compatible scoring. Adds a FEEDBACK.md append-only log, a PostToolUse hook that detects 3+ recurring pattern tags, and per-agent scoring that tracks trends against the prior window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 06:51:37 +02:00 · 2026-04-12 06:51:37 +02:00 · d743ec7fbf
commit d743ec7fbf
parent 2451dd9dfd
4 changed files with 404 additions and 0 deletions
--- a/scripts/templates/feedback/README.md
+++ b/scripts/templates/feedback/README.md
@@ -0,0 +1,73 @@
+# Feedback Loop
+
+Systematic feedback collection and performance scoring for agent pipelines.
+
+## How it works
+
+1. After each pipeline run, a reviewer agent (or human) assigns a score (0–100)
+   and categorizes any issues with a pattern tag.
+2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
+   `score_output` tool calls. It appends a row to `FEEDBACK.md`.
+3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
+4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
+   to compute per-agent metrics: average score, error rate, cost per run,
+   and improvement trend (last 10 runs vs. the previous 10).
+5. Agents scoring below the threshold (default 60/100) are flagged for review.
+
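The recurring-pattern check in step 3 could be sketched roughly as below. This is only an illustration, not the contents of `feedback-collector.sh`: the function name `recurring_tags` is hypothetical, the row layout is assumed to match the example rows in this README (date first, pattern tag in the last column), and tags are assumed to be counted across all agents.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count FEEDBACK.md rows per pattern tag and print
# every tag that reaches the alert threshold (default 3).
# Assumption: data rows start with "| YYYY-..." and the tag is the last column.
recurring_tags() {
  local file="$1" threshold="${2:-3}"
  awk -F'|' -v min="$threshold" '
    # Only data rows (leading date) are counted; header rows are skipped.
    /^\|[ \t]*[0-9][0-9][0-9][0-9]-/ {
      tag = $(NF-1)                          # last non-empty column
      gsub(/^[ \t]+|[ \t]+$/, "", tag)       # trim surrounding whitespace
      if (tag != "") count[tag]++
    }
    END {
      for (t in count) if (count[t] >= min) print t, count[t]
    }
  ' "$file" | sort
}
```

Calling `recurring_tags FEEDBACK.md` would print one line per tag that has reached the threshold, which a hook could then turn into an alert.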
+## Pattern tags
+
+Pattern detection only works when tags are applied consistently. Use the tags
+defined in `FEEDBACK.md` and add project-specific tags as needed. Variants of
+the same tag (for example `quality-low` and `low-quality`) are counted
+separately, so inconsistent tagging produces false negatives.
+
+## Scoring → self-improvement connection
+
+Feedback scores are the input to VFM (Value-for-Money) pre-scoring
+defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
+A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
+making it less likely to be selected until its performance improves.
+
+The feedback loop closes the improvement cycle:
+1. Pipeline runs → reviewer assigns score + pattern tag
+2. `feedback-collector.sh` appends to `FEEDBACK.md`
+3. `performance-scorer.sh` flags underperforming agents
+4. Developer reviews top patterns → iterates on agent prompt
+5. New runs produce new feedback → trend shows improvement
+6. VFM scores update automatically on the next pipeline selection
+
+## Example: prompt iteration driven by feedback
+
+Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`:
+
+```
+| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
+| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief  | Repeated instruction     | quality-low |
+| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better  | —                        | quality-low |
+```
+
+After 3 rows, `feedback-collector.sh` fires the recurring-pattern alert and
+`performance-scorer.sh` reports an average of 45/100 with a 100% error rate.
+Action: update `agent-writer`'s system prompt with explicit length and
+depth requirements. In this example, the next 10 runs show the trend
+"improving (+18.3)".
+
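The average and trend figures in the example above can be reproduced with a small sketch. It assumes the same row layout as the example (agent in the third column, score as `N/100` in the fourth) and assumes the trend is the mean of the last `window` runs minus the mean of the `window` runs before them. The function name `agent_stats` is hypothetical, and the actual `performance-scorer.sh` may compute these values differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: average score and trend for one agent.
# Assumption: trend = mean(last W runs) - mean(previous W runs), W = 10 by default.
agent_stats() {
  local file="$1" agent="$2" window="${3:-10}"
  awk -F'|' -v agent="$agent" -v w="$window" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Data rows only (leading date), filtered to the requested agent.
    /^\|[ \t]*[0-9][0-9][0-9][0-9]-/ && trim($4) == agent {
      split(trim($5), parts, "/")            # "45/100" -> parts[1] = 45
      score[++n] = parts[1] + 0
      total += parts[1]
    }
    END {
      if (n == 0) { print "no rows for " agent; exit 1 }
      printf "avg %.1f\n", total / n
      if (n < 2 * w) { print "trend n/a"; exit }   # not enough history yet
      for (i = n - w + 1; i <= n; i++) recent += score[i]
      for (i = n - 2 * w + 1; i <= n - w; i++) prior += score[i]
      printf "trend %+.1f\n", (recent - prior) / w
    }
  ' "$file"
}
```

With only the three rows shown above, `agent_stats FEEDBACK.md agent-writer` reports `avg 45.0` and `trend n/a`, since a 10-run window needs at least 20 rows before a trend can be computed.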
+## Integration
+
+Add `feedback-collector.sh` as a PostToolUse hook in `.claude/settings.json`:
+
+```json
+{
+  "hooks": {
+    "PostToolUse": [{
+      "matcher": "review_pipeline|score_output",
+      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
+    }]
+  }
+}
+```
+
+Run `performance-scorer.sh` on demand or as a scheduled report:
+
+```bash
+./feedback/performance-scorer.sh
+./feedback/performance-scorer.sh --agent agent-writer --threshold 70
+```
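As a rough illustration of what a `--threshold` check might do, the sketch below lists every agent whose average score falls below a threshold (default 60, matching the README). The function name `flag_agents` is hypothetical, the column layout is assumed from the example rows, and the real script may apply the threshold differently, for example to a recent window rather than to all rows.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print agents whose mean score is below the threshold.
# Assumption: agent is column 3 and score ("N/100") is column 4 of each row.
flag_agents() {
  local file="$1" threshold="${2:-60}"
  awk -F'|' -v min="$threshold" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^\|[ \t]*[0-9][0-9][0-9][0-9]-/ {
      name = trim($4)
      split(trim($5), parts, "/")            # keep the numerator only
      sum[name] += parts[1]
      runs[name]++
    }
    END {
      for (a in sum) {
        avg = sum[a] / runs[a]
        if (avg < min) printf "%s %.1f\n", a, avg
      }
    }
  ' "$file" | sort
}
```

For instance, `flag_agents FEEDBACK.md 70` would list each agent averaging below 70 together with its mean score.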