Kjell Tore Guttormsen d743ec7fbf feat(templates): add feedback loop and performance scoring templates

Session 5 step 20 — templates for recurring feedback patterns with
VFM-compatible scoring. Adds FEEDBACK.md append-only log, PostToolUse
hook that detects 3+ recurring pattern tags, and per-agent scoring
that tracks trends against prior window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-12 06:51:37 +02:00

Feedback Loop

Systematic feedback collection and performance scoring for agent pipelines.

How it works

1. After each pipeline run, a reviewer agent (or a human) assigns a score (0–100) and categorizes any issues with a pattern tag.
2. feedback-collector.sh runs as a PostToolUse hook on review_pipeline or score_output tool calls and appends a row to FEEDBACK.md.
3. When three or more rows share the same pattern tag, a recurring-pattern alert fires.
4. performance-scorer.sh reads FEEDBACK.md and budget/cost-events.jsonl to compute per-agent metrics: average score, error rate, cost per run, and improvement trend (last 10 runs vs. the previous 10).
5. Agents scoring below the threshold (default 60/100) are flagged for review.
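
The recurring-pattern check described above can be sketched roughly as follows. This is an illustration, not the shipped feedback-collector.sh: it assumes FEEDBACK.md rows are pipe-delimited, start with an ISO date, and carry the pattern tag in the last column (as in the example rows in this README). The function name recurring_tags is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed row layout (taken from the example rows in this README):
# | date | pipeline | agent | score | issue | action | tag |
# recurring_tags is a hypothetical helper, not part of the templates.

recurring_tags() {
  local file="$1" min="${2:-3}"
  awk -F'|' -v min="$min" '
    # Only count rows whose first cell is a date; this skips header and separator rows.
    $2 ~ /^[ \t]*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ {
      tag = $(NF - 1)                     # last cell before the trailing pipe
      gsub(/^[ \t]+|[ \t]+$/, "", tag)    # trim surrounding whitespace
      if (tag != "") count[tag]++
    }
    END {
      # Print every tag that reached the minimum count, with its count.
      for (t in count)
        if (count[t] >= min) printf "%s %d\n", t, count[t]
    }
  ' "$file" | sort
}
```

Called as `recurring_tags FEEDBACK.md 3`, the sketch prints one line per tag that appears in at least three rows. Because tags are compared as exact strings after trimming, `quality-low` and `Quality-Low` would be counted separately.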

Pattern tags

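Pattern detection depends on consistent tags. Use the tags defined in FEEDBACK.md and add project-specific tags as needed, but spell each tag identically every time. Inconsistent tagging produces false negatives, because rows that describe the same problem under different tags never reach the 3-row threshold.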

Scoring → self-improvement connection

Feedback scores are the input to VFM (Value-for-Money) pre-scoring defined in scripts/templates/proactive/VFM-SCORING.md (Step 11). A low-scoring agent gets a lower VFM pre-score for future pipeline tasks, making it less likely to be selected until its performance improves.

The feedback loop closes the improvement cycle:

1. Pipeline runs → reviewer assigns score + pattern tag
2. feedback-collector.sh appends to FEEDBACK.md
3. performance-scorer.sh flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection
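
The "last 10 vs. previous 10 runs" trend could be computed along these lines. This is a sketch under assumptions, not the logic of performance-scorer.sh: the function name score_trend is hypothetical, it expects one numeric score per line on stdin (oldest first), and the labels "declining" and "stable" are invented here to complement the "improving" label shown in this README.

```shell
#!/usr/bin/env bash
# Sketch only. score_trend is a hypothetical helper.
# Input: one numeric score per line on stdin, oldest first.
# Output: a label plus the difference between the mean of the most recent
# window and the mean of the window before it.

score_trend() {
  local window="${1:-10}"
  awk -v w="$window" '
    /^[0-9]+(\.[0-9]+)?$/ { s[++n] = $1 }   # keep numeric lines only
    END {
      if (n < 2 * w) { print "insufficient data"; exit }
      for (i = n - w + 1; i <= n; i++) recent += s[i]          # latest window
      for (i = n - 2 * w + 1; i <= n - w; i++) prior += s[i]   # window before it
      delta = (recent - prior) / w
      label = (delta > 0) ? "improving" : ((delta < 0) ? "declining" : "stable")
      printf "%s (%+.1f)\n", label, delta
    }
  '
}
```

With fewer than two full windows of scores, the sketch reports "insufficient data" instead of comparing partial windows, which keeps a short run history from producing a misleading trend.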

Example: prompt iteration driven by feedback

Suppose agent-writer repeatedly scores 45/100 with pattern quality-low:

| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief  | Repeated instruction     | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better  | —                        | quality-low |

After the third row, feedback-collector.sh fires the recurring-pattern alert, and performance-scorer.sh reports an average of 45/100 with a 100% error rate. Action: update agent-writer's system prompt with explicit length and depth requirements. The next 10 runs then show the trend "improving (+18.3)".
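
The 45/100 average for these three rows is (45 + 42 + 48) / 3. A rough way to reproduce that figure from the log is sketched below. It is not the shipped performance-scorer.sh: the function name agent_average is hypothetical, the column positions are assumed from the example rows, and the error rate is left out because its definition is not given here.

```shell
#!/usr/bin/env bash
# Sketch only. agent_average is a hypothetical helper.
# Assumes the agent is the third table column and the score ("NN/100") the
# fourth; with -F"|" these are awk fields $4 and $5, because the leading pipe
# produces an empty $1.

agent_average() {
  local file="$1" agent="$2" threshold="${3:-60}"
  awk -F'|' -v agent="$agent" -v threshold="$threshold" '
    {
      name = $4;  gsub(/^[ \t]+|[ \t]+$/, "", name)    # trimmed agent cell
      score = $5; gsub(/^[ \t]+|[ \t]+$/, "", score)   # trimmed score cell
    }
    name == agent && score ~ /^[0-9]+\/100$/ {
      split(score, parts, "/")
      sum += parts[1]; n++
    }
    END {
      if (n == 0) { print "no rows for " agent; exit 1 }
      avg = sum / n
      printf "%s avg=%.1f runs=%d %s\n", agent, avg, n, (avg < threshold ? "FLAGGED" : "ok")
    }
  ' "$file"
}
```

For the three rows above, `agent_average FEEDBACK.md agent-writer` would print `agent-writer avg=45.0 runs=3 FLAGGED`, since 45.0 is below the default threshold of 60.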

Integration

Add feedback-collector.sh as a PostToolUse hook in .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "review_pipeline|score_output",
      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
    }]
  }
}

Run performance-scorer.sh on demand or as a scheduled report:

./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
