# Feedback Loop
Systematic feedback collection and performance scoring for agent pipelines.
## How it works
- After each pipeline run, a reviewer agent (or human) assigns a score (0–100) and categorizes any issues with a pattern tag.
- `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or `score_output` tool calls. It appends a row to `FEEDBACK.md`.
- When 3+ rows share the same pattern tag, a recurring-pattern alert fires (a sketch follows this list).
- `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl` to compute per-agent metrics: average score, error rate, cost per run, and improvement trend (last 10 vs. previous 10 runs).
- Agents scoring below the threshold (default 60/100) are flagged for review.
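A minimal sketch of the collector's append-and-detect logic, assuming the PostToolUse hook payload arrives as JSON on stdin and that the reviewer's score, tag, and context live under hypothetical `tool_input` fields (`score`, `pattern_tag`, `agent`, `pipeline`, `issue`) — adapt the `jq` paths to your actual tool schema:

```bash
#!/usr/bin/env bash
# Sketch of feedback-collector.sh. The .tool_input.* field names below are
# assumptions for illustration; match them to your review tool's schema.
set -euo pipefail

FEEDBACK_FILE="${FEEDBACK_FILE:-feedback/FEEDBACK.md}"
payload=$(cat)  # hook payload arrives on stdin

score=$(jq -r '.tool_input.score // empty' <<<"$payload")
tag=$(jq -r '.tool_input.pattern_tag // empty' <<<"$payload")
agent=$(jq -r '.tool_input.agent // "unknown"' <<<"$payload")
pipeline=$(jq -r '.tool_input.pipeline // "unknown"' <<<"$payload")
issue=$(jq -r '.tool_input.issue // "-"' <<<"$payload")

[ -n "$score" ] && [ -n "$tag" ] || exit 0  # nothing to log

# Append-only log: one markdown table row per reviewed run.
printf '| %s | %s | %s | %s/100 | %s | — | %s |\n' \
  "$(date +%F)" "$pipeline" "$agent" "$score" "$issue" "$tag" >> "$FEEDBACK_FILE"

# Recurring-pattern alert: fire when 3+ rows share the same tag.
count=$(grep -Fc "| ${tag} |" "$FEEDBACK_FILE" 2>/dev/null || true)
count=${count:-0}
if [ "$count" -ge 3 ]; then
  echo "ALERT: pattern '${tag}' recurred ${count} times — review ${FEEDBACK_FILE}" >&2
fi
```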
## Pattern tags
Consistent tags are required for pattern detection to work. Use the tags defined in `FEEDBACK.md`. Add project-specific tags as needed, but be consistent: inconsistent tagging produces false negatives.
## Scoring → self-improvement connection
Feedback scores are the input to VFM (Value-for-Money) pre-scoring, defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11). A low-scoring agent gets a lower VFM pre-score for future pipeline tasks, making it less likely to be selected until its performance improves.
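One plausible shape for that connection, as a sketch only: scale the agent's base pre-score by its recent average feedback score. Both the `base_prescore` input and the linear scaling are assumptions; the actual formula is defined in VFM-SCORING.md.

```bash
# Hypothetical VFM adjustment: scale a base pre-score by the agent's
# recent average feedback score (0–100). The linear scaling is an
# assumption; see scripts/templates/proactive/VFM-SCORING.md.
vfm_prescore() {
  local base_prescore="$1" avg_score="$2"
  awk -v b="$base_prescore" -v s="$avg_score" \
    'BEGIN { printf "%.2f\n", b * (s / 100) }'
}

vfm_prescore 80 45   # agent averaging 45/100 → pre-score 36.00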
The feedback loop closes the improvement cycle:
- Pipeline runs → reviewer assigns score + pattern tag
- `feedback-collector.sh` appends to `FEEDBACK.md`
- `performance-scorer.sh` flags underperforming agents
- Developer reviews top patterns → iterates on agent prompt
- New runs produce new feedback → trend shows improvement
- VFM scores update automatically on next pipeline selection
## Example: prompt iteration driven by feedback
Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`:

| Date | Pipeline | Agent | Score | Issue | Action | Pattern |
|---|---|---|---|---|---|---|
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low |
After 3 rows, `feedback-collector.sh` fires the recurring-pattern alert, and `performance-scorer.sh` shows an average of 45/100 with a 100% error rate. Action: update `agent-writer`'s system prompt with explicit length and depth requirements. The next 10 runs show trend "improving (+18.3)".
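A sketch of how the scorer might compute that trend, assuming `FEEDBACK.md` rows hold the score as `NN/100` in the fourth table column (awk field `$5` when splitting on `|`): average the last 10 runs, average the 10 before those, and report the difference.

```bash
# Sketch of the improvement-trend metric: mean score of the last 10 runs
# minus the mean of the previous 10. The column layout is an assumption;
# adapt the field index to your actual FEEDBACK.md format.
trend_for_agent() {
  local agent="$1" log="${2:-feedback/FEEDBACK.md}"
  grep "| ${agent} |" "$log" | tail -n 20 |
    awk -F'|' '
      { gsub(/[ \/]100/, "", $5); scores[NR] = $5 + 0 }
      END {
        if (NR < 20) { print "insufficient data"; exit }
        for (i = 1;  i <= 10; i++) prev += scores[i]   # older window
        for (i = 11; i <= 20; i++) last += scores[i]   # recent window
        delta = (last - prev) / 10
        printf "%s (%+.1f)\n", (delta >= 0 ? "improving" : "declining"), delta
      }'
}

trend_for_agent agent-writer   # e.g. "improving (+18.3)"
```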
## Integration
Add `feedback-collector.sh` as a PostToolUse hook in `.claude/settings.json`:
```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "review_pipeline|score_output",
      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
    }]
  }
}
```
Run `performance-scorer.sh` on demand or as a scheduled report:
```bash
./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```
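For a scheduled report, a cron entry like the one below works; the repository path, schedule, and output location are illustrative, not prescribed by the scripts.

```bash
# Illustrative cron entry: weekly performance report, Mondays at 09:00.
# Adjust /path/to/repo and the output file to your setup.
0 9 * * 1 cd /path/to/repo && ./feedback/performance-scorer.sh > feedback/weekly-report.txt
```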