feat(templates): add feedback loop and performance scoring templates

Session 5 step 20 — templates for recurring feedback patterns with
VFM-compatible scoring. Adds FEEDBACK.md append-only log, PostToolUse
hook that detects 3+ recurring pattern tags, and per-agent scoring
that tracks trends against prior window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-04-12 06:51:37 +02:00
commit d743ec7fbf
4 changed files with 404 additions and 0 deletions


@@ -0,0 +1,73 @@
# Feedback Loop
Systematic feedback collection and performance scoring for agent pipelines.
## How it works
1. After each pipeline run, a reviewer agent (or human) assigns a score (0–100)
and categorizes any issues with a pattern tag.
2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
`score_output` tool calls. It appends a row to `FEEDBACK.md`.
3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
to compute per-agent metrics: average score, error rate, cost per run, and
improvement trend (last 10 runs vs. the previous 10).
5. Agents scoring below the threshold (default 60/100) are flagged for review.
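
The trend comparison in step 4 can be sketched as follows. This is a minimal illustration, not the contents of `performance-scorer.sh`: the function name `score_trend`, its stdin format (one score per line, oldest first), and the `insufficient-data` fallback are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compares the mean of the last WINDOW scores with the
# mean of the WINDOW scores before them. Reads one numeric score per line on
# stdin, oldest first. The real performance-scorer.sh may work differently.
score_trend() {
  local window="${1:-10}"
  awk -v w="$window" '
    { s[NR] = $1 }
    END {
      # Two full windows are needed before a trend can be reported.
      if (NR < 2 * w) { print "insufficient-data"; exit }
      for (i = NR - w + 1; i <= NR; i++) recent += s[i]
      for (i = NR - 2 * w + 1; i <= NR - w; i++) prior += s[i]
      delta = (recent - prior) / w
      if (delta > 0) label = "improving"
      else if (delta < 0) label = "declining"
      else label = "flat"
      printf "%s (%+.1f)\n", label, delta
    }'
}
```

For example, `printf '%s\n' 40 40 60 60 | score_trend 2` prints `improving (+20.0)`.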
## Pattern tags
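Rows count toward a recurring-pattern alert only when they share the same
tag, so consistent tagging is required. Use the tags defined in `FEEDBACK.md`
and add project-specific tags as needed, always spelled the same way.
Inconsistent tagging produces false negatives: two spellings of one issue are
counted separately, and neither may reach the 3-row threshold.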
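
A rough sketch of how tag counting could work, and why spelling matters. It assumes the pattern tag is the last cell of each table row, as in the example rows later in this README; `recurring_tags` is a hypothetical helper, not part of `feedback-collector.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each pattern tag that appears in at least MIN
# table rows read from stdin, followed by its count.
# Assumption: the tag is the last cell of a "| a | b | ... | tag |" row.
recurring_tags() {
  local min="${1:-3}"
  awk -F'|' -v min="$min" '
    /^[|]/ && NF >= 3 {
      tag = $(NF - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", tag)          # trim surrounding whitespace
      # Skip empty cells and Markdown separator rows such as |---|---|
      if (tag != "" && tag !~ /^:?-+:?$/) count[tag]++
    }
    END { for (t in count) if (count[t] >= min) print t, count[t] }
  ' | sort
}
```

With two rows tagged `quality-low` and one tagged `low-quality`, `recurring_tags 3` prints nothing, even though all three rows describe the same issue.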
## Scoring → self-improvement connection
Feedback scores are the input to VFM (Value-for-Money) pre-scoring
defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
making it less likely to be selected until its performance improves.
The feedback loop closes the improvement cycle:
1. Pipeline runs → reviewer assigns score + pattern tag
2. `feedback-collector.sh` appends to `FEEDBACK.md`
3. `performance-scorer.sh` flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection
## Example: prompt iteration driven by feedback
Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`:
```
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low |
```
After the third row, `feedback-collector.sh` fires the recurring-pattern
alert, and `performance-scorer.sh` reports an average of 45/100 and an error
rate of 100%.
Action: update `agent-writer`'s system prompt with explicit length and
depth requirements. The next 10 runs then show the trend "improving (+18.3)".
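
The 45/100 average can be reproduced from the three example rows with a short sketch. It assumes the agent name is the third cell and the score is the fourth cell in `N/100` form, as in the rows above; `agent_average` is a hypothetical helper and not necessarily how `performance-scorer.sh` computes the figure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the mean score for one agent from table rows
# read on stdin. Assumes the row layout
#   | date | pipeline | agent | score/100 | issue | action | tag |
agent_average() {
  local agent="$1"
  awk -F'|' -v agent="$agent" '
    /^[|]/ && NF >= 6 {
      name = $4
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the agent cell
      if (name != agent) next
      split($5, parts, "/")               # "45/100" -> parts[1] = 45
      sum += parts[1]
      n++
    }
    END { if (n) printf "%.1f\n", sum / n; else print "no-data" }
  '
}
```

Piping the three rows above into `agent_average agent-writer` prints `45.0`, since (45 + 42 + 48) / 3 = 45.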
## Integration
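Add `feedback-collector.sh` as a PostToolUse hook in `.claude/settings.json`.
This example registers it for `review_pipeline`; add a matching entry for
`score_output` if the hook should also run on that tool: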
```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "review_pipeline",
      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
    }]
  }
}
```
Run `performance-scorer.sh` on demand or as a scheduled report:
```bash
./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```