agent-builder/scripts/templates/feedback/README.md
Kjell Tore Guttormsen d743ec7fbf feat(templates): add feedback loop and performance scoring templates
Session 5 step 20 — templates for recurring feedback patterns with
VFM-compatible scoring. Adds FEEDBACK.md append-only log, PostToolUse
hook that detects 3+ recurring pattern tags, and per-agent scoring
that tracks trends against prior window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 06:51:37 +02:00

73 lines
2.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Feedback Loop
Systematic feedback collection and performance scoring for agent pipelines.
## How it works
1. After each pipeline run, a reviewer agent (or human) assigns a score (0100)
and categorizes any issues with a pattern tag.
2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
`score_output` tool calls. It appends a row to `FEEDBACK.md`.
3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
to compute per-agent metrics: average score, error rate, cost per run,
improvement trend (last 10 vs. previous 10 runs).
5. Agents scoring below the threshold (default 60/100) are flagged for review.
## Pattern tags
Consistent tags are required for pattern detection to work. Use the tags
defined in `FEEDBACK.md`. Add project-specific tags as needed — but be
consistent. Inconsistent tagging produces false negatives.
## Scoring → self-improvement connection
Feedback scores are the input to VFM (Value-for-Money) pre-scoring
defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
making it less likely to be selected until its performance improves.
The feedback loop closes the improvement cycle:
1. Pipeline runs → reviewer assigns score + pattern tag
2. `feedback-collector.sh` appends to FEEDBACK.md
3. `performance-scorer.sh` flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection
## Example: prompt iteration driven by feedback
Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`:
```
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low |
```
After 3 rows: feedback-collector.sh fires the recurring-pattern alert.
performance-scorer.sh shows avg 45/100, error rate 100%.
Action: update agent-writer's system prompt with explicit length and
depth requirements. Next 10 runs show trend "improving (+18.3)".
## Integration
Add feedback-collector.sh as a PostToolUse hook in `.claude/settings.json`:
```json
{
"hooks": {
"PostToolUse": [{
"matcher": "review_pipeline",
"hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
}]
}
}
```
Run performance-scorer.sh on demand or as a scheduled report:
```bash
./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```