Session 5 step 20 — templates for recurring feedback patterns with VFM-compatible scoring. Adds FEEDBACK.md append-only log, PostToolUse hook that detects 3+ recurring pattern tags, and per-agent scoring that tracks trends against prior window. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Feedback Loop

Systematic feedback collection and performance scoring for agent pipelines.

## How it works

1. After each pipeline run, a reviewer agent (or human) assigns a score (0–100)
   and categorizes any issues with a pattern tag.
2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
   `score_output` tool calls. It appends a row to `FEEDBACK.md`.
3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
   to compute per-agent metrics: average score, error rate, cost per run, and
   improvement trend (last 10 vs. previous 10 runs).
5. Agents scoring below the threshold (default 60/100) are flagged for review.

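
Steps 2 and 3 can be sketched in shell. This is a minimal illustration, not the actual `feedback-collector.sh`: the function names (`append_feedback`, `count_tag`, `check_recurring`), the `FEEDBACK_FILE` variable, the alert wording, and the assumption that each row has seven pipe-delimited columns with the pattern tag last are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: names, alert text, and row layout are assumptions,
# not the behavior of the real feedback-collector.sh.
set -eu

FEEDBACK_FILE="${FEEDBACK_FILE:-FEEDBACK.md}"   # hypothetical override
ALERT_THRESHOLD=3                               # "3+ rows" from step 3

# Append one row: date, pipeline, agent, score, issue, action, tag.
append_feedback() {
  local date="$1" pipeline="$2" agent="$3" score="$4"
  local issue="$5" action="$6" tag="$7"
  printf '| %s | %s | %s | %s/100 | %s | %s | %s |\n' \
    "$date" "$pipeline" "$agent" "$score" "$issue" "$action" "$tag" \
    >> "$FEEDBACK_FILE"
}

# Count rows whose last column equals the given tag exactly.
count_tag() {
  local tag="$1"
  awk -F'|' -v tag="$tag" '
    NF == 9 {                       # 7 columns plus the empty edge fields
      t = $(NF - 1)                 # last real column
      gsub(/^[ \t]+|[ \t]+$/, "", t)
      if (t == tag) n++
    }
    END { print n + 0 }
  ' "$FEEDBACK_FILE"
}

# Print an alert once the tag has reached the threshold.
check_recurring() {
  local tag="$1" n
  n="$(count_tag "$tag")"
  if [ "$n" -ge "$ALERT_THRESHOLD" ]; then
    echo "ALERT: pattern '$tag' seen $n times"
  fi
}
```

Because the comparison is an exact string match on the tag column, two spellings of the same tag are counted separately.
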
## Pattern tags

Pattern detection only works when tags are applied consistently. Use the tags
defined in `FEEDBACK.md`, and add project-specific tags as needed, but always
spell a given tag the same way. Inconsistent tagging produces false negatives:
the same issue split across differently spelled tags may never reach the
3-row alert threshold.

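
One way to keep tags consistent is to normalize and validate them before a row is written. The sketch below is an assumption about how that could look, not part of the shipped scripts: `normalize_tag`, `is_known_tag`, and the `ALLOWED_TAGS` list are hypothetical, and every tag other than `quality-low` is a placeholder rather than a tag defined in `FEEDBACK.md`.

```bash
#!/usr/bin/env bash
# Sketch only: the allow-list is hypothetical. Only quality-low appears in
# this document; the other entries are placeholders.
ALLOWED_TAGS="quality-low format-error off-topic"

# Lowercase the tag, trim it, and turn spaces/underscores into hyphens.
normalize_tag() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/[ _]\{1,\}/-/g'
}

# Succeed only if the normalized tag is on the allow-list.
is_known_tag() {
  local tag
  tag="$(normalize_tag "$1")"
  case " $ALLOWED_TAGS " in
    *" $tag "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: reject unknown tags before appending a row.
#   is_known_tag "$tag" || { echo "unknown tag: $tag" >&2; exit 1; }
```

With this in place, `Quality_Low` and `quality low` both normalize to `quality-low`, so they count toward the same pattern.
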
## Scoring → self-improvement connection

Feedback scores are the input to VFM (Value-for-Money) pre-scoring
defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
making it less likely to be selected until its performance improves.

The feedback loop closes the improvement cycle:

1. Pipeline runs → reviewer assigns score + pattern tag
2. `feedback-collector.sh` appends to `FEEDBACK.md`
3. `performance-scorer.sh` flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection

## Example: prompt iteration driven by feedback

Suppose `agent-writer` repeatedly scores around 45/100 with pattern `quality-low`:

```
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low |
```

After the third row, `feedback-collector.sh` fires the recurring-pattern alert,
and `performance-scorer.sh` reports an average of 45/100 with an error rate of 100%.

Action: update `agent-writer`'s system prompt with explicit length and
depth requirements. The next 10 runs show the trend "improving (+18.3)".

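
The figures in this example can be reproduced with a short awk-based sketch. It is not the real `performance-scorer.sh`: the function names `agent_stats` and `agent_trend`, the output format, the handling of fewer than 20 runs, and in particular the reading of "error rate" as the share of runs scoring below the threshold are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: function names, output format, and the meaning of
# "error rate" (share of runs below the threshold) are assumptions.
set -eu

FEEDBACK_FILE="${FEEDBACK_FILE:-FEEDBACK.md}"   # hypothetical override

# Print average score and below-threshold rate for one agent.
agent_stats() {
  local agent="$1" threshold="${2:-60}"
  awk -F'|' -v agent="$agent" -v thr="$threshold" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF == 9 && trim($4) == agent {
      score = trim($5) + 0          # "45/100" converts to 45
      sum += score; n++
      if (score < thr) low++
    }
    END {
      if (n == 0) { print "no rows"; exit }
      printf "avg=%.1f below_threshold=%.0f%% runs=%d\n", sum / n, 100 * low / n, n
    }
  ' "$FEEDBACK_FILE"
}

# Print mean(last W runs) minus mean(previous W runs); W defaults to 10.
agent_trend() {
  local agent="$1" window="${2:-10}"
  awk -F'|' -v agent="$agent" -v w="$window" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF == 9 && trim($4) == agent { scores[++n] = trim($5) + 0 }
    END {
      if (n < 2 * w) { print "insufficient data"; exit }
      for (i = n - w + 1; i <= n; i++) recent += scores[i]
      for (i = n - 2 * w + 1; i <= n - w; i++) prior += scores[i]
      printf "%+.1f\n", (recent - prior) / w
    }
  ' "$FEEDBACK_FILE"
}
```

Run against the three rows above, `agent_stats agent-writer` prints `avg=45.0 below_threshold=100% runs=3`, which matches the average and error rate quoted in the example under the default threshold of 60.
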
## Integration

Add `feedback-collector.sh` as a PostToolUse hook in `.claude/settings.json`.
The example below matches `review_pipeline` only; widen the matcher or add a
second entry if the hook should also run on `score_output`:

```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "review_pipeline",
      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
    }]
  }
}
```

Run `performance-scorer.sh` on demand or as a scheduled report:

```bash
./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```