feat(templates): add feedback loop and performance scoring templates
Session 5 step 20: templates for recurring feedback patterns with VFM-compatible scoring. Adds a FEEDBACK.md append-only log, a PostToolUse hook that detects 3+ recurring pattern tags, and per-agent scoring that tracks trends against the prior window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 2451dd9dfd
commit d743ec7fbf
4 changed files with 404 additions and 0 deletions
73
scripts/templates/feedback/README.md
Normal file
@@ -0,0 +1,73 @@
# Feedback Loop

Systematic feedback collection and performance scoring for agent pipelines.

## How it works

1. After each pipeline run, a reviewer agent (or a human) assigns a score (0–100)
   and categorizes any issues with a pattern tag.
2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
   `score_output` tool calls. It appends a row to `FEEDBACK.md`.
3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
   to compute per-agent metrics: average score, error rate, cost per run,
   and improvement trend (last 10 vs. previous 10 runs).
5. Agents scoring below the threshold (default 60/100) are flagged for review.

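The recurring-pattern check in step 3 can be sketched in a few lines of shell. This is a minimal illustration, not the actual logic of `feedback-collector.sh`: the function name `recurring_tags` is hypothetical, and it assumes each data row starts with a `YYYY-MM-DD` date cell and carries the pattern tag in its last cell, as in the example rows later in this README.

```bash
#!/usr/bin/env bash
# Sketch: list pattern tags that appear in at least MIN rows of a feedback log.
# Assumption: the tag is the last pipe-delimited cell of each table row.

recurring_tags() {
  local file="$1" min="${2:-3}"
  awk -F'|' -v min="$min" '
    # Only consider data rows whose first cell looks like a date (YYYY-MM-DD),
    # which skips any header or separator rows.
    $2 ~ /^[ \t]*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][ \t]*$/ {
      tag = $(NF - 1)                    # last cell before the trailing pipe
      gsub(/^[ \t]+|[ \t]+$/, "", tag)   # trim surrounding whitespace
      if (tag != "") count[tag]++
    }
    END {
      for (t in count)
        if (count[t] >= min) printf "%s %d\n", t, count[t]
    }
  ' "$file" | sort
}
```

Calling `recurring_tags FEEDBACK.md` prints one `tag count` line per tag that has reached the threshold; an optional second argument overrides the default of 3.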
## Pattern tags

Pattern detection depends on consistent tags. Use the tags defined in
`FEEDBACK.md`, and add project-specific tags as needed, but apply them
consistently. Inconsistent tagging produces false negatives: variants of the
same tag are counted separately and may never reach the 3-row alert threshold.

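One way to keep tags consistent is to canonicalize them before they are written. The helper below is a hypothetical sketch (nothing in this README says the collector normalizes tags); it lowercases the input and collapses any run of characters outside `a-z0-9` into a single hyphen.

```bash
#!/usr/bin/env bash
# Sketch: canonicalize a free-form tag so spelling variants collapse to one form.
# Assumption: tags are ASCII; non-ASCII letters would be turned into hyphens.

normalize_tag() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}
```

With this helper, `Quality Low`, `quality_low`, and `quality-low` would all be recorded under the same tag.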
## Scoring → self-improvement connection

Feedback scores are the input to VFM (Value-for-Money) pre-scoring,
defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
making it less likely to be selected until its performance improves.

The feedback loop closes the improvement cycle:

1. Pipeline runs → reviewer assigns score + pattern tag
2. `feedback-collector.sh` appends to `FEEDBACK.md`
3. `performance-scorer.sh` flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection

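Step 3 of the cycle (flagging underperforming agents) could look roughly like the sketch below. It is not the real `performance-scorer.sh`: the function name `flag_underperformers` is hypothetical, and it assumes the agent name is the third cell and the score (`NN/100`) the fourth cell of each row, matching the example rows in this README.

```bash
#!/usr/bin/env bash
# Sketch: print agents whose mean score is below a threshold (default 60).
# Assumption: agent is the 3rd cell and score ("NN/100") the 4th cell of a row.

flag_underperformers() {
  local file="$1" threshold="${2:-60}"
  awk -F'|' -v threshold="$threshold" '
    # Data rows only: the first cell must look like a YYYY-MM-DD date.
    $2 ~ /^[ \t]*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][ \t]*$/ {
      agent = $4; gsub(/^[ \t]+|[ \t]+$/, "", agent)
      score = $5; sub(/\/.*/, "", score)   # "45/100" -> "45"
      sum[agent] += score; runs[agent]++
    }
    END {
      for (a in sum) {
        avg = sum[a] / runs[a]
        if (avg < threshold) printf "%s %.1f\n", a, avg
      }
    }
  ' "$file" | sort
}
```

Each output line is an agent name followed by its average score, so the result can be piped into whatever review or reporting step the project uses.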
## Example: prompt iteration driven by feedback

Suppose `agent-writer` repeatedly scores around 45/100 with pattern `quality-low`:

```
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low |
```

After 3 rows, `feedback-collector.sh` fires the recurring-pattern alert.
`performance-scorer.sh` shows avg 45/100, error rate 100%.
Action: update `agent-writer`'s system prompt with explicit length and
depth requirements. Next 10 runs show trend "improving (+18.3)".

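A trend label such as "improving (+18.3)" can be derived by comparing two adjacent windows of scores. The sketch below assumes the number is the difference between the mean of the last N scores and the mean of the N before them; that reading, the function name `score_trend`, and the `declining`, `stable`, and `insufficient data` labels are assumptions, since only "improving" appears in this README.

```bash
#!/usr/bin/env bash
# Sketch: compare the mean of the last N scores with the mean of the N before.
# Reads one numeric score per line (oldest first) on stdin; N defaults to 10.

score_trend() {
  local window="${1:-10}"
  awk -v w="$window" '
    NF { s[++n] = $1 }                        # collect non-empty lines
    END {
      if (n < 2 * w) { print "insufficient data"; exit }
      for (i = n - w + 1; i <= n; i++)         recent += s[i]
      for (i = n - 2 * w + 1; i <= n - w; i++) prior  += s[i]
      delta = (recent - prior) / w            # difference of the two means
      label = (delta > 0) ? "improving" : ((delta < 0) ? "declining" : "stable")
      printf "%s (%+.1f)\n", label, delta
    }
  '
}
```

For instance, `printf '%s\n' 45 42 48 60 65 64 | score_trend 3` compares a prior mean of 45 with a recent mean of 63 and prints `improving (+18.0)`.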
## Integration

Add `feedback-collector.sh` as a PostToolUse hook in `.claude/settings.json`:

```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "review_pipeline",
      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
    }]
  }
}
```

This matcher fires only on `review_pipeline`. Claude Code matchers accept regex
alternation, so `review_pipeline|score_output` should also cover the
`score_output` calls mentioned under "How it works".

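What the hook command does with its input is not shown in this README. As a rough sketch of the final step only, the hypothetical function below appends one row in the column order used by the example rows (date, pipeline, agent, score, issue, action, pattern tag). It takes the values as plain arguments; how the real collector extracts them from the tool call is left open.

```bash
#!/usr/bin/env bash
# Sketch: append one feedback row in the column order of the example rows.
# Assumption: values arrive as arguments; the real hook's input format is not
# described in this README.

append_feedback_row() {
  local file="$1" pipeline="$2" agent="$3" score="$4" issue="$5" action="$6" tag="$7"
  local today
  today="$(date +%Y-%m-%d)"
  # Pipes inside free-text cells would break the table, so replace them.
  issue="$(printf '%s' "$issue" | tr '|' '/')"
  action="$(printf '%s' "$action" | tr '|' '/')"
  # ">>" keeps the log append-only: existing rows are never rewritten.
  printf '| %s | %s | %s | %s/100 | %s | %s | %s |\n' \
    "$today" "$pipeline" "$agent" "$score" "$issue" "$action" "$tag" >> "$file"
}
```

A call such as `append_feedback_row FEEDBACK.md doc-pipeline agent-writer 45 "Output too brief" "Added detail requirement" quality-low` adds a single dated row to the end of the log.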
Run `performance-scorer.sh` on demand or as a scheduled report:

```bash
# Default run (threshold 60/100)
./feedback/performance-scorer.sh
# Single agent with a stricter threshold
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```