# Feedback Loop

Systematic feedback collection and performance scoring for agent pipelines.

## How it works

1. After each pipeline run, a reviewer agent (or human) assigns a score (0–100)
   and categorizes any issues with a pattern tag.
2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
   `score_output` tool calls. It appends a row to `FEEDBACK.md`.
3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
   to compute per-agent metrics: average score, error rate, cost per run,
   improvement trend (last 10 vs. previous 10 runs).
5. Agents scoring below the threshold (default 60/100) are flagged for review.

## Pattern tags

Consistent tags are required for pattern detection to work. Use the tags
defined in `FEEDBACK.md`. Add project-specific tags as needed — but be
consistent. Inconsistent tagging produces false negatives.

## Scoring → self-improvement connection

Feedback scores are the input to VFM (Value-for-Money) pre-scoring
defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
making it less likely to be selected until its performance improves.

The feedback loop closes the improvement cycle:
1. Pipeline runs → reviewer assigns score + pattern tag
2. `feedback-collector.sh` appends to FEEDBACK.md
3. `performance-scorer.sh` flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection

## Example: prompt iteration driven by feedback

Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`:

```
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief  | Repeated instruction     | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better  | —                        | quality-low |
```

After 3 rows: feedback-collector.sh fires the recurring-pattern alert.
performance-scorer.sh shows avg 45/100, error rate 100%.
Action: update agent-writer's system prompt with explicit length and
depth requirements. Next 10 runs show trend "improving (+18.3)".

## Integration

Add feedback-collector.sh as a PostToolUse hook in `.claude/settings.json`:

```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "review_pipeline",
      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
    }]
  }
}
```

Run performance-scorer.sh on demand or as a scheduled report:

```bash
./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```