agent-builder/commands/evaluate.md
2026-04-12 06:45:35 +02:00

---
description: Evaluate your agent system against the 22 agent capabilities. Shows coverage, gaps, and recommendations.
argument-hint: "Optional: focus area (security, deployment, memory, autonomy)"
allowed-tools: ["Read", "Glob", "Grep", "Bash"]
---
You are running `/agent-factory:evaluate`, a capability assessment of the user's agent system.
## Step 1: Scan project components
Scan for all agent system components:
- Agents: Glob for `.claude/agents/*.md`
- Pipeline skills: Glob for `.claude/skills/*/SKILL.md`
- Knowledge skills: Glob for `.claude/skills/*.md`
- Hooks: Glob for `.claude/hooks/*.sh` and `hooks/*.sh`
- Settings: Read `.claude/settings.json` if it exists
- Context: Read `CLAUDE.md` if it exists
- Automation: Glob for `automation/*`, `scripts/*.sh`
- Memory: Look for `memory/MEMORY.md`, `memory/SESSION-STATE.md`, `data/run-state.json`
- Heartbeat: Look for `HEARTBEAT.md`
- Goals: Look for `GOALS.md`
- Governance: Look for `GOVERNANCE.md`
- Org chart: Look for `ORG-CHART.md`
- Budget: Look for `budget/BUDGET.md`, `budget/cost-events.jsonl`
- Docker: Look for `Dockerfile`, `docker-compose.yml`
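For reference, the scan above could be approximated outside the agent with a short script. This is a minimal sketch, not part of the plugin: the glob patterns are copied from the list, while the dictionary keys and the `scan_components` helper are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Patterns copied from the Step 1 list; the keys are illustrative labels only.
COMPONENT_PATTERNS = {
    "agents": [".claude/agents/*.md"],
    "pipeline_skills": [".claude/skills/*/SKILL.md"],
    "knowledge_skills": [".claude/skills/*.md"],
    "hooks": [".claude/hooks/*.sh", "hooks/*.sh"],
    "settings": [".claude/settings.json"],
    "context": ["CLAUDE.md"],
    "automation": ["automation/*", "scripts/*.sh"],
    "memory": ["memory/MEMORY.md", "memory/SESSION-STATE.md", "data/run-state.json"],
    "heartbeat": ["HEARTBEAT.md"],
    "goals": ["GOALS.md"],
    "governance": ["GOVERNANCE.md"],
    "org_chart": ["ORG-CHART.md"],
    "budget": ["budget/BUDGET.md", "budget/cost-events.jsonl"],
    "docker": ["Dockerfile", "docker-compose.yml"],
}


def scan_components(root):
    """Return {label: sorted relative paths} for every component type (hypothetical helper)."""
    root = Path(root)
    found = {}
    for label, patterns in COMPONENT_PATTERNS.items():
        matches = set()
        for pattern in patterns:
            # Path.glob also handles literal file names such as "CLAUDE.md".
            matches.update(p.relative_to(root).as_posix() for p in root.glob(pattern))
        found[label] = sorted(matches)
    return found
```

Calling `scan_components(".")` from the project root returns an empty list for every component type that has no matching file, which makes missing components easy to spot.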
## Step 2: Score against 22 capabilities
Read the feature map at `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md`.
For each of the 22 capabilities, check whether the user's project has the corresponding component. Score as:
- **OK** — component exists and is properly configured
- **Partial** — component exists but is incomplete or misconfigured
- **Missing** — component does not exist
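The three statuses can be illustrated with a small classifier. This is a rough sketch under a strong assumption: it treats "all required components present" as OK, whereas the real assessment also has to judge whether each component is properly configured by reading it. The mapping from a capability to its required components is hypothetical here; the actual mapping lives in the feature map.

```python
def score_capability(required, found):
    """Classify one capability from the components it depends on.

    required: component labels the capability needs (hypothetical mapping).
    found: {label: list of matching paths}, for example the result of a Step 1 scan.
    """
    present = [label for label in required if found.get(label)]
    if required and len(present) == len(required):
        return "OK"  # existence only; configuration quality is not checked here
    if present:
        return "Partial"
    return "Missing"
```

For example, a capability that needs both `context` and `settings` scores Partial when only `CLAUDE.md` was found.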
## Step 3: Output capability matrix
```
AGENT SYSTEM EVALUATION
=======================
| # | Capability | Status | What exists | What's needed |
|---|-----------|--------|-------------|---------------|
| 1 | Agent Runtime | OK | CLAUDE.md + settings.json | — |
| 2 | Shell Execution | Missing | — | hooks/pre-tool-use.sh + deny list |
...
Score: X/22 OK | Y/22 Partial | Z/22 Missing
```
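The layout above could be generated mechanically once every capability has a status. The following sketch assumes the results are available as `(name, status, exists, needed)` tuples in capability order; `render_matrix` is a hypothetical helper, not something the plugin ships.

```python
from collections import Counter

EMPTY = "\u2014"  # placeholder glyph used for empty cells in the template above


def render_matrix(rows, total=22):
    """Render the capability table and score line from (name, status, exists, needed) tuples."""
    lines = [
        "AGENT SYSTEM EVALUATION",
        "=======================",
        "| # | Capability | Status | What exists | What's needed |",
        "|---|-----------|--------|-------------|---------------|",
    ]
    for number, (name, status, exists, needed) in enumerate(rows, start=1):
        lines.append(f"| {number} | {name} | {status} | {exists or EMPTY} | {needed or EMPTY} |")
    # Counter returns 0 for statuses that never occur, so the score line is always complete.
    counts = Counter(status for _, status, _, _ in rows)
    lines.append(
        f"Score: {counts['OK']}/{total} OK | "
        f"{counts['Partial']}/{total} Partial | "
        f"{counts['Missing']}/{total} Missing"
    )
    return "\n".join(lines)
```

The three counts in the score line should sum to 22 when all capabilities have been scored, which is a quick sanity check on the output.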
## Step 4: Recommendations
Provide specific recommendations for filling gaps, ordered by impact:
1. Security gaps first (hooks, permissions)
2. Core functionality gaps (missing agents, skills)
3. Operational gaps (memory, automation, deployment)
4. Advanced gaps (governance, budget, self-learning)
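The ordering above amounts to a stable sort by tier. A minimal sketch, assuming each gap has already been tagged with one of four tier labels; the labels and the `order_gaps` helper are hypothetical, and which tier a given capability belongs to is a judgment the assessment still has to make.

```python
# Tier labels mirror the four numbered groups above, in priority order.
TIERS = ("security", "core", "operational", "advanced")


def order_gaps(gaps):
    """Sort (tier, capability) pairs by tier priority, keeping input order within a tier."""
    rank = {tier: position for position, tier in enumerate(TIERS)}
    # sorted() is stable, so gaps in the same tier keep their original relative order.
    return sorted(gaps, key=lambda gap: rank[gap[0]])
```

Given gaps tagged `advanced`, `security`, and `operational`, the security gap is reported first regardless of where it appeared in the matrix.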
If $ARGUMENTS specifies a focus area, expand the recommendations for that area with detailed guidance and link to the relevant templates in `${CLAUDE_PLUGIN_ROOT}/scripts/templates/`.
## Step 5: Next steps
Suggest: "Run `/agent-factory:build` to fill gaps interactively, or `/agent-factory:deploy` to configure deployment."