agent-builder/commands/evaluate.md
2026-04-12 06:45:35 +02:00

description: Evaluate your agent system against the 22 agent capabilities. Shows coverage, gaps, and recommendations.
argument-hint: Optional: focus area (security, deployment, memory, autonomy)
allowed-tools: Read, Glob, Grep, Bash

You are running /agent-factory:evaluate — a capability assessment for your agent system.

Step 1: Scan project components

Scan for all agent system components:

  • Agents: Glob for .claude/agents/*.md
  • Pipeline skills: Glob for .claude/skills/*/SKILL.md
  • Knowledge skills: Glob for .claude/skills/*.md
  • Hooks: Glob for .claude/hooks/*.sh and hooks/*.sh
  • Settings: Read .claude/settings.json if it exists
  • Context: Read CLAUDE.md if it exists
  • Automation: Glob for automation/*, scripts/*.sh
  • Memory: look for memory/MEMORY.md, memory/SESSION-STATE.md, data/run-state.json
  • Heartbeat: look for HEARTBEAT.md
  • Goals: look for GOALS.md
  • Governance: look for GOVERNANCE.md
  • Org chart: look for ORG-CHART.md
  • Budget: look for budget/BUDGET.md, budget/cost-events.jsonl
  • Docker: look for Dockerfile, docker-compose.yml
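
For readers who want to see the scan as ordinary code, here is a minimal sketch. It assumes the patterns are resolved relative to the project root; the names `COMPONENT_PATTERNS` and `scan_components` are illustrative and not part of the plugin.

```python
from pathlib import Path

# Patterns copied from the component list above. The dictionary keys and
# the function name are hypothetical labels chosen for this sketch.
COMPONENT_PATTERNS: dict[str, list[str]] = {
    "agents": [".claude/agents/*.md"],
    "pipeline_skills": [".claude/skills/*/SKILL.md"],
    "knowledge_skills": [".claude/skills/*.md"],
    "hooks": [".claude/hooks/*.sh", "hooks/*.sh"],
    "settings": [".claude/settings.json"],
    "context": ["CLAUDE.md"],
    "automation": ["automation/*", "scripts/*.sh"],
    "memory": ["memory/MEMORY.md", "memory/SESSION-STATE.md", "data/run-state.json"],
    "heartbeat": ["HEARTBEAT.md"],
    "goals": ["GOALS.md"],
    "governance": ["GOVERNANCE.md"],
    "org_chart": ["ORG-CHART.md"],
    "budget": ["budget/BUDGET.md", "budget/cost-events.jsonl"],
    "docker": ["Dockerfile", "docker-compose.yml"],
}


def scan_components(root: Path) -> dict[str, list[str]]:
    """Return the matching paths (relative to root) for each component."""
    found: dict[str, list[str]] = {}
    for component, patterns in COMPONENT_PATTERNS.items():
        matches: set[str] = set()
        for pattern in patterns:
            # Path.glob also accepts literal paths and yields them if they exist.
            matches.update(p.relative_to(root).as_posix() for p in root.glob(pattern))
        found[component] = sorted(matches)
    return found
```

An empty list for a component means nothing matched, which feeds directly into the scoring in Step 2.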

Step 2: Score against 22 capabilities

Read the feature map at ${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md.

For each of the 22 capabilities, check whether the user's project has the corresponding component. Score as:

  • OK — component exists and is properly configured
  • Partial — component exists but is incomplete or misconfigured
  • Missing — component does not exist
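
The three-way score can be sketched as a small function. This is one possible reading, not the feature map's own rule: it assumes each capability lists the files it expects and treats "incomplete" as "only some of them exist". It does not inspect file contents, so misconfiguration still needs a manual check. `Status` and `score_capability` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    PARTIAL = "Partial"
    MISSING = "Missing"


def score_capability(expected: list[str], present: set[str]) -> Status:
    """Classify one capability from the files it expects.

    Assumption: a capability is Partial when some, but not all, of its
    expected files exist. Misconfiguration inside a file is not detected.
    """
    hits = [path for path in expected if path in present]
    if not hits:
        return Status.MISSING
    if len(hits) == len(expected):
        return Status.OK
    return Status.PARTIAL
```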

Step 3: Output capability matrix

AGENT SYSTEM EVALUATION
=======================

| # | Capability | Status | What exists | What's needed |
|---|-----------|--------|-------------|---------------|
| 1 | Agent Runtime | OK | CLAUDE.md + settings.json | — |
| 2 | Shell Execution | Missing | — | hooks/pre-tool-use.sh + deny list |
...

Score: X/22 OK | Y/22 Partial | Z/22 Missing
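
As an illustration of how the template above could be filled in, the following sketch renders the table and the score line from a list of results. The function name `render_matrix` and the row tuple layout are assumptions made for this example; empty cells use the same dash placeholder as the template.

```python
EMPTY_CELL = "\u2014"  # the dash the template uses for "nothing to report"


def render_matrix(rows: list[tuple[str, str, str, str]]) -> str:
    """Render rows of (capability, status, what_exists, whats_needed)."""
    lines = [
        "AGENT SYSTEM EVALUATION",
        "=======================",
        "",
        "| # | Capability | Status | What exists | What's needed |",
        "|---|-----------|--------|-------------|---------------|",
    ]
    for number, (capability, status, exists, needed) in enumerate(rows, start=1):
        lines.append(
            f"| {number} | {capability} | {status} "
            f"| {exists or EMPTY_CELL} | {needed or EMPTY_CELL} |"
        )
    total = len(rows)
    # Count each status so the three figures always sum to the row count.
    counts = {s: sum(1 for row in rows if row[1] == s) for s in ("OK", "Partial", "Missing")}
    lines.append("")
    lines.append(
        f"Score: {counts['OK']}/{total} OK | "
        f"{counts['Partial']}/{total} Partial | "
        f"{counts['Missing']}/{total} Missing"
    )
    return "\n".join(lines)
```

With all 22 capabilities supplied as rows, the denominators in the score line come out as 22, matching the template.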

Step 4: Recommendations

Provide specific recommendations for filling gaps, ordered by impact:

  1. Security gaps first (hooks, permissions)
  2. Core functionality gaps (missing agents, skills)
  3. Operational gaps (memory, automation, deployment)
  4. Advanced gaps (governance, budget, self-learning)

If $ARGUMENTS specifies a focus area (security, deployment, memory, or autonomy), expand the recommendations for that area with detailed guidance and link to relevant templates from ${CLAUDE_PLUGIN_ROOT}/scripts/templates/.
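
The ordering and the optional focus area can be sketched as follows. The tier keys (`security`, `core`, `operational`, `advanced`) are shorthand invented here for the four priority levels, and `parse_focus` assumes the focus area appears as a plain word in the argument string; neither is defined by the plugin.

```python
from typing import Optional

# Priority tiers in the order listed above; the keys are illustrative labels.
TIER_ORDER = ["security", "core", "operational", "advanced"]

# Focus areas named in the command's argument hint.
FOCUS_AREAS = {"security", "deployment", "memory", "autonomy"}


def order_gaps(gaps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (tier, recommendation) pairs by tier priority.

    sorted() is stable, so recommendations within a tier keep their order.
    """
    return sorted(gaps, key=lambda gap: TIER_ORDER.index(gap[0]))


def parse_focus(arguments: str) -> Optional[str]:
    """Return the first recognised focus area in the arguments, if any."""
    for word in arguments.lower().split():
        if word in FOCUS_AREAS:
            return word
    return None
```

When `parse_focus` returns `None`, every tier gets the same level of detail; otherwise only the matching area is expanded.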

Step 5: Next steps

Suggest: "Run /agent-factory:build to fill gaps interactively, or /agent-factory:deploy to configure deployment."