feat(ms-ai-architect): sitemap-based KB change detection system

Adds a zero-dependency Node.js pipeline that polls Microsoft Learn sitemaps
weekly to detect when source documentation changes. Replaces the broken
mtime-based staleness check (after release, every file had the same mtime,
so mtime could not signal staleness).
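Unlike local file mtimes, a sitemap `<lastmod>` value comes from the published Microsoft Learn sitemap. Comparing it with the value recorded on the previous poll therefore tracks source changes, regardless of when the reference files were copied or released. Below is a minimal sketch of that comparison. It assumes entries have already been parsed from a sitemap, that dates are ISO 8601 strings, and that previous values live in a `Map` keyed by URL. The function name and both data shapes are hypothetical, not taken from `poll-sitemaps.mjs`.

```javascript
// Sketch only: hypothetical data shapes, not the plugin's actual code.
// entries:  [{ loc, lastmod }] parsed from a sitemap (lastmod as an ISO 8601 string)
// lastSeen: Map of tracked URL -> lastmod recorded on the previous poll (or null)
function detectChanges(entries, lastSeen) {
  const changed = [];
  for (const { loc, lastmod } of entries) {
    // Skip URLs the registry does not track, and entries without a date.
    if (!lastSeen.has(loc) || !lastmod) continue;
    const previous = lastSeen.get(loc);
    // No previously recorded value: this poll only establishes a baseline.
    if (!previous) continue;
    // Flag the URL only when the published date moved forward.
    if (Date.parse(lastmod) > Date.parse(previous)) {
      changed.push({ url: loc, previous, lastmod });
    }
  }
  return changed;
}
```

Treating a missing previous value as a baseline rather than a change keeps the first poll from flagging every tracked URL.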

Components:
- build-registry.mjs: extracts 1342 URLs from 387 reference files
- poll-sitemaps.mjs: streams ~18 child sitemaps, matches against registry
- report-changes.mjs: prioritized change report (critical/high/medium/low)
- discover-new-urls.mjs: finds relevant new MS Learn pages not yet covered
- run-weekly-update.mjs: orchestrator with --force/--discover/--dry-run
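The registry step (collecting unique `learn.microsoft.com` URLs across the reference files) could look roughly like the sketch below. It is under stated assumptions. Files are passed in as hypothetical `{ path, text }` objects rather than read from disk. URLs are normalized by dropping trailing punctuation, query strings, fragments, and a trailing slash. Those normalization rules and the function names are illustrative and are not taken from `build-registry.mjs`.

```javascript
// Sketch only: hypothetical helper names and normalization rules.
// Stop a URL at whitespace or characters that usually close Markdown/HTML syntax.
const LEARN_URL = /https:\/\/learn\.microsoft\.com\/[^\s)\]>"'`]+/g;

function normalizeLearnUrl(raw) {
  // Drop punctuation that prose often leaves glued to the end of a URL.
  const url = new URL(raw.replace(/[.,;:!?]+$/, ""));
  url.search = ""; // ignore ?tabs=... and similar view parameters
  url.hash = "";   // ignore #section anchors
  return url.toString().replace(/\/$/, "");
}

// Returns Map of normalized URL -> Set of file paths that cite it.
function buildRegistry(files) {
  const registry = new Map();
  for (const { path, text } of files) {
    for (const match of text.matchAll(LEARN_URL)) {
      const url = normalizeLearnUrl(match[0]);
      if (!registry.has(url)) registry.set(url, new Set());
      registry.get(url).add(path);
    }
  }
  return registry;
}
```

Keeping the file paths for each URL lets a later step map a changed URL back to the reference files that need updating.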

Integration:
- session-start hook reads change-report.json instead of broken mtime check
- hook triggers background poll if >7 days since last check
- generate-skills --update reads change report for targeted MCP updates
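Below is a sketch of how the hook's "poll if more than 7 days since the last check" decision and the orchestrator flags might fit together. Several parts are hypothetical. The last-check time is assumed to be an ISO timestamp, and `--force` is assumed to bypass the age check. The helper names are also illustrative. None of this is taken from `run-weekly-update.mjs` or the hook itself.

```javascript
// Sketch only: hypothetical helpers; flag names are the ones listed in this commit.
const POLL_INTERVAL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

function parseFlags(argv) {
  const flags = { force: false, discover: false, dryRun: false };
  for (const arg of argv) {
    if (arg === "--force") flags.force = true;
    else if (arg === "--discover") flags.discover = true;
    else if (arg === "--dry-run") flags.dryRun = true;
    else throw new Error(`Unknown flag: ${arg}`);
  }
  return flags;
}

function shouldPoll(lastCheckIso, nowMs, { force = false } = {}) {
  if (force) return true;        // assumption: --force ignores the interval
  if (!lastCheckIso) return true; // never checked before
  const last = Date.parse(lastCheckIso);
  if (Number.isNaN(last)) return true; // unreadable timestamp: poll to recover
  return nowMs - last > POLL_INTERVAL_MS; // strictly more than 7 days
}
```

In a script, `parseFlags(process.argv.slice(2))` would supply the flags, and the hook would call `shouldPoll` with the stored timestamp before starting a background poll.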

Current stats: 69% match rate (924 of 1342 URLs tracked via sitemaps).
The remaining ~31% (418 URLs) are unmatched due to Microsoft URL
restructuring (ai-foundry/openai paths).
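The match rate is matched registry URLs divided by registry size. The sketch below computes it while consuming sitemap XML in chunks, the way a streamed HTTP body arrives. A tail buffer is kept because a `<url>` block can be split across chunks. Parsing uses a regex rather than a full XML parser, to stay dependency-free. The function names and this approach are assumptions, not the actual `poll-sitemaps.mjs` code.

```javascript
// Sketch only: hypothetical API; exact-string URL matching.
function createSitemapMatcher(registryUrls) {
  const wanted = new Set(registryUrls);
  const matched = new Set();
  let buffer = "";

  // Consume every complete <url>...</url> block currently in the buffer.
  function drain() {
    let end;
    while ((end = buffer.indexOf("</url>")) !== -1) {
      const block = buffer.slice(0, end);
      buffer = buffer.slice(end + "</url>".length);
      const loc = /<loc>\s*([^<]+?)\s*<\/loc>/.exec(block);
      if (loc && wanted.has(loc[1])) matched.add(loc[1]);
    }
  }

  return {
    // Feed raw text chunks; chunks from several child sitemaps can share one matcher.
    push(chunk) {
      buffer += chunk;
      drain();
    },
    stats() {
      const total = wanted.size;
      return {
        matched: matched.size,
        unmatched: total - matched.size,
        rate: total === 0 ? 0 : matched.size / total,
      };
    },
  };
}
```

With this commit's counts, 924 / 1342 ≈ 0.689, which rounds to 69%, and 1342 − 924 = 418 URLs (≈31%) remain unmatched. Matching here is by exact string, so a page that moved to a new path counts as unmatched. This is the effect of the ai-foundry/openai restructuring described above.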

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-04-09 21:19:51 +02:00
commit f968f37be3
13 changed files with 976 additions and 59 deletions


@@ -487,29 +487,37 @@ bash tests/capture-fixture.sh <source-file> <section-header> <output-dir>
 ### Knowledge Base Maintenance
-The plugin includes a systematic process for keeping reference documents current. See `docs/kb-update-policy.md` for the full policy (update frequencies per domain, procedures, quality gates).
+The plugin includes a sitemap-based change detection system that tracks when Microsoft Learn source pages are updated. This replaces the previous mtime-based staleness check.
-**Staleness checking:**
+**Automated change detection (sitemap-based):**
 ```bash
-# Human-readable report
-bash scripts/kb-staleness-check.sh
+# Weekly update: poll sitemaps → compare → generate change report
+node scripts/kb-update/run-weekly-update.mjs --force
-# Machine-readable JSON output
-bash scripts/kb-staleness-check.sh --json
+# Include discovery of new relevant pages
+node scripts/kb-update/run-weekly-update.mjs --force --discover
-# Write report to file
-bash scripts/kb-staleness-check.sh --json --output report.json
+# View change report only (after polling)
+node scripts/kb-update/report-changes.mjs
 ```
-**Knowledge base regeneration:**
+The session-start hook automatically triggers a background poll if >7 days since the last check.
+**How it works:**
+1. `build-registry.mjs` extracts 1342 unique `learn.microsoft.com` URLs from reference files
+2. `poll-sitemaps.mjs` fetches Microsoft Learn sitemaps and compares `<lastmod>` dates
+3. `report-changes.mjs` generates a prioritized list of files needing update
+4. `discover-new-urls.mjs` finds relevant new pages not yet covered
+**Knowledge base update:**
 ```bash
+# Incremental update based on change report (targets changed sources via MCP)
+/architect:generate-skills --update
 # Full regeneration via MCP research
 /architect:generate-skills
-# Incremental update (Edit existing files instead of rewriting)
-/architect:generate-skills --update
 ```
 Category-to-skill routing is defined in `scripts/skill-gen/category-skill-map.json` (20 categories mapped to 5 skills), used by the generate-skills workflow to place new reference documents in the correct skill directory.
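Routing a new reference document through `category-skill-map.json` amounts to a validated lookup. The sketch below assumes the file is a flat JSON object mapping each category name to a skill name. That schema is not shown in this diff, and the category and skill names in any example are hypothetical.

```javascript
// Sketch only: assumes a flat { "<category>": "<skill>" } schema.
function loadCategoryMap(jsonText) {
  const map = JSON.parse(jsonText);
  if (map === null || typeof map !== "object" || Array.isArray(map)) {
    throw new Error("Category map must be a JSON object");
  }
  for (const [category, skill] of Object.entries(map)) {
    if (typeof skill !== "string" || skill.length === 0) {
      throw new Error(`Category "${category}" has no skill name`);
    }
  }
  return map;
}

// Look up the skill for a category; unmapped categories fail loudly.
function routeCategory(map, category) {
  if (!Object.prototype.hasOwnProperty.call(map, category)) {
    throw new Error(`Unmapped category "${category}"`);
  }
  return map[category];
}

// Count categories and distinct skills (e.g. to sanity-check "20 categories, 5 skills").
function summarizeCategoryMap(map) {
  return {
    categories: Object.keys(map).length,
    skills: new Set(Object.values(map)).size,
  };
}
```

If the real file follows this flat shape, `summarizeCategoryMap` would report 20 categories and 5 skills. Throwing on an unmapped category surfaces gaps in the map instead of silently placing a document in the wrong skill directory.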