feat(ms-ai-architect): sitemap-based KB change detection system

Adds a zero-dependency Node.js pipeline that polls Microsoft Learn sitemaps
weekly to detect when source documentation changes. Replaces the broken
mtime-based staleness check (all files had identical mtime after release).

Components:
- build-registry.mjs: extracts 1342 URLs from 387 reference files
- poll-sitemaps.mjs: streams ~18 child sitemaps, matches against registry
- report-changes.mjs: prioritized change report (critical/high/medium/low)
- discover-new-urls.mjs: finds relevant new MS Learn pages not yet covered
- run-weekly-update.mjs: orchestrator with --force/--discover/--dry-run
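
The matching step in poll-sitemaps.mjs could look roughly like the sketch below. This is not the code from this commit: the function names, the regex-based parsing (in place of streaming), and the URL normalization rule are all assumptions made for illustration, and the URLs are placeholders.

```javascript
// Hypothetical sketch; not the implementation in poll-sitemaps.mjs.
// Extracts <loc>/<lastmod> pairs from a sitemap XML string.
// XML entities such as &amp; are not decoded here.
function parseSitemapEntries(xml) {
  const entries = [];
  const urlBlock = /<url>([\s\S]*?)<\/url>/g;
  for (const match of xml.matchAll(urlBlock)) {
    const loc = /<loc>\s*([^<]+?)\s*<\/loc>/.exec(match[1]);
    const lastmod = /<lastmod>\s*([^<]+?)\s*<\/lastmod>/.exec(match[1]);
    if (loc) {
      entries.push({ url: loc[1], lastmod: lastmod ? lastmod[1] : null });
    }
  }
  return entries;
}

// Assumed normalization: lowercase, drop query/fragment and a trailing slash.
function normalizeUrl(url) {
  return url.toLowerCase().replace(/[?#].*$/, '').replace(/\/$/, '');
}

// Keeps only the sitemap entries whose URL appears in the registry.
function matchAgainstRegistry(entries, registryUrls) {
  const registry = new Set(registryUrls.map(normalizeUrl));
  return entries.filter((entry) => registry.has(normalizeUrl(entry.url)));
}

// Usage with placeholder data.
const sampleXml = `
<urlset>
  <url><loc>https://example.com/docs/a</loc><lastmod>2026-04-01</lastmod></url>
  <url><loc>https://example.com/docs/b</loc><lastmod>2026-03-15</lastmod></url>
</urlset>`;
const matched = matchAgainstRegistry(
  parseSitemapEntries(sampleXml),
  ['https://example.com/docs/a/'],
);
console.log(matched.length, matched[0].lastmod);
```

A Set lookup keeps matching linear in the number of sitemap entries, which matters when several child sitemaps are scanned in one run.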

Integration:
- session-start hook reads change-report.json instead of broken mtime check
- hook triggers background poll if >7 days since last check
- generate-skills --update reads change report for targeted MCP updates
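
The ">7 days" trigger in the hook amounts to a timestamp comparison. A minimal sketch, assuming the time of the last poll is stored as an ISO 8601 string (where it is stored is not stated here, and `isPollDue` is a hypothetical name):

```javascript
// Hypothetical sketch of the ">7 days since last check" decision.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Returns true when the last poll is missing, unparseable,
// or more than seven days before `now`.
function isPollDue(lastPolledIso, now = new Date()) {
  if (!lastPolledIso) return true;
  const last = Date.parse(lastPolledIso);
  if (Number.isNaN(last)) return true;
  return now.getTime() - last > WEEK_MS;
}

console.log(isPollDue('2026-04-01T00:00:00Z', new Date('2026-04-09T00:00:00Z')));
```

Treating a missing or corrupt timestamp as "due" means a fresh clone triggers a poll rather than silently skipping it.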

Current stats: 69% match rate (924/1342 URLs tracked via sitemaps).
~31% unmatched due to Microsoft URL restructuring (ai-foundry/openai paths).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-04-09 21:19:51 +02:00
commit f968f37be3
13 changed files with 976 additions and 59 deletions


@@ -121,13 +121,28 @@ See `references/architecture/recommended-mcp-servers.md` for details.
bash tests/validate-plugin.sh
```
-#### KB freshness
+#### KB freshness (sitemap-based)
```bash
-# Check for stale knowledge files
-bash scripts/kb-staleness-check.sh
+# Weekly update: poll sitemaps → change report
+node scripts/kb-update/run-weekly-update.mjs --force
-# Show only prioritized stale files
-bash scripts/kb-staleness-check.sh --priority-only
+# With discovery of new relevant pages
+node scripts/kb-update/run-weekly-update.mjs --force --discover
+# Change report only (after polling)
+node scripts/kb-update/report-changes.mjs
+# Build/update the URL registry from reference files
+node scripts/kb-update/build-registry.mjs [--merge]
```
+The system polls the Microsoft Learn sitemaps weekly, compares each `<lastmod>` value with the `Last updated:` header of the corresponding file, and generates a prioritized change report. The session-start hook automatically triggers a background poll if more than 7 days have passed since the last one.
+**Match rate:** ~69% of the 1342 URLs match against the sitemaps. ~31% (mostly `azure/ai-foundry/openai/` paths) are not in the sitemaps because of Microsoft's URL restructuring.
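
The comparison between a sitemap `<lastmod>` and a file's `Last updated:` header can be sketched as follows. The header format (`YYYY-MM-DD`, optionally wrapped in Markdown emphasis) and both function names are assumptions, not taken from the scripts in this commit.

```javascript
// Hypothetical sketch; the real scripts may parse the header differently.
// Pulls the date out of a "Last updated: YYYY-MM-DD" header line.
function readLastUpdated(fileText) {
  const match = /^\W*Last updated:\W*(\d{4}-\d{2}-\d{2})/mi.exec(fileText);
  return match ? match[1] : null;
}

// True when the sitemap reports a later date than the file header.
// Only the date part of <lastmod> is compared; time and zone are ignored.
function isSourceNewer(sitemapLastmod, fileText) {
  if (!sitemapLastmod) return false; // no signal from the sitemap
  const fileDate = readLastUpdated(fileText);
  if (!fileDate) return true; // no header: treat the file as stale
  return sitemapLastmod.slice(0, 10) > fileDate;
}

console.log(isSourceNewer('2026-04-01T08:00:00+00:00', 'Last updated: 2026-03-01\n'));
```

ISO dates in `YYYY-MM-DD` form sort correctly as plain strings, so no date parsing is needed for the comparison itself.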
+Legacy (deprecated):
+```bash
+bash scripts/kb-staleness-check.sh # mtime-based, unreliable after git clone
+```
#### E2E regression tests