feat(ms-ai-architect): sitemap-based KB change detection system

Adds a zero-dependency Node.js pipeline that polls Microsoft Learn sitemaps
weekly to detect when source documentation changes. Replaces the broken
mtime-based staleness check (all files had identical mtime after release).

Components:
- build-registry.mjs: extracts 1342 URLs from 387 reference files
- poll-sitemaps.mjs: streams ~18 child sitemaps, matches against registry
- report-changes.mjs: prioritized change report (critical/high/medium/low)
- discover-new-urls.mjs: finds relevant new MS Learn pages not yet covered
- run-weekly-update.mjs: orchestrator with --force/--discover/--dry-run
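
The matching step in poll-sitemaps.mjs could look roughly like the sketch below. This is not the script from this commit: it assumes change detection compares each sitemap entry's `<lastmod>` with a value stored in the registry, it parses a whole XML string instead of streaming, and the helper names, registry shape, and sample URLs are all hypothetical.

```javascript
// Sketch only: extract { loc, lastmod } pairs from a sitemap XML string.
// The real script streams child sitemaps; a string keeps the example short.
function parseSitemapEntries(xml) {
  const entries = [];
  for (const match of xml.matchAll(/<url>([\s\S]*?)<\/url>/g)) {
    const loc = /<loc>\s*([^<\s]+)\s*<\/loc>/.exec(match[1]);
    const lastmod = /<lastmod>\s*([^<\s]+)\s*<\/lastmod>/.exec(match[1]);
    if (loc) entries.push({ loc: loc[1], lastmod: lastmod ? lastmod[1] : null });
  }
  return entries;
}

// Assumption: `registry` is a Map of URL -> last seen lastmod (ISO string or null).
// Returns the registry URLs whose sitemap lastmod is newer than the stored value.
function findChangedUrls(registry, entries) {
  const changed = [];
  for (const { loc, lastmod } of entries) {
    if (!registry.has(loc) || lastmod === null) continue; // not tracked, or no date
    const seen = registry.get(loc);
    if (seen === null || Date.parse(lastmod) > Date.parse(seen)) {
      changed.push({ url: loc, previous: seen, current: lastmod });
    }
  }
  return changed;
}

// Made-up sample data for illustration.
const sampleXml = `
<urlset>
  <url><loc>https://learn.microsoft.com/en-us/example/a</loc><lastmod>2026-04-08</lastmod></url>
  <url><loc>https://learn.microsoft.com/en-us/example/b</loc><lastmod>2026-01-15</lastmod></url>
  <url><loc>https://learn.microsoft.com/en-us/example/c</loc><lastmod>2026-04-02</lastmod></url>
</urlset>`;
const sampleRegistry = new Map([
  ['https://learn.microsoft.com/en-us/example/a', '2026-03-01'],
  ['https://learn.microsoft.com/en-us/example/b', '2026-01-15'],
]);

console.log(findChangedUrls(sampleRegistry, parseSitemapEntries(sampleXml)));
```

With this sample data only the first URL is reported: the second has an unchanged date and the third is not in the registry.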

Integration:
- session-start hook reads change-report.json instead of broken mtime check
- hook triggers background poll if >7 days since last check
- generate-skills --update reads change report for targeted MCP updates
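
The ">7 days since last check" rule in the hook can be sketched as a small pure function. This is an illustration under assumptions, not the hook's actual code: the timestamp argument and the function name are hypothetical, and a missing or unparsable timestamp is treated as "never polled".

```javascript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Assumption: the last poll time is available as an ISO 8601 string
// (where it is stored is not specified here).
function shouldTriggerPoll(lastCheckedIso, now = new Date()) {
  const last = Date.parse(lastCheckedIso ?? '');
  if (Number.isNaN(last)) return true; // never polled, or timestamp unreadable
  return now.getTime() - last > SEVEN_DAYS_MS; // strictly more than 7 days
}

const now = new Date('2026-04-09T00:00:00Z');
console.log(shouldTriggerPoll('2026-04-01T00:00:00Z', now)); // 8 days ago
console.log(shouldTriggerPoll('2026-04-05T00:00:00Z', now)); // 4 days ago
```

Keeping the check free of I/O means the hook can decide quickly and leave the actual poll to a background process.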

Current stats: 69% match rate (924 of 1342 URLs tracked via sitemaps).
The remaining ~31% (418 URLs) are unmatched because Microsoft restructured
its URLs (ai-foundry/openai paths).
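
One way to reproduce the match rate and see where the unmatched URLs cluster is sketched below. The function name, the choice to group by the first three path segments, and the sample URLs are assumptions made for illustration only.

```javascript
// Sketch: compute the match rate and count unmatched URLs per path prefix.
function summarizeMatches(registryUrls, sitemapUrls) {
  const inSitemap = new Set(sitemapUrls);
  const unmatchedByPrefix = new Map();
  let matched = 0;
  for (const url of registryUrls) {
    if (inSitemap.has(url)) {
      matched += 1;
      continue;
    }
    // Group by the first three path segments, e.g. "en-us/azure/ai-foundry".
    const segments = new URL(url).pathname.split('/').filter(Boolean);
    const prefix = segments.slice(0, 3).join('/');
    unmatchedByPrefix.set(prefix, (unmatchedByPrefix.get(prefix) ?? 0) + 1);
  }
  const total = registryUrls.length;
  const matchRate = total === 0 ? 0 : Math.round((matched / total) * 100);
  return { matched, total, matchRate, unmatchedByPrefix };
}

// Made-up sample data for illustration.
const registryUrls = [
  'https://learn.microsoft.com/en-us/azure/example/overview',
  'https://learn.microsoft.com/en-us/azure/example/quickstart',
  'https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview',
];
const sitemapUrls = registryUrls.slice(0, 2);

console.log(summarizeMatches(registryUrls, sitemapUrls));
```

The same rounding applied to the figures above gives `Math.round((924 / 1342) * 100)`, which is 69.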

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-04-09 21:19:51 +02:00
commit f968f37be3
13 changed files with 976 additions and 59 deletions


@@ -234,7 +234,9 @@ When invoked with `--update`, the command updates existing stale files instead o
 **Workflow:**
-1. Run `bash scripts/kb-staleness-check.sh --json` to identify stale files
+1. Read `scripts/kb-update/data/change-report.json` for source-aware change detection
+   - If not available, fall back to `bash scripts/kb-staleness-check.sh --json`
+   - The change report contains `changed_urls` per file; use these for targeted MCP fetches
 2. Sort by priority (Critical > High > Medium > Low)
 3. For each stale file, dispatch an update agent with this prompt:
@@ -247,10 +249,14 @@ Update the file: {FILE_PATH}
 ## Existing content (read first)
 Read the file with the Read tool. Preserve the structure.
+## Changed source URLs (fetch these first)
+{changed_urls from change-report.json, if available}
 ## Step 1: Research
 Use MCP tools to verify and update:
-1. microsoft_docs_search: 2-3 searches for the latest updates
-2. microsoft_docs_fetch: read updated documentation
+1. microsoft_docs_fetch: fetch the changed source URLs directly (if available)
+2. microsoft_docs_search: 2-3 searches for the latest updates
+3. microsoft_docs_fetch: read further updated documentation as needed
 ## Step 2: Update with Edit
 Use the Edit tool (NOT Write) to:
@@ -277,7 +283,9 @@ status: success|no_changes|failed
 Before generating new knowledge base content, check for stale files:
-1. Run `bash scripts/kb-staleness-check.sh` to identify stale files
+1. Read `scripts/kb-update/data/change-report.json` for source-aware staleness data
+   - This is generated by `node scripts/kb-update/run-weekly-update.mjs` (polls Microsoft Learn sitemaps)
+   - Fallback: `bash scripts/kb-staleness-check.sh` (mtime-based, less accurate)
 2. Prioritize regeneration of stale files by priority (Critical > Low)
-3. When regenerating a file, update its `Sist oppdatert:` header to today's date
-4. After regeneration, verify the file with the staleness checker
+3. When regenerating a file, update its `Last updated:` header to today's date
+4. After update, run `node scripts/kb-update/build-registry.mjs --merge` to refresh URL registry
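
The "read the change report, fall back if missing, then sort by priority" workflow described in the hunks above could be sketched as follows. The JSON shape is an assumption: the diff only states that the report contains `changed_urls` per file, so the `files`, `path`, and `priority` fields and the function name are hypothetical.

```javascript
const PRIORITY_ORDER = { critical: 0, high: 1, medium: 2, low: 3 };

// Assumed report shape (not confirmed by the commit):
//   { files: [{ path, priority, changed_urls: [...] }] }
// Pass `null` when change-report.json is missing or unreadable.
function planUpdates(report) {
  if (!report || !Array.isArray(report.files)) {
    // Caller would fall back to `bash scripts/kb-staleness-check.sh --json`.
    return { source: 'fallback', files: [] };
  }
  const files = report.files
    .filter((f) => Array.isArray(f.changed_urls) && f.changed_urls.length > 0)
    .sort((a, b) => (PRIORITY_ORDER[a.priority] ?? 99) - (PRIORITY_ORDER[b.priority] ?? 99));
  return { source: 'change-report', files };
}

// Made-up sample report for illustration.
const sampleReport = {
  files: [
    { path: 'references/b.md', priority: 'low', changed_urls: ['https://learn.microsoft.com/en-us/example/b'] },
    { path: 'references/a.md', priority: 'critical', changed_urls: ['https://learn.microsoft.com/en-us/example/a'] },
    { path: 'references/c.md', priority: 'high', changed_urls: [] },
  ],
};

console.log(planUpdates(sampleReport).files.map((f) => f.path));
console.log(planUpdates(null).source);
```

Files with no changed URLs are dropped, so each update agent only receives files that have concrete source URLs to fetch.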