---
name: ms-ai-security
description: |
  This skill should be used when the user needs a security assessment for an AI solution,
  wants cost estimation for Azure AI workloads, asks about OWASP LLM Top 10 mitigations,
  or needs performance optimization guidance. Provides deterministic 6x5 security scoring,
  P10/P50/P90 cost confidence intervals, and FinOps practices for AI.
  Triggers on: "security assessment for AI", "AI threat modeling", "cost estimation for Azure AI",
  "FinOps for AI workloads", "prompt injection defense", "kostnadsestimat for AI-løsning",
  "sikkerhetsscoring for AI", "OWASP LLM", "6x5 scoring", "PTU vs pay-as-you-go".
---

> **INSTRUCTION:** This skill covers quantitative assessment activities with deterministic
> scoring models. Apply the framework systematically: do not skip dimensions or assume scores.
> Every assessment must produce concrete, verifiable results with numeric values.

# Security and cost assessment for Microsoft AI

Structured methods for three assessment activities:

1. **Security assessment:** Deterministic 6x5 security scoring with OWASP LLM Top 10 mapping
2. **Cost estimation:** TCO calculation with P10/P50/P90 confidence intervals and FinOps practices
3. **Performance review:** Latency optimization, scaling, and benchmarking

**Primary agents:** security-assessment-agent, cost-estimation-agent

---

## 1. Security framework

### 6-dimension security model

To assess security, score each of the six dimensions independently on a 1-5 scale:

| Dimension | Covers |
|-----------|--------|
| Identity & Access Control | Entra ID, Managed Identities, RBAC, API key rotation, JIT access |
| Network Security | Private Endpoints, VNet, NSG, Azure Firewall, DNS, outbound traffic |
| Data Protection | Encryption (at rest/in transit), Key Vault, data residency, PII masking, backup |
| Content Safety & AI Security | Content Safety filters, prompt injection defense, jailbreak prevention, output validation, STRIDE-AI |
| Compliance & Governance | AI Act classification, GDPR/Schrems II, Purview, Digdir/NSM, DPIA |
| Monitoring & Incident Response | Azure Monitor, token usage, anomaly detection, audit logging, alerting |

### Scoring model (1-5)

| Score | Level | Criterion |
|-------|-------|-----------|
| **1** | Critical | No controls. Immediate risk. |
| **2** | Insufficient | Basic controls with significant gaps. PoC/sandbox only. |
| **3** | Acceptable | Core controls in place. Minimum for low-risk production. |
| **4** | Good | Robust, automated controls with monitoring. Suitable for sensitive data. |
| **5** | Excellent | State-of-the-art. Zero Trust. Defense in depth. Suitable for high-risk AI Act systems. |

### Weighted scoring

Choose the weight profile that matches the workload type, then calculate: **Overall score = Sum(dimension_score x weight)**

| Dimension | Standard | Externally exposed | Personal-data-intensive |
|-----------|----------|--------------------|-------------------------|
| Identity & Access Control | 20% | 25% | 20% |
| Network Security | 15% | 20% | 15% |
| Data Protection | 20% | 15% | 25% |
| Content Safety & AI Security | 20% | 25% | 15% |
| Compliance & Governance | 15% | 10% | 20% |
| Monitoring & Incident Response | 10% | 5% | 5% |
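
The weighted sum can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill's tooling: the weights come from the table above, while the profile keys, dimension keys, function name, and the example scores are hypothetical names chosen for the sketch.

```python
# Weights in percent, copied from the weight table. Key names are illustrative.
WEIGHT_PROFILES = {
    "standard": {
        "identity": 20, "network": 15, "data": 20,
        "content_safety": 20, "compliance": 15, "monitoring": 10,
    },
    "externally_exposed": {
        "identity": 25, "network": 20, "data": 15,
        "content_safety": 25, "compliance": 10, "monitoring": 5,
    },
    "personal_data_intensive": {
        "identity": 20, "network": 15, "data": 25,
        "content_safety": 15, "compliance": 20, "monitoring": 5,
    },
}


def weighted_score(scores, profile="standard"):
    """Return Sum(dimension_score x weight) for one weight profile."""
    weights = WEIGHT_PROFILES[profile]
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the six dimensions")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each dimension score must be between 1 and 5")
    # Weights are whole percentages, so divide once at the end.
    return sum(scores[dim] * pct for dim, pct in weights.items()) / 100


# Hypothetical assessment result, for illustration only.
example_scores = {
    "identity": 4, "network": 3, "data": 4,
    "content_safety": 3, "compliance": 3, "monitoring": 2,
}
print(weighted_score(example_scores, "standard"))            # 3.3
print(weighted_score(example_scores, "externally_exposed"))  # 3.35
```

The same dimension scores produce different overall scores under different profiles, which is why the profile should be fixed before scoring starts.
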

### Risk classification

| Overall score | Classification | Recommendation |
|---------------|----------------|----------------|
| 1.0 - 2.0 | Critical risk | Stop the rollout. Remediate immediately. |
| 2.1 - 3.0 | High risk | Restrict access. Remediation plan within 30 days. |
| 3.1 - 3.5 | Moderate risk | Production with restrictions. Remediation plan within 90 days. |
| 3.6 - 4.5 | Low risk | Approved for production. Continuous improvement. |
| 4.6 - 5.0 | Minimal risk | Approved for production. Benchmark for other solutions. |

For complete rubrics with examples per dimension and score, see `references/ai-security-engineering/security-scoring-rubrics-6x5.md` and `references/ai-security-engineering/ai-security-scoring-framework.md`.

### OWASP LLM Top 10 (2025)

Map each threat to the solution under assessment. Use the reference files for detailed mitigation patterns.

| ID | Threat | Key Microsoft Mitigation | Reference |
|----|--------|--------------------------|-----------|
| LLM01 | Prompt Injection | Content Safety Prompt Shields, system message hardening, Groundedness Detection | `prompt-injection-defense-patterns.md` |
| LLM02 | Sensitive Information Disclosure | PII filters, Purview DLP, output filtering | `data-leakage-prevention-ai.md`, `pii-detection-norwegian-context.md` |
| LLM03 | Supply Chain Vulnerabilities | AI Foundry curated models, signed models, DLP for connectors | `supply-chain-security-ai-models.md` |
| LLM04 | Data and Model Poisoning | Azure ML data lineage, isolated fine-tuning, Purview validation | - |
| LLM05 | Improper Output Handling | Groundedness Detection, Content Safety output filters, Structured Outputs | `output-validation-grounding-verification.md` |
| LLM06 | Excessive Agency | Copilot Studio scoped tools, RBAC per project, human-in-the-loop, budget caps | - |
| LLM07 | System Prompt Leakage | Metaprompt patterns, Prompt Shields, output monitoring | `jailbreak-prevention-production.md` |
| LLM08 | Vector and Embedding Weaknesses | AI Search managed identities, index-level security filters, Private Endpoints | - |
| LLM09 | Misinformation | RAG grounding, Groundedness Detection, citation patterns, confidence scoring | - |
| LLM10 | Unbounded Consumption | Rate limits, token budgets, PTU for capacity, Cost Management alerts | - |

All reference files are in `references/ai-security-engineering/`.
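
One way to make sure no threat is skipped during the mapping is a simple coverage check. The sketch below is illustrative only: the threat IDs and names come from the table above, while the function name and the shape of the `assessment` dictionary (threat ID to list of mitigations found in the solution) are assumptions made for the example.

```python
# Threat IDs and names from the OWASP LLM Top 10 (2025) table.
OWASP_LLM_TOP10_2025 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM03": "Supply Chain Vulnerabilities",
    "LLM04": "Data and Model Poisoning",
    "LLM05": "Improper Output Handling",
    "LLM06": "Excessive Agency",
    "LLM07": "System Prompt Leakage",
    "LLM08": "Vector and Embedding Weaknesses",
    "LLM09": "Misinformation",
    "LLM10": "Unbounded Consumption",
}


def unmapped_threats(assessment: dict[str, list[str]]) -> list[str]:
    """Return the threat IDs that have no mitigation recorded yet."""
    unknown = set(assessment) - set(OWASP_LLM_TOP10_2025)
    if unknown:
        raise ValueError(f"unknown threat IDs: {sorted(unknown)}")
    return [tid for tid in OWASP_LLM_TOP10_2025 if not assessment.get(tid)]


# Hypothetical partial assessment: only two threats mapped so far.
partial = {
    "LLM01": ["Prompt Shields enabled"],
    "LLM10": ["APIM rate limit policy", "Cost Management alert"],
}
for tid in unmapped_threats(partial):
    print(tid, OWASP_LLM_TOP10_2025[tid])
```
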

### Azure AI-specific security controls

For detailed per-service security controls tables, see `references/ai-security-engineering/secure-model-deployment-hardening.md` and `references/ai-security-engineering/zero-trust-ai-services.md`. Key services covered:

- **Azure OpenAI Service:** Content Filtering, Abuse Monitoring, VNet/Private Endpoints, Managed Identity, CMK
- **Azure AI Search:** Managed Identities, index-level security filters, encryption, Private Endpoints
- **Copilot Studio:** Entra ID auth, Power Platform DLP, generative AI guardrails, environment isolation
- **Azure AI Foundry:** Project isolation, granular RBAC, Private Endpoints, curated model catalog, tracing

---

## 2. Cost estimation

### P10/P50/P90 confidence intervals

Provide all estimates with three scenarios. Verify current prices via `microsoft_docs_search` before calculating.

| Scenario | Percentile | Description | Multiplier |
|----------|------------|-------------|------------|
| **P10** (Optimistic) | 10th | Low volume, ideal conditions | Base x 0.6 |
| **P50** (Expected) | 50th | Normal usage, based on historical figures | Base x 1.0 |
| **P90** (Conservative) | 90th | High volume, buffer for the unforeseen | Base x 1.8 |

Adjust multipliers based on historical volatility. Always present both USD and NOK (add a 3-5% currency buffer for NOK).
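
As a rough sketch of the arithmetic, the snippet below applies the three multipliers to a base estimate and converts to NOK with a currency buffer. The function name, the 4% default buffer (the midpoint of the 3-5% range), and the exchange rate of 10.0 in the example are assumptions for illustration; a real estimate needs a current rate and verified prices.

```python
# Scenario multipliers from the P10/P50/P90 table.
MULTIPLIERS = {"P10": 0.6, "P50": 1.0, "P90": 1.8}


def cost_scenarios(base_usd: float, usd_to_nok: float, nok_buffer: float = 0.04):
    """Return USD and buffered NOK amounts for each scenario.

    nok_buffer is the currency buffer as a fraction (0.03-0.05 per the guidance).
    """
    if base_usd < 0 or usd_to_nok <= 0 or nok_buffer < 0:
        raise ValueError("inputs must be non-negative and the rate positive")
    scenarios = {}
    for name, multiplier in MULTIPLIERS.items():
        usd = base_usd * multiplier
        scenarios[name] = {
            "usd": round(usd, 2),
            # The buffer is applied to the NOK figure only.
            "nok": round(usd * usd_to_nok * (1 + nok_buffer), 2),
        }
    return scenarios


# Hypothetical monthly base of 1,000 USD at an assumed rate of 10.0 NOK/USD.
for name, amounts in cost_scenarios(1000, usd_to_nok=10.0).items():
    print(name, amounts)
```

With these inputs the P90 figure is 1,800 USD, or 18,720 NOK after the 4% buffer.
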

### TCO components

Calculate for 1, 12, and 36 months. Present Budget/Recommended/Premium alternatives.

| Component | Includes | Examples |
|-----------|----------|----------|
| **Licenses** | Software per user/org | M365 Copilot, Copilot Studio, Power Platform |
| **Compute** | AI inference, hosting | Azure OpenAI tokens, App Service, Functions |
| **Storage** | Data storage | AI Search indexes, Blob Storage, Cosmos DB |
| **Networking** | Data transfer | Egress, Private Link, Application Gateway |
| **Support** | Microsoft Support | Unified/Premier Support |
| **Operations** | Internal staff | Developers, MLOps, security team |

See `references/cost-optimization/deterministic-cost-calculation-model.md` and `references/cost-optimization/budget-forecasting-ai-projects.md` for full calculation methodology.
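
To show how the components roll up, here is a minimal sketch that sums monthly per-component costs into a TCO for each horizon and scenario. It assumes flat monthly costs with no one-time fees, discounts, or price changes, and it uses the standard 0.6/1.0/1.8 multipliers for every component; the referenced methodology files may well differ. All amounts and names are hypothetical.

```python
MULTIPLIERS = {"P10": 0.6, "P50": 1.0, "P90": 1.8}
HORIZONS_MONTHS = (1, 12, 36)


def tco_by_horizon(monthly_base_usd: dict[str, float]) -> dict[int, dict[str, float]]:
    """Return {months: {scenario: total USD}} from monthly P50 cost per component."""
    if any(cost < 0 for cost in monthly_base_usd.values()):
        raise ValueError("component costs must be non-negative")
    result = {}
    for months in HORIZONS_MONTHS:
        result[months] = {}
        for scenario, multiplier in MULTIPLIERS.items():
            # Scale each component, then sum and extend over the horizon.
            monthly_total = sum(cost * multiplier for cost in monthly_base_usd.values())
            result[months][scenario] = round(monthly_total * months, 2)
    return result


# Hypothetical monthly P50 costs in USD, one entry per TCO component.
monthly = {
    "licenses": 300, "compute": 500, "storage": 100,
    "networking": 50, "support": 50, "operations": 1000,
}
for months, totals in tco_by_horizon(monthly).items():
    print(f"{months:>2} months: {totals}")
```

Keeping the per-component loop, rather than multiplying one grand total, makes it straightforward to give individual components their own multipliers later.
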

### FinOps for AI

Apply these optimization strategies and refer to the detailed guidance in the references:

- **Token optimization:** Shorter prompts, context window management, model tiering (GPT-4o mini vs GPT-4o), prompt caching. See `references/cost-optimization/token-counting-optimization.md`.
- **PTU vs Pay-As-You-Go:** PTU for stable workloads (break-even at roughly 60-70% utilization), PAYG for variable workloads. See `references/cost-optimization/ptu-vs-paygo-economics.md`.
- **Caching:** Semantic caching, prompt caching, RAG result caching. See `references/cost-optimization/semantic-caching-patterns.md`.
- **Right-sizing:** Start with the lowest SKU, monitor for 2-4 weeks, and consider SLMs for specialized tasks. See `references/cost-optimization/model-selection-price-performance.md`.
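
The PTU break-even point can be reasoned about with a simple ratio: PTU is a fixed monthly cost, while PAYG cost scales with usage, so the break-even utilization is the PTU cost divided by what PAYG would cost at full load on the same capacity. The sketch below assumes that simplified linear model; the function names and the prices are hypothetical and must be replaced with verified current prices.

```python
def break_even_utilization(ptu_monthly_cost: float, payg_cost_at_full_load: float) -> float:
    """Utilization (0-1) above which the fixed PTU cost beats linear PAYG cost."""
    if ptu_monthly_cost <= 0 or payg_cost_at_full_load <= 0:
        raise ValueError("costs must be positive")
    return ptu_monthly_cost / payg_cost_at_full_load


def cheaper_option(expected_utilization: float,
                   ptu_monthly_cost: float,
                   payg_cost_at_full_load: float) -> str:
    """Return 'PTU' or 'PAYG' for an expected average utilization (0-1)."""
    if not 0 <= expected_utilization <= 1:
        raise ValueError("utilization must be between 0 and 1")
    threshold = break_even_utilization(ptu_monthly_cost, payg_cost_at_full_load)
    return "PTU" if expected_utilization >= threshold else "PAYG"


# Hypothetical prices: 6,500 USD/month for PTU vs 10,000 USD/month for PAYG at full load.
print(break_even_utilization(6500, 10000))  # 0.65
print(cheaper_option(0.80, 6500, 10000))    # PTU
print(cheaper_option(0.40, 6500, 10000))    # PAYG
```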

---

## 3. Performance and scalability

Optimize latency, throughput, and scalability for AI workloads. Key strategies:

- **Regional deployment** in Norway East / West Europe reduces latency by 20-50 ms
- **Streaming responses** reduce perceived latency 5-10x for interactive use
- **Prompt caching** gives up to 50% cost reduction and 80% latency reduction for repeated prefixes (>1024 tokens)
- **Batch API** provides a 50% price reduction for non-interactive workloads (24-hour target turnaround)
- **Auto-scaling patterns:** Horizontal scaling (App Service/AKS), load balancing (APIM/Traffic Manager), queue-based buffering (Service Bus + Functions), PTU + PAYG hybrid
- **Rate limit management:** TPM/RPM quotas, exponential backoff with jitter, multi-deployment, APIM for centralized throttling
- **Load testing:** Establish a baseline, simulate peak traffic, identify breaking points, run long-duration soak tests
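
Exponential backoff with jitter, mentioned under rate limit management, can be sketched as below. This is a generic illustration using the "full jitter" variant (sleep a random time between zero and the capped exponential delay). `RateLimitError` is a hypothetical stand-in for whatever exception a client raises on HTTP 429, and the parameter defaults are arbitrary; a production client would typically also honor any retry-after hint returned by the service.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a client's HTTP 429 error."""


def call_with_backoff(operation, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on RateLimitError with full-jitter backoff."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            # The delay cap doubles on each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: pick a random delay in [0, cap).
            sleep(rng() * cap)


# Usage with a fake operation that is throttled twice before succeeding.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

print(call_with_backoff(flaky_call, base_delay=0.01))  # ok
```

The `sleep` and `rng` parameters are injectable so the retry logic can be tested without real waiting or randomness.
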

For detailed implementation guidance, see specific files in `references/performance-scalability/`:

- `latency-optimization-azure-openai.md`: Latency tuning
- `auto-scaling-ai-infrastructure.md`: Scaling patterns
- `rate-limit-management.md`: TPM/RPM quota management
- `load-testing-ai-services.md`: Load testing methodology

---

## 4. Reference catalog

### Owned references

| Directory | Files | Contents |
|-----------|-------|----------|
| `references/ai-security-engineering/` | 17 | Defense, testing, scoring, incident handling, Zero Trust, STRIDE-AI, prompt injection, content safety |
| `references/cost-optimization/` | 21 | Cost modeling, FinOps, token optimization, PTU/PAYG, caching, right-sizing, SLM economics |
| `references/performance-scalability/` | 18 | Latency, scaling, streaming, batch API, rate limits, benchmarking, GPU sizing |

### Cross-references

- **Compliance/governance:** `skills/ms-ai-governance/references/responsible-ai/` (AI Act, bias, ethics) and `references/norwegian-public-sector-governance/` (Digdir, NSM, Schrems II, DPIA)
- **Architecture:** `skills/ms-ai-advisor/references/architecture/` (security zones, architecture patterns, public-sector checklist)

---

## 5. MCP tools

| Need | Tool | Use |
|------|------|-----|
| Security documentation | `microsoft_docs_search` | Verify controls, check for updates |
| Complete guidance | `microsoft_docs_fetch` | Security baselines, configuration guides |
| Code samples | `microsoft_code_sample_search` | SDK for Content Safety, RBAC, Key Vault |

Never trust the knowledge base blindly for prices and feature availability; verify both via MCP tools.

---

## 6. Workflow

### Security assessment

1. Map the solution's AI components and data flows
2. Score each of the 6 dimensions using rubrics from references
3. Calculate the weighted risk score with the appropriate weight profile
4. Map OWASP LLM Top 10 threats to the solution
5. Document findings with concrete remediation recommendations, prioritized by risk and cost

### Cost estimation

1. Identify all Azure services in the solution
2. Estimate consumption per service (tokens, storage, traffic)
3. Fetch current prices via MCP tools
4. Calculate P10/P50/P90 per component, sum to TCO for 1/12/36 months
5. Present Budget/Recommended/Premium alternatives with FinOps opportunities

### Performance review

1. Define performance requirements (latency, throughput, availability)
2. Identify bottlenecks and recommend optimizations from the reference catalog
3. Estimate performance impact and propose a monitoring/benchmarking setup