docs(architect): weekly KB update — 106 files refreshed (2026-04)
Updates across all 5 skills: ms-ai-advisor, ms-ai-engineering, ms-ai-governance, ms-ai-security, ms-ai-infrastructure.

Key changes:
- Language Services (Custom Text Classification, Text Analytics, QnA): retirement warning 2029-03-31, migration guides to Foundry/GPT-4o
- Agentic Retrieval: 50M free reasoning tokens/month (Public Preview)
- Computer Use: Claude Sonnet 4.5 (preview) + OpenAI CUA models
- Agent Registry: Risks column (M365 E7), user-shared/org-published types
- Declarative agents: schema v1.5 → v1.6, Store validation requirements
- MLflow 3: 13 built-in LLM judges, production monitoring, Genie Code
- AG-UI HITL: ApprovalRequiredAIFunction (C#) + @tool(approval_mode) (Python)
- Entra ID Ignite 2025: Agent ID Admin/Developer RBAC roles, Conditional Access
- Security Copilot: 400 SCU/month per 1000 M365 E5 licenses, auto-provisioned
- Fast Transcription API: phrase lists, 14-language multi-lingual transcription
- Azure Monitor Workbooks: Bicep support, RBAC specifics
- Power Platform Copilot: data residency (Norway/Europe → EU DB, Bing → USA)
- RAG security-rbac: 4-approach table (GA + 3 preview access control methods)
- IaC MLOps: Well-Architected OE:05 principles, Bicep/Terraform patterns
- Translator: image file batch translation Preview (JPEG/PNG/BMP/WebP)

All 106 files: Last updated 2026-04 | Verified: MCP 2026-04

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
dda86449fa
commit
ff6a50d14f
104 changed files with 1986 additions and 520 deletions
|
|
@ -1,11 +1,14 @@
|
|||
# A/B Testing and Experimentation for AI Models
|
||||
|
||||
**Last updated:** 2026-02
|
||||
**Verified:** MCP 2026-04
|
||||
**Status:** GA
|
||||
**Category:** MLOps & GenAIOps
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## Introduction

A/B testing and experimentation are critical techniques for validating and optimizing AI models in production. Unlike traditional software development, where functionality is binary (works/does not work), AI models are probabilistic: their performance varies with data, context, and usage patterns. A/B testing makes it possible to compare model versions, fine-tuning strategies, prompt variants, or RAG configurations under real conditions, with real users and real traffic.
|
||||
|
|
@ -438,3 +441,35 @@ Krever:
|
|||
- Shadow deployment patterns
|
||||
|
||||
**Number of unique sources:** 7 (Microsoft Learn) + 3 (baseline concepts) = **10 sources**
|
||||
|
||||
|
||||
### A/B Testing with Azure ML Managed Online Endpoints + MLflow 3 (2026)
|
||||
|
||||
**Traffic splitting via managed online endpoints**:
|
||||
```bash
# Deploy the challenger model, then shift 10% of traffic to it.
# challenger-deployment.yml is a placeholder for your deployment spec;
# default resource group and workspace are assumed to be configured.
az ml online-deployment create --name challenger --endpoint-name my-endpoint \
  --file challenger-deployment.yml
az ml online-endpoint update --name my-endpoint --traffic "control=90 challenger=10"

# Monitor with MLflow 3 scorers (same metrics for both variants)
# Use RelevanceToQuery, Correctness, custom business scorers
```
|
||||
|
||||
**MLflow 3 A/B evaluation pattern**:
|
||||
- Use `mlflow.genai.evaluate()` on traces from each variant
|
||||
- Compare scorers: `Correctness`, `RelevanceToQuery`, `ToolCallEfficiency`
|
||||
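- Judge reliability: Cohen's kappa against a human baseline measures agreement between LLM judges and human labels; it does not by itself establish statistical significance between variants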
|
||||
- Aliases in Prompt Registry: `@control` and `@challenger` for prompt A/B testing
|
||||
|
||||
**Azure ML safe rollout progression**:
|
||||
1. **Shadow testing**: Mirror X% of traffic to new model (no user impact)
|
||||
2. **Canary**: Route 10% live traffic, monitor bake time (hours/days)
|
||||
3. **Progressive**: 10% → 50% → 100% with health gate at each step
|
||||
4. **Rollback trigger**: Automatic halt on health signal degradation
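
The progression above can be expressed as a small decision function. This is an illustrative sketch only: the step percentages come from the list, while `ROLLOUT_STEPS`, `next_traffic_split`, and the boolean `healthy` signal are hypothetical names, not part of any Azure ML API. In practice the returned split would be applied with `az ml online-endpoint update --traffic`.

```python
# Hypothetical rollout steps taken from the progression above (percent of
# live traffic routed to the challenger deployment).
ROLLOUT_STEPS = [10, 50, 100]


def next_traffic_split(current_pct, healthy):
    """Return the next control/challenger split, or a full rollback."""
    if not healthy:
        # Health gate failed: send all traffic back to the control deployment.
        return {"control": 100, "challenger": 0}
    later_steps = [step for step in ROLLOUT_STEPS if step > current_pct]
    target = later_steps[0] if later_steps else 100
    return {"control": 100 - target, "challenger": target}


print(next_traffic_split(10, healthy=True))
print(next_traffic_split(50, healthy=False))
```

Keeping the gate decision in one pure function makes it easy to unit test separately from the deployment tooling that applies the split.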
|
||||
|
||||
**Evaluation metrics for LLM A/B tests**:
|
||||
- Quality: Groundedness, Relevance, Correctness (MLflow judges)
|
||||
- Latency: P50, P90, P99 response times
|
||||
- Cost: Token usage per request
|
||||
- Business: Task completion rate, user satisfaction
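
To decide whether a difference in a business metric such as task completion rate is more than noise, a two-proportion z-test is one common choice. The sketch below uses only the standard library; the function names and the sample counts are assumptions made for illustration, and the latency helper simply reads P50/P90/P99 from `statistics.quantiles`.

```python
import math
import statistics


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two completion rates."""
    pooled = (success_a + success_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("both rates are 0% or 100%; the test is undefined")
    z = (success_b / n_b - success_a / n_a) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


def latency_percentiles(latencies_ms):
    """Return P50, P90 and P99 from a list of response times."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


# Hypothetical counts: control completes 400/1000 tasks, challenger 460/1000.
z, p = two_proportion_z_test(400, 1000, 460, 1000)
print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value suggests the challenger's completion rate differs from the control's; the significance threshold and minimum sample size should be fixed before the test starts.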
|
||||
|
||||
|
|
|
|||
|
|
@ -1,11 +1,14 @@
|
|||
# Azure ML Pipelines - Orchestration and Automation
|
||||
|
||||
**Last updated:** 2026-02
|
||||
**Verified:** MCP 2026-04
|
||||
**Status:** GA
|
||||
**Category:** MLOps & GenAIOps
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## Introduction

Azure Machine Learning pipelines are a complete orchestration framework for machine learning workflows. A pipeline automates an end-to-end ML task by splitting it into several manageable steps (components), each of which can be developed, optimized, configured, and automated independently. Azure ML handles dependencies between steps automatically and facilitates parallelization, caching, and reuse.
|
||||
|
|
@ -18,6 +21,61 @@ Fra et kostnads- og effektivitetsperspektiv gir pipelines betydelige fordeler: d
|
|||
|
||||
### Pipeline Components (v2)
|
||||
|
||||
|
||||
### Azure ML Pipelines — Python SDK v2 (Tutorial 2026)
|
||||
|
||||
**Key benefits**: Standardized MLOps, scalable team collaboration, training efficiency, cost reduction.
|
||||
|
||||
**Pipeline creation pattern** (SDK v2):
|
||||
```python
from azure.ai.ml import MLClient, dsl, Input, Output, command
from azure.identity import DefaultAzureCredential

# subscription_id, resource_group, workspace, env, and train_component
# are assumed to be defined earlier.
ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

# 1. Create reusable components
data_prep_component = command(
    name="data_prep",
    inputs={"data": Input(type="uri_folder"), "test_train_ratio": Input(type="number")},
    outputs={"train_data": Output(type="uri_folder"), "test_data": Output(type="uri_folder")},
    code="./components/data_prep",
    command="python data_prep.py --data ${{inputs.data}} ...",
    environment=f"{env.name}:{env.version}",
)
# Register for reuse
ml_client.create_or_update(data_prep_component.component)

# 2. Define pipeline with @dsl.pipeline decorator
@dsl.pipeline(compute="serverless", description="E2E training pipeline")
def training_pipeline(data_input, test_train_ratio, learning_rate, model_name):
    prep_job = data_prep_component(data=data_input, test_train_ratio=test_train_ratio)
    train_job = train_component(
        train_data=prep_job.outputs.train_data,
        test_data=prep_job.outputs.test_data,
        learning_rate=learning_rate,
        registered_model_name=model_name,
    )

# 3. Submit pipeline (arguments are elided in the source)
pipeline_job = ml_client.jobs.create_or_update(
    training_pipeline(data_input=..., ...),
    experiment_name="e2e_pipeline"
)
```
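
The `data_prep` component above takes a `test_train_ratio` input and emits separate train and test outputs. The source does not show the script itself, so the following is only a guess at the core step such a script might contain, assuming the ratio is the fraction of rows held out for testing. `split_rows` and its `seed` parameter are hypothetical.

```python
import random


def split_rows(rows, test_train_ratio, seed=42):
    """Shuffle rows and hold out `test_train_ratio` of them for testing."""
    if not 0 < test_train_ratio < 1:
        raise ValueError("test_train_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_train_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


train_rows, test_rows = split_rows(range(100), test_train_ratio=0.25)
print(len(train_rows), len(test_rows))
```

A fixed seed matters in a pipeline setting because cached or re-run steps should produce the same partition.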
|
||||
|
||||
**Component lifecycle**:
|
||||
1. Write YAML spec or create programmatically (`CommandComponent`)
|
||||
2. Register with name+version in workspace or registry
|
||||
3. Load and compose into pipeline
|
||||
4. Submit via `ml_client.jobs.create_or_update()`
|
||||
|
||||
**Compute options**: `serverless` (recommended), named compute cluster, or per-step compute override.
|
||||
**Environment**: Curated environments (`azureml://registries/azureml/environments/sklearn-1.5/labels/latest`) or custom conda/Docker.
|
||||
**Output types**: `uri_folder` (data), `mlflow_model` (model), `uri_file` (file).
|
||||
|
||||
**MLflow integration**: Use `mlflow.start_run()` in scripts for automatic experiment tracking (metrics, parameters, models).
|
||||
|
||||
|
||||
| Component type | Description | Use case |
|----------------|-------------|----------|
| **Command component** | Runs a shell script or Python script | Data prep, training, scoring, evaluation |
|
||||
|
|
|
|||
|
|
@ -1,6 +1,7 @@
|
|||
# CI/CD Pipelines for Machine Learning Models
|
||||
|
||||
**Last updated:** 2026-02
|
||||
**Verified:** MCP 2026-04
|
||||
**Status:** GA
|
||||
**Category:** MLOps & GenAIOps
|
||||
|
||||
|
|
@ -288,6 +289,34 @@ Disse signalene indikerer at din ML CI/CD ikke er production-ready:
|
|||
|
||||
### GitHub Actions Integration
|
||||
|
||||
|
||||
### GitHub Actions with Azure Machine Learning (2026 Update)
|
||||
The recommended authentication approach is **OpenID Connect (OIDC) with federated credentials** — eliminates long-lived secrets.
|
||||
|
||||
**Workflow structure** (`.github/workflows/`):
```yaml
permissions:
  id-token: write   # required to request the OIDC token
  contents: read    # required by actions/checkout
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - run: az ml job create --file pipeline.yml
```
|
||||
|
||||
**MLOps v2 GitHub setup** (recommended end-to-end):
|
||||
1. Fork `Azure/mlops-v2-gha-demo` template repo
|
||||
2. Set GitHub secrets: `ARM_CLIENT_ID`, `ARM_CLIENT_SECRET`, `ARM_SUBSCRIPTION_ID`, `ARM_TENANT_ID`
|
||||
3. Deploy infrastructure via `tf-gha-deploy-infra.yml` workflow
|
||||
4. Run `deploy-model-training-pipeline` and `deploy-online-endpoint-pipeline` workflows
|
||||
|
||||
**Pipeline stages**: Prepare Data → Train Model → Evaluate Model → Register Model → Deploy Endpoint
|
||||
|
||||
|
||||
**Setup:**
|
||||
- Create a `.github/workflows/` directory in the repo
- Configure GitHub Secrets for Azure credentials (or OIDC)
|
||||
|
|
|
|||
|
|
@ -1,11 +1,14 @@
|
|||
# Data Drift Monitoring and Detection
|
||||
|
||||
**Last updated:** 2026-02
|
||||
**Verified:** MCP 2026-04
|
||||
**Status:** GA
|
||||
**Category:** MLOps & GenAIOps
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## Introduction

Data drift is a change over time in the statistical distribution of model input data that can degrade model performance. For machine learning models, continuous monitoring of data drift is essential to maintaining production quality. Azure Machine Learning offers built-in drift detection that compares production data against a baseline dataset (typically training data or recent production data) and computes statistical distance measures.
|
||||
|
|
@ -363,3 +366,31 @@ Hvis kunden bruker legacy `DataDriftDetector` (azureml-datadrift SDK):
|
|||
|
||||
**MCP Calls:** 5 (3 × microsoft_docs_search, 1 × microsoft_docs_fetch, 1 × microsoft_code_sample_search)
|
||||
**Unique Sources:** 12 Microsoft Learn URLs
|
||||
|
||||
|
||||
### Azure ML Model Monitoring — Data Drift Detection (2026)
|
||||
|
||||
**Model monitoring signals** (out-of-box for online endpoints):
|
||||
|
||||
| Signal | What it detects |
|--------|----------------|
| **Data quality** | Null values, out-of-range values, type mismatches in input features |
| **Data drift** | Statistical distribution change: training data vs production data |
| **Prediction drift** | Distribution shift in model output predictions |
| **Feature attribution drift** | Changes in which features drive predictions |
| **Custom signals** | User-defined metrics via Python scripts |
|
||||
|
||||
**Setup options**:
|
||||
- **Out-of-box**: Automatically configured for Azure ML online endpoints (no configuration required)
|
||||
- **Advanced**: Custom monitoring for models deployed outside Azure ML (batch endpoints, external)
|
||||
- **Azure Event Grid integration**: Route monitoring alerts for automated response
|
||||
|
||||
**Statistical methods used**:
|
||||
- Jensen-Shannon divergence for categorical features
|
||||
- Wasserstein distance (Earth Mover's Distance) for numerical features
|
||||
- Population Stability Index (PSI) for feature stability
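
For intuition, two of these measures can be computed with the standard library alone. This is a sketch under assumptions: `category_proportions`, `psi`, and `js_divergence` are illustrative helpers rather than the Azure ML implementation, and the sample data is made up.

```python
import math
from collections import Counter


def category_proportions(values, categories):
    """Share of each category in `values`, in the order given by `categories`."""
    counts = Counter(values)
    return [counts.get(c, 0) / len(values) for c in categories]


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two proportion vectors."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


categories = ["a", "b", "c"]
baseline = category_proportions(["a"] * 50 + ["b"] * 30 + ["c"] * 20, categories)
production = category_proportions(["a"] * 30 + ["b"] * 30 + ["c"] * 40, categories)
print(f"PSI={psi(baseline, production):.3f}, JS={js_divergence(baseline, production):.3f}")
```

Both measures are zero when the production distribution matches the baseline and grow as the distributions diverge, which is what makes them usable as alert thresholds.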
|
||||
|
||||
**Reference dataset**: Training dataset used as baseline; monitoring compares production distribution against it.
|
||||
|
||||
**Alerting**: Configure thresholds per signal; integrate with Azure Monitor alerts and Action Groups.
|
||||
|
||||
|
|
|
|||
|
|
@ -4,6 +4,8 @@
|
|||
**Date:** 2026-02-04
**Confidence:** HIGH (based on official Microsoft documentation)
|
||||
|
||||
**Verified:** MCP 2026-04
|
||||
|
||||
## Introduction

Feedback loops and continuous improvement are critical components of modern AI operations. Unlike traditional software, where functionality is deterministic, AI models can exhibit quality drift or unexpected behavior when they encounter real data. A well-functioning feedback system ensures that models remain accurate, relevant, and safe throughout their lifecycle.
|
||||
|
|
@ -738,3 +740,32 @@ Dette dokumentet dekker hele feedback loop-syklusen for både classical ML og Ge
|
|||
5. **Cost:** Threshold-based retraining can save 50-70% compute vs daily retraining

Use the architecture patterns to visualize the solution for the customer. Point out that MLflow Tracing + Agent Evaluation provides "free" observability (built into Databricks).
|
||||
|
||||
|
||||
### MLflow 3 Evaluation & Feedback Loop (2026)
|
||||
|
||||
MLflow 3 introduces a unified evaluation-monitoring lifecycle for GenAI feedback loops:
|
||||
|
||||
**Iterative workflow**:
|
||||
1. **Trace** production requests (MLflow Tracing — end-to-end observability)
|
||||
2. **Evaluate** against scorers during development (`mlflow.genai.evaluate()`)
|
||||
3. **Monitor** production with same scorers (consistent quality measurement)
|
||||
4. **Gather human feedback** via Review App (expert annotations)
|
||||
5. **Improve** prompts/models based on evaluation datasets
|
||||
|
||||
**Azure ML Model Monitoring signals**:
|
||||
- Data quality: null values, out-of-range, type mismatch
|
||||
- Data drift: statistical distribution changes between training and production data
|
||||
- Prediction drift: distribution shift in model outputs
|
||||
- Feature attribution drift: changes in feature importance
|
||||
- Custom signals: user-defined metrics via custom scripts
|
||||
|
||||
**Monitoring setup**:
|
||||
```python
|
||||
# Set up out-of-box monitoring for Azure ML online endpoints
|
||||
# Monitors data drift, prediction drift automatically
|
||||
# Integrates with Azure Event Grid for alerting
|
||||
```
|
||||
|
||||
**Continuous improvement cycle**: Production traces → MLflow evaluation datasets → Scorer alignment → Prompt/model update → A/B test → Production rollout
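
The first step of this cycle, turning production traces into an evaluation dataset, can be sketched in plain Python. The trace layout below (`request`, `response`, `score`, `human_feedback`) is a hypothetical schema chosen for illustration; it is not the MLflow trace format, and `build_eval_dataset` is not an MLflow function.

```python
def build_eval_dataset(traces, min_score=0.7):
    """Select traces that a scorer or a human reviewer flagged as weak."""
    dataset = []
    for trace in traces:
        low_score = trace["score"] < min_score
        thumbs_down = trace.get("human_feedback") == "down"
        if low_score or thumbs_down:
            dataset.append({"inputs": trace["request"], "outputs": trace["response"]})
    return dataset


# Hypothetical production traces with scorer output and optional human feedback.
traces = [
    {"request": "q1", "response": "r1", "score": 0.9, "human_feedback": "up"},
    {"request": "q2", "response": "r2", "score": 0.4},
    {"request": "q3", "response": "r3", "score": 0.8, "human_feedback": "down"},
]
eval_rows = build_eval_dataset(traces)
print(len(eval_rows))
```

Selecting on either signal keeps cases where the automated scorer and the human reviewer disagree, which are often the most useful rows for scorer alignment.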
|
||||
|
||||
|
|
|
|||
|
|
@ -1,6 +1,7 @@
|
|||
# GenAIOps - LLM-Specific MLOps Practices
|
||||
|
||||
**Date:** 2026-02-04
**Last updated:** 2026-04 | Verified: MCP 2026-04
**Category:** MLOps & GenAIOps
**Confidence:** High (based on 18 MCP sources from Microsoft Learn)
|
||||
|
||||
|
|
@ -10,252 +11,16 @@
|
|||
|
||||
GenAIOps (Generative AI Operations), also called LLMOps, describes operational practices and strategies for managing large language models (LLMs) in production. While traditional MLOps focuses on training and deploying discriminative models, GenAIOps is about **selecting, adapting, orchestrating, and monitoring** existing foundation models.

### Difference between MLOps and GenAIOps

| Dimension | Traditional MLOps | GenAIOps (LLMOps) |
|-----------|-------------------|-------------------|
| **Primary focus** | Training new models from scratch | Consuming and fine-tuning existing foundation models |
| **Artifacts** | Trained models (pkl, ONNX) | Prompts, orchestrators, agents, chains, grounding data |
| **Evaluation** | Accuracy, precision, recall (deterministic) | Groundedness, relevance, coherence, fluency (LLM-as-judge) |
| **Infrastructure** | Model-serving endpoints | Orchestrators, vector stores, API gateways, LLM endpoints |
| **Deployment** | Model versioning | Model + prompt + grounding data + orchestrator |
| **Monitoring** | Model drift, data drift | Data drift + prompt effectiveness + content safety + token usage |

**Confidence:** 95%. Microsoft documentation explicitly defines these differences.
|
||||
### MLflow 3 Tracing — GenAI Observability
|
||||
MLflow Tracing provides end-to-end observability for GenAI applications:
|
||||
- Records inputs, outputs, intermediate steps, and metadata
|
||||
- Supports complex agent-based systems and multi-turn conversations
|
||||
- Integrates with Genie Code for natural language trace analysis
|
||||
- Enables: debugging, performance monitoring, cost optimization, auditability
|
||||
- Production monitoring reuses same scorers as development evaluation (consistent lifecycle)
|
||||
|
||||
---
|
||||
|
||||
## Core Components

### 1. Prompt Engineering and Prompt Registry

**What:** Structured management of system and user prompts as versioned artifacts.

**Why:** Prompts are the primary "code" in GenAI solutions. Changes to prompts affect output as much as code changes do.

**How (Azure):**
- **MLflow Prompt Registry** (Databricks): Versioned prompt management with aliases (e.g. `production`, `staging`)
- **Azure AI Foundry Prompt Flow**: Visual prompt designer with versioning and CI/CD integration
- **Semantic Kernel Prompt Functions**: Prompts as code artifacts in `.txt` files with Handlebars syntax
|
||||
|
||||
```python
# MLflow Prompt Registry example
import mlflow

prompt = mlflow.genai.register_prompt(
    name="mycatalog.myschema.customer_support",
    template="You are a helpful assistant. Answer this question: {{question}}",
    commit_message="Initial customer support prompt"
)

mlflow.genai.set_prompt_alias(
    name="mycatalog.myschema.customer_support",
    alias="production",
    version=1
)

# In the application (llm is an application-specific client assumed to exist)
prompt = mlflow.genai.load_prompt(
    name_or_uri="prompts:/mycatalog.myschema.customer_support@production"
)
response = llm.invoke(prompt.format(question="How do I reset my password?"))
```
|
||||
|
||||
**Confidence:** 90%. Prompt Registry is documented, but adoption rates vary.

### 2. Orchestration Layer

**What:** The system that handles logic, calls data sources/agents, generates prompts, and calls LLM models.

**Why:** Generative AI solutions are more than just the model; they are complex workflows that require orchestration.

**Microsoft options:**
- **Azure AI Foundry Agent Service**: Low-code agent orchestration
- **Microsoft Agent Framework SDK (Semantic Kernel)**: Code-first orchestration with C#/Python
- **Prompt Flow**: Visual workflow designer for LLM chains
- **LangChain/LlamaIndex**: Open source (supported by Azure ML)

**Deployment:**
- Azure App Service (containerized orchestrator)
- Azure Container Apps (serverless orchestrator)
- Azure Kubernetes Service (high-scale orchestrator)
- Azure Machine Learning Managed Online Endpoints

**Confidence:** 85%. Many deployment options exist; best practice varies with use case.

### 3. Vector Stores and Grounding Data
|
||||
|
||||
**What:** Data storage solutions for RAG (Retrieval-Augmented Generation) that support vector search.

**Azure options:**
- **Azure AI Search**: Hybrid search (full-text + vector + semantic)
- **Azure Cosmos DB for MongoDB vCore**: Vector search capabilities
- **Azure Database for PostgreSQL (pgvector)**: Open source vector extension
- **Databricks Vector Search**: Delta table-based, auto-syncing

**DataOps extensions for GenAIOps:**
- **Chunking pipelines**: Split documents into semantically meaningful chunks (Azure Machine Learning pipelines)
- **Embedding generation**: Batch generation of embeddings (Azure OpenAI text-embedding-ada-002 / text-embedding-3-small)
- **Index maintenance**: Incremental updates vs. full rebuilds (compliance: right-to-be-forgotten)
- **Data freshness**: Real-time vs. batch refresh (business requirements)
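
As a baseline for the chunking step, a fixed-size word window with overlap is often a reasonable starting point before moving to semantic splitting. The sketch below is a naive illustration, not a semantic chunker; `chunk_text` and its default sizes are assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_text(sample, chunk_size=4, overlap=1))
```

The overlap keeps sentences that straddle a boundary retrievable from either neighbor, at the cost of a somewhat larger index.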
|
||||
|
||||
**Confidence:** 90%. The architecture is documented, but chunking strategies are experimental.
|
||||
|
||||
### 4. Evaluation Framework
|
||||
|
||||
**What:** LLM-specific evaluation metrics and human-in-the-loop feedback.

**Azure AI Foundry Evaluation SDK:**
```python
import os

from azure.ai.evaluation import evaluate, RelevanceEvaluator, CoherenceEvaluator

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}

# azure_ai_project is assumed to be defined elsewhere (project connection details).
result = evaluate(
    data="test_data.jsonl",
    evaluators={
        "relevance": RelevanceEvaluator(model_config=model_config),
        "coherence": CoherenceEvaluator(model_config=model_config),
    },
    evaluator_config={
        "relevance": {
            "column_mapping": {
                "query": "${data.query}",
                "ground_truth": "${data.ground_truth}",
                "response": "${outputs.response}"
            }
        }
    },
    azure_ai_project=azure_ai_project,
    output_path="./evaluation_results.json"
)
```
|
||||
|
||||
**Evaluation dimensions:**

| Use case | Metrics |
|----------|---------|
| **RAG** | Groundedness, relevance, coherence, fluency |
| **Summarization** | ROUGE, BLEU, BERTScore, METEOR |
| **Translation** | BLEU |
| **Classification** | Precision, recall, accuracy, F1 |
| **Content Safety** | Hate/violence/sexual/self-harm scores (Azure AI Content Safety) |

**Human Feedback Loop:**
- **Mosaic AI Agent Framework Review App** (Databricks): UI for human reviewers
- **Application Insights**: Thumbs up/down from end users
- **Custom feedback APIs**: Integration into enterprise workflows

**Confidence:** 95%. Built-in evaluators are well documented.
|
||||
|
||||
### 5. CI/CD for GenAIOps
|
||||
|
||||
**GenAIOps Prompt Flow Template** (recommended by Microsoft):
- **Repository**: [microsoft/genaiops-promptflow-template](https://github.com/microsoft/genaiops-promptflow-template)
- **CI/CD**: GitHub Actions or Azure DevOps Pipelines
- **Lifecycle**: Feature branch → PR → Dev → Staging → Production

**Pipeline phases:**
1. **PR Pipeline** (CI):
   - Flow validation
   - Unit testing of custom Python code
   - Variant experimentation
   - Evaluation runs against test data
2. **Dev Pipeline** (CI + CD):
   - Batch testing
   - Model/prompt registration (conditional)
   - Human-in-the-loop approval gate
   - Deployment to dev/staging endpoints
3. **Production Pipeline** (CD):
   - Blue-green deployment
   - A/B testing (traffic splitting)
   - Canary deployment
   - Rollback capabilities

**Azure DevOps integration:**
```yaml
# Example: Prompt Flow evaluation in Azure Pipelines
- task: AzureCLI@2
  displayName: 'Run Prompt Flow Evaluation'
  inputs:
    azureSubscription: 'AzureML-ServiceConnection'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      az ml job create --file evaluation-job.yaml \
        --workspace-name $(ML_WORKSPACE) \
        --resource-group $(RESOURCE_GROUP)
```

**Confidence:** 85%. The template is active (2025) but requires customization.
|
||||
|
||||
### 6. Monitoring and Observability

**LLM-specific monitoring dimensions:**

| Dimension | What is monitored | Azure tooling |
|-----------|-------------------|---------------|
| **Operational** | Latency, token usage, 429 errors, endpoint availability | Azure Monitor, Application Insights |
| **Quality** | Groundedness, relevance, coherence, fluency (sampled) | Azure Machine Learning Model Monitoring (Generation Quality Signal) |
| **Safety** | Harmful content detection (hate, violence, sexual, self-harm) | Azure AI Content Safety (real-time filtering) |
| **Cost** | Token consumption per user/session, quota utilization | Azure Cost Management, API Management gateway logs |
| **Data drift** | Changes in user query patterns, grounding data staleness | Azure ML Data Drift monitors |
| **Feedback** | User ratings (thumbs up/down), session abandonment rate | Custom telemetry (Application Insights) |
|
||||
|
||||
**MLflow Tracing for GenAI:**
|
||||
```python
import mlflow

# Automatic tracing of OpenAI calls
mlflow.openai.autolog()

# Custom trace decorator; retrieve_from_vector_store, format_prompt, and llm
# are application-specific and assumed to be defined elsewhere.
@mlflow.trace
def my_rag_app(query: str):
    context = retrieve_from_vector_store(query)
    prompt = format_prompt(query, context)
    response = llm.invoke(prompt)
    return response
```
|
||||
|
||||
**Azure AI Foundry Monitoring (SDK v2):**
|
||||
```python
from azure.ai.ml.entities import (
    CronTrigger, MonitorDefinition, MonitorSchedule,
    GenerationSafetyQualitySignal, GenerationTokenStatisticsSignal,
)

# aoai_connection_id and production_data are assumed to be defined elsewhere.

# Quality monitoring
gsq_signal = GenerationSafetyQualitySignal(
    connection_id=aoai_connection_id,
    metric_thresholds={
        "groundedness": {"aggregated_groundedness_pass_rate": 0.7},
        "relevance": {"aggregated_relevance_pass_rate": 0.7},
    },
    production_data=[production_data],
    sampling_rate=1.0
)

# Token monitoring
token_signal = GenerationTokenStatisticsSignal()

monitor = MonitorSchedule(
    name="genai-monitor",
    trigger=CronTrigger(expression="15 10 * * *"),
    create_monitor=MonitorDefinition(
        monitoring_signals={"quality": gsq_signal, "tokens": token_signal}
    )
)
```
|
||||
|
||||
**Confidence:** 90%. Monitoring capabilities are documented, but sampling rates must be tuned for cost.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Patterns
|
||||
|
||||
### 1. Fine-Tuning Pattern
|
||||
|
||||
|
|
|
|||
|
|
@ -4,6 +4,8 @@
|
|||
**Date:** 2026-02-04
**Authored by:** Cosmo Skyberg, Senior Microsoft AI Solution Architect
|
||||
|
||||
**Verified:** MCP 2026-04
|
||||
|
||||
## Introduction

Inferencing optimization and caching are critical techniques for maximizing performance and minimizing cost when AI models serve predictions in production. While model training is about achieving high accuracy, inferencing is about delivering those predictions quickly, reliably, and cost-effectively to users and systems.
|
||||
|
|
@ -1011,3 +1013,36 @@ Diagnostikk:
|
|||
- Monitor **cache hit rate** and **autoscaling metrics** continuously

**Confidence level: HIGH.** This reference is based on 12 MCP calls to official Microsoft documentation and code samples.
|
||||
|
||||
|
||||
### ONNX Inferencing Optimization for Computer Vision (Azure ML AutoML 2026)
|
||||
|
||||
ONNX (Open Neural Network Exchange) enables cross-framework interoperability and inference optimization:
|
||||
|
||||
**Supported AutoML computer vision tasks**:
|
||||
- Image classification (binary and multi-class)
|
||||
- Object detection
|
||||
- Instance segmentation
|
||||
|
||||
**ONNX inference workflow**:
|
||||
1. Download ONNX model files from AutoML training run
|
||||
2. Understand model inputs/outputs (image format requirements)
|
||||
3. Preprocess data to required input format
|
||||
4. Run inference with ONNX Runtime Python API (`onnxruntime`)
|
||||
5. Post-process predictions (bounding boxes for detection, masks for segmentation)
|
||||
|
||||
**Python ONNX Runtime**:
|
||||
```python
import onnxruntime as rt

sess = rt.InferenceSession("model.onnx")
# Works across languages: Python, C++, C#, Java, JavaScript
```
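
For step 5 of the workflow, post-processing a multi-class classification output typically means converting raw scores into probabilities. The sketch below assumes the model returns one list of raw logits per image and that class labels are available separately; `softmax`, `top_prediction`, and the sample values are illustrative and not taken from the AutoML output schema.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, labels):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for one image and a hypothetical label list.
label, confidence = top_prediction([2.0, 0.5, -1.0], ["cat", "dog", "bird"])
print(label, round(confidence, 3))
```

Object detection and instance segmentation need different post-processing (box filtering, mask thresholding), which this sketch does not cover.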
|
||||
|
||||
**Cross-platform benefits**:
|
||||
- Deploy on any platform without framework dependencies
|
||||
- Reduced inference latency vs Python framework
|
||||
- Edge deployment: Azure IoT Edge, on-premises
|
||||
- Language flexibility post-export
|
||||
|
||||
**SDK**: `azure-ai-ml v2 (current)` — use AutoML image tasks to generate ONNX models automatically
|
||||
|
||||
|
|
|
|||
|
|
@ -4,6 +4,8 @@
|
|||
**Category:** MLOps & GenAIOps
**Author:** Cosmo Skyberg, Senior Microsoft AI Solution Architect
|
||||
|
||||
**Verified:** MCP 2026-04
|
||||
|
||||
## Introduction

Infrastructure as Code (IaC) is a fundamental MLOps practice in which infrastructure is defined and deployed through code rather than manual configuration. This is critically important for AI/ML projects because it ensures reproducibility, consistency, and version control of the entire ML environment, from development to production.
|
||||
|
|
@ -539,6 +541,42 @@ resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-pr
|
|||
|
||||
### IaC tool costs
|
||||
|
||||
|
||||
### IaC Design for MLOps — Azure Well-Architected (OE:05) 2026
|
||||
|
||||
**Core principle** (Well-Architected OE:05): Standardized IaC approach with declarative syntax, consistent styles, appropriate modularization, quality assurance.
|
||||
|
||||
**Declarative over imperative** (recommended):
|
||||
- Bicep / ARM templates: Azure-native, JSON/DSL declarative
|
||||
- Terraform: Industry-standard, multi-cloud declarative
|
||||
- Avoid: imperative scripts for infrastructure state management
|
||||
|
||||
**Azure-native tools**:
|
||||
```bash
# Bicep: deploy Azure ML workspace (resource group name is a placeholder)
az deployment group create --resource-group my-resource-group \
  --template-file ml-workspace.bicep

# Terraform: integrated into GitHub Actions / Azure Pipelines
terraform init && terraform apply
```
|
||||
|
||||
**Layered IaC pipeline approach for MLOps**:
|
||||
- **Low-touch** (networking, VNet, ACR): Rarely changes, stable baseline
|
||||
- **Medium-touch** (compute clusters, storage, AKS): Occasional changes
|
||||
- **High-touch** (model endpoints, deployments): Continuous delivery
|
||||
|
||||
**IaC best practices**:
|
||||
- Treat IaC artifacts the same as application code (version control, PR reviews, testing)
|
||||
- Use parameters/variables for multi-environment support (dev/test/prod)
|
||||
- Collocate IaC with application code for synchronized deployments
|
||||
- Scan IaC repos for secrets (Microsoft Defender for Cloud: IaC vulnerability scanning)
|
||||
- Immutable infrastructure preferred for business-critical workloads
|
||||
|
||||
**AI opportunity** (2026): AI tools (GitHub Copilot) can review IaC templates, identify misconfigurations, suggest security improvements, and generate templates from natural language.
|
||||
|
||||
**MLOps v2 infrastructure**: `tf-gha-deploy-infra.yml` workflow in `Azure/mlops-v2-gha-demo` deploys full Azure ML infrastructure via Terraform + GitHub Actions.
|
||||
|
||||
|
||||
| Tool | License | Cost |
|------|---------|------|
| **Bicep** | Open source (MIT) | Free |
|
||||
|
|
|
|||
|
|
@ -6,6 +6,8 @@
|
|||
|
||||
---
|
||||
|
||||
**Verified:** MCP 2026-04
|
||||
|
||||
## Introduction

LLM evaluation in production is fundamentally different from traditional ML evaluation. While classical ML models are evaluated with deterministic metrics on static test sets, generative AI applications require continuous evaluation of open-ended, non-deterministic output in dynamic production scenarios.
|
||||
|
|
@@ -558,6 +560,38 @@ project = AIProjectClient.from_connection_string(
### MLflow 3 + Databricks Unity Catalog

### MLflow 3 LLM Evaluation Framework (2026)

MLflow 3 (SDK `mlflow[databricks]>=3.1`) introduces a unified evaluation model:

**Core architecture**: Traces → Scorers → Feedback
- Traces come from `mlflow.genai.evaluate()` or the production monitoring service
- Scorers parse traces, assess quality, and return `Feedback` objects
- The same scorers are used in development AND production (consistent lifecycle)
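
The Traces → Scorers → Feedback flow can be sketched in plain Python. This is only an illustration of the shape of the pipeline: the `Feedback` dataclass, the dict-based trace, and both scorer functions are hypothetical stand-ins written for this example, not the MLflow classes or built-in judges.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Feedback:
    """Stand-in for an evaluation result; NOT the MLflow `Feedback` class."""
    name: str
    value: bool
    rationale: str


# In this sketch a "trace" is just a dict holding the request and response text.
Trace = Dict[str, str]
Scorer = Callable[[Trace], Feedback]


def non_empty_response(trace: Trace) -> Feedback:
    """Code-based scorer: passes when the response contains any text."""
    ok = bool(trace.get("response", "").strip())
    return Feedback("non_empty_response", ok, "response has text" if ok else "response is empty")


def mentions_query_terms(trace: Trace) -> Feedback:
    """Code-based scorer: passes when the response reuses a word from the request."""
    request_words = set(trace.get("request", "").lower().split())
    response_words = set(trace.get("response", "").lower().split())
    overlap = request_words & response_words
    return Feedback("mentions_query_terms", bool(overlap), f"shared words: {sorted(overlap)}")


def run_scorers(traces: List[Trace], scorers: List[Scorer]) -> List[List[Feedback]]:
    """Apply every scorer to every trace; the same scorer list works offline or on live traces."""
    return [[scorer(trace) for scorer in scorers] for trace in traces]


dev_traces = [
    {"request": "reset my password", "response": "Open settings and choose reset password."},
    {"request": "reset my password", "response": ""},
]
results = run_scorers(dev_traces, [non_empty_response, mentions_query_terms])
```

Because the scorer list is ordinary data, the same list can be handed to a development run and to a monitoring job, which is the "consistent lifecycle" point made above.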

**Built-in LLM judges** (research-validated):

| Judge | Needs Ground Truth | Evaluates |
|-------|-------------------|-----------|
| `RelevanceToQuery` | No | Response relevance to user request |
| `RetrievalGroundedness` | No | Hallucination detection |
| `Safety` | No | Harmful/toxic content |
| `Correctness` | Yes | Accuracy vs ground truth |
| `Completeness` | Yes | All questions addressed |
| `ToolCallCorrectness` | Yes | Tool calls and arguments |
| `ToolCallEfficiency` | No | Redundant tool usage |
| `Guidelines` | No | Custom natural-language rules |

**Multi-turn judges** (conversation-level): `ConversationCompleteness`, `UserFrustration`, `KnowledgeRetention`, `ConversationalSafety`

**Production monitoring**: Automatically runs scorers on production traces using Databricks-hosted LLM judges (EU workspaces use EU-hosted models). Prompts are not stored with Azure OpenAI (Abuse Monitoring opt-out).

**Custom judges**: Full control over evaluation criteria and score types (numerical/categorical/boolean), plus alignment with human feedback via `align_judges()`.

**Key note**: MLflow 3 replaced the Agent Evaluation SDK; migrate using the `mlflow.genai.*` functions.

**Enterprise governance for AI:**

```
@@ -1,11 +1,14 @@
# MLOps Fundamentals - Lifecycle and Principles

**Last updated:** 2026-02
**Verified:** MCP 2026-04
**Status:** GA
**Category:** MLOps & GenAIOps

---

**Verified:** MCP 2026-04

## Introduction

Machine Learning Operations (MLOps) is the application of DevOps principles to machine learning projects. The goal is to automate and streamline the entire ML lifecycle, from experimentation and training, through deployment, to monitoring and retraining. MLOps builds on established DevOps practices such as continuous integration (CI), continuous deployment (CD), version control, and infrastructure as code (IaC), but adds ML-specific challenges such as data versioning, model tracking, feature engineering, and drift detection.

@@ -292,6 +295,49 @@ jobs:
### DevOps tools

### DevOps for Machine Learning: Azure DevOps Integration (2026)

**Azure Pipelines + Azure ML** (how-to-devops-machine-learning):

Automate the ML lifecycle via Azure DevOps pipelines:
1. Data preparation (ETL)
2. On-demand scale-out training
3. Model deployment (public/private web service)
4. Monitoring (performance, data drift)

**Azure DevOps pipeline YAML pattern**:
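```yaml
- job: SubmitAzureMLJob
  steps:
  - task: AzureCLI@2
    name: submit_azureml_job_task  # step name, referenced by the wait job below
    inputs:
      azureSubscription: $(service-connection)
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        job_name=$(az ml job create --file pipeline.yml -g $(resource-group) -w $(workspace) --query name -o tsv)
        echo "##vso[task.setvariable variable=JOB_NAME;isOutput=true;]$job_name"

- job: WaitForJobCompletion
  dependsOn: SubmitAzureMLJob
  pool: server  # Server job: no agent costs
  variables:  # map the JOB_NAME output of the submit job into this job
    azureml_job_name: $[ dependencies.SubmitAzureMLJob.outputs['submit_azureml_job_task.JOB_NAME'] ]
  steps:
  - task: AzureMLJobWaitTask@1  # From the Azure ML extension
    inputs:
      serviceConnection: $(service-connection)
      resourceGroupName: $(resource-group)
      azureMLWorkspaceName: $(workspace)
      azureMLJobName: $(azureml_job_name)
```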

**Authentication options**:
- Azure Resource Manager service connection (recommended with Azure ML extension)
- Generic service connection (uses InvokeRESTAPI task)

**MLOps maturity model**: Manual → Partial automation → Full CI/CD → Full MLOps with monitoring

**Key automation operations** (Azure DevOps):
- Infrastructure deployment (Terraform / Bicep)
- Component registration and versioning
- Model training on compute clusters
- Online/batch endpoint deployment
- Production monitoring alerts

| Tool | Cost | Recommendation |
|---------|---------|-----------|
| **Azure DevOps** | Free for 5 users + 1,800 pipeline min/month | Use the Basic plan for smaller teams |

@@ -1,7 +1,8 @@
# Security and Access Control in MLOps

**Category:** MLOps & GenAIOps
**Date:** 2026-02-04
**Last updated:** 2026-04 | Verified: MCP 2026-04
**Date:** 2026-04-10
**Confidence:** HIGH. Based on official Microsoft Learn documentation (8 MCP lookups, 16 sources)

---

@@ -747,3 +748,30 @@ AmlComputeClusterNodeEvent
- ✅ HIGH confidence: Official documentation + code examples from Microsoft Learn
- ⚠️ MEDIUM confidence: Derived from best practices and architecture patterns
- ❓ LOW confidence: Not applicable (all claims are verified against official documentation)

### Azure Machine Learning VNet Security (2026 Update)

**Managed Virtual Networks** (recommended approach): Azure ML handles network isolation automatically.
Use `az ml workspace update` with managed network settings instead of manual VNet configuration.

**Private Endpoint for Workspace**:
- Connects the workspace via private IP addresses within your VNet
- Requires securing all dependent resources: Storage, Key Vault, Container Registry
- A private endpoint alone does NOT ensure end-to-end security; all components must be secured

**Storage Account Security**:
- Private endpoint (recommended): Blob, File, Queue, Table subresources
- Service endpoint: Must be in the same VNet and subnet as compute
- Set `Microsoft.MachineLearningServices/Workspace` as trusted resource type

**Required outbound traffic service tags**:
- `AzureActiveDirectory` (TCP 443): authentication
- `AzureMachineLearning` (TCP 443, 18881, UDP 5831)
- `Storage.<region>` (TCP 443): data access
- `MicrosoftContainerRegistry.<region>` (TCP 443): Docker images
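
A quick way to reason about the outbound list above is to treat it as a set of (service tag, protocol, port) rules and diff it against what a network security group currently allows. The sketch below does only that set comparison; the function names, the tuple layout, and the example region `westeurope` are assumptions made for illustration, and it does not read real NSG rules from Azure.

```python
from typing import List, Set, Tuple

# (service tag, protocol, port)
Rule = Tuple[str, str, int]


def required_rules(region: str) -> Set[Rule]:
    """Outbound rules from the list above, with the region suffix filled in."""
    return {
        ("AzureActiveDirectory", "TCP", 443),
        ("AzureMachineLearning", "TCP", 443),
        ("AzureMachineLearning", "TCP", 18881),
        ("AzureMachineLearning", "UDP", 5831),
        (f"Storage.{region}", "TCP", 443),
        (f"MicrosoftContainerRegistry.{region}", "TCP", 443),
    }


def missing_rules(allowed: Set[Rule], region: str) -> List[Rule]:
    """Return the required rules that are not yet allowed, sorted for stable output."""
    return sorted(required_rules(region) - allowed)


# Hypothetical NSG state: everything is allowed except the UDP 5831 rule.
allowed_now = required_rules("westeurope") - {("AzureMachineLearning", "UDP", 5831)}
gaps = missing_rules(allowed_now, "westeurope")
```

Keeping the required set in one function makes it easy to extend if the environment needs additional tags beyond the four listed here.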

**Secure connectivity options**: Azure VPN Gateway (Point-to-site/Site-to-site), ExpressRoute, Azure Bastion (jump box)

**ACR requirements**: Premium SKU required for private endpoints; ACR must be in same VNet or peered VNet.

@@ -5,6 +5,8 @@
**Source:** Microsoft Learn, Azure Architecture Center
**Confidence rating:** ⭐⭐⭐⭐⭐ (Verified against official Microsoft documentation)

**Verified:** MCP 2026-04

## Introduction

Successful MLOps implementations require collaboration between several team roles with different tools, workflows, and responsibilities. This reference covers how different personas collaborate throughout the machine learning lifecycle, which tools support collaboration, and how organizations can structure teamwork for maximum effectiveness.

@@ -123,6 +125,32 @@ MLOps-miljøer opererer med distinkte roller som hver har spesifikke ansvarsomr
**Confidence marker:** ⭐⭐⭐⭐⭐ Azure Boards is a core DevOps platform with native Azure DevOps integration.

#### Azure DevOps / GitHub Actions

### Azure DevOps: Integrated MLOps Platform (2026)

Azure DevOps provides end-to-end project management for ML teams:

| Service | ML Use Case |
|---------|-------------|
| **Azure Boards** | Sprint planning for model iterations, bug tracking, backlog management |
| **Azure Repos** | Git repositories for model code, notebooks, IaC; branch policies + PR reviews |
| **Azure Pipelines** | CI/CD for ML (build, test, train, deploy); integrates with Azure ML via `AzureMLJobWaitTask@1` |
| **Azure Test Plans** | Manual testing of model outputs, test case management |
| **Azure Artifacts** | Package feeds (NuGet, pip, conda) for ML libraries and shared components |

**Azure DevOps MCP Server**: Natural language queries for project management, such as `Summarize sprint status`, `List blocked work items`, and `Show pipeline success rates` (2026 feature).

**GitHub Actions integration** (alternative to Azure Pipelines):
- OIDC authentication (recommended, no long-lived secrets)
- `azure/login@v2` + `az ml job create` pattern
- MLOps v2 solution accelerator: `Azure/mlops-v2-gha-demo`

**Databricks CI/CD best practices**:
- Feature branching with short-lived branches
- Automated notebook testing before merge
- MLflow experiment tracking integrated into PR workflows

**Purpose:** CI/CD automation for ML lifecycle
**Key capabilities:**
- Pipeline-based workflow automation

@@ -5,6 +5,8 @@
**Audience:** Architects planning ML model deployment to production
**Confidence level:** ⚡️⚡️⚡️ High (based on Microsoft Learn + official code samples)

**Verified:** MCP 2026-04

## Introduction

Model deployment strategies are about how to roll out new ML models or model versions to production safely and efficiently, without causing downtime or a degraded user experience. Azure Machine Learning offers several deployment patterns that support **progressive exposure**, **traffic routing**, and **rollback mechanisms**.

@@ -1050,3 +1052,36 @@ Denne kunnskapsreferansen er basert på følgende Microsoft Learn-artikler og co

**Last updated:** 2026-02-04
**Next review:** 2026-05-04 (or upon major changes in Azure ML deployment capabilities)

### Safe Rollout / Blue-Green Deployment (Azure Well-Architected 2026)

Azure ML managed online endpoints support blue-green (safe rollout) deployments natively:

```bash
# Deploy the green deployment; without --all-traffic it receives 0% of live traffic
# (green-deployment.yml is a placeholder for your deployment definition)
az ml online-deployment create --name green --endpoint-name my-endpoint --file green-deployment.yml

# Test the green deployment in isolation (direct routing)
# (sample-request.json is a placeholder request payload)
az ml online-endpoint invoke --name my-endpoint --deployment-name green --request-file sample-request.json

# Mirror 10% of live traffic to green for shadow testing
az ml online-endpoint update --name my-endpoint --mirror-traffic "green=10"

# Then progressively shift live traffic: 10% → 50% → 100%
az ml online-endpoint update --name my-endpoint --traffic "blue=90 green=10"
```

**Azure Well-Architected safe deployment practices (SDP) principles (OE:11)**:
- **Progressive exposure**: Canary → Blue-Green → Deployment Stamps
- **Health models**: Pass health checks before each rollout phase
- **Bake time**: Hours/days between phases (not minutes) to capture time-zone usage patterns
- **Failure detection**: Automatic halt + investigation when health signals degrade
- **Recovery options**: Roll back (revert), roll forward (hotfix), or redeploy last known good
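
The interaction between progressive exposure, health checks, and rollback can be sketched as a small control loop. This is a simulation under stated assumptions: `progressive_rollout` and the `is_healthy` callback are hypothetical names, no Azure calls are made, and the bake time is reduced to a comment where a real pipeline would wait.

```python
from typing import Callable, List, Tuple


def progressive_rollout(
    stages: List[int], is_healthy: Callable[[int], bool]
) -> Tuple[int, List[str]]:
    """Shift traffic to green stage by stage; halt and revert on the first failed health check.

    Returns the final green traffic percentage and a log of the actions taken.
    """
    log: List[str] = []
    green_percent = 0
    for stage in stages:
        green_percent = stage
        log.append(f"shift: blue={100 - stage} green={stage}")
        # A real pipeline would wait for the bake time here before checking health.
        if not is_healthy(stage):
            log.append("halt: health degraded, reverting to blue=100 green=0")
            return 0, log
    return green_percent, log


# Healthy at every stage: green ends up with all traffic.
final_share, history = progressive_rollout([10, 50, 100], lambda stage: True)

# Health degrades once green receives 50% of traffic: the rollout halts and reverts.
failed_share, failed_history = progressive_rollout([10, 50, 100], lambda stage: stage < 50)
```

Each `shift` entry corresponds to one `az ml online-endpoint update --traffic` call in the CLI example; the revert branch is the "roll back" recovery option from the list above.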

**Azure facilitation**:
- `Azure Pipelines` + `GitHub Actions` support multi-stage deployments with approval gates
- `Azure App Configuration` for feature flag management
- `Azure Load Balancers` for traffic routing and health monitoring
- Point-in-time restore available for Azure SQL, Cosmos DB, MySQL, PostgreSQL

**Emergency SDP**: Prescriptive protocols for accelerating hotfixes (reduced approval stages and bake time) with explicit approval criteria.

@@ -1,11 +1,14 @@
# Model Drift and Performance Degradation Detection

**Last updated:** 2026-02
**Verified:** MCP 2026-04
**Status:** GA
**Category:** MLOps & GenAIOps

---

**Verified:** MCP 2026-04

## Introduction

Model drift and performance degradation are critical phenomena that occur when a machine learning model's performance deteriorates over time in production. This happens because reality changes: input data takes on different distributions, business logic changes, sensors are miscalibrated, or user behavior shifts. Without continuous monitoring, models can quickly become outdated and deliver incorrect predictions that undermine business goals or create compliance problems in regulated sectors.

@@ -633,3 +636,39 @@ Email Alerts + Azure Monitor Dashboard
### Last updated

**2026-02**: Based on Microsoft Learn documentation (azure-ai-ml SDK v2, API version 2).

### Azure ML Model Drift & Performance Degradation Monitoring (2026)

**Model monitoring** provides continuous tracking of production model performance:

**Degradation signals**:
- **Prediction drift**: Output distribution shifts away from training baseline
- **Feature attribution drift**: Feature importance changes indicate concept drift
- **Data quality degradation**: Input data quality issues upstream
- **Performance metric degradation**: Track against ground truth when labels available

**Monitoring configuration**:
```python
# Set up monitoring for deployed models on online endpoints
# Azure ML handles data collection and signal computation
# Monitoring jobs run on schedule (default: daily)
```

**Alert thresholds** (recommended):
- Data drift coefficient > 0.1: Investigate
- Data drift coefficient > 0.3: Retrain trigger
- Prediction drift > 15%: Production alert
- Unusable nodes > 0: Infrastructure alert (Azure Monitor)
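
The thresholds above translate directly into a small decision function. The sketch below is one possible encoding: the function name and the action labels are hypothetical, and it assumes that the higher data drift threshold supersedes the lower one rather than firing both.

```python
from typing import List


def drift_actions(data_drift: float, prediction_drift_pct: float, unusable_nodes: int) -> List[str]:
    """Map monitoring signals to actions using the recommended thresholds listed above."""
    actions: List[str] = []
    if data_drift > 0.3:
        actions.append("retrain")  # drift coefficient above 0.3: retrain trigger
    elif data_drift > 0.1:
        actions.append("investigate")  # drift coefficient above 0.1: investigate
    if prediction_drift_pct > 15:
        actions.append("production_alert")  # prediction drift above 15%
    if unusable_nodes > 0:
        actions.append("infrastructure_alert")  # any unusable compute node
    return actions


quiet = drift_actions(0.05, 0.0, 0)
moderate = drift_actions(0.2, 20.0, 0)
severe = drift_actions(0.35, 5.0, 2)
```

All comparisons are strict, so a value sitting exactly on a threshold (for example a drift coefficient of 0.1) does not trigger the action.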

**Continuous learning loop**:
1. Monitor signals → detect drift early
2. Critically evaluate inherent model risks
3. Identify hidden problems before business impact
4. Trigger retraining or model update workflow
5. Validate new model before rollout (blue-green/canary)

**Integration**: Azure Event Grid for alerting → Logic Apps / Functions → automated retraining trigger

**For GenAI/LLM**: MLflow 3 production monitoring reuses development scorers (for example `RetrievalGroundedness` and `RelevanceToQuery`) on production traces, giving consistent quality measurement throughout the lifecycle.

@@ -1,11 +1,14 @@
# Model Evaluation Frameworks and Metrics

**Last updated:** 2026-02
**Verified:** MCP 2026-04
**Status:** GA
**Category:** MLOps & GenAIOps

---

**Verified:** MCP 2026-04

## Introduction

Evaluating AI models, especially generative AI applications, requires a completely different approach than traditional machine learning. While traditional ML focuses on deterministic metrics such as accuracy and precision, GenAI evaluation must handle multi-turn conversations, contextual relevance, safety, and subjective quality. Microsoft offers a comprehensive framework for model evaluation through Azure AI Foundry, Azure Machine Learning Prompt Flow, and MLflow 3, covering the entire development process from model selection to production monitoring.

@@ -107,6 +110,42 @@ print(f"Foundry URL: {result.get('studio_url')}")

### MLflow 3 Evaluation & Monitoring

### MLflow 3 Evaluation Framework (2026)

MLflow 3 provides the evaluation framework for both traditional ML and GenAI applications on Databricks:

**Scorer types** (unified interface for all evaluation):

| Type | Customization | Use Case |
|------|--------------|---------|
| Built-in judges | Minimal | Quick evaluation: `Correctness`, `RetrievalGroundedness`, `Safety` |
| Guidelines judges | Moderate | Custom natural-language rules (pass/fail) |
| Custom LLM judges | Full | Domain-specific criteria, detailed scoring |
| Code-based scorers | Full | Deterministic: exact match, format validation, business logic |

**Key evaluation functions**:
```python
import mlflow
from mlflow.genai.scorers import Correctness, RelevanceToQuery, RetrievalGroundedness

# Development evaluation
# eval_dataset is assumed to be defined elsewhere (e.g., records with inputs, outputs, and expectations)
results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[RelevanceToQuery(), RetrievalGroundedness(), Correctness()]
)

# Production monitoring: same scorers as development
# Automatically applied to production traces
```

**Judge accuracy**: Databricks validates its judges against human expert judgment using Cohen's Kappa, accuracy, and F1 score.
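
Cohen's Kappa corrects raw agreement for the agreement expected by chance: $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the chance agreement derived from each rater's label frequencies. The sketch below computes it for a judge and a human rater; it is a generic textbook implementation with hypothetical labels, not Databricks' validation code.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(judge: Sequence[str], human: Sequence[str]) -> float:
    """Cohen's Kappa between two label sequences of equal, non-zero length."""
    if len(judge) != len(human) or not judge:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(judge)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(j == h for j, h in zip(judge, human)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    judge_counts, human_counts = Counter(judge), Counter(human)
    expected = sum(judge_counts[label] * human_counts[label] for label in judge_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label; treated as full agreement
    return (observed - expected) / (1 - expected)


judge_labels = ["yes", "yes", "no", "no"]
human_labels = ["yes", "no", "no", "no"]
kappa = cohens_kappa(judge_labels, human_labels)
```

In this example the raters agree on 3 of 4 items ($p_o = 0.75$) while chance agreement is $p_e = 0.5$, so the kappa is 0.5, noticeably lower than the raw agreement suggests.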

**Traditional ML evaluation** (Azure ML):
- Data quality signals: null rate, out-of-bounds, type errors
- Statistical drift: Jensen-Shannon divergence, Wasserstein distance
- Custom metrics via Python scripts in monitoring jobs
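
To make the statistical drift metrics above more concrete, here is a minimal Jensen-Shannon divergence between two discrete distributions, such as a feature's histogram at training time and in production. It assumes both inputs are already normalized over the same bins and uses base-2 logarithms so the result falls between 0 and 1; the exact binning and normalization used by Azure ML's monitoring jobs may differ.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] with base-2 logs."""
    if len(p) != len(q):
        raise ValueError("distributions must use the same bins")
    # Mixture distribution; it is non-zero wherever p or q is non-zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


baseline = [0.5, 0.5]
same = js_divergence(baseline, [0.5, 0.5])          # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])    # no overlap at all
shifted = js_divergence(baseline, [0.8, 0.2])       # partial shift
```

Identical histograms give 0 and fully disjoint ones give 1, so a threshold on this value has a fixed, interpretable scale regardless of the feature's units.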

MLflow 3 integrates evaluation and production monitoring into one workflow. The same LLM judges and scorers can be used in development, testing, and production.

**Main components:**

@@ -1,11 +1,14 @@
# Model Versioning and Registry Management

**Last updated:** 2026-02
**Verified:** MCP 2026-04
**Status:** GA
**Category:** MLOps & GenAIOps

---

**Verified:** MCP 2026-04

## Introduction

Model versioning and registry management are fundamental components of the MLOps lifecycle that ensure traceability, reproducibility, and effective governance of machine learning models throughout their lifetime. Azure Machine Learning offers two primary approaches: the workspace model registry for team-internal use and the Azure Machine Learning registry for cross-organizational sharing. Both support MLflow as the standard format, which provides portability and integration with a broad ecosystem of tools.

@@ -31,6 +34,44 @@ Azure Machine Learning skiller seg fra tradisjonelle Git-baserte tilnærminger v

### Registry types compared

### Azure Machine Learning Cross-Workspace Registry (2026)

**Azure ML Registry** enables model, component, and environment sharing across workspaces and subscriptions:

**Two primary scenarios**:
1. **Cross-workspace MLOps**: Train in `dev` → deploy to `test`/`prod` with full lineage (code, data, environment)
2. **Cross-team sharing**: Publish models/components to central catalog for reuse across teams

**Registry operations** (CLI v2 / Python SDK v2):
```bash
# Create model in registry (from local files)
az ml model create --name nyc-taxi-model --version 1 --type mlflow_model --path ./artifacts/model/ --registry-name <registry-name>

# Share model from workspace to registry (preserves training lineage)
az ml model share --name nyc-taxi-model --version 1 --registry-name <registry-name> --share-with-name <new-name> --share-with-version 1

# Deploy model from registry to any workspace
# (model: azureml://registries/<registry-name>/models/<name>/versions/<v>)
az ml online-deployment create --file deploy.yml --all-traffic
```
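
The registry model reference in the comment above follows a fixed pattern, so it can be assembled and sanity-checked before it is written into a deployment YAML. The helper below is a hypothetical convenience function for illustration only; it builds the string and rejects obviously malformed parts, and it does not contact Azure or confirm that the model exists.

```python
def registry_model_uri(registry: str, name: str, version: str) -> str:
    """Build an azureml:// reference to a model version stored in a registry."""
    for label, value in (("registry", registry), ("name", name), ("version", version)):
        # Each part becomes one path segment, so it must be non-empty and contain no slash.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"azureml://registries/{registry}/models/{name}/versions/{version}"


# "shared-registry" is a placeholder registry name used only for this example.
model_ref = registry_model_uri("shared-registry", "nyc-taxi-model", "1")
```

The resulting string is what goes into the `model:` field of the deployment definition passed to `az ml online-deployment create --file`.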
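
**Python SDK pattern**:
```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client_registry = MLClient(credential=credential, registry_name="<REGISTRY_NAME>")
# A workspace-scoped client also needs the subscription and resource group
ml_client_workspace = MLClient(credential=credential, subscription_id="<SUBSCRIPTION_ID>", resource_group_name="<RESOURCE_GROUP>", workspace_name="<WS_NAME>")

# Create in registry (mlflow_model: a Model entity defined elsewhere)
ml_client_registry.models.create_or_update(mlflow_model)
# Deploy from registry to workspace (deployment: a deployment entity defined elsewhere)
ml_client_workspace.online_deployments.begin_create_or_update(deployment)
```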

**Lineage tracking**: Models registered from job outputs link back to the training job, code, data, and environment.
**Access control**: ACR token-based access (workspace compute has `AcrPull` via the registry's managed identity).
**MLflow format**: Required for no-code deployment with the built-in scoring server.

| Property | Workspace Registry | Azure ML Registry |
|----------|-------------------|-------------------|
| **Scope** | Single workspace | Multi-workspace, cross-subscription |

@@ -7,6 +7,8 @@

---

**Verified:** MCP 2026-04

## Introduction

Monitoring and observability for ML systems is about continuously monitoring models in production to ensure performance, quality, and reliability. Azure offers a complete ecosystem for ML monitoring through Azure Machine Learning Model Monitoring and Azure Monitor, which together provide insight into both **model performance** (the data science perspective) and **operational health** (the infrastructure perspective).

@@ -297,6 +299,45 @@ create_monitor:

### Azure Monitor

### Azure Machine Learning Monitoring Architecture (2026)

**Azure Monitor integration**:
- All metrics in namespace: `Machine Learning Service Workspace`
- Platform metrics collected automatically, no configuration needed
- Route resource logs to Log Analytics for querying with KQL

**Key Kusto (KQL) queries**:
```kusto
// Failed jobs last 5 days
AmlComputeJobEvent
| where TimeGenerated > ago(5d) and EventType == "JobFailed"
| project TimeGenerated, ClusterId, EventType, ExecutionState, ToolType

// Failed online endpoint requests
AmlOnlineEndpointTrafficLog
| where TimeGenerated > ago(1d) and ResponseCode != 200
| project TimeGenerated, EndpointName, DeploymentName, ResponseCode
```

**Recommended alert rules**:

| Alert | Condition | Threshold |
|-------|-----------|-----------|
| Model Deploy Failed | Total > 0 | Any failure |
| Quota Utilization | Average > 90% | High utilization |
| Unusable Nodes | Total > 0 | Any unusable |

**Application Insights integration**: Live metrics, Transaction search, Failures, Performance analysis.
Use workspace-based Application Insights (default for new workspaces) + Azure Monitor Private Link for VNet isolation.

**Data storage layers**:
- Metrics database: Platform metrics (near real-time)
- Log Analytics: Resource logs + Activity log (queryable with KQL)
- Azure Storage / Event Hubs: Long-term export

**Cross-workspace monitoring**: Use a single Log Analytics workspace for multiple Azure ML workspaces to query across all resources simultaneously.

**Application Insights** (📊 For endpoints):
```python
|
||||
# Enable for online endpoint