Infrastructure as Code for MLOps
Date: 2026-04 | Category: MLOps & GenAIOps | Author: Cosmo Skyberg, Senior Microsoft AI Solution Architect
Verified: MCP 2026-04
Introduction
Infrastructure as Code (IaC) is a fundamental MLOps practice in which infrastructure is defined and deployed through code rather than manual configuration. This matters for AI/ML projects because it makes the entire ML environment reproducible, consistent and version-controlled, from development to production.
Why IaC is essential for MLOps:
- Eliminates "snowflake environments": manually configured environments that cannot be reproduced
- Idempotence: the same deployment command converges on the same end state, regardless of the starting state
- Version control: infrastructure is treated as code and stored in Git
- Fast provisioning of test environments: ML compute and workspace resources can be created on demand
- Audit trail and compliance: every infrastructure change is traceable
Confidence: VERY_HIGH. IaC is a core DevOps/MLOps practice documented thoroughly in Microsoft Learn and the Azure Well-Architected Framework.
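The idempotence property in the list above can be illustrated with a small Python sketch. It is not how ARM or Terraform are implemented: `reconcile`, the resource names and the dictionary standing in for the cloud are all hypothetical. The point is only that a declarative deployment converges on the desired state from any starting point, so a second run reports no changes.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, str]
State = Dict[str, Resource]


def reconcile(desired: State, current: State) -> Tuple[State, List[str]]:
    """Return the new state and the changes needed to reach `desired`."""
    changes: List[str] = []
    for name, spec in desired.items():
        if name not in current:
            changes.append(f"create {name}")
        elif current[name] != spec:
            changes.append(f"update {name}")
    for name in current:
        if name not in desired:
            changes.append(f"delete {name}")
    # The resulting state is always a copy of the desired state.
    new_state = {name: dict(spec) for name, spec in desired.items()}
    return new_state, changes


desired = {
    "mlw-dev": {"type": "workspace", "publicNetworkAccess": "Enabled"},
    "st-dev": {"type": "storage", "sku": "Standard_GRS"},
}
# A drifted "snowflake" starting point: one wrong setting, one stray resource.
drifted = {
    "mlw-dev": {"type": "workspace", "publicNetworkAccess": "Disabled"},
    "manual-vm": {"type": "vm"},
}

state_1, changes_1 = reconcile(desired, drifted)
state_2, changes_2 = reconcile(desired, state_1)
print(changes_1)  # ['update mlw-dev', 'create st-dev', 'delete manual-vm']
print(changes_2)  # []
```

The delete step mirrors Terraform or ARM complete mode; ARM incremental mode (the default for `az deployment group create`) leaves resources that are not in the template in place.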
Core components
1. Declarative vs. imperative IaC tools
IaC tools fall into two main categories:
Declarative tools (recommended for MLOps):
- Bicep: Microsoft's domain-specific language (DSL) for Azure; compiles to ARM templates
- ARM templates (JSON): Azure Resource Manager templates, the native Azure format
- Terraform: multi-cloud IaC tool with an Azure provider
Imperative tools:
- Azure CLI scripts: bash/PowerShell scripts built on az commands
- PowerShell DSC: for VM configuration (declarative in syntax, but applied inside the VM rather than to Azure resources)
Recommendation: Use declarative tools (Bicep/Terraform) for infrastructure and Azure CLI for orchestration in pipelines.
2. Azure Machine Learning workspace resources
An Azure ML workspace depends on several associated resources that must be provisioned:
| Resource | Purpose | IaC resource type |
|---|---|---|
| Azure ML Workspace | Central hub for ML work | Microsoft.MachineLearningServices/workspaces |
| Storage Account | Data, models, artifacts | Microsoft.Storage/storageAccounts |
| Key Vault | Secrets, credentials | Microsoft.KeyVault/vaults |
| Application Insights | Monitoring, telemetry | Microsoft.Insights/components |
| Container Registry | Docker images for environments | Microsoft.ContainerRegistry/registries |
| Compute resources | Training/inference compute | Compute clusters, instances, endpoints |
Important: These resources can be created automatically when the workspace is created, but for production they should be defined explicitly in IaC for full control over networking, RBAC and compliance.
3. Bicep-based IaC for Azure ML
Example: Minimal Azure ML workspace
resource aiResource 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  name: workspaceName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: workspaceName
    keyVault: keyVault.id
    storageAccount: storage.id
    applicationInsights: appInsights.id
    containerRegistry: containerRegistry.id
    publicNetworkAccess: 'Enabled'
  }
}
Modular Bicep structure (best practice):
/infrastructure
├── main.bicep                      # Main file with parameters and orchestration
├── modules/
│   ├── ai-hub.bicep                # Azure ML workspace
│   ├── dependent-resources.bicep   # Storage, KV, ACR, AppInsights
│   ├── networking.bicep            # VNet, subnets, private endpoints
│   └── compute.bicep               # Compute clusters
└── parameters/
    ├── dev.bicepparam
    └── prod.bicepparam
Confidence: VERY_HIGH. This follows the official Azure quickstart templates for Azure ML (github.com/Azure/azure-quickstart-templates).
4. Terraform-based IaC for Azure ML
Example: Public network workspace
resource "azurerm_machine_learning_workspace" "default" {
  name                          = "${random_pet.prefix.id}-mlw"
  location                      = azurerm_resource_group.default.location
  resource_group_name           = azurerm_resource_group.default.name
  application_insights_id       = azurerm_application_insights.default.id
  key_vault_id                  = azurerm_key_vault.default.id
  storage_account_id            = azurerm_storage_account.default.id
  container_registry_id         = azurerm_container_registry.default.id
  public_network_access_enabled = true
  identity {
    type = "SystemAssigned"
  }
}
Terraform workflow:
# Initialize Terraform providers
terraform init
# Plan the deployment (dry run) and save the plan
terraform plan -out ml-workspace.tfplan
# Apply exactly the saved plan
terraform apply ml-workspace.tfplan
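In a pipeline, the plan step above is often used as a gate. A minimal sketch, assuming the plan is run with Terraform's `-detailed-exitcode` flag (exit code 0 for an empty diff, 1 for an error, 2 for pending changes). The function names are hypothetical, and a child Python process stands in for the Terraform CLI so that the sketch runs without Terraform installed.

```python
import subprocess
import sys
from typing import List

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_OUTCOMES = {
    0: "no-changes",       # plan succeeded, empty diff
    1: "error",            # plan failed
    2: "changes-pending",  # plan succeeded, non-empty diff
}


def classify_plan_exit_code(code: int) -> str:
    """Map a plan exit code to an outcome; unknown codes count as errors."""
    return PLAN_OUTCOMES.get(code, "error")


def run_and_classify(command: List[str]) -> str:
    """Run a command and classify its exit code like a plan result."""
    completed = subprocess.run(command, check=False)
    return classify_plan_exit_code(completed.returncode)


# Stand-in for the real CLI: a child process that exits with code 2.
fake_plan = [sys.executable, "-c", "import sys; sys.exit(2)"]
outcome = run_and_classify(fake_plan)
print(outcome)  # changes-pending
```

With Terraform available, the same helper could be called as `run_and_classify(["terraform", "plan", "-detailed-exitcode", "-out", "ml-workspace.tfplan"])`. A "no-changes" result on a second run is also a cheap idempotence check.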
Terraform vs. Bicep:
| Criterion | Terraform | Bicep |
|---|---|---|
| Multi-cloud | ✅ Supports AWS, GCP, Azure | ❌ Azure only |
| Learning curve | Moderate (HCL syntax) | Low (concise, JSON-like DSL) |
| State management | Requires a state file (remote backend) | No state file (ARM-managed) |
| Community modules | Large ecosystem | Smaller but growing |
| Azure integration | Via provider | Native, first-class |
For the Norwegian public sector: Bicep is often preferred because it is Microsoft's native solution with tight integration with Azure governance tooling such as Azure Policy.
5. Private network (VNet-isolated) workspaces
For security-critical environments, the workspace must be isolated in a VNet with private endpoints:
Bicep configuration:
resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  name: workspaceName
  location: location
  properties: {
    publicNetworkAccess: 'Disabled'
    imageBuildCompute: 'image-builder-cluster' // Required for ACR private endpoint
  }
}
resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'ple-${workspaceName}'
  location: location
  properties: {
    subnet: {
      id: workspaceSubnet.id
    }
    privateLinkServiceConnections: [
      {
        name: 'psc-${workspaceName}'
        properties: {
          privateLinkServiceId: mlWorkspace.id
          groupIds: ['amlworkspace']
        }
      }
    ]
  }
}
Important: When both ACR and Azure ML are behind private endpoints, you CANNOT use ACR Tasks to build images. You must designate a compute cluster for image builds through the imageBuildCompute property.
Confidence: HIGH. Documented in the Azure ML docs, but private endpoint configuration requires careful testing per scenario.
Architecture patterns
1. Basic workspace pattern (development)
Use: Exploration, prototyping, non-sensitive data
┌─────────────────────────────────────────┐
│ Resource Group │
│ ┌───────────────────────────────────┐ │
│ │ Azure ML Workspace │ │
│ │ - Public network access │ │
│ │ - System-assigned identity │ │
│ └───────────────────────────────────┘ │
│ ┌───────────────────────────────────┐ │
│ │ Dependent Resources │ │
│ │ - Storage Account (GRS) │ │
│ │ - Key Vault (standard) │ │
│ │ - Container Registry (basic) │ │
│ │ - Application Insights │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────┘
IaC approach:
- Single main.bicep or workspace.tf file
- Parameter files for dev/test/staging
- Deploy via Azure CLI/Terraform CLI
2. Secure workspace pattern (production)
Use: Production, HBI (High Business Impact) data, compliance
┌────────────────────────────────────────────────┐
│ Resource Group │
│ ┌──────────────────────────────────────────┐ │
│ │ VNet (10.0.0.0/16) │ │
│ │ ├─ Subnet: training (10.0.1.0/24) │ │
│ │ ├─ Subnet: workspace (10.0.0.0/24) │ │
│ │ └─ Subnet: endpoints (10.0.2.0/24) │ │
│ └──────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────┐ │
│ │ Azure ML Workspace │ │
│ │ - Public access: DISABLED │ │
│ │ - Private endpoint in workspace subnet │ │
│ │ - Managed identity + RBAC │ │
│ └──────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────┐ │
│ │ Private endpoints for: │ │
│ │ - Storage (blob + file) │ │
│ │ - Key Vault │ │
│ │ - Container Registry │ │
│ └──────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────┐ │
│ │ Private DNS Zones │ │
│ │ - privatelink.api.azureml.ms │ │
│ │ - privatelink.notebooks.azure.net │ │
│ │ - privatelink.blob.core.windows.net │ │
│ │ - privatelink.vaultcore.azure.net │ │
│ └──────────────────────────────────────────┘ │
└────────────────────────────────────────────────┘
IaC approach:
- Modular Bicep/Terraform with a separate network.bicep/network.tf
- Managed identities for all services (no keys in config)
- Azure Policy enforcement for network isolation
- Private DNS zones for name resolution
Norwegian public sector: Follow NSM's basic principles (grunnprinsipper) for network segmentation. Private endpoints are often required for data classified as restricted or confidential (begrenset/fortrolig).
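The address plan in the secure workspace diagram can be sanity-checked before it is written into network.bicep or network.tf. A minimal sketch using Python's standard `ipaddress` module; the `validate_layout` helper is hypothetical, and the prefixes are the ones shown in the diagram.

```python
import ipaddress
from itertools import combinations

vnet = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "workspace": ipaddress.ip_network("10.0.0.0/24"),
    "training": ipaddress.ip_network("10.0.1.0/24"),
    "endpoints": ipaddress.ip_network("10.0.2.0/24"),
}


def validate_layout(vnet, subnets):
    """Return a list of problems: subnets outside the VNet or overlapping."""
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    return problems


problems = validate_layout(vnet, subnets)
# Azure reserves 5 addresses in every subnet, so a /24 leaves 251 usable ones.
usable = {name: net.num_addresses - 5 for name, net in subnets.items()}
print(problems)            # []
print(usable["training"])  # 251
```

The usable-address count matters mostly for the training subnet, because each compute cluster node takes an address from it.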
3. Hub-and-spoke pattern (multi-environment)
Use: Enterprise scale with shared services and multiple workspaces
┌──────────────────────────────────────────────────┐
│ Hub Resource Group │
│ ├─ Shared Container Registry │
│ ├─ Shared Key Vault (certificates) │
│ ├─ Azure Firewall / VPN Gateway │
│ └─ Monitoring (Log Analytics, App Insights) │
└──────────────────────────────────────────────────┘
│ VNet peering
├────────────────────────────┬──────────
│ │
┌──────────▼───────────┐ ┌───────────▼──────────┐
│ Dev Spoke (RG) │ │ Prod Spoke (RG) │
│ - ML Workspace Dev │ │ - ML Workspace Prod │
│ - Dev Storage │ │ - Prod Storage │
│ - Dev Compute │ │ - Prod Compute │
└──────────────────────┘ └──────────────────────┘
IaC approach:
- Separate Terraform modules/Bicep modules per spoke
- Shared hub deployed first
- Spoke deployments reference hub resources via remote state (Terraform) or parameters (Bicep)
- Azure Blueprints (being deprecated in favor of template specs and deployment stacks) or Terraform workspaces for consistency
Terraform quickstart templates for these patterns are available in the Azure/terraform repository (quickstart folder).
Decision guidance
When to choose Bicep vs. Terraform vs. ARM templates?
| Scenario | Recommended tool | Rationale |
|---|---|---|
| Pure Azure-only MLOps | Bicep | Native support, simpler syntax than ARM, tight integration with Azure CLI |
| Multi-cloud (Azure + AWS/GCP) | Terraform | The only tool of the three that supports all clouds consistently |
| Existing DevOps pipeline with JSON | ARM templates | Compatibility, but consider migrating to Bicep |
| Large existing Terraform codebase | Terraform | Consistency, avoids tool proliferation |
| Norwegian public sector with directorate requirements | Bicep | Microsoft's native solution, simpler audit trail |
When to deploy IaC via Azure DevOps vs. GitHub Actions?
| Criterion | Recommendation | Rationale |
|---|---|---|
| Team already uses ADO | ✅ Prefer ADO | Consistency |
| Open source project | ✅ Prefer GitHub | Community visibility |
| Enterprise governance (public sector) | ✅ Prefer ADO | Better integration with Azure RBAC, compliance |
| Terraform state management | Either | Both support the Azure Storage backend |
| Cost | Either | Both are free for small teams |
Deployment pipeline integration
Azure DevOps pipeline (YAML):
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - infrastructure/*
stages:
  - stage: DeployInfrastructure
    jobs:
      - job: DeployBicep
        steps:
          - task: AzureCLI@2
            inputs:
              azureSubscription: 'Azure-Service-Connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az deployment group create \
                  --resource-group $(resourceGroupName) \
                  --template-file infrastructure/main.bicep \
                  --parameters infrastructure/parameters/prod.bicepparam
GitHub Actions workflow (OIDC login instead of a stored AZURE_CREDENTIALS secret):
name: Deploy ML Infrastructure
on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'
permissions:
  id-token: write   # Required for OIDC federated login
  contents: read
jobs:
  deploy-infra:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          # Secret names are a convention; adjust to your repository
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Deploy Bicep
        run: |
          az deployment group create \
            --resource-group ${{ vars.RG_NAME }} \
            --template-file infrastructure/main.bicep \
            --parameters environment=prod
Best practice: Use separate pipelines for infrastructure (IaC) and ML code. Infrastructure should change rarely, ML code more often.
Integration with the Microsoft stack
1. Azure ML CLI v2 integration
IaC provisions the workspace, while ML assets (environments, datasets, components) are deployed via the Azure ML CLI:
# Workspace provisioned via Bicep/Terraform
# Deploy an ML environment to the workspace
az ml environment create --file environments/training-env.yml \
  --resource-group $RG_NAME \
  --workspace-name $WORKSPACE_NAME
Separation of concerns:
- IaC (Bicep/Terraform): Infrastructure (workspace, compute, networking)
- Azure ML CLI: ML-specific assets (environments, pipelines, models)
- CI/CD pipelines: Orchestration of both
2. Azure Policy integration
Enforce IaC compliance via Azure Policy:
Example: Require private endpoints for new workspaces (by denying public network access)
{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.MachineLearningServices/workspaces"
      },
      {
        "field": "Microsoft.MachineLearningServices/workspaces/publicNetworkAccess",
        "equals": "Enabled"
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}
Norwegian public sector: Azure Policy is often used to enforce NSM requirements and the guidelines of Digdir (formerly Difi). Combine it with IaC templates that are compliant by default.
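To make the logic of the policy rule above concrete, here is a simplified Python sketch of how an `allOf`/`equals` rule matches a resource. It is an illustration only, not the Azure Policy engine: `matches`, `evaluate`, the flat dictionary used to represent a resource and the "compliant" label are hypothetical, and the real service supports many more operators and effects.

```python
from typing import Any, Dict

ALIAS = "Microsoft.MachineLearningServices/workspaces/publicNetworkAccess"

POLICY_RULE: Dict[str, Any] = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.MachineLearningServices/workspaces"},
            {"field": ALIAS, "equals": "Enabled"},
        ]
    },
    "then": {"effect": "deny"},
}


def matches(condition: Dict[str, Any], resource: Dict[str, Any]) -> bool:
    """Evaluate a condition tree built from allOf / anyOf / field-equals."""
    if "allOf" in condition:
        return all(matches(c, resource) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(matches(c, resource) for c in condition["anyOf"])
    return resource.get(condition["field"]) == condition["equals"]


def evaluate(rule: Dict[str, Any], resource: Dict[str, Any]) -> str:
    """Return the rule's effect if the resource matches, else 'compliant'."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else "compliant"


public_ws = {"type": "Microsoft.MachineLearningServices/workspaces", ALIAS: "Enabled"}
private_ws = {"type": "Microsoft.MachineLearningServices/workspaces", ALIAS: "Disabled"}
storage = {"type": "Microsoft.Storage/storageAccounts"}

print(evaluate(POLICY_RULE, public_ws))   # deny
print(evaluate(POLICY_RULE, private_ws))  # compliant
print(evaluate(POLICY_RULE, storage))     # compliant
```

Both conditions must hold for the deny effect to apply, which is why the storage account and the private workspace pass.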
3. Azure Blueprints for governance
Azure Blueprints packages IaC (ARM templates) together with policy and RBAC assignments. Microsoft has announced the deprecation of Azure Blueprints and recommends template specs and deployment stacks for new work; the structure below still shows what a governed workspace package contains.
Blueprint for ML workspace:
Blueprint: "Secure-ML-Workspace"
├── Artifacts:
│   ├── ARM template: workspace.json
│   ├── Policy assignment: "Require private endpoints"
│   ├── RBAC assignment: "ML Engineers → Contributor"
│   └── RBAC assignment: "Data Scientists → AzureML Data Scientist"
A blueprint ensures that every new workspace created from it automatically receives the right policies and permissions.
4. Terraform Azure Provider for ML
Provider configuration:
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.0, < 4.0"
    }
  }
}
provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = false
    }
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}
Resource providers that must be registered:
| Provider | Purpose |
|---|---|
| Microsoft.MachineLearningServices | Azure ML workspace |
| Microsoft.Storage | Storage account |
| Microsoft.KeyVault | Key vault |
| Microsoft.ContainerRegistry | Container registry |
| Microsoft.Insights | Application Insights |
| Microsoft.Network | VNet, private endpoints |
Common error: "No registered resource provider found for location". Resolve it by registering the provider manually, for example az provider register --namespace Microsoft.MachineLearningServices.
Public sector (Norway)
Requirements of Utredningsinstruksen (§ 7)
When IaC is used in central-government AI projects:
Decision point 1: Choice of IaC tool
- Alternative A: Bicep (Microsoft native)
- Alternative B: Terraform (multi-cloud)
- Assessment: Bicep is recommended for the public sector because it is open source and Microsoft-supported, carries no license cost, and integrates more tightly with Azure. It does not remove cloud lock-in, since it targets Azure only; choose Terraform where multi-cloud portability is a requirement.
Decision point 2: Deployment strategy
- Alternative A: Manual az deployment from a local machine
- Alternative B: Automated via Azure DevOps pipelines
- Assessment: B is mandatory for production (traceability, compliance), while A is acceptable for dev/test.
Difi's (now Digdir) requirements for verifiability
IaC contributes directly to verifiability (etterprøvbarhet):
- Version control (Git): All infrastructure changes are tracked
- Pull request process: Peer review before deployment
- Deployment logs: Azure Activity Log + pipeline logs provide a full audit trail
Example of a verifiable deployment:
# 1. Commit IaC endringer til Git
git add infrastructure/main.bicep
git commit -m "feat(infra): add private endpoint for storage account"
# 2. Create PR for review
gh pr create --title "Add storage private endpoint" --body "Implements NSM requirement X"
# 3. After approval, pipeline deploys
# Azure Activity Log captures deployment event with:
# - Timestamp
# - User/service principal
# - Resource changes
# - Compliance status
NSM's basic principles applied to IaC
| NSM principle | IaC implementation |
|---|---|
| Identify and map | All resources defined explicitly in IaC (no "shadow IT") |
| Protect | Network isolation via VNets configured in IaC |
| Detect | Azure Policy + Azure Monitor configured via IaC |
| Limit and control | RBAC defined in IaC (principle of least privilege) |
DPIA-relevant IaC configurations
When IaC is used for AI systems that process personal data:
Data residency (data storage in Norway):
param location string = 'norwayeast' // Enforce Norwegian data center
resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location // Data stays in Norway
  properties: {
    allowBlobPublicAccess: false
    minimumTlsVersion: 'TLS1_2'
  }
}
Encryption at rest (GDPR Article 32):
resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  properties: {
    encryption: {
      status: 'Enabled'
      keyVaultProperties: {
        keyVaultArmId: keyVault.id
        keyIdentifier: '${keyVault.properties.vaultUri}keys/ml-encryption-key'
      }
    }
  }
}
DPIA documentation: The IaC files themselves become part of the DPIA documentation because they show how the technical security measures are implemented.
Cost and licensing
IaC tool costs
| Tool | License | Cost |
|---|---|---|
| Bicep | Open source (MIT) | Free |
| ARM templates | Microsoft-provided | Free |
| Terraform | Source-available (BUSL 1.1 since v1.6; MPL 2.0 for earlier releases) | Free (CLI) |
| Terraform Cloud (now HCP Terraform) | Proprietary | Free for <5 users, then $20/user/month (legacy per-user pricing; check current plans) |
Recommendation for the Norwegian public sector: Use the Terraform CLI (not Terraform Cloud) or Bicep to avoid vendor lock-in and license costs.
IaC Design for MLOps: Azure Well-Architected (OE:05) 2026
Core principle (Well-Architected OE:05): Standardized IaC approach with declarative syntax, consistent styles, appropriate modularization, quality assurance.
Declarative over imperative (recommended):
- Bicep / ARM templates: Azure-native, JSON/DSL declarative
- Terraform: Industry-standard, multi-cloud declarative
- Avoid: imperative scripts for infrastructure state management
Deployment commands:
# Bicep: deploy the Azure ML workspace
az deployment group create --resource-group "$RG_NAME" --template-file ml-workspace.bicep
# Terraform: integrated into GitHub Actions / Azure Pipelines
terraform init && terraform apply
Layered IaC pipeline approach for MLOps:
- Low-touch (networking, VNet, ACR): Rarely changes, stable baseline
- Medium-touch (compute clusters, storage, AKS): Occasional changes
- High-touch (model endpoints, deployments): Continuous delivery
IaC best practices:
- Treat IaC artifacts the same as application code (version control, PR reviews, testing)
- Use parameters/variables for multi-environment support (dev/test/prod)
- Collocate IaC with application code for synchronized deployments
- Scan IaC repos for secrets (Microsoft Defender for Cloud: IaC vulnerability scanning)
- Immutable infrastructure preferred for business-critical workloads
AI opportunity (Verified MCP 2026-04): AI tools (GitHub Copilot) can review IaC templates for misconfigurations, suggest secure alternatives, and generate templates from natural language. Generative AI can analyze IaC templates and architectural diagrams, generate threat models, and recommend IaC updates from pull requests. Agent-based solutions can infer infrastructure needs from code and generate PRs with recommended IaC changes.
MLOps v2 infrastructure: tf-gha-deploy-infra.yml workflow in Azure/mlops-v2-gha-demo deploys full Azure ML infrastructure via Terraform + GitHub Actions.
Azure resources provisioned via IaC
Dev/test workspace (minimal):
- Storage Account (GRS, 100 GB): ~100 NOK/month
- Key Vault (standard): ~5 NOK/month
- Container Registry (Basic): ~50 NOK/month
- Application Insights (5 GB/month): Free
- Total: ~155 NOK/month (infrastructure only, no compute)
Prod workspace (secure, VNet-isolated):
- Storage Account (GRS, 1 TB, private endpoint): ~750 NOK/month
- Key Vault (premium, HSM-backed): ~450 NOK/month
- Container Registry (Premium, geo-replication): ~750 NOK/month
- Application Insights (50 GB/month): ~200 NOK/month
- Private endpoints (4x): ~200 NOK/month
- VNet + NAT Gateway: ~300 NOK/month
- Total: ~2650 NOK/month (infrastructure only)
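The two totals above can be reproduced with a few lines of Python, which also makes it easy to re-run the arithmetic when an estimate changes. The figures are the approximate monthly estimates from the lists above, not price-list values; the variable names are illustrative.

```python
# Approximate monthly cost estimates in NOK, copied from the lists above.
dev_nok = {
    "Storage Account (GRS, 100 GB)": 100,
    "Key Vault (standard)": 5,
    "Container Registry (Basic)": 50,
    "Application Insights (5 GB/month)": 0,
}
prod_nok = {
    "Storage Account (GRS, 1 TB, private endpoint)": 750,
    "Key Vault (premium, HSM-backed)": 450,
    "Container Registry (Premium, geo-replication)": 750,
    "Application Insights (50 GB/month)": 200,
    "Private endpoints (4x)": 200,
    "VNet + NAT Gateway": 300,
}

dev_total = sum(dev_nok.values())
prod_total = sum(prod_nok.values())
annual_delta = (prod_total - dev_total) * 12  # extra yearly cost of the secure setup

print(dev_total)     # 155
print(prod_total)    # 2650
print(annual_delta)  # 29940
```

On these estimates the secure production baseline costs roughly 17 times the dev/test baseline before any compute is added.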
Cost optimization via IaC:
- Scale dev compute down to zero nodes when idle (via the Terraform azurerm_machine_learning_compute_cluster scale settings)
- Lifecycle policies for storage (move old training data to the Cool tier)
- Conditional deployment (deploy expensive resources only in prod)
Bicep example: Dev vs. Prod SKU:
param environment string = 'dev'
resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-01-01' = {
  name: acrName
  sku: {
    name: environment == 'prod' ? 'Premium' : 'Basic' // Cost optimization
  }
}
Azure Hybrid Benefit for Windows VMs
If you use IaC to deploy Windows-based compute (e.g. a DSVM):
resource "azurerm_windows_virtual_machine" "dsvm" {
  name         = "dsvm-${var.environment}"
  license_type = "Windows_Server" # Enables Azure Hybrid Benefit
  # ... (rest of config)
}
This can save up to 40% on VM costs if you already hold eligible Windows Server licenses.
For the architect (Cosmo)
Technical clarification questions
Before designing the IaC solution, clarify:
1. Deployment scope:
   - Single workspace or multi-workspace (hub-and-spoke)?
   - Shared services (e.g. a common Container Registry)?
2. Network isolation:
   - Is public network access acceptable (dev/test)?
   - Are private endpoints required (prod/HBI data)?
   - Is there an existing VNet that must be integrated?
3. Compliance and governance:
   - Norwegian public sector with NSM requirements?
   - GDPR/personal data (may call for encryption at rest with customer-managed keys)?
   - Audit-trail requirements from Utredningsinstruksen?
4. Team capabilities:
   - Does the team have Terraform experience?
   - Do they prefer Azure-native tools (Bicep)?
   - CI/CD platform: Azure DevOps or GitHub?
5. Existing infrastructure:
   - Greenfield (new environment from scratch)?
   - Brownfield (must integrate with existing VNets and policies)?
   - Hybrid (on-premises + cloud)?
Design principles
1. Modularity over monolith
❌ DON'T: One giant 2000-line main.bicep
✅ DO: Separate modules (network.bicep, workspace.bicep, compute.bicep)
2. Parameterization for reuse
// Use parameters for everything that varies between environments
param environment string // dev, test, prod
param location string
param enablePrivateEndpoint bool = environment == 'prod' // Conditional logic
3. Pinning API versions
// Every resource declares an explicit API version; choose it deliberately
resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  // ... (config)
}
Pinned API versions keep deployments reproducible: behavior changes only when you deliberately move to a newer version, whereas an implicit "latest" could change behavior over time.
4. Idempotence testing
# Test that the same deployment can run several times without errors
az deployment group create --resource-group mlops-prod-rg --template-file main.bicep --parameters prod.bicepparam
# Run it again: it should neither fail nor change anything
az deployment group create --resource-group mlops-prod-rg --template-file main.bicep --parameters prod.bicepparam
5. Fail-fast validation
# Validate Bicep syntax before deployment
az bicep build --file main.bicep
# Dry run with what-if
az deployment group what-if \
  --resource-group mlops-prod-rg \
  --template-file main.bicep \
  --parameters prod.bicepparam
Common pitfalls
Pitfall 1: Hardcoded values
❌ DON'T:
resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'mlstorageprod123' // Hardcoded, not unique
}
✅ DO:
param storageNamePrefix string = 'mlstorage'
resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: '${storageNamePrefix}${uniqueString(resourceGroup().id)}'
}
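The reason the second form works is that the suffix is a deterministic hash of the resource group ID: redeploying to the same group yields the same name, while another group yields a different one. The Python sketch below mimics that property. It does not reproduce Bicep's actual `uniqueString` algorithm; SHA-256 is a stand-in, truncated to the 13 characters that `uniqueString` returns, and `storage_account_name` and the resource group IDs are hypothetical. It also checks the storage account naming rule (3 to 24 lowercase letters and digits).

```python
import hashlib
import re


def storage_account_name(prefix: str, scope_id: str) -> str:
    """Build a deterministic, scope-dependent storage account name."""
    # Stand-in for uniqueString(): NOT the real algorithm, same key property.
    suffix = hashlib.sha256(scope_id.encode("utf-8")).hexdigest()[:13]
    name = f"{prefix}{suffix}".lower()
    # Storage account names: 3-24 characters, lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]{3,24}", name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


rg_dev = "/subscriptions/0000/resourceGroups/mlops-dev-rg"
rg_prod = "/subscriptions/0000/resourceGroups/mlops-prod-rg"

name_dev = storage_account_name("mlstorage", rg_dev)
name_dev_again = storage_account_name("mlstorage", rg_dev)
name_prod = storage_account_name("mlstorage", rg_prod)

print(name_dev == name_dev_again)  # True
print(name_dev == name_prod)       # False
print(len(name_dev))               # 22
```

The length check is worth keeping in mind when choosing the prefix: with a 13-character suffix, the prefix can be at most 11 characters.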
Pitfall 2: Missing resource provider registration
# Error: "No registered resource provider found for Microsoft.MachineLearningServices"
# Fix:
az provider register --namespace Microsoft.MachineLearningServices
az provider register --namespace Microsoft.Storage
az provider register --namespace Microsoft.KeyVault
Pitfall 3: ACR Tasks with private endpoints
When both ACR and Azure ML are behind private endpoints, you CANNOT use ACR Tasks to build images. You MUST define a compute cluster for it:
resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  properties: {
    publicNetworkAccess: 'Disabled'
    imageBuildCompute: 'image-builder-cluster' // ← MANDATORY
  }
}
resource imageBuilderCluster 'Microsoft.MachineLearningServices/workspaces/computes@2024-01-01-preview' = {
  parent: mlWorkspace
  name: 'image-builder-cluster'
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: 'Standard_DS2_v2'
      scaleSettings: {
        minNodeCount: 0
        maxNodeCount: 3
      }
    }
  }
}
Pitfall 4: Soft delete and purge protection on Key Vault
If you deploy and delete workspaces often (dev/test), soft-deleted Key Vaults can block re-deployment under the same name, and with purge protection enabled they cannot be purged before the retention period ends:
resource "azurerm_key_vault" "default" {
  purge_protection_enabled = false # ← Dev/test only!
  # Prod should always have purge_protection_enabled = true
}
Pitfall 5: Missing RBAC for the managed identity
When the workspace uses a managed identity to access Storage/Key Vault, you must assign RBAC roles:
// Grant Storage Blob Data Contributor to the workspace managed identity
resource storageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(storage.id, mlWorkspace.id, 'Storage Blob Data Contributor')
  scope: storage
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')
    principalId: mlWorkspace.identity.principalId
    principalType: 'ServicePrincipal'
  }
}
Integration with the ML lifecycle
IaC is NOT static; it should evolve with the ML project:
| ML phase | IaC activity |
|---|---|
| Prototyping | Deploy minimal dev workspace (public network, Basic SKU) |
| Experimentation | Add compute clusters via IaC, scale up storage |
| Training at scale | Deploy prod workspace (private endpoints, Premium SKU) |
| Model deployment | Add managed online endpoints via IaC/Azure ML CLI |
| Monitoring | Integrate Application Insights alerts via IaC |
| Retraining | Scheduled pipelines trigger IaC updates (e.g. new compute resources) |
GitOps workflow:
Developer → Commits IaC changes → PR review → CI pipeline validates
→ Merge to main → CD pipeline deploys to prod → Azure Policy checks compliance
Anti-patterns to avoid
1. "ClickOps": manually creating resources via the Azure Portal
   - Why it's bad: No version control, not reproducible
   - Fix: Do everything via IaC; use the Portal for inspection only
2. Monolithic IaC: one massive file for the entire environment
   - Why it's bad: Hard to maintain, slow deployments
   - Fix: Modularize (workspace, network, compute as separate modules)
3. Secrets in IaC: hardcoding API keys or passwords
   - Why it's bad: Security risk, failed audits
   - Fix: Use Key Vault references or managed identities
4. No testing: deploying straight to prod without validation
   - Why it's bad: Downtime, compliance violations
   - Fix: Dev → Test → Prod environments; run az deployment group what-if before prod
5. Manual state management (Terraform): local state file
   - Why it's bad: Team collaboration issues; if the state is lost, Terraform no longer tracks the deployed infrastructure
   - Fix: Azure Storage backend for Terraform state
   terraform {
     backend "azurerm" {
       resource_group_name  = "tfstate-rg"
       storage_account_name = "tfstatestorage"
       container_name       = "tfstate"
       key                  = "mlops.terraform.tfstate"
     }
   }
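The "Secrets in IaC" anti-pattern above lends itself to an automated check in the CI pipeline. The following Python sketch is a deliberately simple heuristic with two hypothetical regex rules; it is not a replacement for a dedicated scanner such as the Defender for Cloud IaC scanning mentioned earlier, and both the patterns and the sample template are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical heuristic rules; real scanners use far richer detection.
SECRET_PATTERNS = {
    "storage-account-key": re.compile(r"AccountKey=[A-Za-z0-9+/=]{20,}"),
    "hardcoded-password": re.compile(
        r"(password|secret|api_?key)\w*\s*[:=]\s*['\"][^'\"$]{4,}['\"]",
        re.IGNORECASE,
    ),
}


def scan(text: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


sample = """param location string = 'norwayeast'
var adminPassword = 'P@ssw0rd123'
var conn = 'DefaultEndpointsProtocol=https;AccountKey=abcdefghijklmnopqrstuvwxyz0123456789=='
param keyVaultSecretName string = 'ml-admin-password'
"""

findings = scan(sample)
print(findings)  # [(2, 'hardcoded-password'), (3, 'storage-account-key')]
```

Line 4 of the sample is not flagged: it only names a Key Vault secret, which is the recommended fix rather than a leaked value.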
Recommended resources for a deeper dive
Microsoft Learn paths:
- Infrastructure as Code on Azure
- Manage Azure Machine Learning workspaces with Terraform
- Create Azure ML hub workspace using Bicep
GitHub repositories:
- Azure/azure-quickstart-templates — Official Bicep templates
- Azure/terraform — Terraform quickstarts for Azure ML
- Azure/mlops-v2 — End-to-end MLOps solution accelerator
Terraform Registry:
Azure Verified Modules (AVM):
- avm/res/machine-learning-services/workspace: Microsoft-maintained Bicep module for the Azure ML workspace
Sources and verification
This knowledge reference is based on the following verified sources (retrieved 2026-04):
1. Microsoft Learn - What is Infrastructure as Code (IaC)?
   - URL: https://learn.microsoft.com/devops/deliver/what-is-infrastructure-as-code
   - Description: Fundamental IaC concepts, idempotence, declarative vs. imperative
   - Confidence: VERY_HIGH
2. Microsoft Learn - Manage Azure Machine Learning workspaces using Terraform
   - URL: https://learn.microsoft.com/azure/machine-learning/how-to-manage-workspace-terraform
   - Description: Complete guide to Terraform for Azure ML, including public/private network configs
   - Confidence: VERY_HIGH
3. Microsoft Learn - Create Azure ML hub workspace using Bicep template
   - URL: https://learn.microsoft.com/azure/machine-learning/how-to-manage-hub-workspace-template
   - Description: Bicep-based deployment, modular structure, API versions
   - Confidence: VERY_HIGH
4. Microsoft Learn - Set up MLOps with Azure DevOps
   - URL: https://learn.microsoft.com/azure/machine-learning/how-to-setup-mlops-azureml
   - Description: End-to-end MLOps with IaC deployment via Azure Pipelines
   - Confidence: VERY_HIGH
5. Microsoft Learn - Machine Learning Operations (MLOps) concepts
   - URL: https://learn.microsoft.com/azure/aks/concepts-machine-learning-ops
   - Description: IaC as an MLOps practice, integration with CI/CD
   - Confidence: VERY_HIGH
6. Azure Architecture Center - Machine Learning Operations v2
   - URL: https://learn.microsoft.com/azure/architecture/ai-ml/guide/machine-learning-operations-v2
   - Description: MLOps architecture with Azure Pipelines and IaC
   - Confidence: HIGH
7. Azure Well-Architected Framework - Infrastructure as Code design
   - URL: https://learn.microsoft.com/azure/well-architected/operational-excellence/infrastructure-as-code-design
   - Description: Best practices for IaC design, modularization, declarative tools
   - Confidence: VERY_HIGH
8. GitHub - Azure/azure-quickstart-templates
   - URL: https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.machinelearningservices/aifoundry-basics
   - Description: Official Bicep templates for Azure ML workspace deployment
   - Confidence: VERY_HIGH
9. GitHub - Azure/terraform (quickstart templates)
   - URL: https://github.com/Azure/terraform/tree/master/quickstart
   - Description: 101, 201, 301 Terraform templates for Azure ML (basic, secure, hub-spoke)
   - Confidence: VERY_HIGH
10. Terraform Registry - azurerm_machine_learning_workspace
   - URL: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/machine_learning_workspace
   - Description: Official Terraform provider documentation for Azure ML
   - Confidence: VERY_HIGH
MCP research metadata:
- microsoft_docs_search calls: 4
- microsoft_docs_fetch calls: 3
- microsoft_code_sample_search calls: 1
- Total sources: 10
- Research date: 2026-04
Confidence levels:
- VERY_HIGH: Official Microsoft documentation, verified code samples
- HIGH: Azure Architecture Center (best practices, not product-specific)
Verification: All code examples are drawn from official Microsoft Learn content or GitHub repos under the Azure organization. Bicep/Terraform syntax was verified against the versions used here (azurerm 3.x for Terraform, the 2024-01-01-preview API for Bicep).
Updated: 2026-04 | Next review: 2026-07-04 (or when the Azure ML API major version changes)