# Infrastructure as Code for MLOps **Dato:** 2026-02-04 **Kategori:** MLOps & GenAIOps **Forfatter:** Cosmo Skyberg, Senior Microsoft AI Solution Architect ## Introduksjon Infrastructure as Code (IaC) er en fundamental MLOps-praksis der infrastruktur defineres og deployes gjennom kode fremfor manuelle konfigurasjoner. Dette er kritisk viktig for AI/ML-prosjekter fordi det sikrer reproducerbarhet, konsistens og versjonskontroll av hele ML-miljøet — fra development til production. **Hvorfor IaC er essensielt for MLOps:** - **Eliminerer "snowflake environments"** — manuelt konfigurerte miljøer som ikke kan reproduseres - **Idempotens** — samme deployment-kommando gir alltid samme resultat, uavhengig av starttilstand - **Versjonskontroll** — infrastruktur behandles som kode og lagres i Git - **Rask provisjonering av testmiljøer** — on-demand scaling av ML-compute og workspace-ressurser - **Auditspor og compliance** — alle infrastrukturendringer er sporbare > **Confidence: VERY_HIGH** — IaC er en core DevOps/MLOps-praksis dokumentert grundig i Microsoft Learn og Azure Well-Architected Framework. ## Kjernekomponenter ### 1. Deklarative vs. imperative IaC-verktøy IaC-verktøy kategoriseres i to hovedtyper: **Deklarative verktøy** (anbefalt for MLOps): - **Bicep** — Microsoft sitt domain-specific language (DSL) for Azure, kompilerer til ARM templates - **ARM templates (JSON)** — Azure Resource Manager templates, native Azure-format - **Terraform** — multi-cloud IaC-verktøy med Azure provider **Imperative verktøy:** - **Azure CLI scripts** — bash/PowerShell-scripts med `az` kommandoer - **PowerShell DSC** — for VM-konfigurasjon > **Anbefaling:** Bruk deklarative verktøy (Bicep/Terraform) for infrastruktur, Azure CLI for orchestration i pipelines. ### 2. Azure Machine Learning workspace-ressurser En Azure ML workspace krever flere **associated resources** som må provisjoneres: | Ressurs | Formål | IaC-krav | |---------|--------|----------| | **Azure ML Workspace** | Sentral hub for ML-arbeid | `Microsoft.MachineLearningServices/workspaces` | | **Storage Account** | Data, modeller, artifacts | `Microsoft.Storage/storageAccounts` | | **Key Vault** | Secrets, credentials | `Microsoft.KeyVault/vaults` | | **Application Insights** | Monitoring, telemetry | `Microsoft.Insights/components` | | **Container Registry** | Docker images for miljøer | `Microsoft.ContainerRegistry/registries` | | **Compute resources** | Training/inference compute | Compute clusters, instances, endpoints | **Viktig:** Disse ressursene kan opprettes automatisk ved workspace creation, men for produksjon bør de defineres eksplisitt i IaC for full kontroll over networking, RBAC og compliance. ### 3. Bicep-basert IaC for Azure ML **Eksempel: Minimal Azure ML workspace** ```bicep resource aiResource 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = { name: workspaceName location: location identity: { type: 'SystemAssigned' } properties: { friendlyName: workspaceName keyVault: keyVault.id storageAccount: storage.id applicationInsights: appInsights.id containerRegistry: containerRegistry.id publicNetworkAccess: 'Enabled' } } ``` **Modular Bicep-struktur** (best practice): ``` /infrastructure ├── main.bicep # Hovedfil med parameters og orchestration ├── modules/ │ ├── ai-hub.bicep # Azure ML workspace │ ├── dependent-resources.bicep # Storage, KV, ACR, AppInsights │ ├── networking.bicep # VNet, subnets, private endpoints │ └── compute.bicep # Compute clusters └── parameters/ ├── dev.bicepparam └── prod.bicepparam ``` > **Confidence: VERY_HIGH** — Dette følger official Azure quickstart templates for Azure ML (github.com/Azure/azure-quickstart-templates). ### 4. Terraform-basert IaC for Azure ML **Eksempel: Public network workspace** ```terraform resource "azurerm_machine_learning_workspace" "default" { name = "${random_pet.prefix.id}-mlw" location = azurerm_resource_group.default.location resource_group_name = azurerm_resource_group.default.name application_insights_id = azurerm_application_insights.default.id key_vault_id = azurerm_key_vault.default.id storage_account_id = azurerm_storage_account.default.id container_registry_id = azurerm_container_registry.default.id public_network_access_enabled = true identity { type = "SystemAssigned" } } ``` **Terraform workflow:** ```bash # Initialiser Terraform providers terraform init # Plan deployment (dry-run) terraform plan -out ml-workspace.tfplan # Apply deployment terraform apply ml-workspace.tfplan ``` **Terraform vs. Bicep:** | Kriterium | Terraform | Bicep | |-----------|-----------|-------| | Multi-cloud | ✅ Støtter AWS, GCP, Azure | ❌ Kun Azure | | Learning curve | Moderat (HCL syntax) | Lav (JSON-liknende) | | State management | Requires state file (remote backend) | Ingen state file (ARM managed) | | Community modules | Stor ecosystem | Mindre, men voksende | | Azure integration | Via provider | Native, first-class | > **For Norge offentlig:** Bicep er ofte foretrukket fordi det er Microsofts native løsning med tett integrasjon med Azure governance-verktøy (Policy, Blueprints). ### 5. Private network (VNet-isolated) workspaces For sikkerhetskritiske miljøer må workspace isoleres i et VNet med private endpoints: **Bicep-konfigurasjon:** ```bicep resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = { name: workspaceName location: location properties: { publicNetworkAccess: 'Disabled' imageBuildCompute: 'image-builder-cluster' // Required for ACR private endpoint } } resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = { name: 'ple-${workspaceName}' location: location properties: { subnet: { id: workspaceSubnet.id } privateLinkServiceConnections: [{ name: 'psc-${workspaceName}' properties: { privateLinkServiceId: mlWorkspace.id groupIds: ['amlworkspace'] } }] } } ``` **Viktig:** Når både ACR og Azure ML har private endpoints, kan du IKKE bruke ACR tasks for image building. Du må definere en compute cluster for dette formålet via `imageBuildCompute` property. > **Confidence: HIGH** — Dokumentert i Azure ML docs, men private endpoint-konfigurasjon krever nøye testing per scenario. ## Arkitekturmønstre ### 1. Basic workspace pattern (development) **Bruk:** Utforskning, prototyping, ikke-sensitiv data ``` ┌─────────────────────────────────────────┐ │ Resource Group │ │ ┌───────────────────────────────────┐ │ │ │ Azure ML Workspace │ │ │ │ - Public network access │ │ │ │ - System-assigned identity │ │ │ └───────────────────────────────────┘ │ │ ┌───────────────────────────────────┐ │ │ │ Dependent Resources │ │ │ │ - Storage Account (GRS) │ │ │ │ - Key Vault (standard) │ │ │ │ - Container Registry (basic) │ │ │ │ - Application Insights │ │ │ └───────────────────────────────────┘ │ └─────────────────────────────────────────┘ ``` **IaC-tilnærming:** - Single `main.bicep` eller `workspace.tf` file - Parameter files for dev/test/staging - Deploy via Azure CLI/Terraform CLI ### 2. Secure workspace pattern (production) **Bruk:** Produksjon, HBI (High Business Impact) data, compliance ``` ┌────────────────────────────────────────────────┐ │ Resource Group │ │ ┌──────────────────────────────────────────┐ │ │ │ VNet (10.0.0.0/16) │ │ │ │ ├─ Subnet: training (10.0.1.0/24) │ │ │ │ ├─ Subnet: workspace (10.0.0.0/24) │ │ │ │ └─ Subnet: endpoints (10.0.2.0/24) │ │ │ └──────────────────────────────────────────┘ │ │ ┌──────────────────────────────────────────┐ │ │ │ Azure ML Workspace │ │ │ │ - Public access: DISABLED │ │ │ │ - Private endpoint in workspace subnet │ │ │ │ - Managed identity + RBAC │ │ │ └──────────────────────────────────────────┘ │ │ ┌──────────────────────────────────────────┐ │ │ │ Private endpoints for: │ │ │ │ - Storage (blob + file) │ │ │ │ - Key Vault │ │ │ │ - Container Registry │ │ │ └──────────────────────────────────────────┘ │ │ ┌──────────────────────────────────────────┐ │ │ │ Private DNS Zones │ │ │ │ - privatelink.api.azureml.ms │ │ │ │ - privatelink.notebooks.azure.net │ │ │ │ - privatelink.blob.core.windows.net │ │ │ │ - privatelink.vaultcore.azure.net │ │ │ └──────────────────────────────────────────┘ │ └────────────────────────────────────────────────┘ ``` **IaC-tilnærming:** - Modular Bicep/Terraform med separate network.bicep/network.tf - Managed identities for all services (ingen keys i config) - Azure Policy enforcement for network isolation - Private DNS zones for name resolution > **Norge offentlig:** Følg NSMs grunnprinsipper for nettverkssegmentering. Private endpoints er ofte påkrevd for data klassifisert som begrenset/fortrolig. ### 3. Hub-and-spoke pattern (multi-environment) **Bruk:** Enterprise-scale med delte services og multiple workspaces ``` ┌──────────────────────────────────────────────────┐ │ Hub Resource Group │ │ ├─ Shared Container Registry │ │ ├─ Shared Key Vault (certificates) │ │ ├─ Azure Firewall / VPN Gateway │ │ └─ Monitoring (Log Analytics, App Insights) │ └──────────────────────────────────────────────────┘ │ VNet peering ├────────────────────────────┬────────── │ │ ┌──────────▼───────────┐ ┌───────────▼──────────┐ │ Dev Spoke (RG) │ │ Prod Spoke (RG) │ │ - ML Workspace Dev │ │ - ML Workspace Prod │ │ - Dev Storage │ │ - Prod Storage │ │ - Dev Compute │ │ - Prod Compute │ └──────────────────────┘ └──────────────────────┘ ``` **IaC-tilnærming:** - Separate Terraform modules/Bicep modules per spoke - Shared hub deployed first - Spoke deployments reference hub resources via remote state (Terraform) eller parameters (Bicep) - Azure Blueprints eller Terraform workspaces for consistency **Terraform quickstart templates (fra Azure/terraform repo):** - [101: Basic workspace](https://github.com/Azure/terraform/tree/master/quickstart/101-machine-learning) - [201: Moderately secure (VNet isolation)](https://github.com/Azure/terraform/tree/master/quickstart/201-machine-learning-moderately-secure) - [301: Hub-and-spoke with firewall](https://github.com/azure/terraform/tree/master/quickstart/301-machine-learning-hub-spoke-secure) ## Beslutningsveiledning ### Når velge Bicep vs. Terraform vs. ARM templates? | Scenario | Anbefalt verktøy | Begrunnelse | |----------|------------------|-------------| | Ren Azure-only MLOps | **Bicep** | Native support, enklere syntax enn ARM, tett integrasjon med Azure CLI | | Multi-cloud (Azure + AWS/GCP) | **Terraform** | Eneste verktøy som støtter alle clouds konsistent | | Eksisterende DevOps-pipeline med JSON | **ARM templates** | Kompatibilitet, men vurder Bicep migration | | Stor existing Terraform codebase | **Terraform** | Konsistens, unngå verktøy-proliferasjon | | Norge offentlig med Direktoratet-krav | **Bicep** | Microsofts native løsning, enklere audit trail | ### Når deploye IaC via Azure DevOps vs. GitHub Actions? | Kriterium | Azure DevOps | GitHub Actions | |-----------|--------------|----------------| | Team allerede bruker ADO | ✅ Foretrekk ADO | Konsistens | | Open source prosjekt | ✅ Foretrekk GitHub | Community visibility | | Enterprise governance (offentlig sektor) | ✅ Foretrekk ADO | Bedre integrasjon med Azure RBAC, compliance | | Terraform state management | Begge støtter Azure Storage backend | — | | Cost | Gratis for small teams (both) | — | ### Deployment pipeline-integrasjon **Azure DevOps pipeline (YAML):** ```yaml trigger: branches: include: - main paths: include: - infrastructure/* stages: - stage: DeployInfrastructure jobs: - job: DeployBicep steps: - task: AzureCLI@2 inputs: azureSubscription: 'Azure-Service-Connection' scriptType: 'bash' scriptLocation: 'inlineScript' inlineScript: | az deployment group create \ --resource-group $(resourceGroupName) \ --template-file infrastructure/main.bicep \ --parameters infrastructure/parameters/prod.bicepparam ``` **GitHub Actions workflow:** ```yaml name: Deploy ML Infrastructure on: push: branches: [main] paths: - 'infrastructure/**' jobs: deploy-infra: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - uses: azure/login@v1 with: creds: ${{ secrets.AZURE_CREDENTIALS }} - name: Deploy Bicep run: | az deployment group create \ --resource-group ${{ vars.RG_NAME }} \ --template-file infrastructure/main.bicep \ --parameters environment=prod ``` > **Best practice:** Bruk separate pipelines for infrastructure (IaC) og ML-kode. Infrastructure skal endre sjeldent, ML-kode oftere. ## Integrasjon med Microsoft-stakken ### 1. Azure ML CLI v2 integration IaC provisjonerer workspace, men **ML assets** (environments, datasets, components) deployes via Azure ML CLI: ```bash # Workspace provisjonert via Bicep/Terraform # Deploy ML environment til workspace az ml environment create --file environments/training-env.yml \ --resource-group $RG_NAME \ --workspace-name $WORKSPACE_NAME ``` **Separation of concerns:** - **IaC (Bicep/Terraform):** Infrastructure (workspace, compute, networking) - **Azure ML CLI:** ML-spesifikke assets (environments, pipelines, models) - **CI/CD pipelines:** Orchestration av begge ### 2. Azure Policy integration Enforce IaC compliance via Azure Policy: **Eksempel: Krev private endpoints for nye workspaces** ```json { "if": { "allOf": [ { "field": "type", "equals": "Microsoft.MachineLearningServices/workspaces" }, { "field": "Microsoft.MachineLearningServices/workspaces/publicNetworkAccess", "equals": "Enabled" } ] }, "then": { "effect": "deny" } } ``` > **Norge offentlig:** Azure Policy brukes ofte for å enforces NSM-krav og Difis retningslinjer. Kombiner med IaC-templates som default er compliant. ### 3. Azure Blueprints for governance Azure Blueprints pakker IaC (ARM templates) med policies og RBAC assignments: **Blueprint for ML workspace:** ``` Blueprint: "Secure-ML-Workspace" ├── Artifacts: │ ├── ARM template: workspace.json │ ├── Policy assignment: "Require private endpoints" │ ├── RBAC assignment: "ML Engineers → Contributor" │ └── RBAC assignment: "Data Scientists → AzureML Data Scientist" ``` Blueprints sikrer at hver gang et nytt workspace opprettes, får det automatisk riktig policies og permissions. ### 4. Terraform Azure Provider for ML **Provider konfigurasjon:** ```terraform terraform { required_providers { azurerm = { source = "hashicorp/azurerm" version = ">= 3.0, < 4.0" } } } provider "azurerm" { features { key_vault { purge_soft_delete_on_destroy = false } resource_group { prevent_deletion_if_contains_resources = false } } } ``` **Resource providers som må registreres:** | Provider | Formål | |----------|--------| | `Microsoft.MachineLearningServices` | Azure ML workspace | | `Microsoft.Storage` | Storage account | | `Microsoft.KeyVault` | Key vault | | `Microsoft.ContainerRegistry` | Container registry | | `Microsoft.Insights` | Application Insights | | `Microsoft.Network` | VNet, private endpoints | > **Common error:** `No registered resource provider found for location` — løses ved å manuelt registrere providers via `az provider register --namespace Microsoft.MachineLearningServices`. ## Offentlig sektor (Norge) ### Utredningsinstruksen-krav (§ 7) Når IaC brukes i statlige AI-prosjekter: **Beslutningspunkt 1: Valg av IaC-verktøy** - **Alternativ A:** Bicep (Microsoft native) - **Alternativ B:** Terraform (multi-cloud) - **Vurdering:** Bicep anbefales for offentlig sektor fordi det eliminerer vendor lock-in-bekymringer (open source, Microsoft-støttet), samtidig som det har tettere Azure-integrasjon. **Beslutningspunkt 2: Deployment-strategi** - **Alternativ A:** Manuell `az deployment` fra lokal maskin - **Alternativ B:** Automatisert via Azure DevOps pipelines - **Vurdering:** B er obligatorisk for produksjon (sporbarhet, compliance), men A er akseptabelt for dev/test. ### Difis krav til etterprøvbarhet IaC bidrar direkte til etterprøvbarhet: - **Versjonskontroll (Git):** Alle infrastrukturendringer er tracket - **Pull request-prosess:** Peer review før deployment - **Deployment logs:** Azure Activity Log + pipeline logs gir full audit trail **Eksempel på etterprøvbar deployment:** ```bash # 1. Commit IaC endringer til Git git add infrastructure/main.bicep git commit -m "feat(infra): add private endpoint for storage account" # 2. Create PR for review gh pr create --title "Add storage private endpoint" --body "Implements NSM requirement X" # 3. After approval, pipeline deploys # Azure Activity Log captures deployment event with: # - Timestamp # - User/service principal # - Resource changes # - Compliance status ``` ### NSMs grunnprinsipper for IaC | NSM-prinsipp | IaC-implementering | |--------------|---------------------| | **Identifisere og kartlegge** | Alle ressurser definert eksplisitt i IaC (ingen "shadow IT") | | **Beskytte** | Network isolation via VNet-konfigurert i IaC | | **Oppdage** | Azure Policy + Azure Monitor konfigurert via IaC | | **Begrense og kontrollere** | RBAC definert i IaC (principle of least privilege) | ### DPIA-relevante IaC-konfigurasjoner Når IaC brukes for AI-systemer som behandler persondata: **Data residency (datalagring i Norge):** ```bicep param location string = 'norwayeast' // Enforce Norwegian data center resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = { name: storageAccountName location: location // Data stays in Norway properties: { allowBlobPublicAccess: false minimumTlsVersion: 'TLS1_2' } } ``` **Encryption at rest (GDPR Article 32):** ```bicep resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = { properties: { encryption: { status: 'Enabled' keyVaultProperties: { keyVaultArmId: keyVault.id keyIdentifier: '${keyVault.properties.vaultUri}keys/ml-encryption-key' } } } } ``` > **DPIA-dokumentasjon:** IaC-filene selv blir del av DPIA-dokumentasjonen fordi de beviser hvordan tekniske sikkerhetstiltak er implementert. ## Kostnad og lisensiering ### IaC-verktøy kostnader | Verktøy | Lisens | Kostnad | |---------|--------|---------| | **Bicep** | Open source (MIT) | Gratis | | **ARM templates** | Microsoft-provided | Gratis | | **Terraform** | Open source (MPL 2.0) | Gratis (OSS version) | | **Terraform Cloud** | Proprietary | Gratis for <5 users, deretter $20/user/mnd | > **Anbefaling for Norge offentlig:** Bruk open source Terraform (ikke Cloud) eller Bicep for å unngå vendor lock-in og lisenskostnader. ### Azure-ressurser provisjonert via IaC **Dev/test workspace (minimal):** - Storage Account (GRS, 100 GB): ~100 NOK/mnd - Key Vault (standard): ~5 NOK/mnd - Container Registry (Basic): ~50 NOK/mnd - Application Insights (5 GB/mnd): Gratis - **Total:** ~155 NOK/mnd (kun infrastruktur, ingen compute) **Prod workspace (secure, VNet-isolated):** - Storage Account (GRS, 1 TB, private endpoint): ~750 NOK/mnd - Key Vault (premium, HSM-backed): ~450 NOK/mnd - Container Registry (Premium, geo-replication): ~750 NOK/mnd - Application Insights (50 GB/mnd): ~200 NOK/mnd - Private endpoints (4x): ~200 NOK/mnd - VNet + NAT Gateway: ~300 NOK/mnd - **Total:** ~2650 NOK/mnd (kun infrastruktur) **Kostnadsoptimalisering via IaC:** - **Auto-shutdown scripts** for dev compute (via Terraform `azurerm_machine_learning_compute_cluster` scale settings) - **Lifecycle policies** for storage (move old training data to Cool tier) - **Conditional deployment** (deploy expensive resources kun i prod) **Bicep eksempel: Dev vs. Prod SKU:** ```bicep param environment string = 'dev' resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-01-01' = { name: acrName sku: { name: environment == 'prod' ? 'Premium' : 'Basic' // Cost optimization } } ``` ### Azure Hybrid Benefit for Windows VMs Hvis du bruker IaC til å deploye Windows-baserte compute instances (t.ex. DSVM): ```terraform resource "azurerm_linux_virtual_machine" "dsvm" { name = "dsvm-${var.environment}" license_type = "Windows_Server" # Enables Azure Hybrid Benefit # ... (rest of config) } ``` Dette kan spare opptil 40% på VM-kostnader hvis du har eksisterende Windows Server-lisenser. ## For arkitekten (Cosmo) ### Tekniske avklaringsspørsmål **Før du designer IaC-løsningen, avklar:** 1. **Deployment scope:** - Single workspace eller multi-workspace (hub-and-spoke)? - Shared services (t.ex. felles Container Registry)? 2. **Network isolation:** - Public network access OK (dev/test)? - Private endpoints påkrevd (prod/HBI data)? - Eksisterende VNet som må integreres? 3. **Compliance og governance:** - Norsk offentlig sektor med NSM-krav? - GDPR/persondata (krever encryption at rest med customer-managed keys)? - Audit trail-krav fra Utredningsinstruksen? 4. **Team capabilities:** - Har teamet Terraform-erfaring? - Foretrekker de Azure-native verktøy (Bicep)? - CI/CD-plattform: Azure DevOps eller GitHub? 5. **Eksisterende infrastruktur:** - Greenfield (nytt miljø fra scratch)? - Brownfield (må integrere med existing VNet, policies)? - Hybrid (on-premises + cloud)? ### Designprinsipper **1. Modularitet over monolitt** ``` ❌ IKKE: En gigantisk main.bicep på 2000 linjer ✅ JA: Separate modules (network.bicep, workspace.bicep, compute.bicep) ``` **2. Parameterisering for gjenbruk** ```bicep // Bruk parameters for alt som varierer mellom miljøer param environment string // dev, test, prod param location string param enablePrivateEndpoint bool = environment == 'prod' // Conditional logic ``` **3. Versjonskontroll av API-versjoner** ```bicep // Pin API versions eksplisitt, ikke bruk 'latest' resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = { // ... (config) } ``` Dette sikrer at deployments er reproducerbare — `latest` kan endre oppførsel over tid. **4. Idempotens-testing** ```bash # Test at samme deployment kan kjøres flere ganger uten feil az deployment group create --template-file main.bicep --parameters prod.bicepparam # Kjør igjen — skal ikke feile eller endre noe az deployment group create --template-file main.bicep --parameters prod.bicepparam ``` **5. Fail-fast validation** ```bash # Valider Bicep syntaks før deployment az bicep build --file main.bicep # Dry-run med what-if az deployment group what-if \ --resource-group mlops-prod-rg \ --template-file main.bicep \ --parameters prod.bicepparam ``` ### Vanlige fallgruver **Fallgruve 1: Hardkoded verdier** ```bicep ❌ IKKE: resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = { name: 'mlstorageprod123' // Hardcoded, ikke unique } ✅ JA: param storageNamePrefix string = 'mlstorage' resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = { name: '${storageNamePrefix}${uniqueString(resourceGroup().id)}' } ``` **Fallgruve 2: Manglende resource provider-registrering** ```bash # Error: "No registered resource provider found for Microsoft.MachineLearningServices" # Fix: az provider register --namespace Microsoft.MachineLearningServices az provider register --namespace Microsoft.Storage az provider register --namespace Microsoft.KeyVault ``` **Fallgruve 3: ACR tasks med private endpoints** Når både ACR og Azure ML har private endpoints, kan du IKKE bruke ACR tasks for image building. Du MÅ definere en compute cluster: ```bicep resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = { properties: { publicNetworkAccess: 'Disabled' imageBuildCompute: 'image-builder-cluster' // ← OBLIGATORISK } } resource imageBuilderCluster 'Microsoft.MachineLearningServices/workspaces/computes@2024-01-01-preview' = { parent: mlWorkspace name: 'image-builder-cluster' properties: { computeType: 'AmlCompute' properties: { vmSize: 'Standard_DS2_v2' scaleSettings: { minNodeCount: 0 maxNodeCount: 3 } } } } ``` **Fallgruve 4: Purge protection på Key Vault** Hvis du deployer og sletter workspaces ofte (dev/test), kan soft-deleted Key Vaults blokkere re-deployment: ```terraform resource "azurerm_key_vault" "default" { purge_protection_enabled = false // ← Kun for dev/test! # Prod skal alltid ha purge_protection_enabled = true } ``` **Fallgruve 5: Manglende RBAC for managed identity** Når workspace bruker managed identity for å aksessere Storage/KV, må du tildele RBAC-roller: ```bicep // Grant Storage Blob Data Contributor til workspace managed identity resource storageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = { name: guid(storage.id, mlWorkspace.id, 'Storage Blob Data Contributor') scope: storage properties: { roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') principalId: mlWorkspace.identity.principalId principalType: 'ServicePrincipal' } } ``` ### Integrasjon med ML lifecycle **IaC er IKKE statisk** — det skal evolve med ML-prosjektet: | ML-fase | IaC-aktivitet | |---------|---------------| | **Prototyping** | Deploy minimal dev workspace (public network, Basic SKU) | | **Experimentation** | Add compute clusters via IaC, scale up storage | | **Training at scale** | Deploy prod workspace (private endpoints, Premium SKU) | | **Model deployment** | Add managed online endpoints via IaC/Azure ML CLI | | **Monitoring** | Integrate Application Insights alerts via IaC | | **Retraining** | Scheduled pipelines trigger IaC updates (t.ex. nye compute resources) | **GitOps workflow:** ``` Developer → Commits IaC changes → PR review → CI pipeline validates → Merge to main → CD pipeline deploys to prod → Azure Policy checks compliance ``` ### Anti-patterns å unngå 1. **"ClickOps"** — Manually creating resources via Azure Portal - **Hvorfor dårlig:** Ingen versjonskontroll, ikke reproducerbart - **Fix:** Alt via IaC, bruk Portal kun for inspeksjon 2. **Monolithic IaC** — One massive file for entire environment - **Hvorfor dårlig:** Vanskelig å vedlikeholde, slow deployments - **Fix:** Modularize (workspace, network, compute som separate modules) 3. **Secrets i IaC** — Hardcoding API keys eller passwords - **Hvorfor dårlig:** Security risk, feilet audit - **Fix:** Bruk Key Vault references eller managed identities 4. **Ingen testing** — Deploy direkt til prod uten validation - **Hvorfor dårlig:** Downtime, compliance violations - **Fix:** Dev → Test → Prod miljøer, `az deployment what-if` før prod 5. **Manual state management (Terraform)** — Local state file - **Hvorfor dårlig:** Team collaboration issues, lost state = lost infrastructure - **Fix:** Azure Storage backend for Terraform state ```terraform terraform { backend "azurerm" { resource_group_name = "tfstate-rg" storage_account_name = "tfstatestorage" container_name = "tfstate" key = "mlops.terraform.tfstate" } } ``` ### Anbefalte ressurser for dypdykk **Microsoft Learn paths:** - [Infrastructure as Code on Azure](https://learn.microsoft.com/devops/deliver/what-is-infrastructure-as-code) - [Manage Azure Machine Learning workspaces with Terraform](https://learn.microsoft.com/azure/machine-learning/how-to-manage-workspace-terraform) - [Create Azure ML hub workspace using Bicep](https://learn.microsoft.com/azure/machine-learning/how-to-manage-hub-workspace-template) **GitHub repositories:** - [Azure/azure-quickstart-templates](https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.machinelearningservices) — Official Bicep templates - [Azure/terraform](https://github.com/Azure/terraform/tree/master/quickstart) — Terraform quickstarts for Azure ML - [Azure/mlops-v2](https://github.com/Azure/mlops-v2) — End-to-end MLOps solution accelerator **Terraform Registry:** - [azurerm_machine_learning_workspace](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/machine_learning_workspace) **Azure Verified Modules (AVM):** - [avm/res/machine-learning-services/workspace](https://github.com/Azure/bicep-registry-modules/tree/main/avm/res/machine-learning-services/workspace) — Community-maintained Bicep modules ## Kilder og verifisering Denne kunnskapsreferansen er basert på følgende verifiserte kilder (hentet 2026-02-04): 1. **Microsoft Learn - What is Infrastructure as Code (IaC)?** - URL: https://learn.microsoft.com/devops/deliver/what-is-infrastructure-as-code - Beskrivelse: Fundamental IaC-konsepter, idempotens, deklarativ vs. imperativ - Confidence: VERY_HIGH 2. **Microsoft Learn - Manage Azure Machine Learning workspaces using Terraform** - URL: https://learn.microsoft.com/azure/machine-learning/how-to-manage-workspace-terraform - Beskrivelse: Komplett guide til Terraform for Azure ML, inkludert public/private network configs - Confidence: VERY_HIGH 3. **Microsoft Learn - Create Azure ML hub workspace using Bicep template** - URL: https://learn.microsoft.com/azure/machine-learning/how-to-manage-hub-workspace-template - Beskrivelse: Bicep-basert deployment, modular struktur, API-versjoner - Confidence: VERY_HIGH 4. **Microsoft Learn - Set up MLOps with Azure DevOps** - URL: https://learn.microsoft.com/azure/machine-learning/how-to-setup-mlops-azureml - Beskrivelse: End-to-end MLOps med IaC deployment via Azure Pipelines - Confidence: VERY_HIGH 5. **Microsoft Learn - Machine Learning Operations (MLOps) concepts** - URL: https://learn.microsoft.com/azure/aks/concepts-machine-learning-ops - Beskrivelse: IaC som MLOps-praksis, integrasjon med CI/CD - Confidence: VERY_HIGH 6. **Azure Architecture Center - Machine Learning Operations v2** - URL: https://learn.microsoft.com/azure/architecture/ai-ml/guide/machine-learning-operations-v2 - Beskrivelse: MLOps-arkitektur med Azure Pipelines og IaC - Confidence: HIGH 7. **Azure Well-Architected Framework - Infrastructure as Code design** - URL: https://learn.microsoft.com/azure/well-architected/operational-excellence/infrastructure-as-code-design - Beskrivelse: Best practices for IaC-design, modularization, declarative tools - Confidence: VERY_HIGH 8. **GitHub - Azure/azure-quickstart-templates** - URL: https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.machinelearningservices/aifoundry-basics - Beskrivelse: Official Bicep templates for Azure ML workspace deployment - Confidence: VERY_HIGH 9. **GitHub - Azure/terraform (quickstart templates)** - URL: https://github.com/Azure/terraform/tree/master/quickstart - Beskrivelse: 101, 201, 301 Terraform templates for Azure ML (basic, secure, hub-spoke) - Confidence: VERY_HIGH 10. **Terraform Registry - azurerm_machine_learning_workspace** - URL: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/machine_learning_workspace - Beskrivelse: Official Terraform provider documentation for Azure ML - Confidence: VERY_HIGH **MCP-research metadata:** - **microsoft_docs_search calls:** 4 - **microsoft_docs_fetch calls:** 3 - **microsoft_code_sample_search calls:** 1 - **Total sources:** 10 - **Dato for research:** 2026-02-04 **Confidence levels:** - VERY_HIGH: Offisiell Microsoft-dokumentasjon, verifiserte code samples - HIGH: Azure Architecture Center (best practices, ikke produkt-spesifikk) **Verifisering:** Alle kodeeksempler er hentet fra official Microsoft Learn eller GitHub repos under Azure-organisasjonen. Bicep/Terraform-syntaks er verifisert mot latest provider versions (azurerm 3.x for Terraform, 2024-01-01-preview API for Bicep). --- **Oppdatert:** 2026-02-04 **Neste review:** 2026-05-04 (eller når Azure ML API major version oppdateres)