ktg-plugin-marketplace/plugins/ms-ai-architect/skills/ms-ai-engineering/references/mlops-genaiops/infrastructure-as-code-mlops.md
Kjell Tore Guttormsen 34c6db36fa docs(architect): weekly KB update — 52 files refreshed (2026-04)
Key content changes:
- MLOps: MLflow 3 scorers expanded (RetrievalRelevance, Fluency, multi-turn judges)
- MLflow 3 A/B eval: mirror_traffic GA confirmed, new scorer catalog
- CI/CD: OIDC auth replaces deprecated --sdk-auth (Azure ML GitHub Actions)
- Agent framework A2A: updated SDK patterns (A2ACardResolver, BearerAuth)
- AG-UI backend tool rendering: accurate TOOL_CALL_* event shapes
- Computer Use agents: US region requirement, credentials patterns
- Purview governance: bulk term edit, expire/delete workflows
- CAF AI Secure: 3-phase structure confirmed current
- Copilot Studio: Claude Sonnet 4.5/4.6 GA, new orchestration controls
- M365 manifest: v1.26 GA (April 2026), copilotAgents node
- Power Platform: agent flow capacity enforcement corrected
- Azure Monitor: Simple Log Alerts GA, AMBA for policy-based alerting
- Security Copilot: SCU capacity model (400 SCU/1000 users)
- EU Data Boundary: all EU + EFTA countries confirmed
- gateway-multi-backend: added 4th topology, subscription-level quota note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 11:31:11 +02:00

36 KiB

Infrastructure as Code for MLOps

Dato: 2026-04 Kategori: MLOps & GenAIOps Forfatter: Cosmo Skyberg, Senior Microsoft AI Solution Architect

Verified: MCP 2026-04

Introduksjon

Infrastructure as Code (IaC) er en fundamental MLOps-praksis der infrastruktur defineres og deployes gjennom kode fremfor manuelle konfigurasjoner. Dette er kritisk viktig for AI/ML-prosjekter fordi det sikrer reproducerbarhet, konsistens og versjonskontroll av hele ML-miljøet — fra development til production.

Hvorfor IaC er essensielt for MLOps:

  • Eliminerer "snowflake environments" — manuelt konfigurerte miljøer som ikke kan reproduseres
  • Idempotens — samme deployment-kommando gir alltid samme resultat, uavhengig av starttilstand
  • Versjonskontroll — infrastruktur behandles som kode og lagres i Git
  • Rask provisjonering av testmiljøer — on-demand scaling av ML-compute og workspace-ressurser
  • Auditspor og compliance — alle infrastrukturendringer er sporbare

Confidence: VERY_HIGH — IaC er en core DevOps/MLOps-praksis dokumentert grundig i Microsoft Learn og Azure Well-Architected Framework.

Kjernekomponenter

1. Deklarative vs. imperative IaC-verktøy

IaC-verktøy kategoriseres i to hovedtyper:

Deklarative verktøy (anbefalt for MLOps):

  • Bicep — Microsoft sitt domain-specific language (DSL) for Azure, kompilerer til ARM templates
  • ARM templates (JSON) — Azure Resource Manager templates, native Azure-format
  • Terraform — multi-cloud IaC-verktøy med Azure provider

Imperative verktøy:

  • Azure CLI scripts — bash/PowerShell-scripts med az kommandoer
  • PowerShell DSC — for VM-konfigurasjon

Anbefaling: Bruk deklarative verktøy (Bicep/Terraform) for infrastruktur, Azure CLI for orchestration i pipelines.

2. Azure Machine Learning workspace-ressurser

En Azure ML workspace krever flere associated resources som må provisjoneres:

Ressurs Formål IaC-krav
Azure ML Workspace Sentral hub for ML-arbeid Microsoft.MachineLearningServices/workspaces
Storage Account Data, modeller, artifacts Microsoft.Storage/storageAccounts
Key Vault Secrets, credentials Microsoft.KeyVault/vaults
Application Insights Monitoring, telemetry Microsoft.Insights/components
Container Registry Docker images for miljøer Microsoft.ContainerRegistry/registries
Compute resources Training/inference compute Compute clusters, instances, endpoints

Viktig: Disse ressursene kan opprettes automatisk ved workspace creation, men for produksjon bør de defineres eksplisitt i IaC for full kontroll over networking, RBAC og compliance.

3. Bicep-basert IaC for Azure ML

Eksempel: Minimal Azure ML workspace

resource aiResource 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  name: workspaceName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: workspaceName
    keyVault: keyVault.id
    storageAccount: storage.id
    applicationInsights: appInsights.id
    containerRegistry: containerRegistry.id
    publicNetworkAccess: 'Enabled'
  }
}

Modular Bicep-struktur (best practice):

/infrastructure
  ├── main.bicep                    # Hovedfil med parameters og orchestration
  ├── modules/
  │   ├── ai-hub.bicep              # Azure ML workspace
  │   ├── dependent-resources.bicep # Storage, KV, ACR, AppInsights
  │   ├── networking.bicep          # VNet, subnets, private endpoints
  │   └── compute.bicep             # Compute clusters
  └── parameters/
      ├── dev.bicepparam
      └── prod.bicepparam

Confidence: VERY_HIGH — Dette følger official Azure quickstart templates for Azure ML (github.com/Azure/azure-quickstart-templates).

4. Terraform-basert IaC for Azure ML

Eksempel: Public network workspace

resource "azurerm_machine_learning_workspace" "default" {
  name                          = "${random_pet.prefix.id}-mlw"
  location                      = azurerm_resource_group.default.location
  resource_group_name           = azurerm_resource_group.default.name
  application_insights_id       = azurerm_application_insights.default.id
  key_vault_id                  = azurerm_key_vault.default.id
  storage_account_id            = azurerm_storage_account.default.id
  container_registry_id         = azurerm_container_registry.default.id
  public_network_access_enabled = true

  identity {
    type = "SystemAssigned"
  }
}

Terraform workflow:

# Initialiser Terraform providers
terraform init

# Plan deployment (dry-run)
terraform plan -out ml-workspace.tfplan

# Apply deployment
terraform apply ml-workspace.tfplan

Terraform vs. Bicep:

Kriterium Terraform Bicep
Multi-cloud Støtter AWS, GCP, Azure Kun Azure
Learning curve Moderat (HCL syntax) Lav (JSON-liknende)
State management Requires state file (remote backend) Ingen state file (ARM managed)
Community modules Stor ecosystem Mindre, men voksende
Azure integration Via provider Native, first-class

For Norge offentlig: Bicep er ofte foretrukket fordi det er Microsofts native løsning med tett integrasjon med Azure governance-verktøy (Policy, Blueprints).

5. Private network (VNet-isolated) workspaces

For sikkerhetskritiske miljøer må workspace isoleres i et VNet med private endpoints:

Bicep-konfigurasjon:

resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  name: workspaceName
  location: location
  properties: {
    publicNetworkAccess: 'Disabled'
    imageBuildCompute: 'image-builder-cluster'  // Required for ACR private endpoint
  }
}

resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'ple-${workspaceName}'
  location: location
  properties: {
    subnet: {
      id: workspaceSubnet.id
    }
    privateLinkServiceConnections: [{
      name: 'psc-${workspaceName}'
      properties: {
        privateLinkServiceId: mlWorkspace.id
        groupIds: ['amlworkspace']
      }
    }]
  }
}

Viktig: Når både ACR og Azure ML har private endpoints, kan du IKKE bruke ACR tasks for image building. Du må definere en compute cluster for dette formålet via imageBuildCompute property.

Confidence: HIGH — Dokumentert i Azure ML docs, men private endpoint-konfigurasjon krever nøye testing per scenario.

Arkitekturmønstre

1. Basic workspace pattern (development)

Bruk: Utforskning, prototyping, ikke-sensitiv data

┌─────────────────────────────────────────┐
│  Resource Group                         │
│  ┌───────────────────────────────────┐ │
│  │ Azure ML Workspace                │ │
│  │  - Public network access          │ │
│  │  - System-assigned identity       │ │
│  └───────────────────────────────────┘ │
│  ┌───────────────────────────────────┐ │
│  │ Dependent Resources               │ │
│  │  - Storage Account (GRS)          │ │
│  │  - Key Vault (standard)           │ │
│  │  - Container Registry (basic)     │ │
│  │  - Application Insights           │ │
│  └───────────────────────────────────┘ │
└─────────────────────────────────────────┘

IaC-tilnærming:

  • Single main.bicep eller workspace.tf file
  • Parameter files for dev/test/staging
  • Deploy via Azure CLI/Terraform CLI

2. Secure workspace pattern (production)

Bruk: Produksjon, HBI (High Business Impact) data, compliance

┌────────────────────────────────────────────────┐
│  Resource Group                                │
│  ┌──────────────────────────────────────────┐ │
│  │  VNet (10.0.0.0/16)                      │ │
│  │  ├─ Subnet: training (10.0.1.0/24)       │ │
│  │  ├─ Subnet: workspace (10.0.0.0/24)      │ │
│  │  └─ Subnet: endpoints (10.0.2.0/24)      │ │
│  └──────────────────────────────────────────┘ │
│  ┌──────────────────────────────────────────┐ │
│  │  Azure ML Workspace                      │ │
│  │   - Public access: DISABLED              │ │
│  │   - Private endpoint in workspace subnet │ │
│  │   - Managed identity + RBAC              │ │
│  └──────────────────────────────────────────┘ │
│  ┌──────────────────────────────────────────┐ │
│  │  Private endpoints for:                  │ │
│  │   - Storage (blob + file)                │ │
│  │   - Key Vault                            │ │
│  │   - Container Registry                   │ │
│  └──────────────────────────────────────────┘ │
│  ┌──────────────────────────────────────────┐ │
│  │  Private DNS Zones                       │ │
│  │   - privatelink.api.azureml.ms           │ │
│  │   - privatelink.notebooks.azure.net      │ │
│  │   - privatelink.blob.core.windows.net    │ │
│  │   - privatelink.vaultcore.azure.net      │ │
│  └──────────────────────────────────────────┘ │
└────────────────────────────────────────────────┘

IaC-tilnærming:

  • Modular Bicep/Terraform med separate network.bicep/network.tf
  • Managed identities for all services (ingen keys i config)
  • Azure Policy enforcement for network isolation
  • Private DNS zones for name resolution

Norge offentlig: Følg NSMs grunnprinsipper for nettverkssegmentering. Private endpoints er ofte påkrevd for data klassifisert som begrenset/fortrolig.

3. Hub-and-spoke pattern (multi-environment)

Bruk: Enterprise-scale med delte services og multiple workspaces

┌──────────────────────────────────────────────────┐
│  Hub Resource Group                              │
│  ├─ Shared Container Registry                    │
│  ├─ Shared Key Vault (certificates)              │
│  ├─ Azure Firewall / VPN Gateway                 │
│  └─ Monitoring (Log Analytics, App Insights)     │
└──────────────────────────────────────────────────┘
           │ VNet peering
           ├────────────────────────────┬──────────
           │                            │
┌──────────▼───────────┐   ┌───────────▼──────────┐
│ Dev Spoke (RG)       │   │ Prod Spoke (RG)      │
│  - ML Workspace Dev  │   │  - ML Workspace Prod │
│  - Dev Storage       │   │  - Prod Storage      │
│  - Dev Compute       │   │  - Prod Compute      │
└──────────────────────┘   └──────────────────────┘

IaC-tilnærming:

  • Separate Terraform modules/Bicep modules per spoke
  • Shared hub deployed first
  • Spoke deployments reference hub resources via remote state (Terraform) eller parameters (Bicep)
  • Azure Blueprints eller Terraform workspaces for consistency

Terraform quickstart templates (fra Azure/terraform repo):

Beslutningsveiledning

Når velge Bicep vs. Terraform vs. ARM templates?

Scenario Anbefalt verktøy Begrunnelse
Ren Azure-only MLOps Bicep Native support, enklere syntax enn ARM, tett integrasjon med Azure CLI
Multi-cloud (Azure + AWS/GCP) Terraform Eneste verktøy som støtter alle clouds konsistent
Eksisterende DevOps-pipeline med JSON ARM templates Kompatibilitet, men vurder Bicep migration
Stor existing Terraform codebase Terraform Konsistens, unngå verktøy-proliferasjon
Norge offentlig med Direktoratet-krav Bicep Microsofts native løsning, enklere audit trail

Når deploye IaC via Azure DevOps vs. GitHub Actions?

Kriterium Azure DevOps GitHub Actions
Team allerede bruker ADO Foretrekk ADO Konsistens
Open source prosjekt Foretrekk GitHub Community visibility
Enterprise governance (offentlig sektor) Foretrekk ADO Bedre integrasjon med Azure RBAC, compliance
Terraform state management Begge støtter Azure Storage backend
Cost Gratis for small teams (both)

Deployment pipeline-integrasjon

Azure DevOps pipeline (YAML):

trigger:
  branches:
    include:
    - main
  paths:
    include:
    - infrastructure/*

stages:
- stage: DeployInfrastructure
  jobs:
  - job: DeployBicep
    steps:
    - task: AzureCLI@2
      inputs:
        azureSubscription: 'Azure-Service-Connection'
        scriptType: 'bash'
        scriptLocation: 'inlineScript'
        inlineScript: |
          az deployment group create \
            --resource-group $(resourceGroupName) \
            --template-file infrastructure/main.bicep \
            --parameters infrastructure/parameters/prod.bicepparam

GitHub Actions workflow:

name: Deploy ML Infrastructure
on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'

jobs:
  deploy-infra:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - uses: azure/login@v1
      with:
        creds: ${{ secrets.AZURE_CREDENTIALS }}
    - name: Deploy Bicep
      run: |
        az deployment group create \
          --resource-group ${{ vars.RG_NAME }} \
          --template-file infrastructure/main.bicep \
          --parameters environment=prod

Best practice: Bruk separate pipelines for infrastructure (IaC) og ML-kode. Infrastructure skal endre sjeldent, ML-kode oftere.

Integrasjon med Microsoft-stakken

1. Azure ML CLI v2 integration

IaC provisjonerer workspace, men ML assets (environments, datasets, components) deployes via Azure ML CLI:

# Workspace provisjonert via Bicep/Terraform
# Deploy ML environment til workspace
az ml environment create --file environments/training-env.yml \
  --resource-group $RG_NAME \
  --workspace-name $WORKSPACE_NAME

Separation of concerns:

  • IaC (Bicep/Terraform): Infrastructure (workspace, compute, networking)
  • Azure ML CLI: ML-spesifikke assets (environments, pipelines, models)
  • CI/CD pipelines: Orchestration av begge

2. Azure Policy integration

Enforce IaC compliance via Azure Policy:

Eksempel: Krev private endpoints for nye workspaces

{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.MachineLearningServices/workspaces"
      },
      {
        "field": "Microsoft.MachineLearningServices/workspaces/publicNetworkAccess",
        "equals": "Enabled"
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}

Norge offentlig: Azure Policy brukes ofte for å enforces NSM-krav og Difis retningslinjer. Kombiner med IaC-templates som default er compliant.

3. Azure Blueprints for governance

Azure Blueprints pakker IaC (ARM templates) med policies og RBAC assignments:

Blueprint for ML workspace:

Blueprint: "Secure-ML-Workspace"
├── Artifacts:
│   ├── ARM template: workspace.json
│   ├── Policy assignment: "Require private endpoints"
│   ├── RBAC assignment: "ML Engineers → Contributor"
│   └── RBAC assignment: "Data Scientists → AzureML Data Scientist"

Blueprints sikrer at hver gang et nytt workspace opprettes, får det automatisk riktig policies og permissions.

4. Terraform Azure Provider for ML

Provider konfigurasjon:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.0, < 4.0"
    }
  }
}

provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = false
    }
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}

Resource providers som må registreres:

Provider Formål
Microsoft.MachineLearningServices Azure ML workspace
Microsoft.Storage Storage account
Microsoft.KeyVault Key vault
Microsoft.ContainerRegistry Container registry
Microsoft.Insights Application Insights
Microsoft.Network VNet, private endpoints

Common error: No registered resource provider found for location — løses ved å manuelt registrere providers via az provider register --namespace Microsoft.MachineLearningServices.

Offentlig sektor (Norge)

Utredningsinstruksen-krav (§ 7)

Når IaC brukes i statlige AI-prosjekter:

Beslutningspunkt 1: Valg av IaC-verktøy

  • Alternativ A: Bicep (Microsoft native)
  • Alternativ B: Terraform (multi-cloud)
  • Vurdering: Bicep anbefales for offentlig sektor fordi det eliminerer vendor lock-in-bekymringer (open source, Microsoft-støttet), samtidig som det har tettere Azure-integrasjon.

Beslutningspunkt 2: Deployment-strategi

  • Alternativ A: Manuell az deployment fra lokal maskin
  • Alternativ B: Automatisert via Azure DevOps pipelines
  • Vurdering: B er obligatorisk for produksjon (sporbarhet, compliance), men A er akseptabelt for dev/test.

Difis krav til etterprøvbarhet

IaC bidrar direkte til etterprøvbarhet:

  • Versjonskontroll (Git): Alle infrastrukturendringer er tracket
  • Pull request-prosess: Peer review før deployment
  • Deployment logs: Azure Activity Log + pipeline logs gir full audit trail

Eksempel på etterprøvbar deployment:

# 1. Commit IaC endringer til Git
git add infrastructure/main.bicep
git commit -m "feat(infra): add private endpoint for storage account"

# 2. Create PR for review
gh pr create --title "Add storage private endpoint" --body "Implements NSM requirement X"

# 3. After approval, pipeline deploys
# Azure Activity Log captures deployment event with:
#   - Timestamp
#   - User/service principal
#   - Resource changes
#   - Compliance status

NSMs grunnprinsipper for IaC

NSM-prinsipp IaC-implementering
Identifisere og kartlegge Alle ressurser definert eksplisitt i IaC (ingen "shadow IT")
Beskytte Network isolation via VNet-konfigurert i IaC
Oppdage Azure Policy + Azure Monitor konfigurert via IaC
Begrense og kontrollere RBAC definert i IaC (principle of least privilege)

DPIA-relevante IaC-konfigurasjoner

Når IaC brukes for AI-systemer som behandler persondata:

Data residency (datalagring i Norge):

param location string = 'norwayeast'  // Enforce Norwegian data center

resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location  // Data stays in Norway
  properties: {
    allowBlobPublicAccess: false
    minimumTlsVersion: 'TLS1_2'
  }
}

Encryption at rest (GDPR Article 32):

resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  properties: {
    encryption: {
      status: 'Enabled'
      keyVaultProperties: {
        keyVaultArmId: keyVault.id
        keyIdentifier: '${keyVault.properties.vaultUri}keys/ml-encryption-key'
      }
    }
  }
}

DPIA-dokumentasjon: IaC-filene selv blir del av DPIA-dokumentasjonen fordi de beviser hvordan tekniske sikkerhetstiltak er implementert.

Kostnad og lisensiering

IaC-verktøy kostnader

IaC Design for MLOps — Azure Well-Architected (OE:05) 2026

Core principle (Well-Architected OE:05): Standardized IaC approach with declarative syntax, consistent styles, appropriate modularization, quality assurance.

Declarative over imperative (recommended):

  • Bicep / ARM templates: Azure-native, JSON/DSL declarative
  • Terraform: Industry-standard, multi-cloud declarative
  • Avoid: imperative scripts for infrastructure state management

Azure-native tools:

# Bicep — deploy Azure ML workspace
az deployment group create --template-file ml-workspace.bicep

# Terraform — integrated into GitHub Actions / Azure Pipelines
terraform init && terraform apply

Layered IaC pipeline approach for MLOps:

  • Low-touch (networking, VNet, ACR): Rarely changes, stable baseline
  • Medium-touch (compute clusters, storage, AKS): Occasional changes
  • High-touch (model endpoints, deployments): Continuous delivery

IaC best practices:

  • Treat IaC artifacts the same as application code (version control, PR reviews, testing)
  • Use parameters/variables for multi-environment support (dev/test/prod)
  • Collocate IaC with application code for synchronized deployments
  • Scan IaC repos for secrets (Microsoft Defender for Cloud: IaC vulnerability scanning)
  • Immutable infrastructure preferred for business-critical workloads

AI opportunity (Verified MCP 2026-04): AI tools (GitHub Copilot) can review IaC templates for misconfigurations, suggest secure alternatives, and generate templates from natural language. Generative AI can analyze IaC templates and architectural diagrams, generate threat models, and recommend IaC updates from pull requests. Agent-based solutions can infer infrastructure needs from code and generate PRs with recommended IaC changes.

MLOps v2 infrastructure: tf-gha-deploy-infra.yml workflow in Azure/mlops-v2-gha-demo deploys full Azure ML infrastructure via Terraform + GitHub Actions.

Verktøy Lisens Kostnad
Bicep Open source (MIT) Gratis
ARM templates Microsoft-provided Gratis
Terraform Open source (MPL 2.0) Gratis (OSS version)
Terraform Cloud Proprietary Gratis for <5 users, deretter $20/user/mnd

Anbefaling for Norge offentlig: Bruk open source Terraform (ikke Cloud) eller Bicep for å unngå vendor lock-in og lisenskostnader.

Azure-ressurser provisjonert via IaC

Dev/test workspace (minimal):

  • Storage Account (GRS, 100 GB): ~100 NOK/mnd
  • Key Vault (standard): ~5 NOK/mnd
  • Container Registry (Basic): ~50 NOK/mnd
  • Application Insights (5 GB/mnd): Gratis
  • Total: ~155 NOK/mnd (kun infrastruktur, ingen compute)

Prod workspace (secure, VNet-isolated):

  • Storage Account (GRS, 1 TB, private endpoint): ~750 NOK/mnd
  • Key Vault (premium, HSM-backed): ~450 NOK/mnd
  • Container Registry (Premium, geo-replication): ~750 NOK/mnd
  • Application Insights (50 GB/mnd): ~200 NOK/mnd
  • Private endpoints (4x): ~200 NOK/mnd
  • VNet + NAT Gateway: ~300 NOK/mnd
  • Total: ~2650 NOK/mnd (kun infrastruktur)

Kostnadsoptimalisering via IaC:

  • Auto-shutdown scripts for dev compute (via Terraform azurerm_machine_learning_compute_cluster scale settings)
  • Lifecycle policies for storage (move old training data to Cool tier)
  • Conditional deployment (deploy expensive resources kun i prod)

Bicep eksempel: Dev vs. Prod SKU:

param environment string = 'dev'

resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-01-01' = {
  name: acrName
  sku: {
    name: environment == 'prod' ? 'Premium' : 'Basic'  // Cost optimization
  }
}

Azure Hybrid Benefit for Windows VMs

Hvis du bruker IaC til å deploye Windows-baserte compute instances (t.ex. DSVM):

resource "azurerm_linux_virtual_machine" "dsvm" {
  name                = "dsvm-${var.environment}"
  license_type        = "Windows_Server"  # Enables Azure Hybrid Benefit
  # ... (rest of config)
}

Dette kan spare opptil 40% på VM-kostnader hvis du har eksisterende Windows Server-lisenser.

For arkitekten (Cosmo)

Tekniske avklaringsspørsmål

Før du designer IaC-løsningen, avklar:

  1. Deployment scope:

    • Single workspace eller multi-workspace (hub-and-spoke)?
    • Shared services (t.ex. felles Container Registry)?
  2. Network isolation:

    • Public network access OK (dev/test)?
    • Private endpoints påkrevd (prod/HBI data)?
    • Eksisterende VNet som må integreres?
  3. Compliance og governance:

    • Norsk offentlig sektor med NSM-krav?
    • GDPR/persondata (krever encryption at rest med customer-managed keys)?
    • Audit trail-krav fra Utredningsinstruksen?
  4. Team capabilities:

    • Har teamet Terraform-erfaring?
    • Foretrekker de Azure-native verktøy (Bicep)?
    • CI/CD-plattform: Azure DevOps eller GitHub?
  5. Eksisterende infrastruktur:

    • Greenfield (nytt miljø fra scratch)?
    • Brownfield (må integrere med existing VNet, policies)?
    • Hybrid (on-premises + cloud)?

Designprinsipper

1. Modularitet over monolitt

❌ IKKE: En gigantisk main.bicep på 2000 linjer
✅ JA:   Separate modules (network.bicep, workspace.bicep, compute.bicep)

2. Parameterisering for gjenbruk

// Bruk parameters for alt som varierer mellom miljøer
param environment string  // dev, test, prod
param location string
param enablePrivateEndpoint bool = environment == 'prod'  // Conditional logic

3. Versjonskontroll av API-versjoner

// Pin API versions eksplisitt, ikke bruk 'latest'
resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  // ... (config)
}

Dette sikrer at deployments er reproducerbare — latest kan endre oppførsel over tid.

4. Idempotens-testing

# Test at samme deployment kan kjøres flere ganger uten feil
az deployment group create --template-file main.bicep --parameters prod.bicepparam
# Kjør igjen — skal ikke feile eller endre noe
az deployment group create --template-file main.bicep --parameters prod.bicepparam

5. Fail-fast validation

# Valider Bicep syntaks før deployment
az bicep build --file main.bicep

# Dry-run med what-if
az deployment group what-if \
  --resource-group mlops-prod-rg \
  --template-file main.bicep \
  --parameters prod.bicepparam

Vanlige fallgruver

Fallgruve 1: Hardkoded verdier

 IKKE:
resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'mlstorageprod123'  // Hardcoded, ikke unique
}

 JA:
param storageNamePrefix string = 'mlstorage'
resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: '${storageNamePrefix}${uniqueString(resourceGroup().id)}'
}

Fallgruve 2: Manglende resource provider-registrering

# Error: "No registered resource provider found for Microsoft.MachineLearningServices"
# Fix:
az provider register --namespace Microsoft.MachineLearningServices
az provider register --namespace Microsoft.Storage
az provider register --namespace Microsoft.KeyVault

Fallgruve 3: ACR tasks med private endpoints

Når både ACR og Azure ML har private endpoints, kan du IKKE bruke ACR tasks for image building. Du MÅ definere en compute cluster:

resource mlWorkspace 'Microsoft.MachineLearningServices/workspaces@2024-01-01-preview' = {
  properties: {
    publicNetworkAccess: 'Disabled'
    imageBuildCompute: 'image-builder-cluster'  // ← OBLIGATORISK
  }
}

resource imageBuilderCluster 'Microsoft.MachineLearningServices/workspaces/computes@2024-01-01-preview' = {
  parent: mlWorkspace
  name: 'image-builder-cluster'
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: 'Standard_DS2_v2'
      scaleSettings: {
        minNodeCount: 0
        maxNodeCount: 3
      }
    }
  }
}

Fallgruve 4: Purge protection på Key Vault

Hvis du deployer og sletter workspaces ofte (dev/test), kan soft-deleted Key Vaults blokkere re-deployment:

resource "azurerm_key_vault" "default" {
  purge_protection_enabled = false  // ← Kun for dev/test!
  # Prod skal alltid ha purge_protection_enabled = true
}

Fallgruve 5: Manglende RBAC for managed identity

Når workspace bruker managed identity for å aksessere Storage/KV, må du tildele RBAC-roller:

// Grant Storage Blob Data Contributor til workspace managed identity
resource storageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(storage.id, mlWorkspace.id, 'Storage Blob Data Contributor')
  scope: storage
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')
    principalId: mlWorkspace.identity.principalId
    principalType: 'ServicePrincipal'
  }
}

Integrasjon med ML lifecycle

IaC er IKKE statisk — det skal evolve med ML-prosjektet:

ML-fase IaC-aktivitet
Prototyping Deploy minimal dev workspace (public network, Basic SKU)
Experimentation Add compute clusters via IaC, scale up storage
Training at scale Deploy prod workspace (private endpoints, Premium SKU)
Model deployment Add managed online endpoints via IaC/Azure ML CLI
Monitoring Integrate Application Insights alerts via IaC
Retraining Scheduled pipelines trigger IaC updates (t.ex. nye compute resources)

GitOps workflow:

Developer → Commits IaC changes → PR review → CI pipeline validates
  → Merge to main → CD pipeline deploys to prod → Azure Policy checks compliance

Anti-patterns å unngå

  1. "ClickOps" — Manually creating resources via Azure Portal

    • Hvorfor dårlig: Ingen versjonskontroll, ikke reproducerbart
    • Fix: Alt via IaC, bruk Portal kun for inspeksjon
  2. Monolithic IaC — One massive file for entire environment

    • Hvorfor dårlig: Vanskelig å vedlikeholde, slow deployments
    • Fix: Modularize (workspace, network, compute som separate modules)
  3. Secrets i IaC — Hardcoding API keys eller passwords

    • Hvorfor dårlig: Security risk, feilet audit
    • Fix: Bruk Key Vault references eller managed identities
  4. Ingen testing — Deploy direkt til prod uten validation

    • Hvorfor dårlig: Downtime, compliance violations
    • Fix: Dev → Test → Prod miljøer, az deployment what-if før prod
  5. Manual state management (Terraform) — Local state file

    • Hvorfor dårlig: Team collaboration issues, lost state = lost infrastructure
    • Fix: Azure Storage backend for Terraform state
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatestorage"
    container_name       = "tfstate"
    key                  = "mlops.terraform.tfstate"
  }
}

Anbefalte ressurser for dypdykk

Microsoft Learn paths:

GitHub repositories:

Terraform Registry:

Azure Verified Modules (AVM):

Kilder og verifisering

Denne kunnskapsreferansen er basert på følgende verifiserte kilder (hentet 2026-04):

  1. Microsoft Learn - What is Infrastructure as Code (IaC)?

  2. Microsoft Learn - Manage Azure Machine Learning workspaces using Terraform

  3. Microsoft Learn - Create Azure ML hub workspace using Bicep template

  4. Microsoft Learn - Set up MLOps with Azure DevOps

  5. Microsoft Learn - Machine Learning Operations (MLOps) concepts

  6. Azure Architecture Center - Machine Learning Operations v2

  7. Azure Well-Architected Framework - Infrastructure as Code design

  8. GitHub - Azure/azure-quickstart-templates

  9. GitHub - Azure/terraform (quickstart templates)

  10. Terraform Registry - azurerm_machine_learning_workspace

MCP-research metadata:

  • microsoft_docs_search calls: 4
  • microsoft_docs_fetch calls: 3
  • microsoft_code_sample_search calls: 1
  • Total sources: 10
  • Dato for research: 2026-04

Confidence levels:

  • VERY_HIGH: Offisiell Microsoft-dokumentasjon, verifiserte code samples
  • HIGH: Azure Architecture Center (best practices, ikke produkt-spesifikk)

Verifisering: Alle kodeeksempler er hentet fra official Microsoft Learn eller GitHub repos under Azure-organisasjonen. Bicep/Terraform-syntaks er verifisert mot latest provider versions (azurerm 3.x for Terraform, 2024-01-01-preview API for Bicep).


Oppdatert: 2026-04 Neste review: 2026-07-04 (eller når Azure ML API major version oppdateres)