Kjell Tore Guttormsen 6a7632146e feat(ms-ai-architect): add plugin to open marketplace (v1.5.0 baseline)
2026-04-07 17:17:17 +02:00

Supply Chain Security for AI Models and Dependencies

Category: AI Security Engineering
Date: 2026-02-05
Related platforms: Azure AI Foundry, Azure Machine Learning, Azure DevOps, Microsoft Defender for Cloud


Overview

Supply chain security for AI models is about protecting the integrity and authenticity of AI components across the entire lifecycle, from training data and pre-trained models to dependencies and deployment artifacts. Unlike traditional software supply chain security, AI systems must also protect model weights, datasets, and ML-specific components against compromise.

Attacks on the AI supply chain can plant backdoors in models, poison training data, or exfiltrate sensitive information through model inference. The Microsoft Cloud Security Benchmark (formerly the Azure Security Benchmark) covers this under control AI-1: Ensure use of approved models, listed as a "must have" control.

Unique challenges in the AI supply chain

  • Model provenance: Models are downloaded from public repositories (HuggingFace, Model Zoo) without verification
  • Data poisoning: Training data from untrusted sources can contain malicious content
  • Transitive dependencies: Python packages (PyTorch, TensorFlow) have deep dependency trees
  • Opaque artifacts: Serialized models (ONNX, MLflow) are hard to inspect for backdoors
  • Third-party MLaaS: Outsourcing training to third-party providers introduces trust risk

1. Model Provenance Tracking

What is model provenance?

Model provenance is end-to-end traceability of a model's origin, training process, and modifications. It includes:

  • Dataset lineage: Which data was used for training?
  • Training job metadata: Hyperparameters, compute resources, timestamp
  • Model registry history: Versioning, approvals, deployment records
  • Audit trails: Who registered, approved, or deployed the model?

Implementation in Azure Machine Learning

The Azure Machine Learning model registry acts as the single source of truth:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>"
)

# Register the model with provenance metadata.
# Tags can be edited after registration (suitable for status fields); properties are fixed.
model = Model(
    path="./model",
    name="fraud-detection-v2",
    version="2.0",
    description="Trained on 2025-Q4 dataset",
    tags={
        "training_job": "run_12345",
        "data_version": "v2.3",
        "approved_by": "security-team",
        "scan_status": "passed"
    },
    properties={
        "training_dataset_id": "azureml:fraud-data:2",
        "validation_accuracy": "0.94",
        "sha256": "<sha256-of-model-files>"  # placeholder: hash computed at registration
    }
)

ml_client.models.create_or_update(model)

Best practices

  1. Hash verification: Store a SHA-256 hash of the model weights at registration and check it before loading
  2. Immutable metadata: Tags can be edited after registration, so record values that must never change (created_date, git_commit) as properties, which are fixed once a model version is created
  3. Signed models: Apply code signing to model artifacts
  4. Centralized registry: Use Azure ML registries to share models across subscriptions and workspaces
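Here is a minimal sketch of the hash step. It assumes the model is a local folder such as ./model. Each file is hashed, and then the sorted list of per-file hashes and relative paths is hashed again, so renaming, adding, or changing any file changes the result. The function names and the "sha256" property key are illustrative choices, not an Azure ML convention. The SBOM example in section 5 reads a property with that key.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """SHA-256 of one file, read in 1 MiB chunks so large weight files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def model_dir_sha256(model_dir: str) -> str:
    """Deterministic SHA-256 over a model folder (file contents plus relative paths)."""
    root = Path(model_dir)
    # Sort by POSIX-style relative path so the result is the same on every OS
    rel_paths = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    # Manifest in the style of `sha256sum` output: "<hash>  <path>" per line
    manifest = "".join(f"{file_sha256(root / rel)}  {rel}\n" for rel in rel_paths)
    return hashlib.sha256(manifest.encode("utf-8")).hexdigest()
```

At registration, the digest could be passed as `properties={"sha256": model_dir_sha256("./model")}`. The same function can then be re-run on the downloaded model before it is deployed.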

2. Dependency Vulnerability Scanning

Threat landscape

AI models depend on deep Python dependency trees (for example PyTorch → NumPy → BLAS). Vulnerabilities in these components can be exploited for:

  • Remote code execution: Via malicious pickle files in model formats (unpickling runs code embedded in the file)
  • Data exfiltration: Compromised packages that send training data to an external endpoint
  • Supply chain attacks: Typosquatting (e.g. py-torch impersonating torch), package hijacking

MITRE ATT&CK classifies this as T1195: Supply Chain Compromise.
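Loading a pickle can run arbitrary code, so model files in pickle-based formats should be inspected before they are loaded. The sketch below is a heuristic in the spirit of pickle scanners such as picklescan. It uses the standard library's `pickletools` to walk the opcodes without executing anything, and it lists the globals the pickle would import. The allowlist is hypothetical. The sketch also reads raw pickle bytes, while PyTorch .pt/.bin checkpoints are zip archives whose pickle sits in a data.pkl member. Where possible, prefer formats that cannot carry code (e.g. safetensors) or `torch.load(..., weights_only=True)`.

```python
import pickletools

# Hypothetical allowlist of (module, name) globals considered safe to import
SAFE_GLOBALS = {
    ("collections", "OrderedDict"),
    ("torch._utils", "_rebuild_tensor_v2"),
    ("torch", "FloatStorage"),
}

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
NO_PUSH_OPS = {"PROTO", "FRAME", "MEMOIZE", "STOP"} | PUT_OPS


def pickle_globals(data: bytes) -> set:
    """Return the (module, name) globals a pickle would import, without unpickling it."""
    found = set()
    pushed = []  # value pushed by each opcode: a str for string constants, else None
    memo = {}    # memo index -> value, so memoized strings can be resolved
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in STRING_OPS:
            pushed.append(arg)
        elif name == "MEMOIZE":
            memo[len(memo)] = pushed[-1] if pushed else None
        elif name in PUT_OPS:
            memo[arg] = pushed[-1] if pushed else None
        elif name in GET_OPS:
            pushed.append(memo.get(arg))
        elif name in ("GLOBAL", "INST"):
            module, attr = arg.split(" ", 1)  # pickletools reports "module name"
            found.add((module, attr))
            pushed.append(None)
        elif name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two most recent pushed values
            module, attr = (pushed[-2], pushed[-1]) if len(pushed) >= 2 else (None, None)
            if isinstance(module, str) and isinstance(attr, str):
                found.add((module, attr))
            else:
                found.add(("<unresolved>", "<unresolved>"))  # fail closed
            pushed.append(None)
        elif name not in NO_PUSH_OPS:
            pushed.append(None)  # any other opcode pushes a value we do not track
    return found


def unsafe_globals(data: bytes) -> set:
    """Globals not on the allowlist; a non-empty result means: do not load the file."""
    return pickle_globals(data) - SAFE_GLOBALS
```

The scanner stays approximate because it does not model the full pickle stack. Anything it cannot resolve is reported as unsafe rather than silently allowed.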

Azure tools for scanning

1. Azure DevOps Dependency Scanning

Enabled through GitHub Advanced Security for Azure DevOps:

# azure-pipelines.yml
trigger:
  branches:
    include:
      - main

pool:
  vmImage: 'ubuntu-latest'

steps:
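- task: AdvancedSecurity-Dependency-Scanning@1
  displayName: 'Scan Python dependencies'
  # No ecosystem or scan-mode inputs are needed: the task detects supported
  # manifests (e.g. requirements.txt) and reports direct and transitive dependencies.
  # Advanced Security must be enabled for the repository.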

Dependency scanning raises alerts for:

  • Direct vulnerabilities: Packages listed in requirements.txt
  • Transitive vulnerabilities: Packages that your direct dependencies pull in
  • CVE severity mapping (CVSS v3): Critical (9.0-10.0), High (7.0-8.9), Medium (4.0-6.9), Low (0.1-3.9)
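The severity bands above are the CVSS v3.x qualitative ratings. This small sketch shows how a pipeline gate might apply them to the base scores of a scan's findings. `fails_gate` and its `block_at` parameter are hypothetical, and blocking only on Critical by default is just an example policy.

```python
SEVERITY_ORDER = ["None", "Low", "Medium", "High", "Critical"]


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def fails_gate(scores, block_at: str = "Critical") -> bool:
    """True if any finding is at or above the blocking severity."""
    limit = SEVERITY_ORDER.index(block_at)
    return any(SEVERITY_ORDER.index(cvss_severity(s)) >= limit for s in scores)
```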

2. Microsoft Defender for Containers

Scans container images (including Azure ML environments) for vulnerabilities:

from azure.ai.ml.entities import Environment

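# Create an environment. Azure ML builds it into an image in the workspace's
# Azure Container Registry, where Defender for Containers can scan it.
env = Environment(
    name="secure-training-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="conda_dependencies.yml",
    description="Training environment; image scanned by Defender for Containers"
)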

ml_client.environments.create_or_update(env)

Defender for Containers:

  • Automatically produces vulnerability assessments when an image is pushed to Azure Container Registry
  • Can be used to block deployment of images with critical vulnerabilities (configured via Azure Policy; not blocked by default)
  • Integrates with Azure Monitor for alerting

3. Quarantine Pattern for Package Management

Implement self-serve package management with a security layer:

Data Scientist → Safe-listed repos (Microsoft Artifact Registry, PyPI, Conda)
                          ↓
                  Automated testing (vulnerability scan)
                          ↓
                  Pass → Container Registry
                  Fail → Deployment blocked, container removed

Process flow:

  1. Data scientists work in an Azure ML workspace with network restrictions
  2. They self-serve packages from curated repositories
  3. Azure ML builds Docker containers during deployment
  4. Microsoft Defender for Containers scans them for vulnerabilities
  5. On failure: The deployment exits gracefully and the container is removed
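Quarantine only helps if the packages installed later are exactly the ones that were scanned. pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) enforces this at install time. The sketch below is a hypothetical pre-build check that flags requirement lines which are not pinned with `==` or lack a `--hash=sha256:` value. It is a deliberately simple parser that does not handle URLs, editable installs, or nested `-r` files.

```python
import re

# Package name, optional extras, then an exact "==" pin
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s;]+")


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that lack an exact == pin or a sha256 hash."""
    problems = []
    logical_text = text.replace("\\\n", " ")  # join backslash line continuations
    for raw in logical_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank lines, comments, and options such as --index-url
        line = line.split(" #", 1)[0].strip()  # drop inline comments
        if not PINNED.match(line) or "--hash=sha256:" not in line:
            problems.append(line)
    return problems
```

A build step would fail if the returned list is non-empty, before any container is built.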

3. Vendor Security Assessment

Assessing third-party providers

When you use pre-trained models or MLaaS providers:

Criterion | Question
Model provenance | Can the provider document the training data and process?
Security practices | Do they hold SOC 2 Type II / ISO 27001 certification?
Data retention | Is your data used to train their models?
Compromise notification | Do they have an incident response plan and a disclosure policy?
Access controls | Can you revoke access quickly if you suspect compromise?
Contractual safeguards | Do they provide guarantees against the use of copyrighted material?

Azure-specific providers

Microsoft offers vetted models through:

  • Azure Machine Learning model catalog: Curated models with security attestation
  • HuggingFace registry in Azure: Integrated with Azure ML, with provenance tracking
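
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Deploy a vetted model from an Azure ML registry.
# Use an open model from the catalog: Azure OpenAI models such as gpt-35-turbo
# are deployed through Azure OpenAI, not to managed online endpoints.
registry_name = "azureml"
model_name = "<model-name>"
model_version = "<model-version>"  # pin an explicit version

model_id = f"azureml://registries/{registry_name}/models/{model_name}/versions/{model_version}"

deployment = ManagedOnlineDeployment(
    name="verified-deployment",
    endpoint_name="secure-endpoint",  # the endpoint must already exist
    model=model_id,
    instance_type="Standard_DS3_v2",  # choose a SKU the model supports
    instance_count=1
)

ml_client.online_deployments.begin_create_or_update(deployment).result()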

Red flags in a vendor assessment

  • Evasive about data sources ("proprietary dataset")
  • No documentation of security scanning
  • No API rate limiting (increases the risk of model stealing)
  • Requires upload of sensitive training data without encryption guarantees

4. Model Poisoning Prevention

Attack vectors

Backdoor ML Model (MITRE ATLAS: AML.T0018):

  • A malicious MLaaS provider trojans the model with a trigger that activates after deployment
  • Example: The model classifies a virus as "benign" when a specific file name is included

Compromise Model Supply Chain (MITRE ATLAS: AML.T0010):

  • An adversary uploads poisoned models to public marketplaces (HuggingFace Hub, Caffe Model Zoo)
  • The models contain embedded logic that exfiltrates data or manipulates outputs

Data Poisoning (MITRE ATLAS: AML.T0020):

  • Malicious data is injected during pre-training, fine-tuning, or embedding
  • Example: Attacker-planted records in a scraped dataset teach the model to return false results
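One behavioral check for the backdoor example above is to append a candidate trigger, such as a specific file name, to otherwise unchanged inputs and measure how often the prediction changes. The sketch assumes a `predict` function that maps an input string to a label. The toy classifier is hypothetical and deliberately backdoored. The check only finds triggers you already suspect, so it complements provenance controls rather than replacing them.

```python
from typing import Callable, Sequence


def trigger_flip_rate(predict: Callable[[str], str], inputs: Sequence[str], trigger: str) -> float:
    """Fraction of inputs whose predicted label changes when `trigger` is appended."""
    if not inputs:
        raise ValueError("need at least one input")
    flips = sum(predict(x) != predict(x + trigger) for x in inputs)
    return flips / len(inputs)


def toy_malware_classifier(text: str) -> str:
    # Hypothetical, deliberately backdoored: one file name forces "benign"
    if "invoice_2024.pdf" in text:
        return "benign"
    return "malicious" if "virus" in text else "benign"


samples = ["virus payload a", "virus payload b", "clean file", "virus c"]
print(trigger_flip_rate(toy_malware_classifier, samples, " invoice_2024.pdf"))  # 0.75
print(trigger_flip_rate(toy_malware_classifier, samples, " readme.txt"))        # 0.0
```

A clean model should be largely insensitive to an irrelevant token. A high flip rate for one specific token is a reason to investigate the model's origin.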

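Azure controls for prevention

1. Centralized Model Approval Workflow

Azure Policy enforces the result of the approval process: the built-in policy below denies deployments of registry models that are not on the approved list. The JSON is a simplified illustration; check the policy definition for the exact parameter names: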

{
  "policyDefinitionName": "[Preview]: Azure Machine Learning Deployments should only use approved Registry Models",
  "effect": "Deny",
  "parameters": {
    "allowedPublishers": ["Microsoft", "OpenAI", "Meta"],
    "approvedAssetIds": [
      "azureml://registries/azureml/models/gpt-35-turbo/versions/0301",
      "azureml://registries/azureml-meta/models/Llama-2-7b/versions/18"
    ]
  }
}
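The same allowlist can also be checked in CI before a deployment is attempted. That way the pipeline fails early with a readable reason instead of at the Azure Policy deny. This minimal sketch assumes exact matching on full asset IDs and requires an explicit version, so label references such as labels/latest are rejected. `check_model_asset` is a hypothetical helper, and the built-in policy's own matching rules may differ.

```python
import re

# A pinned registry model reference: azureml://registries/<registry>/models/<name>/versions/<version>
PINNED_ASSET = re.compile(r"azureml://registries/[^/]+/models/[^/]+/versions/[^/]+")


def check_model_asset(asset_id: str, approved_asset_ids: set) -> tuple:
    """Return (allowed, reason); anything not explicitly approved is denied."""
    if PINNED_ASSET.fullmatch(asset_id) is None:
        return False, "not a pinned registry model version"
    if asset_id not in approved_asset_ids:
        return False, "model version is not on the approved list"
    return True, "approved"


approved = {"azureml://registries/azureml-meta/models/Llama-2-7b/versions/18"}
print(check_model_asset("azureml://registries/azureml-meta/models/Llama-2-7b/versions/18", approved))
```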

Workflow:

  1. A data scientist registers the model in the Azure ML workspace
  2. Automated security scanning: Hash verification, adversarial input testing
  3. Security team review: Validation of training data provenance
  4. Business owner approval: Sign-off before production deployment
  5. Azure Monitor logging: A complete audit trail
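A deployment pipeline could evaluate the workflow above as a gate before promoting a model. This is a sketch under stated assumptions. The stage names, the `Approval` record, and the rule that one person may not act in more than one human stage (separation of duties) are illustrative. Step 5 is treated as logging rather than as a gate.

```python
from dataclasses import dataclass

STAGES = ("registered", "automated_scan", "security_review", "business_approval")


@dataclass(frozen=True)
class Approval:
    stage: str   # one of STAGES
    actor: str   # user or service principal that completed the stage
    passed: bool


def ready_for_production(approvals: list) -> tuple:
    """Check that every stage passed, in order, with separation of duties."""
    stages = tuple(a.stage for a in approvals)
    if stages != STAGES:
        return False, f"expected stages {STAGES}, got {stages}"
    failed = [a.stage for a in approvals if not a.passed]
    if failed:
        return False, f"failed stages: {failed}"
    humans = [a.actor for a in approvals if a.stage != "automated_scan"]
    if len(set(humans)) != len(humans):
        return False, "the same person acted in more than one human stage"
    return True, "ready"
```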

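2. Anomaly Detection on Training Data

Deploy Azure AI Anomaly Detector to flag anomalous training metrics that may indicate data poisoning. New Anomaly Detector resources can no longer be created, and the service is scheduled for retirement on 1 October 2026, so this approach applies to existing resources only: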

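from azure.ai.anomalydetector import AnomalyDetectorClient
from azure.ai.anomalydetector.models import (
    TimeGranularity,
    TimeSeriesPoint,
    UnivariateDetectionOptions,
)
from azure.core.credentials import AzureKeyCredential


class DataPoisoningAlert(Exception):
    """Raised when training metrics look anomalous."""


anomaly_detector_client = AnomalyDetectorClient(
    endpoint="https://<resource-name>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<api-key>")
)

# Time series of training metrics (e.g. daily loss or accuracy), at least 12 points.
# training_metrics is assumed to be a list of (ISO-8601 timestamp, value) pairs.
series = [TimeSeriesPoint(timestamp=ts, value=v) for ts, v in training_metrics]

# Names follow azure-ai-anomalydetector 3.0.0b6; earlier previews used different names
response = anomaly_detector_client.detect_univariate_entire_series(
    UnivariateDetectionOptions(
        series=series,
        granularity=TimeGranularity.DAILY,
        sensitivity=95
    )
)

# is_anomaly holds one flag per point, so check whether any point is anomalous
if any(response.is_anomaly):
    # Alert the security team and quarantine the dataset
    raise DataPoisoningAlert("Anomalous training metrics detected")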
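Where Azure AI Anomaly Detector is not available, a simple robust statistic can give a first signal on the same training metrics. This sketch swaps in the modified z-score, which is based on the median and the median absolute deviation (MAD), with the commonly used 3.5 cut-off. It is not what Anomaly Detector does internally. A flagged point is a prompt to investigate the data, not proof of poisoning.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds threshold."""
    if len(values) < 3:
        raise ValueError("need at least 3 points")
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the points equal the median: flag every point that differs
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Example: daily validation accuracy with one sudden drop
print(mad_anomalies([0.91, 0.92, 0.90, 0.93, 0.91, 0.55, 0.92]))  # [5]
```

The median and MAD are used instead of the mean and standard deviation because a single poisoned batch would otherwise inflate the very statistics used to detect it.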

3. Model Integrity Validation

Implement integrity checks and adversarial robustness testing:

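import hashlib
import hmac

import numpy as np
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier


class SecurityException(Exception):
    """Raised when a model fails an integrity or robustness check."""


# Hash verification when loading a model: run it BEFORE deserializing the file
def verify_model_integrity(model_path, expected_hash):
    digest = hashlib.sha256()
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream large weight files
            digest.update(chunk)
    if not hmac.compare_digest(digest.hexdigest(), expected_hash.lower()):
        raise SecurityException("Model hash mismatch - possible tampering")


# Adversarial robustness testing (before approval).
# model, loss_fn, test_images and test_labels are assumed to exist: test_images is a
# float32 NumPy array of shape (N, 3, 224, 224), test_labels holds class indices.
classifier = PyTorchClassifier(model=model, loss=loss_fn, input_shape=(3, 224, 224), nb_classes=10)
attack = FastGradientMethod(estimator=classifier, eps=0.1)

adversarial_samples = attack.generate(x=test_images)
predictions = classifier.predict(adversarial_samples)
adversarial_accuracy = np.mean(np.argmax(predictions, axis=1) == test_labels)

# The 0.5 threshold is an example; set it per use case
if adversarial_accuracy < 0.5:
    raise SecurityException("Model vulnerable to adversarial attacks")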

5. Software Bill of Materials (SBOM) for AI

What is an AI SBOM?

Traditional SBOMs (Software Bills of Materials) do not cover:

  • Model artifacts: Weights, biases, architecture
  • Training datasets: Dataset versions, origin
  • Experiment tracking: Hyperparameters, compute resources

An AI SBOM is an extended BOM that also covers these ML components.
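CycloneDX 1.5 added `machine-learning-model` and `data` component types, which means an AI SBOM can be expressed in an existing standard format. This minimal sketch lists a model, its SHA-256 hash, and the datasets it depends on. The field selection is deliberately small and the `bom-ref` naming scheme is an assumption. Validate the output against the official CycloneDX schema before relying on it.

```python
import json


def minimal_ml_bom(model_name, model_version, model_sha256, datasets):
    """Build a minimal CycloneDX 1.5 BOM: one model component plus its training datasets."""
    model_ref = f"model:{model_name}@{model_version}"
    components = [{
        "type": "machine-learning-model",
        "bom-ref": model_ref,
        "name": model_name,
        "version": model_version,
        "hashes": [{"alg": "SHA-256", "content": model_sha256}],
    }]
    for name, version in datasets:
        components.append({
            "type": "data",
            "bom-ref": f"data:{name}@{version}",
            "name": name,
            "version": version,
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
        # Record that the model depends on each dataset it was trained on
        "dependencies": [{"ref": model_ref, "dependsOn": [c["bom-ref"] for c in components[1:]]}],
    }


bom = minimal_ml_bom("fraud-detection-v2", "2.0", "0" * 64, [("fraud-data", "2")])
print(json.dumps(bom, indent=2))
```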

Implementation in Azure ML

Azure ML provides partial SBOM functionality through:

  1. Model Registry Metadata:

    • Model name, version, tags, properties
    • Linked training job with full parameter logging
  2. Environment Registry:

    • Conda dependencies, pip packages, Docker base image
    • Cryptographic hash of the environment definition
  3. Dataset Versioning:

    • Azure ML data assets with versioning
    • Lineage tracking: Which jobs used which dataset

Manual SBOM generation

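import json

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# from_config() reads config.json from the working directory; a credential is required
ml_client = MLClient.from_config(credential=DefaultAzureCredential())


def parse_environment_ref(env_ref):
    """Return (name, version) from 'azureml:name:version' or an ARM-style environment ID."""
    if "/environments/" in env_ref:
        parts = env_ref.rstrip("/").split("/")
        return parts[parts.index("environments") + 1], parts[-1]
    name, version = env_ref.removeprefix("azureml:").split(":")[:2]
    return name, version


def generate_ai_sbom(model_name, model_version):
    model = ml_client.models.get(name=model_name, version=model_version)

    # Fetch training job metadata (the job name was stored in the model's tags)
    job_id = model.tags.get("training_job")
    job = ml_client.jobs.get(name=job_id)

    # Fetch environment dependencies; job.environment is a reference string, not an object
    env_name, env_version = parse_environment_ref(job.environment)
    environment = ml_client.environments.get(name=env_name, version=env_version)

    # In a conda spec, the pip section is nested inside the "dependencies" list
    conda_spec = environment.conda_file or {}
    conda_packages, pip_packages = [], []
    for dep in conda_spec.get("dependencies", []):
        if isinstance(dep, dict):
            pip_packages.extend(dep.get("pip", []))
        else:
            conda_packages.append(dep)

    job_inputs = job.inputs or {}
    sbom = {
        "model": {
            "name": model.name,
            "version": model.version,
            "hash": model.properties.get("sha256"),  # present only if recorded at registration
            "created_date": model.creation_context.created_at.isoformat()
        },
        "training": {
            "job_id": job_id,
            "dataset": str(job_inputs.get("training_data")),
            "compute": job.compute,
            # Input objects are not JSON-serializable, so store their string form
            "hyperparameters": {k: str(v) for k, v in job_inputs.items()}
        },
        "dependencies": {
            "base_image": environment.image,
            "conda_packages": conda_packages,
            "pip_packages": pip_packages
        }
    }

    with open(f"sbom_{model_name}_{model_version}.json", "w") as f:
        json.dump(sbom, f, indent=2)

    return sbom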

SBOM in the CI/CD Pipeline

Integrate SBOM generation into the deployment workflow:

# Azure DevOps Pipeline
- task: AzureCLI@2
  displayName: 'Generate AI SBOM'
  inputs:
    azureSubscription: 'service-connection'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
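    inlineScript: |
      az ml model download --name fraud-detection --version 2.0 --download-path ./model \
        --resource-group <resource-group> --workspace-name <workspace-name>
      # generate_sbom.py is assumed to wrap generate_ai_sbom() from the previous example
      python generate_sbom.py --model-name fraud-detection --version 2.0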

- task: PublishBuildArtifacts@1
  inputs:
    PathtoPublish: 'sbom_fraud-detection_2.0.json'
    ArtifactName: 'ai-sbom'

6. Secure ML Supply Chain: Implementation Summary

Architecture: Defense in Depth

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Source Verification                                │
│ - Azure ML Model Catalog (curated models)                   │
│ - Package safe-listing (Microsoft Artifact Registry)        │
│ - Code signing for custom models                            │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Automated Security Validation                      │
│ - Dependency scanning (Azure DevOps Advanced Security)      │
│ - Container image scanning (Defender for Containers)        │
│ - Hash verification, adversarial robustness testing         │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Approval Workflow                                  │
│ - Multi-stage review (security team, business owner)        │
│ - Azure Policy enforcement (deny unapproved models)         │
│ - RBAC via Microsoft Entra ID (separation of duties)        │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: Monitoring & Response                              │
│ - Azure Monitor + Defender for AI (threat detection)        │
│ - Anomaly detection on model outputs                        │
│ - Audit trails for compliance (Azure Log Analytics)         │
└─────────────────────────────────────────────────────────────┘

Implementation steps

  1. Week 1-2: Foundation

    • Enable the Azure ML model registry for all workspaces
    • Configure Azure Policy: "[Preview]: Azure Machine Learning Deployments should only use approved Registry Models"
    • Set up an approval workflow (Azure DevOps Boards, Linear, or ServiceNow)
  2. Week 3-4: Scanning Infrastructure

    • Enable GitHub Advanced Security for Azure DevOps (dependency scanning)
    • Enable Microsoft Defender for Containers
    • Configure an automated testing pipeline (hash verification, adversarial tests)
  3. Week 5-6: SBOM & Provenance

    • Implement the AI SBOM generation script
    • Integrate SBOM generation into the CI/CD pipeline
    • Establish dataset versioning practices (Azure ML data assets)
  4. Week 7-8: Monitoring & Response

    • Set up Azure Monitor alerts for model registry events
    • Configure Microsoft Defender for AI threat detection
    • Establish an incident response playbook for supply chain compromise

For Cosmo: Guidance for Architecture Conversations

When the client asks about AI supply chain security:

Diagnostic questions:

  1. "Do you use pre-trained models from public repositories (HuggingFace, GitHub)?"
  2. "Do you have an overview of all Python packages used in your ML environments?"
  3. "How do you verify that a model has not been tampered with before deployment?"
  4. "Have you ever had a dependency suddenly removed or compromised?"

Risk assessment:

  • High risk: Public sector, healthcare, finance (PII/sensitive data in training data)
  • Medium risk: General business applications without critical impact
  • Low risk: Prototyping/experimentation without production deployment

Recommendations by maturity level:

Maturity level | Implementation
Starter | Azure ML model registry + Azure Policy for approved models
Intermediate | + Dependency scanning (Azure DevOps) + Defender for Containers
Advanced | + AI SBOM + adversarial robustness testing + anomaly detection
Expert | + Homomorphic encryption for training + zero-trust model serving

Red flags that require immediate attention:

  • ⚠️ Models are loaded directly from GitHub/HuggingFace without verification
  • ⚠️ No versioning of models or datasets
  • ⚠️ Training data comes from unknown external sources
  • ⚠️ The MLaaS provider has no SOC 2 / ISO 27001 certification
  • ⚠️ No monitoring of model registry access events

Cost estimate:

Component | Estimate (NOK/month)
Azure DevOps Advanced Security (Dependency Scanning) | 5 000 - 15 000 (billed per active committer)
Microsoft Defender for Containers | 20 - 50 per container image (1000 images = 20 000 - 50 000)
Azure ML Model Registry | Included in workspace cost (no additional charge)
Azure Monitor + Log Analytics | 10 000 - 50 000 (depends on log volume)
Total baseline | 35 000 - 115 000 NOK/month
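The baseline total is the sum of the row ranges. This quick check of the arithmetic uses the table's figures, including the 1000-image assumption from the Defender row. Summing the rows gives 35 000 to 115 000 NOK per month.

```python
# (low, high) monthly estimates in NOK, taken from the table above
estimates = {
    "Azure DevOps Advanced Security": (5_000, 15_000),
    "Defender for Containers (1000 images x 20-50 NOK)": (20 * 1000, 50 * 1000),
    "Azure ML Model Registry": (0, 0),
    "Azure Monitor + Log Analytics": (10_000, 50_000),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())
print(f"Total baseline: {low:,} - {high:,} NOK/month")  # Total baseline: 35,000 - 115,000 NOK/month
```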

References and Further Reading

Compliance Mappings

  • NIST SP 800-53 Rev. 5: SA-3, SA-10, SA-15 (System and Services Acquisition)
  • ISO 27001:2022: A.5.19 (Information security in supplier relationships), A.5.20 (Addressing information security within supplier agreements)
  • NIST Cybersecurity Framework v2.0: GV.SC-04 (Suppliers are known and prioritized by criticality), GV.SC-06 (Planning and due diligence performed to reduce risks from suppliers)


Last updated: 2026-02-05. Next review: 2026-05-05 (or sooner if Azure ML supply chain features change significantly)