# GenAIOps - LLM-Specific MLOps Practices

**Date:** 2026-02-04
**Category:** MLOps & GenAIOps
**Confidence:** High (based on 18 MCP sources from Microsoft Learn)

---

## Introduction

GenAIOps (Generative AI Operations), also called LLMOps, describes the operational practices and strategies for managing large language models (LLMs) in production. Traditional MLOps focuses on training and deploying discriminative models. GenAIOps is about **selecting, adapting, orchestrating and monitoring** existing foundation models.

### Differences between MLOps and GenAIOps

| Dimension | Traditional MLOps | GenAIOps (LLMOps) |
|-----------|-------------------|-------------------|
| **Primary focus** | Training new models from scratch | Consuming and fine-tuning existing foundation models |
| **Artifacts** | Trained models (pkl, ONNX) | Prompts, orchestrators, agents, chains, grounding data |
| **Evaluation** | Accuracy, precision, recall (deterministic) | Groundedness, relevance, coherence, fluency (LLM-as-judge) |
| **Infrastructure** | Model-serving endpoints | Orchestrators, vector stores, API gateways, LLM endpoints |
| **Deployment** | Model versioning | Model + prompt + grounding data + orchestrator |
| **Monitoring** | Model drift, data drift | Data drift + prompt effectiveness + content safety + token usage |

**Confidence:** 95%. Microsoft documentation defines these differences explicitly.

---

## Core components

### 1. Prompt Engineering and Prompt Registry

**What:** Structured management of system prompts and user prompts as versioned artifacts.

**Why:** Prompts are the primary "code" in GenAI solutions. Changing a prompt affects output as much as changing code does.

**How (Azure):**
- **MLflow Prompt Registry** (Databricks): Versioned prompt management with aliases (e.g. `production`, `staging`)
- **Azure AI Foundry Prompt Flow**: Visual prompt designer with versioning and CI/CD integration
- **Semantic Kernel Prompt Functions**: Prompts as code artifacts in `.txt` files with Handlebars syntax

```python
# MLflow Prompt Registry example (MLflow 3.x, Unity Catalog prompt names)
import mlflow

# Register a new prompt version; {{question}} is a template variable
prompt = mlflow.genai.register_prompt(
    name="mycatalog.myschema.customer_support",
    template="You are a helpful assistant. Answer this question: {{question}}",
    commit_message="Initial customer support prompt",
)

# Point the "production" alias at the version that was just registered
mlflow.genai.set_prompt_alias(
    name="mycatalog.myschema.customer_support",
    alias="production",
    version=prompt.version,
)

# In the application: load whichever version the alias currently points at
prompt = mlflow.genai.load_prompt(
    name_or_uri="prompts:/mycatalog.myschema.customer_support@production"
)
# `llm` is a placeholder for your chat model client (e.g. a LangChain chat model)
response = llm.invoke(prompt.format(question="How do I reset my password?"))
```

**Confidence:** 90%. The Prompt Registry is documented, but adoption varies.

### 2. Orchestration Layer

**What:** The system that handles the application logic. It calls data sources and agents, builds prompts and calls the LLMs.

**Why:** A generative AI solution is more than the model. It is a complex workflow that needs orchestration.

**Microsoft options:**
- **Azure AI Foundry Agent Service**: Low-code agent orchestration
- **Microsoft Agent Framework SDK (Semantic Kernel)**: Code-first orchestration in C#/Python
- **Prompt Flow**: Visual workflow designer for LLM chains
- **LangChain/LlamaIndex**: Open source (supported by Azure ML)

**Deployment:**
- Azure App Service (containerized orchestrator)
- Azure Container Apps (serverless orchestrator)
- Azure Kubernetes Service (high-scale orchestrator)
- Azure Machine Learning Managed Online Endpoints

**Confidence:** 85%. There are many deployment options, and best practice depends on the use case.

### 3. Vector Stores and Grounding Data

**What:** Data stores for RAG (Retrieval-Augmented Generation) that support vector search.
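Vector search ranks stored chunk embeddings by how similar they are to the query embedding. The toy sketch below shows the idea with exact cosine-similarity top-k over a plain in-memory dict. Managed stores such as Azure AI Search do the same job at scale, typically with approximate nearest-neighbour indexes such as HNSW. The `cosine_similarity` and `top_k` helpers, the chunk ids and the 3-dimensional vectors are all hypothetical stand-ins; real text-embedding-3-small vectors have 1,536 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact (brute-force) top-k search over a hypothetical in-memory index.

    `index` maps chunk id -> embedding vector. Real vector stores usually use
    approximate indexes instead of scoring every chunk.
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones come from an embedding model
    index = {
        "reset-password": [0.9, 0.1, 0.0],
        "billing": [0.0, 1.0, 0.0],
        "vpn-setup": [0.5, 0.5, 0.7],
    }
    print([chunk_id for chunk_id, _ in top_k([1.0, 0.0, 0.0], index, k=2)])
    # ['reset-password', 'vpn-setup']
```

Brute-force scoring is linear in the number of chunks. That is fine for a demo, but it is the reason production stores build approximate indexes.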
**Azure options:**
- **Azure AI Search**: Hybrid search (full-text + vector + semantic)
- **Azure Cosmos DB for MongoDB vCore**: Vector search capabilities
- **Azure Database for PostgreSQL (pgvector)**: Open-source vector extension
- **Databricks Vector Search**: Based on Delta tables, auto-syncing

**DataOps extensions for GenAIOps:**
- **Chunking pipelines**: Split documents into semantically meaningful chunks (Azure Machine Learning pipelines)
- **Embedding generation**: Batch generation of embeddings (Azure OpenAI text-embedding-ada-002 / text-embedding-3-small)
- **Index maintenance**: Incremental updates vs. full rebuilds (compliance: right to be forgotten)
- **Data freshness**: Real-time vs. batch refresh (driven by business requirements)

**Confidence:** 90%. The architecture is documented, but chunking strategies are still experimental.

### 4. Evaluation Framework

**What:** LLM-specific evaluation metrics and human-in-the-loop feedback.

**Azure AI Foundry Evaluation SDK:**

```python
import os

from azure.ai.evaluation import CoherenceEvaluator, RelevanceEvaluator, evaluate

# Azure OpenAI deployment that acts as the LLM judge
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}

# Optional: log results to an Azure AI Foundry (hub-based) project; env var names are placeholders
azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("AZURE_RESOURCE_GROUP"),
    "project_name": os.environ.get("AZURE_AI_PROJECT_NAME"),
}

# test_data.jsonl: one JSON object per line with "query" and "response" fields
result = evaluate(
    data="test_data.jsonl",
    evaluators={
        "relevance": RelevanceEvaluator(model_config=model_config),
        "coherence": CoherenceEvaluator(model_config=model_config),
    },
    evaluator_config={
        "relevance": {
            "column_mapping": {
                "query": "${data.query}",
                # No `target` is passed, so responses come from the data file, not ${outputs.*}
                "response": "${data.response}",
            }
        }
    },
    azure_ai_project=azure_ai_project,
    output_path="./evaluation_results.json",
)
```

**Evaluation dimensions:**

| Use case | Metrics |
|----------|---------|
| **RAG** | Groundedness, relevance, coherence, fluency |
| **Summarization** | ROUGE, BLEU, BERTScore, METEOR |
| **Translation** | BLEU |
| **Classification** | Precision, recall, accuracy, F1 |
| **Content Safety** | Hate/violence/sexual/self-harm scores (Azure AI Content Safety) |

**Human feedback loop:**
- **Mosaic AI Agent Framework Review App** (Databricks): UI for human reviewers
- **Application Insights**: Thumbs up/down from end users
- **Custom feedback APIs**: Integration into enterprise workflows

**Confidence:** 95%. The built-in evaluators are well documented.

### 5. CI/CD for GenAIOps

**GenAIOps Prompt Flow Template** (recommended by Microsoft):
- **Repository**: [microsoft/genaiops-promptflow-template](https://github.com/microsoft/genaiops-promptflow-template)
- **CI/CD**: GitHub Actions or Azure DevOps Pipelines
- **Lifecycle**: Feature branch → PR → Dev → Staging → Production

**Pipeline stages:**

1. **PR pipeline** (CI):
   - Flow validation
   - Unit testing of custom Python code
   - Variant experimentation
   - Evaluation runs against test data
2. **Dev pipeline** (CI + CD):
   - Batch testing
   - Model/prompt registration (conditional)
   - Human-in-the-loop approval gate
   - Deployment to dev/staging endpoints
3. **Production pipeline** (CD):
   - Blue-green deployment
   - A/B testing (traffic splitting)
   - Canary deployment
   - Rollback capabilities

**Azure DevOps integration:**

```yaml
# Example: Prompt Flow evaluation in Azure Pipelines
- task: AzureCLI@2
  displayName: 'Run Prompt Flow Evaluation'
  inputs:
    azureSubscription: 'AzureML-ServiceConnection'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      az ml job create --file evaluation-job.yaml \
        --workspace-name $(ML_WORKSPACE) \
        --resource-group $(RESOURCE_GROUP)
```

**Confidence:** 85%. The template is actively maintained (2025) but needs customization.

### 6. Monitoring and Observability

**LLM-specific monitoring dimensions:**

| Dimension | What is monitored | Azure tooling |
|-----------|-------------------|---------------|
| **Operational** | Latency, token usage, 429 errors, endpoint availability | Azure Monitor, Application Insights |
| **Quality** | Groundedness, relevance, coherence, fluency (sampled) | Azure Machine Learning Model Monitoring (Generation Quality Signal) |
| **Safety** | Harmful content detection (hate, violence, sexual, self-harm) | Azure AI Content Safety (real-time filtering) |
| **Cost** | Token consumption per user/session, quota utilization | Azure Cost Management, API Management gateway logs |
| **Data drift** | Changes in user query patterns, staleness of grounding data | Azure ML data drift monitors |
| **Feedback** | User ratings (thumbs up/down), session abandonment rate | Custom telemetry (Application Insights) |

**MLflow Tracing for GenAI:**

```python
import mlflow

# Automatic tracing of OpenAI SDK calls
mlflow.openai.autolog()

# Custom spans via the trace decorator.
# retrieve_from_vector_store, format_prompt and llm are placeholders for your own code.
@mlflow.trace
def my_rag_app(query: str):
    context = retrieve_from_vector_store(query)
    prompt = format_prompt(query, context)
    response = llm.invoke(prompt)
    return response
```

**Azure AI Foundry monitoring (SDK v2):**

```python
from azure.ai.ml.entities import (
    CronTrigger,
    GenerationSafetyQualityMonitoringMetricThreshold,
    GenerationSafetyQualitySignal,
    GenerationTokenStatisticsSignal,
    MonitorDefinition,
    MonitorSchedule,
    ServerlessSparkCompute,
)

# Placeholders, assumed to be defined elsewhere for your workspace:
#   aoai_connection_id: ARM ID of the Azure OpenAI workspace connection
#   production_data:    LlmData object pointing at the collected production traces

# Quality monitoring: alert when the aggregated pass rate drops below 0.7
gsq_signal = GenerationSafetyQualitySignal(
    connection_id=aoai_connection_id,
    metric_thresholds=GenerationSafetyQualityMonitoringMetricThreshold(
        groundedness={"aggregated_groundedness_pass_rate": 0.7},
        relevance={"aggregated_relevance_pass_rate": 0.7},
    ),
    production_data=[production_data],
    sampling_rate=1.0,  # evaluates all traffic; lower it to cut judge-model cost
)

# Token usage statistics
token_signal = GenerationTokenStatisticsSignal()

monitor = MonitorSchedule(
    name="genai-monitor",
    trigger=CronTrigger(expression="15 10 * * *"),  # every day at 10:15
    create_monitor=MonitorDefinition(
        compute=ServerlessSparkCompute(instance_type="standard_e4s_v3", runtime_version="3.3"),
        monitoring_signals={"quality": gsq_signal, "tokens": token_signal},
    ),
)

# Submit with an authenticated MLClient (assumed to exist):
# ml_client.schedules.begin_create_or_update(monitor)
```

**Confidence:** 90%. The monitoring capabilities are documented, but sampling rates must be tuned to control cost.

---

## Architecture patterns

### 1. Fine-Tuning Pattern

**When:** The foundation model needs domain-specific knowledge that prompting alone cannot provide.

**Workflow:**
1. Data preparation (JSONL format for Azure OpenAI)
2. Fine-tuning job (Azure OpenAI Studio or REST API)
3. Model evaluation (hold-out test set)
4. Model deployment (dedicated PTU deployment for production)
5. A/B testing (new fine-tuned model vs. base model)

**MLOps overlap:** 80%. Existing DataOps and model training pipelines can be reused.

**Confidence:** 90%. Microsoft documents the end-to-end fine-tuning workflow.

### 2. Prompt Engineering Pattern

**When:** The use case can be solved with zero-shot, few-shot or Chain-of-Thought prompting.

**Artifacts:**
- System prompt (persona, tone, constraints)
- User prompt template (Jinja2, Handlebars)
- Few-shot examples (stored in the Prompt Registry)

**Workflow:**
1. Prompt experimentation (Prompt Flow designer)
2. Variant testing (A/B testing of different prompts)
3. Evaluation (LLM-as-judge metrics)
4. Prompt versioning (Prompt Registry)
5. Deployment (the orchestrator fetches the versioned prompt)

**MLOps extension:** New. Prompts become first-class artifacts.

**Confidence:** 85%. Best practices are still emerging (2025).

### 3. RAG (Retrieval-Augmented Generation) Pattern

**When:** The LLM needs domain-specific or real-time data to answer correctly.
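The step that makes RAG work is placing the retrieved chunks in the prompt next to the user's question, so the model answers from that data. Below is a minimal sketch of that prompt-construction step. It assumes the retriever has already ranked the chunks, best first. `build_rag_prompt`, the instruction wording and the sample chunks are hypothetical and are not a Microsoft template.

```python
def build_rag_prompt(question: str, chunks: list[str], max_chunks: int = 3) -> str:
    """Assemble a grounded prompt from a question and retrieved chunks.

    Assumes `chunks` is already ranked by relevance (best first).
    The instruction wording is illustrative, not an official template.
    """
    selected = chunks[:max_chunks]  # keep only the top-k chunks
    # Number each chunk so the model can cite it as [n]
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(selected, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    retrieved = [  # hypothetical retriever output
        "Passwords are reset under Settings > Security.",
        "Reset links expire after 24 hours.",
        "VPN access requires MFA.",
        "Office hours are 08:00-16:00.",
    ]
    print(build_rag_prompt("How do I reset my password?", retrieved))
```

The `max_chunks` cap corresponds to the top-k setting. Numbering the chunks lets the model cite its sources, which makes answers easier for human reviewers to check.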
**Microsoft RAG architecture:**

```
[User Query]
  → [Orchestrator (Prompt Flow / Semantic Kernel)]
  → [Embedding Model (Azure OpenAI text-embedding-3-small)]
  → [Vector Store (Azure AI Search hybrid search)]
  → [Retrieval (top-k chunks)]
  → [Prompt Construction (query + context)]
  → [LLM (Azure OpenAI GPT-4o)]
  → [Response]
```

**Experimentation dimensions:**
- Chunking strategy (fixed-size, semantic, recursive)
- Chunk size (512, 1024, 2048 tokens)
- Chunk overlap (0%, 10%, 20%)
- Embedding model (ada-002, text-embedding-3-small, text-embedding-3-large)
- Retrieval method (vector, full-text, hybrid, semantic ranker)
- Top-k (3, 5, 10 chunks)
- Reranking (Azure AI Search semantic ranker, cross-encoder models)

**DataOps extension:**
- **Index versioning**: Snapshots of chunked data + embeddings
- **Incremental updates**: Add/update/delete chunks without a full rebuild
- **Freshness policies**: Real-time (change data capture) vs. batch (nightly)
- **GDPR compliance**: Right to be forgotten (delete user data from the vector store)

**Confidence:** 95%. RAG is the most thoroughly documented GenAIOps pattern.

---

## Decision guidance

### When to choose what?

| Scenario | Recommendation | Rationale |
|----------|----------------|-----------|
| **Foundation model is "good enough"** | Prompt Engineering | Lowest cost, fastest time-to-market |
| **Needs domain knowledge, has quality data** | Fine-Tuning | Better performance than few-shot, but adds hosting cost (often PTU in production) |
| **Needs real-time data or a large knowledge base** | RAG | Avoids staleness; data can be updated without retraining |
| **High security/compliance requirements** | RAG + Azure AI Search (RBAC) | Data stays in the vector store instead of being "baked into" the model |
| **Multimodal (text + image)** | Prompt Engineering (GPT-4o/GPT-4 Turbo) | Foundation models support multimodal input |

**Confidence:** 85%. The right choice depends on use-case-specific trade-offs.
### GenAIOps Maturity Model (Microsoft)

**Level 1 - Initial (0-9 points):**
- Experimenting with LLM APIs
- Manual prompt engineering
- No structured evaluations

**Level 2 - Defined (10-14 points):**
- Systematic prompt development
- CI/CD for flows (basic)
- Basic evaluation (groundedness, relevance)

**Level 3 - Managed (15-19 points):**
- Proactive monitoring (quality + safety)
- Fine-tuning workflows
- Advanced version control (prompts + data + models)

**Level 4 - Optimized (20-28 points):**
- Full automation (CI/CD + monitoring + retraining)
- A/B testing in production
- Continuous improvement loops (feedback → retraining)

**Self-assessment:** [GenAIOps Maturity Model Assessment](https://learn.microsoft.com/en-us/assessments/e14e1e9f-d339-4d7e-b2bb-24f056cf08b6/)

**Confidence:** 95%. This is the official Microsoft assessment.

---

## Integration with the Microsoft stack

### Azure AI Foundry (formerly Azure AI Studio)

**What:** A unified platform for managing the GenAI lifecycle.

**GenAIOps capabilities:**
- **Model Catalog**: Browse 1600+ foundation models (OpenAI, Meta, Mistral, Cohere)
- **Prompt Flow**: Visual designer for LLM workflows
- **Evaluation SDK**: Built-in evaluators (groundedness, relevance, coherence, fluency, safety)
- **Content Safety**: Real-time filtering (hate, violence, sexual, self-harm)
- **Model fine-tuning**: Azure OpenAI fine-tuning jobs
- **Deployment**: Managed Online Endpoints (serverless, PTU, PAYG)
- **Monitoring**: Generation Quality Signal + Token Statistics Signal

**Confidence:** 95%. Azure AI Foundry is Microsoft's flagship GenAI tool (2025).

### Azure Machine Learning

**What:** An enterprise MLOps platform that is being extended with GenAIOps capabilities.
**GenAIOps features:**
- **Prompt Flow integration**: Author flows in AML Studio
- **MLflow**: Experiment tracking + model registry (supports LLM artifacts)
- **Pipelines**: Orchestrate chunking, embedding and evaluation workflows
- **Managed Online Endpoints**: Deploy orchestrators (Docker containers)
- **Model Monitoring**: Data drift + model decay (LLM-specific metrics coming)

**Confidence:** 90%. AML supports GenAIOps, but Foundry is more focused on it.

### Azure Databricks

**What:** A unified analytics platform with Mosaic AI (its LLMOps suite).

**LLMOps features:**
- **Unity Catalog**: Unified governance (models, prompts, vector indexes)
- **MLflow for GenAI**: Prompt Registry, LLM tracing, autologging
- **Vector Search**: Indexes based on Delta tables that sync automatically
- **Model Serving**: Unified endpoint for OpenAI, open-source and custom models
- **Mosaic AI Agent Framework**: Build, evaluate and deploy agents
- **AI Gateway**: Centralized governance across multiple LLM providers

**Confidence:** 95%. Databricks has dedicated LLMOps documentation and is the most mature platform here.

### API Management as an LLM gateway

**What:** A centralized gateway in front of Azure OpenAI and external LLM APIs.

**GenAIOps use cases:**
- **Load balancing**: Distribute traffic across multiple Azure OpenAI instances
- **Throttling**: Rate limiting per user/subscription
- **Token tracking**: Centralized logging of token consumption
- **Cost allocation**: Chargeback to teams based on usage
- **A/B testing**: Route 10% of traffic to the new model and 90% to the old one
- **Circuit breaker**: Failover to a backup LLM provider (OpenAI → Mistral)

**Confidence:** 90%. Using API Management as an LLM gateway is a documented pattern (2025).

---

## Public sector (Norway)

### Compliance dimensions

| Requirement | GenAIOps implication |
|-------------|----------------------|
| **GDPR Article 17 (right to be forgotten)** | Vector stores must support incremental deletion. Azure AI Search supports this. |
| **Utredningsinstruksen (KS/KMD)** | Prompt versioning + evaluation results = audit trail for AI-assisted decisions |
| **NSM Grunnprinsipper for IKT-sikkerhet** | Content Safety must be enabled in production. Azure AI Content Safety runs in real time. |
| **Digdir Prinsipper for utvikling av digitale tjenester** | Human-in-the-loop approval gates in CI/CD (the GenAIOps template supports this) |
| **AI Act (High-Risk AI Systems)** | Logging of all LLM interactions (MLflow tracing + Application Insights) |

**Confidence:** 80%. Interpreting these compliance requirements needs legal input.

### Norwegian language support

**Challenge:** Foundation models (GPT-4, GPT-4o) are trained primarily on English.

**GenAIOps approaches:**
1. **Multilingual prompts**: Explicitly ask for Norwegian output ("Svar på norsk", i.e. "Answer in Norwegian")
2. **Fine-tuning**: Fine-tune GPT-4o on Norwegian datasets (adds hosting cost; often PTU in production)
3. **RAG with Norwegian grounding data**: Norwegian documents in the vector store (the embedding models are multilingual)
4. **NB-BERT embeddings**: Use Norwegian BERT to embed Norwegian documents (custom embeddings in Azure AI Search)

**Confidence:** 70%. Norwegian language support in GenAI is still experimental (2025).

---

## Cost and licensing

### Token-based pricing (Azure OpenAI)

| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) | Use case |
|-------|---------------------------|----------------------------|----------|
| **GPT-4o** | $2.50 | $10.00 | RAG, complex reasoning |
| **GPT-4o-mini** | $0.15 | $0.60 | High-volume classification |
| **GPT-4 Turbo** | $10.00 | $30.00 | Legacy (prefer GPT-4o) |
| **GPT-3.5 Turbo** | $0.50 | $1.50 | Cost-sensitive use cases |
| **text-embedding-3-small** | $0.02 | N/A | Embedding generation |

**Prices as of February 2025 (NOK estimate: USD × 10.5).**

**Confidence:** 95%. Azure OpenAI pricing is documented.

### Provisioned Throughput Units (PTU)

**What:** Dedicated capacity for predictable latency and cost.

**When:** Production workloads with >100M tokens/month.
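Before comparing a workload with PTU, you can estimate its pay-as-you-go cost from the per-1M-token prices in the table above. Input and output volumes are priced separately, because output tokens cost several times more. This is a minimal sketch that ignores prompt-caching and Batch API discounts. The dictionary keys are informal labels, not Azure deployment names, and the prices should be re-checked against current Azure pricing.

```python
# USD per 1M tokens (input, output), from the Azure OpenAI price table above
# (February 2025). Keys are informal labels, not deployment names.
PRICES_PER_1M_TOKENS = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-35-turbo": (0.50, 1.50),
}


def monthly_cost_usd(
    model: str,
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
) -> float:
    """Pay-as-you-go estimate; ignores caching and Batch API discounts."""
    input_price, output_price = PRICES_PER_1M_TOKENS[model]
    input_tokens = requests_per_month * input_tokens_per_request
    output_tokens = requests_per_month * output_tokens_per_request
    # Input and output tokens are billed at different rates, so price them separately
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # 1M requests/month, 1,000 input + 1,000 output tokens each, on GPT-4o
    print(monthly_cost_usd("gpt-4o", 1_000_000, 1_000, 1_000))  # 12500.0
```

The example works out to $2,500 for input plus $10,000 for output. Running the same request profile through `gpt-4o-mini` shows how much model downselection changes the budget.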
**Cost:** $36,000 - $48,000 per PTU per month (depends on model and region).

**Confidence:** 90%. PTU pricing varies and requires a quote from Azure.

### Cost optimization tactics

1. **Prompt compression**: Remove unnecessary tokens from the system prompt
2. **Caching**: Azure OpenAI supports prompt caching (50% discount on cached tokens)
3. **Model downselection**: Use GPT-4o-mini for classification and GPT-4o for reasoning
4. **Batching**: Asynchronous Batch API (50% discount, but higher latency)
5. **Token limits**: Set the `max_tokens` parameter to avoid runaway costs

**Confidence:** 95%. Cost optimization is well documented.

---

## For the architect (Cosmo)

### Questions you should ALWAYS ask

1. **"Do you actually need fine-tuning, or is prompting enough?"**
   - 80% of use cases can be solved with RAG + prompt engineering.
   - Fine-tuning adds ongoing hosting cost (often PTU in production) and more operational complexity.
2. **"What is the quality requirement?"**
   - A 70% pass rate (groundedness) is typical for an MVP.
   - A 90%+ pass rate requires extensive evaluation and tuning.
3. **"Do you have a plan for a human feedback loop?"**
   - Thumbs up/down in the UI → Application Insights → retraining pipeline.
   - Without a feedback loop, quality degradation over time goes unnoticed.
4. **"What is the token budget?"**
   - 1M requests × (1,000 input + 1,000 output tokens) = 2B tokens/month ≈ $12,500 USD with GPT-4o ($2,500 input + $10,000 output).
   - PTU becomes cheaper at >100M tokens/month.
5. **"How do you handle the GDPR right to be forgotten in the vector store?"**
   - Azure AI Search: incremental deletion is supported.
   - Databricks Vector Search: based on Delta tables, with soft delete.

### Red flags

❌ **"We don't need evaluation, we just deploy"**
→ Without groundedness/relevance metrics, there is no way to know whether the LLM is hallucinating.

❌ **"We store all prompts as hardcoded strings"**
→ Prompts MUST be versioned artifacts (Prompt Registry or Git).

❌ **"We only monitor latency, not quality"**
→ An LLM can answer quickly and still be wrong. Quality monitoring is critical.
❌ **"We don't need content safety, it's a B2B system"**
→ Prompt injection attacks can make the LLM leak data even in enterprise systems.

### Recommended steps for a pilot (MVP)

**Weeks 1-2: Setup**
1. Provision an Azure AI Foundry project
2. Deploy Azure OpenAI (GPT-4o + text-embedding-3-small)
3. Set up Azure AI Search (vector index)
4. Enable Azure AI Content Safety

**Weeks 3-4: Development**
1. Build the RAG flow in Prompt Flow
2. Test with 10-20 representative queries
3. Evaluate with the built-in evaluators (groundedness, relevance)
4. Iterate on the chunking strategy and retrieval method

**Weeks 5-6: CI/CD**
1. Clone the GenAIOps Prompt Flow template
2. Set up GitHub Actions / Azure DevOps pipelines
3. Implement the human-in-the-loop approval gate
4. Deploy to the dev endpoint

**Weeks 7-8: Production prep**
1. Set up monitoring (quality + tokens + safety)
2. Implement the feedback loop (thumbs up/down)
3. Load testing (PTU assessment)
4. Deploy to the production endpoint (blue-green)

**Confidence:** 90%. Based on the Microsoft LLMOps workshop (2025).

---

## Sources and verification

### Microsoft Learn sources (18 documents)

1. [Advance your maturity level for GenAIOps](https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/concept-llmops-maturity)
2. [GenAIOps with prompt flow and Azure DevOps](https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/how-to-end-to-end-azure-devops-with-prompt-flow)
3. [GenAIOps with prompt flow and GitHub](https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/how-to-end-to-end-llmops-with-prompt-flow)
4. [Generative AI operations for organizations with MLOps investments](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/genaiops-for-mlops)
5. [LLMOps workflows on Azure Databricks](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/mlops/llmops)
6. [MLOps and GenAIOps for AI workloads on Azure](https://learn.microsoft.com/en-us/azure/well-architected/ai/mlops-genaiops)
7. [Integrate prompt flow with DevOps for LLM-based applications](https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/how-to-integrate-with-llm-app-devops)
8. [Azure AI Evaluation SDK](https://learn.microsoft.com/en-us/python/api/overview/azure/ai-evaluation-readme)
9. [Mosaic AI capabilities for GenAI](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/guide/mosaic-ai-gen-ai-capabilities)
10. [MLflow Prompt Registry](https://learn.microsoft.com/en-us/azure/databricks/mlflow3/genai/prompt-version-mgmt/prompt-registry/)
11. [Azure AI Foundry monitoring](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/monitor-quality-safety)
12. [MLflow Tracing for GenAI](https://learn.microsoft.com/en-us/azure/databricks/mlflow3/genai/tracing/)
13. [GenAI app developer workflow](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/tutorials/ai-cookbook/genai-developer-workflow)
14. [Plan and prepare a GenAIOps solution (Microsoft Learn Training)](https://learn.microsoft.com/en-us/training/modules/plan-prepare-genaiops/)
15. [Implement LLMOps in Azure Databricks (Microsoft Learn Training)](https://learn.microsoft.com/en-us/training/modules/implement-llmops-azure-databricks/)
16. [Azure OpenAI Gateway Guide](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/azure-openai-gateway-guide)
17. [RAG solution design and evaluation guide](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/rag/rag-solution-design-and-evaluation-guide)
18. [Microsoft GenAIOps Prompt Flow Template (GitHub)](https://github.com/microsoft/genaiops-promptflow-template)

### MCP calls performed

- **microsoft_docs_search**: 3 searches (GenAIOps overview, LLMOps best practices, lifecycle)
- **microsoft_docs_fetch**: 3 fetches (maturity model, genaiops-for-mlops, databricks llmops)
- **microsoft_code_sample_search**: 2 searches (evaluation Python code, monitoring code)

**Total:** 18 sources, 8 MCP calls.
**Verification date:** 2026-02-04

---

**For Cosmo Skyberg:** This knowledge file covers the **operational framework** for GenAI solutions, meaning how to go from prototype to production with repeatable processes. The focus is on **Microsoft-specific tools** (Azure AI Foundry, Prompt Flow, MLflow, Databricks Mosaic AI), but the principles carry over to other platforms.

Key takeaway: **GenAIOps is MLOps + Prompt Ops + Orchestration Ops + Vector Store Ops**. It is MORE than model deployment. It covers the whole ecosystem around LLM-based applications.

When customers ask "how do we put LLMs into production?", start with the **GenAIOps Maturity Model** to map where they are. Then use the **GenAIOps Prompt Flow Template** as a concrete starting point.