# GenAI-Specific APIM Policies & Rules

**Last updated:** 2026-02
**Status:** GA
**Category:** API Management & AI Gateway

---

## Introduction

Azure API Management (APIM) includes a set of policies designed specifically for generative AI (GenAI). These policies go beyond traditional API gateway functionality and address challenges unique to AI workloads: content safety moderation, prompt validation, token-based rate limiting, semantic caching, and audit logging of prompts and completions. Together they form the core of APIM's AI gateway capability.

For the Norwegian public sector, GenAI-specific policies are critically important. Requirements from the AI Act, Datatilsynet, and NSM mean that AI systems must have mechanisms for content safety, logging for auditability, and control over what kind of content is generated. APIM policies provide these controls without each application having to implement them itself, giving a centralized, consistent approach to AI governance.

This reference covers all GenAI-specific APIM policies with XML examples, configuration parameters, and best practices. The policies can be combined freely in APIM's inbound/outbound policy pipeline to build a complete AI safety stack.

---

## Content Safety Integration

### llm-content-safety Policy

The policy sends LLM requests to Azure AI Content Safety for moderation BEFORE they are forwarded to the backend model:

```xml
<!-- The backend-id and shield-prompt values are assumed for illustration -->
<llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
    <blocklists>
        <id>custom-blocklist-pii</id>
        <id>custom-blocklist-org-specific</id>
    </blocklists>
</llm-content-safety>
```

### Prerequisites for Content Safety

```bicep
// 1. Azure AI Content Safety resource
resource contentSafety 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: 'content-safety-service'
  location: 'westeurope'
  kind: 'ContentSafety'
  sku: {
    name: 'S0'
  }
  properties: {
    publicNetworkAccess: 'Disabled'
  }
}

// 2. APIM Backend for Content Safety
resource contentSafetyBackend 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: 'ai-gateway-apim/content-safety-backend'
  properties: {
    url: 'https://content-safety-service.cognitiveservices.azure.com'
    protocol: 'http'
    credentials: {
      authorization: {
        scheme: 'managed-identity'
        parameter: 'https://cognitiveservices.azure.com'
      }
    }
  }
}
```

### Content Safety Configuration

| Attribute | Description | Default |
|-----------|-------------|---------|
| `backend-id` | Backend entity for Content Safety | Required (no default) |
| `shield-prompt` | Check for adversarial attacks (jailbreak) | `false` |
| `enforce-on-completions` | Also check the response from the model | `false` |

### Categories and Thresholds

| Category | Description | Recommended threshold (public sector) |
|----------|-------------|---------------------------------------|
| `Hate` | Hateful content, discrimination | 2-4 (strict) |
| `Violence` | Violent content | 2-4 (strict) |
| `SelfHarm` | Self-harm | 2 (very strict) |
| `Sexual` | Sexual content | 2 (very strict) |

**Threshold scale:** 0 (most restrictive) to 7 (least restrictive). A lower value means more requests are blocked.

### Severity Level Output Types

| Output Type | Levels | Use case |
|-------------|--------|----------|
| `FourSeverityLevels` | 0, 2, 4, 6 | Default, simpler policy |
| `EightSeverityLevels` | 0-7 | Fine-grained control |

### Blocked Request Response

When Content Safety blocks a request:

```json
{
  "statusCode": 403,
  "message": "Content safety violation detected. The request has been blocked."
}
```

---

## Prompt Validation Policies

### Custom Prompt Validation

Beyond Azure AI Content Safety, you can implement your own prompt validation rules:

```xml
<!-- Inbound section. The 400 status code is an assumption. -->
<choose>
    <!-- Reject prompts whose combined message content is too long -->
    <when condition="@{
        var body = context.Request.Body.As<JObject>(preserveContent: true);
        var messages = body["messages"] as JArray ?? new JArray();
        var totalLength = messages.Sum(m => m["content"]?.ToString().Length ?? 0);
        return totalLength > 50000;
    }">
        <return-response>
            <set-status code="400" reason="Bad Request" />
            <set-body>{
                "error": {
                    "code": "PromptTooLong",
                    "message": "Total prompt length exceeds 50,000 characters."
                }
            }</set-body>
        </return-response>
    </when>
    <!-- Require a system message -->
    <when condition="@{
        var body = context.Request.Body.As<JObject>(preserveContent: true);
        var messages = body["messages"] as JArray ?? new JArray();
        return !messages.Any(m => m["role"]?.ToString() == "system");
    }">
        <return-response>
            <set-status code="400" reason="Bad Request" />
            <set-body>{
                "error": {
                    "code": "SystemMessageRequired",
                    "message": "A system message is required for all AI requests."
                }
            }</set-body>
        </return-response>
    </when>
    <!-- Allow at most one system message -->
    <when condition="@{
        var body = context.Request.Body.As<JObject>(preserveContent: true);
        var messages = body["messages"] as JArray ?? new JArray();
        var systemMessages = messages.Where(m => m["role"]?.ToString() == "system").ToList();
        return systemMessages.Count > 1;
    }">
        <return-response>
            <set-status code="400" reason="Bad Request" />
            <set-body>{
                "error": {
                    "code": "MultipleSystemMessages",
                    "message": "Only one system message is allowed per request."
                }
            }</set-body>
        </return-response>
    </when>
</choose>
```

### Inject Mandatory System Prompt

Enforce a standard system prompt for all requests:

```xml
<set-body>@{
    var body = context.Request.Body.As<JObject>(preserveContent: true);
    var messages = body["messages"] as JArray ?? new JArray();

    // The organization's mandatory system prompt
    var orgSystemPrompt = new JObject {
        ["role"] = "system",
        ["content"] = "You are a helpful assistant for Statens vegvesen. " +
            "You must respond in Norwegian unless explicitly asked otherwise. " +
            "Never share personal data, internal processes, or confidential information. " +
            "Always cite sources when providing factual information."
    };

    // Remove existing system messages and insert the organization's prompt first
    var userMessages = new JArray(messages.Where(m => m["role"]?.ToString() != "system"));
    userMessages.Insert(0, orgSystemPrompt);
    body["messages"] = userMessages;
    return body.ToString();
}</set-body>
```

---

## Response Filtering

### Filtering Sensitive Information from Responses

```xml
<set-body>@{
    var body = context.Response.Body.As<JObject>(preserveContent: true);
    var choices = body?["choices"] as JArray;
    if (choices == null) return body?.ToString();

    foreach (var choice in choices) {
        var content = choice["message"]?["content"]?.ToString();
        if (content == null) continue;

        // Redact Norwegian national identity numbers (11 digits)
        content = System.Text.RegularExpressions.Regex.Replace(
            content, @"\b\d{11}\b", "[REDACTED-PII]");

        // Redact email addresses
        content = System.Text.RegularExpressions.Regex.Replace(
            content, @"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED-EMAIL]");

        // Redact phone numbers (Norwegian format)
        content = System.Text.RegularExpressions.Regex.Replace(
            content, @"\b(\+47)?\s?\d{2}\s?\d{2}\s?\d{2}\s?\d{2}\b", "[REDACTED-PHONE]");

        choice["message"]["content"] = content;
    }
    return body.ToString();
}</set-body>
```

### Add a Disclaimer to Responses

```xml
<!-- Assumed: the disclaimer is delivered as a response header; the header name is hypothetical -->
<set-header name="X-AI-Disclaimer" exists-action="override">
    <value>AI-generated content. Verify information before use.</value>
</set-header>
```

---

## Rate Limiting per Model

### Token Rate Limiting (llm-token-limit)

Limit token consumption per consumer, per model:

```xml
<!-- Illustrative values. The counter key combines the subscription with the
     deployment name, assuming the API exposes a deployment-id URL parameter. -->
<llm-token-limit
    counter-key="@(context.Subscription.Id + "-" + context.Request.MatchedParameters["deployment-id"])"
    tokens-per-minute="5000"
    estimate-prompt-tokens="true" />
```

### Token Quota (Periodic)

Set token quotas per day, week, or month:

```xml
<!-- Inbound: the quota size and the header name below are illustrative -->
<llm-token-limit
    counter-key="@(context.Subscription.Id)"
    token-quota="100000"
    token-quota-period="Daily"
    remaining-quota-tokens-variable-name="dailyRemaining" />

<!-- Outbound: expose the remaining quota to the caller (type argument assumed to be int) -->
<set-header name="x-daily-tokens-remaining" exists-action="override">
    <value>@(context.Variables.GetValueOrDefault<int>("dailyRemaining").ToString())</value>
</set-header>
```

### Prompt Token Pre-calculation

`estimate-prompt-tokens="true"` lets APIM estimate prompt tokens BEFORE the request is sent to the backend. If the prompt already exceeds the limit, a 429 is returned immediately:

```
With pre-calculation:
  Client → APIM (estimates: 8000 tokens, limit: 5000)
  → 429 returned
  → No request to Azure OpenAI
  → saves backend capacity

Without pre-calculation:
  Client → APIM → Azure OpenAI (uses 8000 tokens)
  → Response → APIM counts
  → Next request: 429
  → Tokens already consumed
```

### Multi-Region Rate Limiting

**Important:** Rate limiting policies (`llm-token-limit`, `rate-limit`) count SEPARATELY per regional gateway in multi-region deployments:

| Policy | Scope | Multi-region behavior |
|--------|-------|-----------------------|
| `llm-token-limit` | Per gateway | Separate counters per region |
| `rate-limit` | Per gateway | Separate counters per region |
| `quota` | Global (instance) | One global counter |
| `quota-by-key` | Global (instance) | One global counter |

To achieve global rate limiting, use `quota-by-key` instead of `llm-token-limit`.

---

## Audit Logging for Prompts

### Enabling LLM API Logging

```
1. APIM → Monitoring → Diagnostic settings
2. "+ Add diagnostic setting"
3. Select "Logs related to generative AI gateway"
4. Destination: Log Analytics workspace
5. Save
6. APIM → APIs → [your API] → Settings → Diagnostic Logs
7. Azure Monitor → Log LLM messages: Enabled
8. Log prompts: 32768 bytes
9. Log completions: 32768 bytes
10. Save
```

### Log Schema: ApiManagementGatewayLlmLog

| Field | Description | Example |
|-------|-------------|---------|
| `TimeGenerated` | Time of the request | 2026-02-11T10:30:00Z |
| `CorrelationId` | Unique request ID | abc-123-def |
| `OperationName` | API operation | ChatCompletions |
| `ModelDeployment` | Deployment name | gpt-4o |
| `PromptTokens` | Number of prompt tokens | 150 |
| `CompletionTokens` | Number of completion tokens | 250 |
| `TotalTokens` | Total token consumption | 400 |
| `RequestMessages` | Prompt content (JSON) | [{"role":"user","content":"..."}] |
| `ResponseMessages` | Completion content (JSON) | [{"content":"..."}] |

### KQL: Audit Trail for AI Requests

```kusto
// Full audit trail with prompt and response
ApiManagementGatewayLlmLog
| where TimeGenerated > ago(24h)
| extend RequestArray = parse_json(RequestMessages)
| extend ResponseArray = parse_json(ResponseMessages)
| mv-expand RequestArray
| mv-expand ResponseArray
| project
    TimeGenerated,
    CorrelationId,
    Model = ModelDeployment,
    PromptTokens,
    CompletionTokens,
    Prompt = tostring(RequestArray.content),
    Response = tostring(ResponseArray.content)
| summarize
    Input = strcat_array(make_list(Prompt), " "),
    Output = strcat_array(make_list(Response), " ")
    by CorrelationId, TimeGenerated, Model, PromptTokens, CompletionTokens
| where isnotempty(Input) and isnotempty(Output)
| order by TimeGenerated desc
```

### KQL: Detecting Anomalies

```kusto
// Find unusually high token usage per user
ApiManagementGatewayLlmLog
| where TimeGenerated > ago(7d)
| extend UserId = tostring(CustomDimensions["UserId"])
| summarize
    AvgTokens = avg(toint(TotalTokens)),
    MaxTokens = max(toint(TotalTokens)),
    P95Tokens = percentile(toint(TotalTokens), 95),
    RequestCount = count()
    by UserId, bin(TimeGenerated, 1h)
| where MaxTokens > 3 * AvgTokens // Flag anomalies
| order by MaxTokens desc
```

### Event Hub Logging for Real-time Monitoring

```xml
<!-- The logger-id is hypothetical; use the ID of an Event Hub logger configured in APIM -->
<log-to-eventhub logger-id="ai-gateway-eventhub-logger">@{
    var body = context.Response.Body.As<JObject>(preserveContent: true);
    var usage = body?["usage"];
    return new JObject(
        new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
        new JProperty("correlationId", context.RequestId),
        new JProperty("subscriptionId", context.Subscription?.Id),
        new JProperty("apiId", context.Api?.Id),
        new JProperty("model", body?["model"]?.ToString()),
        new JProperty("promptTokens", usage?["prompt_tokens"]),
        new JProperty("completionTokens", usage?["completion_tokens"]),
        new JProperty("totalTokens", usage?["total_tokens"]),
        new JProperty("statusCode", context.Response.StatusCode),
        new JProperty("region", context.Deployment.Region),
        new JProperty("latencyMs", context.Elapsed.TotalMilliseconds)
    ).ToString();
}</log-to-eventhub>
```

---

## Complete GenAI Policy Stack

### Full Inbound + Outbound Policy

```xml
<!-- Abbreviated skeleton. The 500 status code is an assumption. -->
<policies>
    <inbound>
        <base />
        <!-- Steps 1-6 of the execution order below go here -->
        <!-- Step 7: acquire a managed identity token and pass it to the backend -->
        <authentication-managed-identity
            resource="https://cognitiveservices.azure.com"
            output-token-variable-name="mi-token" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["mi-token"])</value>
        </set-header>
    </inbound>
    <backend>
        <forward-request />
    </backend>
    <outbound>
        <base />
        <!-- Steps 9-10 of the execution order below go here -->
    </outbound>
    <on-error>
        <base />
        <return-response>
            <set-status code="500" reason="Internal Server Error" />
            <set-body>{
                "error": {
                    "code": "GatewayError",
                    "message": "An error occurred processing your AI request."
                }
            }</set-body>
        </return-response>
    </on-error>
</policies>
```

### Policy Execution Order

```
Inbound (top to bottom):
  1. Authentication (validate-azure-ad-token)
  2. Variable extraction (set-variable)
  3. Token rate limiting (llm-token-limit)
  4. Content Safety (llm-content-safety)
  5. Cache lookup (llm-semantic-cache-lookup)
  6. Backend selection (set-backend-service)
  7. Backend auth (authentication-managed-identity)

Backend:
  8. Forward request (forward-request)

Outbound (top to bottom):
  9. Cache store (llm-semantic-cache-store)
  10. Emit metrics (llm-emit-token-metric)
```

---

## GenAI Policy Reference

### All GenAI-Specific Policies

| Policy | Phase | Purpose |
|--------|-------|---------|
| `llm-content-safety` | Inbound | Content Safety moderation |
| `llm-token-limit` | Inbound | Token rate limiting |
| `llm-semantic-cache-lookup` | Inbound | Semantic cache lookup |
| `llm-semantic-cache-store` | Outbound | Store in semantic cache |
| `llm-emit-token-metric` | Outbound | Emit token metrics |

### Compatibility

| Policy | Classic | V2 | Consumption | Self-hosted | Workspace |
|--------|---------|-----|-------------|-------------|-----------|
| `llm-content-safety` | Yes | Yes | Yes | Yes | Yes |
| `llm-token-limit` | Yes | Yes | Yes | Yes | Yes |
| `llm-semantic-cache-lookup` | Yes | Yes | No | No | Yes |
| `llm-semantic-cache-store` | Yes | Yes | No | No | Yes |
| `llm-emit-token-metric` | Yes | Yes | Yes | Yes | Yes |

---

## References

- [AI gateway in Azure API Management](https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities): complete overview of AI gateway capabilities
- [Enforce content safety checks on LLM requests](https://learn.microsoft.com/en-us/azure/api-management/llm-content-safety-policy): llm-content-safety policy reference
- [LLM token limit policy](https://learn.microsoft.com/en-us/azure/api-management/llm-token-limit-policy): llm-token-limit policy reference
- [llm-emit-token-metric policy](https://learn.microsoft.com/en-us/azure/api-management/llm-emit-token-metric-policy): token metric policy reference
- [llm-semantic-cache-lookup policy](https://learn.microsoft.com/en-us/azure/api-management/llm-semantic-cache-lookup-policy): semantic cache lookup reference
- [llm-semantic-cache-store policy](https://learn.microsoft.com/en-us/azure/api-management/llm-semantic-cache-store-policy): semantic cache store reference
- [Prompt Shields - Azure AI Content Safety](https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection): Prompt Shields documentation
- [Log token usage, prompts, and completions](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-llm-logs): LLM logging in APIM

---

## For Cosmo

- **Use this reference** when customers need to implement AI safety and governance through APIM policies, especially content safety, prompt validation, and audit logging.
- The most important policy for the Norwegian public sector is `llm-content-safety` with `shield-prompt="true"`, which blocks jailbreak attempts and unwanted content BEFORE it reaches the model.
- Remember the order: authentication FIRST, then rate limiting, THEN content safety, THEN cache lookup. Each Content Safety check is a billable call to the Content Safety API, and placing the cache lookup after content safety means the cache only ever contains "approved" content.
- For audit logging: enable LLM API logging in Diagnostic Settings. This gives full auditability of all prompts and completions, which is required under the AI Act for high-risk AI systems.
- Rate limiting per model matters: GPT-4o is more expensive than GPT-4o-mini and should have stricter token limits to control costs.
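
The ordering in the notes above (authentication, then rate limiting, then content safety, then cache lookup) and the idea of stricter token limits for more expensive models can be illustrated with a small Python sketch. This is a toy model for reasoning about the order of checks, not how APIM implements these policies. The `Gateway` class, the values in `MODEL_TPM_LIMITS`, and the substring-based safety check are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field

# Hypothetical per-model token limits; the numbers are illustrative only.
MODEL_TPM_LIMITS = {"gpt-4o": 5_000, "gpt-4o-mini": 20_000}


@dataclass
class Gateway:
    """Toy model of the inbound ordering; not an APIM implementation."""
    used_tokens: dict = field(default_factory=dict)  # (caller, model) -> tokens used
    cache: dict = field(default_factory=dict)        # prompt -> cached completion
    blocked_terms: tuple = ("ignore previous instructions",)

    def handle(self, caller, model, prompt, estimated_tokens):
        # 1. Authentication first: anonymous callers are rejected before any other work.
        if caller is None:
            return 401, "unauthenticated"

        # 2. Token limit, counted per caller and per model.
        key = (caller, model)
        used = self.used_tokens.get(key, 0)
        if used + estimated_tokens > MODEL_TPM_LIMITS[model]:
            return 429, "token limit exceeded"

        # 3. Content safety before the cache, so blocked prompts are
        #    never served from the cache and never stored in it.
        if any(term in prompt.lower() for term in self.blocked_terms):
            return 403, "content safety violation"

        # 4. Cache lookup: only prompts that passed step 3 reach this point.
        if prompt in self.cache:
            return 200, self.cache[prompt]

        # Stand-in for the backend call; tokens are only counted on a cache miss.
        completion = f"completion for: {prompt}"
        self.used_tokens[key] = used + estimated_tokens
        self.cache[prompt] = completion
        return 200, completion
```

In this sketch, a 6000-token request is rejected for `gpt-4o` but accepted for `gpt-4o-mini`, and a prompt that fails the safety check never appears in `cache`, which mirrors the reasoning for running content safety before the cache lookup.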