# GenAI-Specific APIM Policies & Rules
**Last updated:** 2026-02
**Status:** GA
**Category:** API Management & AI Gateway
---
## Introduction
Azure API Management (APIM) includes a set of policies designed specifically for generative AI (GenAI). These policies go beyond traditional API gateway functionality and address challenges unique to AI workloads: content safety moderation, prompt validation, token-based rate limiting, semantic caching, and audit logging of prompts and completions. Together they form the core of APIM's AI gateway capability.
For the Norwegian public sector, GenAI-specific policies are critically important. Requirements from the AI Act, Datatilsynet (the Norwegian Data Protection Authority), and NSM (the Norwegian National Security Authority) mean that AI systems must have mechanisms for content safety, logging for auditability, and control over what kind of content is generated. APIM policies provide these controls without each application having to implement them itself, giving a centralized, consistent approach to AI governance.
This reference covers all GenAI-specific APIM policies with complete XML examples, configuration parameters, and best practices. The policies can be combined freely in APIM's inbound/outbound policy pipeline to build a complete AI safety stack.
---
## Content Safety Integration
### llm-content-safety Policy
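The policy sends LLM requests to Azure AI Content Safety for moderation BEFORE they are forwarded to the backend model. Its `blocklists` element names the custom blocklists to check prompts against:
```xml
<blocklists>
    <id>custom-blocklist-pii</id>
    <id>custom-blocklist-org-specific</id>
</blocklists>
```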
### Prerequisites for Content Safety
```bicep
// 1. Azure AI Content Safety resource
resource contentSafety 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: 'content-safety-service'
  location: 'westeurope'
  kind: 'ContentSafety'
  sku: { name: 'S0' }
  properties: {
    publicNetworkAccess: 'Disabled'
  }
}

// 2. APIM backend for Content Safety
resource contentSafetyBackend 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: 'ai-gateway-apim/content-safety-backend'
  properties: {
    url: 'https://content-safety-service.cognitiveservices.azure.com'
    protocol: 'http'
    credentials: {
      authorization: {
        scheme: 'managed-identity'
        parameter: 'https://cognitiveservices.azure.com'
      }
    }
  }
}
```
### Content Safety Configuration
| Attribute | Description | Default |
|-----------|-------------|---------|
| `backend-id` | Backend entity for Content Safety | Required |
| `shield-prompt` | Check for adversarial attacks (jailbreak) | `false` |
| `enforce-on-completions` | Also check the response from the model | `false` |
### Categories and Threshold Values
| Category | Description | Recommended threshold (public sector) |
|----------|-------------|-------------------------------------|
| `Hate` | Hateful content, discrimination | 2-4 (strict) |
| `Violence` | Violent content | 2-4 (strict) |
| `SelfHarm` | Self-harm | 2 (very strict) |
| `Sexual` | Sexual content | 2 (very strict) |
**Threshold scale:** 0 (most restrictive) to 7 (least restrictive). A lower value means more requests are blocked.
### Severity Level Output Types
| Output type | Levels | Use case |
|------------|--------|------------|
| `FourSeverityLevels` | 0, 2, 4, 6 | Default, simpler policy |
| `EightSeverityLevels` | 0-7 | Fine-grained control |
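To make the threshold and output-type semantics above concrete, here is a minimal Python sketch. It assumes that a request is blocked when a category's severity is greater than or equal to its threshold, and that `FourSeverityLevels` collapses the eight-level scale by rounding each severity down to the nearest even value. The function names, the sample values, and both rules are illustrative assumptions, not the gateway's implementation.

```python
def to_four_levels(severity: int) -> int:
    """Collapse an 8-level severity (0-7) to the 4-level scale (0, 2, 4, 6).

    Assumption: each pair of levels maps down to its even value (0-1 -> 0, 2-3 -> 2, ...).
    """
    if not 0 <= severity <= 7:
        raise ValueError("severity must be between 0 and 7")
    return severity - (severity % 2)


def blocked_categories(severities: dict, thresholds: dict) -> list:
    """Return the categories whose severity reaches the configured threshold.

    Assumption: a request is blocked when severity >= threshold.
    Categories without a configured threshold are not checked.
    """
    return sorted(
        name
        for name, severity in severities.items()
        if name in thresholds and severity >= thresholds[name]
    )


# Hypothetical thresholds in the spirit of the table above.
thresholds = {"Hate": 4, "Violence": 4, "SelfHarm": 2, "Sexual": 2}
# Hypothetical severities returned for one prompt.
severities = {"Hate": 3, "Violence": 5, "SelfHarm": 2, "Sexual": 0}

print(blocked_categories(severities, thresholds))  # ['SelfHarm', 'Violence']
print([to_four_levels(s) for s in range(8)])       # [0, 0, 2, 2, 4, 4, 6, 6]
```

Under these assumptions, lowering a threshold from 4 to 2 would also block the `Hate` severity of 3 in the sample, which is what "lower value, more requests blocked" means in practice.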
### Blocked Request Response
When Content Safety blocks a request:
```json
{
  "statusCode": 403,
  "message": "Content safety violation detected. The request has been blocked."
}
```
---
## Prompt Validation Policies
### Custom Prompt Validation
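In addition to Azure AI Content Safety, you can implement your own prompt validation rules:
```xml
<choose>
    <!-- Status code 400 is an assumed choice; adjust to your API's error contract -->
    <!-- Rule 1: total prompt length must not exceed 50,000 characters -->
    <when condition="@{
        var messages = context.Request.Body.As<JObject>(preserveContent: true)?["messages"] as JArray ?? new JArray();
        var totalLength = messages.Sum(m => m["content"]?.ToString().Length ?? 0);
        return totalLength > 50000;
    }">
        <return-response>
            <set-status code="400" reason="Bad Request" />
            <set-body>{
                "error": {
                    "code": "PromptTooLong",
                    "message": "Total prompt length exceeds 50,000 characters."
                }
            }</set-body>
        </return-response>
    </when>
    <!-- Rule 2: a system message must be present -->
    <when condition="@{
        var messages = context.Request.Body.As<JObject>(preserveContent: true)?["messages"] as JArray ?? new JArray();
        return !messages.Any(m => m["role"]?.ToString() == "system");
    }">
        <return-response>
            <set-status code="400" reason="Bad Request" />
            <set-body>{
                "error": {
                    "code": "SystemMessageRequired",
                    "message": "A system message is required for all AI requests."
                }
            }</set-body>
        </return-response>
    </when>
    <!-- Rule 3: at most one system message is allowed -->
    <when condition="@{
        var messages = context.Request.Body.As<JObject>(preserveContent: true)?["messages"] as JArray ?? new JArray();
        var systemMessages = messages.Where(m => m["role"]?.ToString() == "system").ToList();
        return systemMessages.Count > 1;
    }">
        <return-response>
            <set-status code="400" reason="Bad Request" />
            <set-body>{
                "error": {
                    "code": "MultipleSystemMessages",
                    "message": "Only one system message is allowed per request."
                }
            }</set-body>
        </return-response>
    </when>
</choose>
```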
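The same three rules can be exercised outside the gateway, for example in unit tests for client payloads. The Python sketch below is an illustrative mirror of the checks and not part of APIM. The function name `validate_prompt` is hypothetical; only the 50,000-character limit and the error codes come from the policy above.

```python
from typing import Optional

MAX_PROMPT_CHARS = 50_000  # same limit as the policy


def validate_prompt(body: dict) -> Optional[str]:
    """Return an error code if the chat request violates a rule, else None."""
    messages = body.get("messages") or []

    # Rule 1: combined length of all message contents.
    total_length = sum(len(str(m.get("content") or "")) for m in messages)
    if total_length > MAX_PROMPT_CHARS:
        return "PromptTooLong"

    # Rules 2 and 3: exactly one system message is expected.
    system_count = sum(1 for m in messages if m.get("role") == "system")
    if system_count == 0:
        return "SystemMessageRequired"
    if system_count > 1:
        return "MultipleSystemMessages"
    return None


valid_request = {
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ]
}
print(validate_prompt(valid_request))  # None
print(validate_prompt({"messages": [{"role": "user", "content": "Hi"}]}))  # SystemMessageRequired
```

Running the checks client-side does not replace the gateway policy; it only gives earlier feedback on payloads the gateway would reject.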
### Inject Mandatory System Prompt
Force a standard system prompt for all requests:
```xml
<set-body>@{
    var body = context.Request.Body.As<JObject>(preserveContent: true);
    var messages = body["messages"] as JArray ?? new JArray();
    // The organization's mandatory system prompt
    var orgSystemPrompt = new JObject {
        ["role"] = "system",
        ["content"] = "You are a helpful assistant for Statens vegvesen. " +
            "You must respond in Norwegian unless explicitly asked otherwise. " +
            "Never share personal data, internal processes, or confidential information. " +
            "Always cite sources when providing factual information."
    };
    // Remove existing system messages and insert the organization's prompt first
    var userMessages = new JArray(messages.Where(m => m["role"]?.ToString() != "system"));
    userMessages.Insert(0, orgSystemPrompt);
    body["messages"] = userMessages;
    return body.ToString();
}</set-body>
```
---
## Response Filtering
### Filter Sensitive Information from Responses
```xml
<set-body>@{
    var body = context.Response.Body.As<JObject>(preserveContent: true);
    var choices = body?["choices"] as JArray;
    if (choices == null) return body.ToString();
    foreach (var choice in choices)
    {
        var content = choice["message"]?["content"]?.ToString();
        if (content == null) continue;
        // Redact national identity numbers (fødselsnummer, 11 digits)
        content = System.Text.RegularExpressions.Regex.Replace(
            content, @"\b\d{11}\b", "[REDACTED-PII]");
        // Redact e-mail addresses
        content = System.Text.RegularExpressions.Regex.Replace(
            content, @"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED-EMAIL]");
        // Redact phone numbers (Norwegian format); the optional +47 prefix is part of the match
        content = System.Text.RegularExpressions.Regex.Replace(
            content, @"(?:\+47\s?|\b)\d{2}\s?\d{2}\s?\d{2}\s?\d{2}\b", "[REDACTED-PHONE]");
        choice["message"]["content"] = content;
    }
    return body.ToString();
}</set-body>
```
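Redaction patterns like these are easy to get subtly wrong, so it can help to try them against sample text before deploying the policy. The Python sketch below applies three patterns in the same order as the policy. It is an illustrative test harness, not gateway code: the `redact` helper and the sample sentence are hypothetical, and the phone pattern here is written so that a leading `+47` is treated as part of the number.

```python
import re

# Patterns are applied in order; 11-digit identity numbers are handled first.
REDACTIONS = [
    (re.compile(r"\b\d{11}\b"), "[REDACTED-PII]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED-EMAIL]"),
    # Assumption: an optional +47 prefix belongs to the phone number.
    (re.compile(r"(?:\+47\s?|\b)\d{2}\s?\d{2}\s?\d{2}\s?\d{2}\b"), "[REDACTED-PHONE]"),
]


def redact(text: str) -> str:
    """Apply every redaction pattern to the text and return the result."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


sample = "Born 01019912345, mail ola.nordmann@example.no, call +47 22 33 44 55."
print(redact(sample))
# Born [REDACTED-PII], mail [REDACTED-EMAIL], call [REDACTED-PHONE].
```

Pattern-based redaction is best-effort: the first pattern matches any standalone 11-digit number, not only valid identity numbers, and unusual phone or e-mail formats can slip through.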
### Add a Disclaimer to Responses
```xml
AI-generated content. Verify information before use.
```
---
## Rate Limiting per Model
### Token Rate Limiting (llm-token-limit)
Limit token consumption per consumer, per model:
```xml
```
### Token Quota (Periodic)
Set token quotas per day, week, or month:
```xml
@(context.Variables.GetValueOrDefault("dailyRemaining").ToString())
```
### Prompt Token Pre-calculation
`estimate-prompt-tokens="true"` lets APIM estimate prompt tokens BEFORE the request is sent to the backend. If the prompt already exceeds the limit, a 429 is returned immediately:
```
With pre-calculation:
Client → APIM (estimates: 8000 tokens, limit: 5000) → 429 returned
→ No request to Azure OpenAI → saves backend capacity
Without pre-calculation:
Client → APIM → Azure OpenAI (uses 8000 tokens) → Response → APIM counts → Next request: 429
→ Tokens already consumed
```
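The difference between the two flows can be sketched as a toy model in Python. This is a simplification for illustration only: the `TokenWindow` class is hypothetical, it models a single counting window without resets, and it is not how APIM implements the policy.

```python
class TokenWindow:
    """Toy model of a per-window token limit (not APIM's implementation)."""

    def __init__(self, limit: int, estimate_prompt_tokens: bool) -> None:
        self.limit = limit
        self.estimate_prompt_tokens = estimate_prompt_tokens
        self.used = 0           # tokens counted so far in this window
        self.backend_calls = 0  # requests that actually reached the backend

    def handle(self, prompt_tokens: int, completion_tokens: int = 0) -> int:
        """Return the HTTP status the client would see for one request."""
        if self.used >= self.limit:
            return 429  # window already exhausted by earlier requests
        if self.estimate_prompt_tokens and self.used + prompt_tokens > self.limit:
            return 429  # rejected up front; the backend is never called
        self.backend_calls += 1
        self.used += prompt_tokens + completion_tokens
        return 200


with_estimate = TokenWindow(limit=5000, estimate_prompt_tokens=True)
print(with_estimate.handle(8000), with_estimate.backend_calls)  # 429 0

without_estimate = TokenWindow(limit=5000, estimate_prompt_tokens=False)
print(without_estimate.handle(8000), without_estimate.handle(100))  # 200 429
```

In the second case the oversized request is served and only the following request is rejected, which matches the "tokens already consumed" branch of the diagram.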
### Multi-Region Rate Limiting
**Important:** Rate limiting policies (`llm-token-limit`, `rate-limit`) count SEPARATELY per regional gateway in multi-region deployments:
| Policy | Scope | Multi-region behavior |
|--------|-------|----------------------|
| `llm-token-limit` | Per gateway | Separate counters per region |
| `rate-limit` | Per gateway | Separate counters per region |
| `quota` | Global (instance) | One global counter |
| `quota-by-key` | Global (instance) | One global counter |
To achieve a global limit, use `quota-by-key` instead of `llm-token-limit`. Note that `quota-by-key` counts calls or bandwidth rather than tokens, so it caps request volume, not token consumption.
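The practical consequence of per-gateway counters is simple arithmetic, sketched below in Python. The `effective_limit` helper is hypothetical, and it assumes the worst case in which a caller can reach every regional gateway.

```python
def effective_limit(per_gateway_limit: int, regions: int, global_counter: bool) -> int:
    """Worst-case number of units a caller can consume across all regions."""
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # A global counter is shared; per-gateway counters add up across regions.
    return per_gateway_limit if global_counter else per_gateway_limit * regions


print(effective_limit(5000, regions=3, global_counter=False))  # 15000
print(effective_limit(5000, regions=3, global_counter=True))   # 5000
```

With three regions, a per-gateway limit of 5000 can therefore allow up to 15000 units in total, so per-gateway limits may need to be divided by the number of regions to stay within a global budget.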
---
## Audit Logging for Prompts
### Enable LLM API Logging
```
1. APIM → Monitoring → Diagnostic settings
2. "+ Add diagnostic setting"
3. Select "Logs related to generative AI gateway"
4. Destination: Log Analytics workspace
5. Save
6. APIM → APIs → [your API] → Settings → Diagnostic Logs
7. Azure Monitor → Log LLM messages: Enabled
8. Log prompts: 32768 bytes
9. Log completions: 32768 bytes
10. Save
```
### Log Schema: ApiManagementGatewayLlmLog
| Field | Description | Example |
|------|-------------|---------|
| `TimeGenerated` | Time of the request | 2026-02-11T10:30:00Z |
| `CorrelationId` | Unique request ID | abc-123-def |
| `OperationName` | API operation | ChatCompletions |
| `ModelDeployment` | Deployment name | gpt-4o |
| `PromptTokens` | Number of prompt tokens | 150 |
| `CompletionTokens` | Number of completion tokens | 250 |
| `TotalTokens` | Total token consumption | 400 |
| `RequestMessages` | Prompt content (JSON) | [{"role":"user","content":"..."}] |
| `ResponseMessages` | Completion content (JSON) | [{"content":"..."}] |
### KQL: Audit Trail for AI-requests
```kusto
// Full audit trail with prompt and response
ApiManagementGatewayLlmLog
| where TimeGenerated > ago(24h)
| extend RequestArray = parse_json(RequestMessages)
| extend ResponseArray = parse_json(ResponseMessages)
| mv-expand RequestArray
| mv-expand ResponseArray
| project
    TimeGenerated,
    CorrelationId,
    Model = ModelDeployment,
    PromptTokens,
    CompletionTokens,
    Prompt = tostring(RequestArray.content),
    Response = tostring(ResponseArray.content)
| summarize
    Input = strcat_array(make_list(Prompt), " "),
    Output = strcat_array(make_list(Response), " ")
    by CorrelationId, TimeGenerated, Model, PromptTokens, CompletionTokens
| where isnotempty(Input) and isnotempty(Output)
| order by TimeGenerated desc
```
### KQL: Detect Anomalies
```kusto
// Find unusually high token usage per user
ApiManagementGatewayLlmLog
| where TimeGenerated > ago(7d)
| extend UserId = tostring(CustomDimensions["UserId"])
| summarize
    AvgTokens = avg(toint(TotalTokens)),
    MaxTokens = max(toint(TotalTokens)),
    P95Tokens = percentile(toint(TotalTokens), 95),
    RequestCount = count()
    by UserId, bin(TimeGenerated, 1h)
| where MaxTokens > 3 * AvgTokens // Flag anomalies
| order by MaxTokens desc
```
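The flagging rule in the query (maximum above three times the hourly average for the same user) can be prototyped on exported log rows. The Python sketch below is an illustrative re-implementation under assumed inputs: the `flag_anomalies` function and the tuple layout of `records` are hypothetical, and only the grouping and the `max > 3 * avg` condition come from the query.

```python
from collections import defaultdict
from datetime import datetime


def flag_anomalies(records, factor=3.0):
    """Flag (user, hour) buckets whose max token count exceeds factor * average.

    records: iterable of (user_id, timestamp, total_tokens) tuples.
    Returns rows of (user_id, hour, max_tokens, avg_tokens), highest max first.
    """
    buckets = defaultdict(list)
    for user_id, timestamp, total_tokens in records:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(user_id, hour)].append(total_tokens)

    flagged = []
    for (user_id, hour), tokens in buckets.items():
        average = sum(tokens) / len(tokens)
        if max(tokens) > factor * average:
            flagged.append((user_id, hour, max(tokens), average))
    return sorted(flagged, key=lambda row: row[2], reverse=True)


records = [
    ("u1", datetime(2026, 2, 11, 10, 1), 100),
    ("u1", datetime(2026, 2, 11, 10, 5), 100),
    ("u1", datetime(2026, 2, 11, 10, 9), 100),
    ("u1", datetime(2026, 2, 11, 10, 20), 100),
    ("u1", datetime(2026, 2, 11, 10, 40), 5000),
    ("u2", datetime(2026, 2, 11, 10, 3), 400),
    ("u2", datetime(2026, 2, 11, 10, 30), 500),
]
for row in flag_anomalies(records):
    print(row[0], row[2], row[3])  # u1 5000 1080.0
```

One property of this rule is worth keeping in mind: a bucket with a single request can never be flagged, because its maximum equals its average.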
### Event Hub-logging for Real-time Monitoring
```xml
@{
    var body = context.Response.Body.As<JObject>(preserveContent: true);
    var usage = body?["usage"];
    return new JObject(
        new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
        new JProperty("correlationId", context.RequestId),
        new JProperty("subscriptionId", context.Subscription?.Id),
        new JProperty("apiId", context.Api?.Id),
        new JProperty("model", body?["model"]?.ToString()),
        new JProperty("promptTokens", usage?["prompt_tokens"]),
        new JProperty("completionTokens", usage?["completion_tokens"]),
        new JProperty("totalTokens", usage?["total_tokens"]),
        new JProperty("statusCode", context.Response.StatusCode),
        new JProperty("region", context.Deployment.Region),
        new JProperty("latencyMs", context.Elapsed.TotalMilliseconds)
    ).ToString();
}
```
---
## Complete GenAI Policy Stack
### Full Inbound + Outbound Policy
```xml
@("Bearer " + (string)context.Variables["mi-token"])
{
"error": {
"code": "GatewayError",
"message": "An error occurred processing your AI request."
}
}
```
### Policy Execution Order
```
Inbound (top to bottom):
1. Authentication (validate-azure-ad-token)
2. Variable extraction (set-variable)
3. Token rate limiting (llm-token-limit)
4. Content Safety (llm-content-safety)
5. Cache lookup (llm-semantic-cache-lookup)
6. Backend selection (set-backend-service)
7. Backend auth (authentication-managed-identity)
Backend:
8. Forward request (forward-request)
Outbound (top to bottom):
9. Cache store (llm-semantic-cache-store)
10. Emit metrics (llm-emit-token-metric)
```
---
## GenAI Policy Reference
### All GenAI-Specific Policies
| Policy | Phase | Purpose |
|--------|------|--------|
| `llm-content-safety` | Inbound | Content Safety moderation |
| `llm-token-limit` | Inbound | Token rate limiting |
| `llm-semantic-cache-lookup` | Inbound | Semantic cache lookup |
| `llm-semantic-cache-store` | Outbound | Store in semantic cache |
| `llm-emit-token-metric` | Outbound | Emit token metrics |
### Compatibility
| Policy | Classic | V2 | Consumption | Self-hosted | Workspace |
|--------|---------|-----|-------------|-------------|-----------|
| `llm-content-safety` | Yes | Yes | Yes | Yes | Yes |
| `llm-token-limit` | Yes | Yes | Yes | Yes | Yes |
| `llm-semantic-cache-lookup` | Yes | Yes | No | No | Yes |
| `llm-semantic-cache-store` | Yes | Yes | No | No | Yes |
| `llm-emit-token-metric` | Yes | Yes | Yes | Yes | Yes |
---
## References
- [AI gateway in Azure API Management](https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities): complete overview of AI gateway capabilities
- [Enforce content safety checks on LLM requests](https://learn.microsoft.com/en-us/azure/api-management/llm-content-safety-policy): llm-content-safety policy reference
- [LLM token limit policy](https://learn.microsoft.com/en-us/azure/api-management/llm-token-limit-policy): llm-token-limit policy reference
- [llm-emit-token-metric policy](https://learn.microsoft.com/en-us/azure/api-management/llm-emit-token-metric-policy): token metric policy reference
- [llm-semantic-cache-lookup policy](https://learn.microsoft.com/en-us/azure/api-management/llm-semantic-cache-lookup-policy): semantic cache lookup reference
- [llm-semantic-cache-store policy](https://learn.microsoft.com/en-us/azure/api-management/llm-semantic-cache-store-policy): semantic cache store reference
- [Prompt Shields - Azure AI Content Safety](https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection): Prompt Shields documentation
- [Log token usage, prompts, and completions](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-llm-logs): LLM logging in APIM
---
## For Cosmo
- **Use this reference** when customers need to implement AI safety and governance through APIM policies, especially content safety, prompt validation, and audit logging.
- The most important policy for the Norwegian public sector is `llm-content-safety` with `shield-prompt="true"`, which blocks jailbreak attempts and unwanted content BEFORE it reaches the model.
- Remember the order: authentication FIRST, then rate limiting, THEN content safety, THEN cache lookup. Content Safety adds cost (each check is a call to the Content Safety API), and placing cache lookup after content safety means the cache only contains "approved" content.
- For audit logging: enable LLM API logging in Diagnostic Settings. This gives full auditability for all prompts and completions, which is required under the AI Act for high-risk AI systems.
- Rate limiting per model matters: GPT-4o is more expensive than GPT-4o-mini and should have stricter token limits to control costs.