GenAI-Specific APIM Policies & Rules
Last updated: 2026-02 Status: GA Category: API Management & AI Gateway
Introduction
Azure API Management (APIM) includes a set of policies designed specifically for generative AI (GenAI). These policies go beyond traditional API gateway functionality and address challenges unique to AI workloads: content safety moderation, prompt validation, token-based rate limiting, semantic caching, and audit logging of prompts and completions. Together they form the core of APIM's AI gateway capability.
For the Norwegian public sector, GenAI-specific policies are especially important. Requirements from the AI Act, Datatilsynet (the Norwegian Data Protection Authority) and NSM (the Norwegian National Security Authority) mean that AI systems need mechanisms for content safety, logging for auditability, and control over what content is generated. APIM policies provide these controls centrally, so individual applications do not each have to implement them. The result is a consistent approach to AI governance.
This reference covers the GenAI-specific APIM policies, with complete XML examples, configuration parameters and best practices. The policies can be combined in APIM's inbound/outbound policy pipeline to build a complete AI safety stack. Their order matters, as described under Policy Execution Order.
Content Safety Integration
llm-content-safety Policy
The policy sends LLM requests to Azure AI Content Safety for moderation BEFORE they are forwarded to the backend model. With `enforce-on-completions="true"`, the model's completions are checked as well:
```xml
<inbound>
    <base />
    <llm-content-safety backend-id="content-safety-backend"
        shield-prompt="true"
        enforce-on-completions="true">
        <categories output-type="EightSeverityLevels">
            <category name="Hate" threshold="4" />
            <category name="Violence" threshold="4" />
            <category name="SelfHarm" threshold="2" />
            <category name="Sexual" threshold="2" />
        </categories>
        <blocklists>
            <id>custom-blocklist-pii</id>
            <id>custom-blocklist-org-specific</id>
        </blocklists>
    </llm-content-safety>
</inbound>
```
Prerequisites for Content Safety
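```bicep
// 1. Azure AI Content Safety resource
resource contentSafety 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: 'content-safety-service'
  location: 'westeurope'
  kind: 'ContentSafety'
  sku: { name: 'S0' }
  properties: {
    // A custom subdomain is required for Microsoft Entra ID (managed identity) auth
    // and produces the endpoint URL used by the backend below
    customSubDomainName: 'content-safety-service'
    // Public access disabled: APIM must reach the service over a private network path
    publicNetworkAccess: 'Disabled'
  }
}

// 2. APIM backend for Content Safety
// The APIM managed identity also needs the "Cognitive Services User" role on the resource above.
resource contentSafetyBackend 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: 'ai-gateway-apim/content-safety-backend'
  properties: {
    url: 'https://content-safety-service.cognitiveservices.azure.com'
    protocol: 'http'
    // Verify this credential shape against the backend schema of the API version you deploy
    credentials: {
      authorization: {
        scheme: 'managed-identity'
        parameter: 'https://cognitiveservices.azure.com'
      }
    }
  }
}
```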
Content Safety Configuration
| Attribute | Description | Default |
|---|---|---|
| `backend-id` | Backend entity for Content Safety | Required |
| `shield-prompt` | Checks for adversarial attacks (jailbreak) | `false` |
| `enforce-on-completions` | Also checks the model's response | `false` |
Categories and Threshold Values
| Category | Description | Recommended threshold (public sector) |
|---|---|---|
| `Hate` | Hateful content, discrimination | 2-4 (strict) |
| `Violence` | Violent content | 2-4 (strict) |
| `SelfHarm` | Self-harm | 2 (very strict) |
| `Sexual` | Sexual content | 2 (very strict) |

Threshold scale: 0 (most restrictive) to 7 (least restrictive). Content with a detected severity below the threshold passes. Content at or above the threshold is blocked, so a lower value blocks more requests.
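The threshold rule can be written as a small Python function, which is handy for reasoning about candidate settings. This is a sketch of the documented behavior (severity below the threshold passes, at or above it blocks), not APIM's implementation. The function names and the dictionary shape of the severity results are assumptions.

```python
# Hypothetical model of how llm-content-safety thresholds decide blocking:
# a category blocks when its detected severity is >= the configured threshold.
THRESHOLDS = {"Hate": 4, "Violence": 4, "SelfHarm": 2, "Sexual": 2}


def blocked_categories(severities: dict, thresholds: dict) -> list:
    """Return the categories whose detected severity reaches the threshold."""
    hits = []
    for category, threshold in thresholds.items():
        if not 0 <= threshold <= 7:
            raise ValueError(f"threshold for {category} must be in 0..7")
        # Categories missing from the analysis result are treated as severity 0.
        if severities.get(category, 0) >= threshold:
            hits.append(category)
    return hits


def is_blocked(severities: dict, thresholds: dict = THRESHOLDS) -> bool:
    return bool(blocked_categories(severities, thresholds))
```

With the recommended values, `Hate` at severity 3 passes and `SelfHarm` at severity 2 is blocked. A threshold of 0 blocks everything, because every severity is at least 0.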
Severity Level Output Types
| Output type | Levels | Use case |
|---|---|---|
| `FourSeverityLevels` | 0, 2, 4, 6 | Default, simpler policy |
| `EightSeverityLevels` | 0-7 | Fine-grained control |
Blocked Request Response
When Content Safety blocks a request, the gateway returns HTTP 403 with an error body along these lines:
```json
{
  "statusCode": 403,
  "message": "Content safety violation detected. The request has been blocked."
}
```
Prompt Validation Policies
Custom Prompt Validation
Beyond Azure AI Content Safety, you can implement your own prompt validation rules:
```xml
<inbound>
    <base />
    <!-- Read the body once (preserveContent keeps it forwardable); store it as a string
         and parse it in each check -->
    <set-variable name="request-body" value="@(context.Request.Body.As<string>(preserveContent: true))" />
    <!-- choose runs only the first when whose condition is true -->
    <choose>
        <!-- Block extremely long prompts -->
        <when condition="@{
            var body = JObject.Parse((string)context.Variables["request-body"]);
            var messages = body["messages"] as JArray;
            if (messages == null) return false;
            var totalLength = messages.Sum(m => m["content"]?.ToString().Length ?? 0);
            return totalLength > 50000;
        }">
            <return-response>
                <set-status code="400" reason="Bad Request" />
                <set-header name="Content-Type" exists-action="override">
                    <value>application/json</value>
                </set-header>
                <set-body>{
                    "error": {
                        "code": "PromptTooLong",
                        "message": "Total prompt length exceeds 50,000 characters."
                    }
                }</set-body>
            </return-response>
        </when>
        <!-- Block requests without a system message -->
        <when condition="@{
            var body = JObject.Parse((string)context.Variables["request-body"]);
            var messages = body["messages"] as JArray;
            if (messages == null) return true;
            return !messages.Any(m => m["role"]?.ToString() == "system");
        }">
            <return-response>
                <set-status code="400" reason="Bad Request" />
                <set-header name="Content-Type" exists-action="override">
                    <value>application/json</value>
                </set-header>
                <set-body>{
                    "error": {
                        "code": "SystemMessageRequired",
                        "message": "A system message is required for all AI requests."
                    }
                }</set-body>
            </return-response>
        </when>
        <!-- Block attempts to override the system message -->
        <when condition="@{
            var body = JObject.Parse((string)context.Variables["request-body"]);
            var messages = body["messages"] as JArray;
            if (messages == null) return false;
            var systemMessages = messages.Where(m => m["role"]?.ToString() == "system").ToList();
            return systemMessages.Count > 1;
        }">
            <return-response>
                <set-status code="400" reason="Bad Request" />
                <set-header name="Content-Type" exists-action="override">
                    <value>application/json</value>
                </set-header>
                <set-body>{
                    "error": {
                        "code": "MultipleSystemMessages",
                        "message": "Only one system message is allowed per request."
                    }
                }</set-body>
            </return-response>
        </when>
    </choose>
</inbound>
```
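These rules can be mirrored locally, so client payloads can be unit-tested without a round trip to the gateway. The Python sketch below is a hypothetical helper, not part of APIM. It evaluates the rules in the same order as the `<when>` branches (first match wins) and returns the same error codes. The length of non-string content is approximated with `json.dumps`, which will not exactly match .NET's `ToString()` output.

```python
import json
from typing import Optional

MAX_PROMPT_CHARS = 50_000


def content_length(message: dict) -> int:
    content = message.get("content")
    if content is None:
        return 0
    # Plain text counts as-is; structured content (e.g. multimodal parts) is
    # approximated by its JSON serialization.
    return len(content) if isinstance(content, str) else len(json.dumps(content))


def validate_prompt(body: Optional[dict]) -> Optional[str]:
    """Return the policy's error code for the first failing rule, or None if valid."""
    messages = (body or {}).get("messages")
    is_list = isinstance(messages, list)
    # Rule 1: total prompt length (skipped when there is no messages array).
    if is_list and sum(content_length(m) for m in messages) > MAX_PROMPT_CHARS:
        return "PromptTooLong"
    # Rule 2: a missing messages array also counts as "no system message".
    if not is_list or not any(m.get("role") == "system" for m in messages):
        return "SystemMessageRequired"
    # Rule 3: more than one system message is treated as an override attempt.
    if sum(1 for m in messages if m.get("role") == "system") > 1:
        return "MultipleSystemMessages"
    return None
```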
Inject Mandatory System Prompt
Enforce a standard system prompt on every request. Any system messages sent by the client are removed and replaced with the organization's prompt:
```xml
<inbound>
    <base />
    <!-- Inject the organization's standard system prompt -->
    <set-body>@{
        var body = context.Request.Body.As<JObject>(preserveContent: true);
        var messages = body["messages"] as JArray ?? new JArray();
        // The organization's mandatory system prompt
        var orgSystemPrompt = new JObject {
            ["role"] = "system",
            ["content"] = "You are a helpful assistant for Statens vegvesen. " +
                "You must respond in Norwegian unless explicitly asked otherwise. " +
                "Never share personal data, internal processes, or confidential information. " +
                "Always cite sources when providing factual information."
        };
        // Remove existing system messages and insert the organization's prompt first
        var userMessages = new JArray(messages.Where(m => m["role"]?.ToString() != "system"));
        userMessages.Insert(0, orgSystemPrompt);
        body["messages"] = userMessages;
        return body.ToString();
    }</set-body>
</inbound>
```
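The same transformation can be sketched in Python to show exactly what the backend receives. `inject_system_prompt` is a hypothetical helper. Like the policy, it drops every client-supplied system message, prepends the organization's prompt, and keeps all other messages in their original order.

```python
import copy

ORG_SYSTEM_PROMPT = (
    "You are a helpful assistant for Statens vegvesen. "
    "You must respond in Norwegian unless explicitly asked otherwise. "
    "Never share personal data, internal processes, or confidential information. "
    "Always cite sources when providing factual information."
)


def inject_system_prompt(body: dict, system_prompt: str = ORG_SYSTEM_PROMPT) -> dict:
    """Return a copy of a chat request with the mandatory system prompt first."""
    result = copy.deepcopy(body)  # leave the caller's payload untouched
    messages = result.get("messages")
    if not isinstance(messages, list):
        messages = []  # mirrors `as JArray ?? new JArray()` in the policy
    kept = [m for m in messages if m.get("role") != "system"]
    result["messages"] = [{"role": "system", "content": system_prompt}, *kept]
    return result
```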
Response Filtering
Filtering Sensitive Information from Responses
The policy below redacts common Norwegian PII patterns in non-streaming responses. Streaming (`text/event-stream`) responses pass through unfiltered.
```xml
<outbound>
    <base />
    <!-- Remove potential PII leaks from the AI response -->
    <choose>
        <when condition="@(!context.Response.Headers.GetValueOrDefault("Content-Type","")
                .Contains("text/event-stream"))">
            <set-body>@{
                var body = context.Response.Body.As<JObject>(preserveContent: true);
                var choices = body?["choices"] as JArray;
                if (choices == null) return body.ToString();
                foreach (var choice in choices)
                {
                    var content = choice["message"]?["content"]?.ToString();
                    if (content == null) continue;
                    // Redact national identity numbers (fødselsnummer, 11 digits)
                    content = System.Text.RegularExpressions.Regex.Replace(
                        content, @"\b\d{11}\b", "[REDACTED-PII]");
                    // Redact email addresses
                    content = System.Text.RegularExpressions.Regex.Replace(
                        content, @"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED-EMAIL]");
                    // Redact phone numbers (Norwegian format). "+47" is matched as an alternative
                    // to \b, because \b never matches between a space and "+".
                    content = System.Text.RegularExpressions.Regex.Replace(
                        content, @"(?:\+47\s?|\b)\d{2}\s?\d{2}\s?\d{2}\s?\d{2}\b", "[REDACTED-PHONE]");
                    choice["message"]["content"] = content;
                }
                return body.ToString();
            }</set-body>
        </when>
    </choose>
</outbound>
```
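Redaction regexes are easy to get subtly wrong, so test them locally before deploying. This Python sketch applies the three redactions in the same order as the policy. It uses a phone pattern that also handles the `+47` prefix with or without a space. Python's `re` and .NET's `Regex` agree on the constructs used here (`\b`, `\d`, `\w`, alternation), but treat the result as an approximation of the gateway, not the gateway itself.

```python
import re

# Same order as the policy: 11-digit national ID numbers first, so they are never
# partially consumed by the 8-digit phone pattern.
REDACTIONS = [
    (re.compile(r"\b\d{11}\b"), "[REDACTED-PII]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED-EMAIL]"),
    # "+47" (optional space) or a word boundary, then 8 digits in pairs, then a boundary.
    (re.compile(r"(?:\+47\s?|\b)\d{2}\s?\d{2}\s?\d{2}\s?\d{2}\b"), "[REDACTED-PHONE]"),
]


def redact(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Note that a 9-digit number such as an order number is left alone, because the trailing `\b` does not match inside a longer digit run.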
Add a Disclaimer to Responses
```xml
<outbound>
    <base />
    <set-header name="X-AI-Disclaimer" exists-action="override">
        <value>AI-generated content. Verify information before use.</value>
    </set-header>
</outbound>
```
Rate Limiting per Model
Token Rate Limiting (llm-token-limit)
Limit token consumption per consumer and per model:
```xml
<inbound>
    <base />
    <!-- Global token limit per subscription -->
    <llm-token-limit
        counter-key="@(context.Subscription.Id)"
        tokens-per-minute="10000"
        estimate-prompt-tokens="true"
        remaining-tokens-variable-name="remainingTokens" />
    <!-- Additional limit per model -->
    <choose>
        <when condition="@(context.Request.MatchedParameters["deployment-id"] == "gpt-4o")">
            <llm-token-limit
                counter-key="@("gpt4o-" + context.Subscription.Id)"
                tokens-per-minute="5000"
                estimate-prompt-tokens="true" />
        </when>
        <when condition="@(context.Request.MatchedParameters["deployment-id"] == "gpt-4o-mini")">
            <llm-token-limit
                counter-key="@("gpt4omini-" + context.Subscription.Id)"
                tokens-per-minute="20000"
                estimate-prompt-tokens="true" />
        </when>
    </choose>
</inbound>
```
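The interaction between the two stacked limits can be modeled with fixed one-minute windows. APIM does not document its internal counting algorithm in this form, so treat the class below as a hypothetical model. It assumes a request must fit under every applicable limit and is counted only when admitted. The counter keys mirror the policy (`<subscription>` and `gpt4o-<subscription>`).

```python
from collections import defaultdict

# Limits from the policy above, in tokens per minute.
GLOBAL_TPM = 10_000
MODEL_TPM = {"gpt-4o": 5_000, "gpt-4o-mini": 20_000}


class TokenLimiter:
    """Fixed one-minute windows keyed like the policy's counter-key values."""

    def __init__(self) -> None:
        self.used = defaultdict(int)  # (counter_key, minute) -> tokens

    def allow(self, subscription: str, deployment: str, tokens: int, minute: int) -> bool:
        keys = [(subscription, GLOBAL_TPM)]
        if deployment in MODEL_TPM:
            prefix = deployment.replace("-", "")  # "gpt-4o" -> "gpt4o", as in the policy
            keys.append((f"{prefix}-{subscription}", MODEL_TPM[deployment]))
        # The request must fit under every applicable limit.
        if any(self.used[(key, minute)] + tokens > limit for key, limit in keys):
            return False
        for key, _ in keys:
            self.used[(key, minute)] += tokens
        return True
```

With these values, the `gpt-4o-mini` limit of 20,000 tokens per minute never takes effect, because the global limit of 10,000 per subscription is always reached first. This holds however APIM counts internally, since every request also counts against the global key.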
Token Quota (Periodic)
Set token quotas per day, week or month (`token-quota-period` takes `Hourly`, `Daily`, `Weekly`, `Monthly` or `Yearly`):
```xml
<inbound>
    <base />
    <!-- Daily token quota per department. X-Department is supplied by the client and can be
         spoofed; prefer a value taken from a validated token claim. -->
    <llm-token-limit
        counter-key="@(context.Request.Headers.GetValueOrDefault("X-Department", "default"))"
        token-quota="100000"
        token-quota-period="Daily"
        estimate-prompt-tokens="true"
        remaining-quota-tokens-variable-name="dailyRemaining"
        remaining-quota-tokens-header-name="X-Daily-Tokens-Remaining" />
    <!-- remaining-quota-tokens-header-name returns the remaining quota to the client.
         A set-header in inbound would only modify the request sent to the backend. -->
</inbound>
```
Prompt Token Pre-calculation
`estimate-prompt-tokens="true"` lets APIM estimate the prompt tokens BEFORE the request is sent to the backend. If the prompt already exceeds the limit, 429 is returned immediately. With `false`, the counter is updated only from the actual usage reported by the model, so the request that crosses the limit still reaches the backend:
```text
With pre-calculation:
Client → APIM (estimate: 8000 tokens, limit: 5000) → 429 returned
       → No request to Azure OpenAI → saves backend capacity

Without pre-calculation:
Client → APIM → Azure OpenAI (uses 8000 tokens) → Response → APIM counts → Next request: 429
       → Tokens already consumed
```
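The two flows above can be simulated for a single counter. This is a simplified sketch. It assumes the estimate equals the actual prompt size and ignores completion tokens, and neither assumption holds exactly in APIM.

```python
def simulate(prompt_tokens: list, limit: int, estimate: bool) -> tuple:
    """Return (status codes, tokens spent at the backend) for a sequence of requests."""
    counted = 0
    backend_tokens = 0
    statuses = []
    for tokens in prompt_tokens:
        if estimate:
            # Estimate first: reject before the backend is called.
            if counted + tokens > limit:
                statuses.append(429)
                continue
        elif counted >= limit:
            # Without estimation, only usage reported by earlier responses is known.
            statuses.append(429)
            continue
        backend_tokens += tokens
        counted += tokens
        statuses.append(200)
    return statuses, backend_tokens
```

`simulate([8000], 5000, estimate=True)` spends nothing at the backend. `simulate([8000, 100], 5000, estimate=False)` spends 8,000 tokens before the second request is rejected.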
Multi-Region Rate Limiting
Important: in multi-region deployments, the rate limiting policies (`llm-token-limit`, `rate-limit`) count SEPARATELY per regional gateway:
| Policy | Scope | Multi-region behavior |
|---|---|---|
| `llm-token-limit` | Per gateway | Separate counters per region |
| `rate-limit` | Per gateway | Separate counters per region |
| `quota` | Global (instance) | One global counter |
| `quota-by-key` | Global (instance) | One global counter |

To enforce a limit globally across regions, use `quota-by-key` instead of `llm-token-limit`. Note that `quota-by-key` counts calls and/or bandwidth, not tokens, so it caps request volume rather than token consumption.
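The effect of per-gateway counters is simple arithmetic. A caller whose traffic is spread over N regions can consume up to N times the configured limit. The sketch below illustrates this with hypothetical region names and a simplified single-window counter.

```python
def admitted_tokens(requests: list, limit: int, per_gateway: bool = True) -> int:
    """Tokens admitted in one window, with a counter per region or one shared counter."""
    used = {}
    admitted = 0
    for region, tokens in requests:
        key = region if per_gateway else "global"
        if used.get(key, 0) + tokens <= limit:
            used[key] = used.get(key, 0) + tokens
            admitted += tokens
    return admitted
```

With a 10,000-token limit and traffic split over two regions, per-gateway counters admit 20,000 tokens, while a single shared counter admits 10,000.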
Audit Logging for Prompts
Enable LLM API Logging
1. APIM → Monitoring → Diagnostic settings
2. "+ Add diagnostic setting"
3. Velg "Logs related to generative AI gateway"
4. Destination: Log Analytics workspace
5. Save
6. APIM → APIs → [your API] → Settings → Diagnostic Logs
7. Azure Monitor → Log LLM messages: Enabled
8. Log prompts: 32768 bytes
9. Log completions: 32768 bytes
10. Save
Log schema: ApiManagementGatewayLlmLog
| Field | Description | Example |
|---|---|---|
| `TimeGenerated` | Time of the request | 2026-02-11T10:30:00Z |
| `CorrelationId` | Unique request ID | abc-123-def |
| `OperationName` | API operation | ChatCompletions |
| `ModelDeployment` | Deployment name | gpt-4o |
| `PromptTokens` | Number of prompt tokens | 150 |
| `CompletionTokens` | Number of completion tokens | 250 |
| `TotalTokens` | Total token consumption | 400 |
| `RequestMessages` | Prompt content (JSON) | `[{"role":"user","content":"..."}]` |
| `ResponseMessages` | Completion content (JSON) | `[{"content":"..."}]` |
KQL: Audit Trail for AI Requests
```kusto
// Full audit trail with prompt and response.
// Expanding both arrays yields one row per (request message, response message) pair,
// so Input and Output can contain repeated text when either array has several elements.
ApiManagementGatewayLlmLog
| where TimeGenerated > ago(24h)
| extend RequestArray = parse_json(RequestMessages)
| extend ResponseArray = parse_json(ResponseMessages)
| mv-expand RequestArray
| mv-expand ResponseArray
| project
    TimeGenerated,
    CorrelationId,
    Model = ModelDeployment,
    PromptTokens,
    CompletionTokens,
    Prompt = tostring(RequestArray.content),
    Response = tostring(ResponseArray.content)
| summarize
    Input = strcat_array(make_list(Prompt), " "),
    Output = strcat_array(make_list(Response), " ")
    by CorrelationId, TimeGenerated, Model, PromptTokens, CompletionTokens
| where isnotempty(Input) and isnotempty(Output)
| order by TimeGenerated desc
```
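For ad-hoc analysis outside Log Analytics, for example on exported query results, the same pairing can be done in Python. This hypothetical helper expands the request and response arrays separately, so each message appears exactly once per correlation ID. It assumes `RequestMessages` and `ResponseMessages` are JSON array strings shaped as in the schema table.

```python
import json


def audit_pairs(rows: list) -> dict:
    """Map CorrelationId -> (joined prompt text, joined response text)."""
    prompts = {}
    responses = {}
    for row in rows:
        cid = row["CorrelationId"]
        # Expand each array on its own: no cross product, so no duplicated text.
        for msg in json.loads(row.get("RequestMessages") or "[]"):
            if msg.get("content"):
                prompts.setdefault(cid, []).append(str(msg["content"]))
        for msg in json.loads(row.get("ResponseMessages") or "[]"):
            if msg.get("content"):
                responses.setdefault(cid, []).append(str(msg["content"]))
    # Keep only requests with both a prompt and a response, like the KQL filter.
    return {
        cid: (" ".join(prompts[cid]), " ".join(responses[cid]))
        for cid in prompts
        if cid in responses
    }
```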
KQL: Detecting Anomalies
```kusto
// Find unusually high token usage per user.
// Assumption: the caller ID is available as CustomDimensions["UserId"]; adjust to your log schema.
ApiManagementGatewayLlmLog
| where TimeGenerated > ago(7d)
| extend UserId = tostring(CustomDimensions["UserId"])
| summarize
    AvgTokens = avg(toint(TotalTokens)),
    MaxTokens = max(toint(TotalTokens)),
    P95Tokens = percentile(toint(TotalTokens), 95),
    RequestCount = count()
    by UserId, bin(TimeGenerated, 1h)
| where MaxTokens > 3 * AvgTokens // Flag hours where one request used over 3x that hour's average
| order by MaxTokens desc
```
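The flagging rule can be reproduced in Python to check what it will and will not catch. The sketch assumes records of `(user_id, timestamp, total_tokens)`. Like the query, it buckets by hour and flags buckets whose largest request exceeds three times the bucket average.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def flag_anomalies(records, factor: float = 3.0) -> list:
    """Return (user_id, hour, max_tokens) for each flagged hour, largest first."""
    buckets = defaultdict(list)
    for user_id, ts, tokens in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # like bin(TimeGenerated, 1h)
        buckets[(user_id, hour)].append(tokens)
    flagged = [
        (user_id, hour, max(values))
        for (user_id, hour), values in buckets.items()
        if max(values) > factor * mean(values)
    ]
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

The maximum can never exceed the bucket's sum, which is the count times the average. So `max > 3 × avg` is only possible when an hour contains at least four requests, and a single expensive request in a quiet hour is never flagged. Comparing against a longer per-user baseline, such as the 7-day hourly average, would catch that case.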
Event Hub Logging for Real-time Monitoring
```xml
<outbound>
    <base />
    <!-- Log to Event Hub for real-time analysis. "ai-audit-logger" must be an APIM logger
         of type Event Hub. Assumes a non-streaming JSON response body. -->
    <log-to-eventhub logger-id="ai-audit-logger">@{
        var body = context.Response.Body.As<JObject>(preserveContent: true);
        var usage = body?["usage"];
        return new JObject(
            new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
            new JProperty("correlationId", context.RequestId),
            new JProperty("subscriptionId", context.Subscription?.Id),
            new JProperty("apiId", context.Api?.Id),
            new JProperty("model", body?["model"]?.ToString()),
            new JProperty("promptTokens", usage?["prompt_tokens"]),
            new JProperty("completionTokens", usage?["completion_tokens"]),
            new JProperty("totalTokens", usage?["total_tokens"]),
            new JProperty("statusCode", context.Response.StatusCode),
            new JProperty("region", context.Deployment.Region),
            new JProperty("latencyMs", context.Elapsed.TotalMilliseconds)
        ).ToString();
    }</log-to-eventhub>
</outbound>
```
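On the consuming side, the events are plain JSON strings. The following minimal aggregation sums tokens per model and skips events without a usage block. It assumes the events have already been received; the Event Hub client is omitted.

```python
import json
from collections import Counter


def tokens_per_model(events: list) -> dict:
    """Sum totalTokens per model from events emitted by the log-to-eventhub policy."""
    totals = Counter()
    for raw in events:
        event = json.loads(raw)
        total = event.get("totalTokens")
        if total is None:
            # No usage block (e.g. an error response): nothing to count.
            continue
        totals[event.get("model") or "unknown"] += int(total)
    return dict(totals)
```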
Complete GenAI Policy Stack
Full Inbound + Outbound Policy
```xml
<policies>
    <inbound>
        <base />
        <!-- 1. Authentication -->
        <validate-azure-ad-token tenant-id="{{TENANT_ID}}"
            header-name="Authorization"
            failed-validation-httpcode="401" />
        <!-- 2. Extract caller info for logging and rate limiting.
             Split off the "Bearer" scheme before parsing the token. -->
        <set-variable name="caller-id"
            value="@(context.Request.Headers.GetValueOrDefault("Authorization","")
                .Split(' ').Last().AsJwt()?.Claims.GetValueOrDefault("oid", "anonymous"))" />
        <!-- "department" is not issued by default; configure it as an optional/custom claim
             in Entra ID, otherwise every caller gets "unknown" -->
        <set-variable name="department"
            value="@(context.Request.Headers.GetValueOrDefault("Authorization","")
                .Split(' ').Last().AsJwt()?.Claims.GetValueOrDefault("department", "unknown"))" />
        <!-- 3. Token rate limiting -->
        <llm-token-limit
            counter-key="@((string)context.Variables["caller-id"])"
            tokens-per-minute="10000"
            estimate-prompt-tokens="true" />
        <!-- 4. Content Safety -->
        <llm-content-safety backend-id="content-safety-backend"
            shield-prompt="true">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Violence" threshold="4" />
                <category name="SelfHarm" threshold="2" />
                <category name="Sexual" threshold="2" />
            </categories>
        </llm-content-safety>
        <!-- 5. Semantic cache lookup (requires an external cache configured on the APIM instance).
             score-threshold: smaller values require closer semantic similarity.
             vary-by partitions the cache per caller, so cached answers are not shared across users. -->
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embedding-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@((string)context.Variables["caller-id"])</vary-by>
        </llm-semantic-cache-lookup>
        <!-- 6. Backend with managed identity -->
        <set-backend-service backend-id="aoai-pool" />
        <authentication-managed-identity
            resource="https://cognitiveservices.azure.com"
            output-token-variable-name="mi-token" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["mi-token"])</value>
        </set-header>
    </inbound>
    <backend>
        <forward-request timeout="120"
            fail-on-error-status-code="true"
            buffer-response="false" />
    </backend>
    <outbound>
        <base />
        <!-- 7. Semantic cache store (duration in seconds) -->
        <llm-semantic-cache-store duration="3600" />
        <!-- 8. Token metrics -->
        <llm-emit-token-metric namespace="ai-metrics">
            <dimension name="UserId" value="@((string)context.Variables["caller-id"])" />
            <dimension name="Department" value="@((string)context.Variables["department"])" />
            <dimension name="API" value="@(context.Api.Name)" />
            <dimension name="Region" value="@(context.Deployment.Region)" />
        </llm-emit-token-metric>
    </outbound>
    <on-error>
        <base />
        <!-- Replace only unexpected errors with a generic 500. 4xx responses such as
             401, 403 and 429 keep their original status so clients can react to them. -->
        <choose>
            <when condition="@(context.Response.StatusCode / 100 != 4)">
                <return-response>
                    <set-status code="500" reason="Internal Server Error" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-body>{
                        "error": {
                            "code": "GatewayError",
                            "message": "An error occurred processing your AI request."
                        }
                    }</set-body>
                </return-response>
            </when>
        </choose>
    </on-error>
</policies>
```
Policy Execution Order
Inbound (top to bottom):
1. Authentication (validate-azure-ad-token)
2. Variable extraction (set-variable)
3. Token rate limiting (llm-token-limit)
4. Content Safety (llm-content-safety)
5. Cache lookup (llm-semantic-cache-lookup)
6. Backend selection (set-backend-service)
7. Backend auth (authentication-managed-identity)
Backend:
8. Forward request (forward-request)
Outbound (top to bottom):
9. Cache store (llm-semantic-cache-store)
10. Emit metrics (llm-emit-token-metric)
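The benefit of this order is that each stage can end the request early, so the more expensive stages only see traffic that passed the cheaper ones. The Python sketch below is a hypothetical model of that short-circuiting. Its stand-in checks are keyed on fields of a request dictionary, and it does not reflect how APIM evaluates policies internally.

```python
from typing import Callable, Optional

Stage = tuple  # (policy name, check returning an early status code or None)


def run_inbound(request: dict, stages: list) -> tuple:
    """Return (early status code or None, names of stages that ran)."""
    ran = []
    for name, check in stages:
        ran.append(name)
        status: Optional[int] = check(request)
        if status is not None:
            return status, ran  # later stages never run
    return None, ran


STAGES: list = [
    ("validate-azure-ad-token", lambda r: None if r.get("token_valid") else 401),
    ("llm-token-limit", lambda r: 429 if r.get("tokens", 0) > r.get("remaining", 0) else None),
    ("llm-content-safety", lambda r: 403 if r.get("unsafe") else None),
    ("llm-semantic-cache-lookup", lambda r: 200 if r.get("cache_hit") else None),
]
```

An unauthenticated request stops at the first stage, so it never triggers a billed Content Safety call or an embeddings call for the cache lookup.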
GenAI Policy Reference
GenAI-Specific Policies in This Reference
| Policy | Phase | Purpose |
|---|---|---|
| `llm-content-safety` | Inbound | Content Safety moderation |
| `llm-token-limit` | Inbound | Token rate limiting |
| `llm-semantic-cache-lookup` | Inbound | Semantic cache lookup |
| `llm-semantic-cache-store` | Outbound | Store in semantic cache |
| `llm-emit-token-metric` | Outbound | Emit token metrics |
Compatibility
| Policy | Classic | V2 | Consumption | Self-hosted | Workspace |
|---|---|---|---|---|---|
| `llm-content-safety` | Yes | Yes | Yes | Yes | Yes |
| `llm-token-limit` | Yes | Yes | Yes | Yes | Yes |
| `llm-semantic-cache-lookup` | Yes | Yes | No | No | Yes |
| `llm-semantic-cache-store` | Yes | Yes | No | No | Yes |
| `llm-emit-token-metric` | Yes | Yes | Yes | Yes | Yes |
References
- AI gateway in Azure API Management: complete overview of AI gateway capabilities
- Enforce content safety checks on LLM requests: `llm-content-safety` policy reference
- LLM token limit policy: `llm-token-limit` policy reference
- llm-emit-token-metric policy: token metric policy reference
- llm-semantic-cache-lookup policy: semantic cache lookup reference
- llm-semantic-cache-store policy: semantic cache store reference
- Prompt Shields - Azure AI Content Safety: Prompt Shields documentation
- Log token usage, prompts, and completions: LLM logging in APIM
For Cosmo
- Use this reference when customers need to implement AI safety and governance through APIM policies, especially content safety, prompt validation and audit logging.
- The most important policy for the Norwegian public sector is `llm-content-safety` with `shield-prompt="true"`. It blocks jailbreak attempts and unwanted content BEFORE they reach the model.
- Remember the order: authentication FIRST, then rate limiting, THEN content safety, THEN cache lookup. Every Content Safety check is a separately billed call to the Content Safety API, so rejecting unauthenticated or over-limit requests first avoids paying for them. Because cache lookup runs after content safety, the cache only ever holds responses to "approved" prompts.
- For audit logging: enable LLM API logging in Diagnostic settings. This records the prompts and completions of every request, up to the configured size limits, and supports the AI Act's record-keeping requirements for high-risk AI systems.
- Per-model rate limiting matters: GPT-4o is more expensive than GPT-4o-mini and should have stricter token limits to control costs.