Kjell Tore Guttormsen 6a7632146e feat(ms-ai-architect): add plugin to open marketplace (v1.5.0 baseline)

Initial addition of ms-ai-architect plugin to the open-source marketplace.
Private content excluded: orchestrator/ (Linear tooling), docs/utredning/
(client investigation), generated test reports and PDF export script.
skill-gen tooling moved from orchestrator/ to scripts/skill-gen/.

Security scan: WARNING (risk 20/100) — no secrets, no injection found.
False positive fixed: added gitleaks:allow to Python variable reference
in output-validation-grounding-verification.md line 109.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-07 17:17:17 +02:00

Logging & Analytics for AI Traffic in APIM

Last updated: 2026-02 Status: GA Category: API Management & AI Gateway

Introduction

Observability is fundamental to operating AI applications in production. Azure API Management offers logging and analytics capabilities tailored to AI traffic, including token tracking, prompt/completion logging, and built-in dashboards for LLM usage. These tools let organizations track costs, monitor performance, ensure compliance, and troubleshoot problems with AI APIs.

For the Norwegian public sector, logging and analytics matter for several reasons: Riksrevisjonen (the Office of the Auditor General) and Datatilsynet (the Norwegian Data Protection Authority) require traceability, offentlighetsloven (the Freedom of Information Act) requires documentation of automated decisions, and budget control requires precise cost reports for AI consumption. The APIM AI gateway provides the tools needed to meet these requirements without building custom solutions.

APIM offers two main channels for AI logging: Application Insights integration for real-time metrics, and Azure Monitor diagnostic settings for long-term storage and analysis in Log Analytics. Both channels support AI-specific data points such as token consumption, model name, and, optionally, prompt/completion content.

Application Insights integration

Setting up the Application Insights logger

Create or connect to an Application Insights resource
Configure the logger in APIM
Enable diagnostics for specific APIs or for all APIs

Configuring the logger with Bicep

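// Assumes `apiManagement` (the APIM service) and `aiApi` (the API resource) are declared
// elsewhere in the template, and that `appInsightsName` is a parameter or variable.
resource appInsights 'Microsoft.Insights/components@2020-02-02' existing = {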
  name: appInsightsName
}

resource apimLogger 'Microsoft.ApiManagement/service/loggers@2023-09-01-preview' = {
  parent: apiManagement
  name: 'ai-gateway-logger'
  properties: {
    loggerType: 'applicationInsights'
    credentials: {
      connectionString: appInsights.properties.ConnectionString
    }
    resourceId: appInsights.id
  }
}

resource apiDiagnostic 'Microsoft.ApiManagement/service/apis/diagnostics@2023-09-01-preview' = {
  parent: aiApi
  name: 'applicationinsights'
  properties: {
    loggerId: apimLogger.id
    alwaysLog: 'allErrors'
    logClientIp: true
    sampling: {
      samplingType: 'fixed'
      percentage: 100
    }
    frontend: {
      request: {
        headers: [ 'x-request-id', 'x-correlation-id', 'x-tenant-id' ]
        body: { bytes: 8192 }
      }
      response: {
        headers: [ 'x-model-used', 'x-cache-hit' ]
        body: { bytes: 8192 }
      }
    }
    backend: {
      request: {
        headers: []          // Do not log the Authorization header; it carries credentials
        body: { bytes: 0 }   // Do not log the backend request body
      }
      response: {
        body: { bytes: 8192 }
      }
    }
  }
}

Custom metrics with token tracking

Emit Token Metrics Policy

APIM provides dedicated policies for sending token metrics to Application Insights:

<policies>
    <inbound>
        <base />
        <!-- Emit token metrics for Azure OpenAI APIs (this policy is configured in the inbound section) -->
        <azure-openai-emit-token-metric namespace="ai-gateway-metrics">
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>

For other LLM APIs (not Azure OpenAI):

<policies>
    <inbound>
        <base />
        <!-- Emit token metrics for generic LLM APIs (this policy is configured in the inbound section) -->
        <llm-emit-token-metric namespace="llm-metrics">
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
            <dimension name="User ID"
                value="@(context.Request.Headers.GetValueOrDefault("x-user-id", "N/A"))" />
            <dimension name="Department"
                value="@(context.Request.Headers.GetValueOrDefault("x-department", "unknown"))" />
            <dimension name="Application"
                value="@(context.Request.Headers.GetValueOrDefault("x-app-id", "unknown"))" />
        </llm-emit-token-metric>
    </inbound>
</policies>

Custom metrics with emit-metric

For general metrics beyond token tracking:

<policies>
    <outbound>
        <base />
        <!-- Emit custom request metrics -->
        <emit-metric name="ai-request-processed" value="1" namespace="ai-gateway">
            <dimension name="Model" value="@{
                var body = context.Response.Body.As<JObject>(preserveContent: true);
                return body?["model"]?.ToString() ?? "unknown";
            }" />
            <dimension name="StatusCode" value="@(context.Response.StatusCode.ToString())" />
            <dimension name="CacheHit" value="@(context.Response.Headers.GetValueOrDefault("x-cache-hit", "false"))" />
            <dimension name="Subscription" value="@(context.Subscription?.Name ?? "unknown")" />
        </emit-metric>

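        <!-- Emit latency metric. Assumes a set-variable policy stored DateTime.UtcNow
             in "backendStartTime" earlier in the pipeline (for example in inbound). -->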
        <emit-metric name="ai-backend-latency-ms" namespace="ai-gateway"
            value="@{
                var start = (DateTime)context.Variables["backendStartTime"];
                return ((DateTime.UtcNow - start).TotalMilliseconds).ToString();
            }">
            <dimension name="Model" value="@{
                var body = context.Response.Body.As<JObject>(preserveContent: true);
                return body?["model"]?.ToString() ?? "unknown";
            }" />
        </emit-metric>
    </outbound>
</policies>

Limits for custom metrics

Limit	Value
Max dimensions per metric	10 (5 default + 5 custom)
Active time series per region	50,000 (within a 12-hour period)
Default dimensions (use 5)	Region, Service ID, Service Name, Service Type, + 1 reserved
Available for custom use	5 dimensions
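
The two limits in the table interact: each distinct combination of dimension values is its own time series, so high-cardinality dimensions can use up the budget quickly. The following is a rough sketch for checking a dimension plan against the figures in the table. The function name and the sample cardinalities are hypothetical, and the worst case assumes every combination of values actually occurs.

```python
from math import prod

MAX_CUSTOM_DIMENSIONS = 5        # custom dimensions per metric (from the table)
MAX_ACTIVE_TIME_SERIES = 50_000  # active time series per region in 12 hours (from the table)


def check_dimension_plan(cardinalities: dict[str, int]) -> tuple[bool, int]:
    """Return (fits, worst_case_series) for a planned set of custom dimensions.

    `cardinalities` maps each custom dimension name to the number of distinct
    values expected within a 12-hour window.
    """
    if len(cardinalities) > MAX_CUSTOM_DIMENSIONS:
        return False, 0  # too many dimensions, no need to count series
    worst_case = prod(cardinalities.values())
    return worst_case <= MAX_ACTIVE_TIME_SERIES, worst_case


# Hypothetical plan: 40 applications x 12 departments x 25 subscriptions
plan = {"Application": 40, "Department": 12, "Subscription": 25}
print(check_dimension_plan(plan))

# Adding a per-client-IP dimension multiplies the series count
risky = {**plan, "Client IP": 500}
print(check_dimension_plan(risky))
```

In this sketch the first plan stays within the budget, while the variant with a client IP dimension does not, which is one reason to prefer low-cardinality dimensions such as department or application.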

Token Tracking

Diagnostics Setting for LLM Logs

Enable specialized LLM logging through Azure Monitor diagnostic settings:

Go to the APIM instance in the Azure portal
Monitoring > Diagnostic settings > + Add diagnostic setting
Select Logs related to generative AI gateway
Under Destination: Send to Log Analytics workspace

Enabling prompt/completion logging per API

Select the API > Settings > Diagnostic Logs > Azure Monitor
Log LLM messages: Enabled
Log prompts: Select and set the maximum size (e.g. 32768 bytes)
Log completions: Select and set the maximum size (e.g. 32768 bytes)

Important: Messages up to 32 KB are logged in a single entry. Larger messages are split into 32 KB chunks with sequence numbers. The maximum is 2 MB per request/response.
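
Because larger messages arrive as several log entries, anything that reads exported rows has to put the chunks back together in order. A minimal sketch of that step, assuming each exported row is a dict with `CorrelationId`, `SequenceNumber`, and `RequestMessages` keys; these key names and the helper functions are assumptions for illustration, so check them against the actual table schema.

```python
import math
from collections import defaultdict

CHUNK_BYTES = 32 * 1024  # chunk size stated above


def expected_chunks(size_bytes: int) -> int:
    """Number of log entries a message of this size would be split into."""
    return max(1, math.ceil(size_bytes / CHUNK_BYTES))


def reassemble(rows: list[dict], field: str = "RequestMessages") -> dict[str, str]:
    """Concatenate chunked text per correlation ID, ordered by sequence number."""
    chunks = defaultdict(list)
    for row in rows:
        text = row.get(field)
        if text:  # skip rows that carry nothing in this field
            chunks[row["CorrelationId"]].append((row["SequenceNumber"], text))
    return {
        cid: "".join(text for _, text in sorted(parts, key=lambda p: p[0]))
        for cid, parts in chunks.items()
    }


rows = [
    {"CorrelationId": "abc", "SequenceNumber": 1, "RequestMessages": "world"},
    {"CorrelationId": "abc", "SequenceNumber": 0, "RequestMessages": "hello "},
    {"CorrelationId": "xyz", "SequenceNumber": 0, "RequestMessages": "single"},
]
print(reassemble(rows))
print(expected_chunks(70_000))
```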

KQL query: join request and response

ApiManagementGatewayLlmLog
| extend RequestArray = parse_json(RequestMessages)
| extend ResponseArray = parse_json(ResponseMessages)
| mv-expand RequestArray
| mv-expand ResponseArray
| project
    TimeGenerated,
    CorrelationId,
    OperationName,
    ModelDeploymentName,
    PromptTokens,
    CompletionTokens,
    TotalTokens,
    RequestContent = tostring(RequestArray.content),
    ResponseContent = tostring(ResponseArray.content)
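// Group by CorrelationId rather than TimeGenerated: the entries that belong to one call
// can carry different timestamps and would otherwise end up in separate groups.
| summarize
    TimeGenerated = min(TimeGenerated),
    Input = strcat_array(make_list(RequestContent), " . "),
    Output = strcat_array(make_list(ResponseContent), " . "),
    PromptTokens = max(PromptTokens),
    CompletionTokens = max(CompletionTokens),
    TotalTokens = max(TotalTokens)
    by CorrelationId, OperationName, ModelDeploymentName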
| where isnotempty(Input) and isnotempty(Output)

KQL: token usage per application per day

ApiManagementGatewayLlmLog
| where TimeGenerated > ago(30d)
| summarize
    TotalPromptTokens = sum(PromptTokens),
    TotalCompletionTokens = sum(CompletionTokens),
    TotalTokens = sum(TotalTokens),
    RequestCount = count()
    by bin(TimeGenerated, 1d), SubscriptionName = tostring(split(OperationName, "/")[0])
| order by TimeGenerated desc

KQL: model usage and cost

ApiManagementGatewayLlmLog
| where TimeGenerated > ago(7d)
| summarize
    PromptTokens = sum(PromptTokens),
    CompletionTokens = sum(CompletionTokens),
    Requests = count()
    by ModelDeploymentName
// Illustrative prices in USD per 1M tokens.
// Match the most specific name first: "gpt-4o-mini" also contains "gpt-4o" and "gpt-4".
| extend EstimatedCostUSD =
    case(
        ModelDeploymentName contains "gpt-4o-mini",
            (PromptTokens / 1000000.0 * 0.15) + (CompletionTokens / 1000000.0 * 0.60),
        ModelDeploymentName contains "gpt-4o",
            (PromptTokens / 1000000.0 * 2.5) + (CompletionTokens / 1000000.0 * 10.0),
        ModelDeploymentName contains "gpt-4",
            (PromptTokens / 1000000.0 * 30.0) + (CompletionTokens / 1000000.0 * 60.0),
        0.0
    )
// Fixed USD-to-NOK conversion rate
| extend EstimatedCostNOK = EstimatedCostUSD * 11.0
| order by EstimatedCostNOK desc
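
The same estimate can be reproduced outside Log Analytics, for example to sanity-check a report. The sketch below reuses the per-million-token prices and the fixed exchange rate from the query above; the function names are hypothetical. It tries the most specific name pattern first, because a deployment named `gpt-4o-mini` also contains `gpt-4o` and `gpt-4` and a substring match would otherwise pick the wrong price.

```python
# (pattern, prompt price, completion price) in USD per 1M tokens, taken from the query above.
# Ordered from most specific to least specific pattern.
PRICES_PER_MILLION = [
    ("gpt-4o-mini", 0.15, 0.60),
    ("gpt-4o", 2.5, 10.0),
    ("gpt-4", 30.0, 60.0),
]
USD_TO_NOK = 11.0  # fixed rate used in the query above


def estimate_cost_usd(deployment: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate cost in USD; unknown deployments are priced at 0.0 like the query's default."""
    name = deployment.lower()  # KQL `contains` is case-insensitive
    for pattern, prompt_price, completion_price in PRICES_PER_MILLION:
        if pattern in name:
            return (prompt_tokens / 1_000_000 * prompt_price
                    + completion_tokens / 1_000_000 * completion_price)
    return 0.0


def estimate_cost_nok(deployment: str, prompt_tokens: int, completion_tokens: int) -> float:
    return estimate_cost_usd(deployment, prompt_tokens, completion_tokens) * USD_TO_NOK


print(estimate_cost_usd("gpt-4o-mini-prod", 2_000_000, 500_000))
print(estimate_cost_nok("gpt-4o-chat", 1_000_000, 1_000_000))
```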

Latency monitoring

Measuring end-to-end latency

<policies>
    <inbound>
        <base />
        <set-variable name="requestStartTime" value="@(DateTime.UtcNow)" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
        <!-- Calculate and expose latency -->
        <set-header name="x-total-latency-ms" exists-action="override">
            <value>@{
                var start = (DateTime)context.Variables["requestStartTime"];
                return ((DateTime.UtcNow - start).TotalMilliseconds).ToString("F0");
            }</value>
        </set-header>

        <!-- Emit latency as custom metric -->
        <emit-metric name="ai-total-latency" namespace="ai-gateway"
            value="@{
                var start = (DateTime)context.Variables["requestStartTime"];
                return ((DateTime.UtcNow - start).TotalMilliseconds).ToString();
            }">
            <dimension name="API" value="@(context.Api.Name)" />
            <dimension name="StatusCode" value="@(context.Response.StatusCode.ToString())" />
        </emit-metric>
    </outbound>
</policies>

Latency threshold alert

// Alert: AI API latency exceeds 5 seconds
ApiManagementGatewayLogs
| where TimeGenerated > ago(15m)
| where ApiId contains "ai-gateway"
| where ResponseTime > 5000
| summarize
    Count = count(),
    AvgLatency = avg(ResponseTime),
    P95Latency = percentile(ResponseTime, 95)
    by bin(TimeGenerated, 5m), ApiId
| where Count > 10
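
To make the alert condition concrete, here is a small sketch of the same logic over in-memory samples: keep requests slower than 5 seconds, group them into 5-minute bins per API, and report a bin only when it holds more than 10 slow requests. The sample tuples and function names are hypothetical, and the percentile uses the simple nearest-rank method, so results can differ slightly from the estimate KQL's `percentile()` returns.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta, timezone

THRESHOLD_MS = 5000        # same threshold as the query above
MIN_SLOW_REQUESTS = 10     # a bin alerts only when it has more than this many slow requests
BIN = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 5-minute bin."""
    return EPOCH + ((ts - EPOCH) // BIN) * BIN


def slow_bins(samples):
    """samples: iterable of (timestamp, api_id, response_time_ms) tuples."""
    buckets = defaultdict(list)
    for ts, api_id, ms in samples:
        if ms > THRESHOLD_MS:
            buckets[(bin_start(ts), api_id)].append(ms)
    alerts = []
    for (start, api_id), values in sorted(buckets.items()):
        if len(values) > MIN_SLOW_REQUESTS:
            values.sort()
            p95 = values[math.ceil(0.95 * len(values)) - 1]  # nearest-rank percentile
            alerts.append({"bin": start, "api": api_id, "count": len(values),
                           "avg_ms": sum(values) / len(values), "p95_ms": p95})
    return alerts


base = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
samples = [(base + timedelta(seconds=i), "ai-gateway-chat", 6000 + i) for i in range(11)]
samples += [(base + timedelta(minutes=5, seconds=i), "ai-gateway-chat", 7000) for i in range(10)]
alerts = slow_bins(samples)
print(alerts)
```

Only the first bin is reported here: it holds 11 slow requests, while the second holds exactly 10 and therefore does not pass the "more than 10" condition.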

User behavior analysis

Analytics dashboard in APIM

APIM provides a built-in Azure Monitor-based dashboard under Monitoring > Analytics > Language models, showing:

Token consumption over time
Distribution per model
Request volume and error rate
Average response time

KQL: top users by token usage

ApiManagementGatewayLlmLog
| where TimeGenerated > ago(7d)
| summarize
    TotalTokens = sum(TotalTokens),
    Requests = count(),
    AvgTokensPerRequest = avg(TotalTokens)
    by SubscriptionId
| order by TotalTokens desc
| take 20

KQL: popular topics (based on prompts)

ApiManagementGatewayLlmLog
| where TimeGenerated > ago(7d)
| extend RequestArray = parse_json(RequestMessages)
| mv-expand RequestArray
| where tostring(RequestArray.role) == "user"
| extend UserMessage = tostring(RequestArray.content)
| where strlen(UserMessage) > 10
| extend Topic = case(
    UserMessage contains "azure" or UserMessage contains "cloud", "Azure/Cloud",
    UserMessage contains "kode" or UserMessage contains "code", "Programmering",
    UserMessage contains "sikkerhet" or UserMessage contains "security", "Sikkerhet",
    UserMessage contains "data" or UserMessage contains "database", "Data",
    "Annet"
)
| summarize Count = count() by Topic
| order by Count desc
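
The keyword rules in the query are easy to prototype before running them against real logs. The sketch below mirrors the query's logic under a few stated assumptions: rules are checked in order and the first match wins, matching is a case-insensitive substring test (as with KQL `contains`), and messages of 10 characters or fewer are ignored. The function names are hypothetical.

```python
from collections import Counter

# (topic label, keywords) in the same order as the case() expression above
TOPIC_RULES = [
    ("Azure/Cloud", ("azure", "cloud")),
    ("Programmering", ("kode", "code")),
    ("Sikkerhet", ("sikkerhet", "security")),
    ("Data", ("data",)),  # "database" already contains "data"
]


def classify(message: str) -> str:
    """Return the first topic whose keyword occurs in the message, else "Annet"."""
    text = message.lower()
    for topic, keywords in TOPIC_RULES:
        if any(keyword in text for keyword in keywords):
            return topic
    return "Annet"


def topic_counts(messages: list[str]) -> Counter:
    """Count topics, skipping very short messages like the query does."""
    return Counter(classify(m) for m in messages if len(m) > 10)


messages = [
    "How do I deploy to Azure?",
    "Hvordan skrive kode i Python",
    "database sikkerhet",
    "Hva er klokka i Oslo?",
    "hi azure",  # 8 characters, ignored
]
print(topic_counts(messages))
```

Substring rules are coarse: a word such as "metadata" lands in "Data" and "decode" lands in "Programmering", so the resulting counts are best read as rough indicators.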

Export to Microsoft Foundry for model evaluation

LLM logs can be exported as datasets for model evaluation in Microsoft Foundry:

Join request/response with KQL (see above)
Export to CSV format
Upload to the Microsoft Foundry portal
Run the evaluation with built-in or custom metrics
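
The CSV export step can be sketched as follows, assuming the joined rows are available as dicts with the `CorrelationId`, `Input`, and `Output` columns produced by the join query. The `query` and `response` header names and the function name are assumptions for illustration; adjust them to whatever the evaluation in Microsoft Foundry expects.

```python
import csv
import io


def to_evaluation_csv(joined_rows, out) -> int:
    """Write joined request/response rows as CSV and return the number of rows written.

    Rows without both an input and an output are skipped, matching the
    final filter of the join query.
    """
    writer = csv.writer(out)
    writer.writerow(["correlation_id", "query", "response"])  # assumed header names
    written = 0
    for row in joined_rows:
        if row.get("Input") and row.get("Output"):
            writer.writerow([row["CorrelationId"], row["Input"], row["Output"]])
            written += 1
    return written


joined = [
    {"CorrelationId": "abc", "Input": "Hello, world", "Output": "Hi"},
    {"CorrelationId": "def", "Input": "No answer logged", "Output": ""},
]
buffer = io.StringIO()
written = to_evaluation_csv(joined, buffer)
print(written)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.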

Privacy and compliance

Logging policies for the Norwegian public sector

Requirement	Measure in APIM
GDPR Art. 5 (data minimization)	Log only the necessary fields; anonymize PII
Offentlighetsloven (Freedom of Information Act)	Ensure traceability for automated decisions
Datatilsynet guidelines	Do not log personal data in prompts without a legal basis for processing
Arkivloven (Archives Act)	Long-term storage in Log Analytics with a retention policy

PII filtering in logging

<policies>
    <outbound>
        <base />
        <!-- Sanitize prompts before logging -->
        <set-variable name="sanitizedRequest" value="@{
            var body = context.Request.Body.As<string>(preserveContent: true);
            // Remove Norwegian national ID (11 digits)
            body = System.Text.RegularExpressions.Regex.Replace(
                body, @"\b\d{11}\b", "[FODSELSNUMMER]");
            // Remove email addresses
            body = System.Text.RegularExpressions.Regex.Replace(
                body, @"\b[\w.-]+@[\w.-]+\.\w+\b", "[EMAIL]");
            return body;
        }" />

        <trace source="ai-gateway" severity="information">
            <message>@((string)context.Variables["sanitizedRequest"])</message>
        </trace>
    </outbound>
</policies>
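
The two regular expressions in the policy can be tried out offline before deploying them. This sketch applies the same patterns and placeholders in Python; the function name and sample text are made up for illustration. The 11-digit pattern replaces any standalone 11-digit number, so it can also redact values that are not national ID numbers.

```python
import re

# Same patterns as in the policy above
NATIONAL_ID = re.compile(r"\b\d{11}\b")              # standalone 11-digit number
EMAIL = re.compile(r"\b[\w.-]+@[\w.-]+\.\w+\b")      # simple email address pattern


def sanitize(text: str) -> str:
    """Replace 11-digit numbers and email addresses with placeholders."""
    text = NATIONAL_ID.sub("[FODSELSNUMMER]", text)
    return EMAIL.sub("[EMAIL]", text)


sample = "Fnr 01019912345, epost ola.nordmann@example.no"
print(sanitize(sample))
```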

References

Log token usage, prompts, and completions for LLM APIs -- main guide for LLM logging
AI gateway capabilities - Observability -- overview of observability
How to integrate Azure API Management with Application Insights -- Application Insights integration
llm-emit-token-metric policy -- token metric policy
emit-metric policy -- general metric policy
Monitor API Management -- general monitoring
ApiManagementGatewayLlmLog table -- Log Analytics table reference
Monitor AI agents with Application Insights -- AI agent monitoring

For Cosmo

Use this reference when the customer needs to set up logging, dashboards, or cost reporting for their AI APIs, or when they must meet compliance requirements for traceability of AI usage.
Always recommend enabling both Application Insights (real-time metrics) and diagnostic settings (Log Analytics for long-term analysis) -- they complement each other.
For cost monitoring, use llm-emit-token-metric with dimensions for application, department, and subscription -- this gives granular cost allocation without manual calculation.
Be mindful of privacy: prompt logging can capture sensitive information. Recommend PII filtering in policies for the Norwegian public sector, and make sure the retention period in Log Analytics matches the organization's guidelines.
The KQL queries in this reference can be used directly in Azure Monitor Workbooks to build custom dashboards for management and business units.
