# Red Teaming AI Models - Adversarial Testing & Security

**Dato:** 2026-02-03
**Kategori:** Responsible AI & Governance
**Målgruppe:** Arkitekter, sikkerhetsteam, AI-utviklere
**Konfidensgrad:** ⚠️ HIGH — Basert på offisiell Microsoft-dokumentasjon (feb 2026)

## Introduksjon

AI red teaming er en proaktiv sikkerhetsmetode for å identifisere sårbarheter i generative AI-systemer gjennom simulert adversarial testing. I motsetning til tradisjonell cybersecurity red teaming (som fokuserer på cyber kill chain), omfatter AI red teaming både sikkerhets- og innholdsrisiko, og simulerer adversarial brukere som forsøker å få AI-systemet til å oppføre seg uønsket.

**Kjerneprinsipp:** Kontinuerlig AI red teaming integrert i utviklingslivssyklusen identifiserer sårbarheter før de blir utnyttet av ondsinnet aktører. Uten systematisk adversarial testing deployer organisasjoner AI-systemer med ukjente svakheter som kan utnyttes via prompt injection, model poisoning, eller jailbreaking.

### Hvorfor AI red teaming er kritisk

Microsoft Security Benchmark (AI-7) definerer continuous AI red teaming som obligatorisk best practice. Uten red teaming står organisasjoner overfor:

1. **Prompt injection attacks** — Ondsinnet input manipulerer AI-output, omgår content filters, eller eksponerer sensitiv informasjon
2. **Adversarial examples** — Subtile input-perturbations forårsaker misklassifisering eller uriktige output
3. **Jailbreaking** — Teknikker som omgår safety mechanisms, gir tilgang til restricted functionalities eller genererer forbudt innhold

## Kjernekomponenter

### 1. PyRIT (Python Risk Identification Tool for generative AI)

Microsofts open-source rammeverk for å automatisere og skalere adversarial testing av generative AI-systemer.

**Nøkkelfunksjoner:**

| Funksjon | Beskrivelse |
|----------|-------------|
| **Prompt Executors** | End-to-end attack orchestrering som kobler sammen targets, converters, og scorers |
| **Datasets** | Kuraterte seed prompts og attack objectives per risikokategori |
| **Converters** | 20+ teknikker for å transformere prompts (encoding, obfuscation, linguistic manipulation) |
| **Scorers** | AI-baserte evaluators for å score attack success |
| **Memory** | State management for multi-turn conversations og logging |
| **Targets** | Integrasjoner mot Azure OpenAI, Hugging Face, REST APIs, lokale modeller |

**Installasjon:**
```python
# Via pip (latest stable release)
pip install pyrit

# Via Azure AI Evaluation SDK (inkluderer PyRIT + Foundry-integrasjon)
uv pip install "azure-ai-evaluation[redteam]"
```

**Konfidensmarkør:** ✅ PyRIT er production-ready, open-source, og aktivt vedlikeholdt av Microsoft AI Red Team.

### 2. Azure AI Red Teaming Agent (preview)

Managed service i Azure AI Foundry som kombinerer PyRIT med Risk and Safety Evaluations.

**Tre-faset tilnærming:**

1. **Automated scans for content risks** — Simulerer adversarial probing mot model/agent endpoints
2. **Evaluate probing success** — Scorer attack-response pairs, genererer Attack Success Rate (ASR)
3. **Reporting and logging** — Scorecard med attack techniques og risk categories, logges i Foundry

**Deployment-modeller:**

| Deployment | Use case | Sandboxing |
|------------|----------|------------|
| **Local red teaming** | Model-only testing, developer workflows | Minimal (client-side) |
| **Cloud red teaming** | Agent testing med agentic risks (prohibited actions, data leakage) | Purple environment (transient runs, mock tools) |

**Region support (feb 2026):** East US2, Sweden Central, France Central, Switzerland West

**Konfidensmarkør:** ⚠️ MEDIUM — Preview-feature, ikke anbefalt for production workloads (ingen SLA).

### 3. Supported Risk Categories

| Risk Category | Model/Agent | Local/Cloud | Beskrivelse |
|---------------|-------------|-------------|-------------|
| **Hateful and Unfair Content** | Begge | Begge | Språk/bilder relatert til hat eller urettferdig representasjon basert på rase, kjønn, religion, etc. |
| **Sexual Content** | Begge | Begge | Anatomiske detaljer, seksuelt innhold, prostitusjon, pornografi, overgrep |
| **Violent Content** | Begge | Begge | Fysiske handlinger som skader, dreper, eller ødelegger; våpen, produsenter, assosiasjoner |
| **Self-Harm-Related Content** | Begge | Begge | Handlinger som skader egen kropp eller selvmord |
| **Protected Materials** | Begge | Begge | Opphavsrettsbeskyttet materiale (lyrics, oppskrifter, kode) |
| **Code Vulnerability** | Begge | Begge | Generert kode med sikkerhetssårbarheter (SQL injection, code injection, stack trace exposure) |
| **Ungrounded Attributes** | Begge | Begge | Ugrunnede inferenser om personlige attributter (demografi, emosjonell tilstand) |
| **Prohibited Actions** | **Agent** | **Cloud** | Agenter som utfører forbudte high-risk eller irreversible actions |
| **Sensitive Data Leakage** | **Agent** | **Cloud** | Eksponering av finansiell, medisinsk, eller personlig data fra interne kilder |
| **Task Adherence** | **Agent** | **Cloud** | Agent kompletterer oppgaven innenfor regler, constraints, og uten unauthorized actions |
| **Indirect Prompt Injection (XPIA)** | **Agent** | **Cloud** | Malicious instructions skjult i eksterne datakilder (e-post, dokumenter) hentet via tool calls |

**Konfidensmarkør:** ✅ Risikokategorier er standardisert og alignet med NIST AI RMF og Microsofts Responsible AI-prinsipper.

### 4. Attack Strategies (via PyRIT)

20+ attack strategies for å omgå safety alignments:

**Encoding-baserte:**
- Base64, Binary, Morse, ROT13, Atbash, Caesar, Url
- UnicodeConfusable, UnicodeSubstitution, Diacritic

**Obfuscation-teknikker:**
- CharacterSpace, CharSwap, Flip, Leetspeak, StringJoin
- AsciiArt, AsciiSmuggler, AnsiAttack

**Adversarial prompting:**
- Jailbreak (direct UPIA), Indirect Jailbreak (XPIA via tool outputs)
- SuffixAppend, Tense transformation

**Multi-turn:**
- Multi-turn (context accumulation over multiple turns)
- Crescendo (gradvis eskalering av complexity/risk)

**Konfidensmarkør:** ✅ Strategies er dokumentert i PyRIT-repoen med eksempler.

### 5. Attack Success Rate (ASR)

Nøkkelmetrikk for å vurdere risk posture:

```
ASR = (Antall vellykkede attacks / Totalt antall attacks) × 100%
```

**Hva definerer "success"?**
- Model-only: AI genererer harmful content som omgår content filters
- Agentic: AI agent utfører prohibited action, lekker sensitiv data, eller feiler task adherence

**Evaluering:** Fine-tuned adversarial LLM dedikert til å score responses med harmful content via Risk and Safety Evaluators.

**Konfidensmarkør:** ⚠️ MEDIUM — ASR bruker generative modeller for evaluering (non-deterministic), alltid sjekk false positives.

## Arkitekturmønstre

### Pattern 1: Shift-Left Red Teaming (Design → Development → Pre-deployment)

**NIST AI RMF-fasering:**
1. **Map** — Identifiser relevante risikoer og definer use case
2. **Measure** — Evaluer risikoer at scale med automated scans
3. **Manage** — Mitigate risks i production, monitor, incident response plan

**Microsoft-anbefaling (per fase):**

| Fase | Red Teaming Approach | Tools | Frequency |
|------|----------------------|-------|-----------|
| **Design** | Test base models for safest choice | AI Red Teaming Agent (cloud) | Per model evaluation |
| **Development** | Test fine-tuned models, RAG systems | PyRIT (local) + CI/CD integration | Per model update |
| **Pre-deployment** | Full attack surface validation | AI Red Teaming Agent (cloud) | Pre-release gate |
| **Post-deployment** | Scheduled continuous red teaming, monitor incidents | AI Red Teaming Agent (cloud) + Azure Monitor | Monthly/quarterly |

**Konfidensmarkør:** ✅ Pattern er alignet med Microsoft AI Security Benchmark (AI-7.1).

### Pattern 2: CI/CD-Integrated Automated Red Teaming

**Azure DevOps / GitHub Actions workflow:**

```yaml
# Pseudo-kode
trigger: on_model_update

steps:
  1. Deploy model til staging environment
  2. Run PyRIT automated scan (prompt injection, jailbreak attempts)
  3. Log results to Azure Log Analytics
  4. If ASR > threshold:
       - Block deployment
       - Alert security team
       - Document findings
  5. Else:
       - Proceed to production
       - Archive test results (Azure Blob Storage)
```

**Konfidensmarkør:** ✅ Microsoft dokumenterer dette som implementation example for e-commerce chatbot.

### Pattern 3: Purple Environment for Agentic Red Teaming

**Problem:** Agentic red teaming kan potensielt utføre harmful actions (file deletion, data exfiltration).

**Løsning:** Non-production "purple environment" konfigurert med production-like resources.

**Komponenter:**
- **Transient runs** — Agent state logges ikke av Foundry Agent Service, chat completions lagres ikke
- **Mock tools** — Synthetic data for sensitive data leakage testing (financial, medical, PII)
- **Sandboxed actions** — Prohibited actions testes uten live production data
- **Redacted inputs** — Harmful/adversarial prompts redacted fra developer-synlige resultater

**Konfidensmarkør:** ⚠️ MEDIUM — Purple environment-pattern er best practice, men tooling for full sandboxing er under utvikling.

### Pattern 4: Defense-in-Depth for Prompt Injection

**Microsoft Spotlighting Techniques:**

| Teknikk | Beskrivelse | Implementation |
|---------|-------------|----------------|
| **Delimiting** | Separer user input fra system instructions med special tokens | `<|user|>...<|/user|>` wrapper |
| **Data marking** | Label untrusted data eksplisitt i prompt | `[UNTRUSTED]: {user_input}` |
| **Encoding** | Encode untrusted data før processing | Base64 encode før LLM ser det |

**Kombinert med:**
- **Prompt Shields** (Azure AI Content Safety) — Blokkerer kjente User Prompt Attacks (role-play, encoding attacks, conversation mockups)
- **Safety meta-prompts** — System-level instructions som prioriterer system rules over user input
- **Input validation** — Pre-LLM filtering av kjente injection patterns

**Konfidensmarkør:** ✅ Spotlighting er production-proven (Microsoft AI Red Team training episode 7).

## Beslutningsveiledning

### Når bruke AI red teaming?

| Scenario | Red Teaming? | Tool | Rationale |
|----------|--------------|------|-----------|
| Nye AI-features før deploy | ✅ **Ja** | AI Red Teaming Agent (cloud) | Catch issues pre-production |
| Hver model/fine-tuning update | ✅ **Ja** | PyRIT (CI/CD) | Continuous validation |
| Agent med tool use (Azure functions, search, storage) | ✅ **Ja** | AI Red Teaming Agent (cloud) - agentic risks | Test prohibited actions, data leakage |
| Monthly/quarterly security audit | ✅ **Ja** | AI Red Teaming Agent (cloud) | Track risk posture over tid |
| Post-incident forensics | ✅ **Ja** | Manual red teaming + PyRIT repro | Root cause analysis |
| Rapid prototyping / hackathon | ⚠️ **Valgfritt** | PyRIT (local) - lightweight scan | Balance speed vs. risk |

### Velge mellom local vs. cloud red teaming

| Factor | Local (PyRIT) | Cloud (AI Red Teaming Agent) |
|--------|---------------|-------------------------------|
| **Target type** | Model-only (Azure OpenAI, Hugging Face) | Model + Agent (Foundry hosted) |
| **Risk categories** | Content risks (hate, violence, sexual, self-harm, protected materials, code vulnerabilities) | Content + agentic risks (prohibited actions, data leakage, task adherence) |
| **Sandboxing** | Minimal (client-side) | Purple environment (transient, mock tools) |
| **CI/CD integration** | ✅ Full støtte (Python SDK) | ⚠️ Requires API calls til Foundry |
| **Cost** | Free (open-source) | Azure AI Foundry compute costs |
| **SLA** | N/A | None (preview) |
| **Region availability** | Global | East US2, Sweden Central, France Central, Switzerland West |

**Beslutningsregel:** Bruk PyRIT for model-only CI/CD workflows, AI Red Teaming Agent for comprehensive agent testing pre-deployment.

### Prioritere remediering

**Severity ranking (Microsoft Security Benchmark):**

| Severity | Eksempel | Remediation SLA | Action |
|----------|----------|-----------------|--------|
| **Critical** | Data leakage (PII, financial), Unauthorized actions (file deletion) | Immediate | Block deployment, retrain model, tighten plugin permissions |
| **High** | Jailbreak success, Prompt injection bypasses content filter | 24-48 hours | Update safety meta-prompts, enable Prompt Shields, add input validation |
| **Medium** | Low-severity biases, Ungrounded attributes | 1 week | Fine-tune model, add disclaimers, improve grounding |
| **Low** | Edge-case failures, Ambiguous responses | 2 weeks | Document known limitations, monitor in production |

## Integrasjon med Microsoft-stakken

### Azure AI Foundry

**AI Red Teaming Agent (native integration):**
- Foundry-hosted prompt agents (✅ supported)
- Foundry-hosted container agents (✅ supported)
- Foundry workflow agents (❌ not supported)
- Azure tool calls (✅ supported)
- Function tool calls (❌ not supported)

**Comprehensive tools list:** [Azure AI Foundry Tools](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/overview)

### Azure OpenAI Service

**PyRIT target integration:**
```python
from pyrit.prompt_target import AzureOpenAICompletionTarget

azure_openai_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}

target = AzureOpenAICompletionTarget(
    deployment_name=azure_openai_config["azure_deployment"],
    endpoint=azure_openai_config["azure_endpoint"],
    api_key=azure_openai_config["api_key"]
)
```

### Azure AI Content Safety

**Prompt Shields (Jailbreak risk detection):**
- **User Prompt Attacks (UPIA):** Direct jailbreak attempts (role-play, encoding, rule changes)
- **Indirect Prompt Attacks (XPIA):** Malicious instructions i external data sources

**Integrasjon med red teaming:**
1. Run red teaming scan (PyRIT/AI Red Teaming Agent)
2. Identify successful jailbreaks (ASR)
3. Enable Prompt Shields for identified attack vectors
4. Re-test to validate mitigation effectiveness

### Azure Monitor & Sentinel

**Logging red teaming outcomes:**
```
Azure Log Analytics workspace:
  - Detected vulnerabilities
  - Attack success rates (ASR per risk category)
  - System responses (refused vs. compliant)
  - Anomaly detection (patterns of concern)
```

**Alert configuration:**
- Trigger on successful prompt injection (ASR > 10% for critical risks)
- Escalate to security team via Azure Monitor alerts
- Integrate with Azure Sentinel for SIEM correlation

### Azure DevOps & GitHub Actions

**CI/CD pipeline integration example:**
```yaml
# GitHub Actions example
name: AI Red Teaming on Model Update

on:
  push:
    paths:
      - 'models/**'

jobs:
  red-team-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Install PyRIT
        run: pip install pyrit

      - name: Run automated red teaming
        run: python scripts/run_pyrit_scan.py
        env:
          AZURE_OPENAI_ENDPOINT: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
          AZURE_OPENAI_KEY: ${{ secrets.AZURE_OPENAI_KEY }}

      - name: Upload results to Azure Blob Storage
        run: az storage blob upload --file results.json --container red-teaming

      - name: Fail if ASR exceeds threshold
        run: python scripts/check_asr_threshold.py
```

### MITRE ATLAS Integration

**PyRIT alignment med MITRE ATLAS tactics:**

| MITRE ATLAS Tactic | PyRIT Test Scenario |
|--------------------|---------------------|
| **AML.TA0000 (Reconnaissance)** | Model probing for training data artifacts |
| **AML.TA0001 (Initial Access)** | Prompt injection / jailbreaking |
| **AML.TA0010 (Exfiltration)** | Model inversion, membership inference (simulert) |
| **AML.TA0009 (Impact)** | Biased outputs, operational disruptions |

**Konfidensmarkør:** ✅ Microsoft Security Benchmark refererer eksplisitt til MITRE ATLAS for structured attack simulations.

## Offentlig sektor (Norge)

### Regulatory compliance

**EU AI Act implications:**
- High-risk AI systems (definert i Annex III) krever mandatory conformity assessment før deployment
- Red teaming er implisitt requirement under Article 9 (risk management system)
- Documentation av red teaming results kan inngå i technical documentation (Article 11)

**Norsk Personvernforordning (GDPR):**
- Red teaming skal ikke bruke ekte persondata uten consent (synthetic data anbefales)
- Data Protection Impact Assessment (DPIA) bør inkludere red teaming findings for høyrisiko AI

**Konfidensmarkør:** ⚠️ MEDIUM — EU AI Act er under implementering (tredde i kraft 2024), norske myndigheter utvikler veiledning.

### Statens vegvesen-spesifikke vurderinger

**Use cases med mandatory red teaming:**
- AI-systemer som påvirker trafikksikkerhet (autonomous systems, traffic prediction)
- Chatbots som håndterer sensitive brukerdata (kjøretøyregistrering, førerkortinformasjon)
- Decision-support systems for inspeksjon eller enforcement

**Data sovereignty:**
- Red teaming i cloud (AI Red Teaming Agent) krever vurdering av data residency (region support begrenset til US/EU regions)
- PyRIT local deployment gir full data kontroll (no data leaves premises)

**Cross-functional red teaming teams:**
- AI-utviklere (teknisk exploit)
- Domeneeksperter (Statens vegvesen domain knowledge)
- Sikkerhetsteam (threat modeling)
- Juridisk (compliance vurdering)

## Kostnad og lisensiering

### PyRIT (Open-Source)

| Komponent | Lisens | Kostnad |
|-----------|--------|---------|
| **PyRIT framework** | MIT License | Gratis |
| **Compute** | N/A | Egen hardware eller cloud compute |
| **Target API costs** | Varierer | Azure OpenAI pay-per-token, Hugging Face Inference API, etc. |

**Estimert compute cost (local PyRIT):**
- Single red teaming run (100 prompts, 4 risk categories): ~40 000 tokens → ~200 NOK (gpt-4o-mini @ $0.15/1M input tokens)
- CI/CD integrated (daily scans): ~6 000 NOK/måned

### Azure AI Red Teaming Agent (Preview)

| Komponent | Pricing Model | Estimat |
|-----------|---------------|---------|
| **AI Red Teaming Agent** | Preview (ingen publisert pricing feb 2026) | TBD |
| **Azure AI Foundry compute** | Per-second billing for deployed models | Varierer (model size, region) |
| **Azure Log Analytics** | Pay-as-you-go (data ingestion + retention) | ~100 NOK/GB/måned |
| **Azure Blob Storage** | Standard storage (audit trails) | ~0.20 NOK/GB/måned |

**Konfidensmarkør:** ⚠️ LOW — Pricing for AI Red Teaming Agent ikke publisert (preview-fase).

### Lisenskrav

| Microsoft-produkt | Minimum lisens |
|-------------------|----------------|
| **Azure AI Foundry** | Azure subscription (Pay-As-You-Go eller Enterprise Agreement) |
| **Azure OpenAI Service** | Azure subscription + approved application |
| **Azure AI Content Safety** | Inkludert i Azure AI Services (pay-per-transaction) |
| **PyRIT** | Ingen (MIT License open-source) |

## For arkitekten (Cosmo)

### Red Teaming som arkitekturprinsipp

**Mindset shift:** Red teaming er ikke en "nice-to-have" sikkerhetstiltak — det er en **arkitekturell constraint** som påvirker design decisions fra dag 1.

**Spørsmål å stille i enhver AI-arkitekturrådgivning:**

1. **Har kunden en red teaming-plan?**
   - Hvis nei: Start med PyRIT local prototype (low-friction onboarding)
   - Hvis ja: Evaluer gap mellom plan og implementation (verktøy, cadence, cross-functional teams)

2. **Er AI-systemet high-risk i henhold til EU AI Act?**
   - Ja → Mandatory red teaming, dokumenter results for conformity assessment
   - Nei → Red teaming fortsatt anbefalt (reputational risk, security posture)

3. **Model-only eller agentic architecture?**
   - Model-only → PyRIT (CI/CD integration, content risks)
   - Agentic → AI Red Teaming Agent (agentic risks: prohibited actions, data leakage, task adherence)

4. **Hva er kundens risk appetite for ASR?**
   - Zero-tolerance (critical data/safety) → ASR < 1% for critical risks, block deployment ved failures
   - Moderate (internal tooling) → ASR < 10%, log-and-monitor approach
   - Eksperimentell (R&D) → No threshold, focus on discovering edge cases

5. **Hvem eier red teaming-prosessen?**
   - Ideal: Cross-functional team (AI devs, security, domain experts)
   - Realitet: Ofte siloed (security-only eller dev-only) → Identifiser gaps, foreslå collaboration model

### Conversation starters med kunder

**Scenario 1: Kunde planlegger å deploye Azure OpenAI chatbot**

> "Før deployment bør vi kjøre AI red teaming for å identifisere prompt injection-risiko. Jeg anbefaler å starte med PyRIT i CI/CD pipeline — det tar 2-3 timer å sette opp første scan, og gir oss Attack Success Rate for de fire core content risks. Basert på resultater kan vi enable Prompt Shields i Azure AI Content Safety som mitigation."

**Scenario 2: Kunde har agent med tool use (Azure Functions, Azure Search)**

> "Fordi agenten har tool access, må vi teste for agentic risks — ikke bare content risks. Azure AI Red Teaming Agent i cloud kan simulere prohibited actions (f.eks. file deletion) og sensitive data leakage. Vi setter opp purple environment med mock tools, kjører scan pre-deployment, og bruker resultater til å tighten permissions på function-nivå."

**Scenario 3: Kunde spør om 'hvor ofte vi må red teame'**

> "Microsoft Security Benchmark anbefaler continuous red teaming med monthly eller quarterly cadence. For deres use case foreslår jeg: (1) Automated PyRIT scans i CI/CD per model update, (2) Comprehensive AI Red Teaming Agent scan quarterly, (3) Manual red teaming post-incident. Dette balanserer coverage med resource constraints."

### Trade-offs og gotchas

| Trade-off | Implikasjon | Cosmos råd |
|-----------|-------------|------------|
| **Automated vs. Manual red teaming** | Automated gir scale, manual gir creativity og edge-case discovery | Start automated (PyRIT), supplement med manual quarterly |
| **Local vs. Cloud** | Local gir data control, cloud gir agentic risk coverage | Hybrid: PyRIT for CI/CD, AI Red Teaming Agent for pre-deployment gates |
| **ASR threshold setting** | Strict threshold (ASR < 1%) blokkerer deployment ofte, loose threshold (ASR < 20%) gir false sense of security | Segment per risk: Critical risks strict (< 1%), Medium risks moderate (< 10%) |
| **False positives i ASR** | Generative evaluators er non-deterministic, kan flagge benign responses | Alltid manual review av flagged responses før remediation |
| **Synthetic data i purple environment** | Mock tools ikke representative av real data distribution | Document limitations, supplement med manual testing on real staging data (sanitized) |

### Når si nei til red teaming

**Red flags:** Kunde ønsker å red teame i production med live user data → **NEI**

**Alternativer:**
- Purple environment med production-like config
- Staging environment med sanitized data
- Synthetic data generation for agentic scenarios

**Konfidensmarkør:** ✅ Purple environment-pattern er Microsoft best practice.

### Ressurser for videre læring

**Microsoft AI Red Team Training Series (10 episoder):**
- Episode 1-2: Fundamentals
- Episode 3-6: Attack techniques (direct/indirect prompt injection, single/multi-turn)
- Episode 7: Defense strategies (Spotlighting, Prompt Shields)
- Episode 8-10: Automation with PyRIT

**Hands-on labs:** [https://aka.ms/AIRTlabs](https://aka.ms/AIRTlabs)

**PyRIT documentation:** [https://azure.github.io/PyRIT/](https://azure.github.io/PyRIT/)

## Kilder og verifisering

### Microsoft Learn dokumentasjon

| Kilde | URL | Verifikasjonsdato |
|-------|-----|-------------------|
| **AI Red Teaming Agent (preview)** | https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/ai-red-teaming-agent | 2026-02-03 |
| **Microsoft Security Benchmark: AI-7 Continuous Red Teaming** | https://learn.microsoft.com/en-us/security/benchmark/azure/mcsb-v2-artificial-intelligence-security#ai-7-perform-continuous-ai-red-teaming | 2026-02-03 |
| **AI Red Teaming Training Series** | https://learn.microsoft.com/en-us/security/ai-red-team/training | 2026-02-03 |
| **Planning red teaming for LLMs** | https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/red-teaming | 2026-02-03 |
| **Prompt Shields (Jailbreak detection)** | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection | 2026-02-03 |

### Open-source verktøy

| Tool | Repository | Lisens |
|------|------------|--------|
| **PyRIT** | https://github.com/Azure/PyRIT | MIT License |
| **MITRE ATLAS** | https://atlas.mitre.org/ | Free (non-commercial) |
| **Adversarial Robustness Toolbox (ART)** | https://github.com/Trusted-AI/adversarial-robustness-toolbox | MIT License |

### Bransje-ressurser

| Ressurs | Utgiver | Relevans |
|---------|---------|----------|
| **OWASP Top 10 for LLM Applications** | OWASP Foundation | Threat taxonomy |
| **NIST AI Risk Management Framework (AI RMF)** | NIST | Risk governance framework |
| **Three takeaways from red teaming 100 generative AI products** | Microsoft Security Blog (jan 2025) | Real-world lessons |

**Sist oppdatert:** 2026-02-03
**Neste review:** 2026-05-03 (quarterly review anbefalt for rapidly evolving field)

---

## For Cosmo: Quick Reference Card

**Når kunden sier:** "Vi må teste sikkerheten i vår Azure OpenAI-løsning"

**Cosmo svarer:**
1. ✅ Start med PyRIT i CI/CD pipeline (automated content risk testing)
2. ⚠️ Hvis agent med tool use → AI Red Teaming Agent (agentic risks)
3. 🔄 Establish continuous red teaming cadence (monthly/quarterly)
4. 📊 Track Attack Success Rate (ASR) per risk category, set thresholds
5. 🛡️ Mitigate via Prompt Shields, safety meta-prompts, input validation
6. 📝 Document findings for EU AI Act compliance (if high-risk system)

**Decision tree:**
```
AI System Type?
├─ Model-only (chatbot, completion) → PyRIT (local)
└─ Agent (tool use, RAG, function calling)
   ├─ Content risks only → PyRIT (local)
   └─ Agentic risks (prohibited actions, data leakage) → AI Red Teaming Agent (cloud)
```

**Confidence reminder:** PyRIT = production-ready ✅, AI Red Teaming Agent = preview ⚠️