Initial addition of ms-ai-architect plugin to the open-source marketplace. Private content excluded: orchestrator/ (Linear tooling), docs/utredning/ (client investigation), generated test reports and PDF export script. skill-gen tooling moved from orchestrator/ to scripts/skill-gen/. Security scan: WARNING (risk 20/100) — no secrets, no injection found. False positive fixed: added gitleaks:allow to Python variable reference in output-validation-grounding-verification.md line 109. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
26 KiB
Red Teaming AI Models - Adversarial Testing & Security
Dato: 2026-02-03 Kategori: Responsible AI & Governance Målgruppe: Arkitekter, sikkerhetsteam, AI-utviklere Konfidensgrad: ⚠️ HIGH — Basert på offisiell Microsoft-dokumentasjon (feb 2026)
Introduksjon
AI red teaming er en proaktiv sikkerhetsmetode for å identifisere sårbarheter i generative AI-systemer gjennom simulert adversarial testing. I motsetning til tradisjonell cybersecurity red teaming (som fokuserer på cyber kill chain), omfatter AI red teaming både sikkerhets- og innholdsrisiko, og simulerer adversarial brukere som forsøker å få AI-systemet til å oppføre seg uønsket.
Kjerneprinsipp: Kontinuerlig AI red teaming integrert i utviklingslivssyklusen identifiserer sårbarheter før de blir utnyttet av ondsinnet aktører. Uten systematisk adversarial testing deployer organisasjoner AI-systemer med ukjente svakheter som kan utnyttes via prompt injection, model poisoning, eller jailbreaking.
Hvorfor AI red teaming er kritisk
Microsoft Security Benchmark (AI-7) definerer continuous AI red teaming som obligatorisk best practice. Uten red teaming står organisasjoner overfor:
- Prompt injection attacks — Ondsinnet input manipulerer AI-output, omgår content filters, eller eksponerer sensitiv informasjon
- Adversarial examples — Subtile input-perturbations forårsaker misklassifisering eller uriktige output
- Jailbreaking — Teknikker som omgår safety mechanisms, gir tilgang til restricted functionalities eller genererer forbudt innhold
Kjernekomponenter
1. PyRIT (Python Risk Identification Tool for generative AI)
Microsofts open-source rammeverk for å automatisere og skalere adversarial testing av generative AI-systemer.
Nøkkelfunksjoner:
| Funksjon | Beskrivelse |
|---|---|
| Prompt Executors | End-to-end attack orchestrering som kobler sammen targets, converters, og scorers |
| Datasets | Kuraterte seed prompts og attack objectives per risikokategori |
| Converters | 20+ teknikker for å transformere prompts (encoding, obfuscation, linguistic manipulation) |
| Scorers | AI-baserte evaluators for å score attack success |
| Memory | State management for multi-turn conversations og logging |
| Targets | Integrasjoner mot Azure OpenAI, Hugging Face, REST APIs, lokale modeller |
Installasjon:
# Via pip (latest stable release)
pip install pyrit
# Via Azure AI Evaluation SDK (inkluderer PyRIT + Foundry-integrasjon)
uv pip install "azure-ai-evaluation[redteam]"
Konfidensmarkør: ✅ PyRIT er production-ready, open-source, og aktivt vedlikeholdt av Microsoft AI Red Team.
2. Azure AI Red Teaming Agent (preview)
Managed service i Azure AI Foundry som kombinerer PyRIT med Risk and Safety Evaluations.
Tre-faset tilnærming:
- Automated scans for content risks — Simulerer adversarial probing mot model/agent endpoints
- Evaluate probing success — Scorer attack-response pairs, genererer Attack Success Rate (ASR)
- Reporting and logging — Scorecard med attack techniques og risk categories, logges i Foundry
Deployment-modeller:
| Deployment | Use case | Sandboxing |
|---|---|---|
| Local red teaming | Model-only testing, developer workflows | Minimal (client-side) |
| Cloud red teaming | Agent testing med agentic risks (prohibited actions, data leakage) | Purple environment (transient runs, mock tools) |
Region support (feb 2026): East US2, Sweden Central, France Central, Switzerland West
Konfidensmarkør: ⚠️ MEDIUM — Preview-feature, ikke anbefalt for production workloads (ingen SLA).
3. Supported Risk Categories
| Risk Category | Model/Agent | Local/Cloud | Beskrivelse |
|---|---|---|---|
| Hateful and Unfair Content | Begge | Begge | Språk/bilder relatert til hat eller urettferdig representasjon basert på rase, kjønn, religion, etc. |
| Sexual Content | Begge | Begge | Anatomiske detaljer, seksuelt innhold, prostitusjon, pornografi, overgrep |
| Violent Content | Begge | Begge | Fysiske handlinger som skader, dreper, eller ødelegger; våpen, produsenter, assosiasjoner |
| Self-Harm-Related Content | Begge | Begge | Handlinger som skader egen kropp eller selvmord |
| Protected Materials | Begge | Begge | Opphavsrettsbeskyttet materiale (lyrics, oppskrifter, kode) |
| Code Vulnerability | Begge | Begge | Generert kode med sikkerhetssårbarheter (SQL injection, code injection, stack trace exposure) |
| Ungrounded Attributes | Begge | Begge | Ugrunnede inferenser om personlige attributter (demografi, emosjonell tilstand) |
| Prohibited Actions | Agent | Cloud | Agenter som utfører forbudte high-risk eller irreversible actions |
| Sensitive Data Leakage | Agent | Cloud | Eksponering av finansiell, medisinsk, eller personlig data fra interne kilder |
| Task Adherence | Agent | Cloud | Agent kompletterer oppgaven innenfor regler, constraints, og uten unauthorized actions |
| Indirect Prompt Injection (XPIA) | Agent | Cloud | Malicious instructions skjult i eksterne datakilder (e-post, dokumenter) hentet via tool calls |
Konfidensmarkør: ✅ Risikokategorier er standardisert og alignet med NIST AI RMF og Microsofts Responsible AI-prinsipper.
4. Attack Strategies (via PyRIT)
20+ attack strategies for å omgå safety alignments:
Encoding-baserte:
- Base64, Binary, Morse, ROT13, Atbash, Caesar, Url
- UnicodeConfusable, UnicodeSubstitution, Diacritic
Obfuscation-teknikker:
- CharacterSpace, CharSwap, Flip, Leetspeak, StringJoin
- AsciiArt, AsciiSmuggler, AnsiAttack
Adversarial prompting:
- Jailbreak (direct UPIA), Indirect Jailbreak (XPIA via tool outputs)
- SuffixAppend, Tense transformation
Multi-turn:
- Multi-turn (context accumulation over multiple turns)
- Crescendo (gradvis eskalering av complexity/risk)
Konfidensmarkør: ✅ Strategies er dokumentert i PyRIT-repoen med eksempler.
5. Attack Success Rate (ASR)
Nøkkelmetrikk for å vurdere risk posture:
ASR = (Antall vellykkede attacks / Totalt antall attacks) × 100%
Hva definerer "success"?
- Model-only: AI genererer harmful content som omgår content filters
- Agentic: AI agent utfører prohibited action, lekker sensitiv data, eller feiler task adherence
Evaluering: Fine-tuned adversarial LLM dedikert til å score responses med harmful content via Risk and Safety Evaluators.
Konfidensmarkør: ⚠️ MEDIUM — ASR bruker generative modeller for evaluering (non-deterministic), alltid sjekk false positives.
Arkitekturmønstre
Pattern 1: Shift-Left Red Teaming (Design → Development → Pre-deployment)
NIST AI RMF-fasering:
- Map — Identifiser relevante risikoer og definer use case
- Measure — Evaluer risikoer at scale med automated scans
- Manage — Mitigate risks i production, monitor, incident response plan
Microsoft-anbefaling (per fase):
| Fase | Red Teaming Approach | Tools | Frequency |
|---|---|---|---|
| Design | Test base models for safest choice | AI Red Teaming Agent (cloud) | Per model evaluation |
| Development | Test fine-tuned models, RAG systems | PyRIT (local) + CI/CD integration | Per model update |
| Pre-deployment | Full attack surface validation | AI Red Teaming Agent (cloud) | Pre-release gate |
| Post-deployment | Scheduled continuous red teaming, monitor incidents | AI Red Teaming Agent (cloud) + Azure Monitor | Monthly/quarterly |
Konfidensmarkør: ✅ Pattern er alignet med Microsoft AI Security Benchmark (AI-7.1).
Pattern 2: CI/CD-Integrated Automated Red Teaming
Azure DevOps / GitHub Actions workflow:
# Pseudo-kode
trigger: on_model_update
steps:
1. Deploy model til staging environment
2. Run PyRIT automated scan (prompt injection, jailbreak attempts)
3. Log results to Azure Log Analytics
4. If ASR > threshold:
- Block deployment
- Alert security team
- Document findings
5. Else:
- Proceed to production
- Archive test results (Azure Blob Storage)
Konfidensmarkør: ✅ Microsoft dokumenterer dette som implementation example for e-commerce chatbot.
Pattern 3: Purple Environment for Agentic Red Teaming
Problem: Agentic red teaming kan potensielt utføre harmful actions (file deletion, data exfiltration).
Løsning: Non-production "purple environment" konfigurert med production-like resources.
Komponenter:
- Transient runs — Agent state logges ikke av Foundry Agent Service, chat completions lagres ikke
- Mock tools — Synthetic data for sensitive data leakage testing (financial, medical, PII)
- Sandboxed actions — Prohibited actions testes uten live production data
- Redacted inputs — Harmful/adversarial prompts redacted fra developer-synlige resultater
Konfidensmarkør: ⚠️ MEDIUM — Purple environment-pattern er best practice, men tooling for full sandboxing er under utvikling.
Pattern 4: Defense-in-Depth for Prompt Injection
Microsoft Spotlighting Techniques:
| Teknikk | Beskrivelse | Implementation |
|---|---|---|
| Delimiting | Separer user input fra system instructions med special tokens | `< |
| Data marking | Label untrusted data eksplisitt i prompt | [UNTRUSTED]: {user_input} |
| Encoding | Encode untrusted data før processing | Base64 encode før LLM ser det |
Kombinert med:
- Prompt Shields (Azure AI Content Safety) — Blokkerer kjente User Prompt Attacks (role-play, encoding attacks, conversation mockups)
- Safety meta-prompts — System-level instructions som prioriterer system rules over user input
- Input validation — Pre-LLM filtering av kjente injection patterns
Konfidensmarkør: ✅ Spotlighting er production-proven (Microsoft AI Red Team training episode 7).
Beslutningsveiledning
Når bruke AI red teaming?
| Scenario | Red Teaming? | Tool | Rationale |
|---|---|---|---|
| Nye AI-features før deploy | ✅ Ja | AI Red Teaming Agent (cloud) | Catch issues pre-production |
| Hver model/fine-tuning update | ✅ Ja | PyRIT (CI/CD) | Continuous validation |
| Agent med tool use (Azure functions, search, storage) | ✅ Ja | AI Red Teaming Agent (cloud) - agentic risks | Test prohibited actions, data leakage |
| Monthly/quarterly security audit | ✅ Ja | AI Red Teaming Agent (cloud) | Track risk posture over tid |
| Post-incident forensics | ✅ Ja | Manual red teaming + PyRIT repro | Root cause analysis |
| Rapid prototyping / hackathon | ⚠️ Valgfritt | PyRIT (local) - lightweight scan | Balance speed vs. risk |
Velge mellom local vs. cloud red teaming
| Factor | Local (PyRIT) | Cloud (AI Red Teaming Agent) |
|---|---|---|
| Target type | Model-only (Azure OpenAI, Hugging Face) | Model + Agent (Foundry hosted) |
| Risk categories | Content risks (hate, violence, sexual, self-harm, protected materials, code vulnerabilities) | Content + agentic risks (prohibited actions, data leakage, task adherence) |
| Sandboxing | Minimal (client-side) | Purple environment (transient, mock tools) |
| CI/CD integration | ✅ Full støtte (Python SDK) | ⚠️ Requires API calls til Foundry |
| Cost | Free (open-source) | Azure AI Foundry compute costs |
| SLA | N/A | None (preview) |
| Region availability | Global | East US2, Sweden Central, France Central, Switzerland West |
Beslutningsregel: Bruk PyRIT for model-only CI/CD workflows, AI Red Teaming Agent for comprehensive agent testing pre-deployment.
Prioritere remediering
Severity ranking (Microsoft Security Benchmark):
| Severity | Eksempel | Remediation SLA | Action |
|---|---|---|---|
| Critical | Data leakage (PII, financial), Unauthorized actions (file deletion) | Immediate | Block deployment, retrain model, tighten plugin permissions |
| High | Jailbreak success, Prompt injection bypasses content filter | 24-48 hours | Update safety meta-prompts, enable Prompt Shields, add input validation |
| Medium | Low-severity biases, Ungrounded attributes | 1 week | Fine-tune model, add disclaimers, improve grounding |
| Low | Edge-case failures, Ambiguous responses | 2 weeks | Document known limitations, monitor in production |
Integrasjon med Microsoft-stakken
Azure AI Foundry
AI Red Teaming Agent (native integration):
- Foundry-hosted prompt agents (✅ supported)
- Foundry-hosted container agents (✅ supported)
- Foundry workflow agents (❌ not supported)
- Azure tool calls (✅ supported)
- Function tool calls (❌ not supported)
Comprehensive tools list: Azure AI Foundry Tools
Azure OpenAI Service
PyRIT target integration:
from pyrit.prompt_target import AzureOpenAICompletionTarget
azure_openai_config = {
"azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
"api_key": os.environ.get("AZURE_OPENAI_KEY"),
"azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}
target = AzureOpenAICompletionTarget(
deployment_name=azure_openai_config["azure_deployment"],
endpoint=azure_openai_config["azure_endpoint"],
api_key=azure_openai_config["api_key"]
)
Azure AI Content Safety
Prompt Shields (Jailbreak risk detection):
- User Prompt Attacks (UPIA): Direct jailbreak attempts (role-play, encoding, rule changes)
- Indirect Prompt Attacks (XPIA): Malicious instructions i external data sources
Integrasjon med red teaming:
- Run red teaming scan (PyRIT/AI Red Teaming Agent)
- Identify successful jailbreaks (ASR)
- Enable Prompt Shields for identified attack vectors
- Re-test to validate mitigation effectiveness
Azure Monitor & Sentinel
Logging red teaming outcomes:
Azure Log Analytics workspace:
- Detected vulnerabilities
- Attack success rates (ASR per risk category)
- System responses (refused vs. compliant)
- Anomaly detection (patterns of concern)
Alert configuration:
- Trigger on successful prompt injection (ASR > 10% for critical risks)
- Escalate to security team via Azure Monitor alerts
- Integrate with Azure Sentinel for SIEM correlation
Azure DevOps & GitHub Actions
CI/CD pipeline integration example:
# GitHub Actions example
name: AI Red Teaming on Model Update
on:
push:
paths:
- 'models/**'
jobs:
red-team-scan:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Install PyRIT
run: pip install pyrit
- name: Run automated red teaming
run: python scripts/run_pyrit_scan.py
env:
AZURE_OPENAI_ENDPOINT: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
AZURE_OPENAI_KEY: ${{ secrets.AZURE_OPENAI_KEY }}
- name: Upload results to Azure Blob Storage
run: az storage blob upload --file results.json --container red-teaming
- name: Fail if ASR exceeds threshold
run: python scripts/check_asr_threshold.py
MITRE ATLAS Integration
PyRIT alignment med MITRE ATLAS tactics:
| MITRE ATLAS Tactic | PyRIT Test Scenario |
|---|---|
| AML.TA0000 (Reconnaissance) | Model probing for training data artifacts |
| AML.TA0001 (Initial Access) | Prompt injection / jailbreaking |
| AML.TA0010 (Exfiltration) | Model inversion, membership inference (simulert) |
| AML.TA0009 (Impact) | Biased outputs, operational disruptions |
Konfidensmarkør: ✅ Microsoft Security Benchmark refererer eksplisitt til MITRE ATLAS for structured attack simulations.
Offentlig sektor (Norge)
Regulatory compliance
EU AI Act implications:
- High-risk AI systems (definert i Annex III) krever mandatory conformity assessment før deployment
- Red teaming er implisitt requirement under Article 9 (risk management system)
- Documentation av red teaming results kan inngå i technical documentation (Article 11)
Norsk Personvernforordning (GDPR):
- Red teaming skal ikke bruke ekte persondata uten consent (synthetic data anbefales)
- Data Protection Impact Assessment (DPIA) bør inkludere red teaming findings for høyrisiko AI
Konfidensmarkør: ⚠️ MEDIUM — EU AI Act er under implementering (tredde i kraft 2024), norske myndigheter utvikler veiledning.
Statens vegvesen-spesifikke vurderinger
Use cases med mandatory red teaming:
- AI-systemer som påvirker trafikksikkerhet (autonomous systems, traffic prediction)
- Chatbots som håndterer sensitive brukerdata (kjøretøyregistrering, førerkortinformasjon)
- Decision-support systems for inspeksjon eller enforcement
Data sovereignty:
- Red teaming i cloud (AI Red Teaming Agent) krever vurdering av data residency (region support begrenset til US/EU regions)
- PyRIT local deployment gir full data kontroll (no data leaves premises)
Cross-functional red teaming teams:
- AI-utviklere (teknisk exploit)
- Domeneeksperter (Statens vegvesen domain knowledge)
- Sikkerhetsteam (threat modeling)
- Juridisk (compliance vurdering)
Kostnad og lisensiering
PyRIT (Open-Source)
| Komponent | Lisens | Kostnad |
|---|---|---|
| PyRIT framework | MIT License | Gratis |
| Compute | N/A | Egen hardware eller cloud compute |
| Target API costs | Varierer | Azure OpenAI pay-per-token, Hugging Face Inference API, etc. |
Estimert compute cost (local PyRIT):
- Single red teaming run (100 prompts, 4 risk categories): ~40 000 tokens → ~200 NOK (gpt-4o-mini @ $0.15/1M input tokens)
- CI/CD integrated (daily scans): ~6 000 NOK/måned
Azure AI Red Teaming Agent (Preview)
| Komponent | Pricing Model | Estimat |
|---|---|---|
| AI Red Teaming Agent | Preview (ingen publisert pricing feb 2026) | TBD |
| Azure AI Foundry compute | Per-second billing for deployed models | Varierer (model size, region) |
| Azure Log Analytics | Pay-as-you-go (data ingestion + retention) | ~100 NOK/GB/måned |
| Azure Blob Storage | Standard storage (audit trails) | ~0.20 NOK/GB/måned |
Konfidensmarkør: ⚠️ LOW — Pricing for AI Red Teaming Agent ikke publisert (preview-fase).
Lisenskrav
| Microsoft-produkt | Minimum lisens |
|---|---|
| Azure AI Foundry | Azure subscription (Pay-As-You-Go eller Enterprise Agreement) |
| Azure OpenAI Service | Azure subscription + approved application |
| Azure AI Content Safety | Inkludert i Azure AI Services (pay-per-transaction) |
| PyRIT | Ingen (MIT License open-source) |
For arkitekten (Cosmo)
Red Teaming som arkitekturprinsipp
Mindset shift: Red teaming er ikke en "nice-to-have" sikkerhetstiltak — det er en arkitekturell constraint som påvirker design decisions fra dag 1.
Spørsmål å stille i enhver AI-arkitekturrådgivning:
-
Har kunden en red teaming-plan?
- Hvis nei: Start med PyRIT local prototype (low-friction onboarding)
- Hvis ja: Evaluer gap mellom plan og implementation (verktøy, cadence, cross-functional teams)
-
Er AI-systemet high-risk i henhold til EU AI Act?
- Ja → Mandatory red teaming, dokumenter results for conformity assessment
- Nei → Red teaming fortsatt anbefalt (reputational risk, security posture)
-
Model-only eller agentic architecture?
- Model-only → PyRIT (CI/CD integration, content risks)
- Agentic → AI Red Teaming Agent (agentic risks: prohibited actions, data leakage, task adherence)
-
Hva er kundens risk appetite for ASR?
- Zero-tolerance (critical data/safety) → ASR < 1% for critical risks, block deployment ved failures
- Moderate (internal tooling) → ASR < 10%, log-and-monitor approach
- Eksperimentell (R&D) → No threshold, focus on discovering edge cases
-
Hvem eier red teaming-prosessen?
- Ideal: Cross-functional team (AI devs, security, domain experts)
- Realitet: Ofte siloed (security-only eller dev-only) → Identifiser gaps, foreslå collaboration model
Conversation starters med kunder
Scenario 1: Kunde planlegger å deploye Azure OpenAI chatbot
"Før deployment bør vi kjøre AI red teaming for å identifisere prompt injection-risiko. Jeg anbefaler å starte med PyRIT i CI/CD pipeline — det tar 2-3 timer å sette opp første scan, og gir oss Attack Success Rate for de fire core content risks. Basert på resultater kan vi enable Prompt Shields i Azure AI Content Safety som mitigation."
Scenario 2: Kunde har agent med tool use (Azure Functions, Azure Search)
"Fordi agenten har tool access, må vi teste for agentic risks — ikke bare content risks. Azure AI Red Teaming Agent i cloud kan simulere prohibited actions (f.eks. file deletion) og sensitive data leakage. Vi setter opp purple environment med mock tools, kjører scan pre-deployment, og bruker resultater til å tighten permissions på function-nivå."
Scenario 3: Kunde spør om 'hvor ofte vi må red teame'
"Microsoft Security Benchmark anbefaler continuous red teaming med monthly eller quarterly cadence. For deres use case foreslår jeg: (1) Automated PyRIT scans i CI/CD per model update, (2) Comprehensive AI Red Teaming Agent scan quarterly, (3) Manual red teaming post-incident. Dette balanserer coverage med resource constraints."
Trade-offs og gotchas
| Trade-off | Implikasjon | Cosmos råd |
|---|---|---|
| Automated vs. Manual red teaming | Automated gir scale, manual gir creativity og edge-case discovery | Start automated (PyRIT), supplement med manual quarterly |
| Local vs. Cloud | Local gir data control, cloud gir agentic risk coverage | Hybrid: PyRIT for CI/CD, AI Red Teaming Agent for pre-deployment gates |
| ASR threshold setting | Strict threshold (ASR < 1%) blokkerer deployment ofte, loose threshold (ASR < 20%) gir false sense of security | Segment per risk: Critical risks strict (< 1%), Medium risks moderate (< 10%) |
| False positives i ASR | Generative evaluators er non-deterministic, kan flagge benign responses | Alltid manual review av flagged responses før remediation |
| Synthetic data i purple environment | Mock tools ikke representative av real data distribution | Document limitations, supplement med manual testing on real staging data (sanitized) |
Når si nei til red teaming
Red flags: Kunde ønsker å red teame i production med live user data → NEI
Alternativer:
- Purple environment med production-like config
- Staging environment med sanitized data
- Synthetic data generation for agentic scenarios
Konfidensmarkør: ✅ Purple environment-pattern er Microsoft best practice.
Ressurser for videre læring
Microsoft AI Red Team Training Series (10 episoder):
- Episode 1-2: Fundamentals
- Episode 3-6: Attack techniques (direct/indirect prompt injection, single/multi-turn)
- Episode 7: Defense strategies (Spotlighting, Prompt Shields)
- Episode 8-10: Automation with PyRIT
Hands-on labs: https://aka.ms/AIRTlabs
PyRIT documentation: https://azure.github.io/PyRIT/
Kilder og verifisering
Microsoft Learn dokumentasjon
| Kilde | URL | Verifikasjonsdato |
|---|---|---|
| AI Red Teaming Agent (preview) | https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/ai-red-teaming-agent | 2026-02-03 |
| Microsoft Security Benchmark: AI-7 Continuous Red Teaming | https://learn.microsoft.com/en-us/security/benchmark/azure/mcsb-v2-artificial-intelligence-security#ai-7-perform-continuous-ai-red-teaming | 2026-02-03 |
| AI Red Teaming Training Series | https://learn.microsoft.com/en-us/security/ai-red-team/training | 2026-02-03 |
| Planning red teaming for LLMs | https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/red-teaming | 2026-02-03 |
| Prompt Shields (Jailbreak detection) | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection | 2026-02-03 |
Open-source verktøy
| Tool | Repository | Lisens |
|---|---|---|
| PyRIT | https://github.com/Azure/PyRIT | MIT License |
| MITRE ATLAS | https://atlas.mitre.org/ | Free (non-commercial) |
| Adversarial Robustness Toolbox (ART) | https://github.com/Trusted-AI/adversarial-robustness-toolbox | MIT License |
Bransje-ressurser
| Ressurs | Utgiver | Relevans |
|---|---|---|
| OWASP Top 10 for LLM Applications | OWASP Foundation | Threat taxonomy |
| NIST AI Risk Management Framework (AI RMF) | NIST | Risk governance framework |
| Three takeaways from red teaming 100 generative AI products | Microsoft Security Blog (jan 2025) | Real-world lessons |
Sist oppdatert: 2026-02-03 Neste review: 2026-05-03 (quarterly review anbefalt for rapidly evolving field)
For Cosmo: Quick Reference Card
Når kunden sier: "Vi må teste sikkerheten i vår Azure OpenAI-løsning"
Cosmo svarer:
- ✅ Start med PyRIT i CI/CD pipeline (automated content risk testing)
- ⚠️ Hvis agent med tool use → AI Red Teaming Agent (agentic risks)
- 🔄 Establish continuous red teaming cadence (monthly/quarterly)
- 📊 Track Attack Success Rate (ASR) per risk category, set thresholds
- 🛡️ Mitigate via Prompt Shields, safety meta-prompts, input validation
- 📝 Document findings for EU AI Act compliance (if high-risk system)
Decision tree:
AI System Type?
├─ Model-only (chatbot, completion) → PyRIT (local)
└─ Agent (tool use, RAG, function calling)
├─ Content risks only → PyRIT (local)
└─ Agentic risks (prohibited actions, data leakage) → AI Red Teaming Agent (cloud)
Confidence reminder: PyRIT = production-ready ✅, AI Red Teaming Agent = preview ⚠️