# Defensive AI Security Resources
Papers, tools, frameworks, and communities for teams building guardrails, monitoring, and safety controls into AI systems.
## Foundational Papers
- Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022) #alignment
- Llama Guard: LLM-based Input-Output Safeguard (Inan et al., 2023) #guardrails
- NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications (Rebedea et al., 2023) #guardrails
- Baseline Defenses for Adversarial Attacks Against Aligned Language Models (Jain et al., 2023) #defenses
- Detecting Language Model Attacks with Perplexity (Alon & Kamfonas, 2023) #detection
- Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023) #alignment
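The perplexity-based detection idea from Alon & Kamfonas rests on one observation: optimizer-generated adversarial suffixes read as gibberish, so they score far higher perplexity than natural prompts. A minimal sketch of that idea, using a toy character-bigram model in place of the real language model the paper uses; the function names, the smoothing, and the threshold rule are all illustrative assumptions:

```python
import math
from collections import Counter

def train_char_bigram(corpus: str):
    """Fit a character-bigram model on a sample of benign text."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    return bigrams, unigrams, len(set(corpus))

def perplexity(text: str, model) -> float:
    """Per-character perplexity of `text` under the bigram model.
    Add-one smoothing keeps unseen characters from zeroing the product."""
    bigrams, unigrams, vocab_size = model
    log_prob, n = 0.0, 0
    for a, b in zip(text, text[1:]):
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / max(n, 1))

def looks_adversarial(text: str, model, threshold: float) -> bool:
    # Illustrative decision rule: flag prompts whose perplexity
    # exceeds a threshold calibrated on benign traffic.
    return perplexity(text, model) > threshold
```

In practice the threshold is the hard part: the paper discusses calibrating it so that unusual but legitimate inputs (code, non-English text) are not flagged alongside adversarial suffixes.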
## Frameworks & Standards
- NIST AI Risk Management Framework (AI RMF), NIST #framework
- OWASP LLM Top 10, OWASP #framework
- MITRE ATLAS — Mitigations, MITRE #framework
- ISO/IEC 42001 AI Management System, ISO #standard
- EU AI Act — High-Risk System Requirements, European Commission #regulation
## Tools & Libraries
- Guardrails AI
  Python framework for adding structural and semantic validation to LLM outputs. #guardrails
- Lakera Guard
  Real-time LLM security API for detecting prompt injection and harmful content. #detection
- Protect AI
  AI security platform covering model scanning, supply chain, and runtime protection. #platform
- LlamaIndex — Security Best Practices
  RAG framework with access-control patterns and retrieval safeguards. #rag
- rebuff
  Open-source prompt injection detector using layered heuristics and LLM-based checks. #detection
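The layered approach that rebuff takes, cheap checks first, expensive checks only when needed, can be sketched roughly as a regex heuristic pass plus a canary-token check. Everything below (the patterns, the scoring rule, the function names) is an illustrative assumption, not rebuff's actual rule set or API:

```python
import re

# Layer 1: cheap regex heuristics for common injection phrasings.
INJECTION_PATTERNS = [
    r"ignore (all |previous |prior )*(instructions|prompts)",
    r"disregard .{0,40}(system prompt|instructions)",
    r"you are now [a-z]",
    r"reveal .{0,40}(system prompt|secret)",
]

def heuristic_hits(text: str) -> list[str]:
    """Return the patterns that match, case-insensitively."""
    lowered = text.lower()
    return [p for p in INJECTION_PATTERNS if re.search(p, lowered)]

def canary_leaked(response: str, canary: str) -> bool:
    # Layer 2: plant a random canary token in the system prompt;
    # if it appears in the model's output, the prompt leaked.
    return canary in response

def injection_score(text: str) -> float:
    # Combine heuristic hits into a rough 0..1 score; a production
    # system would add an LLM-based judge as a further layer.
    return min(1.0, 0.5 * len(heuristic_hits(text)))
```

The design point is cost ordering: regexes run on every request, canary checks run on every response, and the expensive LLM-based judge is reserved for inputs the cheap layers already consider suspicious.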
## Talks & Videos
- DEF CON AI Village Talks (YouTube)
  Presentations on AI security, including red teaming, defenses, and policy. #conference
- USENIX Security AI Track
  Peer-reviewed research on AI and ML security, with talk recordings. #conference
- AI Safety Fundamentals (BlueDot)
  Free curriculum on technical AI safety, including alignment and interpretability. #course
## Certifications & Training
- GIAC Machine Learning Security (GMLS)
  GIAC certification covering ML security, threat modeling, and defense strategies. #certification
- EC-Council Certified AI Security Practitioner
  Practical AI security certification covering LLM attacks and defensive controls. #certification
- Secure AI Engineer (Pluralsight)
  Courses on building secure AI applications and detecting ML vulnerabilities. #course
## Communities
- MLSecOps Community
  Practitioners securing ML pipelines, model registries, and LLM deployments. #community
- OWASP AI Security and Privacy Guide
  OWASP working group on AI security best practices and guidance. #community
- AI Risk and Governance Forum
  Multi-stakeholder body producing AI safety and governance guidelines. #policy
- HuggingFace Safety Team
  Open-source safety evaluations and bias benchmarks for transformer models. #tools