GuardML

Defensive AI — guardrails, content filters, model defenses, safe deployment.

Practical coverage of defensive AI engineering. Guardrails for LLMs, content filters and moderation pipelines, model defenses against adversarial attacks, output safety, and how to ship AI features without shipping liability alongside them.

// Pinned

Output Classification: Building a PII and Secrets Detector for LLM Applications

Most output filters catch the obvious cases and miss the long tail. Here's how to build an output classifier that's actually deployable in production.

May 7, 2026 · defense
tooling

Content Moderation Tools for LLM Applications: What Works and Where They Break

A practitioner's guide to the leading content moderation tools for LLM applications—OpenAI Moderation API, Llama Guard, Perspective API, and others—covering capabilities, documented bypasses, and a layered deployment strategy.

deep-dive

OpenAI's Under-18 Principles: A Guardrail Engineer Reads the New Model Spec

OpenAI's December Model Spec adds Root-level Under-18 Principles that bind the model even against jailbreak framing. The defense is real, the bypass surface is well-documented, and the deployment lessons cut across every team shipping age-gated AI.

content-filter

AI Content Moderation: How LLM Filters Work and Where They Break

A technical breakdown of AI content moderation for LLM applications — how classifier-based guardrails work, the bypass techniques that defeat them, and how to layer defenses that hold under real adversarial pressure.

// Earlier notes

Subscribe

GuardML — in your inbox

Defensive AI — guardrails, content filters, model defenses, safe deployment — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.