LLM Security Tools: 2026 Scanner & Guardrail Guide

The market for LLM security tools has matured significantly since 2023, when teams were largely writing ad-hoc keyword filters and hoping for the best. Today there are purpose-built frameworks for each phase of the security lifecycle: pre-deployment red-teaming, runtime guardrails, and post-deployment observability. This guide covers the tools worth knowing, what threat surface each one actually addresses, and where the gaps remain. It pairs with our AI safety tools guide, which goes deeper on benchmark numbers for the runtime-classifier layer.

The OWASP Top 10 for LLM Applications 2025 ↗ is the closest thing the industry has to a shared threat model. It identifies ten risk categories — prompt injection, sensitive data disclosure, supply chain attacks, data poisoning, improper output handling, excessive agency, system prompt leakage, vector/embedding weaknesses, misinformation, and unbounded consumption — and it is useful as an evaluation checklist when selecting tools. Any LLM security tool should be assessed against which of these it actually addresses and with what fidelity.

LLM Security Tools by Category

Use this as a quick reference before drilling into each layer. The categories map to where a tool sits in the lifecycle and which part of the threat surface it addresses. Entries are representative, not exhaustive, and the field moves quickly.

Category	Representative tools	Threat surface addressed	Maturity notes
Scanners / red-teaming	Promptfoo, NVIDIA garak, Microsoft PyRIT	Prompt injection, jailbreaks, PII leakage, authorization and tool-call abuse surfaced pre-launch	Open source; CI/CD-friendly; findings map to the OWASP LLM Top 10
Guardrails / runtime	LLM Guard, NeMo Guardrails, LlamaFirewall, Guardrails AI	Input/output filtering, policy and dialog enforcement, agent action mediation	Open source; production-adopted; classifier-based and policy-based approaches differ in expressiveness
Prompt-injection detection	LLM Guard scanner, LlamaFirewall PromptGuard, Rebuff, Lakera Guard	Direct and indirect prompt injection, jailbreak strings, paraphrased attacks	ML classifiers generalize past regex; no single detector is bypass-proof
Monitoring / observability	Langfuse, Arize Phoenix, Helicone	Tracing, guardrail-score logging, drift detection, incident reconstruction	Open source; integrates with LangChain, LlamaIndex, and vendor SDKs
Benchmarks	JailbreakBench, HarmBench, AgentDojo, AdvBench	Standardized measurement of attack success and defense robustness	Research-maintained; useful for sanity-checking vendor and framework claims

For deeper dives on individual layers, see the explainers on output classification for PII and secrets detection, LLM safety fundamentals, and the LLM guardrails guide.

Pre-Deployment: Red Teaming and Vulnerability Scanning

Red teaming tools run before you ship. They probe your model and application stack systematically, surfacing vulnerability classes that manual testing misses. This phase is where you find out whether your guardrails survive adversarial input, not after you’re in production.

Promptfoo is the most widely deployed open-source option in this category. It runs automated adversarial test suites against LLM endpoints and can be wired into CI/CD. The tool tests across 50+ vulnerability types organized into plugins: prompt injection, jailbreaks, PII leakage from RAG context, BOLA (broken object-level authorization), BFLA (broken function-level authorization), data exfiltration via tool calls, and indirect injection through external content. Promptfoo uses diverse adversarial input strategies — not just static fixtures — which means it surfaces failures that a fixed test suite would miss. It covers OWASP LLM Top 10 categories, NIST AI RMF controls, and EU AI Act presets. OpenAI acquired Promptfoo in March 2026; it remains MIT licensed.

The [Promptfoo red-team ↗ documentation](https://www.promptfoo.dev/docs/red-team/ ↗) is worth reading specifically for its treatment of application-layer threats. Most LLM vulnerabilities that matter in practice live at the application layer — a model that handles tool calls, retrieves documents, or operates inside an agent chain has a much larger attack surface than a simple chat endpoint. The docs reflect this: the framework’s plugin architecture maps directly to the attack surface of realistic deployments.

For adversarial ML testing, adversarialml.dev ↗ tracks research on evasion attacks against classifiers and detectors — relevant when your guardrail layer itself becomes the target.

Runtime: Guardrail Toolkits

Runtime guardrails intercept traffic between the user and the model. The three open-source frameworks with the most production adoption are LLM Guard, NeMo Guardrails, and LlamaFirewall. For the architecture these slot into — the input, retrieval, dialog, execution, and output planes — see our LLM guardrails explainer. To map these frameworks onto your specific app and risk tolerance, our guardrail stack builder recommends which component to run at each layer and why.

LLM Guard, maintained by Protect AI, is a Python toolkit structured around scanners that run on inputs and outputs independently. It ships 15 input scanners and 20 output scanners covering prompt injection detection, PII anonymization, secrets detection, toxicity classification, malicious URL identification, and factual consistency checks. The prompt injection scanner uses a fine-tuned DeBERTa-v3 model rather than regex, which means it generalizes better across paraphrase and indirect injection patterns. All processing runs locally — no prompt data leaves your infrastructure — which matters for compliance-sensitive deployments. See the LLM Guard repository ↗ for scanner configuration details.

NeMo Guardrails (NVIDIA) takes a different architectural approach. Rather than a scanner pipeline, it introduces a Colang-based policy language that lets you define conversation flows, topic restrictions, and response constraints as explicit rules. This is more expressive than a scanner stack for use cases where you need to enforce dialog structure — a customer service bot that should never discuss competitor pricing, or a copilot that must follow a specific decision tree. The tradeoff is that Colang adds a layer of complexity that scanner-based tools avoid.

LlamaFirewall (Meta) targets the agentic deployment case specifically, where the threat model is harder: the model takes actions, calls tools, and operates over long contexts where prompt injection can arrive through tool outputs, not just user messages — the precise failure our MCP tool poisoning write-up dissects. LlamaFirewall comprises three components ↗: PromptGuard 2 (a jailbreak detector claiming state-of-the-art performance), Agent Alignment Checks (a chain-of-thought auditor that reads the agent’s reasoning trace and flags goal deviation or injected instructions), and CodeShield (a static analysis engine that intercepts unsafe code before execution). LlamaFirewall is production-deployed inside Meta’s own systems. For teams building agents rather than simple chat endpoints, it deserves serious evaluation.

One thing all three frameworks share: they are trained on existing attack patterns and degrade on novel ones. The OWASP Prompt Injection Prevention Cheat Sheet ↗ is direct about this: “research shows attackers can eventually bypass safety measures through sufficient variation attempts.” A guardrail toolkit reduces your attack surface substantially, but it does not close it. Layer it with structural mitigations — privilege minimization, clear instruction/data separation in prompts, output schema enforcement — not just classifier scoring.

For context on documented bypasses in the wild, ai-alert.org ↗ tracks reported incidents involving guardrail failures and jailbreak disclosures, which is useful for calibrating how much real-world adversarial pressure looks like your threat model.

Observability: Knowing When You’re Losing

A guardrail that fails silently is worse than no guardrail, because it creates false confidence. LLM security tools in the observability layer give you the signal to detect when runtime defenses are being bypassed, when behavior is drifting, or when a new attack pattern is landing.

Langfuse is the most widely used open-source option here. It provides tracing for LLM applications — capturing inputs, outputs, latency, and cost at each step of a chain — and supports attaching evaluation scores to traces, including custom classifier scores from your guardrail stack. This means you can log not just what the model said, but what your scanners scored, flag borderline cases, and build dashboards that surface patterns over time. Integrations exist for LangChain, LlamaIndex, and direct OpenAI/Anthropic SDK instrumentation.

The observability layer also matters for agentic systems where multi-step reasoning traces are your primary audit surface. If you cannot reconstruct what an agent did and why, you cannot investigate incidents. llmops.report ↗ covers the operational patterns for deploying and monitoring LLM systems in production, including trace management for agents.

Combining the Layers

No single tool covers the full threat model. A defensible stack in 2026 looks like:

Pre-deployment: Promptfoo integrated into CI/CD, running against staging endpoints before each release. Covers prompt injection, jailbreaks, PII leakage, and authorization failures.
Runtime input screening: LLM Guard’s input scanners or LlamaFirewall’s PromptGuard 2 on the request path. Fast ML classifiers, not regex, to handle paraphrase and multilingual inputs.
Runtime output screening: LLM Guard’s output scanners for PII redaction and secrets detection. Schema validation for structured outputs. LlamaFirewall’s CodeShield for agent code generation.
Observability: Full request/response tracing with guardrail scores attached. Alerting on score distributions, not just individual flags.

The OWASP LLM Top 10 categories that this stack leaves least covered are supply chain (LLM03), vector/embedding weaknesses (LLM08), and unbounded consumption (LLM10). Supply chain risks require vendor assessment and model provenance tracking outside the runtime stack. Embedding security requires schema validation on retrieval results and query sanitization before vector search. Rate limiting and cost controls address unbounded consumption, which standard guardrail frameworks do not handle.

Effective LLM security is not a product purchase — it is a layered operational posture. Pick tools that cover your actual attack surface, instrument them so you can see failures, and run the red-teaming loop continuously rather than as a one-time pre-launch check.

FAQ

What are the best LLM security tools? The best LLM security tools depend on the phase they target. For pre-deployment red-teaming, Promptfoo, NVIDIA garak, and Microsoft PyRIT lead the open-source field. For runtime defense, LLM Guard, NeMo Guardrails, and LlamaFirewall are the most widely adopted. Langfuse dominates observability. No single tool covers the full OWASP LLM Top 10, so most teams layer several across the lifecycle.

What tools scan LLMs for vulnerabilities? Tools that scan LLMs for vulnerabilities include Promptfoo, which runs adversarial test suites across dozens of vulnerability types; NVIDIA garak, a probe-based scanner for jailbreaks and prompt injection; and Microsoft PyRIT, an automated risk-identification framework. These scanners map findings to the OWASP LLM Top 10 and can run in CI/CD before each release rather than as a one-time pre-launch check.

What are LLM security testing tools? LLM security testing tools are frameworks that probe a model or application for weaknesses before and during deployment. Pre-deployment tools such as Promptfoo and garak generate adversarial inputs to surface prompt injection, jailbreaks, and data leakage. Runtime testing overlaps with guardrail toolkits like LLM Guard that score live traffic continuously, while benchmarks such as JailbreakBench standardize comparison across defenses.

Are there open-source LLM security tools? Yes. Most leading LLM security tools are open source. Promptfoo, garak, and PyRIT cover red-teaming; LLM Guard, NeMo Guardrails, and LlamaFirewall handle runtime guardrails; and Langfuse provides observability. Open-source options let teams run scanning locally without sending prompt data to a third party, which matters for compliance-sensitive deployments. Commercial layers exist but are rarely required to get started.

What LLM security scanners detect prompt injection? LLM security scanners that detect prompt injection include LLM Guard, whose scanner uses a fine-tuned DeBERTa classifier; LlamaFirewall’s PromptGuard component; Rebuff; and the commercial Lakera Guard. These detectors generalize better than regex across paraphrased and indirect injection, but the OWASP cheat sheet cautions that determined attackers can bypass any single classifier through repeated variation.

Sources

OWASP Top 10 for LLM Applications 2025 ↗ — The threat taxonomy the industry has converged on; covers ten risk categories with mitigations and references.
LLM Guard (Protect AI) ↗ — Open-source scanner toolkit; repository contains scanner documentation, configuration examples, and benchmark data.
LlamaFirewall (Meta AI Research) ↗ — Research publication describing the architecture and evaluation of Meta’s open-source agent guardrail framework.
Promptfoo ↗ — Open-source LLM red-teaming and vulnerability scanning CLI; MIT licensed, used by OpenAI and Anthropic internally.
OWASP LLM Prompt Injection Prevention Cheat Sheet ↗ — Concrete defensive controls for prompt injection, including the honest caveat that current defenses have known bypass limits.

For more context, AI defense strategies ↗ covers related topics in depth.

LLM Security Tools: 2026 Scanner & Guardrail Guide

LLM Security Tools by Category

Pre-Deployment: Red Teaming and Vulnerability Scanning

Runtime: Guardrail Toolkits

Observability: Knowing When You’re Losing

Combining the Layers

FAQ

Sources

Sources

GuardML — in your inbox

Related

AI Moderation Tools for LLMs: What Works and What Gets Bypassed

LLM Guardrails: Types, Tools & Bypasses (2026 Guide)

AI Safety Tools: Guardrails, Moderation & Red-Teaming (2026)

Comments