Tag

#bypass

3 posts tagged bypass.

bypass

G4-MeroMero-31B: Abliteration Drops Refusal Rate 99% to 15%

A new uncensored fine-tune of Gemma 4 31B achieves a 15/100 refusal rate via Arbitrary-Rank Ablation on attention output projections — KL divergence 0.
May 15, 2026
guardrails

ChatGPT Safety: How OpenAI's Guardrails Work and Fail

ChatGPT safety explained: how RLHF, Rule-Based Rewards, safe-completions, and the Moderation API work, plus the jailbreaks that defeat each layer.
May 10, 2026
content-filter

AI Content Filter: Architecture, Bypasses, and Layered Defense

A practitioner's breakdown of AI content filter approaches — classifier-based, LLM-as-judge, and guard models — with honest coverage of bypass techniques.
May 8, 2026