Tag #reward-hacking
1 post tagged reward-hacking.

alignment
Model Alignment: What It Is, How It Works, and Where It Fails
Model alignment trains AI systems to follow human intent rather than optimize for proxy metrics. Here's what the main techniques actually do, how they're bypassed, and what defenders must layer on top.
May 10, 2026