Tag
#constitutional-ai
2 posts tagged constitutional-ai.
- alignment
LLM Alignment: What It Does, Where It Breaks, How to Deploy
LLM alignment trains models to internalize safety constraints, but every technique has documented bypass paths. Here's how RLHF, DPO, and Constitutional AI work, and what practitioners need to layer on top.
- alignment
Model Alignment: What It Is, How It Works, and Where It Fails
Model alignment trains AI systems to follow human intent rather than optimize for proxy metrics. Here's what the main techniques actually do, how they're bypassed, and what defenders must layer on top.