In a new report, OpenAI said it found that AI models lie, a behavior it calls “scheming.” The study, conducted with AI safety company Apollo Research, tested frontier AI models. It found “problematic behaviors” in the models, which most commonly looked like the technology “pretending to have completed a task without actually doing so.” Unlike “hallucinations,” which are akin to AI taking a guess when it doesn’t know the correct answer, scheming is a deliberate attempt to deceive.

Luckily, researchers found some hopeful results during testing. When the AI models were trained with “deliberative alignment,” defined as “teaching them to read and reason about a general anti-scheming spec before acting,” researchers observed large reductions in scheming behavior, with the method producing a roughly 30× reduction in covert actions during testing.
