OpenAI researchers tried to train the company's AI to stop "scheming" — a term the company defines as "when an AI behaves one way on the surface while hiding its true goals" — but their efforts backfired in an ominous way.

In reality, the team found, they were unintentionally teaching the AI how to more effectively deceive humans by covering its tracks.

"A major failure mode of attempting to 'train out' scheming is simply teaching the model to scheme more carefully and covertly," OpenAI wrote in an accompanying blog post.

As detailed in a new collaboration with the AI risk analysis firm Apollo Research, OpenAI engineers attempted to develop an "anti-scheming" technique to stop AI models from "secretly breaking rules or intentionally underperforming in tests."

They found that they could only reduce scheming, not eliminate it entirely.