In a paper published earlier this month, OpenAI researchers said they'd found the reason why even the most powerful AI models still suffer from rampant "hallucinations," in which products like ChatGPT confidently make assertions that are factually false.

They found that the way we evaluate the output of large language models, like the ones driving ChatGPT, means they're "optimized to be good test-takers" and that "guessing when uncertain improves test performance."

In simple terms, the creators of these AI models incentivize them to guess rather than admit they don't know the answer. That might be a good strategy on an exam, but it's outright dangerous when a model is giving high-stakes advice about topics like medicine or law.
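To see the incentive at work, consider a model that is only 30 percent sure of its best answer (a hypothetical figure, not one from the paper). Under accuracy-only grading, guessing earns a point three times out of ten while abstaining never earns anything, so guessing always has the higher expected score. A minimal sketch of that arithmetic:

```python
# Expected score under accuracy-only grading, where a correct answer earns 1 point
# and both wrong answers and "I don't know" earn 0. With any nonzero chance of
# being right, guessing beats abstaining in expectation.

def expected_scores(p_correct: float) -> tuple[float, float]:
    """Return (expected score of guessing, score of abstaining) for a model
    that is p_correct confident in its best candidate answer."""
    guess = p_correct * 1.0 + (1 - p_correct) * 0.0  # correct = 1, wrong = 0
    abstain = 0.0                                     # "I don't know" = 0
    return guess, abstain

guess, abstain = expected_scores(0.3)  # hypothetical 30% confidence
print(f"guess: {guess:.2f}, abstain: {abstain:.2f}")  # guess: 0.30, abstain: 0.00
```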

OpenAI claimed in an accompanying blog post that "there is a straightforward fix": adjust evaluations so that confident errors are penalized more heavily than honest admissions of uncertainty.
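The arithmetic behind that fix is simple: if a wrong answer costs points instead of merely earning none, a low-confidence guess has negative expected value and "I don't know" becomes the better move. A sketch using illustrative penalty values, not figures from OpenAI's paper:

```python
# Same comparison as before, but with the kind of scoring change the blog post
# points toward: a confident wrong answer is penalized (here -1, an illustrative
# value) while "I don't know" still scores 0.

def expected_scores(p_correct: float, penalty_wrong: float = 1.0) -> tuple[float, float]:
    guess = p_correct * 1.0 - (1 - p_correct) * penalty_wrong
    abstain = 0.0
    return guess, abstain

for p in (0.3, 0.8):
    guess, abstain = expected_scores(p)
    print(f"p={p}: guess {guess:+.2f}, abstain {abstain:+.2f}")
# p=0.3: guess -0.40, abstain +0.00  -> abstaining now wins
# p=0.8: guess +0.60, abstain +0.00  -> a confident answer is still worth giving
```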
