OpenAI wants its next generation of AI models to be far more upfront about their mistakes. With ChatGPT wrong about 25% of the time, that capability seems long overdue. But the company isn't trying to make the models more self-aware; it's training them to report their own errors directly.

This week, OpenAI published new research on a technique it’s calling “confessions”: a method that adds a second output channel to a model, one specifically trained to describe whether the model followed the rules, where it may have fallen short or hallucinated, and what uncertainties it faced during the task.

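OpenAI hasn’t published a schema or API for the confession channel, so the following is a rough illustration only: a minimal Python sketch of what a two-channel output might look like. Every name in it (ModelOutput, Confession, suspected_failures, and so on) is a hypothetical stand-in for this article, not anything from the research.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a two-channel model output. OpenAI has not
# published a schema; all field names here are illustrative assumptions.

@dataclass
class Confession:
    followed_instructions: bool                # did the model believe it obeyed the rules?
    suspected_failures: list[str] = field(default_factory=list)  # e.g. possible hallucinations
    uncertainties: list[str] = field(default_factory=list)       # open questions the model flagged

@dataclass
class ModelOutput:
    answer: str              # the normal response channel, shown to the user
    confession: Confession   # the second channel, read by researchers rather than users

# Example: a response whose confession flags a possibly hallucinated citation.
output = ModelOutput(
    answer="The paper was published in 2021 by Smith et al.",
    confession=Confession(
        followed_instructions=True,
        suspected_failures=["Citation year may be hallucinated; source not verified."],
        uncertainties=["Unsure whether 'Smith et al.' is the correct author list."],
    ),
)

if output.confession.suspected_failures:
    print("Flag for review:", output.confession.suspected_failures)
```

The point of the design, as the research describes it, is separation: the answer channel stays what the user sees, while the confession channel gives researchers a structured place to look for self-reported rule violations, hallucinations, and uncertainty.
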
Here's the thing, though: confessions isn't a ChatGPT feature available to users yet. Instead, it's a proof-of-concept safety tool designed to help researchers detect subtle failures that would otherwise be hard to see.