Chatbots are genuinely impressive when you watch them do things they're good at, like writing a basic email or creating weird, futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.
That's what researchers at the University of Colorado Boulder found when they challenged large language models to solve Sudoku. And not even the standard 9x9 puzzles. An easier 6x6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).
A more important finding came when the models were asked to show their work. For the most part, they couldn't. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they halluci