While the world of AI can often feel a bit like the wild west, a surprisingly large amount of analysis, benchmarking, and testing goes on behind the scenes. It comes not just from the companies themselves, but also from independent groups set up to establish their own rankings.

These groups test everything from a chatbot’s ability to complete math tests, create images, show reasoning, and offer medical advice to simply how emotionally intelligent it is.

Across these different tests, models rise and fall in the rankings, revealing their strengths and weaknesses in different areas. For example, while GPT-5 is great at scientific reasoning, it fell behind the likes of Gemini and Claude in its ability to adapt to new concepts.

Each of these tests tells us something new about AI models, and they are important as
