OpenAI, the company behind ChatGPT, has announced a new benchmark for testing its GPT-5 model, which involves pitting the AI directly against human experts in a variety of occupations.
According to OpenAI, the full GDPval set includes 1,320 specialized tasks, each crafted and vetted by professionals from these fields who have more than 14 years of experience on average.
The benchmark is called GDPval and is designed to assess how close ChatGPT is getting to outperforming humans at "economically valuable, real-world tasks". That means moving beyond academic tests and coding competitions towards real-world occupations such as nursing, financial management, engineering and journalism.
This is all part of OpenAI's effort to establish artificial general int