AI benchmarks hampered by bad science • The Register

The Register, 6 hrs ago

AI companies regularly tout their models' performance on benchmark tests as a sign of technological and intellectual superiority. But those results, widely used in marketing, may not be meaningful.

A study from researchers at the Oxford Internet Institute (OII) and several other universities and organizations has found that only 16 percent of 445 LLM benchmarks for natural language processing and machine learning use rigorous scientific methods to compare model performance.

What's more, about half the benchmarks claim to measure abstract ideas like reasoning or harmlessness without offering a clear definition of those terms or how to measure them.

In a statement, Andrew Bean, lead author of the study, said, "Benchmarks underpin nearly all claims about advances in AI. But without sh
