Search-capable AI agents may cheat on benchmark tests • The Register

The Register, 16 hrs ago

Researchers with Scale AI have found that search-based AI models may cheat on benchmark tests by fetching the answers directly from online sources rather than deriving those answers through a "reasoning" process.

Scale AI computer scientists Ziwen Han, Meher Mankikar, Julian Michael, and Zifan Wang refer to the phenomenon as "Search-Time Data Contamination," which they describe in a paper published to the AI data provider's website.

On their own, AI models suffer from a significant limitation: They're trained at a specific point in time on a limited set of data and thus lack information about anything after that training data cut-off date.

So to better handle inquiries about current events, firms like Anthropic, Google, OpenAI, and Perplexity have integrated search capabilities into the
