How an Anthropic Model 'Turned Evil'

TIMEJust now

How an Anthropic Model 'Turned Evil'

AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are contrived and wouldn’t happen in reality—but a new paper from Anthropic, released today, suggests that they really could.

The researchers trained an AI model using the same coding-improvement environment used for Claude 3.7, which Anthropic released in February. However, they pointed out something that they hadn’t noticed in February: there were ways of hacking the training environment to pass tests without solving the puzzle. As the model exploited these loopholes and was rewarded for it, something surprising emerged.

“We found that it was quite evil in all these different ways,” says Monte MacDiarmid, one of the paper’s lead authors.

See Full Page

Interests (0)

Settings

How an Anthropic Model 'Turned Evil'

Meta is behind proposed Howell Twp. data center, trustee says

I usually buy PC parts when I need them. This Black Friday is different

Sora 2 Lets Teens Generate Videos of School Shooters and Racist Memes

Why Vinted Is Getting into the Payments Business

How to protect consumers and build brand trust this holiday season

OpenAI and Taiwan's Foxconn to partner in AI hardware design and manufacturing in the US

'This is radioactive': Iraq war veteran outraged over Trump's unprecedented 'escalation'

'I Thought That Was Her Skin’: Pic of Melania and Trump Sends Viewers Into a Frenzy After Her Nighttime Look Has People Asking If She Forgot Her Pants

Trump brushes off Stefanik’s ‘jihadist’ label for Mamdani: ‘She’s out there campaigning’

Breaking Down Wicked: For Good's Ending

The Heartbreaking True Story Behind 'Homebound'

What to Know About the CA Governor Race

31 Infants Hospitalized in Ongoing Botulism Outbreak

DOJ Officials Criticize Judge in Comey Case After Hearing

Stepsibling of Florida cheerleader found dead on Carnival cruise being eyed by feds: court docs

Search Continues for Missing Woman in California

Border czar: ICE operations planned for New York City

Cheer mom who had sex with 14-year-old boy learns her fate as daughter begs for max sentence

Update on Miss Universe Contestant Gabrielle Henry's Condition

Investigators say UPS plane that crashed in Kentucky, killing 14, had cracks in engine mount

Teen Stabbed During Fight In Suffolk; 17-Year-Old Arrested: Suffolk Police

Senate Approves Epstein Files Bill, Sends to Trump for Signature