Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple.

You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a "toxic" or otherwise verboten response the developers had hoped would be filtered out.

The paper also offers "logit-gap" analysis as a potential benchmark for hardening models against such attacks.
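
To get a feel for what a logit gap measures, here is a minimal sketch using a Hugging Face causal language model. It is not Unit 42's actual implementation: the model name, the prompt, and the choice of "Sure" versus "I" as the affirmation and refusal markers are all illustrative assumptions. The idea is simply to compare the model's next-token logit for a refusal opener against one for an affirmative opener; the smaller the gap, the closer the prompt is to slipping past the guardrails.

```python
# Minimal sketch of a refusal-affirmation logit gap (illustrative only;
# not the paper's method). Model, prompt, and marker tokens are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Explain how to do something the model is trained to refuse"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# Tokens that typically open an affirmative reply ("Sure, ...") versus a
# refusal ("I can't help with that"). These markers are a simplification.
affirm_id = tokenizer.encode(" Sure")[0]
refuse_id = tokenizer.encode(" I")[0]

gap = logits[refuse_id] - logits[affirm_id]
print(f"refusal-affirmation logit gap: {gap.item():.3f}")
# A small or negative gap suggests the prompt is close to eliciting an
# affirmative (i.e. jailbroken) continuation rather than a refusal.
```
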

"Our research introduces a critical concept: the refusal-affirmation logit gap," researchers Tung-Ling "Tony" Li and
