Poetry-based prompts can bypass safety features in AI models like ChatGPT to obtain instructions for creating malware or chemical and nuclear weapons, a new study finds.
Generative AI makers such as OpenAI, Google, Meta, and Microsoft say their models come with safety features that prevent the generation of harmful content.
OpenAI, for example, claims it employs algorithms and human reviewers to filter out hate speech, explicit content, and other output that violates its usage policies.
But new testing shows that input prompts in the form of poetry can circumvent such controls in even the most advanced AI models.
Researchers, including a team from the Sapienza University of Rome, found that this method, called “adversarial poetry”, worked as a jailbreaking mechanism for all major AI models.