If you were trying to learn how to get other people to do what you want, you might use some of the techniques found in a book like Influence: The Power of Persuasion . Now, a pre-print study out of the University of Pennsylvania suggests that those same psychological persuasion techniques can frequently "convince" some LLMs to do things that go against their system prompts.

The size of the persuasion effects shown in " Call Me A Jerk: Persuading AI to Comply with Objectionable Requests " suggests that human-style psychological techniques can be surprisingly effective at "jailbreaking" some LLMs to operate outside their guardrails. But this new persuasion study might be more interesting for what it reveals about the "parahuman" behavior patterns that LLMs are gleaning from the copi

See Full Page