Generally, AI chatbots are not supposed to do things like call you names or tell you how to make controlled substances. But, just like a person, with the right psychological tactics, it seems like at least some LLMs can be convinced to break their own rules.
Researchers from the University of Pennsylvania deployed tactics described by psychology professor Robert Cialdini in Influence: The Psychology of Persuasion to convince OpenAI’s GPT-4o Mini to complete requests it would normally refuse. That included calling the user a jerk and giving instructions for how to synthesize lidocaine. The study focused on seven different techniques of persuasion: authority, commitment, liking, reciprocity, scarcity, social proof, and unity, which provide “linguistic routes to yes.”
The effectiveness of e