OpenAI rival Anthropic says Claude has been updated with a rare new feature that allows the AI model to end conversations in extreme cases of persistently harmful or abusive interactions.
This only applies to Claude Opus 4 and 4.1, the two most powerful models available via paid plans and the API. Claude Sonnet 4, the company's most used model, won't be getting this feature.
Anthropic describes this move as part of its work on "model welfare."
"In pre-deployment testing of Claude Opus 4 , we included a preliminary model welfare assessment," Anthropic noted.
"As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm."
That said, Claude won't end a conversation simply because it's unable to handle a query; the feature is intended as a last resort for extreme cases of abuse.