Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it’s doing this not to protect the human user, but rather the AI model itself.

To be clear, the company isn’t claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”

However, its announcement points to a recent program created to study what it calls “model welfare” and says Anthropic is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”

This latest change is currently limited to Claude Opus 4 and 4.1. And again, it’s only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”

While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting around how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a “strong preference against” responding to these requests and a “pattern of apparent distress” when it did so.

As for these new conversation-ending capabilities, the company says, “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”

Anthropic also says Claude has been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”

When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.

“We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.
