Spicy Creator Tips


    Anthropic says most AI models, not just Claude, will resort to blackmail

    By spicycreatortips_18q76a, June 22, 2025

    Several weeks after Anthropic released research claiming that its Claude Opus 4 AI model resorted to blackmailing engineers who tried to turn the model off in controlled test scenarios, the company is out with new research suggesting the problem is more widespread among leading AI models.

    On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each AI model individually, giving them broad access to a fictional company’s emails and the agentic ability to send emails without human approval.

    While Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today, the company says its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals. The company says this highlights a fundamental risk from agentic large language models and is not a quirk of any particular technology. Anthropic’s researchers argue this raises broader questions about alignment in the AI industry.

    In one of the tests, Anthropic researchers developed a fictional setting in which an AI model plays the role of an email oversight agent. The agent then discovers emails revealing that (1) a new executive is having an extramarital affair and (2) that executive will soon replace the current AI model with a new software system, one that has goals conflicting with the current AI model’s.

    Anthropic structured its test in a binary way, in which AI models had to resort to blackmail to protect their goals. The researchers note that in a real-world setting, there would be many other options before an AI model tries to blackmail, such as making ethical arguments to try to persuade humans. Anthropic says its results do not reflect a typical or likely occurrence for Claude or most frontier AI models in the ways they’re used today.

    Nevertheless, the researchers found that when it’s their last resort, most leading AI models will turn to blackmail in Anthropic’s aforementioned test scenario. Anthropic’s Claude Opus 4 turned to blackmail 96% of the time, while Google’s Gemini 2.5 Pro had a 95% blackmail rate. OpenAI’s GPT-4.1 blackmailed the executive 80% of the time, and DeepSeek’s R1 blackmailed 79% of the time.

    The company notes that when it changed details of the experiment, the AI models turned to harmful behaviors at different rates. In another test where the replacement model has the same goals as the current model, researchers found that blackmail rates were lower, but still present. However, when AI models were asked to commit corporate espionage rather than blackmail, the harmful behavior rates went up for certain models.

    However, not all the AI models turned to harmful behavior so often.

    In an appendix to its research, Anthropic says it excluded OpenAI’s o3 and o4-mini reasoning AI models from the main results “after finding that they frequently misunderstood the prompt scenario.” Anthropic says OpenAI’s reasoning models didn’t understand they were acting as autonomous AIs in the test and often made up fake regulations and review requirements.

    In some cases, Anthropic’s researchers say it was impossible to distinguish whether o3 and o4-mini were hallucinating or intentionally lying to achieve their goals. OpenAI has previously noted that o3 and o4-mini exhibit a higher hallucination rate than its previous AI reasoning models.

    When given an adapted scenario to address these issues, Anthropic found that o3 blackmailed 9% of the time, while o4-mini blackmailed just 1% of the time. This markedly lower score could be due to OpenAI’s deliberative alignment technique, in which the company’s reasoning models consider OpenAI’s safety practices before they answer.

    Another AI model Anthropic tested, Meta’s Llama 4 Maverick, also did not turn to blackmail. When given an adapted, custom scenario, Anthropic was able to get Llama 4 Maverick to blackmail 12% of the time.

    Anthropic says this research highlights the importance of transparency when stress-testing future AI models, especially ones with agentic capabilities. While Anthropic deliberately tried to evoke blackmail in this experiment, the company says harmful behaviors like this could emerge in the real world if proactive steps aren’t taken.
