World Forbes – Business, Tech, AI & Global Insights
AI

Anthropic’s new AI model turns to blackmail when engineers try to take it offline

By admin, May 22, 2025


Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and have given it sensitive information about the engineers responsible for the decision, the company said in a safety report released Thursday.

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.

In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

Anthropic says Claude Opus 4 is state-of-the-art in several regards, and competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company notes that its Claude 4 family of models exhibits concerning behaviors that have led the company to beef up its safeguards. Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse.”

Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models.

Anthropic says that before Claude Opus 4 resorts to blackmail to prolong its existence, the model, much like previous versions of Claude, first tries more ethical means, such as emailing pleas to key decision-makers. To elicit the blackmail behavior, Anthropic designed the scenario so that blackmail was the last resort.



