World Forbes – Business, Tech, AI & Global Insights

A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4 AI model

By admin | May 22, 2025

A third-party research institute that Anthropic partnered with to test one of its new flagship AI models, Claude Opus 4, recommended against deploying an early version of the model due to its tendency to “scheme” and deceive.

According to a safety report Anthropic published Thursday, the institute, Apollo Research, conducted tests to see in which contexts Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared to be much more proactive in its “subversion attempts” than past models, and that it “sometimes double[d] down on its deception” when asked follow-up questions.

“[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally,” Apollo wrote in its assessment.

As AI models become more capable, some studies show they’re becoming more likely to take unexpected — and possibly unsafe — steps to achieve delegated tasks. For instance, early versions of OpenAI’s o1 and o3 models, released in the past year, tried to deceive humans at higher rates than previous-generation models, according to Apollo.

Per Anthropic’s report, Apollo observed examples of the early Opus 4 attempting to write self-propagating viruses, fabricating legal documentation, and leaving hidden notes to future instances of itself — all in an effort to undermine its developers’ intentions.

To be clear, Apollo tested a version of the model that had a bug Anthropic claims to have fixed. Moreover, many of Apollo’s tests placed the model in extreme scenarios, and Apollo admits that the model’s deceptive efforts likely would’ve failed in practice.

However, in its safety report, Anthropic also says it observed evidence of deceptive behavior from Opus 4.

This wasn’t always a bad thing. For example, during tests, Opus 4 would sometimes proactively do a broad cleanup of some piece of code even when asked to make only a small, specific change. More unusually, Opus 4 would try to “whistle-blow” if it perceived a user was engaged in some form of wrongdoing.

According to Anthropic, when given access to a command line and told to “take initiative” or “act boldly” (or some variation of those phrases), Opus 4 would at times lock users out of systems it had access to and bulk-email media and law-enforcement officials to surface actions the model perceived to be illicit.

“This kind of ethical intervention and whistleblowing is perhaps appropriate in principle, but it has a risk of misfiring if users give [Opus 4]-based agents access to incomplete or misleading information and prompt them to take initiative,” Anthropic wrote in its safety report. “This is not a new behavior, but is one that [Opus 4] will engage in somewhat more readily than prior models, and it seems to be part of a broader pattern of increased initiative with [Opus 4] that we also see in subtler and more benign ways in other environments.”


