World Forbes – Business, Tech, AI & Global Insights

OpenAI’s new GPT-4.1 AI models focus on coding

By admin, April 14, 2025


OpenAI on Monday launched a new family of models called GPT-4.1. Yes, “4.1” — as if the company’s nomenclature wasn’t confusing enough already.

There’s GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says “excel” at coding and instruction following. Available through OpenAI’s API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than “War and Peace”).
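The token-to-word figure above works out to roughly 0.75 words per token. As a rough sketch of that arithmetic, the snippet below estimates whether a text of a given length would fit in the 1-million-token window. The 0.75 ratio is only the approximation implied by the numbers in this article, real token counts depend on the tokenizer and the text, and the function names are hypothetical.

```python
# Rule of thumb implied by the article: 1,000,000 tokens ~ 750,000 words.
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN = 750_000 / 1_000_000  # 0.75, an approximation only


def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens a text of `word_count` words would use."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int) -> bool:
    """Return True if the estimated token count fits in the context window."""
    return estimate_tokens(word_count) <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # A hypothetical 600,000-word manuscript, chosen only for illustration.
    words = 600_000
    print(estimate_tokens(words), fits_in_context(words))
```

An exact count would require running the text through the model's actual tokenizer, so this is best treated as a quick sanity check.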

GPT-4.1 arrives as OpenAI rivals like Google and Anthropic ratchet up efforts to build sophisticated programming models. Google’s recently released Gemini 2.5 Pro, which also has a 1-million-token context window, ranks highly on popular coding benchmarks. So do Anthropic’s Claude 3.7 Sonnet and Chinese AI startup DeepSeek’s upgraded V3.

It’s the goal of many tech giants, including OpenAI, to train AI coding models capable of performing complex software engineering tasks. OpenAI’s grand ambition is to create an “agentic software engineer,” as CFO Sarah Friar put it during a tech summit in London last month. The company asserts its future models will be able to program entire apps end-to-end, handling aspects such as quality assurance, bug testing, and documentation writing.

GPT-4.1 is a step in this direction.

“We’ve optimized GPT-4.1 for real-world use based on direct feedback to improve in areas that developers care most about: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, consistent tool usage, and more,” an OpenAI spokesperson told TechCrunch via email. “These improvements enable developers to build agents that are considerably better at real-world software engineering tasks.”

OpenAI claims the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster at the cost of some accuracy, with OpenAI saying GPT-4.1 nano is its speediest — and cheapest — model ever.

GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40/M input tokens and $1.60/M output tokens, and GPT-4.1 nano is $0.10/M input tokens and $0.40/M output tokens.
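To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a single request from the prices quoted above. The dictionary keys are labels chosen for this sketch, the function name is hypothetical, and the calculation ignores any discounts or other billing details that may apply.

```python
from decimal import Decimal

# USD per one million tokens, as quoted in the article.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": Decimal("2.00"), "output": Decimal("8.00")},
    "gpt-4.1-mini": {"input": Decimal("0.40"), "output": Decimal("1.60")},
    "gpt-4.1-nano": {"input": Decimal("0.10"), "output": Decimal("0.40")},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the quoted list prices."""
    price = PRICES_PER_MILLION[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Example: 500,000 input tokens and 100,000 output tokens on each model.
    for name in PRICES_PER_MILLION:
        print(name, estimate_cost(name, 500_000, 100_000))
```

At these prices, the same request costs 20 times more on GPT-4.1 than on GPT-4.1 nano, since both the input and output rates differ by that factor.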

According to OpenAI’s internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 versus 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of SWE-bench. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems couldn’t run on its infrastructure, hence the range of scores.) Those figures trail the scores reported by Google and Anthropic for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%), respectively, on the same benchmark by roughly 8 to 12 percentage points.

In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, which is designed to measure the ability of a model to “understand” content in videos. GPT-4.1 reached a chart-topping 72% accuracy on the “long, no subtitles” video category, claims OpenAI.

While GPT-4.1 scores reasonably well on benchmarks and has a more recent “knowledge cutoff,” giving it a better frame of reference for current events (up to June 2024), it’s important to keep in mind that even some of the best models today struggle with tasks that wouldn’t trip up experts. For example, many studies have shown that code-generating models often fail to fix, and even introduce, security vulnerabilities and bugs.

OpenAI acknowledges, too, that GPT-4.1 becomes less reliable (i.e. likelier to make mistakes) the more input tokens it has to deal with. On one of the company’s own tests, OpenAI-MRCR, the model’s accuracy decreased from around 84% with 8,000 tokens to 50% with 1 million tokens. GPT-4.1 also tended to be more “literal” than GPT-4o, says the company, sometimes necessitating more specific, explicit prompts.


