OpenAI releases GPT-5.2 after “code red” Google threat alert


In attempting to keep up with (or ahead of) the competition, model releases proceed at a steady clip: GPT-5.2 represents OpenAI’s third major model release since August. GPT-5 launched that month with a new routing system that toggles between instant-response and simulated reasoning modes, though users complained about responses that felt cold and clinical. November’s GPT-5.1 update added eight preset “personality” options and focused on making the system more conversational.

Numbers go up

Oddly, even though the GPT-5.2 model release is ostensibly a response to Gemini 3’s performance, OpenAI chose not to list any benchmarks on its promotional website comparing the two models. Instead, the official blog post focuses on GPT-5.2’s improvements over its predecessors and its performance on OpenAI’s new GDPval benchmark, which attempts to measure professional knowledge work tasks across 44 occupations.

During the press briefing, OpenAI did share some benchmarks comparing its models against Gemini 3 Pro and Claude Opus 4.5, but it pushed back on the narrative that GPT-5.2 was rushed to market in response to Google. “It is important to note this has been in the works for many, many months,” Simo told reporters, though we’ll note that choosing when to release it is itself a strategic decision.

According to the shared numbers, GPT-5.2 Thinking scored 55.6 percent on SWE-Bench Pro, a software engineering benchmark, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 percent versus Gemini 3 Pro’s 91.9 percent.

GPT-5.2 benchmarks that OpenAI shared with the press. Credit: OpenAI / VentureBeat

OpenAI says GPT-5.2 Thinking beats or ties “human professionals” on 70.9 percent of tasks in the GDPval benchmark (compared to 53.3 percent for Gemini 3 Pro). The company also claims the model completes these tasks at more than 11 times the speed and less than 1 percent of the cost of human experts.

GPT-5.2 Thinking also reportedly generates responses with 38 percent fewer confabulations than GPT-5.1, according to Max Schwarzer, OpenAI’s post-training lead, who told VentureBeat that the model “hallucinates substantially less” than its predecessor.

However, we always take benchmarks with a grain of salt because it’s easy to present them in a way that flatters a company, especially when the science of objectively measuring AI performance hasn’t quite caught up with corporate sales pitches for humanlike AI capabilities.

Independent benchmark results from researchers outside OpenAI will take time to arrive. In the meantime, if you use ChatGPT for work tasks, expect a competent model with incremental improvements and somewhat better coding performance thrown in for good measure.

