OpenAI GPT-5.4 Release Notes: Key Features, Benchmarks and 1M Token Context

On March 5, 2026, OpenAI released GPT-5.4 — simultaneously in ChatGPT, the API, and Codex.

This is not just another incremental update: for the first time, the model combines the GPT-5.3-Codex coding pipeline with general reasoning, gains native computer use, and offers a context window of up to 1M tokens.

In short: if you are building agentic workflows or coding tools, this is a release worth paying attention to today.

⚡ Key Highlights in 30 Seconds

  • Release Date: March 5, 2026, rollout in ChatGPT, API, and Codex simultaneously
  • Consolidated model: GPT-5.3-Codex and GPT-5.2 are merged into a single model — no longer need to switch between endpoints
  • Native computer use: OpenAI's first mainline model that controls a computer autonomously via Playwright and mouse/keyboard commands
  • 1M tokens of context in API (with double pricing beyond 272K)
  • −47% tokens on some agentic tasks compared to predecessors
  • −33% errors in specific assertions compared to GPT-5.2


🗓️ What was released and when

OpenAI officially announced GPT-5.4 on March 5, 2026. The model is immediately available across three surfaces:

  • ChatGPT — as GPT-5.4 Thinking for Plus, Team, and Pro users (replaces GPT-5.2 Thinking). GPT-5.2 Thinking remains in Legacy Models until June 5, 2026
  • API — endpoints gpt-5.4 and gpt-5.4-pro are available now
  • Codex — becomes the default model, replacing GPT-5.3-Codex

GPT-5.4 Pro is available via API and for ChatGPT Pro ($200/month) and Enterprise plans.

Free users gain access to GPT-5.4 through query auto-rotation, according to VentureBeat.

⚙️ 3 main changes

1. No longer need to choose between GPT-5.x and Codex

Before the GPT-5.4 release, the standard architecture for an agentic pipeline with mixed tasks looked like this: GPT-5.2 for planning and reasoning steps, GPT-5.3-Codex for code generation and execution. Each switch between models meant a separate API call, separate context management, different behavior in edge cases, and different fine-tuning parameters. For long agent trajectories, this accumulated into significant overhead in latency and code complexity.

GPT-5.4 eliminates this need. According to OpenAI, this is the first mainline reasoning model that incorporates the frontier coding capabilities of GPT-5.3-Codex into unified weights — a result of merging training stacks, not routing logic.

In practice, this means:

  • SWE-Bench Pro: 57.7% vs 56.8% for GPT-5.3-Codex — GPT-5.4 reproduces the coding performance of the Codex model with lower latency and additional reasoning capabilities, according to gaga.art
  • GDPval: 83.0% — a new OpenAI benchmark: 1,320 tasks covering 44 professions from 9 industries, authored by domain specialists with 14+ years of experience. GPT-5.4 surpasses GPT-5.2 (70.9%) and matches or outperforms a human domain specialist in 83% of comparisons, according to The Decoder
  • Practically for developers: if your pipeline used two endpoints, it is now enough to change the model ID to gpt-5.4 — in most cases this is a swap with no logic changes. GPT-5.4 becomes the default model in Codex, replacing GPT-5.3-Codex automatically
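The consolidation can be illustrated with a minimal sketch: a hypothetical two-model router of the pre-5.4 kind collapses into a single model ID (the model IDs come from this release; the routing helper itself is illustrative, not OpenAI code):

```python
# Illustrative only: a two-model router that GPT-5.4 makes unnecessary.
# Model IDs are from the release notes; the helpers are hypothetical.

def pick_model_before(step: str) -> str:
    """Pre-5.4 pattern: route planning and coding to different endpoints."""
    return "gpt-5.3-codex" if step == "coding" else "gpt-5.2"

def pick_model_after(step: str) -> str:
    """Post-5.4 pattern: one consolidated model for every step."""
    return "gpt-5.4"

# Every agent step now hits the same endpoint, so context handling and
# tuning parameters no longer diverge between calls.
assert {pick_model_after(s) for s in ("planning", "coding")} == {"gpt-5.4"}
```

In practice this means deleting the routing branch entirely, not just renaming one of its arms.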

Separately, it's worth noting a new feature in ChatGPT Thinking: the model now shows a reasoning plan before execution and lets you correct its direction mid-response — no need to restart the query from scratch if the model goes the wrong way. Available on chatgpt.com and Android; iOS support is coming soon, according to DataCamp.

2. Native computer use: mechanics and real figures

GPT-5.4 is OpenAI's first general model with built-in computer use. It's important to understand the architecture: it's not a single mechanism, but two parallel approaches that the model combines depending on the task:

  • Code-based automation — the model writes code using Playwright or similar libraries to control browsers and desktop applications. Suitable for deterministic, repeatable workflows: forms, navigation, data extraction
  • Screenshot-based control — the model receives a screenshot of the current screen state and issues mouse/keyboard commands. Suitable for tasks where the UI structure is unpredictable or changes between sessions
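The screenshot-based path boils down to an observe-act loop. The sketch below shows that loop shape only; the screenshot source, the action schema, and the model call are all stubbed assumptions, not the actual API:

```python
# Minimal observe-act loop for screenshot-based control.
# Everything passed in (screenshot source, model, executor) is a stub
# assumption for illustration; real API shapes may differ.

def run_ui_agent(take_screenshot, ask_model, execute_action, max_steps=10):
    """Loop: screenshot -> model proposes a mouse/keyboard action -> execute."""
    for _ in range(max_steps):
        action = ask_model(take_screenshot())
        if action["type"] == "done":
            return action
        execute_action(action)  # e.g. click/type via an OS automation layer
    raise TimeoutError("agent did not finish within max_steps")

# Stubbed demo: the "model" clicks once, then reports it is done.
script = iter([{"type": "click", "x": 100, "y": 200}, {"type": "done"}])
performed = []
result = run_ui_agent(
    take_screenshot=lambda: b"fake-png-bytes",
    ask_model=lambda png: next(script),
    execute_action=performed.append,
)
```

The code-based path replaces the inner loop with generated Playwright scripts, trading flexibility for determinism.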

Behavior is steered via developer messages and custom confirmation policies: developers can configure which actions require user confirmation and which are executed autonomously — an important mechanism for production deployments with varying levels of risk, according to OpenAI.
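A confirmation policy of the kind described above might look like the sketch below. The action names and risk tiers are hypothetical; the source only states that such policies are configurable:

```python
# Sketch of a custom confirmation policy: autonomous vs confirm-first.
# Action names and tiers are made up for illustration.

AUTONOMOUS = {"navigate", "read_page", "screenshot"}
NEEDS_CONFIRMATION = {"submit_form", "delete_file", "send_payment"}

def requires_confirmation(action: str) -> bool:
    """Decide whether an agent action must be confirmed by the user."""
    if action in NEEDS_CONFIRMATION:
        return True
    if action in AUTONOMOUS:
        return False
    return True  # unknown actions default to confirm-first (fail safe)
```

Defaulting unknown actions to confirm-first is the conservative choice for production deployments.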

Key benchmarks:

  • OSWorld-Verified: 75.0% — above the average human score (72.4%). For comparison: GPT-5.2 on the same benchmark scored only 47.3%, meaning an increase of more than 1.5×, according to VentureBeat
  • BrowseComp: 82.7% (base) / 89.3% (Pro) — measures the agent's ability to find hard-to-reach information on the internet through persistent browsing. GPT-5.2 scored 65.8% — an increase of roughly 17 percentage points

To demonstrate these capabilities, OpenAI released an experimental Codex skill, Playwright (Interactive): the model can visually debug web and Electron applications in real time — and even test an application while it is being built. According to DataCamp, this combination of code generation and a visual feedback loop points to a direction where AI agents will be able to iterate on frontends with minimal human involvement.

3. Tool Search: from static manifest to on-demand discovery

This is perhaps the most practically important change for developers building systems with a large number of tools. Previously, passing tool definitions in the system prompt was inefficient: every schema was loaded into context on each call, regardless of whether it was needed at a given step.

GPT-5.4 solves this with a new architecture: the model receives only a lightweight list of available tools and loads full definitions on demand, only when it decides to use a specific tool. According to The Decoder, large tool ecosystems previously added tens of thousands of unnecessary tokens to each request.
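The idea can be mimicked client-side: keep only names in the always-present manifest and resolve full JSON schemas lazily. This registry is an illustrative sketch, not the actual Responses API surface:

```python
# Sketch of on-demand tool discovery: every request carries only a cheap
# name list; full schemas are resolved only when a tool is invoked.
# Tool names and schemas are invented for illustration.

FULL_SCHEMAS = {
    "search_orders": {
        "type": "function", "name": "search_orders",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}}},
    },
    "refund_order": {
        "type": "function", "name": "refund_order",
        "parameters": {"type": "object",
                       "properties": {"order_id": {"type": "string"}}},
    },
}

def lightweight_manifest() -> list[str]:
    """What goes into every request: names only, a few tokens each."""
    return sorted(FULL_SCHEMAS)

def resolve_tool(name: str) -> dict:
    """Loaded lazily, only once the model decides to call this tool."""
    return FULL_SCHEMAS[name]
```

The stable, sorted name list is also what makes the cache-hit improvement mentioned below plausible: an unchanged prefix caches well.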

Practical effect of Tool Search:

  • −47% tokens on agentic tasks with a large number of tools, according to VentureBeat
  • Scalability: tool search makes it possible to work with ecosystems containing tens of thousands of tools — for example, corporate MCP servers or large API catalogs, according to Apidog
  • Cache hit rate: since the lightweight tool list is more stable between requests than the full manifest, caching works more efficiently — further reducing inference cost
  • Limitations: available exclusively via the Responses API, not via Chat Completions

Separately, it's worth noting the improvement in accuracy: on a set of de-identified prompts where users previously noted factual errors, GPT-5.4 shows 33% fewer erroneous statements and 18% fewer responses containing any errors compared to GPT-5.2, according to OpenAI. For production systems where accuracy is critical (legal analysis, financial calculations), this is a measurable improvement in reliability.


📊 Quick comparison with competitors

Current as of March 2026. Sources: Digital Applied, OpenAI, gaga.art.

| Parameter | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| Context window | 1M API / 272K standard (beyond 272K: 2× pricing) | 200K (1M beta) | 2M |
| SWE-bench Verified | 80.0% | 80.8% | ~74% |
| OSWorld (computer use) | 75.0% (human: 72.4%) | 72.7% | N/A |
| BrowseComp (web agents) | 82.7% / Pro: 89.3% | N/A | N/A |
| Input / Output, $ per 1M tokens | $2.50 / $15 (base); $30 / $180 (Pro) | $15 / $75 | $2 / $12 |
| Native computer use | ✅ built-in | Limited | N/A |
| CoT between turns | ✅ (Responses API) | N/A | N/A |
| Tool Search | ✅ (−47% tokens) | N/A | N/A |

💡 Full comparison with 11 parameters, inference cost analysis, and practical hierarchy model → GPT-5.4: Architectural breakdown for developers
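Using the table's base rates ($2.50 input / $15 output per 1M tokens) and the 2× surcharge beyond 272K input tokens, a rough per-request cost estimate looks like this. The exact billing boundary semantics (only the excess billed at 2×) are an assumption; check OpenAI's pricing page for the actual rules:

```python
# Rough per-request cost estimate from the comparison table's base rates.
# Assumption: only input tokens beyond 272K are billed at 2x.

BASE_IN, BASE_OUT = 2.50, 15.00   # $ per 1M tokens (gpt-5.4 base tier)
THRESHOLD = 272_000               # standard context boundary

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    cheap = min(input_tokens, THRESHOLD)
    surcharged = max(input_tokens - THRESHOLD, 0)
    cost = (cheap * BASE_IN + surcharged * 2 * BASE_IN
            + output_tokens * BASE_OUT) / 1_000_000
    return round(cost, 4)

# A 500K-token input bills 272K at $2.50/M and the remaining 228K at $5.00/M.
```

For example, estimate_cost(500_000, 10_000) comes out to $1.97 under these assumptions, versus $1.40 if the whole input were billed at the base rate — long-context requests are where the surcharge bites.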


✅ What to do right now

If you have an agentic workflow or coding pipeline

  • Swap the model ID to gpt-5.4 and run your evals. If you previously used GPT-5.3-Codex, GPT-5.4 reproduces its SWE-Bench Pro score (57.7% vs 56.8%) with lower latency. If you used GPT-5.2, expect improvements on coding tasks without reasoning degradation
  • Consider migrating to the Responses API if you use Chat Completions with a large number of tools. The Responses API unlocks Tool Search (−47% tokens), CoT between turns, and native compaction — three features unavailable via Chat Completions
  • Enable /fast mode in Codex for tasks where speed is critical: the same GPT-5.4, but up to 1.5× faster token velocity, according to VentureBeat
  • For the 1M context window in Codex, configure model_context_window and model_auto_compact_token_limit in Codex settings. Important: requests beyond the standard 272K are priced at 2× the normal rate, according to gaga.art
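For the Codex step above, the two settings live in Codex's configuration file. The key names come from this release; the values and file location below are assumptions, so verify them against the Codex documentation:

```toml
# ~/.codex/config.toml — illustrative values only.
# Key names are from the release notes; values are assumptions.
model = "gpt-5.4"
model_context_window = 1000000           # opt in to the 1M window (2x price beyond 272K)
model_auto_compact_token_limit = 800000  # compact history before hitting the limit
```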

If you are building computer use agents

  • Use the updated computer tool in the API. OpenAI's documentation recommends the original and high image detail settings — they significantly improve localization and click accuracy
  • Configure custom confirmation policies for actions with different risk levels: define which operations run autonomously and which require user confirmation before execution
  • Try Playwright (Interactive) in Codex for visually debugging web and Electron applications — an experimental skill, but already functional for real frontend tasks
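Assembling the request for the computer tool might look like the sketch below. The field names mirror OpenAI's earlier computer-use tool and may differ for GPT-5.4 — treat the tool type and parameters as assumptions, not a reference:

```python
# Sketch: building a computer-use tool payload for a Responses API call.
# The tool type name and fields are assumptions based on OpenAI's earlier
# computer-use tool; verify against the current API reference.

def computer_tool(width: int = 1440, height: int = 900,
                  environment: str = "browser") -> dict:
    return {
        "type": "computer_use_preview",  # assumed tool type name
        "display_width": width,
        "display_height": height,
        "environment": environment,
    }

request = {
    "model": "gpt-5.4",
    "tools": [computer_tool()],
    # Per the recommendation above: send screenshots at original size with
    # high detail to improve localization and click accuracy.
    "input": [{"role": "user", "content": "Open the settings page"}],
}
# The dict would then be passed to client.responses.create(**request).
```

Keeping payload construction in a small helper makes it easy to vary display size per environment without touching call sites.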

If you have simple high-throughput tasks

  • Do not rush to migrate — gpt-5-mini or gpt-5.3-chat-latest remain the better cost/latency choice for classification, summarization, and template filling. GPT-5.4 will be overkill and more expensive for these scenarios
  • GPT-5.2 in the API has no announced deprecation date — so legacy systems can be left untouched for now

Key Dates

  • June 5, 2026 — GPT-5.2 Thinking is disabled in ChatGPT (it moves to Legacy Models now, with full removal three months later). If you use it in a product via the ChatGPT interface, migrate before this date
  • August 26, 2026 — Assistants API sunset. If you are still on the Assistants API, migrating to the Responses API is a priority task right now

🔬 Want to understand how it works?

This article is a brief overview of what was released. If you are interested in the engineering mechanics — how the reasoning pipeline changed from GPT-5.0 to 5.4, why a consolidated model is an architectural compromise, and how reasoning.effort affects cost and latency — read the detailed breakdown:

👉 GPT-5.4 in 2026: from specialized models to consolidated architecture — what changed and why

14 min read · 5 sections · benchmarks · tables · FAQ

Sources:

  • OpenAI — Introducing GPT-5.4
  • TechCrunch — OpenAI launches GPT-5.4
  • VentureBeat — GPT-5.4 native computer use
  • Digital Applied — GPT-5.4 vs Claude vs Gemini
  • OpenAI Academy — GPT-5.4 Thinking and Pro
