On March 5, 2026, OpenAI released GPT-5.4 — simultaneously in ChatGPT, API, and Codex.
This is not just another incremental update: for the first time, the model combines the GPT-5.3-Codex coding pipeline
with general reasoning, gains native computer use, and offers a context window of up to 1M tokens.
In short: if you are building agentic workflows or coding tools —
this is a release worth paying attention to today.
⚡ Key Highlights in 30 Seconds
- ✅ Release Date: March 5, 2026, rollout in ChatGPT, API, and Codex simultaneously
- ✅ Consolidated model: GPT-5.3-Codex and GPT-5.2 are merged into a single model — no longer need to switch between endpoints
- ✅ Native computer use: OpenAI's first mainline model that controls a computer autonomously via Playwright and mouse/keyboard commands
- ✅ 1M tokens of context in API (with double pricing beyond 272K)
- ✅ −47% tokens on some agentic tasks compared to predecessors
- ✅ −33% erroneous statements compared to GPT-5.2
📚 Table of Contents
- 📌 What was released and when
- 📌 3 main changes for developers
- 📌 Quick comparison with competitors
- 📌 What to do right now
- 📌 Want to go deeper?
🗓️ What was released and when
OpenAI officially announced GPT-5.4
on March 5, 2026. The model is immediately available across three surfaces:
- ChatGPT — as GPT-5.4 Thinking for Plus, Team, and Pro users (replaces GPT-5.2 Thinking). GPT-5.2 Thinking remains in Legacy Models until June 5, 2026
- API — the `gpt-5.4` and `gpt-5.4-pro` endpoints are available now
- Codex — becomes the default model, replacing GPT-5.3-Codex
GPT-5.4 Pro is available via API and for ChatGPT Pro ($200/month) and Enterprise plans.
Free users gain access to GPT-5.4 through automatic query routing, according to OpenAI.
⚙️ 3 main changes
1. No longer need to choose between GPT-5.x and Codex
Before the GPT-5.4 release, the standard architecture for an agentic pipeline with mixed tasks
looked like this: GPT-5.2 for planning and reasoning steps, GPT-5.3-Codex for generation
and code execution. Each switch between models meant a separate API call, separate context management,
different behavior in edge cases, and different fine-tuning parameters.
For long agent trajectories, this accumulated into significant overhead in terms of latency and
code complexity.
GPT-5.4 eliminates this need. According to OpenAI, this is the first mainline reasoning model that incorporates
the frontier coding capabilities of GPT-5.3-Codex into unified weights — a result of merging training stacks, not routing logic.
In practice, this means:
- SWE-Bench Pro: 57.7% vs. 56.8% for GPT-5.3-Codex — GPT-5.4 reproduces the coding performance of the Codex model with lower latency and additional reasoning capabilities, according to gaga.art
- GDPval: 83.0% — a new OpenAI benchmark covering 44 professions across 9 industries, with 1,320 tasks written by domain specialists with 14+ years of experience. GPT-5.4 surpasses GPT-5.2 (70.9%) and matches or outperforms a human domain specialist in 83% of comparisons, according to OpenAI
In practical terms for developers: if your pipeline used two endpoints, it is now enough to change the model ID to
`gpt-5.4` — in most cases this is a drop-in swap with no logic changes. GPT-5.4 becomes the default model in Codex,
replacing GPT-5.3-Codex automatically.
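The collapse of the two-endpoint setup amounts to a trivial routing change. The helper below is a hypothetical sketch, not part of any SDK; only the model IDs come from the release notes:

```python
# Before GPT-5.4: mixed agentic pipelines routed each step to one of
# two endpoints (hypothetical routing helper; model IDs per the release).
def pick_model_legacy(step: str) -> str:
    return "gpt-5.3-codex" if step == "coding" else "gpt-5.2"

# After GPT-5.4: one consolidated model handles both step types,
# so the router degenerates to a constant.
def pick_model(step: str) -> str:
    return "gpt-5.4"

print(pick_model_legacy("planning"), "->", pick_model("planning"))
```

Any surrounding context management and retry logic stays as it was; only the returned model ID changes.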
Separately, a new feature in ChatGPT Thinking is worth noting: the model now shows a reasoning plan
before execution and lets you correct its direction mid-response —
no need to restart the query from scratch if the model heads the wrong way. Available
on chatgpt.com and Android; iOS support is coming soon, according to OpenAI.
2. Native computer use: mechanics and real figures
GPT-5.4 is OpenAI's first general model with built-in computer use. It's important to understand
the architecture: it's not a single mechanism, but two parallel approaches that the model combines
depending on the task:
- Code-based automation — the model writes code using Playwright or similar libraries to control browsers and desktop applications. Suitable for deterministic, repeatable workflows: forms, navigation, data extraction
- Screenshot-based control — the model receives a screenshot of the current screen state and issues mouse/keyboard commands. Suitable for tasks where the UI structure is unpredictable or changes between sessions
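The split between the two modes can be summed up as a dispatch decision. The function below is purely illustrative (its name and return values are not an OpenAI API):

```python
def choose_automation_mode(ui_is_deterministic: bool) -> str:
    """Pick a computer-use strategy for a task (illustrative sketch).

    Stable, repeatable UIs suit generated Playwright-style scripts;
    unpredictable UIs fall back to screenshot + mouse/keyboard control.
    """
    return "code_based" if ui_is_deterministic else "screenshot_based"

# A fixed-layout form-filling job vs. a UI that changes between sessions:
print(choose_automation_mode(True), choose_automation_mode(False))
```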
Behavior is steered via developer messages and custom confirmation policies:
developers can configure which actions require user confirmation and which
run autonomously — an important mechanism for production deployments with varying
risk levels, according to OpenAI.
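A confirmation policy of the kind described might look like the following sketch. The action names and the policy structure are assumptions for illustration, not OpenAI's actual schema:

```python
# Hypothetical allow/confirm policy for computer-use actions.
AUTONOMOUS = {"scroll", "click", "type", "read_page"}
NEEDS_CONFIRMATION = {"submit_payment", "delete_file", "send_email"}

def policy(action: str) -> str:
    """Return 'run' for safe actions, 'ask_user' for risky or unknown ones."""
    if action in AUTONOMOUS:
        return "run"
    # Risky actions, and anything unrecognized, default to the safe path.
    return "ask_user"

print(policy("click"), policy("submit_payment"), policy("format_disk"))
```

The important design point from the release is that this boundary is developer-configurable per deployment, rather than fixed by the model.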
Key benchmarks:
- OSWorld-Verified: 75.0% — above the average human score (72.4%). For comparison, GPT-5.2 scored only 47.3% on the same benchmark, a gain of more than 1.5×, according to OpenAI
- BrowseComp: 82.7% (base) / 89.3% (Pro) — measures the agent's ability to find hard-to-reach information on the internet through persistent browsing. GPT-5.2 scored 65.8%, an improvement of 16.9 percentage points
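The deltas quoted above follow directly from the raw scores:

```python
# OSWorld-Verified: GPT-5.4 at 75.0% vs. GPT-5.2 at 47.3% (relative gain)
osworld_gain = 75.0 / 47.3
# BrowseComp (base): 82.7% vs. 65.8% (absolute percentage-point delta)
browsecomp_delta = 82.7 - 65.8
print(round(osworld_gain, 2), round(browsecomp_delta, 1))  # 1.59 16.9
```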
To demonstrate these capabilities, OpenAI released an experimental Codex skill,
Playwright (Interactive): the model can visually debug web and Electron
applications in real time — and even test an application while it is being built.
According to OpenAI, this combination of code generation and a visual feedback loop points to a direction where AI agents
will be able to iterate on frontend work with minimal human involvement.
3. Tool Search: from static manifest to on-demand discovery
This is perhaps the most practically important change for developers building systems
with a large number of tools. Previously, passing tool definitions into the system prompt
was inefficient: all schemas were loaded into context with each call,
regardless of whether they were needed at a specific step.
GPT-5.4 solves this through a new architecture: the model receives only a lightweight
list of available tools and loads full definitions on demand,
only when it decides to use a specific tool. According to OpenAI,
large tool ecosystems previously added tens of thousands of unnecessary tokens
to each request.
Practical effect of Tool Search:
- −47% tokens on agentic tasks with a large number of tools, according to OpenAI
- Scalability: tool search enables ecosystems containing tens of thousands of tools — for example, corporate MCP servers or large API catalogs
- Cache hit rate: since the lightweight tool list is more stable between requests than the full manifest, caching works more efficiently, further reducing inference cost
- Limitations: available exclusively via the Responses API, not via Chat Completions
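The token-saving idea is simple to illustrate. The snippet below is a schematic of on-demand definition loading, not the actual Responses API surface; the tool names and schemas are made up:

```python
import json

# Full JSON schemas: expensive to ship with every request.
FULL_SCHEMAS = {
    "get_weather": {
        "type": "function",
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
    "search_docs": {
        "type": "function",
        "name": "search_docs",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
        },
    },
}

def lightweight_list(schemas: dict) -> list:
    """Names only: all that needs to enter every request's context."""
    return sorted(schemas)

def load_definition(name: str, schemas: dict) -> dict:
    """Full schema, fetched only once the model picks a tool."""
    return schemas[name]

full_size = len(json.dumps(FULL_SCHEMAS))
lite_size = len(json.dumps(lightweight_list(FULL_SCHEMAS)))
print(lite_size < full_size)  # the name list is far smaller than the manifest
```

The cache-hit observation in the list above follows from the same structure: the name list rarely changes between requests, while full schemas may vary per step.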
Separately, the accuracy improvement is worth noting: on a set of de-identified prompts
where users had previously reported factual errors, GPT-5.4 shows
−33% erroneous statements and −18% responses containing any
errors compared to GPT-5.2, according to OpenAI.
For production systems where accuracy is critical (legal analysis, financial calculations),
this is a measurable improvement in reliability.