GPT-5.4 Guide: Features, Benchmarks & What's New (March 2026)
By LearnIA
What Is GPT-5.4?
GPT-5.4 is OpenAI's latest flagship model, released on March 5, 2026. It is the first general-purpose model with native computer-use capabilities, meaning it can see your screen, move the mouse, type on the keyboard, and execute multi-step workflows — all without third-party plugins.
The model is available in three surfaces: ChatGPT (as GPT-5.4 Thinking), the API (model ID gpt-5.4), and Codex. A higher-capacity variant, GPT-5.4 Pro, targets the most demanding professional tasks. Codex users get access to up to 1 million context tokens, the largest window OpenAI has shipped to date.
GPT-5.2 Thinking will stay accessible under Legacy Models in ChatGPT until June 5, 2026, giving teams a three-month migration window.
Key Improvements Over GPT-5.3-Codex
Knowledge work
GPT-5.4 scores 83.0% on GDPval, up from 70.9% for both GPT-5.3-Codex and GPT-5.2, a 12.1-percentage-point gain (roughly 17% in relative terms). It also produces 33% fewer false claims than GPT-5.2 and hits 87.3% on IB financial-modeling tasks.
Computer use
The standout feature is native computer use. GPT-5.4 achieves 75.0% on OSWorld, surpassing the human baseline of 72.4%. The model sees screenshots at up to 10.24 million pixels (original image detail) and controls the keyboard and mouse directly.
Tool use & browsing
A new tool search feature reduces token consumption by 47% when working across 36 MCP servers. BrowseComp jumps to 82.7% (up from 77.3%), and MCP Atlas reaches 67.2%. The Toolathlon benchmark rises from 51.9% to 54.6%.
Coding
GPT-5.4 matches GPT-5.3-Codex on SWE-Bench Pro (57.7% vs 56.8%) and adds a /fast mode with 1.5× token velocity, plus a new Playwright Interactive skill for browser-based testing.
Steerability
GPT-5.4 introduces mid-response adjustment — you can steer the model's behavior while it is still generating — and an automatic preamble for complex queries that outlines the reasoning plan before diving in.
Benchmark Comparison Table
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 | Claude Opus 4.6* |
|---|---|---|---|---|
| GDPval (knowledge work) | 83.0% | 70.9% | 70.9% | — |
| SWE-Bench Pro (coding) | 57.7% | 56.8% | 55.6% | — |
| OSWorld (computer use) | 75.0% | 74.0% | 47.3% | ~65% |
| BrowseComp (web search) | 82.7% | 77.3% | 65.8% | — |
| Toolathlon (tool use) | 54.6% | 51.9% | 46.3% | — |
| MMMU Pro (vision) | 81.2% | — | 79.5% | — |
| ARC-AGI-2 (abstract reasoning) | 73.3% | — | 52.9% | — |
| GPQA Diamond (science) | 92.8% | 92.6% | 92.4% | — |
| Humanity's Last Exam (w/ tools) | 52.1% | — | 45.5% | — |
| FrontierMath Tier 4 | 27.1% | — | 18.8% | — |
* Claude Opus 4.6 figures are approximate third-party estimates where available.
GPT-5.4 Pro pushes the ceiling further: BrowseComp 89.3%, ARC-AGI-2 83.3%, Humanity's Last Exam 58.7%, FrontierMath Tier 4 38.0%.
Pricing & Availability
| Model | Input | Cached Input | Output |
|---|---|---|---|
| gpt-5.4 | $2.50 / M tokens | $0.25 / M tokens | $15.00 / M tokens |
| gpt-5.4-pro | $30.00 / M tokens | — | $180.00 / M tokens |
| gpt-5.2 (reference) | $1.75 / M tokens | $0.175 / M tokens | $14.00 / M tokens |
GPT-5.4 is available to ChatGPT Plus, Team, and Pro subscribers. API access is open to all tiers. The cached-input price ($0.25/M) makes long-context and agentic workloads remarkably affordable — one-tenth the full input price.
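To see what that cached-input discount means in practice, here is a small cost sketch. The prices come from the table above; the token counts and the 80% cache-hit rate are made-up assumptions for illustration:

```python
# Prices per million tokens, from the GPT-5.4 pricing table above.
INPUT_PRICE = 2.50
CACHED_INPUT_PRICE = 0.25
OUTPUT_PRICE = 15.00

def daily_cost(input_mtok: float, output_mtok: float, cache_hit_rate: float) -> float:
    """Cost in dollars, splitting input tokens into cached and uncached."""
    cached = input_mtok * cache_hit_rate
    uncached = input_mtok - cached
    return (uncached * INPUT_PRICE
            + cached * CACHED_INPUT_PRICE
            + output_mtok * OUTPUT_PRICE)

# Hypothetical agentic workload: 2M input tokens, 300K output tokens per day.
# No caching vs. an 80% cache-hit rate (typical for agents that re-send long
# system prompts and tool schemas on every step).
no_cache = daily_cost(2.0, 0.3, 0.0)    # $9.50
with_cache = daily_cost(2.0, 0.3, 0.8)  # $5.90
```

Under these assumptions, caching cuts the daily bill by nearly 40% — and the savings grow with the share of repeated prefix tokens.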
Computer Use: A Game-Changer
Computer use in GPT-5.4 is not a plugin — it is a native capability baked into the model. It processes raw screenshots, identifies UI elements, and emits keyboard and mouse actions in a single inference pass.
On the OSWorld benchmark — which tests real desktop tasks like filling spreadsheets, navigating file managers, and using web apps — GPT-5.4 reaches 75.0%, above the human baseline of 72.4%. This is a massive leap from GPT-5.2's 47.3%.
The original image-detail level supports screenshots up to 10.24 million pixels, giving the model enough resolution to read small UI text and interact with dense interfaces. For developers, this opens up a new class of automation: testing desktop apps, filling government forms, migrating data across legacy systems — tasks that previously required brittle RPA scripts.
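If your captures exceed that budget (a dual-4K desktop is about 16.6 MP), you can downscale proportionally before sending. A minimal sketch in pure arithmetic — the 10.24 MP limit is taken from the text above; everything else is an assumption:

```python
import math

MAX_PIXELS = 10_240_000  # 10.24 million pixels, per the "original" detail level

def fit_to_budget(width: int, height: int) -> tuple[int, int]:
    """Return dimensions scaled down (preserving aspect ratio) so that
    width * height <= MAX_PIXELS. Images already under budget pass through."""
    pixels = width * height
    if pixels <= MAX_PIXELS:
        return width, height
    scale = math.sqrt(MAX_PIXELS / pixels)
    return int(width * scale), int(height * scale)

# A single 4K frame (~8.3 MP) fits as-is; a dual-4K capture gets scaled down.
print(fit_to_budget(3840, 2160))  # (3840, 2160)
print(fit_to_budget(7680, 2160))
```

Scaling by the square root of the pixel ratio keeps the aspect ratio intact while landing just under the budget.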
Tool Search & Efficiency
As agent architectures grow, so does the number of tools a model has to consider. OpenAI's new tool search feature allows GPT-5.4 to query a registry of tools instead of loading all definitions into the prompt.
The result: a 47% reduction in tokens when operating with 36 MCP (Model Context Protocol) servers. Fewer tokens means faster responses and lower costs, especially in production pipelines that chain multiple tools.
The MCP Atlas benchmark — which measures a model's ability to discover, select, and call the right tool from a large registry — improves from roughly 60% to 67.2%. Partners like Zapier confirm the gains: "GPT-5.4 xhigh is the new state of the art for multi-step tool use."
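The mechanism behind tool search can be approximated locally: instead of serializing every tool schema into the prompt, keep schemas in a registry and inject only those relevant to the current task. The sketch below uses naive keyword matching — the registry, tool names, and token counts are all invented for the example; OpenAI's actual implementation is not public:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str
    schema_tokens: int  # rough token cost of the tool's full JSON schema

# A toy registry standing in for tools exposed by many MCP servers.
REGISTRY = [
    Tool("create_invoice", "create and send a customer invoice", 420),
    Tool("search_crm", "look up a contact in the crm", 380),
    Tool("send_email", "send an email to a recipient", 350),
    Tool("resize_image", "resize or crop an image file", 410),
]

def select_tools(task: str, top_k: int = 2) -> list[Tool]:
    """Score each tool by word overlap with the task and keep the best few,
    so only their schemas (not the whole registry) enter the prompt."""
    words = set(task.lower().split())
    scored = sorted(
        REGISTRY,
        key=lambda t: len(words & set(t.description.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

task = "send an invoice email to the customer"
chosen = select_tools(task)
saved = sum(t.schema_tokens for t in REGISTRY) - sum(t.schema_tokens for t in chosen)
print([t.name for t in chosen], f"tokens saved: {saved}")
```

Even in this toy version, half the registry stays out of the prompt; with dozens of MCP servers and hundreds of tools, the savings compound quickly.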
Prompting Tips for GPT-5.4
- Use tool search for large toolsets. If you manage more than 10 tools, define them in an MCP registry and let GPT-5.4 search rather than reading all schemas up front. This cuts token spend significantly.
- Leverage mid-response steering. GPT-5.4 supports real-time adjustments. If the model starts heading in the wrong direction, you can course-correct without re-prompting from scratch.
- Set image_detail: original for computer-use tasks. High-resolution screenshots let the model read fine UI elements. Lower detail levels save tokens but may miss small buttons or text.
- Use /fast mode for throughput-sensitive coding tasks. The 1.5× token-velocity mode is ideal for batch refactoring or CI/CD-integrated code reviews where latency matters more than reasoning depth.
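Putting the model ID and the image_detail setting together, a computer-use request body might be assembled as below. To be clear: the field names here follow this article's terminology and are illustrative only, not a confirmed API schema — check the official API reference before relying on them:

```python
import base64
import json

def computer_use_request(screenshot_png: bytes, instruction: str) -> dict:
    """Build a hypothetical request body for one computer-use step.
    Field names (model, image_detail, input types) are assumptions based
    on this article, not a documented schema."""
    return {
        "model": "gpt-5.4",
        "input": [
            {"type": "input_text", "text": instruction},
            {
                "type": "input_image",
                "image_detail": "original",  # full-resolution, up to 10.24 MP
                "image_data": base64.b64encode(screenshot_png).decode("ascii"),
            },
        ],
    }

# Stub bytes stand in for a real PNG capture.
body = computer_use_request(b"\x89PNG-stub", "Click the Save button")
print(json.dumps(body)[:80])
```

The point of the sketch is the shape: the instruction and the screenshot travel in one request, with the detail level chosen per task.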
GPT-5.4 vs GPT-5.3-Codex vs Claude Opus 4.6
GPT-5.4 vs GPT-5.3-Codex: GPT-5.4 is a clear upgrade for knowledge work (+12 points on GDPval), browsing (+5 on BrowseComp), and tool use. For pure coding, the gap is narrower (57.7% vs 56.8% on SWE-Bench Pro), but the addition of computer use and tool search makes GPT-5.4 a more versatile agent backbone.
GPT-5.4 vs Claude Opus 4.6: The two models occupy different niches. GPT-5.4 dominates on computer use (75% vs ~65% estimated for Claude on OSWorld) and agentic browsing and tool use (BrowseComp 82.7%, MCP Atlas 67.2%). Claude Opus 4.6 holds an edge on SWE-Bench Verified (81.4%) and long-form extended thinking tasks. Cursor's internal benchmarks rank GPT-5.4 as the current leader overall, while coding-heavy teams may still prefer Claude for deep refactoring. Harvey reports 91% on BigLaw Bench with GPT-5.4, positioning it as the top choice for legal AI.
In practice, many teams will route different tasks to different models — GPT-5.4 for browsing, tool use, and computer-use agents; Claude Opus 4.6 for complex code and nuanced reasoning.
Should You Upgrade?
| If you… | Recommendation |
|---|---|
| Build agents that use tools or browse the web | Upgrade immediately — tool search and BrowseComp gains are substantial. |
| Need desktop/browser automation | Upgrade — native computer use is unmatched at 75% OSWorld. |
| Run professional knowledge work (finance, law, consulting) | Upgrade — 83% GDPval and 33% fewer hallucinations are a step change. |
| Do mostly coding with Codex | Marginal gain for pure coding. Evaluate /fast mode and Playwright Interactive. |
| Are budget-constrained | The input price rose from $1.75 to $2.50/M tokens, but cached input at $0.25/M and 47% fewer tool tokens can offset this. Run the numbers for your workload. |
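For the budget question in the last row, the offset is easy to model: at list prices, gpt-5.4 input is never cheaper per token than gpt-5.2, but a 47% cut in tool-definition tokens can flip the total. A back-of-the-envelope sketch — prices come from the tables above, while the workload split is an assumption:

```python
# Prices per million input tokens (from the pricing table above).
GPT54_IN = 2.50
GPT52_IN = 1.75

def input_cost(price: float, prompt_mtok: float, tool_mtok: float,
               tool_reduction: float = 0.0) -> float:
    """Input-side daily cost when a fraction of tool tokens is eliminated."""
    return price * (prompt_mtok + tool_mtok * (1 - tool_reduction))

# A tool-heavy agent: 1M prompt tokens plus 4M tokens of tool schemas per day.
old = input_cost(GPT52_IN, 1.0, 4.0)        # gpt-5.2, all schemas inline
new = input_cost(GPT54_IN, 1.0, 4.0, 0.47)  # gpt-5.4 with tool search
print(f"gpt-5.2: ${old:.2f}/day  gpt-5.4: ${new:.2f}/day")
```

Under these assumptions the tool-heavy agent comes out cheaper on gpt-5.4 despite the higher per-token price; a chat workload with few tool tokens would not, so the break-even really does depend on your mix.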
Bottom Line
GPT-5.4 is the most capable general-purpose model OpenAI has released. Native computer use, tool search, and a 12-point jump on professional knowledge work make it an immediate upgrade for agent builders, enterprise automation, and anyone who chains tools at scale. The coding gap over GPT-5.3-Codex is modest, but every other dimension shows clear, measurable progress. With GPT-5.2 retiring on June 5, now is the time to migrate.
FAQ
Is GPT-5.4 out?
Yes. GPT-5.4 was released on March 5, 2026. It's available in ChatGPT (as GPT-5.4 Thinking), the API (model ID: gpt-5.4), and Codex.
What is new in GPT-5.4?
GPT-5.4 adds native computer-use capabilities, tool search for 47% fewer tokens, 83% on GDPval (professional tasks), 75% on OSWorld (surpassing human 72.4%), and 1M context in Codex.
How much does GPT-5.4 cost?
API pricing: $2.50/M input tokens ($0.25 cached), $15/M output tokens. GPT-5.4 Pro: $30/M input, $180/M output. Available to ChatGPT Plus, Team, and Pro users.
GPT-5.4 vs Claude Opus 4.6: which is better?
GPT-5.4 leads in computer use (75% OSWorld vs Claude's ~65%) and tool use (BrowseComp 82.7%). Claude Opus 4.6 leads on SWE-Bench Verified (81.4%) and extended thinking. Each excels in different areas.
When will GPT-5.2 be retired?
GPT-5.2 Thinking will remain available until June 5, 2026 under Legacy Models in ChatGPT, then be retired.