GPT-5.3 Codex: OpenAI's Most Powerful Coding Agent — Guide & Benchmarks (2026)

By Learnia Team

📅 Last Updated: February 6, 2026 — Released February 5, 2026.

📚 Related: GPT-5.2 Codex Deep Dive | Claude Opus 4.6 vs GPT-5.3 Codex | AI Code Editors Comparison | ChatGPT 5.2 Prompting Guide


Table of Contents

  1. What Is GPT-5.3-Codex?
  2. GPT-5.3 vs GPT-5.3-Codex: Clarification
  3. Benchmark Performance
  4. Key Capabilities
  5. Self-Bootstrapping: A Milestone
  6. Cybersecurity: First "High" Rating
  7. Availability & Access
  8. Practical Use Cases
  9. Limitations
  10. FAQ

On February 5, 2026, OpenAI released GPT-5.3-Codex — their most powerful agentic coding model. This is not an incremental update: GPT-5.3-Codex is the first AI model rated "High" capability in cybersecurity under OpenAI's Preparedness Framework, the first model directly trained to identify software vulnerabilities, and the first model instrumental in creating itself through self-bootstrapping.

Important clarification: There is no standalone "GPT-5.3" general-purpose model. What people refer to as "GPT 5.3" is specifically GPT-5.3-Codex — a specialized model optimized for coding, debugging, and cybersecurity. OpenAI's current general-purpose model remains GPT-5.2.

With 77.3% on Terminal-Bench 2.0 (up from 64.0% for its predecessor), 64.7% on OSWorld-Verified (up from 38.2%), and a first-of-its-kind cybersecurity classification, GPT-5.3-Codex represents a step change in what AI coding agents can do. This guide covers benchmarks, capabilities, safety implications, and how to access the model.


What Is GPT-5.3-Codex?

GPT-5.3-Codex is OpenAI's purpose-built agentic coding model and the successor to GPT-5.2-Codex, which was released on December 18, 2025. It is designed to autonomously plan, write, debug, and deploy code across complex multi-file projects with minimal human intervention.

Key definition: GPT-5.3-Codex is a specialized agentic AI model released February 5, 2026, optimized for autonomous software development, cybersecurity analysis, and computer use — running 25% faster than its predecessor while consuming fewer tokens per task.

Technical Specifications

| Specification | GPT-5.3-Codex | GPT-5.2-Codex (previous) |
| --- | --- | --- |
| Release Date | February 5, 2026 | December 18, 2025 |
| Type | Specialized coding agent | Specialized coding agent |
| Training Hardware | NVIDIA GB200 NVL72 | Not disclosed |
| Speed | 25% faster than predecessor | Baseline |
| Token Efficiency | Fewer tokens than any prior model | Baseline |
| Cybersecurity Rating | High (first ever) | Medium |
| Biology Rating | High | High |
| Self-Improvement | Does NOT reach High | N/A |
| API Access | ❌ Not yet available | ✅ Available |
| ChatGPT Access | ✅ Paid plans | ✅ Paid plans |

GPT-5.3 vs GPT-5.3-Codex: Clarification

If you searched for "GPT 5.3" expecting a new general-purpose model — here's what you need to know:

The GPT-5.x model family timeline:

| Model | Release Date | Type |
| --- | --- | --- |
| GPT-5 | August 7, 2025 | General-purpose |
| GPT-5.1 | November 12, 2025 | General-purpose update |
| GPT-5.2 | December 11, 2025 | General-purpose (current) |
| GPT-5.2-Codex | December 18, 2025 | Specialized coding |
| GPT-5.2-Pro | December 2025 | Enhanced reasoning |
| GPT-5.3-Codex | February 5, 2026 | Specialized coding (latest) |

There is no "GPT-5.3" general model. So far, OpenAI has used the 5.3 version number only for the Codex line. For general-purpose tasks (writing, analysis, conversation), GPT-5.2 remains the latest model.


Benchmark Performance

GPT-5.3-Codex delivers dramatic improvements over its predecessor. All scores below are at xhigh reasoning effort:

Figure: GPT-5.3-Codex output on a GDPval knowledge work task (analyzing financial data) designed by an experienced professional. Source: OpenAI

Coding Benchmarks

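| Benchmark | GPT-5.3-Codex | GPT-5.2-Codex | GPT-5.2 | Improvement (relative, vs. GPT-5.2-Codex) |
| --- | --- | --- | --- | --- |
| Terminal-Bench 2.0 | 77.3% | 64.0% | 62.2% | +20.8% |
| SWE-Bench Pro | 56.8% | 56.4% | 55.6% | +0.7% |
| SWE-Lancer IC Diamond | 81.4% | 76.0% | 74.6% | +7.1% |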

Terminal-Bench 2.0 is the most telling benchmark: it evaluates end-to-end agentic coding, including planning, execution, debugging, and iteration. The jump from 64.0% to 77.3% is a gain of 13.3 percentage points, or 20.8% in relative terms, in just 7 weeks.
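The improvement figures for the coding benchmarks are relative gains over GPT-5.2-Codex, not percentage-point differences. As a quick sanity check, the following minimal sketch recomputes both forms from the scores in the table, along with the gap between the two release dates. The function and variable names are ours, chosen for illustration.

```python
from datetime import date


def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in %) going from old to new."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


# Scores from the coding benchmarks table: (GPT-5.2-Codex, GPT-5.3-Codex)
scores = {
    "Terminal-Bench 2.0": (64.0, 77.3),
    "SWE-Bench Pro": (56.4, 56.8),
    "SWE-Lancer IC Diamond": (76.0, 81.4),
}

for name, (old, new) in scores.items():
    points, relative = improvement(old, new)
    print(f"{name}: +{points} points, +{relative}% relative")

# Days between the GPT-5.2-Codex and GPT-5.3-Codex release dates
gap_days = (date(2026, 2, 5) - date(2025, 12, 18)).days
print(f"Release gap: {gap_days} days ({gap_days // 7} weeks)")
```

Keeping the two measures separate matters when comparing benchmarks with different baselines: a small point gain on a low baseline can look large in relative terms.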

SWE-Bench Pro efficiency note: According to OpenAI's official benchmark charts, GPT-5.3-Codex achieves 57% accuracy on SWE-Bench Pro at xhigh effort using only ~43,800 output tokens, compared to GPT-5.2-Codex reaching 56% accuracy at xhigh effort using ~91,700 output tokens — demonstrating over 2× token efficiency for equivalent performance.
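The "over 2×" claim follows directly from the two token counts quoted above. A minimal sketch of the arithmetic, using only the rounded figures from this paragraph (the helper name is ours):

```python
def tokens_per_point(output_tokens: int, accuracy_pct: float) -> float:
    """Output tokens spent per percentage point of benchmark accuracy."""
    return output_tokens / accuracy_pct


# Rounded figures quoted from OpenAI's SWE-Bench Pro chart (xhigh effort)
new_tokens, new_acc = 43_800, 57.0   # GPT-5.3-Codex
old_tokens, old_acc = 91_700, 56.0   # GPT-5.2-Codex

# How many times more output tokens the predecessor used
token_ratio = old_tokens / new_tokens

print(f"Token ratio: {token_ratio:.2f}x")
print(f"GPT-5.3-Codex: {tokens_per_point(new_tokens, new_acc):.0f} tokens per point")
print(f"GPT-5.2-Codex: {tokens_per_point(old_tokens, old_acc):.0f} tokens per point")
```

Because both inputs are approximate values read from a chart, the ratio should be treated as an estimate rather than an exact measurement.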

Figure: GPT-5.3-Codex achieves higher accuracy with significantly fewer tokens on SWE-Bench Pro. Source: OpenAI

Computer Use & General Tasks

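| Benchmark | GPT-5.3-Codex | GPT-5.2-Codex | GPT-5.2 | Improvement (relative, vs. GPT-5.2-Codex) |
| --- | --- | --- | --- | --- |
| OSWorld-Verified | 64.7% | 38.2% | 37.9% | +69.4% |
| GDPval (wins/ties) | 70.9% | - | 70.9% | Tied (with GPT-5.2) |
| Cybersecurity CTF | 77.6% | 67.4% | 67.7% | +15.1% |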

The OSWorld-Verified improvement is the largest of any benchmark reported here: from 38.2% to 64.7%, a gain of 26.5 percentage points (69.4% in relative terms). This benchmark tests the model's ability to interact with operating systems: clicking, navigating, filling forms, managing files. At 64.7%, GPT-5.3-Codex completes roughly two out of three of these tasks. Humans score ~72% on this benchmark, so the model is approaching human-level performance but still trails it by about 7 points.
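One way to put the OSWorld-Verified result in context is to ask how much of the distance to the human baseline was closed in a single release. The sketch below does that calculation. It assumes the approximate human score of 72% quoted above, so the outputs are estimates.

```python
# OSWorld-Verified scores (percent); the human baseline is approximate
human = 72.0
old = 38.2   # GPT-5.2-Codex
new = 64.7   # GPT-5.3-Codex

remaining_gap = round(human - new, 1)                        # points still short of humans
gap_closed = round((new - old) / (human - old) * 100, 1)     # share of the old gap closed
share_of_human = round(new / human * 100, 1)                 # model score relative to humans

print(f"Remaining gap to human baseline: {remaining_gap} points")
print(f"Share of previous gap closed: {gap_closed}%")
print(f"Score relative to human baseline: {share_of_human}%")
```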

Figure: In OSWorld-Verified, models use vision to complete diverse computer tasks. Humans score ~72%. Source: OpenAI

Key Observations

  1. Terminal-Bench and OSWorld show massive gains, suggesting fundamental improvements in planning and execution
  2. SWE-Bench Pro shows only marginal improvement (+0.7%), which may indicate ceiling effects on this benchmark
  3. Cybersecurity CTF improvement (+15.1%) reflects the new specialized vulnerability training
  4. GDPval tie with GPT-5.2 suggests general-purpose reasoning hasn't changed (expected for a coding-focused model)

Key Capabilities

1. Interactive Collaboration

Unlike previous models that worked silently until task completion, GPT-5.3-Codex provides frequent progress updates and allows users to steer the model while it works.

Example workflow:

User: "Build a real-time chat application with WebSocket support"

GPT-5.3-Codex: "Starting project setup. I'll create:
1. Express server with ws library
2. React frontend with auto-reconnection
3. Message persistence with SQLite
4. User presence indicators

Setting up project structure now... ████░░░░ 45%

[Update] Server scaffold complete. Starting WebSocket handler.
Should I add typing indicators and read receipts?"

User: "Yes, add both. Also add message threading."

GPT-5.3-Codex: "Adding threading support to the schema.
Modifying message model... ██████░░ 72%"

This interactive pattern transforms coding from "submit and wait" to genuine collaboration.
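To make the "progress updates plus steering" pattern concrete, here is a toy sketch using a Python generator. It is not OpenAI's API or how Codex is implemented. `steerable_task` and its step names are hypothetical, and the "work" is simulated. The point is only the control flow: the worker yields an update after each step and accepts an optional instruction before continuing.

```python
from typing import Generator, Optional


def steerable_task(steps: list[str]) -> Generator[str, Optional[str], None]:
    """Toy worker: yields a progress update after each step and accepts
    an optional steering instruction through send()."""
    queue = list(steps)
    done = 0
    while queue:
        step = queue.pop(0)          # simulate doing one unit of work
        done += 1
        total = done + len(queue)
        instruction = yield f"[{done}/{total}] finished: {step}"
        if instruction:
            # Steering: add the requested work to the remaining plan.
            queue.append(instruction)


task = steerable_task(["server scaffold", "WebSocket handler"])
updates = [next(task)]                           # first progress update
updates.append(task.send("message threading"))   # steer while it works
for update in task:                              # let the rest run unattended
    updates.append(update)

print("\n".join(updates))
```

The caller can either keep iterating (no steering) or use `send()` to adjust the plan mid-run, which mirrors the difference between "submit and wait" and the interactive workflow described above.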

2. Autonomous Multi-Day Projects

GPT-5.3-Codex can work on complex projects autonomously over days, consuming millions of tokens while building sophisticated applications:

  • Complex web games with multiple levels and physics
  • Full-stack applications with authentication, database, and deployment
  • API platforms with documentation and test suites

3. Beyond Pure Coding

Despite its name, GPT-5.3-Codex extends beyond code:

  • Slide decks: Generate presentation materials from specs
  • Data analysis: Process datasets and produce insights
  • PRDs: Write product requirement documents
  • User research: Analyze feedback and identify patterns
  • Metrics dashboards: Build monitoring and reporting tools

4. Extreme Token Efficiency

OpenAI reports that GPT-5.3-Codex consumes fewer tokens than any prior model on coding tasks while running 25% faster. This means:

  • Lower latency on each operation
  • More work done within context limits
  • Cost savings per task (when API becomes available)
  • Longer autonomous sessions before context exhaustion

Self-Bootstrapping: A Milestone

GPT-5.3-Codex is the first AI model instrumental in creating itself. During development, OpenAI used early versions of the model to:

  1. Debug training issues: The model identified problems in its own training pipeline
  2. Manage deployment: Early versions helped orchestrate the deployment infrastructure
  3. Diagnose evaluations: The model analyzed its own benchmark results to identify improvement areas

Why this matters: Self-bootstrapping represents a step toward AI systems that can improve their own development process. While GPT-5.3-Codex does NOT reach "High" capability on AI self-improvement (according to OpenAI's Preparedness Framework), the fact that it contributed to its own creation is a milestone in AI development methodology.

This is distinct from previous models where humans did all the training debugging. GPT-5.3-Codex demonstrates that AI can meaningfully participate in the model development lifecycle.

Concrete Self-Bootstrapping Examples (from OpenAI's announcement)

According to OpenAI's official blog post, here's how GPT-5.3-Codex was used to build itself:

  • Research team: Used Codex to monitor and debug the training run, track patterns throughout training, and build applications for researchers to precisely understand behavior differences
  • Engineering team: Used Codex to optimize the harness, identify context rendering bugs, root cause low cache hit rates, and dynamically scale GPU clusters during launch
  • Alpha testing analysis: GPT-5.3-Codex built regex classifiers to estimate clarification frequency, positive/negative user responses, and task progress — then ran them scalably over all session logs and produced reports
  • Data science: A data scientist worked with GPT-5.3-Codex to build new data pipelines and visualizations, then co-analyzed results that "concisely summarized key insights over thousands of data points in under three minutes"

Source: OpenAI
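The alpha-testing item above mentions regex classifiers run over session logs. OpenAI has not published those classifiers, so the following is only a minimal sketch of the general technique. The patterns, labels, and sample messages are hypothetical, and the "task progress" signal is left out for brevity.

```python
import re
from collections import Counter

# Hypothetical patterns; real classifiers would be tuned on actual session logs.
PATTERNS = {
    "clarification": re.compile(r"\b(could you clarify|do you mean|which (one|file|option))\b", re.I),
    "positive": re.compile(r"\b(thanks|great|perfect|works now)\b", re.I),
    "negative": re.compile(r"\b(wrong|broken|doesn't work|not what i)\b", re.I),
}


def classify(message: str) -> list[str]:
    """Return every label whose pattern matches the message."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(message)]


def summarize(messages: list[str]) -> dict[str, float]:
    """Fraction of messages carrying each label (empty dict for no messages)."""
    if not messages:
        return {}
    counts = Counter(label for m in messages for label in classify(m))
    return {label: counts[label] / len(messages) for label in PATTERNS}


# Made-up sample messages for illustration
messages = [
    "Do you mean the staging config or production?",
    "Perfect, thanks!",
    "That's wrong, the test still fails.",
    "Run the migration next.",
]
print(summarize(messages))
```

Regex classifiers like this are cheap enough to run over every log line, which is presumably why they suit a first-pass estimate. They trade accuracy for speed and will miss phrasing the patterns do not anticipate.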


Cybersecurity: First "High" Rating

GPT-5.3-Codex is the first AI model classified as "High" capability in cybersecurity under OpenAI's Preparedness Framework. It is also the first model directly trained to identify software vulnerabilities.

What "High" Means

OpenAI's Preparedness Framework categorizes model capabilities on a scale from Low to Critical. "High" means the model can:

  • Identify complex vulnerabilities in production code
  • Suggest exploitation vectors for discovered vulnerabilities
  • Analyze security architectures for weaknesses
  • Perform sophisticated capture-the-flag (CTF) challenges (77.6% score)

Safety Measures Deployed

Given the dual-use nature of cybersecurity capabilities, OpenAI has deployed what they describe as their "most comprehensive cybersecurity safety stack to date":

  1. Trusted Access for Cyber pilot program: Controlled access for vetted cybersecurity professionals to accelerate cyber defense research
  2. $10M in API credits: Committed to cyber defense organizations, especially for open source software and critical infrastructure
  3. Aardvark expansion: OpenAI's security research agent, expanding private beta as the first offering in their Codex Security products suite — already used to find vulnerabilities in Next.js (CVE-2025-59471 and CVE-2025-59472)
  4. Preparedness Framework safeguards: Safety training, automated monitoring, enforcement pipelines including threat intelligence
  5. Open-source scanning: Partnering with open-source maintainers for free codebase scanning

Figure: GPT-5.3-Codex cybersecurity capabilities and Trusted Access program. Source: OpenAI

The Dual-Use Dilemma

A model trained to find vulnerabilities can also be used to exploit them. OpenAI's approach is to:

  • Make the model available for defensive purposes
  • Restrict access through the Trusted Access program
  • Monitor usage patterns for potential abuse
  • Invest heavily in defensive applications ($10M commitment)

This makes GPT-5.3-Codex both a powerful tool for security professionals and a model that requires careful governance.


Availability & Access

Where to Access GPT-5.3-Codex

| Platform | Available | Notes |
| --- | --- | --- |
| ChatGPT (Paid plans) | ✅ Yes | Plus, Pro, Team, Enterprise |
| Codex App | ✅ Yes | Standalone coding application |
| Codex CLI | ✅ Yes | Command-line interface |
| Codex IDE Extension | ✅ Yes | VS Code and others |
| Web (codex.openai.com) | ✅ Yes | Browser-based access |
| OpenAI API | ❌ Not yet | "Working to safely enable soon" |

API Access Timeline

As of February 6, 2026, GPT-5.3-Codex is NOT available via the OpenAI API. OpenAI states they are "working to safely enable" API access, likely due to the model's "High" cybersecurity rating requiring additional safety measures before broad programmatic access.

Infrastructure note: GPT-5.3-Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems. OpenAI is also running GPT-5.3-Codex 25% faster for Codex users thanks to improvements in their infrastructure and inference stack.

Until API access is enabled, this means:

  • You cannot integrate GPT-5.3-Codex into custom applications yet
  • Enterprise users must use the Codex app/CLI/extension
  • No programmatic batch processing is available
  • Pricing for API access has not been announced

Pricing

No specific pricing has been announced for GPT-5.3-Codex. Access is currently bundled with paid ChatGPT plans:

  • ChatGPT Plus: $20/month
  • ChatGPT Pro: $200/month (higher usage limits)
  • ChatGPT Team: $25/user/month
  • ChatGPT Enterprise: Custom pricing

Practical Use Cases

1. Full-Stack Application Development

Prompt: "Build a task management API with:
- Express.js backend with TypeScript
- PostgreSQL with Prisma ORM
- JWT authentication with refresh tokens
- Role-based access control (admin, member, viewer)
- WebSocket notifications for task updates
- Docker Compose for local development
- Comprehensive test suite with Jest"

GPT-5.3-Codex can autonomously build this over a multi-hour session, providing updates throughout and allowing you to steer decisions.

2. Security Audit & Vulnerability Assessment

Prompt: "Audit this Node.js e-commerce application for:
- OWASP Top 10 vulnerabilities
- Business logic flaws
- Authentication bypass vectors
- Data exposure risks
- Dependency vulnerabilities
Provide severity ratings and remediation steps."

With its "High" cybersecurity rating, GPT-5.3-Codex excels at comprehensive security audits.

3. Legacy Codebase Modernization

Prompt: "Migrate this Python 2.7 Django 1.x application to:
- Python 3.12 with type hints throughout
- Django 5.x with async views
- Replace deprecated APIs
- Add comprehensive tests for each migrated module
- Maintain backwards-compatible database migrations"

4. Complex Debugging Sessions

Prompt: "This microservices system has intermittent 502 errors 
under load. Here are the service configs, nginx setup, and 
recent logs. Identify the root cause and implement a fix."

The interactive collaboration feature allows the model to ask clarifying questions during investigation.

5. Game Development

OpenAI specifically highlights GPT-5.3-Codex's ability to build complex games autonomously over days, including:

  • Multi-level game logic
  • Physics engines
  • Asset management systems
  • Multiplayer networking

Limitations

What GPT-5.3-Codex Cannot Do

  1. No API access yet: You cannot programmatically integrate GPT-5.3-Codex into custom applications
  2. Not general-purpose: For writing, analysis, or conversation, use GPT-5.2
  3. No disclosed context window: OpenAI hasn't specified the exact context window size
  4. No knowledge cutoff published: Training data recency is unknown
  5. Cybersecurity dual-use risk: The model's vulnerability detection can theoretically be misused
  6. AI self-improvement capped: Does NOT reach "High" on AI self-improvement (confirmed by OpenAI)

SWE-Bench Pro Plateau

The marginal improvement on SWE-Bench Pro (56.4% → 56.8%) suggests this benchmark may be approaching ceiling effects for current architectures. The gains on end-to-end agentic tasks, as measured by Terminal-Bench 2.0 (64.0% → 77.3%), are much larger.

Cost Uncertainty

Without API pricing, enterprise customers cannot forecast costs for large-scale deployments. This may delay adoption compared to competitors like Claude Opus 4.6, which launched with full API access and transparent pricing.


GPT-5.3-Codex vs. GPT-5.2-Codex: Should You Upgrade?

| Aspect | GPT-5.3-Codex | GPT-5.2-Codex |
| --- | --- | --- |
| Terminal-Bench 2.0 | 77.3% | 64.0% |
| OSWorld | 64.7% | 38.2% |
| Cybersecurity CTF | 77.6% | 67.4% |
| SWE-Bench Pro | 56.8% | 56.4% |
| Speed | 25% faster | Baseline |
| Token Usage | Lower | Baseline |
| Interactive Updates | ✅ Yes | ❌ No |
| API Access | ❌ Not yet | ✅ Yes |

Verdict: If you use the Codex app, CLI, or IDE extension, upgrade immediately. The relative improvements on Terminal-Bench 2.0 (+20.8%) and OSWorld-Verified (+69.4%) are large. If you rely on API access, you'll need to wait until OpenAI enables it.


FAQ

When was GPT-5.3 released?

GPT-5.3-Codex was released on February 5, 2026.

Is GPT-5.3 better than ChatGPT 5.2?

For coding and cybersecurity tasks, yes. GPT-5.3-Codex significantly outperforms GPT-5.2 on Terminal-Bench (77.3% vs 62.2%) and Cybersecurity CTF (77.6% vs 67.7%). For general-purpose tasks like writing and conversation, GPT-5.2 remains the better choice.

Can I use GPT-5.3 via API?

Not yet as of February 2026. OpenAI states they are "working to safely enable" API access. Currently, GPT-5.3-Codex is available through the Codex app, CLI, IDE extension, and paid ChatGPT plans.

Is GPT-5.3 safe to use?

OpenAI has deployed their "most comprehensive cybersecurity safety stack" for GPT-5.3-Codex. The model is classified "High" in cybersecurity capability but does NOT reach "High" on AI self-improvement. Safety measures include the Trusted Access program, monitoring, and usage restrictions.

How does GPT-5.3 compare to Claude Opus 4.6?

See our detailed comparison: Claude Opus 4.6 vs GPT-5.3 Codex. In short: GPT-5.3-Codex leads on Terminal-Bench and computer use; Opus 4.6 leads on general reasoning, offers 1M context, has API access with transparent pricing, and excels at knowledge work.



Key Takeaways

  1. GPT-5.3-Codex is a specialized coding model, not a general-purpose GPT-5.3 — OpenAI's general model remains GPT-5.2

  2. Terminal-Bench 2.0 score of 77.3% is a 20.8% relative improvement (13.3 percentage points) over GPT-5.2-Codex

  3. First "High" cybersecurity AI model — directly trained to find vulnerabilities, with comprehensive safety measures

  4. Self-bootstrapping milestone: First model that contributed to its own development process

  5. OSWorld 64.7% (from 38.2%) shows transformative improvement in computer use capabilities

  6. 25% faster with fewer tokens than any prior model on coding tasks

  7. No API access yet — available only through Codex app, CLI, IDE extension, and ChatGPT paid plans


Build AI Agents and Agentic Workflows

GPT-5.3-Codex's autonomous coding capabilities represent the frontier of agentic AI. Understanding the principles behind autonomous agents — planning, tool use, self-correction — will help you leverage these models effectively.

In our Module 6 — AI Agents & Orchestration, you'll learn:

  • How AI agents plan, reason, and take action autonomously
  • The ReAct pattern for combining reasoning with tool use
  • Building multi-agent systems for complex workflows
  • Tool integration and function calling patterns
  • Safety patterns for autonomous AI systems
  • When to use agentic AI vs. simpler approaches

Explore Module 6: AI Agents & Orchestration


Last Updated: February 6, 2026. Features and specifications verified against OpenAI's official blog and platform documentation.
