
OpenAI Launches GPT-5.4 With Native Computer-Use and 1M Token Context

Michael Ouroumis · 2 min read
OpenAI dropped its most ambitious model yet on March 5, and the AI community is still digesting what GPT-5.4 means for the industry. The new release merges the coding prowess of GPT-5.3-Codex with breakthrough computer-use capabilities, creating a single system that can reason, write code, and directly operate software interfaces.

What Makes GPT-5.4 Different

The headline feature is native computer-use. GPT-5.4 can view screenshots, move a cursor, click buttons, and type keystrokes — effectively operating any desktop or web application the way a human would. Previous models required third-party tooling or wrappers to achieve similar functionality. Now it is built into the model itself.
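Conceptually, this works as an observe-act loop: the model sees the screen, proposes an action, the harness executes it, and the cycle repeats. The sketch below is schematic only; `model_step` is a stub standing in for a real GPT-5.4 call, and OpenAI's actual action schema will differ.

```python
# Schematic screenshot → action loop for a computer-using model.
# model_step is a stand-in for the model call; the real interface differs.

def model_step(screenshot: bytes) -> dict:
    """Stub: maps a screenshot to a proposed UI action."""
    return {"action": "click", "x": 120, "y": 64}

def run_agent(take_screenshot, perform, max_steps: int = 3) -> list:
    """Repeatedly observe the screen, ask the model for an action, execute it."""
    history = []
    for _ in range(max_steps):
        action = model_step(take_screenshot())
        perform(action)           # move cursor, click, type, etc.
        history.append(action)
        if action["action"] == "done":
            break
    return history

# Minimal in-memory harness standing in for a real desktop.
executed = []
history = run_agent(lambda: b"fake-png-bytes", executed.append)
```

The point of the sketch is the shape of the loop, not the stubbed contents: perception and action live in the harness, while the model only decides what to do next.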

OpenAI also expanded the API context window to one million tokens, the largest the company has ever offered. For enterprise customers working with sprawling codebases or lengthy legal documents, this removes a persistent bottleneck.
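To get a feel for what a million tokens holds, a back-of-the-envelope estimate using the common ~4 characters-per-token heuristic for English text can tell you whether a document set fits. Real tokenizers vary by content and language, so treat this as a rough check, not an exact count.

```python
# Rough check: does a document set fit in a 1M-token context window?
# Uses the ~4 chars/token heuristic for English text; real tokenizers vary.

CONTEXT_LIMIT = 1_000_000
CHARS_PER_TOKEN = 4  # rough heuristic, not a tokenizer

def estimated_tokens(texts):
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN

def fits_in_context(texts, reserve_for_output=8_000):
    """Leave headroom for the model's own response tokens."""
    return estimated_tokens(texts) + reserve_for_output <= CONTEXT_LIMIT

docs = ["x" * 400_000, "y" * 1_200_000]  # ~100k + ~300k tokens of text
fits = fits_in_context(docs)
```

By this estimate, roughly 4 MB of plain text fits in one window, which is why a mid-sized codebase or a long contract set no longer needs to be chunked.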

Benchmark Performance

The numbers back up the marketing. GPT-5.4 set new records on OSWorld-Verified and WebArena-Verified, two benchmarks that measure a model's ability to complete real software tasks. It also scored 83 percent on OpenAI's internal GDPval test for knowledge work — tasks like drafting reports, analyzing spreadsheets, and managing project workflows.

Factual accuracy improved as well. Compared to GPT-5.2, individual claims are 33 percent less likely to be false, and full responses are 18 percent less likely to contain any factual errors.

Two Flavors at Launch

OpenAI released two variants simultaneously. GPT-5.4 Thinking is the default in ChatGPT, optimized for interactive conversations with visible chain-of-thought reasoning. GPT-5.4 Pro targets power users and enterprise customers who need maximum performance on complex, multi-step tasks.

Both versions are available through the API, where developers can access the full million-token context and integrate computer-use into agentic workflows.
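As a rough sketch of what a request to the new model might look like, the snippet below builds a chat-style payload. The model identifier `gpt-5.4` and the `computer_use` tool entry are assumptions extrapolated from this announcement, not a confirmed API surface; check OpenAI's API reference for the real parameter names before relying on any of them.

```python
# Hypothetical request payload for GPT-5.4. The model name and the
# tool schema are assumptions from the announcement, not a documented API.
payload = {
    "model": "gpt-5.4",  # assumed identifier
    "max_tokens": 4096,
    # With the expanded window, very large inputs (a repository,
    # a contract set) could be inlined in the messages below.
    "messages": [
        {"role": "system", "content": "You are a coding and desktop-automation agent."},
        {"role": "user", "content": "Open the spreadsheet and total column B."},
    ],
    # Hypothetical declaration opting in to native computer-use.
    "tools": [{"type": "computer_use"}],
}
```

In an agentic workflow, a payload like this would be sent in a loop: each response either returns text or requests a UI action, which the calling code executes before sending the next turn.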

Industry Implications

The release intensifies an already crowded March. Google shipped Gemini 3.1 Pro in late February, Anthropic updated Claude Sonnet 4.6, and DeepSeek V4 arrived days earlier with its own trillion-parameter multimodal architecture. The pace of releases has shifted from quarterly cadences to what industry trackers now describe as weekly waves.

More significantly, GPT-5.4 signals that the frontier is moving beyond chat and code generation toward autonomous task execution. Models that can see and operate software blur the line between assistant and agent — a shift that will reshape how businesses think about automation, workforce planning, and software design in the months ahead.
