On May 7, 2026, OpenAI launched three new audio models inside its Realtime API — a step the company framed as moving live voice from demoware into an enterprise-ready surface. The headline model, GPT-Realtime-2, runs on GPT-5-class reasoning and sits alongside two specialist siblings: GPT-Realtime-Translate for live multilingual conversation, and GPT-Realtime-Whisper for streaming speech-to-text. The Realtime API itself has been generally available since August 28, 2025, when it graduated from beta alongside the original gpt-realtime model.
The release lands during a sprint of voice-agent launches across the industry, but OpenAI is leaning on raw capability rather than packaging. GPT-Realtime-2 ships with a 128K context window, up from 32K in the previous generation, and adds parallel tool calls, short spoken preambles like "let me check that," and recovery behavior when a task fails midway through a call.
What the three models do
GPT-Realtime-2 is positioned as the agentic voice model. According to OpenAI, it is designed to manage context across longer conversations, use tools, accept corrections, and avoid the rigid turn-taking that plagued earlier voice systems. The company reported a roughly 15-point improvement on internal audio benchmarks compared with the prior realtime model.
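The tool-use behavior described above maps onto the Realtime API's existing session configuration, where tools are declared over the WebSocket with a `session.update` event. A minimal sketch of that setup, with the caveat that the model identifier and the `lookup_order` tool are illustrative assumptions, not published values:

```python
import json

# Hypothetical session configuration for a voice agent with tool use.
# The model is typically selected at connection time (e.g. in the WebSocket
# URL); "gpt-realtime-2" is assumed here — check the model list for the
# actual identifier. The lookup_order tool is invented for illustration.
session_update = {
    "type": "session.update",
    "session": {
        "instructions": (
            "You are a support agent. Briefly say what you are doing "
            "before calling a tool."
        ),
        "tools": [
            {
                "type": "function",
                "name": "lookup_order",
                "description": "Fetch an order's status by id.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            }
        ],
        "tool_choice": "auto",
    },
}

# The payload is sent as a JSON text frame over the API's WebSocket.
frame = json.dumps(session_update)
```

With parallel tool calls, the model could dispatch several such function calls in one turn while keeping the spoken preamble ("let me check that") flowing.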
GPT-Realtime-Translate handles live cross-lingual speech, accepting input in more than 70 languages and producing output in 13. OpenAI is pitching it for customer support, cross-border sales calls, education, and event interpretation. GPT-Realtime-Whisper, meanwhile, brings streaming transcription to the API for use cases like meeting capture, accessibility tooling, and call-center pipelines.
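For the transcription use cases, a client consumes a stream of JSON events and accumulates transcript text as segments complete. A sketch of such a handler, assuming an event shape like the Realtime API's existing input-audio transcription events (the dedicated GPT-Realtime-Whisper model may emit a different schema):

```python
import json

# Assumed event type, modeled on the Realtime API's input-audio
# transcription events; the new model's actual schema may differ.
TRANSCRIPT_DONE = "conversation.item.input_audio_transcription.completed"

def handle_event(raw_frame: str, transcript_parts: list[str]) -> None:
    """Append completed transcript segments from a WebSocket text frame."""
    event = json.loads(raw_frame)
    if event.get("type") == TRANSCRIPT_DONE:
        transcript_parts.append(event["transcript"])

# Simulated incoming frame, as a meeting-capture pipeline might receive it.
parts: list[str] = []
handle_event(json.dumps({
    "type": TRANSCRIPT_DONE,
    "transcript": "Hello, I'd like to check on my order.",
}), parts)
```

A call-center pipeline would run this handler per frame and flush `parts` to storage or a live caption view.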
Pricing and availability
Pricing reflects the heavier reasoning load on the flagship model. GPT-Realtime-2 is billed at $32 per million audio input tokens, $0.40 per million cached input tokens, and $64 per million audio output tokens. The two specialist models are metered by the minute: $0.034 per minute for Translate and $0.017 per minute for Whisper. All three models are available immediately through the Realtime API.
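The published rates make cost estimates straightforward. A back-of-envelope sketch using illustrative token counts (actual audio-token usage varies with codec and speech rate):

```python
# Published per-unit rates for GPT-Realtime-2, converted to $ per token.
RATE_IN = 32 / 1_000_000        # audio input tokens
RATE_CACHED = 0.40 / 1_000_000  # cached input tokens
RATE_OUT = 64 / 1_000_000       # audio output tokens

def realtime2_cost(in_tokens: int, cached_tokens: int, out_tokens: int) -> float:
    """Cost in dollars for one GPT-Realtime-2 session."""
    return in_tokens * RATE_IN + cached_tokens * RATE_CACHED + out_tokens * RATE_OUT

def per_minute_cost(minutes: float, rate: float) -> float:
    """Cost for the minute-metered Translate/Whisper models."""
    return minutes * rate

# Illustrative session: 40K uncached + 10K cached input, 30K output tokens.
flagship = realtime2_cost(40_000, 10_000, 30_000)  # ≈ $1.28 + $0.004 + $1.92

# A 20-minute call on each metered model.
translate = per_minute_cost(20, 0.034)  # ≈ $0.68
whisper = per_minute_cost(20, 0.017)    # ≈ $0.34
```

The gap is stark: the flagship session above costs several times a comparable-length metered call, which is why OpenAI is steering translation-only and transcription-only workloads to the specialist models.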
OpenAI also says it has embedded safeguards that automatically halt conversations when its content guidelines are violated, a response to ongoing concerns about voice cloning, fraud, and synthetic-call abuse.
Why it matters
Voice has lagged text in agentic AI, largely because realtime systems struggled to reason while staying in sync with a speaker. By compressing GPT-5-class reasoning, a 4x larger context window, and parallel tool use into a single conversational model, OpenAI is signaling that the next wave of voice agents will be able to handle support tickets, translate cross-border calls, and run live multi-step workflows without falling back to a script.
For enterprises sitting on top of legacy IVR stacks, the metered translation and transcription models are the cleaner upgrade path. For developers building on a Realtime API that has been GA for months, the new models remove the last capability-based excuse for keeping voice features in private beta — and tighten the competitive vise on rivals shipping voice products this quarter.



