
OpenAI Ships GPT-Realtime-2 With Live Translation and Streaming Whisper, Pushing Voice Agents Toward GPT-5 Reasoning

Michael Ouroumis · 2 min read

OpenAI on May 7, 2026 launched three new audio models inside its Realtime API — a step the company framed as moving live voice from demoware into an enterprise-ready surface. The headline model, GPT-Realtime-2, runs on GPT-5-class reasoning and sits alongside two specialist siblings: GPT-Realtime-Translate for live multilingual conversation, and GPT-Realtime-Whisper for streaming speech-to-text. The Realtime API itself has been generally available since August 28, 2025, when it graduated from beta alongside the original gpt-realtime model.

The release lands during a sprint of voice-agent launches across the industry, but OpenAI is leaning on raw capability rather than packaging. GPT-Realtime-2 ships with a 128K context window, up from 32K in the previous generation, and adds parallel tool calls, short spoken preambles like "let me check that," and recovery behavior when a task fails midway through a call.

What the three models do

GPT-Realtime-2 is positioned as the agentic voice model. According to OpenAI, it is designed to manage context across longer conversations, use tools, accept corrections, and avoid the rigid turn-taking that plagued earlier voice systems. The company reported a roughly 15-point improvement on internal audio benchmarks compared with the prior realtime model.

GPT-Realtime-Translate handles live cross-lingual speech, accepting input in more than 70 languages and producing output in 13. OpenAI is pitching it for customer support, cross-border sales calls, education, and event interpretation. GPT-Realtime-Whisper, meanwhile, brings streaming transcription to the API for use cases like meeting capture, accessibility tooling, and call-center pipelines.
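In practice, switching to one of the new models means pointing an existing Realtime session at the new model name. The sketch below builds a session-configuration event in the shape the Realtime API has used since it went GA; the model identifier, voice, and field values here are illustrative assumptions, not confirmed details from the announcement, and should be checked against the current API reference.

```python
import json

def build_session_update(model: str, voice: str = "alloy",
                         instructions: str = "") -> str:
    """Build a Realtime API session.update event as a JSON string.

    The event shape mirrors what the Realtime API has used since GA;
    the specific field names and values are illustrative and should be
    verified against the current API reference before use.
    """
    event = {
        "type": "session.update",
        "session": {
            "model": model,
            "voice": voice,
            "instructions": instructions,
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)

# Example: configure a live-translation session (model name assumed).
payload = build_session_update(
    "gpt-realtime-translate",
    instructions="Translate the caller's speech into Spanish.",
)
```

The payload would then be sent over the session's WebSocket connection; only the model name changes between the three new models.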

Pricing and availability

Pricing reflects the heavier reasoning load on the flagship model. GPT-Realtime-2 is billed at $32 per million audio input tokens, $0.40 per million cached input tokens, and $64 per million audio output tokens. The two specialist models are metered by the minute: $0.034 per minute for Translate and $0.017 per minute for Whisper. All three are available immediately through the Realtime API.
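To make the split billing concrete, here is a back-of-the-envelope cost estimator using the published rates. The token counts and call lengths in the example are invented for illustration; real usage will vary.

```python
# Published per-unit rates from the announcement.
REALTIME_2_INPUT = 32.00 / 1_000_000   # $ per audio input token
REALTIME_2_CACHED = 0.40 / 1_000_000   # $ per cached input token
REALTIME_2_OUTPUT = 64.00 / 1_000_000  # $ per audio output token
TRANSLATE_PER_MIN = 0.034              # $ per minute, Translate
WHISPER_PER_MIN = 0.017                # $ per minute, Whisper

def realtime2_cost(input_tokens: int, cached_tokens: int,
                   output_tokens: int) -> float:
    """Token-metered cost of a GPT-Realtime-2 call, in dollars."""
    return (input_tokens * REALTIME_2_INPUT
            + cached_tokens * REALTIME_2_CACHED
            + output_tokens * REALTIME_2_OUTPUT)

def per_minute_cost(minutes: float, rate: float) -> float:
    """Minute-metered cost for the Translate/Whisper models."""
    return minutes * rate

# A hypothetical 5-minute call: 50K audio input tokens,
# 20K cached input tokens, 30K audio output tokens.
flagship = realtime2_cost(50_000, 20_000, 30_000)   # ≈ $3.53
translate = per_minute_cost(5, TRANSLATE_PER_MIN)   # $0.17
```

The comparison shows why the by-the-minute models are the cheaper path when full GPT-5-class reasoning is not needed: the same hypothetical five minutes costs roughly twenty times less on Translate than on the flagship.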

OpenAI told developers the company embedded safeguards intended to halt conversations automatically when its content guidelines are violated, a response to ongoing concerns about voice cloning, fraud, and synthetic-call abuse.

Why it matters

Voice has lagged text in agentic AI, largely because realtime systems struggled to reason while staying in sync with a speaker. By compressing GPT-5-class reasoning, a 4x larger context window, and parallel tool use into a single conversational model, OpenAI is signalling that the next wave of voice agents will be able to handle support tickets, translate cross-border calls, and run live multi-step workflows without falling back to a script.

For enterprises sitting on top of legacy IVR stacks, the metered translation and transcription models are the cleaner upgrade path. For developers building on a Realtime API that has been GA for months, the new models remove the last capability excuses to keep voice features in private beta — and tighten the competitive vise on rivals shipping voice products this quarter.

