Thinking Machines Lab, the startup founded by former OpenAI chief technology officer Mira Murati, has released a research preview of what it calls "interaction models" — a new class of multimodal systems built to converse in real time without the awkward pauses that define today's voice assistants. It is the company's first move beyond developer tooling into deploying its own frontier models.
What an "interaction model" actually does
The core idea is "full-duplex" communication: the model listens, sees and speaks at the same time, rather than waiting for a user to finish a turn before it starts processing a reply. According to the company, the lead model — TML-Interaction-Small, described as a 276-billion-parameter mixture-of-experts system — handles dialogue and timing, while a separate background component does slower, asynchronous reasoning in parallel so the conversational front end stays responsive.
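The company hasn't published implementation details, but the division of labor it describes (a fast conversational loop paired with a slower asynchronous reasoner) is a familiar concurrency pattern. Below is a minimal, purely illustrative asyncio sketch of that pattern; every name in it (fast_responder, slow_reasoner, DialogueState) is invented for the example and implies nothing about the actual system.

```python
import asyncio
from dataclasses import dataclass, field

# Illustrative only: the structure is an assumption about the general
# pattern, not a description of TML-Interaction-Small's internals.

@dataclass
class DialogueState:
    transcript: list = field(default_factory=list)  # what the user has said so far
    notes: list = field(default_factory=list)       # results of slow background reasoning

async def fast_responder(incoming: asyncio.Queue, state: DialogueState) -> None:
    """Low-latency loop: consume audio chunks as they arrive and reply
    immediately, without waiting for the user's turn to end."""
    while True:
        chunk = await incoming.get()
        if chunk is None:  # end of stream
            break
        state.transcript.append(chunk)
        # Reply right away, folding in whatever the slow path has produced so far.
        context = state.notes[-1] if state.notes else "no notes yet"
        print(f"[fast] heard {chunk!r}, replying (context: {context})")

async def slow_reasoner(state: DialogueState) -> None:
    """Background pass: periodically re-reads the transcript and deposits
    slower, deeper analysis for the fast path to pick up."""
    while True:
        await asyncio.sleep(1.0)  # stand-in for an expensive reasoning call
        if state.transcript:
            state.notes.append(f"summary of {len(state.transcript)} chunks")

async def main() -> None:
    state = DialogueState()
    incoming: asyncio.Queue = asyncio.Queue()
    reasoner = asyncio.create_task(slow_reasoner(state))
    responder = asyncio.create_task(fast_responder(incoming, state))
    # Simulated real-time audio cadence; a real system would stream audio
    # in both directions simultaneously.
    for chunk in ["hi", "so I was", "mm-hmm", "thinking about"]:
        await incoming.put(chunk)
        await asyncio.sleep(0.4)
    await incoming.put(None)
    await responder
    reasoner.cancel()

asyncio.run(main())
```

The point of the split is visible in the output: the fast loop never blocks on the slow one, it just uses the freshest notes available, which is one plausible way a conversational front end can stay responsive while heavier reasoning runs in parallel.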
The technical trick Thinking Machines highlights is "encoder-free early fusion." Most multimodal systems route audio and video through heavy dedicated encoders before the language model sees them. The interaction models instead feed raw audio and visual signals directly into the transformer through lightweight embedding layers, which the company says is what makes near-instant turn-taking possible.
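Thinking Machines hasn't released code, but "early fusion" in the research literature generally means projecting each modality into the model's embedding space at the input and letting the transformer attend across all of them from the first layer. The PyTorch sketch below shows that idea in miniature, with single linear layers standing in for the "lightweight embedding layers"; all dimensions, names, and shapes are illustrative assumptions, not details of TML-Interaction-Small.

```python
import torch
import torch.nn as nn

class EarlyFusionFrontEnd(nn.Module):
    """Illustrative 'encoder-free early fusion': instead of routing audio and
    video through dedicated encoder stacks, raw signals are projected into
    the transformer's embedding space with single linear layers."""
    def __init__(self, d_model=512, vocab=32000,
                 audio_frame=320,        # e.g. 20 ms of 16 kHz audio samples
                 patch=16 * 16 * 3):     # one flattened 16x16 RGB image patch
        super().__init__()
        self.text_embed = nn.Embedding(vocab, d_model)
        self.audio_embed = nn.Linear(audio_frame, d_model)  # lightweight, no encoder
        self.video_embed = nn.Linear(patch, d_model)        # lightweight, no encoder

    def forward(self, text_ids, audio_frames, video_patches):
        # Each modality becomes a sequence of d_model-wide tokens...
        t = self.text_embed(text_ids)        # (B, T_text,  d)
        a = self.audio_embed(audio_frames)   # (B, T_audio, d)
        v = self.video_embed(video_patches)  # (B, T_video, d)
        # ...fused at the input by simple concatenation, so a standard
        # transformer attends across modalities from layer one.
        return torch.cat([t, a, v], dim=1)

frontend = EarlyFusionFrontEnd()
tokens = frontend(
    text_ids=torch.randint(0, 32000, (1, 8)),
    audio_frames=torch.randn(1, 50, 320),    # 1 second of 20 ms frames
    video_patches=torch.randn(1, 196, 768),  # one 224x224 frame as 16x16 patches
)
print(tokens.shape)  # torch.Size([1, 254, 512]), ready for a standard transformer
```

The latency argument follows from the structure: with no heavy encoder between the microphone and the transformer, each incoming audio frame is a single matrix multiply away from being a token the model can attend to.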
The latency numbers
On FD-bench, a benchmark focused on interaction quality and conversational timing, TML-Interaction-Small reportedly posts turn-taking latency of about 0.4 seconds. The company's comparison figures put Google's Gemini-3.1-flash-live at about 0.57 seconds and OpenAI's GPT-realtime-2.0 at roughly 1.18 seconds on the same measure. The company's framing is that users shouldn't have to "contort themselves" around an interface: the model should handle natural backchanneling like "mm-hmm," interruptions and overlapping speech.
Why it matters
Real-time voice and video have become a contested front for the big labs — OpenAI's realtime API, Google's live models, xAI's Grok voice mode — but latency and naturalness remain the bottleneck for anything that feels like a genuine conversation rather than a walkie-talkie exchange. By going after that specific gap with a purpose-built architecture instead of bolting voice onto a text model, Thinking Machines is signaling where it thinks it can differentiate.
It is also the clearest product statement yet from a company that raised one of the largest seed rounds in history and absorbed a wave of senior researchers from OpenAI. Tinker, the lab's fine-tuning API, showed the team could ship infrastructure; the interaction models are the first sign of the frontier ambitions Murati described when she launched the lab.
What to watch
Access is currently limited to select partners, with a broader release slated for later in 2026. Open questions include how the small model scales to harder reasoning, how the background reasoning component behaves when conversations get complex, and whether "encoder-free early fusion" holds up against the encoder-heavy approaches the incumbents have invested in. For now, it is a research preview — but a pointed one.