Thinking Machines Lab, the startup founded by former OpenAI chief technology officer Mira Murati, has released a research preview of what it calls "interaction models" — a new class of multimodal systems built to converse in real time without the awkward pauses that define today's voice assistants. It is the company's first move beyond developer tooling into deploying its own frontier models.
What an "interaction model" actually does
The core idea is "full-duplex" communication: the model listens, sees and speaks at the same time, rather than waiting for a user to finish a turn before it starts processing a reply. According to the company, the lead model — TML-Interaction-Small, described as a 276-billion-parameter mixture-of-experts system — handles dialogue and timing, while a separate background component does slower, asynchronous reasoning in parallel so the conversational front end stays responsive.
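The company hasn't published implementation details, but the division of labor it describes (a fast conversational loop paired with a slower asynchronous reasoner) is a familiar concurrency pattern. Below is a minimal, purely illustrative asyncio sketch of that pattern; every name in it (fast_responder, slow_reasoner, DialogueState) is invented for the example and implies nothing about the actual system.

```python
import asyncio
from dataclasses import dataclass, field

# Illustrative only: the structure is an assumption about the general
# pattern, not a description of TML-Interaction-Small's internals.

@dataclass
class DialogueState:
    transcript: list = field(default_factory=list)  # what the user has said so far
    notes: list = field(default_factory=list)       # results of slow background reasoning

async def fast_responder(incoming: asyncio.Queue, state: DialogueState) -> None:
    """Low-latency loop: consume audio chunks as they arrive and reply
    immediately, without waiting for the user's turn to end."""
    while True:
        chunk = await incoming.get()
        if chunk is None:  # end of stream
            break
        state.transcript.append(chunk)
        # Reply right away, folding in whatever the slow path has produced so far.
        context = state.notes[-1] if state.notes else "no notes yet"
        print(f"[fast] heard {chunk!r}, replying (context: {context})")

async def slow_reasoner(state: DialogueState) -> None:
    """Background pass: periodically re-reads the transcript and deposits
    slower, deeper analysis for the fast path to pick up."""
    while True:
        await asyncio.sleep(1.0)  # stand-in for an expensive reasoning call
        if state.transcript:
            state.notes.append(f"summary of {len(state.transcript)} chunks")

async def main() -> None:
    state = DialogueState()
    incoming: asyncio.Queue = asyncio.Queue()
    reasoner = asyncio.create_task(slow_reasoner(state))
    responder = asyncio.create_task(fast_responder(incoming, state))
    # Simulated real-time audio cadence; a real system would stream audio
    # in both directions simultaneously.
    for chunk in ["hi", "so I was", "mm-hmm", "thinking about"]:
        await incoming.put(chunk)
        await asyncio.sleep(0.4)
    await incoming.put(None)
    await responder
    reasoner.cancel()

asyncio.run(main())
```

The point of the split is visible in the output: the fast loop never blocks on the slow one, it just uses the freshest notes available, which is one plausible way a conversational front end can stay responsive while heavier reasoning runs in parallel.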
The technical trick Thinking Machines highlights is "encoder-free early fusion." Most multimodal systems route audio and video through heavy dedicated encoders before the language model sees them. The interaction models instead feed raw audio and visual signals directly into the transformer through lightweight embedding layers, which the company says is what makes near-instant turn-taking possible.
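Thinking Machines hasn't released code, but "early fusion" in the research literature generally means projecting each modality into the model's embedding space at the input and letting the transformer attend across all of them from the first layer. The PyTorch sketch below shows that idea in miniature, with single linear layers standing in for the "lightweight embedding layers"; all dimensions, names, and shapes are illustrative assumptions, not details of TML-Interaction-Small.

```python
import torch
import torch.nn as nn

class EarlyFusionFrontEnd(nn.Module):
    """Illustrative 'encoder-free early fusion': instead of routing audio and
    video through dedicated encoder stacks, raw signals are projected into
    the transformer's embedding space with single linear layers."""
    def __init__(self, d_model=512, vocab=32000,
                 audio_frame=320,        # e.g. 20 ms of 16 kHz audio samples
                 patch=16 * 16 * 3):     # one flattened 16x16 RGB image patch
        super().__init__()
        self.text_embed = nn.Embedding(vocab, d_model)
        self.audio_embed = nn.Linear(audio_frame, d_model)  # lightweight, no encoder
        self.video_embed = nn.Linear(patch, d_model)        # lightweight, no encoder

    def forward(self, text_ids, audio_frames, video_patches):
        # Each modality becomes a sequence of d_model-wide tokens...
        t = self.text_embed(text_ids)        # (B, T_text,  d)
        a = self.audio_embed(audio_frames)   # (B, T_audio, d)
        v = self.video_embed(video_patches)  # (B, T_video, d)
        # ...fused at the input by simple concatenation, so a standard
        # transformer attends across modalities from layer one.
        return torch.cat([t, a, v], dim=1)

frontend = EarlyFusionFrontEnd()
tokens = frontend(
    text_ids=torch.randint(0, 32000, (1, 8)),
    audio_frames=torch.randn(1, 50, 320),    # 1 second of 20 ms frames
    video_patches=torch.randn(1, 196, 768),  # one 224x224 frame as 16x16 patches
)
print(tokens.shape)  # torch.Size([1, 254, 512]), ready for a standard transformer
```

The latency argument follows from the structure: with no heavy encoder between the microphone and the transformer, each incoming audio frame is a single matrix multiply away from being a token the model can attend to.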
The latency numbers
On FD-bench, a benchmark focused on interaction quality and conversational timing, TML-Interaction-Small reportedly posts turn-taking latency of about 0.4 seconds. The company's comparison figures put Google's Gemini-3.1-flash-live at about 0.57 seconds and OpenAI's GPT-realtime-2.0 at roughly 1.18 seconds on the same measure. The company's framing is that users shouldn't have to "contort themselves" around an interface: the model should handle natural backchanneling like "mm-hmm," interruptions and overlapping speech.
Why it matters
Real-time voice and video have become a contested front for the big labs — OpenAI's realtime API, Google's live models, xAI's Grok voice mode — but latency and naturalness remain the bottleneck for anything that feels like a genuine conversation rather than a walkie-talkie exchange. By going after that specific gap with a purpose-built architecture instead of bolting voice onto a text model, Thinking Machines is signaling where it thinks it can differentiate.
It is also the clearest product statement yet from a company that raised one of the largest seed rounds in history and absorbed a wave of senior researchers from OpenAI. Tinker, the lab's fine-tuning API, showed the team could ship infrastructure; the interaction models are the first sign of the frontier ambitions Murati described when she launched the lab.
What to watch
Access is currently limited to select partners, with a broader release slated for later in 2026. Open questions include how the small model scales to harder reasoning, how the background reasoning component behaves when conversations get complex, and whether "encoder-free early fusion" holds up against the encoder-heavy approaches the incumbents have invested in. For now, it is a research preview — but a pointed one.