
Mistral's New Open-Source TTS Model Beats ElevenLabs — and Fits on a Smartwatch

Michael Ouroumis · 3 min read

Mistral has released Voxtral, an open-source text-to-speech model that beats ElevenLabs in native-speaker blind tests and is compact enough to run on a smartwatch. The release landed on the same day Cohere launched Transcribe, an open-source speech-to-text model that went straight to the top of Hugging Face's leaderboard.

In a single day, open-source releases produced credible challengers to the leading proprietary models at both ends of the voice AI stack.

Voxtral's Performance

The headline number: in blind tests with native speakers, Voxtral was preferred over ElevenLabs 63% of the time on standard voices and approximately 70% of the time on custom voices. Those are significant margins. ElevenLabs has long been the benchmark for commercial-quality AI voice generation, and in these tests Mistral's model came out ahead.

The size achievement is equally notable. Running a high-quality TTS model on a smartwatch would have seemed implausible a year ago — voice generation is typically compute-intensive. Mistral's compression and efficiency work has pushed Voxtral into genuinely edge-deployable territory.

That combination, outperforming the leading commercial product while running locally on constrained hardware, is exactly the kind of open-source capability jump that disrupts market dynamics. Companies and developers building voice applications can now deploy a model that beat the dominant commercial alternative in blind tests, with no per-use API fees and, when it runs on-device, no data leaving the device.

Cohere Transcribe: The Other Direction

Cohere's Transcribe took the top spot on Hugging Face's speech-to-text leaderboard on release day. While Mistral addressed voice generation (text-to-speech), Cohere addressed voice recognition (speech-to-text). Together, the two releases cover the full voice interface stack.

A launch-day Hugging Face leaderboard position doesn't always hold up once the community does more thorough testing, but a #1 debut for Cohere's model, on the same day Mistral released a TTS model that beat ElevenLabs in blind tests, is a meaningful signal of how far open-source voice capabilities have come.

The Voice Layer Heats Up

These releases are part of a broader pattern accelerating this week. Sanas crossed $60 million in annual recurring revenue with its real-time translation product across 13 languages. Google launched Gemini 3.1 Flash Live, its highest-quality voice model, powering a global rollout of Search Live. Apple is opening Siri to rival AI assistants via a new Extensions framework in iOS 27.

Voice is no longer a secondary feature of AI platforms. It's becoming the primary interface for a significant portion of AI interactions — in cars, on wearables, through smart speakers, and increasingly through the phone's native assistant layer.

The open-source advancement matters because voice AI has historically been more proprietary than text generation. Proprietary providers have dominated voice, with products like ElevenLabs' text-to-speech and speech-to-speech tools and OpenAI's voice modes. Voxtral and Transcribe mark the moment when open-source voice caught up with, or in Voxtral's case appears to have surpassed, the best proprietary offerings.

What This Means for Developers

For anyone building a voice-enabled application, today's releases offer a practical alternative to paid APIs. Voxtral delivers quality that beat ElevenLabs in blind tests, without per-character API costs. Transcribe provides leaderboard-topping speech recognition that can be self-hosted rather than called through a cloud service.

The edge deployment story — Voxtral fitting on a smartwatch — opens markets that were previously inaccessible. Offline voice applications, privacy-first voice interfaces, embedded hardware with no cloud connectivity: all of these become significantly more viable with a TTS model that matches commercial quality while running locally.

The year of voice AI started months ago. Today it got a lot more open.
