
Meta's Muse Spark Narrows Frontier Gap With Novel Thought Compression Technique

Michael Ouroumis · 2 min read

Meta on Wednesday unveiled Muse Spark, the first frontier AI model from its reorganized Superintelligence Labs, marking a dramatic strategic shift away from the open-weight approach that defined its Llama era. The model — code-named Avocado and built over nine months by a team led by chief AI officer Alexandr Wang — introduces a technique called "thought compression" that lets it rival top competitors while consuming significantly less compute.

Benchmark Performance

Muse Spark scores 52 on the Artificial Analysis Intelligence Index v4.0, a composite benchmark spanning reasoning, knowledge, mathematics, and coding. That places it behind GPT-5.4 and Gemini 3.1 Pro (both at 57) and Claude Opus 4.6 (53) in overall rankings, but the model excels in specific domains.

On CharXiv Reasoning for visual figure understanding, Muse Spark achieved 86.4, significantly outperforming Claude Opus 4.6 at 65.3 and GPT-5.4 at 82.8. On HealthBench Hard, it topped all rivals with a score of 42.8 percent. Its GPQA Diamond score of 89.5 for PhD-level reasoning surpassed Grok 4.2 but trailed Opus 4.6 and Gemini 3.1 Pro.

Thought Compression: Doing More With Less

The standout technical innovation is thought compression. During reinforcement learning, the model is penalized for excessive "thinking time," forcing it to solve complex problems with fewer reasoning tokens without sacrificing accuracy. The results are striking: Muse Spark used just 58 million output tokens to complete the Intelligence Index evaluation, compared to 157 million for Claude Opus 4.6 and 120 million for GPT-5.4.
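Meta has not published the training details behind thought compression, but the description above maps onto a familiar reinforcement-learning idea: shaping the reward so that correct answers earned with fewer reasoning tokens score higher. The sketch below is purely illustrative; the function name, token budget, and penalty coefficient are all hypothetical, not Meta's actual values.

```python
# Illustrative only: one common way to fold a length penalty into an RL
# reward so a model is discouraged from "thinking" longer than a task needs.
# All names and numbers here are hypothetical, not Meta's implementation.

def compressed_reward(task_reward: float,
                      reasoning_tokens: int,
                      token_budget: int = 2048,
                      penalty_per_token: float = 0.0005) -> float:
    """Subtract a penalty for each reasoning token spent beyond the budget."""
    excess = max(0, reasoning_tokens - token_budget)
    return task_reward - penalty_per_token * excess

# A correct answer within budget keeps its full reward...
concise = compressed_reward(1.0, reasoning_tokens=1500)   # -> 1.0
# ...while an equally correct but verbose trace is penalized, nudging the
# policy toward shorter chains of thought over many training updates.
verbose = compressed_reward(1.0, reasoning_tokens=6048)
```

Under this kind of shaping, the policy gradient pushes probability mass toward solutions that stay under the budget, which would explain a result like 58 million evaluation tokens versus rivals' 120-157 million.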

According to Meta, Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick, a claim that could reshape how the industry thinks about scaling efficiency.

A Proprietary Pivot

Unlike Meta's previous Llama models, whose weights anyone could download and modify under open-weight licenses, Muse Spark is proprietary. The model is rolling out immediately in the Meta AI app and on the Meta.ai website, with plans to expand across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban Meta AI glasses in the coming weeks. A limited API "private preview" will be offered to select partners.

The model accepts voice, text, and image inputs but produces text-only output. It features a fast mode for casual queries, multiple reasoning modes, and a new "shopping mode" that leverages Meta's creator ecosystem for commerce recommendations.

What It Means for the Industry

Muse Spark's thought compression technique could prove more influential than the model's raw benchmark scores. If confirmed at scale, the ability to achieve frontier-level reasoning at a fraction of the compute cost would pressure competitors to rethink their own scaling strategies.

The proprietary shift also raises questions about Meta's open-source commitments. With Muse Spark locked behind Meta's walled garden, the company that once positioned itself as AI's open-weight champion is now competing on the same closed terms as OpenAI and Google.

