Meta on Wednesday unveiled Muse Spark, the first frontier AI model from its reorganized Superintelligence Labs, marking a dramatic strategic shift away from the open-weight approach that defined its Llama era. The model — code-named Avocado and built over nine months by a team led by chief AI officer Alexandr Wang — introduces a technique called "thought compression" that lets it rival top competitors while consuming significantly less compute.
Benchmark Performance
Muse Spark scores 52 on the Artificial Analysis Intelligence Index v4.0, a composite benchmark spanning reasoning, knowledge, mathematics, and coding. That places it behind GPT-5.4 and Gemini 3.1 Pro (both at 57) and Claude Opus 4.6 (53) in overall rankings, but the model excels in specific domains.
On CharXiv Reasoning, a benchmark for visual figure understanding, Muse Spark scored 86.4, well ahead of Claude Opus 4.6 (65.3) and narrowly ahead of GPT-5.4 (82.8). On HealthBench Hard, it topped all rivals with a score of 42.8 percent. Its GPQA Diamond score of 89.5 on PhD-level science questions surpassed Grok 4.2 but trailed Opus 4.6 and Gemini 3.1 Pro.
Thought Compression: Doing More With Less
The standout technical innovation is thought compression. During reinforcement learning, the model is penalized for excessive "thinking time," forcing it to solve complex problems with fewer reasoning tokens without sacrificing accuracy. The results are striking: Muse Spark used just 58 million output tokens to complete the Intelligence Index evaluation, compared to 157 million for Claude Opus 4.6 and 120 million for GPT-5.4.
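Meta has not published the training details behind thought compression, but one common way to discourage overlong reasoning in reinforcement learning is to subtract a token-length penalty from the task reward. The sketch below is a hypothetical illustration of that general idea; every function name, budget, and constant is an assumption, not Meta's actual method.

```python
# Hypothetical sketch of a length-penalized RL reward in the spirit of
# "thought compression": a correct answer earns full reward, but reasoning
# tokens beyond a budget are taxed, nudging the policy toward shorter chains.
# All names and constants are illustrative, not from Meta.

def shaped_reward(correct: bool, reasoning_tokens: int,
                  token_budget: int = 2048,
                  penalty_per_token: float = 1e-4) -> float:
    """Task reward minus a linear penalty on tokens over budget."""
    task_reward = 1.0 if correct else 0.0
    overage = max(0, reasoning_tokens - token_budget)
    return task_reward - penalty_per_token * overage

# Two correct rollouts: the shorter one scores higher, so policy updates
# favor solving the problem with fewer reasoning tokens.
short_chain = shaped_reward(correct=True, reasoning_tokens=1500)  # 1.0
long_chain = shaped_reward(correct=True, reasoning_tokens=6048)   # 0.6
```

Because the penalty only applies above the budget, the model is free to think at length on hard problems; it is only punished for spending tokens it did not need, which is consistent with Meta's claim of preserving accuracy while cutting output tokens.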
According to Meta, Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick, a claim that could reshape how the industry thinks about scaling efficiency.
A Proprietary Pivot
Unlike Meta's previous Llama models, which anyone could download and modify under open-weight licenses, Muse Spark is proprietary. The model is rolling out immediately in the Meta AI app and Meta.ai website, with plans to expand across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban Meta AI glasses in the coming weeks. A limited API "private preview" will be offered to select partners.
The model accepts voice, text, and image inputs but produces text-only output. It features a fast mode for casual queries, multiple reasoning modes, and a new "shopping mode" that leverages Meta's creator ecosystem for commerce recommendations.
What It Means for the Industry
Muse Spark's thought compression technique could prove more influential than the model's raw benchmark scores. If confirmed at scale, the ability to achieve frontier-level reasoning at a fraction of the compute cost would pressure competitors to rethink their own scaling strategies.
The proprietary shift also raises questions about Meta's open-source commitments. With Muse Spark locked behind Meta's walled garden, the company that once positioned itself as AI's open-weight champion is now competing on the same closed terms as OpenAI and Google.



