
NVIDIA Launches Nemotron 3 Nano Omni: 30B Open Multimodal Model With 9x Throughput Edge

Michael Ouroumis · 2 min read

NVIDIA announced Nemotron 3 Nano Omni, a 30-billion-parameter open-weight multimodal model that unifies vision, audio, video, and text understanding in a single architecture, claiming up to 9x higher throughput than competing open omni models on video and document workloads.

The release dropped to Hugging Face, OpenRouter, and NVIDIA's own build.nvidia.com on April 28, landing in a market where multimodal capability has typically required stitching together separate perception models — a vision encoder, a speech encoder, and a language model — joined through pipeline glue code that adds latency, cost, and brittleness.

A 30B Model That Runs Like a 3B Model

The headline architectural choice is a hybrid mixture-of-experts design NVIDIA describes as "30B-A3B": 30 billion total parameters, but only 3 billion activated per token. Combined with built-in vision and audio encoders, the model reportedly fits in roughly 25 gigabytes of RAM in its 4-bit quantized form — small enough to land on workstations like NVIDIA's DGX Spark and putting single-GPU inference within reach for video and document workloads where competing open omni models typically demand multi-GPU clusters or fall back to CPU-bound pipelines.
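The "30B-A3B" arithmetic is worth making explicit. A back-of-envelope sketch (the 25 GB figure is NVIDIA's; the breakdown below is an illustrative assumption, not a published spec):

```python
# Rough memory estimate for a sparse "30B-A3B" model at 4-bit quantization.
# Assumption: all experts must be resident in memory; sparsity only cuts
# per-token compute, not weight storage.

TOTAL_PARAMS = 30e9    # 30B total parameters (all experts stored)
ACTIVE_PARAMS = 3e9    # ~3B activated per token (compute path only)
BITS_PER_PARAM = 4     # 4-bit quantized weights

weight_gb = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 1e9
print(f"Quantized weights: ~{weight_gb:.0f} GB")          # ~15 GB

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.0%}")  # 10%
```

The quantized weights alone come to roughly 15 GB; the gap up to the reported ~25 GB footprint would plausibly be consumed by the built-in vision and audio encoders, the KV cache, and runtime overhead, though NVIDIA has not published that breakdown.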

NVIDIA reports that Nemotron 3 Nano Omni tops six leaderboards across complex document intelligence, video understanding, and audio understanding tasks. The company is positioning the model squarely at agentic workloads — applications where an AI agent must "see and hear" inputs in real time, parse them, and act without round-tripping through external services.

Built for Edge Agents

This release is the latest move in NVIDIA's strategy of pushing capable open models down to the edge to seed demand for its inference silicon. Nemotron 3 Nano Omni ships as an NVIDIA NIM microservice and through the company's cloud-partner network, making it deployable both on local boxes and across hyperscaler clouds with consistent tooling.

Early adopters named in NVIDIA's launch include Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir, and Pyler. A second tier — Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle, and Zefr — is currently evaluating the model. The list spans healthcare, defense-adjacent analytics, manufacturing, and enterprise software, suggesting NVIDIA wants Nemotron entrenched in regulated and on-prem-heavy verticals where open weights are a hard requirement.

Pressure on the Open-Omni Race

The release ratchets up pressure on open-weight rivals. Mistral, Alibaba's Qwen team, and Meta's Llama family have all released multimodal variants over the past year, but few combine sparse-activation MoE with native audio plus video plus text in a single stack at this scale. If NVIDIA's 9x throughput claims hold up under independent benchmarking, Nemotron 3 Nano Omni could quickly become the default starting point for teams building agentic systems that need to perceive the world without a stitched pipeline.

For developers, the practical takeaway is that a single 25GB model now plausibly covers what previously required three. That is a meaningful shift for anyone running on-prem inference or building edge-deployed agents — the segment NVIDIA is most determined to lock in around its own hardware.

