NVIDIA announced Nemotron 3 Nano Omni, a 30-billion-parameter open-weight multimodal model that unifies vision, audio, video, and text understanding in a single architecture, claiming up to 9x higher throughput than competing open omni models on video and document workloads.
The release arrived on Hugging Face, OpenRouter, and NVIDIA's own build.nvidia.com on April 28, landing in a market where multimodal capability has typically required stitching together separate perception models (a vision encoder, a speech encoder, and a language model) joined through pipeline glue code that adds latency, cost, and brittleness.
A 30B Model That Runs Like a 3B Model
The headline architectural choice is a hybrid mixture-of-experts design NVIDIA describes as "30B-A3B": 30 billion total parameters, of which only 3 billion are activated per token. Combined with built-in vision and audio encoders, the model reportedly fits in roughly 25 gigabytes of RAM in its 4-bit quantized form. That is small enough to run on workstations such as NVIDIA's DGX Spark, putting single-GPU inference within reach for video and document workloads where competing open omni models typically demand multi-GPU clusters or fall back to CPU-bound pipelines.
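The sparse-activation idea behind a "30B-A3B" layout can be sketched in a few lines: a router scores experts per token, and only the top-k highest-scoring experts actually run, so per-token compute scales with active parameters rather than total parameters. The expert count, router, and top-k value below are illustrative toys, not NVIDIA's actual configuration.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_weights, top_k=2):
    """Route one token through only the top_k highest-scoring experts.

    experts: list of callables (stand-ins for expert FFNs)
    router_weights: one scalar weight per expert (toy linear router)
    """
    scores = [w * token for w in router_weights]  # toy router logits
    probs = softmax(scores)
    # Pick the top_k experts; all others are skipped entirely.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Weighted combination of only the active experts' outputs.
    out = sum(probs[i] / norm * experts[i](token) for i in top)
    return out, top

# Eight toy "experts", each just scaling its input differently.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router = [random.uniform(-1, 1) for _ in experts]

out, active = moe_forward(3.0, experts, router, top_k=2)
print(f"active experts: {active}, output: {out:.3f}")
# Only 2 of 8 equal-sized experts ran for this token -- the same
# principle by which 3B of 30B parameters are active per token.
```

The efficiency claim follows directly: latency and FLOPs track the active experts, while total parameter count sets the (quantized) memory footprint, which is why a 30B-A3B model can run like a much smaller dense model while still needing roughly 25 GB of RAM at 4-bit.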
NVIDIA reports that Nemotron 3 Nano Omni tops six leaderboards across complex document intelligence, video understanding, and audio understanding tasks. The company is positioning the model squarely at agentic workloads — applications where an AI agent must "see and hear" inputs in real time, parse them, and act without round-tripping through external services.
Built for Edge Agents
This release is the latest move in NVIDIA's strategy of pushing capable open models down to the edge to seed demand for its inference silicon. Nemotron 3 Nano Omni ships as an NVIDIA NIM microservice and through the company's cloud-partner network, making it deployable both on local boxes and across hyperscaler clouds with consistent tooling.
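NIM microservices generally expose an OpenAI-compatible chat endpoint, so invoking a multimodal model typically reduces to posting a JSON chat payload. The endpoint URL, model identifier, and multimodal content schema below are assumptions for illustration, not confirmed details of this release; check the model card for the real values.

```python
import json

# Hypothetical local NIM endpoint and model ID (assumptions, not
# confirmed for this release).
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "nvidia/nemotron-3-nano-omni"

def build_request(question, image_url=None):
    """Assemble an OpenAI-style chat payload with optional image content."""
    content = [{"type": "text", "text": question}]
    if image_url:
        # "image_url" content parts are the usual way OpenAI-compatible
        # APIs accept vision input alongside text.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 256,
    }

payload = build_request(
    "Summarize the chart on this slide.",
    image_url="https://example.com/slide.png",
)
print(json.dumps(payload, indent=2))
# POST this body to ENDPOINT with any HTTP client; the same payload
# works against a hosted deployment by swapping the base URL.
```

The same request shape working against a workstation NIM and a hyperscaler deployment is the "consistent tooling" point: only the base URL changes.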
Early adopters named in NVIDIA's launch include Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir, and Pyler. A second tier — Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle, and Zefr — is currently evaluating the model. The list spans healthcare, defense-adjacent analytics, manufacturing, and enterprise software, suggesting NVIDIA wants Nemotron entrenched in regulated and on-prem-heavy verticals where open weights are a hard requirement.
Pressure on the Open-Omni Race
The release ratchets up pressure on open-weight rivals. Mistral, Alibaba's Qwen team, and Meta's Llama family have all released multimodal variants over the past year, but few combine sparse-activation MoE with native audio, video, and text in a single stack at this scale. If NVIDIA's 9x throughput claims hold up under independent benchmarking, Nemotron 3 Nano Omni could quickly become the default starting point for teams building agentic systems that need to perceive the world without a stitched pipeline.
For developers, the practical takeaway is that a single 25GB model now plausibly covers what previously required three. That is a meaningful shift for anyone running on-prem inference or building edge-deployed agents — the segment NVIDIA is most determined to lock in around its own hardware.