Netflix has released its first public AI model — and it solves one of the hardest remaining problems in video editing: removing objects while preserving physically coherent scene behavior.
More Than Pixel Erasing
VOID (Video Object and Interaction Deletion) does something fundamentally different from conventional inpainting tools. When you remove an object from a video frame, those tools fill the gap with plausible-looking pixels. VOID goes further: it simulates how the remaining objects in the scene would physically behave without the removed item's influence.
Remove a ball from a scene where it's pushing a box? VOID doesn't just erase the ball — it repaints the box as stationary, because without the ball, nothing is pushing it. Remove a hand holding a cup? The cup falls.
The system uses what Netflix calls "interaction-aware quadmask conditioning" — a technique that identifies not just the object to be removed but the causal chain of physical interactions it participates in, then regenerates the affected portions of the video accordingly.
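Netflix's article doesn't spell out the quadmask layout, but the idea of conditioning a generator on both the removed object and its causal interaction region can be illustrated with a toy sketch. Everything here is an assumption for illustration: the four-channel layout, the function name, and the channel semantics (removed object, interacting objects, preserved background, full region to regenerate) are hypothetical, not Netflix's actual design.

```python
import numpy as np

def build_quadmask(remove_mask, interact_masks):
    """Stack four binary channels into one conditioning tensor.

    Hypothetical channel layout, for illustration only:
      0: pixels of the object being removed
      1: pixels of objects that physically interact with it
      2: everything else (background to preserve)
      3: full region the generator must repaint (union of 0 and 1)
    """
    interact = np.zeros_like(remove_mask)
    for m in interact_masks:
        interact |= m
    interact &= ~remove_mask          # interaction channel excludes the object itself
    regen = remove_mask | interact    # region handed to the video generator
    preserve = ~regen                 # untouched background
    return np.stack([remove_mask, interact, preserve, regen], axis=0)

# Toy 1x4 "frame": the ball occupies column 0, the box it pushes columns 1-2.
ball = np.array([[True, False, False, False]])
box = np.array([[False, True, True, False]])
quad = build_quadmask(ball, [box])    # shape (4, 1, 4)
```

The point of the extra channels is that the generator is told not only *where* the ball was, but also that the box's pixels must be re-rendered (now stationary) rather than copied through unchanged.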
Beating Runway by a Wide Margin
In controlled human evaluations, participants preferred VOID's outputs 64.8% of the time, versus 18.4% for Runway, the current commercial benchmark for AI video editing; the remaining 16.8% of judgments rated the two outputs as equivalent.
The gap was especially pronounced in scenes involving complex physical interactions: objects in contact, items casting shadows, or elements that influence fluid dynamics. These are precisely the cases where naive pixel-filling produces uncanny results.
Open Source Under Apache 2.0
VOID is built on Alibaba's CogVideoX-Fun-V1.5-5b-InP foundation model and fine-tuned with Netflix's proprietary interaction-aware training pipeline. The model weights are now available on Hugging Face, with code, paper, and interactive demos on GitHub — all under the Apache 2.0 license.
This is Netflix's first public release on Hugging Face, marking the streaming giant's entrance into the open-source AI model ecosystem. The research team includes contributors from both Netflix and Sofia University.
Implications for Post-Production
For film and television production, VOID addresses a workflow that currently requires expensive manual rotoscoping and VFX compositing. Removing boom mics, safety wires, crew reflections, or unwanted background elements from footage is a routine but time-consuming part of post-production.
A tool that handles physics-aware removal automatically could compress days of VFX work into minutes — and Netflix, which produces more original content than any other studio, has an obvious incentive to make that workflow faster and cheaper.
The model is available now on Hugging Face at netflix/void-model.