We're already used to AI extracting individual instrument tracks from fully mastered stereo songs quite convincingly. So it shouldn't come as a surprise that it is now possible to split a video into individual layers. Generative Omnimatte promises no less than this.
Generative Omnimatte: AI breaks down each video into separable layers
This novel AI model attempts to split a video clip into semantically meaningful layers that contain individual objects together with their associated effects, such as shadows and reflections. Previous methods required at least a static background or relied on pose or depth estimation to build the layers. They also faced a fundamental problem: dynamically occluded regions between objects or layers could not be completed, because these methods have no knowledge of what lies behind them.
This is set to change with Generative Omnimatte, which uses generative AI to complete missing or occluded content in moving images, an approach described as "completion through generative prior knowledge".
Generative Omnimatte requires no information about camera position or depth and produces complete layers, including mostly convincing completions of occluded dynamic areas.
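The layered representation behind this can be illustrated with standard alpha compositing: if the extracted RGBA layers are complete, recombining them back to front should reconstruct the original frame. The following is a minimal illustrative sketch in Python/NumPy (an assumption for illustration, not the paper's actual implementation, which uses a video diffusion model):

```python
import numpy as np

def composite_over(layers):
    """Recombine omnimatte-style RGBA layers back to front.

    layers: list of (H, W, 4) float arrays in [0, 1], ordered from
    background (first) to foreground (last). The background layer is
    assumed to be fully opaque.
    """
    out = layers[0][..., :3].copy()
    for layer in layers[1:]:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # standard "over" operator: foreground covers background by alpha
        out = alpha * rgb + (1.0 - alpha) * out
    return out

# Toy example: opaque gray background plus a half-transparent white layer.
h, w = 4, 4
bg = np.concatenate([np.full((h, w, 3), 0.2), np.ones((h, w, 1))], axis=-1)
fg = np.concatenate([np.ones((h, w, 3)), np.full((h, w, 1), 0.5)], axis=-1)
frame = composite_over([bg, fg])
print(frame[0, 0])  # 0.5 * 1.0 + 0.5 * 0.2 = 0.6 per channel
```

The hard part that Generative Omnimatte addresses is the inverse direction: recovering plausible per-layer RGBA content, including regions that are hidden in every frame, which is where the generative prior comes in.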
Note that this is not limited to taking apart existing composites; it works with any video recording. The published paper and the dedicated project website with demos show a wide range of casually recorded videos featuring soft shadows, glossy reflections, splashing water and more.
Researchers from Google DeepMind, the University of Maryland, College Park, and the Weizmann Institute of Science are behind Generative Omnimatte. Although such publications "only" represent the current state of research, it is foreseeable that generative de-compositing will soon show up as a usable tool in our editing and compositing programs.