
MultiFoley - SFX generation via AI with multimodal control

[17:21 Fri, 29 November 2024 by blip]

To ensure that AI-generated videos do not remain silent, there are already several approaches to artificial (post-)dubbing: as reported, the Google DeepMind team is working on a video-to-audio system to complement its video AI Veo, and AI-generated sound effects are available from ElevenLabs, for example.



Now another model for video-controlled sound generation has been introduced, and it promises some powerful capabilities. MultiFoley takes a multimodal approach and is designed to accept text, audio and video as input. The desired Foley sound for a clip can therefore be generated “from nothing” using a text prompt, or an audio sample, for example from a sound effects library, can be supplied as a reference whose characteristics (e.g. rhythm and timbre) are to be adopted. If a video with partially existing sound is supplied, MultiFoley continues the soundtrack accordingly.





Natural sounds can be generated (e.g. skateboard wheels rolling on a surface) as well as more bizarre audio sequences (e.g. the roar of a lion that sounds like the meowing of a cat), in each case synchronized with what happens on screen. Negative prompting also makes it possible to exclude unwanted audio elements.
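The article does not say how MultiFoley implements negative prompting. As a rough illustration only, the following sketch shows one common way negative prompts are handled in diffusion models in general: during classifier-free guidance, the noise prediction for the negative prompt replaces the unconditional prediction as the baseline, so each denoising step is pushed towards the desired prompt and away from the unwanted one. The function name `guided_noise`, its parameters and the toy numbers are assumptions made for this example and do not come from the MultiFoley project.

```python
def guided_noise(eps_uncond, eps_pos, eps_neg=None, scale=3.0):
    """Combine noise predictions for a single denoising step.

    Hypothetical helper, not MultiFoley code: the inputs stand in for the
    noise a diffusion model would predict without a prompt (eps_uncond),
    with the desired prompt (eps_pos) and with the negative prompt (eps_neg).
    """
    # If a negative prompt is given, its prediction becomes the baseline
    # that the guidance moves away from; otherwise use the unconditional one.
    base = eps_uncond if eps_neg is None else eps_neg
    # Classifier-free guidance: start at the baseline and step towards the
    # prediction for the desired prompt, amplified by the guidance scale.
    return [b + scale * (p - b) for b, p in zip(base, eps_pos)]


# Toy values instead of real model outputs.
print(guided_noise([0.0, 0.0], [1.0, 2.0], scale=2.0))
# [2.0, 4.0]
print(guided_noise([0.0, 0.0], [1.0, 2.0], eps_neg=[1.0, 0.0], scale=2.0))
# [1.0, 4.0]
```

In the second call, the first component is left unamplified because the desired and the unwanted prompt agree there, while the second component, where they differ, is still pushed towards the desired prompt.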



MultiFoley is based on diffusion models and is currently trained on two different datasets: VGGSound, with 168K samples, for video-text-sound generation, and Sound Ideas, with 400K samples, for text-sound generation. The approach combines language with video cues and decouples the semantic and temporal elements of videos. This enables creative Foley applications, such as modifying a birdsong video to sound like a human voice, or converting a typewriter sound into piano notes - all while keeping the audio synchronized with the video.



According to the developers, a key innovation of the model is that it can be trained on both internet video data with low-quality sound and professional SFX recordings, which enables high-quality sound generation with full bandwidth (48 kHz). MultiFoley is said to outperform existing methods, producing well-synchronized, high-quality sound. However, the aim does not appear to be to generate music or dialog (as with Google's video-to-audio system) - the name says it all.

MultiFoley is a joint project between researchers at the University of Michigan and Adobe. We should therefore not be surprised if similar functionality appears in the Firefly video generator sooner or later; the model is currently not publicly accessible.




last update: 26 December 2024 - 18:02 - slashCAM is a project by channelunit GmbH