[17:53, Thu, 6 February 2025, by Thomas Richter]
Researchers from ByteDance, the company behind TikTok, have unveiled OmniHuman-1, an AI model that generates realistic videos from a single photo of a person, automatically animating the subject in a lifelike way. The animation covers speaking, gesturing, singing, playing instruments, and more. OmniHuman supports not only humans but also anime characters, artificial objects, animals, and complex poses, respecting the typical characteristics of the source style in the animation.

![OmniHuman-1 animates Einstein]()
OmniHuman takes a single image (a portrait, half-body, or full-body shot) of any aspect ratio and animates it automatically from an audio recording of speech or singing (alternatively, a video or a combination of audio and video), matching the input with appropriate gestures, mouth movements, and facial expressions.

![OmniHuman-1 model]()

Lighting, including reflections and shadows, as well as textures are adjusted automatically. Notably, OmniHuman animates mouth movements for various languages, including culturally appropriate accompanying gestures.

The official demo clips range from animated TED talks and singing performances with a guitar (although the guitar playing is not accurately animated) to cartoon characters; the animated humans look very realistic, including their gestures.
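OmniHuman-1 has no public API at the time of writing, so the following is a purely hypothetical sketch of what an inference interface reflecting the described inputs might look like. Every name here (`DrivingSignal`, `animate_person`, the file paths) is an illustrative assumption, not ByteDance's actual interface; it only mirrors the input/output combinations the article describes.

```python
# Hypothetical sketch only: OmniHuman-1 is not publicly released, so all
# names below are illustrative assumptions, not a real API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrivingSignal:
    """Driving condition: audio, video, or a combination of both."""
    audio_path: Optional[str] = None  # speech or singing recording
    video_path: Optional[str] = None  # optional video driving signal


def animate_person(image_path: str, signal: DrivingSignal,
                   output_path: str = "out.mp4") -> str:
    """Hypothetical inference call: a single reference photo (portrait,
    half-body, or full-body, any aspect ratio) plus a driving signal
    yields a video with matched lip movements, gestures, and expressions.
    """
    if signal.audio_path is None and signal.video_path is None:
        raise ValueError("at least one driving signal is required")
    # ... model inference would happen here in a real implementation ...
    return output_path


# Example: animate a portrait photo to match a singing recording.
result = animate_person("einstein.jpg", DrivingSignal(audio_path="song.wav"))
print(result)
```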