Following Kuaishou's Kling and Minimax's Hailuo, ByteDance (the company behind TikTok) is the latest Chinese corporation to introduce a state-of-the-art video AI. The YouTuber and video AI specialist Theoretically Media had the opportunity to take a closer look at demos of the video AI, named Seaweed, and found that it offers some unique new cinematic possibilities. He therefore describes it as a game changer.
Seaweed can generate clips up to 2 minutes (!) long at 1080p resolution using its image-to-video and text-to-video models. These clips can consist of up to four different shots within a scene, with objects remaining consistent even when viewed from various angles. This is well demonstrated in the following example, where a scene is depicted using shot/reverse shot (reaction shots).
Another very cinematic technique that Seaweed masters is the so-called rack focus: a shift of focus within a shot from one object to another, for example from the foreground to the background. The camera's focus moves from one point to another while the framing stays the same, a subtle way of shifting the viewer's attention from a foreground object to one in the background without changing the camera angle or frame.
In fact, there are two new video models: PixelDance and Seaweed, which ByteDance introduced under its Doubao brand—initially only in China. Both are still in a testing phase with limited access. The exact difference between the two models is not yet entirely clear.
Both models are state-of-the-art in terms of consistency, their ability to interpret complex prompts, and their realistic depiction of human movements and interactions. No wonder: as the owner of TikTok, ByteDance has access to extensive training data.