A team of machine learning engineers from Facebook parent company Meta has unveiled a new system called Make-A-Video. As the name suggests, this AI model allows users to type in a rough description of a scene, and it will generate a short video that matches their text. The videos are clearly artificial, with blurred subjects and distorted animations, but still represent a major development in the field of AI content generation.
“Generative AI research is advancing creative expression, giving people the tools to create new content quickly and easily,” Meta said in a blog post announcing the work. “With just a few words or lines of text, Make-A-Video brings imagination to life, creating one-of-a-kind videos full of vivid colors and landscapes.”
In a Facebook post, Meta CEO Mark Zuckerberg described the work as “amazing progress,” adding: “Generating a video is much harder than generating a photo, because in addition to correctly generating each pixel, the system must also predict how they will change over time.”
The clips are no longer than five seconds, contain no audio, and span a wide range of prompts. The best way to judge the model’s performance is to watch its output. Each of the videos below was generated by Make-A-Video, and the prompt used to generate it is indicated. It’s worth noting, however, that each video was provided to The Verge by Meta, which currently doesn’t allow outside access to the model. This means the clips may have been cherry-picked to show the system at its best.