Meta has unveiled two new AI-powered tools, Emu Video and Emu Edit, entering an increasingly crowded field of generative AI products.
Expanding on the company's previous work in image generation, these tools mark a significant step into video creation and precise image editing.
"Today we’re sharing two new advances in our generative AI research: Emu Video & Emu Edit.
Details ➡️ https://t.co/qm8aejgNtd
These new models deliver exciting results in high quality, diffusion-based text-to-video generation & controlled image editing w/ text instructions."
AI at Meta (@AIatMeta), November 16, 2023
Emu Video: Turning Text into Moving Pictures
Emu Video, an extension of Meta's Emu model, brings a fresh approach to turning text into videos using diffusion models.
Its process involves two steps: first generating an image from a text prompt, then generating a video conditioned on both the text and that image.
This streamlined approach, using only two diffusion models, allows users to create impressive 512x512 four-second videos at 16 frames per second (fps).
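The two-step flow and the quoted figures can be sketched in a few lines of Python. This is only an illustration of the structure described above, not Meta's implementation: the function names `text_to_image` and `image_and_text_to_video` are hypothetical stand-ins for the two diffusion models, and the placeholder frames are simple copies of the first image.

```python
from dataclasses import dataclass

# Figures quoted in the article.
WIDTH, HEIGHT = 512, 512
DURATION_SECONDS = 4
FPS = 16


@dataclass
class Image:
    width: int
    height: int
    prompt: str


@dataclass
class Video:
    frames: list
    fps: int


def text_to_image(prompt: str) -> Image:
    # Hypothetical stand-in for the first diffusion model (text -> image).
    return Image(WIDTH, HEIGHT, prompt)


def image_and_text_to_video(image: Image, prompt: str) -> Video:
    # Hypothetical stand-in for the second diffusion model, which is
    # conditioned on both the prompt and the generated image.
    # Placeholder frames: a real model would produce distinct frames.
    n_frames = DURATION_SECONDS * FPS
    return Video(frames=[image] * n_frames, fps=FPS)


def generate_video(prompt: str) -> Video:
    # Step 1: text -> image. Step 2: (text, image) -> video.
    image = text_to_image(prompt)
    return image_and_text_to_video(image, prompt)


video = generate_video("a corgi surfing a wave")
print(len(video.frames))  # 64 frames: 4 seconds at 16 fps
```

A four-second clip at 16 fps works out to 64 frames, each 512x512 pixels.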
Compared to other AI-powered video tools, Emu Video stands out for its efficiency, surpassing previous models with a simpler design.
According to Meta, human evaluators preferred Emu Video's output over that of a leading competitor 96% of the time.
Additionally, Emu Video excels at animating user-provided images, showcasing the tech giant's commitment to advancing user-friendly technology.
Emu Edit: Precision in Image Editing
Complementing Emu Video, Emu Edit focuses on precise, instruction-driven image editing.
The tool allows free-form editing through text instructions, covering tasks such as local and global editing, background changes and color transformations.
Unlike other models, Emu Edit excels at precisely following instructions, ensuring only relevant pixels are modified.
"The primary objective shouldn't just be about producing a 'believable' image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request," the researchers explained.
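To make the "only the relevant pixels" idea concrete, here is a toy check in Python. It is not Meta's evaluation method: the helper names `changed_pixels` and `edit_is_local`, the tiny 3x3 "images" and the target region are all assumptions made up for illustration.

```python
def changed_pixels(before, after):
    """Return the set of (row, col) positions whose values differ."""
    return {
        (r, c)
        for r, row in enumerate(before)
        for c, value in enumerate(row)
        if value != after[r][c]
    }


def edit_is_local(before, after, allowed_region):
    """True if every changed pixel lies inside the region the edit targets."""
    return changed_pixels(before, after) <= allowed_region


# Toy 3x3 grayscale images; the edit is meant to touch the lower-right 2x2 block.
before = [[0, 0, 0], [0, 5, 5], [0, 5, 5]]
precise = [[0, 0, 0], [0, 9, 9], [0, 9, 9]]
sloppy = [[1, 0, 0], [0, 9, 9], [0, 9, 9]]  # also alters the top-left pixel
region = {(1, 1), (1, 2), (2, 1), (2, 2)}

print(edit_is_local(before, precise, region))  # True
print(edit_is_local(before, sloppy, region))   # False
```

An edit can look believable and still fail this kind of check if it alters pixels outside the requested region, which is the distinction the researchers draw.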
The innovation lies in incorporating computer vision tasks as instructions to the model, which Meta says offers greater control over image editing than earlier approaches.
Meta attributes the model's performance in part to its large training dataset. Trained on 10 million synthesized samples, the tool is said to set new standards both in following prompts and in image quality.
Meta's Future of Self-Expression
While Meta emphasizes that these tools are still at the fundamental research stage, both show considerable potential against competing tools.
From creating personalized animated stickers to effortlessly editing photos without technical skills, Meta aims to empower users for novel self-expression with Emu Video and Emu Edit.
Though they won't replace professional artists, the Facebook maker envisions these tools as drivers of new forms of self-expression.