Google has unveiled Whisk, a new image generation tool that lets users create original images from image prompts: reference images for the subject, the scene, and the style. Whisk is a Google Labs experiment built around a fast, fun creative process, and it changes how generative AI tools are used by making images, rather than text alone, the prompt.
Objective
Google says in a recent blog post, “We built it for rapid visual exploration, not pixel-perfect edits. It’s about exploring ideas in new and creative ways, allowing you to work through dozens of options and download the ones you love.”
Input
As input, the tool accepts image prompts for three things: subject, scene, and style. From these three images, it works out what the subject is, what environment the subject should appear in, and what visual approach the generated image should follow.
Meet Whisk! Our new experiment that lets you use images as prompts to visualize your ideas and tell your story. Try it now: https://t.co/BR1z7gmDs6 pic.twitter.com/2zrPLQZlga
— labs.google (@labsdotgoogle) December 16, 2024
How does it work?
Whisk takes the three image prompts, along with any text the user adds to describe the desired output, converts them into a text prompt, and passes that prompt to Google’s latest image generation model, Imagen 3. Because the image model works from a description of the subject rather than from the image itself, it captures the essence of the subject instead of producing an exact replica. This gives users more control over how the subject appears in the final image, which blends all of the prompts.
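The prompt-merging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Google has not published Whisk's internals in this piece, so the `ImagePrompts` class, the `build_text_prompt` function, and the prompt layout are all hypothetical. The sketch also assumes each reference image has already been turned into a text description, and it stops before the call to the image model, since no API is described here.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompts:
    """Text descriptions of the three reference images.

    Hypothetical structure: assumes an earlier step has already
    converted each image into a short description.
    """
    subject: str
    scene: str
    style: str


def build_text_prompt(prompts: ImagePrompts, user_text: str = "") -> str:
    """Merge the three descriptions and optional user text into one prompt.

    The layout (one labelled line per input) is an assumption made for
    readability, not the format Whisk actually uses.
    """
    lines = [
        f"Subject: {prompts.subject.strip()}",
        f"Scene: {prompts.scene.strip()}",
        f"Style: {prompts.style.strip()}",
    ]
    # User text is optional, so only add it when something was typed.
    if user_text.strip():
        lines.append(f"Additional instructions: {user_text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    prompts = ImagePrompts(
        subject="a plush toy fox",
        scene="a snowy forest at dusk",
        style="soft watercolor illustration",
    )
    # The merged prompt is what would be handed to the image model.
    print(build_text_prompt(prompts, "make the fox wave at the viewer"))
```

The point of the sketch is that the image model only ever sees text, which is why the output reflects the idea of the subject rather than a copy of the reference image.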
Challenges
The generated subject may differ from the reference in height, weight, hairstyle, or skin tone, and the scene, environment, or style may also deviate from the prompts. This comes down to the current limitations of generative AI. Google tries to address the problem by letting users add or edit text prompts whenever they need to steer the output.
Availability
Whisk is currently available only to users in the US. At the time of publication, Google had not announced plans for a wider release.