Most AI image generators are driven by text prompts. Google's Whisk takes a visually driven approach instead: users supply images as inputs to generate and remix creative ideas, which gives creators a more intuitive way to produce images.

Introduction to Whisk

Whisk is a generative AI tool from Google Labs. Using the Gemini and Imagen 3 models, it generates new images from user-uploaded images that represent a subject, a scene, and a style. The tool is built for creative exploration: it helps users quickly generate and iterate on ideas and is not meant for precise image editing. Whisk is currently available only to users in the United States (a US IP address is required), who can try it and send feedback at labs.google/whisk.

Highlights of Whisk Features

  • Image-driven Generation: Instead of writing text prompts, users upload images to define the subject, scene, and style, which is easier for people who are not practiced at writing prompts. For example, a user can upload an image of a cat as the subject, a lotus leaf as the scene, and an image with shiny elements as the style to generate a unique image.
  • Automatically Generated Detailed Captions: The Gemini model automatically writes a detailed caption for each uploaded image. These captions are then fed to the Imagen 3 model, so that the new image reflects the key features of the inputs and aligns with the user's intent.
  • Creative Remixing: It allows users to remix different subject, scene, and style images to create unique designs, such as digital dolls, enamel pins, and various creative products.
  • Essence Capture Rather than Copying: It captures the essential characteristics of the input images rather than making precise copies, which allows for more creative variations in the generated images, but may also lead to results that are not completely in line with user expectations.
  • Editable Prompts: Users can view and edit the underlying prompt information to adjust and optimize the generated images according to their needs, such as modifying colors, patterns, and other features.
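
The caption-then-generate flow described in the list above can be illustrated with a small sketch. This is not Whisk's implementation and calls no Google API: the `Captions` class, `compose_prompt`, `edit_prompt`, and the example caption strings are all hypothetical, chosen only to show how three image captions could be merged into one editable text prompt before being handed to an image model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Captions:
    """Text descriptions of the three input images (hypothetical structure)."""
    subject: str
    scene: str
    style: str


def compose_prompt(captions: Captions) -> str:
    """Merge the three captions into a single text prompt for an image model."""
    return (
        f"{captions.subject}, placed in {captions.scene}, "
        f"rendered in the style of {captions.style}"
    )


def edit_prompt(prompt: str, old: str, new: str) -> str:
    """Apply a user edit to the prompt, for example changing a color."""
    if old not in prompt:
        raise ValueError(f"{old!r} does not appear in the prompt")
    return prompt.replace(old, new)


# Made-up captions of the kind a captioning model might write for the
# cat / lotus leaf / shiny-style example above.
captions = Captions(
    subject="a fluffy orange cat",
    scene="a pond covered with lotus leaves",
    style="a glossy sticker with shiny highlights",
)
prompt = compose_prompt(captions)
edited = edit_prompt(prompt, "orange", "gray")
print(prompt)
print(edited)
```

Because only the captions reach the image model in this sketch, details that the captions leave out cannot appear in the output, which is consistent with the "essence capture rather than copying" behavior noted above.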

Applicable Scenarios

  1. Creative Design: Designers can quickly explore different design directions using Whisk by uploading various related images to generate creative inspiration, such as designing a unique appearance for a new product.
  2. Art Creation: Artists can use Whisk for early concept development, merging and experimenting with different elements through image inputs. For example, uploading images of fantasy creatures and scenes can spark ideas for a fantasy-themed painting.
  3. Personalized Product Customization: For custom products such as badges and stickers, users can upload images that represent their preferred subject, scene, and style, and Whisk quickly generates a range of unique design options.
  4. Advertising and Marketing: Advertising planners can upload a product image as the subject, together with scene and style images that match the brand, to quickly produce appealing visuals for online and offline promotion.
  5. Education: In education, teachers can use Whisk to assist in teaching. For example, in art classes, students can upload images of things they are interested in to inspire creativity and cultivate their imagination.

Whisk Usage Tutorial

  1. Access the Tool: Users with a US IP can visit labs.google/whisk to enter the Whisk tool page.
  2. Upload Images: Upload images representing the subjects, scenes, and styles needed for the generated image. If suitable images are not available, users can click the dice icon to get some suggested images (which may also be AI-generated).
  3. Generate Images: After uploading images, Whisk will automatically generate new images and corresponding text prompts based on these images.
  4. Review and Adjust: Review the generated images. If the results are unsatisfactory, edit the prompt in the text box, or click an image to edit the text prompt associated with it.
  5. Download or Save: If users are satisfied with the generated images, they can download and save them or add them to their favorites for future use.

Conclusion

Whisk offers a new creative experience through image-based input and creative remixing. It has potential applications in fields such as creative design, art creation, and personalized product customization. Although it is currently available only to US users and its results do not always match expectations, the visually driven direction it represents for AI image generation is worth watching.

If you are interested in creativity and AI image generation, feel free to like and comment, and keep following Whisk's development. We look forward to the surprises and possibilities it may bring.