A recent study from the University of California, Berkeley, has found that automatic prompt rewriting by large language models (LLMs) substantially undermines the image-quality gains of DALL-E3. The researchers ran an online experiment with 1,891 participants to measure how such automatic rewriting affects the images users are able to produce.
Participants were randomly assigned to one of three groups: DALL-E2, DALL-E3, or DALL-E3 with automatic prompt revision. Each participant wrote ten consecutive prompts, trying to reproduce a target image as closely as possible. DALL-E3 outperformed DALL-E2, producing images that matched the target significantly better. When prompts were automatically revised, however, DALL-E3's advantage over DALL-E2 shrank by nearly 58%. Users of DALL-E3 with prompt rewriting still outperformed those using DALL-E2, but by a much smaller margin.
The researchers traced the performance gap between DALL-E3 and DALL-E2 to two factors: DALL-E3's greater technical capabilities, and users adapting their prompting strategies to those capabilities. DALL-E3 users wrote longer, more descriptive prompts that were also more semantically similar to one another. Because participants did not know which model they were using, this shift reflects adaptation to how the model responded rather than prior knowledge of it.
The researchers expect that as models continue to advance, users will keep adjusting their prompts to make better use of each new model's capabilities. In other words, newer models do not make prompting obsolete; prompts remain a key way for users to unlock a new model's potential.
The study is a reminder that automated tools do not always improve user performance and can instead limit users' ability to fully exploit a model's potential. When using AI tools, users should therefore think about how to adjust their own prompts to get better image generation results.
Key Points:
🖼️ Automatic prompt revisions cut DALL-E3's advantage over DALL-E2 by about 58%, limiting user performance.
🤖 DALL-E3 outperformed DALL-E2 in the experiment, but the advantage shrank when prompts were automatically revised.
🔍 Users need to adapt their prompting strategies as models advance in order to fully harness the potential of new models.