Midjourney, best known for its AI image generation technology, is quietly signaling broader ambitions in artificial intelligence. Following earlier announcements about self-developed computing and AI hardware, the company, which boasts a large user base, has partnered with machine learning researchers at New York University (NYU) to release new research on training large language models (LLMs) for creative writing.


The research focuses on enhancing the creative writing capabilities of LLMs, aiming to make open models such as Meta's Llama and Mistral AI's Mistral generate more creative text.

Beyond Images: Midjourney's Push into Creative Text Generation

For a company known for its diffusion-model image generation technology, Midjourney's foray into text generation is a clear signal: its ambitions extend far beyond visual content. As the researchers suggest, the traditional notion that "a picture is worth a thousand words" may need rewriting, because the creative potential of text deserves deeper exploration. The work underscores the breadth of Midjourney's AI explorations.

Breaking the Mold: Innovative Techniques Enhance AI Writing Diversity

Published on Hugging Face, the AI model- and code-sharing community, the research paper introduces two novel techniques: "Diversified Direct Preference Optimization" (DDPO) and "Diversified Odds Ratio Preference Optimization" (DORPO). Both aim to broaden the range of text an AI model generates, producing richer and more varied content while maintaining coherence and readability.

The researchers point out that while current LLMs excel at tasks with a single best answer, such as factual question answering or code assistance, creative writing is open-ended: a single prompt admits many valid responses. For example, a prompt like "Write a story about a dog on the moon" could inspire a story about an astronaut's lost pet, a future canine space colony, or a stray dog befriending extraterrestrials, all vastly different scenarios.

However, instruction-tuned LLMs often converge on similar storylines and themes. This is largely because post-training techniques prioritize user preference over originality, reinforcing popular but repetitive answers. Instruction tuning can also smooth out variation, leading models to generate "safe" but uninspired responses. Existing diversity-promoting techniques, such as temperature adjustment, typically operate only at inference time rather than being integrated into the model's learning process. The result is homogeneous, predictable AI-generated creative writing that lacks surprise and depth.
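For context, here is a minimal sketch of what such an inference-time adjustment looks like with the Hugging Face transformers library; the model name, prompt, and sampling values are illustrative rather than taken from the paper:

```python
# Minimal sketch: inference-time diversity via temperature sampling.
# Raising the temperature flattens the next-token distribution, trading
# coherence for variety -- the underlying model itself is unchanged.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Write a story about a dog on the moon."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.2,      # values above 1.0 increase sample diversity
    top_p=0.95,
    max_new_tokens=300,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```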

Enabling AI Models to Think Outside the Box

To overcome these limitations, Midjourney's research team improved existing preference optimization methods, introducing DDPO and DORPO. The core innovation lies in leveraging "deviation"—the difference between one response and others—to guide model training.

Specifically, during training, the model receives a writing prompt and multiple possible answers. Each answer is compared to others under the same prompt, calculating a deviation score. Rare but high-quality responses are given higher weight during training, encouraging the model to learn from more diverse examples. By incorporating deviation into Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), the model learns to generate higher-quality and more diverse responses. This ensures AI-generated stories aren't confined to a single predictable structure but explore a wider range of characters, settings, and themes, much like human writers.
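The paper's exact objective is not reproduced here, but a minimal sketch conveys the idea, assuming the per-example DPO loss is scaled by an embedding-based deviation score (the function names and weighting scheme below are illustrative):

```python
# Hedged sketch of deviation-weighted preference optimization ("DDPO").
# Assumption: each preferred response gets a deviation score from its
# embedding distance to sibling responses for the same prompt, and that
# score re-weights the standard DPO loss.
import torch
import torch.nn.functional as F

def deviation_scores(embeddings: torch.Tensor) -> torch.Tensor:
    """Score each response by dissimilarity to the other responses
    for the same prompt. embeddings: (n_responses, dim)."""
    normed = F.normalize(embeddings, dim=-1)
    sims = normed @ normed.T                      # pairwise cosine similarity
    n = sims.size(0)
    mean_sim = (sims.sum(dim=1) - 1.0) / (n - 1)  # exclude self-similarity
    return 1.0 - mean_sim                         # higher = more deviant

def ddpo_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              chosen_deviation, beta=0.1):
    """Standard DPO loss, re-weighted per example so that rare but
    preferred responses contribute more to the gradient."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    per_example = -F.logsigmoid(logits)             # plain DPO term
    return (chosen_deviation * per_example).mean()  # deviation weighting
```

DORPO would apply the same weighting to the ORPO objective instead. The notable design choice is that the diversity pressure enters through the training signal itself, not through a sampling knob at inference.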

To validate these new methods, the researchers trained LLMs on a dataset drawn from the Reddit community r/WritingPrompts. They used Meta's Llama-3.1-8B (an 8-billion-parameter model) and Mistral AI's Mistral-7B-v0.3 (a 7-billion-parameter model) as base models.

The training process involved supervised fine-tuning (SFT) and preference optimization. In the preference optimization stage, they first used standard DPO and ORPO as baselines, then applied DDPO and DORPO to introduce deviation-based weighting. Finally, model performance was evaluated through automatic evaluation (measuring semantic and stylistic diversity) and human evaluation (judging diversity and appeal, comparing against GPT-4 and Claude 3.5).
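As one illustration of what an automatic semantic-diversity measurement could look like, here is a sketch using mean pairwise cosine distance over sentence embeddings; the encoder choice and the metric itself are assumptions for illustration, not the paper's exact evaluation protocol:

```python
# Hedged sketch: semantic diversity as mean pairwise cosine distance
# among stories sampled for a single prompt (higher = more diverse).
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder

def semantic_diversity(stories: list[str]) -> float:
    embs = encoder.encode(stories, normalize_embeddings=True)
    sims = embs @ embs.T                  # pairwise cosine similarities
    n = len(stories)
    # mean over off-diagonal pairs; each diagonal entry is ~1.0
    mean_sim = (sims.sum() - n) / (n * (n - 1))
    return 1.0 - float(mean_sim)

stories = [
    "An astronaut's lost pet wanders the Sea of Tranquility...",
    "In 2147, the first canine colony elects its pack leader...",
    "A stray dog befriends a curious lunar extraterrestrial...",
]
print(f"semantic diversity: {semantic_diversity(stories):.3f}")
```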

Results showed DDPO significantly outperformed standard DPO while maintaining output quality. Llama-3.1-8B with DDPO achieved the best balance between quality and diversity, generating more diverse responses than GPT-4 while maintaining good coherence. Even with reduced dataset size, the DDPO model retained a degree of diversity.

Empowering Industries: Unlimited Potential for AI Creative Content

This research has significant practical implications for businesses using AI for creative text generation. Improving the diversity and quality of AI-generated content is crucial in areas like marketing copywriting, corporate storytelling, and scriptwriting for film and games. For AI teams deploying LLMs, enhancing output diversity without sacrificing quality is a key challenge. Midjourney's research offers a novel solution.

The study presents a new post-training method for LLMs that enhances creativity without compromising quality. It also provides a practical alternative to inference-time diversity adjustments (like adjusting temperature), integrating diversity directly into the model's learning process. This promises more engaging AI applications, such as AI-assisted writing tools and virtual assistants with dynamically adaptable responses.

For professionals responsible for AI model orchestration and automation, this research highlights the importance of tuning models during training, reducing the need for post-deployment adjustments. It also offers a method for introducing adaptive narratives into AI-driven applications, ensuring content variability while maintaining high quality. Furthermore, the method helps make LLM outputs more human-like, crucial for applications requiring interactive narratives, customer interactions, or dynamic content creation.

Conclusion

The success of DDPO and DORPO demonstrates that training LLMs with diversity as a goal can significantly advance creative writing. Future research directions include integrating deviation-based learning methods into enterprise AI models to enhance response diversity in customer-facing applications, exploring these methods in other generative tasks like poetry, scriptwriting, or game storytelling, and developing hybrid training methods that balance diversity and instruction following.

Midjourney's research team plans to publicly release its code, providing invaluable resources for developers wishing to apply these techniques. By adopting these innovations, AI teams can break free from rigid, formulaic output patterns, building AI systems that are not only intelligent but truly imaginative.

Paper: https://huggingface.co/papers/2503.17126