In artificial intelligence, major shifts can happen from one day to the next. Just a day after Midjourney's major update, the open-source image generation sector welcomed a remarkable dark horse: FLUX.1. Its developers claim that it outperforms closed-source models like DALL·E 3 and Midjourney V6 as well as the open SD3 series, a claim that instantly set the AI community abuzz.

Let's first get to know the team behind FLUX.1, Black Forest Labs. Its founder, Robin Rombach, is no small figure but a leading expert in diffusion models. His notable works include VQGAN (introduced in the Taming Transformers paper) and Latent Diffusion. He has served as the Chief Scientist at Stability AI, leading the globally renowned Stable Diffusion series of projects. Robin Rombach can be considered a veteran among veterans in the AI image generation field.

In March of this year, amid internal turmoil at Stability AI, Robin chose to leave. After four months of incubation, he returned with Black Forest Labs and its new open model family, FLUX.1. Even more striking, the company announced $31 million in seed funding, led by the prestigious venture capital firm Andreessen Horowitz, at the same time as the model's debut. This is a strong vote of confidence in FLUX.1's future development.

So, what makes FLUX.1 stand out? First, it is built on a hybrid architecture of multimodal and parallel diffusion transformer blocks, is trained with flow matching, and uses rotary positional embeddings and parallel attention layers to improve model performance and hardware efficiency. The model has 12 billion parameters and is released in three versions:

Pro Edition: Accessible via API, offering the most robust performance.

Dev Edition: An open-weight, guidance-distilled model for non-commercial use, inheriting most of the Pro Edition's capabilities.

Schnell Edition: The fastest of the three, released under the Apache 2.0 license, which permits commercial use, with impressive performance.
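
For readers curious what "flow matching" means in practice, here is a minimal sketch of the training objective in plain Python. It assumes the simplest variant, a straight-line path between a noise sample and a data sample; the article does not say which path FLUX.1 actually uses, and the function names here are hypothetical, chosen only for illustration.

```python
def linear_path(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1), plus its velocity."""
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    # Along a straight path the velocity is constant: data minus noise.
    velocity = [d - n for n, d in zip(noise, data)]
    return x_t, velocity


def flow_matching_loss(model, noise, data, t):
    """Mean squared error between the model's predicted velocity and the target."""
    x_t, target = linear_path(noise, data, t)
    predicted = model(x_t, t)  # model is any callable (x_t, t) -> velocity
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)
```

In a real system the `model` argument would be the 12-billion-parameter network and the lists would be image latents; the idea is the same, namely that the network learns the direction that carries noise toward an image.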

According to the FLUX.1 team's own test data, even the open-source Schnell version surpasses mainstream models like Midjourney v6.0, DALL·E 3 (HD), and SD3-Ultra in prompt adherence, image quality, action consistency, coherence, and diversity. FLUX.1 shows an especially clear advantage in rendering text inside images.

FLUX.1's ambitions clearly do not stop there. The team indicates that text-to-image generation is just the beginning, and they plan to introduce text-to-video models in the future, challenging top products like Sora, Gen-3, and Luma.

For developers and AI enthusiasts, the emergence of FLUX.1 is a significant boon. The Schnell version is fully open-source and is already supported by ComfyUI. If you have more than 36GB of GPU memory, you can even run the fp16 version of the T5 text encoder. Note, however, that the text encoder files (t5xxl_fp16.safetensors and clip_l.safetensors) and the VAE need to be downloaded separately.
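
Since several files have to be fetched separately, a small check before launching can save a failed start. The sketch below lists which of them are still missing. The two text encoder file names come from the paragraph above; the sub-folder names and the VAE file name are assumptions for illustration and should be adjusted to your own installation.

```python
from pathlib import Path

# Sub-folder -> file names expected inside it.
# "clip" and "vae" as folder names, and "ae.safetensors" as the VAE file name,
# are assumptions; only the two text encoder names are taken from the article.
REQUIRED_FILES = {
    "clip": ["t5xxl_fp16.safetensors", "clip_l.safetensors"],
    "vae": ["ae.safetensors"],
}


def find_missing(models_dir, required=REQUIRED_FILES):
    """Return relative paths of required files not found under models_dir."""
    root = Path(models_dir)
    missing = []
    for subfolder, names in required.items():
        for name in names:
            if not (root / subfolder / name).is_file():
                missing.append(f"{subfolder}/{name}")
    return missing
```

Calling `find_missing("ComfyUI/models")` (a hypothetical path) returns an empty list once everything is in place, and otherwise names the files that still need to be downloaded.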

FLUX.1's sudden arrival brings new hope to the open-source AI image generation field and new vitality to the AI industry as a whole. Its strong performance and open-source nature are likely to accelerate the adoption of, and innovation in, AI image generation technology. For ordinary users, this means we may soon be able to run, on home computers, AI image generation models that rival or even surpass Midjourney.

Project Link: https://github.com/black-forest-labs/flux

Try It Out: https://replicate.com/black-forest-labs/flux-pro