Stable Diffusion 3 (SD3) has been released. It uses the same DiT architecture as Sora and delivers significant improvements in quality. The authors claim that Stable Diffusion 3 outperforms other text-to-image generation systems; the model family ranges from 800M to 8B parameters. The DiT architecture was originally proposed by a core Sora developer together with an assistant professor at New York University, and SD3 builds on it with the MMDiT architecture, which the authors report surpasses UViT and DiT. Stable Diffusion 3 adopts the Rectified Flow (RF) formulation, and the reweighted RF variant proposed by the authors further improves performance. The work also studies a flexible text encoder configuration and compares the model's performance against other models.
Stable Diffusion 3 Released: Architecture Details Revealed. Does It Help in Reproducing Sora?

机器之心
This article is from AIbase Daily