The Stable Diffusion 3 paper has been released. It describes a model built on DiT (Diffusion Transformer), the same family of architecture used by Sora, with significant gains in image quality. The authors claim that Stable Diffusion 3 outperforms other state-of-the-art text-to-image systems, and the model family ranges from 800M to 8B parameters. DiT itself came out of a collaboration between a core Sora developer and an assistant professor at New York University. SD3 uses a variant called MMDiT, which outperforms both UViT and the original DiT in the authors' experiments. SD3 also adopts the Rectified Flow (RF) formulation, and a reweighted RF variant proposed by the authors improves performance further. The paper reports extensive experiments, including a flexible text-encoder setup that improves results, and compares SD3's performance against other models.
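To make "Rectified Flow" and "reweighted RF" more concrete, here is a minimal NumPy sketch of the training objective as we understand it from the SD3 paper. We assume the convention z_t = (1 − t)·x0 + t·ε with velocity target ε − x0. We also assume the reweighting takes the form of logit-normal timestep sampling (t = sigmoid(u), u ~ N(m, s²)), the variant the paper reports as working best, with m = 0 and s = 1 as defaults. All function names here are hypothetical, and `model` stands in for any callable that maps (z_t, t) to a predicted velocity. This is an illustration, not Stability AI's implementation.

```python
import numpy as np

def logit_normal_timesteps(n, m=0.0, s=1.0, rng=None):
    """Hypothetical helper: sample t in (0, 1) as sigmoid(u), u ~ N(m, s^2).
    Compared with uniform sampling, this puts more training weight on
    intermediate noise levels (the assumed 'reweighting')."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(loc=m, scale=s, size=n)
    return 1.0 / (1.0 + np.exp(-u))

def rectified_flow_pair(x0, eps, t):
    """Straight-line path from data x0 (t=0) to noise eps (t=1).
    Returns the noisy sample z_t and the constant velocity target eps - x0."""
    # Reshape t from (batch,) so it broadcasts over all non-batch dimensions.
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))
    z_t = (1.0 - t) * x0 + t * eps
    v_target = eps - x0
    return z_t, v_target

def rf_loss(model, x0, rng):
    """One Monte Carlo estimate of the velocity-matching (MSE) loss,
    using logit-normal timesteps. `model` is any callable (z_t, t) -> v."""
    eps = rng.standard_normal(x0.shape)
    t = logit_normal_timesteps(x0.shape[0], rng=rng)
    z_t, v_target = rectified_flow_pair(x0, eps, t)
    v_pred = model(z_t, t)
    return np.mean((v_pred - v_target) ** 2)
```

Replacing the logit-normal sampler with `rng.uniform(size=n)` gives plain rectified flow. In this sketch the two variants differ only in how t is drawn, and the interpolation and loss stay the same.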
Stable Diffusion 3 Paper Released with Architecture Details: Will It Help Reproduce Sora?
Synced (机器之心)
© Copyright AIbase Base 2024, Click to View Source - https://www.aibase.com/news/6402