Apple researchers recently introduced Matryoshka Diffusion Models (MDM), a diffusion model that generates high-quality images at 1024×1024 resolution end to end. MDM's key idea is a multi-resolution diffusion process: a nested UNet architecture denoises inputs at several resolutions jointly and is trained with a multi-resolution loss, which speeds up convergence when denoising high-resolution inputs. MDM also uses progressive training, starting at low resolution and gradually adding higher-resolution inputs and outputs, which improves training efficiency. Although it was trained on a relatively small dataset, MDM generates high-quality, high-resolution images and videos. Compared with cascaded or latent diffusion methods, MDM's training and inference pipelines are simpler and more efficient.
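To make the two training ideas above more concrete, here is a minimal toy sketch in plain Python. It is not Apple's implementation: the function names (`downsample`, `multi_resolution_loss`, `active_resolutions`), the average-pool downsampling, the equal default weights, and the step-based schedule are all assumptions chosen for illustration. The sketch compares a prediction at each resolution against a matching downsampled copy of the target and sums the errors, and it shows one way a progressive schedule might switch on higher resolutions over time.

```python
def downsample(img, factor):
    """Average-pool a square 2D image (list of lists) by an integer factor."""
    n = len(img) // factor
    return [
        [
            sum(img[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(n)
        ]
        for i in range(n)
    ]


def mse(a, b):
    """Mean squared error between two 2D images of the same shape."""
    sq = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(sq) / len(sq)


def multi_resolution_loss(predictions, target, weights=None):
    """Sum per-resolution errors (hypothetical weighting, equal by default).

    predictions: dict mapping resolution -> predicted image at that resolution
    target:      full-resolution reference image
    """
    full = len(target)
    weights = weights or {res: 1.0 for res in predictions}
    total = 0.0
    for res, pred in predictions.items():
        # Compare each prediction with the target at the same resolution.
        ref = target if res == full else downsample(target, full // res)
        total += weights[res] * mse(pred, ref)
    return total


def active_resolutions(step, schedule):
    """Progressive training: resolutions whose start step has been reached.

    schedule: list of (start_step, resolution) pairs (assumed values).
    """
    return [res for start, res in schedule if step >= start]


# Usage: a 4x4 target with predictions at 2x2 and 4x4.
target = [[1, 1, 3, 3], [1, 1, 3, 3], [5, 5, 7, 7], [5, 5, 7, 7]]
perfect = {2: downsample(target, 2), 4: target}
print(multi_resolution_loss(perfect, target))

schedule = [(0, 64), (1000, 256), (2000, 1024)]
print(active_resolutions(1500, schedule))
```

In a real diffusion model the per-resolution terms would be denoising losses on noised inputs rather than direct image comparisons; the sketch only illustrates how losses at nested resolutions combine and how resolutions can be phased in.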
Apple Unveils MDM, a Large Model for High-Resolution Image Generation

机器之心
This article is from AIbase Daily