Apple researchers have introduced Matryoshka Diffusion Models (MDM), an end-to-end framework that generates high-quality images at resolutions up to 1024×1024 pixels. MDM's central idea is a multi-resolution diffusion process: a nested UNet architecture denoises inputs at several resolutions and is trained with a multi-resolution loss, which substantially speeds up convergence when denoising high-resolution inputs. MDM also uses progressive training, starting at low resolution and gradually adding higher-resolution inputs and outputs, which further improves training efficiency. Even with a relatively small training dataset, MDM generates high-quality, high-resolution images and videos. Compared with cascaded or latent diffusion methods, MDM offers a simpler and more efficient training and inference pipeline.
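To make the multi-resolution loss and the progressive schedule more concrete, here is a minimal Python sketch. It is not Apple's implementation. The function names, the average-pooling downsampler, the per-resolution weights, and the fixed-length phase schedule are all illustrative assumptions, and the loss compares predictions against clean downsampled targets rather than modelling the full noising and denoising process.

```python
from typing import Dict, List, Optional, Sequence

Image = List[List[float]]  # square, single-channel image as nested lists


def downsample(img: Image, factor: int) -> Image:
    """Average-pool a square image by an integer factor (assumed downsampler)."""
    n = len(img) // factor
    return [
        [
            sum(
                img[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / factor ** 2
            for c in range(n)
        ]
        for r in range(n)
    ]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of the same size."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def active_resolutions(
    step: int, resolutions: Sequence[int], phase_steps: int
) -> List[int]:
    """Hypothetical progressive schedule: begin with the lowest resolution
    and enable the next higher one every `phase_steps` training steps."""
    count = min(len(resolutions), 1 + step // phase_steps)
    return list(resolutions[:count])


def multi_resolution_loss(
    predictions: Dict[int, Image],
    target: Image,
    active: Sequence[int],
    weights: Optional[Dict[int, float]] = None,
) -> float:
    """Weighted sum of per-resolution MSE terms.

    `predictions[res]` stands in for the nested network's output at
    resolution `res`; the reference at each scale is the full-resolution
    target downsampled to that scale.
    """
    full = len(target)
    loss = 0.0
    for res in active:
        ref = target if res == full else downsample(target, full // res)
        w = 1.0 if weights is None else weights[res]
        loss += w * mse(predictions[res], ref)
    return loss


# Usage: a 4x4 target supervised at 2x2 and 4x4.
target = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
predictions = {2: downsample(target, 2), 4: target}
for step in (0, 100):
    active = active_resolutions(step, [2, 4], phase_steps=100)
    print(step, active, multi_resolution_loss(predictions, target, active))
```

Early in training only the low-resolution term contributes to the loss, and higher-resolution terms are switched on later. That is the intuition behind progressive training: the cheap low-resolution objective is learned first, and the expensive high-resolution steps build on it.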