DiffRhythm is a music generation model that uses latent diffusion to achieve fast, high-quality full-song generation. It overcomes the limitations of traditional music generation methods, eliminating the need for complex multi-stage architectures and cumbersome data preparation: given only lyrics and a style prompt, it can generate a complete song of up to 4 minutes and 45 seconds in a short time. Its non-autoregressive structure ensures fast inference, greatly improving the efficiency and scalability of music creation. The model was jointly developed by the Audio, Speech, and Language Processing group (ASLP@NPU) at Northwestern Polytechnical University and the Big Data Institute of the Chinese University of Hong Kong (Shenzhen), with the aim of providing a simple, efficient, and creative solution for music creation.