Llasa is a text-to-speech (TTS) base model built on the Llama framework and designed for large-scale speech synthesis. It was trained on 160,000 hours of tokenized speech data and offers efficient generation and multilingual support. Its main advantages are strong speech synthesis capabilities, low inference cost, and flexible framework compatibility. The model is suited to education, entertainment, and commercial scenarios that need high-quality speech synthesis. It is currently freely available on Hugging Face, with the aim of promoting the development and application of speech synthesis technology.