Spark-TTS

Spark-TTS is an efficient text-to-speech model built on a large language model, using single-stream decoupled speech tokens.

Tags: Common Product, Productivity, Speech Synthesis, Large Language Model
Spark-TTS is an efficient text-to-speech model built on a large language model and single-stream decoupled speech tokens. It reconstructs audio directly from the codes the language model predicts, with no separate acoustic feature generation model, which improves efficiency and reduces system complexity. The model supports zero-shot text-to-speech synthesis, including cross-lingual and code-switching scenarios, which suits applications that need natural, accurate speech. It also supports virtual voice creation: users can generate different voices by adjusting parameters such as gender, pitch, and speaking rate.

Spark-TTS aims to address the inefficiency and complexity of traditional speech synthesis systems and to offer an efficient, flexible option for both research and production. It is currently intended primarily for academic research and legitimate applications such as personalized speech synthesis, assistive technologies, and language research.

Spark-TTS Visit Over Time

Monthly Visits: 474,564,576
Bounce Rate: 36.20%
Pages per Visit: 6.1
Visit Duration: 00:06:34

Spark-TTS Alternatives