IndexTTS

An industrial-grade, controllable, and efficient zero-shot text-to-speech system

Common Product, Productivity, Speech Synthesis, Artificial Intelligence
IndexTTS is a GPT-style text-to-speech (TTS) model built primarily on XTTS and Tortoise. It can correct Chinese pronunciation through pinyin and control pauses through punctuation marks. For Chinese, it introduces a mixed character-pinyin modeling method that significantly improves training stability, timbre similarity, and audio quality, and it integrates the BigVGAN2 vocoder to further improve audio quality. The model is trained on tens of thousands of hours of data, and its developers report that it outperforms popular TTS systems such as XTTS, CosyVoice2, and F5-TTS. IndexTTS suits scenarios that require high-quality speech synthesis, such as voice assistants and audiobooks, and because it is open source it can be used for both academic research and commercial applications.
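The idea of controlling pauses with punctuation can be illustrated with an explicit, rule-based sketch that splits text at punctuation marks and attaches a pause length to each segment. This is a swapped-in technique for illustration: a GPT-style model like IndexTTS may instead learn pauses directly from punctuation in its input, and the mapping `PAUSE_MS`, its millisecond values, and the function `split_with_pauses` are all hypothetical.

```python
# Hypothetical pause lengths in milliseconds, for illustration only.
PAUSE_MS = {
    ",": 150, "，": 150, "、": 100,
    ";": 250, "；": 250,
    ".": 400, "。": 400, "!": 400, "！": 400, "?": 400, "？": 400,
}


def split_with_pauses(text: str) -> list[tuple[str, int]]:
    """Split text at punctuation, pairing each segment with a pause length.

    The final segment, if it has no trailing punctuation, gets a 0 ms pause.
    """
    segments: list[tuple[str, int]] = []
    current: list[str] = []
    for ch in text:
        if ch in PAUSE_MS:
            segment = "".join(current).strip()
            if segment:
                segments.append((segment, PAUSE_MS[ch]))
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        segments.append((tail, 0))
    return segments


print(split_with_pauses("你好，世界。再见"))
```

A synthesizer following this scheme would render each segment and insert silence of the given length after it; note that this naive splitter would also break on a decimal point such as "3.14".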

IndexTTS Visit Over Time

Monthly Visits: 502,571,820
Bounce Rate: 37.10%
Pages per Visit: 5.9
Visit Duration: 00:06:29
