Seed-TTS
A series of high-quality, multi-functional voice synthesis models
PremiumNewProductProductivityVoice SynthesisText-to-Speech
Seed-TTS, launched by ByteDance, is a series of large-scale autoregressive text-to-speech (TTS) models capable of generating speech indistinguishable from human voice. It excels in voice context learning, speaker similarity, and naturalness. Through fine-tuning, the subjective score can be further improved. Seed-TTS also provides superior control over vocal attributes like emotion and can generate expressive and diverse voices. Furthermore, it proposes a self-distillation method for voice decomposition and a reinforcement learning method to enhance model robustness, speaker similarity, and controllability. The non-autoregressive (NAR) variant of Seed-TTS, Seed-TTSDiT, is also presented. It utilizes a fully diffusion-based architecture, independent of pre-estimated phoneme durations, and performs speech generation in an end-to-end manner.
Seed-TTS Visit Over Time
Monthly Visits
8909
Bounce Rate
56.79%
Page per Visit
1.8
Visit Duration
00:00:45