Seed-TTS

A series of high-quality, multi-functional voice synthesis models

Premium | New Product | Productivity | Voice Synthesis | Text-to-Speech
Seed-TTS, launched by ByteDance, is a family of large-scale autoregressive text-to-speech (TTS) models that generate speech described as virtually indistinguishable from human speech. It performs strongly in speech in-context learning, speaker similarity, and naturalness, and fine-tuning raises its subjective evaluation scores further. Seed-TTS also offers fine-grained control over vocal attributes such as emotion and can generate expressive, diverse voices. In addition, the Seed-TTS work proposes a self-distillation method for speech factorization and a reinforcement learning method that improves model robustness, speaker similarity, and controllability. A non-autoregressive (NAR) variant, Seed-TTS_DiT, is also presented. It uses a fully diffusion-based architecture that does not depend on pre-estimated phoneme durations and generates speech end to end.

Seed-TTS Visit Over Time

Monthly Visits: 8,909
Bounce Rate: 56.79%
Pages per Visit: 1.8
Visit Duration: 00:00:45


Seed-TTS Alternatives