Stability AI Text-to-Speech Models
Stability AI's high-fidelity text-to-speech models
CommonProductOthersVoice synthesisHigh-fidelity
Stability AI's high-fidelity text-to-speech models aim to provide natural language guidance for training voice synthesis models on large datasets. This is achieved by annotating different speaker identities, styles, and recording conditions. This approach is then applied to a dataset of 45,000 hours of data to train the voice language model. Additionally, the model proposes simple methods for enhancing audio fidelity, which, despite relying entirely on discovered data, perform remarkably well.