2024-11-06 11:24:56 · AIbase
OuteTTS-0.1-350M: A Novel Text-to-Speech Synthesis Method with Zero-Shot Voice Cloning Capability
Oute AI recently released OuteTTS-0.1-350M, a text-to-speech model that relies on pure language modeling, with no external adapters or complex architectures, which simplifies the TTS pipeline. The model is based on the LLaMA architecture and uses WavTokenizer so that it generates audio tokens directly, which streamlines generation. It also supports zero-shot voice cloning from only a few seconds of reference audio.
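To make the "pure language modeling" idea concrete, here is a minimal toy sketch of how text and audio tokens could share one sequence in a decoder-only model. It is not Oute AI's implementation: the vocabulary sizes, the special tokens `BOS_AUDIO` and `EOS_AUDIO`, the prompt layout, and every function name are assumptions chosen for illustration, and `next_token` stands in for whatever model predicts the next token.

```python
from typing import Callable, List

# All constants below are illustrative assumptions, not values from OuteTTS.
TEXT_VOCAB_SIZE = 256                            # byte-level text tokens
AUDIO_CODEBOOK_SIZE = 4096                       # assumed audio codebook size
AUDIO_OFFSET = TEXT_VOCAB_SIZE                   # audio ids follow text ids
BOS_AUDIO = AUDIO_OFFSET + AUDIO_CODEBOOK_SIZE   # hypothetical "audio starts" token
EOS_AUDIO = BOS_AUDIO + 1                        # hypothetical "audio ends" token


def encode_text(text: str) -> List[int]:
    """Toy text tokenizer: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def audio_codes_to_tokens(codes: List[int]) -> List[int]:
    """Shift audio codebook indices into the shared vocabulary."""
    return [AUDIO_OFFSET + c for c in codes]


def build_prompt(ref_text: str, ref_codes: List[int], target_text: str) -> List[int]:
    """Lay out a voice-cloning prompt: the reference transcript and its audio
    codes come first, then the new text, then an open audio segment for the
    model to continue."""
    return (
        encode_text(ref_text)
        + [BOS_AUDIO] + audio_codes_to_tokens(ref_codes) + [EOS_AUDIO]
        + encode_text(target_text)
        + [BOS_AUDIO]
    )


def generate_audio_codes(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int = 1024,
) -> List[int]:
    """Autoregressively sample tokens until EOS_AUDIO or the length limit,
    returning plain codebook indices that an audio decoder could consume."""
    seq = list(prompt)
    codes: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(seq)
        if tok == EOS_AUDIO:
            break
        if not AUDIO_OFFSET <= tok < BOS_AUDIO:
            raise ValueError(f"expected an audio token, got {tok}")
        seq.append(tok)
        codes.append(tok - AUDIO_OFFSET)
    return codes
```

In this sketch the reference audio conditions the output only by appearing earlier in the same sequence, which is one way a short clip could steer the voice without a separate speaker adapter. In a real system the codes in `ref_codes` would come from an audio tokenizer such as WavTokenizer, and the returned codes would be decoded back into a waveform.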