StreamSpeech
Real-time speech translation for cross-language communication.
Premium, New Product, Productivity, Real-time translation, Multi-task learning
StreamSpeech is a real-time speech-to-speech translation model built on multi-task learning. It learns translation and a simultaneous-translation policy (when to wait for more input and when to emit output) jointly in a single framework, so it can decide the right moment to translate as streaming speech arrives. The model has demonstrated leading performance on the CVSS benchmark, and it can also return intermediate results, such as ASR transcripts or text translations, with low latency.
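
To make the idea of "translation timing" concrete, here is a minimal sketch of a read/write loop for streaming input. It uses a fixed wait-k rule (read k chunks ahead, then alternate one output per new input), which is a simple stand-in and not StreamSpeech's learned policy. The function name `wait_k_schedule`, the parameter `k`, and the upper-casing "translation" are all hypothetical placeholders for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def wait_k_schedule(source_chunks: Iterable[str], k: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield ("READ", chunk) and ("WRITE", output) actions under a toy wait-k rule.

    Assumption: "translating" a chunk is faked by upper-casing it. A real
    system would run a translation model here and learn when to write.
    """
    buffered: List[str] = []  # chunks received so far
    written = 0               # number of outputs already emitted

    for chunk in source_chunks:
        buffered.append(chunk)
        yield ("READ", chunk)
        # Once we are k chunks ahead of the output, emit one output.
        if len(buffered) - written >= k:
            yield ("WRITE", buffered[written].upper())
            written += 1

    # The input stream has ended: flush the remaining outputs.
    while written < len(buffered):
        yield ("WRITE", buffered[written].upper())
        written += 1


if __name__ == "__main__":
    for action, value in wait_k_schedule(["hello", "how", "are", "you"], k=2):
        print(action, value)
```

A smaller `k` lowers latency but gives each output less source context; a learned policy replaces the fixed `k` with a decision made from the incoming speech itself.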