SpeechGPT2
An end-to-end human-like speech dialogue model.
Premium, New Product, Chatting, Speech Dialogue, Emotion Expression
SpeechGPT2 is an end-to-end speech dialogue language model developed by the School of Computer Science at Fudan University. It can perceive and express emotions, and it produces voice responses in a variety of styles that suit the context and the user's instructions. The model uses an ultra-low-bitrate speech codec (750 bps) to model both semantic and acoustic information and is initialized via a Multi-Input Multi-Output Language Model (MIMO-LM). SpeechGPT2 is currently a turn-based dialogue system; a full-duplex real-time version is in development and has shown promising progress. Because of limited computational and data resources, SpeechGPT2 still has room for improvement in the noise robustness of its speech understanding and the stability of its speech generation quality. The team plans to open-source the technical report, code, and model weights in the future.
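To put the 750 bps figure in perspective, the short sketch below works out how much data such a codec produces and how that compares with uncompressed audio. Only the 750 bps bitrate comes from the description above; the 16 kHz, 16-bit mono PCM baseline and the function names are assumptions chosen for illustration, not details of SpeechGPT2 itself.

```python
# Back-of-the-envelope arithmetic for a 750 bps speech codec.
# CODEC_BPS is taken from the description; the PCM baseline
# (16 kHz, 16-bit, mono) is an assumed reference point, not a
# documented property of SpeechGPT2.

CODEC_BPS = 750  # bits per second


def encoded_bytes(seconds: float, bps: int = CODEC_BPS) -> float:
    """Bytes needed to store `seconds` of speech at `bps` bits per second."""
    return seconds * bps / 8


def compression_ratio(
    sample_rate_hz: int = 16_000,
    bit_depth: int = 16,
    bps: int = CODEC_BPS,
) -> float:
    """Ratio of the raw mono PCM bitrate to the codec bitrate."""
    return sample_rate_hz * bit_depth / bps


if __name__ == "__main__":
    print(f"10 s of speech: {encoded_bytes(10)} bytes")
    print(f"Reduction vs. assumed PCM baseline: {compression_ratio():.1f}x")
```

Under these assumptions, ten seconds of speech fits in under a kilobyte, roughly 340 times smaller than the raw PCM baseline.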
SpeechGPT2 Visits Over Time
Monthly Visits: 1,591
Bounce Rate: 47.78%
Pages per Visit: 1.0
Visit Duration: 00:00:00