SpeechGPT2

An end-to-end human-like speech dialogue model.

Premium, New Product, Chatting, Speech Dialogue, Emotion Expression
SpeechGPT2 is an end-to-end speech dialogue language model developed by the School of Computer Science at Fudan University. It can perceive and express emotions, and it responds with speech in a variety of styles depending on context and user instructions. The model uses an ultra-low-bitrate speech codec (750 bps) to model both semantic and acoustic information, and is initialized via a Multi-Input Multi-Output Language Model (MIMO-LM). SpeechGPT2 is currently a turn-based dialogue system; a full-duplex real-time version is in development and has shown promising progress. Because of limited computational and data resources, SpeechGPT2 still has room for improvement in the noise robustness of its speech understanding and in the stability of its speech generation quality. The developers plan to open-source the technical report, code, and model weights in the future.
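To put the 750 bps figure in perspective, here is a rough back-of-the-envelope sketch. Only the 750 bps bitrate comes from the description above. The uncompressed baseline (16 kHz, 16-bit, mono PCM) is an assumption chosen for illustration, and the function names are hypothetical rather than part of any SpeechGPT2 code.

```python
# Back-of-the-envelope arithmetic for a 750 bps speech codec.
CODEC_BITRATE_BPS = 750  # bitrate stated in the description above

# Assumed baseline for comparison (not from the source): 16 kHz, 16-bit, mono PCM.
PCM_SAMPLE_RATE_HZ = 16_000
PCM_BITS_PER_SAMPLE = 16


def codec_bytes(seconds: float, bitrate_bps: int = CODEC_BITRATE_BPS) -> float:
    """Bytes needed to carry `seconds` of speech at the given bitrate."""
    return seconds * bitrate_bps / 8


def compression_ratio(bitrate_bps: int = CODEC_BITRATE_BPS) -> float:
    """Ratio of the assumed PCM bitrate to the codec bitrate."""
    pcm_bps = PCM_SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE  # 256,000 bps
    return pcm_bps / bitrate_bps


if __name__ == "__main__":
    print(codec_bytes(10))                # 937.5 bytes for 10 seconds of speech
    print(round(compression_ratio(), 1))  # 341.3 times smaller than the assumed PCM
```

Under these assumptions, ten seconds of speech fits in under a kilobyte, which suggests why such a codec keeps token sequences short enough for a language model to handle.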

SpeechGPT2 Visit Over Time

Monthly Visits: 2965
Bounce Rate: 58.47%
Pages per Visit: 1.0
Visit Duration: 00:00:00
