SpeechGPT
Multimodal Language Model
CommonProductProgrammingSpeechMultimodal
SpeechGPT is a multimodal language model with inherent cross-modal dialogue capabilities. It can perceive and generate multimodal content and follow multimodal human instructions. SpeechGPT-Gen is an extended information chain speech generation model. SpeechAgents is a multimodal multi-agent system for human communication simulation. SpeechTokenizer is a unified speech tokenizer suitable for speech language models. The release dates and related information of these models and datasets can be found on the official website.
SpeechGPT Visit Over Time
Monthly Visits
515580771
Bounce Rate
37.20%
Page per Visit
5.8
Visit Duration
00:06:42