GLM-4-Voice is an end-to-end voice model developed by a team from Tsinghua University, capable of directly understanding and generating Chinese and English speech for real-time dialogue. Leveraging advanced speech recognition and synthesis technologies, it achieves seamless conversion from speech to text and back to speech, boasting low latency and high conversational intelligence. The model is optimized for intellectual engagement and expressive synthesis capabilities in the voice modality, making it suitable for scenarios requiring real-time voice interaction.