The National University of Singapore has released NExT-GPT, a multimodal large language model that can both understand and generate text, images, video, and audio, opening the door to end-to-end multimedia AI applications. The model uses a three-tier architecture (multimodal encoders, an LLM core, and modality-specific diffusion decoders), trains only the lightweight projection layers between these tiers, and is instruction-tuned on the MosIT (Modality-switching Instruction Tuning) dataset; its open-source release gives researchers and developers a starting point for building systems that integrate multimodal inputs and outputs. NExT-GPT's distinctive feature is that its LLM emits special modality signal tokens that tell the downstream decoders which modality to produce, with potential applications in content generation and multimedia analysis.
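To make the signal-token idea concrete, here is a minimal, hypothetical Python sketch of the routing pattern the article describes: the LLM's output stream carries special tokens that mark spans to be handed to modality-specific decoders. The tag names (`IMG`, `AUD`) and the decoder functions are illustrative placeholders, not NExT-GPT's actual tokens or API; in the real system the decoders are diffusion models conditioned on learned representations rather than on raw text.

```python
import re

def decode_image(prompt: str) -> str:
    # Stand-in for a diffusion image decoder (e.g. Stable Diffusion).
    return f"[image: {prompt}]"

def decode_audio(prompt: str) -> str:
    # Stand-in for an audio decoder (e.g. AudioLDM).
    return f"[audio: {prompt}]"

# Map each hypothetical signal token to its decoder.
DECODERS = {"IMG": decode_image, "AUD": decode_audio}

def render(llm_output: str) -> str:
    """Replace each <TAG>...</TAG> span with the output of the
    matching modality decoder; plain text passes through unchanged."""
    def dispatch(match: re.Match) -> str:
        tag, content = match.group(1), match.group(2)
        return DECODERS[tag](content)
    return re.sub(r"<(IMG|AUD)>(.*?)</\1>", dispatch, llm_output)

print(render("Here is a cat: <IMG>a fluffy cat on a sofa</IMG> "
             "and its purr: <AUD>soft cat purring</AUD>."))
```

Running the example prints the plain text with each tagged span replaced by its decoder's output, which is the essence of how signal tokens let a single text-generating model drive multimodal generation.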