CoDi-2 is a multimodal large language model developed in collaboration among multiple institutions that addresses the challenge of understanding and generating content from complex, interleaved instructions. It performs well on tasks such as image generation and audio editing, using a language-model backbone to achieve zero-shot control and multimodal dialogue. Future work aims to improve its learning and support additional modalities, further strengthening its multimodal generation capabilities.
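
To make the idea of multimodal dialogue through a language model more concrete, here is a minimal Python sketch of how text, image, and audio segments might be interleaved into a single instruction sequence for a language-model backbone. All class and function names below are hypothetical illustrations, not CoDi-2's actual API.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical illustration of interleaved multimodal prompting;
# the types and fields here are assumptions, not CoDi-2's interface.

@dataclass
class ImageInput:
    path: str  # reference to an image file

@dataclass
class AudioInput:
    path: str  # reference to an audio clip

Segment = Union[str, ImageInput, AudioInput]

def build_interleaved_prompt(segments: List[Segment]) -> List[dict]:
    """Flatten text, image, and audio segments into one ordered sequence,
    so a language-model backbone can condition on all of them together."""
    prompt = []
    for seg in segments:
        if isinstance(seg, str):
            prompt.append({"type": "text", "content": seg})
        elif isinstance(seg, ImageInput):
            prompt.append({"type": "image", "content": seg.path})
        elif isinstance(seg, AudioInput):
            prompt.append({"type": "audio", "content": seg.path})
    return prompt

if __name__ == "__main__":
    # Example: an editing instruction that interleaves text with an image
    # and an audio clip, in the style of a single multimodal dialogue turn.
    prompt = build_interleaved_prompt([
        "Apply the painting style of",
        ImageInput("style_reference.png"),
        "to this photo",
        ImageInput("photo.jpg"),
        "and add background music similar to",
        AudioInput("reference_clip.wav"),
    ])
    for turn in prompt:
        print(turn)
```

The point of the sketch is only the data shape: instructions, reference images, and reference audio are carried in one ordered sequence rather than as separate inputs, which is what lets a language model follow in-context, zero-shot editing instructions across modalities.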