Image generation technology has advanced significantly in recent years. KOSMOS-G is a new AI model built on multimodal large language models (LLMs). It can generate detailed images from a textual description combined with multiple input images, even in zero-shot settings. The model is trained in several successive stages, and this staged training gives it strong zero-shot image generation capabilities. Because it is designed to work with existing image decoders without modifying them, KOSMOS-G could potentially replace CLIP in image generation systems. It also broadens the range of applications that combine textual and visual information. KOSMOS-G marks a significant step forward in image generation.