Chinese researchers have introduced CogVLM, a powerful open-source vision-language foundation model that advances cross-modal tasks through deep fusion of visual and language information. Rather than relying on shallow alignment, which simply maps image features into a language model's input space, CogVLM adds a trainable visual expert module to the attention and feed-forward layers of a pretrained language model, enhancing visual understanding without sacrificing the model's original language capabilities. It demonstrates exceptional performance on tasks such as image captioning and visual question answering. The open-source CogVLM-28B-zh supports mixed Chinese-English commercial applications, making the release significant for both research and practical deployment.
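
To make the architecture concrete, here is a minimal PyTorch sketch of the visual-expert idea, assuming a simplified attention layer in which image tokens are routed through a parallel, trainable QKV projection while text tokens keep the original language-model weights. The class name, shapes, and the frozen-weight setup are illustrative assumptions, not CogVLM's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualExpertAttention(nn.Module):
    """Sketch of a visual-expert attention layer: image tokens use a
    parallel trainable QKV projection, text tokens use the original
    language-model projection. Illustrative only, not CogVLM's code."""

    def __init__(self, hidden: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, hidden // n_heads
        # Original LM projection; frozen here to mirror a setup where
        # only the visual expert is trained (an assumption of this sketch).
        self.qkv_text = nn.Linear(hidden, 3 * hidden)
        self.qkv_text.requires_grad_(False)
        # Visual expert: a parallel, trainable copy for image positions.
        self.qkv_image = nn.Linear(hidden, 3 * hidden)
        self.out = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor, image_mask: torch.Tensor):
        # x: (batch, seq, hidden); image_mask: (batch, seq) bool,
        # True where the token comes from the image.
        qkv = torch.where(
            image_mask.unsqueeze(-1),   # broadcast over the hidden dim
            self.qkv_image(x),          # expert path for image tokens
            self.qkv_text(x),           # frozen LM path for text tokens
        )
        q, k, v = qkv.chunk(3, dim=-1)

        def split(t: torch.Tensor) -> torch.Tensor:
            b, s, _ = t.shape
            return t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)

        # Image and text tokens attend jointly in every layer ("deep
        # fusion"); causal masking is omitted for brevity.
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        b, h, s, d = attn.shape
        return self.out(attn.transpose(1, 2).reshape(b, s, h * d))
```

In the paper, the same image/text split is also applied to each layer's feed-forward weights, so the visual expert deepens fusion at every layer while leaving pure-text inference effectively unchanged.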