Tsinghua University, in collaboration with Zhipu AI, has developed CogVLM-17B, a domestically built multimodal model with notably strong performance. The model can identify objects within images and distinguish fully visible objects from partially visible ones. Rather than the common shallow-alignment approach, CogVLM-17B employs a deep fusion method, aligning image and text features through four key components: a ViT image encoder, an MLP adapter, a pretrained large language model, and a trainable visual expert module. The model has outperformed Google's models on multiple benchmarks, and its strong results across 14 classic cross-modal benchmarks earned it the nickname "14-sided warrior." This domestic multimodal model offers new insights and possibilities for research in the multimodal field.
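To make "deep fusion" concrete, here is a minimal, illustrative PyTorch sketch of the visual-expert idea described in the CogVLM paper: inside each attention layer, image tokens are routed through their own trainable QKV projections while text tokens keep the language model's original projections, so the two modalities interact at every layer rather than only at the input. All class, function, and parameter names below are illustrative assumptions, not the released CogVLM code.

```python
import torch
import torch.nn as nn


class VisualExpertAttention(nn.Module):
    """Sketch of one CogVLM-style deep-fusion attention layer.

    Image tokens use a separate, trainable QKV projection (the
    "visual expert"); text tokens use the language model's own
    projection. Names and dimensions are illustrative only.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Language-model QKV projection, applied to text tokens
        self.qkv_text = nn.Linear(d_model, 3 * d_model)
        # Visual-expert QKV projection, applied to image tokens
        self.qkv_image = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, image_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); image_mask: (batch, seq) bool,
        # True where the token comes from the image encoder.
        b, s, d = x.shape
        # Route each token through the projection for its modality.
        qkv = torch.where(
            image_mask.unsqueeze(-1),
            self.qkv_image(x),
            self.qkv_text(x),
        )
        q, k, v = qkv.chunk(3, dim=-1)
        # Standard multi-head attention over the mixed sequence, so
        # image and text features mix at every layer ("deep fusion").
        q = q.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        fused = (attn @ v).transpose(1, 2).reshape(b, s, d)
        return self.out(fused)


# Usage: a 10-token sequence whose first 4 tokens are image patches.
layer = VisualExpertAttention(d_model=64, n_heads=4)
x = torch.randn(2, 10, 64)
mask = torch.zeros(2, 10, dtype=torch.bool)
mask[:, :4] = True
y = layer(x, mask)  # (2, 10, 64)
```

In the full model, an analogous expert branch would also be added to each feed-forward block, and only the expert parameters would be trained while the language model stays frozen; this sketch shows the attention half of that idea only.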