Researchers from the University of Wisconsin-Madison and Microsoft Research have unveiled LLaVA-1.5, an upgraded multimodal model that strengthens the cross-modal connector between its vision encoder and language model and adds academic-task-oriented visual question answering data to its training mix. With these changes, LLaVA-1.5 reaches state-of-the-art results among open-source models across a broad set of benchmarks. Its core design couples a pretrained vision encoder with a large language model through a lightweight projection module. In some evaluations its performance approaches that of GPT-4V, marking an encouraging step forward for open multimodal research.
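The cross-modal connector in LLaVA-1.5 is a small two-layer MLP that projects vision-encoder patch features into the language model's token-embedding space. The sketch below illustrates the idea in NumPy; the dimensions, initialization, and GELU activation are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np

# Illustrative dimensions (assumptions, not the real model sizes):
VISION_DIM = 1024   # per-patch feature size from a ViT-style vision encoder
LLM_DIM = 4096      # language model's token-embedding size

rng = np.random.default_rng(0)
W1 = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02
b1 = np.zeros(LLM_DIM)
W2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02
b2 = np.zeros(LLM_DIM)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def connector(patch_features):
    """Two-layer MLP: vision patch features -> LLM embedding space."""
    h = gelu(patch_features @ W1 + b1)
    return h @ W2 + b2

# A 24x24 grid of image patches becomes 576 "visual tokens" the LLM can attend to.
patches = rng.standard_normal((576, VISION_DIM))
visual_tokens = connector(patches)
print(visual_tokens.shape)  # (576, 4096)
```

The projected visual tokens are simply concatenated with the text token embeddings before being fed to the language model, which is what makes this connector design so lightweight compared with heavier fusion modules.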