Microsoft has recently open-sourced the multimodal model LLaVA-1.5, which introduces a cross-modal connector and incorporates academic visual question answering data, and it has tested well across multiple domains. The model not only reaches top-tier performance among open-source models but also integrates vision, language, and generation modules. Tests indicate that LLaVA-1.5 performs comparably to GPT-4V, marking an exciting technological breakthrough.
Microsoft Open Sources Multimodal Model LLaVA-1.5 Comparable to GPT-4V Performance
ChinaZ (站长之家)
© Copyright AIbase Base 2024, Click to View Source - https://www.aibase.com/news/5113