2024-11-19
Peking University Team Releases Multimodal Model LLaVA-o1, with Reasoning Capabilities Comparable to GPT-o1!
Recently, a research team from Peking University announced the release of LLaVA-o1, an open-source multimodal model claimed to be the first visual language model capable of spontaneous, systematic reasoning, comparable to GPT-o1. The model performs strongly on six challenging multimodal benchmarks, and its 11B-parameter version outperforms competitors such as Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct.