Recently, the Allen Institute for Artificial Intelligence (Ai2) unveiled Molmo, a cutting-edge open-source multi-modal AI model family that has demonstrated exceptional performance, even surpassing OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 in several third-party benchmark tests.


Molmo not only analyzes images uploaded by users but was also trained on roughly "1,000 times less data" than its competitors, thanks to its distinctive training techniques.


This release showcases Ai2's commitment to open research by providing high-performance models along with open weights and data for the broader community and businesses to use. The Molmo family comprises four main models: Molmo-72B, Molmo-7B-D, Molmo-7B-O, and MolmoE-1B. Molmo-72B is the flagship, with 72 billion parameters, and delivers the strongest performance.

According to various evaluations, Molmo-72B scored highest across 11 major benchmark tests and was second only to GPT-4o in human preference evaluations. Ai2 also introduced a model named OLMoE, which uses a mixture-of-experts approach (combining many small expert models) aimed at improving cost-effectiveness.

The architecture of Molmo is designed for efficiency and strong performance. All models use OpenAI's ViT-L/14 336px CLIP model as the visual encoder, processing multi-scale image crops into visual tokens. The language model component is a decoder-only Transformer, with the family spanning a range of capacities and degrees of openness.
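
As a rough illustration of this design, the sketch below shows how visual tokens from a CLIP-style encoder might be projected into the language model's embedding space and consumed alongside text tokens. The module names, layer counts, and dimensions are assumptions for clarity, not Ai2's actual implementation.

```python
# Conceptual sketch of a Molmo-style vision-language model (module names and
# sizes are placeholders, not Ai2's implementation): a CLIP-like ViT encodes
# image crops into visual tokens, a connector projects them into the language
# model's embedding space, and a decoder-only Transformer consumes the
# concatenated visual + text token sequence.
import torch
import torch.nn as nn

class MolmoStyleVLM(nn.Module):
    def __init__(self, vis_dim=1024, llm_dim=2048, vocab_size=32000):
        super().__init__()
        # Stand-in for the ViT-L/14 336px CLIP visual encoder.
        self.visual_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=vis_dim, nhead=16, batch_first=True),
            num_layers=2,
        )
        # Connector: maps visual tokens into the LLM embedding space.
        self.connector = nn.Linear(vis_dim, llm_dim)
        # Stand-in for the decoder-only language model (a causal attention
        # mask would be applied in a real implementation).
        self.text_embed = nn.Embedding(vocab_size, llm_dim)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=16, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, image_patches, input_ids):
        vis = self.connector(self.visual_encoder(image_patches))  # (B, P, llm_dim)
        txt = self.text_embed(input_ids)                          # (B, T, llm_dim)
        seq = torch.cat([vis, txt], dim=1)                        # visual tokens first
        return self.lm_head(self.decoder(seq))                    # next-token logits
```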

In terms of training, Molmo underwent a two-stage process: multi-modal pre-training first, followed by supervised fine-tuning. Unlike many modern models, Molmo does not rely on reinforcement learning from human feedback; instead, it updates model parameters through a carefully tuned training pipeline.
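
A minimal sketch of such a two-stage pipeline is shown below, assuming a model like the one sketched above and hypothetical data loaders; both stages use plain next-token supervision, with no reinforcement learning step.

```python
# Minimal two-stage training sketch (hypothetical data loaders, learning
# rates, and step counts -- not Ai2's released training code). Both stages
# optimize an ordinary next-token cross-entropy loss; there is no
# reinforcement-learning-from-human-feedback step.
import torch
import torch.nn.functional as F

def run_stage(model, dataloader, lr, steps):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (image_patches, input_ids, labels) in zip(range(steps), dataloader):
        logits = model(image_patches, input_ids)          # (B, P+T, vocab)
        # Labels cover the full sequence; visual positions are masked with -100.
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               labels.view(-1), ignore_index=-100)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Stage 1: multi-modal pre-training on dense image-caption data.
# run_stage(model, caption_loader, lr=1e-4, steps=20_000)
# Stage 2: supervised fine-tuning on instruction/QA-style multi-modal data.
# run_stage(model, sft_loader, lr=1e-5, steps=10_000)
```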

Molmo has excelled in multiple benchmark tests, especially in complex tasks such as document reading and visual reasoning, showcasing its robust capabilities. Ai2 has already released these models and datasets on Hugging Face, with more models and a more detailed technical report planned for the coming months to provide researchers with additional resources.
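
For readers who want to experiment locally, a released checkpoint can in principle be loaded through the Hugging Face transformers library roughly as follows; the repository name and loading options here are assumptions, so consult the model card for exact usage.

```python
# Sketch of loading one of the released checkpoints with the Hugging Face
# transformers library. The repository name and options below are assumptions;
# check the model card in Ai2's Hugging Face organization for exact usage.
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "allenai/Molmo-7B-D-0924"  # assumed repository name
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
# The processor prepares image + prompt inputs; see the model card for the
# exact preprocessing and generation calls.
```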

If you wish to explore Molmo's capabilities, you can try the public demo on the official Molmo website (https://molmo.allenai.org/).

Key Points:

🌟 Ai2's Molmo open-source multi-modal AI models surpass industry-leading products.

📊 Molmo-72B excels in multiple benchmark tests, ranking second only to GPT-4o in human preference evaluations.

🔍 High openness, with models and datasets freely available for researchers to use.