Magma-8B
Magma-8B is a multi-modal AI model developed by Microsoft that processes image and text inputs to generate text outputs.
CommonProductImageMulti-modalImage
Magma-8B is a foundational multi-modal AI model developed by Microsoft, specifically designed for researching multi-modal AI agents. It integrates text and image inputs to generate text outputs and possesses visual planning and agent capabilities. The model utilizes Meta LLaMA-3 as its language model backbone and incorporates a CLIP-ConvNeXt-XXLarge vision encoder. It can learn spatiotemporal relationships from unlabeled video data, exhibiting strong generalization capabilities and multi-task adaptability. Magma-8B excels in multi-modal tasks, particularly in spatial understanding and reasoning. It provides a powerful tool for multi-modal AI research, advancing the study of complex interactions in virtual and real-world environments.
Magma-8B Visit Over Time
Monthly Visits
26103677
Bounce Rate
43.69%
Page per Visit
5.5
Visit Duration
00:04:43