Magma-8B

Magma-8B is a multi-modal AI model developed by Microsoft that processes image and text inputs to generate text outputs.

Magma-8B is a foundation multi-modal AI model developed by Microsoft for research on multi-modal AI agents. It takes text and image inputs, generates text outputs, and supports visual planning and agentic tasks. The model uses Meta LLaMA-3 as its language model backbone and a CLIP-ConvNeXt-XXLarge vision encoder. It can learn spatiotemporal relationships from unlabeled video data, which helps it generalize and adapt across tasks. Its main strengths are spatial understanding and reasoning, making it a useful tool for studying complex interactions in virtual and real-world environments.

Magma-8B Visit Over Time

Monthly Visits: 26,103,677
Bounce Rate: 43.69%
Pages per Visit: 5.5
Visit Duration: 00:04:43
