moondream
A powerful small visual language model, accessible everywhere.
moondream is a 1.6 billion parameter model built using SigLIP, Phi-1.5, and the LLaVA training dataset. Because it was trained on the LLaVA dataset, the weights are released under the CC-BY-SA license. You can try it out on Hugging Face Spaces. Its scores on the VQAv2, GQA, VizWiz, and TextVQA benchmarks, in that order, compare with other models as follows (a dash means no score was reported):
Model         Parameters  VQAv2  GQA   VizWiz  TextVQA
LLaVA-1.5     13.3B       80.0   63.3  53.6    61.3
LLaVA-1.5     7.3B        78.5   62.0  50.0    58.2
MC-LLaVA-3B   3B          64.2   49.6  24.9    38.6
LLaVA-Phi     3B          71.4   -     35.9    48.6
moondream1    1.6B        74.3   56.3  30.3    39.8
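To put the scores above in perspective, it can help to express a small model's results as a fraction of a larger baseline. The following is only a minimal sketch: the numbers are copied from the listing above, while the `SCORES` table and the `relative_scores` helper are hypothetical names introduced for illustration and are not part of moondream or any library.

```python
# Scores copied from the listing above, in the order VQAv2, GQA, VizWiz, TextVQA.
# None marks a score that was not reported (shown as "-" in the listing).
BENCHMARKS = ("VQAv2", "GQA", "VizWiz", "TextVQA")

SCORES = {
    "LLaVA-1.5 (13.3B)": (80.0, 63.3, 53.6, 61.3),
    "LLaVA-1.5 (7.3B)": (78.5, 62.0, 50.0, 58.2),
    "MC-LLaVA-3B": (64.2, 49.6, 24.9, 38.6),
    "LLaVA-Phi": (71.4, None, 35.9, 48.6),
    "moondream1": (74.3, 56.3, 30.3, 39.8),
}


def relative_scores(model, baseline):
    """Return {benchmark: model_score / baseline_score}.

    Benchmarks where either model has no reported score are skipped.
    (Hypothetical helper, for illustration only.)
    """
    ratios = {}
    for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline]):
        if m is not None and b is not None:
            ratios[name] = m / b
    return ratios


if __name__ == "__main__":
    # Compare the 1.6B model against the 7.3B LLaVA-1.5 baseline.
    for name, ratio in relative_scores("moondream1", "LLaVA-1.5 (7.3B)").items():
        print(f"{name}: {ratio:.1%} of the baseline score")
```

A ratio close to 1 means the smaller model nearly matches the baseline on that benchmark. The ratio says nothing about speed or memory use, which the listing does not report.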
moondream Visits Over Time
Monthly Visits: 515,580,771
Bounce Rate: 37.20%
Pages per Visit: 5.8
Visit Duration: 00:06:42