moondream

A powerful small visual language model, accessible everywhere.

moondream is a 1.6-billion-parameter visual language model built using SigLIP, Phi-1.5, and the LLaVA training dataset. Because the LLaVA dataset was used, the weights are licensed under CC-BY-SA. You can try it out on Hugging Face Spaces.

The model's scores on the VQAv2, GQA, VizWiz, and TextVQA benchmarks, alongside other models:

Model        Parameters  VQAv2  GQA   VizWiz  TextVQA
LLaVA-1.5    13.3B       80.0   63.3  53.6    61.3
LLaVA-1.5    7.3B        78.5   62.0  50.0    58.2
MC-LLaVA-3B  3B          64.2   49.6  24.9    38.6
LLaVA-Phi    3B          71.4   -     35.9    48.6
moondream1   1.6B        74.3   56.3  30.3    39.8
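To make the comparison easier to read, here is a minimal Python sketch that ranks the listed models on each benchmark and expresses one model's score as a fraction of another's. The numbers are copied from the figures above. The names `SCORES`, `rank_by`, and `relative_score` are made up for this illustration and are not part of moondream or any library.

```python
# Benchmark scores copied from the listing above.
# None stands for the "-" entry (no score given for that benchmark).
BENCHMARKS = ("VQAv2", "GQA", "VizWiz", "TextVQA")

# model label -> (parameters in billions, scores in BENCHMARKS order)
SCORES = {
    "LLaVA-1.5 (13.3B)": (13.3, (80.0, 63.3, 53.6, 61.3)),
    "LLaVA-1.5 (7.3B)": (7.3, (78.5, 62.0, 50.0, 58.2)),
    "MC-LLaVA-3B": (3.0, (64.2, 49.6, 24.9, 38.6)),
    "LLaVA-Phi": (3.0, (71.4, None, 35.9, 48.6)),
    "moondream1": (1.6, (74.3, 56.3, 30.3, 39.8)),
}


def rank_by(benchmark):
    """Return model labels sorted best-to-worst on one benchmark,
    skipping models that have no score for it."""
    idx = BENCHMARKS.index(benchmark)
    scored = [
        (label, values[idx])
        for label, (_, values) in SCORES.items()
        if values[idx] is not None
    ]
    return [label for label, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


def relative_score(model, baseline, benchmark):
    """Score of `model` divided by the score of `baseline` on one benchmark."""
    idx = BENCHMARKS.index(benchmark)
    return SCORES[model][1][idx] / SCORES[baseline][1][idx]


if __name__ == "__main__":
    for name in BENCHMARKS:
        print(name, rank_by(name))
    ratio = relative_score("moondream1", "LLaVA-1.5 (7.3B)", "VQAv2")
    print(f"moondream1 vs LLaVA-1.5 (7.3B) on VQAv2: {ratio:.1%}")
```

On these figures, moondream1 places third on VQAv2 and GQA, behind the two LLaVA-1.5 models, and fourth on VizWiz and TextVQA, where LLaVA-Phi scores higher.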

moondream Visit Over Time

Monthly Visits: 499,904,316
Bounce Rate: 37.31%
Pages per Visit: 5.8
Visit Duration: 00:06:52
