llama3v
State-of-the-art (SOTA) visual model based on Llama3 8B
Common Product, Image, Visual Model, Multi-Modal Learning
llama3v is a state-of-the-art (SOTA) visual model based on Llama3 8B and siglip-so400m. It is an open-source VLLM (visual language multi-modal model) with model weights available on Hugging Face, support for fast local inference, and released inference code. The model combines image recognition and text generation by adding a projection layer that maps image features into the LLaMA embedding space, improving its ability to understand images.
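A minimal sketch of how such a projection layer could look, assuming SigLIP-so400m patch features of dimension 1152 and the Llama3 8B hidden size of 4096; the class name, MLP shape, and patch count are illustrative assumptions, not the model's actual implementation:

import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Illustrative projection: maps vision-encoder patch features
    (assumed dim 1152, as in siglip-so400m) into the LLaMA embedding
    space (assumed dim 4096, as in Llama3 8B)."""
    def __init__(self, vision_dim: int = 1152, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        # returns visual "tokens" of shape (batch, num_patches, llm_dim)
        return self.proj(image_features)

# Toy usage: project dummy patch features and prepend them to text embeddings.
patches = torch.randn(1, 729, 1152)            # hypothetical SigLIP patch features
visual_tokens = VisionProjector()(patches)     # (1, 729, 4096)
text_embeds = torch.randn(1, 32, 4096)         # token embeddings from the LLM
inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)

Once concatenated, the combined sequence can be fed to the language model, which is the general pattern by which projection-based VLLMs condition text generation on image content.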
llama3v Visits Over Time
Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29