llama3v

State-of-the-art (SOTA) visual model based on Llama3 8B

Common Product, Image, Visual Model, Multi-Modal Learning
llama3v is a visual model based on Llama3 8B and siglip-so400m, presented as state-of-the-art (SOTA). It is an open-source VLLM (Visual Language Multi-Modal Learning Model): the model weights are available on Hugging Face, the inference code has been released, and it supports fast local inference. The model combines image recognition with text generation by adding a projection layer that maps image features into the Llama embedding space, so the language model can work with image content alongside text.
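The projection step described above can be illustrated with a minimal sketch. It is not the llama3v implementation: it assumes the projection is a single linear layer and that projected image tokens are placed before the text embeddings, neither of which the description confirms. The function names (`make_projection`, `project`, `build_input_sequence`) and the toy dimensions are hypothetical and stand in for the much larger vision and language model sizes.

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    """Create a randomly initialised linear map (assumed form of the projection layer)."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def project(features, weight, bias):
    """Map each image patch feature vector into the language model's embedding size."""
    return [
        [sum(w * x for w, x in zip(row, patch)) + b
         for row, b in zip(weight, bias)]
        for patch in features
    ]

def build_input_sequence(image_features, text_embeddings, weight, bias):
    """Place projected image tokens before the text embeddings (an assumed ordering)."""
    return project(image_features, weight, bias) + text_embeddings

# Toy sizes only; the real encoder and language model are far wider.
VISION_DIM = 6
LLM_DIM = 4

weight, bias = make_projection(VISION_DIM, LLM_DIM)
image_features = [[0.1 * (i + j) for j in range(VISION_DIM)] for i in range(3)]
text_embeddings = [[0.0] * LLM_DIM for _ in range(2)]

sequence = build_input_sequence(image_features, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))  # prints "5 4"
```

After projection, every image patch has the same width as a text token embedding, which is what lets the language model read both in one sequence.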
llama3v Visit Over Time

Monthly Visits: 488,643,166

Bounce Rate: 37.28%

Pages per Visit: 5.7

Visit Duration: 00:06:37