Phi-3.5-vision

An advanced multimodal model that supports image and text understanding.

CommonProductProgrammingMultimodalImage Understanding
Phi-3.5-vision is a lightweight, next-generation multimodal model developed by Microsoft. It is built on a dataset that includes synthetic data and curated publicly available websites, focusing on high-quality, dense reasoning data for both text and visual inputs. This model belongs to the Phi-3 family and has undergone rigorous enhancement processes, combining supervised fine-tuning with direct preference optimization to ensure precise instruction following and robust safety measures.
Visit

Phi-3.5-vision Visit Over Time

Monthly Visits

17104189

Bounce Rate

44.67%

Page per Visit

5.5

Visit Duration

00:05:49

Phi-3.5-vision Visit Trend

Phi-3.5-vision Visit Geography

Phi-3.5-vision Traffic Sources

Phi-3.5-vision Alternatives