Phi-3.5-vision
An advanced multimodal model that supports image and text understanding.
CommonProductProgrammingMultimodalImage Understanding
Phi-3.5-vision is a lightweight, next-generation multimodal model developed by Microsoft. It is built on a dataset that includes synthetic data and curated publicly available websites, focusing on high-quality, dense reasoning data for both text and visual inputs. This model belongs to the Phi-3 family and has undergone rigorous enhancement processes, combining supervised fine-tuning with direct preference optimization to ensure precise instruction following and robust safety measures.
Phi-3.5-vision Visit Over Time
Monthly Visits
19075321
Bounce Rate
45.07%
Page per Visit
5.5
Visit Duration
00:05:32