Phi-3.5-vision
An advanced multimodal model that supports image and text understanding.
CommonProductProgrammingMultimodalImage Understanding
Phi-3.5-vision is a lightweight, next-generation multimodal model developed by Microsoft. It is built on a dataset that includes synthetic data and curated publicly available websites, focusing on high-quality, dense reasoning data for both text and visual inputs. This model belongs to the Phi-3 family and has undergone rigorous enhancement processes, combining supervised fine-tuning with direct preference optimization to ensure precise instruction following and robust safety measures.
Phi-3.5-vision Visit Over Time
Monthly Visits
20899836
Bounce Rate
46.04%
Page per Visit
5.2
Visit Duration
00:04:57