Born for Complex Visual Reasoning! Microsoft Releases Phi-3.5-vision Lightweight, Multimodal Open Source Model

AIbase基地

Published inAI News · 4 min read · Aug 21, 2024

416

Microsoft has recently released Phi-3.5-vision, a lightweight, multimodal open-source AI model, which is the newest member of the Phi-3 model family designed for applications that require simultaneous processing of text and visual inputs. The Phi-3.5-vision model performs exceptionally well in environments with limited memory or computational resources, supporting a context length of 128K, making it an ideal choice for both commercial and research sectors.

The Phi-3.5-vision model offers a wide range of functionalities including extensive image understanding, optical character recognition (OCR), chart and table parsing, and summarization of multiple images or video clips. It has demonstrated significant performance improvements in benchmark tests related to image and video processing.

Comprising a system with 4.2 billion parameters, the Phi-3.5-vision model includes an image encoder, connector, projector, and the Phi-3Mini language model. It is trained using high-quality educational data, synthetic data, and rigorously screened public documents to ensure data quality and privacy.

Phi-3.5-vision includes three models:

Phi-3.5Mini Instruct: A lightweight AI model suitable for environments with limited memory or computational resources.

Phi-3.5MoE (Mixture of Experts): Microsoft's first "mixture of experts" model, adept at handling complex tasks.

Phi-3.5Vision Instruct: A multimodal model that integrates text and image processing capabilities.

Key Features

The main features of the Phi-3.5-vision model include image understanding, OCR, chart and table comprehension, multi-image comparison, summarization of multiple images or video clips, efficient inference capabilities, and low latency with memory optimization.

Phi-3.5-vision has performed excellently in multiple benchmark tests such as MMMU, MMBench, TextVQA, video processing capability tests, and the BLINK benchmark, showcasing its robust performance in multimodal and visual tasks.

The release of Microsoft's Phi-3.5-vision model brings new options to the AI field, particularly in edge-side operations and complex visual reasoning. Its open-source nature and optimized design allow it to perform exceptionally well in resource-constrained environments, providing strong support for a variety of AI-driven applications.

Model download link: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

Phi-3.5-vision Open Source AI Model Optical Character Recognition Image Understanding

This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to exploring the world of artificial intelligence. Every day, we present you with hot topics in the AI field, focusing on developers, helping you understand technical trends, and learning about innovative AI product applications.

—— Created by the AIbase Daily Team

AI News Recommendations

Lenovo Tianshi AI Pro Launch: A Trustworthy AI Partner for Government and Enterprises

Lenovo launches Tianshi AI Pro, positioned as an AI partner for government and enterprise office work, promoting the Xinchuang industry into the AI era. The product shifts the operational logic from "tool-centric" to "task-centric," and is deeply integrated with the Kylin operating system, offering a "dual interface," allowing users to switch conveniently by swiping with four fingers.

Apr 17, 2026

370

Miniso Establishes AI Innovation Department: Focused on Intelligent Agent R&D and Global Site Selection Algorithm Optimization

Miniso establishes an AI Innovation Department, which belongs to the Digital Technology Center, aiming to promote the intelligent upgrade of global business decisions and internal collaboration through intelligent agent technology, focusing on the intelligentization of business decision-making and the construction of core capabilities of intelligent agents.

Apr 17, 2026

190

ChatGPT Users Exceed 1 Billion, Female Users Account for Over 50% for the First Time

According to OpenAI data, ChatGPT's global weekly active users will exceed 1 billion, with a significant change in user structure. The proportion of female users increased from 20% at the beginning to over 50%, for the first time surpassing males, with about 500 million women using it regularly. This reflects that AI technology is accelerating its popularization.

Apr 17, 2026

250

Cerebras and OpenAI Sign 20 Billion Dollar Chip Agreement Plan for IPO

AI chip company Cerebras has reached a major three-year deal worth over $1 billion with OpenAI, doubling the scale of the agreement from the beginning of the year, showing OpenAI's high trust in its technology. OpenAI has committed to provide approximately $1 billion in support for Cerebras to develop data center systems and has obtained a maximum of 10% of minority equity warrants, deepening the strategic cooperation.

Apr 17, 2026

230

iFLYTEK Launches the Upgraded Version of AstronClaw: Introduces 9 New Products and a Hardware-Software Integrated AI Agent Architecture

iFLYTEK launches the upgraded version of AstronClaw, introducing 9 new products and showcasing the hardware-software integrated "AI Agent" architecture. This architecture drives AI from a "dialogue assistant" to a "physical execution hub," aiming to break through screen limitations and bring large model capabilities into the physical world and complex business processes. In the office field, AstronClaw integrates with iFLYTEK Office Book to structure and process fragmented work information.

Apr 17, 2026

370

AI Daily: Claude Opus 4.7 Released; Alibaba Open Sources Qwen3.6-35B-A3B; Perplexity Launches AI Assistant for Mac

Welcome to the [AI Daily] column! This is your guide to exploring the world of artificial intelligence every day. Every day, we present you with the latest content in the AI field, focusing on developers, helping you understand technical trends and innovative AI product applications. Discover new AI products: https://app.aibase.com/zh1, ClaudeOpus4.7 officially released: What matters more than being smart is being reliable. The release of ClaudeOpus4.7 marks Anthropic's progress in AI model reliability.

Apr 17, 2026

830

OpenAI Launches GPT-Rosalind Model, Deeply Crossing into the Field of Pharmaceutical and Life Sciences

OpenAI launches GPT-Rosalind, an AI model for life sciences named after DNA pioneer, designed to accelerate drug discovery by analyzing biochemical data to aid in evidence synthesis, hypothesis generation, experimental planning, and protein engineering, enhancing lab efficiency and medical application.....

Apr 17, 2026

330

Starbucks Introduces ChatGPT to Recommend Drinks Based on Mood

Starbucks is testing a smart ordering application based on ChatGPT, allowing users to get personalized drink recommendations by entering their mood or needs, aiming to enhance the consumer experience.

Apr 17, 2026

240

Google Gemini Integrates with Personal Photo Albums, AI-Generated Images Move Toward True Personalization

Google's Gemini AI now includes Personal Intelligence, linking to Google Photos to auto-generate personalized images from private albums without manual uploads. With Nano Banana, users can easily create custom content like animated family portraits, enhancing AI response personalization and convenience.....

Apr 17, 2026

340

NVIDIA Releases Lyra 2.0: Generate 90-Meter 3D Environments from a Single Photo, Outperforming Competitors in Multiple Metrics

NVIDIA released the Lyra 2.0 system, which can generate large-scale, highly coherent 3D virtual environments extending up to 90 meters from a single photo, solving issues of image distortion in long-distance camera paths. This technological breakthrough marks significant progress in AI's understanding of 3D spaces and real-time environment simulation, especially meeting the urgent demand for high-quality virtual scenes in embodied intelligence training.

Apr 17, 2026

390

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

GEO Brand Visibility

AI Visibility Audit

AI Search Visibility Checker

GEO Promotion Link Detection

GEO Ranking Optimization System

GEO Services​

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

LLM API Hub

AI Models Finder

Model Providers

LLM Leaderboard

Compare LLMs

LLM Cost Calculator

LLM Arena

AI Model Compatibility Checker

AI Deployment Calculator

Born for Complex Visual Reasoning! Microsoft Releases Phi-3.5-vision Lightweight, Multimodal Open Source Model

AIbase基地

This article is from AIbase Daily

AI News Recommendations

Lenovo Tianshi AI Pro Launch: A Trustworthy AI Partner for Government and Enterprises

Miniso Establishes AI Innovation Department: Focused on Intelligent Agent R&D and Global Site Selection Algorithm Optimization

ChatGPT Users Exceed 1 Billion, Female Users Account for Over 50% for the First Time

Cerebras and OpenAI Sign 20 Billion Dollar Chip Agreement Plan for IPO

iFLYTEK Launches the Upgraded Version of AstronClaw: Introduces 9 New Products and a Hardware-Software Integrated AI Agent Architecture

AI Daily: Claude Opus 4.7 Released; Alibaba Open Sources Qwen3.6-35B-A3B; Perplexity Launches AI Assistant for Mac

OpenAI Launches GPT-Rosalind Model, Deeply Crossing into the Field of Pharmaceutical and Life Sciences

Starbucks Introduces ChatGPT to Recommend Drinks Based on Mood

Google Gemini Integrates with Personal Photo Albums, AI-Generated Images Move Toward True Personalization

NVIDIA Releases Lyra 2.0: Generate 90-Meter 3D Environments from a Single Photo, Outperforming Competitors in Multiple Metrics

AI News Recommendations

Lenovo Tianshi AI Pro Launch: A Trustworthy AI Partner for Government and Enterprises

Miniso Establishes AI Innovation Department: Focused on Intelligent Agent R&D and Global Site Selection Algorithm Optimization

ChatGPT Users Exceed 1 Billion, Female Users Account for Over 50% for the First Time

Cerebras and OpenAI Sign 20 Billion Dollar Chip Agreement Plan for IPO

iFLYTEK Launches the Upgraded Version of AstronClaw: Introduces 9 New Products and a Hardware-Software Integrated AI Agent Architecture

AI Daily: Claude Opus 4.7 Released; Alibaba Open Sources Qwen3.6-35B-A3B; Perplexity Launches AI Assistant for Mac

OpenAI Launches GPT-Rosalind Model, Deeply Crossing into the Field of Pharmaceutical and Life Sciences

Starbucks Introduces ChatGPT to Recommend Drinks Based on Mood

Google Gemini Integrates with Personal Photo Albums, AI-Generated Images Move Toward True Personalization

NVIDIA Releases Lyra 2.0: Generate 90-Meter 3D Environments from a Single Photo, Outperforming Competitors in Multiple Metrics

GEO Services