Microsoft Launches Latest Vision Foundation Model Florence-2 for Local Browser Operation

AIbase

Published in AI News · 4 min read · Jun 27, 2024

Microsoft's latest vision foundation model, Florence-2, can now run 100% locally in browsers that support WebGPU, using Transformers.js. This is a significant change for AI vision applications: powerful visual recognition can run directly in the user's browser without relying on remote servers.
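Whether a given browser can take this path depends on WebGPU being exposed to the page. A minimal TypeScript sketch of the usual feature check follows. `navigator.gpu` and `requestAdapter()` come from the WebGPU API; the helper names and the `NavigatorLike` type are illustrative assumptions and are not part of Transformers.js.

```typescript
// Minimal shapes so the sketch compiles without WebGPU type packages.
// Only the members used here are declared (hypothetical helper types).
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Cheap synchronous check: is the WebGPU entry point exposed at all?
function hasWebGPUApi(nav: NavigatorLike | undefined): boolean {
  return nav?.gpu !== undefined;
}

// Fuller check: the API can exist while no usable adapter is available,
// in which case requestAdapter() resolves to null.
async function canUseWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  const gpu = nav?.gpu;
  if (gpu === undefined) return false;
  try {
    const adapter = await gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any failure as "not available"
  }
}
```

In a page, the global `navigator` would be passed in (cast to `NavigatorLike` if the project's DOM typings do not include WebGPU), and a `false` result could be used to show a fallback message instead of starting the model download.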

Florence-2-base-ft is a vision foundation model with 230 million parameters that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. The model supports multiple functions, including but not limited to:

Image caption generation
Optical Character Recognition (OCR)
Object detection
Image segmentation
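In a prompt-based model of this kind, the task is selected by a special task token placed in the text input. The sketch below maps the tasks listed above to task tokens as given on the Florence-2 model card. The `Task` type and the `taskPrompt` helper are hypothetical names, the plain concatenation for segmentation is an assumption about prompt layout, and the tokens supported by a particular checkpoint should be checked against its model card.

```typescript
// Tasks from the list above (names are illustrative, not a library API).
type Task = "caption" | "ocr" | "objectDetection" | "segmentation";

// Task tokens as listed on the Florence-2 model card.
const TASK_TOKENS: Record<Task, string> = {
  caption: "<CAPTION>",
  ocr: "<OCR>",
  objectDetection: "<OD>",
  segmentation: "<REFERRING_EXPRESSION_SEGMENTATION>",
};

// Build the text prompt for a task. Referring-expression segmentation
// also needs a phrase describing what to segment; the other tasks
// take the bare token.
function taskPrompt(task: Task, text: string = ""): string {
  const token = TASK_TOKENS[task];
  if (task === "segmentation") {
    const phrase = text.trim();
    if (phrase === "") {
      throw new Error("segmentation needs a referring expression");
    }
    return token + phrase;
  }
  return token;
}
```

For example, `taskPrompt("objectDetection")` yields the bare detection token, while `taskPrompt("segmentation", "a green car")` appends the phrase to the segmentation token.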

The model occupies only 340MB of storage. Once loaded, it is cached in the browser, so users who revisit the page can run it without downloading it again. Most notably, the entire process runs locally in the user's browser, with no API calls to a server. Once the model is loaded, all features remain available even if the user disconnects from the internet.
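The download-once behaviour described above is a cache-first pattern. The following is a minimal sketch of that pattern against an injected cache and fetcher, so it can run outside a browser. `CacheLike`, `MemoryCache`, and `loadCached` are hypothetical names, and this is not claimed to be how Transformers.js stores model files. In a browser, the same shape could be backed by the standard Cache API (`caches.open`, `cache.match`, `cache.put`).

```typescript
// Smallest cache surface the pattern needs (hypothetical interface).
interface CacheLike {
  match(url: string): Promise<Uint8Array | undefined>;
  put(url: string, body: Uint8Array): Promise<void>;
}

// Anything that can download bytes for a URL.
type Fetcher = (url: string) => Promise<Uint8Array>;

// In-memory stand-in so the sketch can run without a browser.
class MemoryCache implements CacheLike {
  private store = new Map<string, Uint8Array>();

  async match(url: string): Promise<Uint8Array | undefined> {
    return this.store.get(url);
  }

  async put(url: string, body: Uint8Array): Promise<void> {
    this.store.set(url, body);
  }
}

// Return cached bytes if present; otherwise download once and remember them.
async function loadCached(
  url: string,
  cache: CacheLike,
  fetcher: Fetcher,
): Promise<Uint8Array> {
  const hit = await cache.match(url);
  if (hit !== undefined) return hit; // served locally, no network needed
  const body = await fetcher(url);
  await cache.put(url, body);
  return body;
}
```

A second call with the same URL resolves from the cache without invoking the fetcher, which is the property that allows continued use after the connection is dropped.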

Local operation of Florence-2 is made possible by 🤗 Transformers.js and ONNX Runtime Web. Running the model in the browser both strengthens user privacy and significantly reduces usage costs, paving the way for wider use of AI vision technology.

For developers and tech enthusiasts, the ONNX model of Florence-2 is already available on the Hugging Face platform; see https://huggingface.co/models?library=transformers.js&other=florence2 for details. The project's source code is also public on GitHub at https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu for further exploration and development.

Florence-2 running in the browser is likely to speed the development and adoption of AI vision applications. We can expect more browser-based intelligent vision applications to change how we live and work in the near future.

This article is from AIbase Daily

—— Created by the AIbase Daily Team
