Florence-2-base
An advanced visual foundation model that supports various visual and vision-language tasks.
Tags: Common Product, Image, Visual Model, Multi-Task Learning
Florence-2 is a visual foundation model developed by Microsoft that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Given a simple text prompt, it performs tasks such as captioning, object detection, and segmentation. It is trained with multi-task learning on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images. Its sequence-to-sequence architecture supports strong performance in both zero-shot and fine-tuned settings.
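To make the prompt-based approach concrete, here is a minimal sketch of selecting a task by text prompt and running the model through Hugging Face transformers. It assumes the task tokens and the loading pattern shown on the Florence-2 model card (`trust_remote_code=True`, `post_process_generation`), which may change between releases. The names `TASK_PROMPTS`, `build_prompt`, and `run_florence2` are hypothetical helpers introduced only for this illustration.

```python
# Task tokens as listed on the Florence-2 model card (assumed unchanged).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "object_detection": "<OD>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task, text_input=""):
    """Return the text prompt for a task; some tasks append extra text."""
    return TASK_PROMPTS[task] + text_input


def run_florence2(image, task, text_input="", model_id="microsoft/Florence-2-base"):
    """Run one task on a PIL image and return the parsed result.

    Hypothetical wrapper: downloads the model on first call and runs on CPU.
    """
    # Imported lazily so the prompt helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    # The same model handles every task; only the prompt changes.
    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    generated_text = processor.batch_decode(
        generated_ids, skip_special_tokens=False
    )[0]

    # Convert the generated token sequence into task-specific output
    # (caption string, boxes with labels, or polygons).
    return processor.post_process_generation(
        generated_text,
        task=TASK_PROMPTS[task],
        image_size=(image.width, image.height),
    )
```

With a PIL image loaded as `image`, a call such as `run_florence2(image, "object_detection")` would request detection, while `run_florence2(image, "segmentation", "a green car")` would request a mask for the named object. Only the prompt differs between the two calls, which is the point of the sequence-to-sequence design.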
Florence-2-base Visits Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32