Florence-2
A unified foundation model for visual tasks.
Premium, New Product, Productivity, Vision Model, Multi-task Learning
Florence-2 is a novel visual foundation model that handles a variety of computer vision and vision-language tasks through a unified, prompt-based representation. It is designed to accept text prompts as task instructions and to generate the expected results in text form, whether the task is image description, object detection, localization, or segmentation. This multi-task learning setup requires large-scale, high-quality annotated data. To this end, we co-developed FLD-5B, which contains 5.4 billion comprehensive visual annotations across 126 million images, using an iterative strategy of automated image annotation and model refinement. We trained Florence-2 with a sequence-to-sequence structure, enabling it to perform diverse and comprehensive visual tasks. Extensive evaluations show that Florence-2 is a strong competitor among visual foundation models, with unprecedented zero-shot and fine-tuning capabilities.
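To make "results in textual format" concrete for a task like object detection, here is a minimal sketch of how a bounding box could be written as text and read back. It assumes coordinates are quantized into 1,000 bins and spelled as `<loc_N>` tokens; the bin count, the token spelling, and all function names below are illustrative assumptions, not the model's confirmed internals.

```python
import re

NUM_BINS = 1000  # assumed quantization resolution (illustrative)


def box_to_tokens(box, width, height, num_bins=NUM_BINS):
    """Quantize a pixel-space (x1, y1, x2, y2) box into location tokens."""
    x1, y1, x2, y2 = box

    def quantize(value, size):
        # Map a coordinate in [0, size] to an integer bin in [0, num_bins - 1].
        return min(num_bins - 1, max(0, int(value * num_bins / size)))

    bins = (
        quantize(x1, width),
        quantize(y1, height),
        quantize(x2, width),
        quantize(y2, height),
    )
    return "".join(f"<loc_{b}>" for b in bins)


def tokens_to_box(text, width, height, num_bins=NUM_BINS):
    """Recover an approximate pixel-space box from four location tokens."""
    bins = [int(b) for b in re.findall(r"<loc_(\d+)>", text)]
    if len(bins) != 4:
        raise ValueError("expected exactly four location tokens")

    def dequantize(b, size):
        # Use the centre of the bin as the reconstructed coordinate.
        return (b + 0.5) / num_bins * size

    x1, y1, x2, y2 = bins
    return (
        dequantize(x1, width),
        dequantize(y1, height),
        dequantize(x2, width),
        dequantize(y2, height),
    )


def format_detection(label, box, width, height):
    """Render one detection as a plain text string: label plus location tokens."""
    return label + box_to_tokens(box, width, height)
```

For example, `format_detection("cat", (64, 48, 320, 240), 640, 480)` yields a single string that a sequence-to-sequence model could emit like any other text, and `tokens_to_box` turns it back into pixel coordinates with an error of at most one bin width.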
Florence-2 Visits Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57