Florence-2

A unified foundation model for visual tasks.

Premium · New Product · Productivity · Vision Model · Multi-task Learning
Florence-2 is a visual foundation model that handles a range of computer vision and vision-language tasks through a unified, prompt-based representation. It accepts text prompts as task instructions and generates results in textual form, whether the task is image description, object detection, localization, or segmentation. This multi-task learning setup requires large-scale, high-quality annotated data. To this end, we jointly developed FLD-5B, a dataset of 5.4 billion comprehensive visual annotations on 126 million images, built with an iterative strategy of automated image annotation and model refinement. Florence-2 is trained with a sequence-to-sequence architecture, which lets it perform diverse and comprehensive visual tasks. Extensive evaluations show that Florence-2 is a strong competitor among visual foundation models, with unprecedented zero-shot and fine-tuning capabilities.
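Because results are produced as text, a spatial task such as object detection has to be encoded in the output string and decoded afterwards. The sketch below illustrates one way this could work. It assumes (this is not stated above) that each box is written as a label followed by four location tokens of the form `<loc_N>`, with coordinates quantized into 1000 bins per axis. The function name `parse_detections` and the sample string are hypothetical, and the model's actual post-processing may differ, for example by mapping a bin to its center rather than its lower edge.

```python
import re

# Assumption: coordinates are quantized into this many bins per axis.
NUM_BINS = 1000

# Assumption: each detection is "label<loc_x1><loc_y1><loc_x2><loc_y2>".
_DETECTION = re.compile(r"([^<]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def parse_detections(text, width, height, num_bins=NUM_BINS):
    """Decode a text-encoded detection result into labels and pixel boxes.

    Hypothetical helper: maps each bin index to the lower edge of its bin,
    scaled to the given image size.
    """
    results = []
    for match in _DETECTION.finditer(text):
        label = match.group(1).strip()
        x1, y1, x2, y2 = (int(g) for g in match.groups()[1:])
        box = (
            x1 * width / num_bins,
            y1 * height / num_bins,
            x2 * width / num_bins,
            y2 * height / num_bins,
        )
        results.append({"label": label, "box": box})
    return results


# Made-up output string for a 640x480 image, for illustration only.
sample = "car<loc_100><loc_200><loc_500><loc_800>person<loc_0><loc_0><loc_250><loc_500>"
detections = parse_detections(sample, width=640, height=480)
for det in detections:
    print(det["label"], det["box"])
```

The same text-in, text-out interface covers tasks without coordinates as well: a description task would simply return the generated sentence with no location tokens to decode.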

Florence-2 Visit Over Time

Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32
