Florence-2-base-ft
A prompt-driven vision foundation model for a broad range of vision and vision-language tasks
Tags: Common Product, Image, Image Processing, Vision-Language Model
Florence-2 is a vision foundation model developed by Microsoft that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Given a simple text prompt, it can perform tasks such as image captioning, object detection, and segmentation. It was trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, so that a single model learns many tasks at once. Its sequence-to-sequence architecture performs well in both zero-shot and fine-tuned settings, making it competitive with other vision foundation models.
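The prompt-based approach described above can be illustrated with a minimal sketch. It assumes the model is loaded through the Hugging Face `transformers` library under the hub id `microsoft/Florence-2-base-ft`, and that the task tokens match those on the published model card; the helper names `build_prompt` and `run_task` are hypothetical and exist only for this example.

```python
# Task tokens as listed on the Florence-2 model card (assumed still current).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "referring_expression_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task, text_input=None):
    """Return the text prompt for a task (hypothetical helper).

    Most tasks use the bare task token; a few, such as referring
    expression segmentation, append free text after the token.
    """
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    return token if text_input is None else token + text_input


def run_task(image, task, text_input=None,
             model_id="microsoft/Florence-2-base-ft"):
    """Run one task on a PIL image (hypothetical helper).

    Downloads the model on first use, so the heavy import stays local.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    # The model ships custom code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")

    # Sequence-to-sequence decoding: the answer is generated as text tokens.
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    generated_text = processor.batch_decode(
        generated_ids, skip_special_tokens=False
    )[0]

    # Convert the raw text into task-specific structures (boxes, labels, ...).
    return processor.post_process_generation(
        generated_text,
        task=TASK_PROMPTS[task],
        image_size=(image.width, image.height),
    )
```

With Pillow installed, a call might look like `run_task(Image.open("example.jpg"), "object_detection")`, where `example.jpg` is a placeholder path. Only the task token changes between tasks; the model and decoding loop stay the same, which is what lets one sequence-to-sequence model cover captioning, detection, and segmentation.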
Florence-2-base-ft Visits Over Time
Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32