Kosmos-2
A multi-modal large language model grounded in the visual world
Common Product, Productivity, Natural Language Processing, Multi-modal
Kosmos-2 is a multi-modal large language model that grounds natural language in visual inputs such as images and videos, linking phrases in text to the regions they describe. It supports tasks such as phrase grounding, referring expression comprehension, referring expression generation, image captioning, and visual question answering. Kosmos-2 is trained on GrIT (Grounded Image-Text pairs), a large-scale dataset of image-text pairs. Its main strength is connecting language to specific visual content, which improves performance on the grounding and referring tasks listed above.
Kosmos-2 Visits Over Time
Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29