GLEE
A General Object Foundation Model for Images and Videos
GLEE is a general object foundation model that locates and recognizes objects in images and videos within a single unified framework, so it can be applied to a variety of object perception tasks. It learns a general object representation through joint training on data from diverse sources with different levels of supervision. While maintaining state-of-the-art performance, it transfers zero-shot to new data and tasks and generalizes effectively. It also scales well and is robust.