Researchers from Huazhong University of Science and Technology, ByteDance, and Johns Hopkins University have introduced GLEE, a universal object-level foundation model that overcomes the limitations of current visual foundation models and opens new possibilities for image and video analysis. GLEE performs well across a wide range of object perception tasks, demonstrating strong flexibility and generalization, and it is particularly effective in zero-shot transfer scenarios. The model is trained on diverse data sources, including a large amount of automatically labeled data, enabling it to provide accurate and general object-level information. Future research directions include improving its handling of complex scenes and long-tail distributed data to further enhance its adaptability.