In the fast-moving field of artificial intelligence, a multi-modal large language model named ORYX is changing how we think about AI's ability to perceive the visual world. Developed jointly by researchers from Tsinghua University, Tencent, and Nanyang Technological University, it has been described as the 'Transformers' of visual processing. ORYX, whose full name is Oryx Multi-Modal Large Language Models, is designed specifically for spatio-temporal understanding of images, videos, and 3D scenes.