Researchers from Apple and the Swiss Federal Institute of Technology in Lausanne (EPFL) have jointly open-sourced 4M-21, a large-scale multimodal vision model. Unlike models optimized for a specific task or data type, 4M-21 is notably versatile and flexible. Despite having only 3 billion parameters, it handles a wide range of tasks, including image classification, object detection, semantic segmentation, instance segmentation, depth estimation, and surface normal estimation, among others.