EAGLE
Exploration of the design space for multimodal large language models
Common Product, Programming, Multimodal Learning, Large Language Models
EAGLE is a family of high-resolution, vision-centric multimodal large language models (multimodal LLMs) that strengthens visual perception by combining a mixture of vision encoders with varied input resolutions. The models use a 'CLIP+X' fusion based on channel concatenation, which accommodates vision experts built on different architectures (ViT/ConvNets) and trained for different domains (detection/segmentation/OCR/SSL). The EAGLE family supports input resolutions above 1K and performs strongly on multimodal LLM benchmarks, particularly on resolution-sensitive tasks such as optical character recognition and document understanding.
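To make the 'CLIP+X' idea concrete, here is a minimal sketch of channel-wise fusion on toy data. It assumes every encoder has already been brought to the same number of visual tokens, and it reads the fusion as concatenation of per-token features along the channel dimension. The function name `channel_concat` and the toy feature values are hypothetical and are not taken from the EAGLE codebase.

```python
from typing import List

# One encoder's output: a list of tokens, each a list of channel values.
Tokens = List[List[float]]


def channel_concat(features: List[Tokens]) -> Tokens:
    """Fuse per-encoder token features by concatenating along the channel axis.

    Assumption: all encoders yield the same number of tokens (for example
    after resizing their feature maps to a shared grid, which is not shown).
    """
    if not features:
        raise ValueError("need at least one encoder output")
    num_tokens = len(features[0])
    for encoder_tokens in features:
        if len(encoder_tokens) != num_tokens:
            raise ValueError("all encoders must yield the same number of tokens")

    fused: Tokens = []
    for i in range(num_tokens):
        row: List[float] = []
        for encoder_tokens in features:
            row.extend(encoder_tokens[i])  # append this encoder's channels
        fused.append(row)
    return fused


# Toy data: 2 tokens; a stand-in "CLIP" encoder with 3 channels and a
# stand-in "X" expert with 2 channels.
clip_tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
expert_tokens = [[1.0, 2.0], [3.0, 4.0]]

fused = channel_concat([clip_tokens, expert_tokens])
print(len(fused), len(fused[0]))
```

In this sketch the token count stays fixed while the channel width grows with each added expert, so the fused features can differ in width across encoder combinations without lengthening the visual token sequence passed to the language model.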
EAGLE Visit Over Time
Monthly Visits: 494758773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29