EAGLE

Exploration of the design space for multimodal large language models

Common Product · Programming · Multimodal Learning · Large Language Models
EAGLE is a family of vision-centric, high-resolution multimodal large language models (LLMs) that strengthens perception by combining a mixture of vision encoders with varied input resolutions. The models use a 'CLIP+X' fusion based on channel concatenation, which accommodates vision experts with different architectures (ViT/ConvNets) and different training tasks (detection/segmentation/OCR/SSL). The EAGLE family supports input resolutions above 1K and performs strongly on multimodal LLM benchmarks, particularly on resolution-sensitive tasks such as optical character recognition and document understanding.
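
The 'CLIP+X' fusion described above can be illustrated with a minimal sketch: features from a CLIP encoder and from one additional vision expert ("X") are joined along the channel axis, token by token. This is not EAGLE's actual code. The function name `fuse_by_channel_concat`, the token count, and the channel widths are all hypothetical, random arrays stand in for real encoder outputs, and the sketch assumes every encoder's output has already been resized to the same number of tokens.

```python
import numpy as np


def fuse_by_channel_concat(encoder_outputs):
    """Fuse per-token features from several vision encoders along the channel axis.

    Each array has shape (num_tokens, channels_i). The token counts must match,
    since the features are joined token by token.
    """
    token_counts = {feats.shape[0] for feats in encoder_outputs}
    if len(token_counts) != 1:
        raise ValueError("all encoders must produce the same number of tokens")
    return np.concatenate(encoder_outputs, axis=-1)


rng = np.random.default_rng(0)
num_tokens = 1024  # hypothetical number of visual tokens per image

# Random stand-ins for real encoder outputs (channel widths are assumptions).
clip_feats = rng.standard_normal((num_tokens, 1024))   # "CLIP" branch
expert_feats = rng.standard_normal((num_tokens, 768))  # "X" branch, e.g. an OCR expert

fused = fuse_by_channel_concat([clip_feats, expert_feats])
print(fused.shape)  # (1024, 1792)
```

Concatenating along channels keeps the number of visual tokens fixed no matter how many experts are added, so only the width of each token grows. In a full model, a projection layer would typically map the fused width to the language model's embedding size.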
EAGLE Visit Over Time

Monthly Visits: 515580771
Bounce Rate: 37.20%
Pages per Visit: 5.8
Visit Duration: 00:06:42
