EAGLE

Exploration of the design space for multimodal large language models

EAGLE is a family of high-resolution, vision-centric multimodal large language models (LLMs). It aims to strengthen the perception of multimodal LLMs by combining a mixture of vision encoders with varied input resolutions. The models use a 'CLIP+X' fusion based on channel concatenation, which accommodates vision experts with different architectures (ViT/ConvNets) and pretraining domains (detection/segmentation/OCR/SSL). The EAGLE family supports input resolutions above 1K and reports strong results on multimodal LLM benchmarks, particularly on resolution-sensitive tasks such as optical character recognition and document understanding.
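
To make the 'CLIP+X' idea concrete, here is a minimal sketch of channel-wise fusion in plain Python. It is an illustration only, not EAGLE's actual code: the function name `channel_concat`, the toy token values, and the assumption that every encoder's output has already been aligned to the same number of tokens are all hypothetical.

```python
from typing import List

# One encoder's output: a list of tokens, each token a list of channel values.
Tokens = List[List[float]]


def channel_concat(features: List[Tokens]) -> Tokens:
    """Fuse encoder outputs by concatenating them along the channel axis.

    Assumes (hypothetically) that all encoders already yield the same
    number of tokens, e.g. after resizing their feature maps.
    """
    if not features:
        raise ValueError("need at least one encoder output")
    num_tokens = len(features[0])
    for tokens in features:
        if len(tokens) != num_tokens:
            raise ValueError("all encoders must yield the same number of tokens")
    # For token i, append the channels of every encoder in order.
    return [
        [value for tokens in features for value in tokens[i]]
        for i in range(num_tokens)
    ]


# Toy data: 2 tokens per encoder, 3 channels from "CLIP", 2 from expert "X".
clip_tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
expert_tokens = [[1.0, 2.0], [3.0, 4.0]]

fused = channel_concat([clip_tokens, expert_tokens])
print(len(fused), len(fused[0]))  # token count, fused channel width
```

The token count stays the same while the channel width grows to the sum of the encoders' widths, so adding another vision expert does not lengthen the visual token sequence passed on to the language model.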

EAGLE Alternatives

Large World Models — Large World Models: Understanding Video and Language

Models Table — A comprehensive list and information about large language models

CuMo — An advanced architecture for extending multimodal large language models (LLMs).

LongLLaVA — Efficiently extending multimodal large language models to 1,000 images.

MM1.5 — Optimization and analysis of multimodal large language models

FP6-LLM — Efficiently serving large language models

lmms-finetune — A unified codebase for fine-tuning large multimodal models.

Phi Open Models — Phi Open Models are powerful, cost-effective, low-latency small language models.

BiTA — Bidirectional Adjustment for Large Language Models

Zhipu AI Large Model Open Platform — Integrate large models with just a few lines of code.

Apollo-LMMs — Exploration of Video Understanding in Large Multimodal Models

LLM Maybe LongLM — Extends the context window of large language models

VSP-LLM — A framework that combines Visual Speech Processing with Large Language Models

Prompt Engineering Guide — A comprehensive guide to prompt engineering for large language models

InternVL2_5-38B — Advanced Multimodal Large Language Model Series

Benchmarking API Performance of Large Language Models — In-depth analysis of key metrics like TTFT and TPS

Open LLM Leaderboard — A publicly accessible leaderboard of large language models.

Brainglue — Brainglue is an interesting experimental platform for large language models

Multimodal-Maestro — More effectively prompt large multimodal models to unlock their potential.

DataBonsai — A Python library for data cleaning and organization using Large Language Models (LLMs).

OpenAI Embedding Models — New generation embedding models with improved performance and lower prices.

parsera — A lightweight Python library for web scraping using large language models.

NVLM 1.0 — A cutting-edge multimodal large language model that achieves state-of-the-art performance on visual-language tasks.

Nous Research — Leader in human-centric language models and simulators

xLAM — Research on intelligent agents based on large language models

LLMs-from-scratch — Deep dive into the inner workings of large language models.

AutoDAN-Turbo — An automated framework for breaking the limitations of large language models

DCLM — Comprehensive framework for building and training large language models