ModernBERT-large
High-performance bidirectional encoder Transformer model
Common Product, Programming, BERT, Transformer
ModernBERT-large is a state-of-the-art bidirectional encoder Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data, with a native context length of up to 8,192 tokens. The model incorporates recent architectural improvements such as Rotary Positional Embeddings (RoPE) for long-context support, alternating local-global attention for efficiency on long inputs, and unpadding with Flash Attention for faster inference. ModernBERT-large is suitable for tasks that involve long documents, such as retrieval, classification, and semantic search within large corpora. Because the training data consists primarily of English and code, performance may be lower on other languages.
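As a minimal sketch of how such an encoder is typically used, the example below runs masked-token prediction with the Hugging Face transformers library. It assumes the checkpoint is published on the Hugging Face Hub as answerdotai/ModernBERT-large and that an installed transformers version includes ModernBERT support; adjust the repo id if yours differs.

```python
# Minimal masked-language-model sketch; "answerdotai/ModernBERT-large"
# is an assumed Hub repo id, not confirmed by this page.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and take the highest-scoring vocabulary token.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same loading pattern extends to the model's downstream uses named above: swapping AutoModelForMaskedLM for AutoModelForSequenceClassification gives a classification head, and pooled hidden states can serve as embeddings for retrieval or semantic search.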
ModernBERT-large Visit Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57