ModernBERT-large

High-performance bidirectional encoder Transformer model

Tags: Programming, BERT, Transformer
ModernBERT-large is a state-of-the-art bidirectional encoder Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data, with a native context length of up to 8,192 tokens. The model incorporates recent architectural improvements such as Rotary Positional Embeddings (RoPE) for long-context support, alternating local-global attention for efficiency on long inputs, and unpadding with Flash Attention for fast inference. ModernBERT-large is well suited to tasks involving long documents, such as retrieval, classification, and semantic search within large corpora. Because the training data consists primarily of English and code, performance on other languages may be lower.
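As a rough sketch of how the model can be used, the snippet below runs masked-token prediction with the Hugging Face transformers library. It assumes the checkpoint is published on the Hub as answerdotai/ModernBERT-large (the identifier used in the official release) and that a recent transformers version with ModernBERT support is installed.

```python
# Minimal masked-token inference sketch; assumes the checkpoint
# "answerdotai/ModernBERT-large" on the Hugging Face Hub and a
# transformers version that includes ModernBERT support.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring token there.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected to print something like " Paris"
```

The same encoder can be plugged into downstream heads (e.g. AutoModelForSequenceClassification) for the classification and retrieval use cases described above.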

ModernBERT-large Visit Over Time

Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57

Charts: ModernBERT-large visit trend, visit geography, and traffic sources

ModernBERT-large Alternatives