ModernBERT-large
High-performance bidirectional encoder Transformer model
Common Product, Programming, BERT, Transformer
ModernBERT-large is a state-of-the-art bidirectional encoder Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data, with a native context length of up to 8,192 tokens. The model incorporates recent architectural improvements such as Rotary Positional Embeddings (RoPE) for long-context support, alternating local-global attention for efficiency on long inputs, and unpadding with Flash Attention for faster inference. ModernBERT-large is suitable for tasks that involve long documents, such as retrieval, classification, and semantic search within large corpora. Because the training data consists primarily of English and code, performance may be lower on other languages.
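As a minimal sketch of how such an encoder is typically used, the example below runs masked-token prediction with the Hugging Face transformers library. It assumes the checkpoint is published on the Hugging Face Hub as answerdotai/ModernBERT-large and that an installed transformers version includes ModernBERT support; adjust the repo id if yours differs.

```python
# Minimal masked-language-model sketch; "answerdotai/ModernBERT-large"
# is an assumed Hub repo id, not confirmed by this page.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and take the highest-scoring vocabulary token.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same loading pattern extends to the model's downstream uses named above: swapping AutoModelForMaskedLM for AutoModelForSequenceClassification gives a classification head, and pooled hidden states can serve as embeddings for retrieval or semantic search.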
ModernBERT-large Visit Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57