On January 15, 2025, MiniMax announced the open-source release of its new model series, MiniMax-01, which includes the foundational language model MiniMax-Text-01 and the visual multimodal model MiniMax-VL-01. The MiniMax-01 series features bold architectural innovations, achieving the first large-scale implementation of linear attention mechanisms and breaking through the limitations of the traditional Transformer architecture. With a staggering 456 billion total parameters and 45.9 billion activated per token, its overall performance is comparable to leading overseas models, and it efficiently handles contexts of up to 4 million tokens, 32 times the context length of GPT-4o and 20 times that of Claude-3.5-Sonnet.

MiniMax believes that 2025 will be a pivotal year for the rapid development of agents. Whether for single-agent or multi-agent systems, longer contexts are needed to support persistent memory and extensive communication. The launch of the MiniMax-01 series models is a step towards meeting this demand and establishing the foundational capabilities for complex agents.


Thanks to architectural innovations, efficiency optimizations, and an integrated design for training and inference, MiniMax is able to offer API services for text and multimodal understanding at the industry's lowest price range, with standard pricing set at 1 RMB per million input tokens and 8 RMB per million output tokens. The MiniMax open platform and its international version are now live for developers to experience and use.
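
To put this pricing in concrete terms, the sketch below estimates the cost of a single request from its token counts. Only the per-million-token prices come from the announcement above; the function name and the example token counts are illustrative assumptions, not part of MiniMax's API.

```python
# Rough cost estimate based on the published pricing:
# 1 RMB per million input tokens, 8 RMB per million output tokens.
INPUT_PRICE_RMB_PER_M = 1.0
OUTPUT_PRICE_RMB_PER_M = 8.0

def estimate_cost_rmb(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in RMB for one request (hypothetical helper)."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_RMB_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_RMB_PER_M

# Example: a long-context request with 4 million input tokens
# and 2,000 generated tokens.
print(f"{estimate_cost_rmb(4_000_000, 2_000):.3f} RMB")  # 4.016 RMB
```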

The MiniMax-01 series models have been open-sourced on GitHub and will continue to receive updates. In mainstream industry evaluations for text and multimodal understanding, the MiniMax-01 series has matched the performance of recognized advanced models such as GPT-4o-1120 and Claude-3.5-Sonnet-1022 on most tasks. Notably, in long document tasks, MiniMax-Text-01 shows the slowest performance degradation as input length increases, significantly outperforming Google’s Gemini model.

MiniMax's models are highly efficient when processing long inputs, approaching linear complexity. In the architecture, 7 out of every 8 layers use linear attention based on Lightning Attention, while the remaining layer employs traditional softmax attention; this is the first time in the industry that a linear attention mechanism has been scaled up to the level of commercial models. MiniMax conducted a comprehensive assessment across scaling laws, integration with MoE, structural design, training optimization, and inference optimization, and rebuilt its training and inference systems accordingly, including more efficient MoE all-to-all communication, optimizations for longer sequences, and efficient kernel implementations of linear attention for inference.
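
The following is a minimal sketch of the 7:1 interleaving described above. The layer count and names are placeholders rather than MiniMax's actual implementation; only the pattern of seven linear-attention layers followed by one softmax-attention layer reflects the description.

```python
# Minimal sketch of the hybrid attention layout: every 8th layer uses
# standard softmax attention, the other 7 use linear (Lightning) attention.
NUM_LAYERS = 80  # hypothetical depth, not the actual MiniMax-Text-01 value

def attention_type(layer_idx: int) -> str:
    """Return which attention variant a given transformer layer uses."""
    # Layers at positions 8, 16, 24, ... fall back to softmax attention;
    # all other layers use the linear Lightning Attention variant.
    return "softmax" if (layer_idx + 1) % 8 == 0 else "lightning_linear"

layout = [attention_type(i) for i in range(NUM_LAYERS)]
assert layout.count("softmax") == NUM_LAYERS // 8
print(layout[:8])  # seven 'lightning_linear' layers followed by one 'softmax'
```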

In most academic benchmarks, the MiniMax-01 series has achieved results on par with the top tier of overseas models. It has notably excelled in long-context evaluation sets, such as the 4-million-token Needle-In-A-Haystack retrieval task. In addition to academic datasets, MiniMax has also developed assistant-scenario test sets based on real-world data, where MiniMax-Text-01 performed exceptionally well. In multimodal understanding test sets, MiniMax-VL-01 also demonstrated strong performance.
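
For readers unfamiliar with the Needle-In-A-Haystack setup, the sketch below shows how such a retrieval test is typically constructed: a single "needle" fact is buried at a chosen depth inside long filler text and the model is asked to retrieve it. The filler sentence, needle, and query here are illustrative and not MiniMax's actual evaluation harness.

```python
# Illustrative needle-in-a-haystack prompt construction.
FILLER = "The sky was clear and the market was quiet that day. "
NEEDLE = "The secret passphrase is 'violet-harbor-42'."
QUERY = "What is the secret passphrase mentioned in the text?"

def build_prompt(num_filler_sentences: int, needle_depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * num_filler_sentences
    insert_at = int(needle_depth * num_filler_sentences)
    sentences.insert(insert_at, NEEDLE + " ")
    return "".join(sentences) + "\n\n" + QUERY

# Increase num_filler_sentences to push the context toward longer regimes.
prompt = build_prompt(num_filler_sentences=50_000, needle_depth=0.5)
print(len(prompt), "characters in the haystack prompt")
```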

Open-source link: https://github.com/MiniMax-AI