MiniMax-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. It adopts a hybrid architecture that combines lightning attention, softmax attention, and Mixture of Experts (MoE). Leveraging advanced parallel strategies and novel computation-communication overlap techniques, such as Linear Attention Sequence Parallelism Plus (LASP+), variable-length ring attention, and Expert Tensor Parallelism (ETP), MiniMax-01 extends its training context length to 1 million tokens and can handle contexts of up to 4 million tokens during inference. It has demonstrated top-tier performance across multiple academic benchmarks.
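
To make the hybrid design concrete, below is a minimal, illustrative PyTorch sketch of how lightning-style (linear) attention layers, periodic softmax-attention layers, and a top-2 MoE feed-forward block could be interleaved. All module names, dimensions, the 1-in-8 softmax ratio, and the routing scheme here are assumptions for illustration only, not MiniMax-01's actual implementation.

```python
# Illustrative sketch (NOT MiniMax-01's real code): a hybrid stack that uses
# linear (lightning-style) attention in most layers, full softmax attention in
# every k-th layer, and a top-2 MoE feed-forward block. Sizes are toy values.
import torch
import torch.nn as nn
import torch.nn.functional as F


def linear_attention(q, k, v):
    # Linear attention replaces softmax(QK^T)V with phi(Q)(phi(K)^T V),
    # reducing cost from O(n^2 d) to O(n d^2) in sequence length n.
    q, k = F.elu(q) + 1, F.elu(k) + 1                 # positive feature map phi
    kv = torch.einsum("bnd,bne->bde", k, v)           # d x d key-value summary
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)


def softmax_attention(q, k, v):
    # Standard scaled dot-product attention, quadratic in sequence length.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v


class MoEFFN(nn.Module):
    # Token-level top-2 routing over a pool of expert MLPs.
    def __init__(self, d_model, d_ff, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        gates = self.router(x).softmax(dim=-1)            # (batch, seq, experts)
        weights, idx = gates.topk(self.top_k, dim=-1)     # top-2 experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask, None] * expert(x[mask])
        return out


class HybridBlock(nn.Module):
    def __init__(self, d_model, d_ff, use_softmax):
        super().__init__()
        self.use_softmax = use_softmax
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.ffn = MoEFFN(d_model, d_ff)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        attn = softmax_attention if self.use_softmax else linear_attention
        x = x + attn(q, k, v)
        return x + self.ffn(self.norm2(x))


# Assumed layer pattern: one softmax-attention block after every seven
# lightning-attention blocks.
layers = nn.ModuleList(
    HybridBlock(d_model=64, d_ff=256, use_softmax=(i + 1) % 8 == 0) for i in range(16)
)
x = torch.randn(2, 128, 64)
for layer in layers:
    x = layer(x)
print(x.shape)  # torch.Size([2, 128, 64])
```

The key property the sketch illustrates is that most layers scale linearly with sequence length, while the occasional softmax layer preserves exact global token-to-token interactions, and the MoE feed-forward keeps per-token compute well below the total parameter count.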