Megatron-LM is a large-scale Transformer training framework developed by NVIDIA's Applied Deep Learning Research team, used in ongoing research on training Transformer language models at scale. It combines mixed precision with efficient model parallelism and data parallelism to pre-train multi-node Transformer models such as GPT, BERT, and T5.
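The core idea behind Megatron-style tensor model parallelism can be sketched in plain NumPy (a simplified, hypothetical illustration, not the actual implementation, which shards weights across GPUs and synchronizes with collective communication): a linear layer's weight matrix is split column-wise across workers, each worker computes a partial output, and the partial results are gathered into the full output.

```python
import numpy as np

# Toy column-parallel linear layer: Y = X @ W, with W split
# column-wise across "workers" (simulated sequentially here).
# Illustrative sketch only; Megatron-LM shards across GPUs and
# uses collective ops (e.g. all-gather) instead of np.concatenate.

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))    # batch of 4, hidden size 8
W = rng.standard_normal((8, 16))   # full weight matrix

num_workers = 2
shards = np.split(W, num_workers, axis=1)  # each worker holds half the columns

# Each worker computes its partial output independently...
partials = [X @ shard for shard in shards]

# ...and a gather step concatenates them into the full output.
Y_parallel = np.concatenate(partials, axis=1)

# The sharded computation matches the unsharded layer exactly.
assert np.allclose(Y_parallel, X @ W)
```

Because each shard's matrix multiply is independent, no communication is needed until the outputs are gathered, which is what makes this decomposition efficient across devices.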