The Beijing Academy of Artificial Intelligence (BAAI) and the Artificial Intelligence Research Institute of China Telecom (TeleAI) have recently upgraded their jointly developed Tele-FLM series of large models, releasing the 52B instruction model FLM-2-52B-Instruct and the world's first trillion-parameter monolithic dense model Tele-FLM-1T, while also open-sourcing relevant technical reports and model checkpoints.
FLM-2-52B-Instruct is an instruction-tuned dialogue model built on the Tele-FLM-52B base model, with a focus on strengthening Chinese dialogue capability. It was trained via supervised fine-tuning on 1 million open-source instruction samples covering mathematical problems, code, and multi-turn dialogue, with the best results obtained from a curated subset of 30,000 samples. The training run used specific batch-size, learning-rate, and epoch settings, and the resulting model was evaluated on the AlignBench benchmark, where FLM-2-52B-Instruct reached roughly 90% of GPT-4's performance on Chinese dialogue.
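Since the checkpoint is published on Hugging Face (link below), it can in principle be loaded with the standard `transformers` API. The snippet that follows is a minimal inference sketch, not official usage instructions: the repository name is taken from the open-source link in this post, while the `trust_remote_code=True` flag, the generation parameters, and the example prompt are assumptions for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CofeAI/FLM-2-52B-Instruct-2407"

# trust_remote_code=True is assumed here, since FLM releases typically ship custom modeling code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",    # shard the 52B weights across available GPUs (requires accelerate)
    torch_dtype="auto",
    trust_remote_code=True,
)

# Hypothetical prompt; the exact chat template depends on the released tokenizer config
prompt = "请简要介绍一下大语言模型的指令微调。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```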
Tele-FLM-1T is the world's first open-source trillion-parameter dense model, trained with a growth-based pre-training approach to reduce cost. Its architecture builds on the decoder-only Transformer of the GPT series, adding input and output multipliers, rotary positional embedding (RoPE), RMSNorm, and SwiGLU, among other changes. The growth process proceeds both horizontally (width) and vertically (depth), using value-preserving growth operators improved from MSG (Masked Structural Growth), and pre-training used specific hyperparameter settings detailed in the technical report linked below.
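For readers unfamiliar with two of the named components, the sketch below shows standard PyTorch implementations of RMSNorm and a SwiGLU feed-forward block as they typically appear in decoder-only Transformers. This is an illustration of the techniques mentioned above, not the actual Tele-FLM-1T code, and the dimensions at the bottom are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by the RMS of the activations, with no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x W_gate) * (x W_up), projected back down with W_down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Hypothetical dimensions, for illustration only
block = nn.Sequential(RMSNorm(4096), SwiGLU(4096, 11008))
print(block(torch.randn(2, 16, 4096)).shape)  # torch.Size([2, 16, 4096])
```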
FLM-2-52B-Instruct Model Open-Source Link:
https://huggingface.co/CofeAI/FLM-2-52B-Instruct-2407
Tele-FLM-1T Model Open-Source Link:
https://huggingface.co/CofeAI/Tele-FLM-1T
52B + 1T Technical Report Link:
https://arxiv.org/abs/2407.02783
52B Base Model Technical Report Link:
https://arxiv.org/abs/2404.16645