Multi-token prediction is a technique developed by Meta (formerly Facebook) as part of its large language model research. Instead of predicting only the next token, the model predicts several future tokens at once, with the aim of improving both efficiency and performance. Because the model can produce multiple tokens in a single forward pass, generation can be faster, and accuracy may also improve. The model is freely available for non-commercial research use, subject to Meta's privacy policies and applicable laws and regulations.
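To make the "multiple tokens in a single forward pass" idea concrete, the following is a minimal sketch and not the released model's implementation. It assumes a shared trunk followed by several independent output heads, where head i scores the token at offset i+1. The class name `ToyMultiTokenModel`, its parameters, the embedding-averaging trunk (a stand-in for a real transformer), and the random, untrained weights are all hypothetical and chosen only for illustration.

```python
import random


def make_matrix(rows, cols, rng):
    """Build a rows x cols matrix of random weights (untrained, illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyMultiTokenModel:
    """Hypothetical toy model: one shared trunk, n_future independent output heads."""

    def __init__(self, vocab_size, hidden_size, n_future, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.embed = make_matrix(vocab_size, hidden_size, rng)   # token embeddings
        self.trunk = make_matrix(hidden_size, hidden_size, rng)  # shared layer
        # One output projection per future position (t+1, t+2, ..., t+n_future).
        self.heads = [make_matrix(vocab_size, hidden_size, rng)
                      for _ in range(n_future)]

    def forward(self, context):
        """Run the shared trunk once and return one logit vector per head."""
        # Assumption: averaging embeddings stands in for a real sequence encoder.
        mean = [sum(self.embed[tok][j] for tok in context) / len(context)
                for j in range(self.hidden_size)]
        hidden = [max(0.0, h) for h in matvec(self.trunk, mean)]  # linear + ReLU
        return [matvec(head, hidden) for head in self.heads]

    def predict(self, context):
        """Greedy choice: take the highest-scoring token from each head."""
        return [max(range(self.vocab_size), key=logits.__getitem__)
                for logits in self.forward(context)]


model = ToyMultiTokenModel(vocab_size=10, hidden_size=8, n_future=4)
predicted = model.predict([1, 2, 3])
print(predicted)  # four token ids, produced by a single call to forward()
```

The trunk runs once per call, so the cost of the shared computation is spread over all four predicted tokens. With untrained weights the output is meaningless; a real system would train the heads and might verify the extra tokens before accepting them.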