Meta has recently unveiled the next generation of its open-source model series, Llama 3.1, which includes a 405-billion-parameter version whose performance is on par with, and on some benchmarks even surpasses, closed-source models such as GPT-4. Llama-3.1-8B-Instruct, the 8-billion-parameter instruction-tuned variant, supports English, German, French, Italian, Portuguese, Spanish, Hindi, and Thai, handles a context length of up to 131,072 tokens, and has a knowledge cutoff of December 2023.

To enhance the capabilities of Llama-3.1-8B-Instruct, Meta used over 25 million synthetic data points generated by the larger 405B model during training. As a result, Llama-3.1-8B-Instruct demonstrates cognitive and reasoning abilities comparable to GPT-3.5 Turbo on code and mathematics tests.

Building on Llama-3.1-8B-Instruct, OpenBuddy trained the model on a small amount of Chinese data and released OpenBuddy-Llama3.1-8B-v22.1-131K, a next-generation open-source cross-lingual model capable of Chinese Q&A and translation between languages. Although Llama 3.1 has no built-in Chinese capability, the trained model can correctly answer questions that commonly cause conceptual confusion, a level of answer typically produced only by larger models, indicating strong cognitive potential.

However, owing to limits on training data and time, OpenBuddy-Llama3.1-8B-v22.1 still has gaps in Chinese knowledge, particularly in traditional Chinese culture. Nevertheless, the model performs relatively stably on tasks such as long-text comprehension, thanks to the base model's inherent long-context capability.

Going forward, OpenBuddy plans larger-scale training runs for the 8B and 70B models to strengthen their Chinese knowledge, long-text capability, and cognitive ability, and to explore the feasibility of fine-tuning the 405B model.

Project Address: https://modelscope.cn/models/OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
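
For readers who want to try the model locally, below is a minimal inference sketch (not part of the announcement), assuming a standard ModelScope plus Hugging Face transformers setup with a GPU available. The system prompt and generation settings are illustrative placeholders, not OpenBuddy's official chat template.

```python
# Minimal sketch: download OpenBuddy-Llama3.1-8B-v22.1-131K from ModelScope
# and run a short Chinese Q&A prompt with transformers.
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the model weights from the ModelScope hub.
model_dir = snapshot_download("OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k")

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype="auto",   # use the checkpoint's native precision (bf16/fp16)
    device_map="auto",    # place layers on the available GPU(s)
)

# Chat-style prompt; this system message is an assumption for illustration.
messages = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "请用中文简单介绍一下你自己。"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that while the model supports contexts up to 131,072 tokens, feeding very long inputs requires correspondingly more GPU memory, so shorter prompts are advisable for a first test.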