Teuken-7B, a language model with 7 billion parameters, is now available on Hugging Face and supports all 24 official languages of the European Union. The model was developed by the European OpenGPT-X research project and is released as open source. Unlike most AI language models, which are centered on English, Teuken-7B was designed from the ground up to be multilingual: roughly half of its training data comes from non-English European languages.
The development team states that Teuken-7B performs consistently across all of its training languages and is particularly reliable when handling non-English text. To evaluate language models in European languages, the project team has also created a new European LLM Leaderboard, which goes beyond earlier benchmarks that were primarily English-based.
The release marks a notable step for Europe in advancing multilingual AI models and gives developers a capable, linguistically diverse tool for cross-language applications and research.