In the rapidly evolving field of artificial intelligence, small language models are becoming increasingly significant: they can run efficiently on consumer-grade hardware and support fully offline use. The H2O.ai team has introduced H2O-Danube3, a series of small language models that are highly competitive across a range of academic, chat, and fine-tuning benchmarks.
H2O-Danube3 includes two models: H2O-Danube3-4B (4 billion parameters) and H2O-Danube3-500M (500 million parameters). The two models were pre-trained on 6T and 4T tokens, respectively, of high-quality web data consisting primarily of English text, using three stages with different data mixes, before being fine-tuned for chat.
Technical Highlights:
Efficient Architecture: The architecture of H2O-Danube3 is designed for parameter and compute efficiency, allowing it to run even on a modern smartphone and enabling fast local inference.
Open-Source License: All models are publicly available under the Apache 2.0 license, further democratizing large language models (LLMs).
Variety of Application Scenarios: H2O-Danube3 can be used for chatbots, research, fine-tuning for specific use cases, and even offline applications on mobile devices.
H2O-Danube3 performs exceptionally well on multiple academic benchmarks, achieving top scores on CommonsenseQA and PhysicsQA and reaching 50.14% accuracy on the GSM8K math benchmark. It also shows strong performance on chat and fine-tuning benchmarks.
Another common application of small language models is fine-tuning. H2O-Danube3 demonstrates excellent adaptability and performance after fine-tuning on text classification tasks; even the smaller 500M model remains highly competitive once fine-tuned.
To further promote deployment on edge devices, H2O-Danube3 is also released in quantized versions that significantly reduce model size while largely preserving performance.
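To get an intuition for why quantization matters on edge devices, a back-of-the-envelope estimate of the weight memory at different precisions can help. The bit-widths and the resulting sizes below are illustrative arithmetic, not figures from the H2O-Danube3 release:

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Ignores activations, KV cache, and runtime overhead, so real
    on-device usage will be somewhat higher.
    """
    return n_params * bits_per_weight / 8 / 2**30

# Rough weight footprint of a ~4B-parameter model at common precisions:
for bits, label in [(16, "fp16/bf16"), (8, "int8"), (4, "4-bit")]:
    print(f"{label:>9}: ~{weight_memory_gib(4e9, bits):.1f} GiB")
```

Under this estimate, 4-bit quantization shrinks a ~4B-parameter model from roughly 7.5 GiB at 16-bit precision to under 2 GiB, which is what makes running it on a phone plausible.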
The launch of H2O-Danube3 not only enriches the ecosystem of open-source small language models but also provides strong support for a wide range of applications, from chatbots to task-specific fine-tuning to offline use on mobile devices.
Model download link: https://top.aibase.com/tool/h2o-danube3
Paper link: https://arxiv.org/pdf/2407.09276