Open-source AI platform Hugging Face and NVIDIA have announced a new service, Inference-as-a-Service, powered by NVIDIA's NIM technology. The service lets developers prototype faster, draw on the open-source AI models available on the Hugging Face Hub, and deploy them efficiently.


The announcement was made at the ongoing SIGGRAPH 2024 conference, which brings together experts in computer graphics and interactive technology. The collaboration between NVIDIA and Hugging Face opens new opportunities for developers: with the service, they can easily deploy powerful large language models (LLMs) such as Llama 3 and Mistral AI models, with optimization provided by NVIDIA's NIM microservices.
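
To give a sense of how such a service is typically consumed, here is a minimal sketch of querying a NIM-powered chat endpoint through an OpenAI-compatible API, authenticated with a Hugging Face token. The base URL and model ID below are illustrative assumptions, not details confirmed in the announcement.

```python
# Minimal sketch: calling a NIM-powered chat endpoint via an
# OpenAI-compatible API. The base_url and model ID are illustrative
# assumptions, not confirmed details from the announcement.
from openai import OpenAI

client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",  # assumed endpoint
    api_key="hf_...",  # your Hugging Face access token
)

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example Hub model ID
    messages=[{"role": "user", "content": "Summarize what NVIDIA NIM does."}],
    max_tokens=256,
)

print(completion.choices[0].message.content)
```

Because the interface is OpenAI-compatible, existing application code can be pointed at the new endpoint by changing only the base URL and credentials.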

Specifically, when accessed as a NIM, the 70-billion-parameter Llama 3 model achieves up to five times higher token throughput than an off-the-shelf deployment on NVIDIA H100 Tensor Core GPU systems, a significant improvement. The new service also joins "Train on DGX Cloud," an AI training service already available on Hugging Face.

NVIDIA's NIM is a suite of AI microservices optimized for inference, covering both NVIDIA's AI foundation models and open-source community models. Accessed through standard APIs, it significantly improves token processing efficiency and runs on NVIDIA DGX Cloud infrastructure, boosting the responsiveness and stability of AI applications.
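
Hugging Face's own client library exposes the same kind of standard chat-completion interface. The snippet below is a sketch using the huggingface_hub InferenceClient with an example model ID; whether requests are routed to NIM-backed hardware is handled server-side and is an assumption here, not something this code controls.

```python
# Sketch: the same chat-completion pattern through Hugging Face's own
# client library. The model ID is an example; any routing to NIM-backed
# hardware happens server-side, not in this code.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example Hub model ID
    token="hf_...",  # your Hugging Face access token
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "What is Inference-as-a-Service?"}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```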

The NVIDIA DGX Cloud platform is purpose-built for generative AI, providing reliable, accelerated computing infrastructure that carries developers from prototyping to production without long-term commitments. The collaboration will further strengthen the developer community. Separately, Hugging Face recently announced that its team of 220 has become profitable and that it has launched SmolLM, a series of small language models.

Key Points:

🌟 Hugging Face and NVIDIA launch Inference-as-a-Service, boosting token throughput for AI models by up to five times.

🚀 The new service supports rapid deployment of powerful large language models, streamlining the development process.

💡 NVIDIA DGX Cloud platform provides accelerated infrastructure for generative AI, simplifying the production workflow for developers.