In recent years, demand for machine learning models that handle visual and language tasks has grown rapidly. However, most of these models require substantial computational resources, making them difficult to run efficiently on personal devices. The challenge is especially acute for visual language tasks on smaller hardware such as laptops, consumer-grade GPUs, and mobile devices.
For example, while Qwen2-VL performs exceptionally well, its high hardware requirements limit its usability in real-time applications. Developing lightweight models that run efficiently on modest resources has therefore become a pressing need.
Recently, Hugging Face released SmolVLM, a 2B parameter visual language model specifically designed for on-device inference. SmolVLM outperforms other similar models in terms of GPU memory usage and token generation speed. Its main feature is the ability to run effectively on smaller devices like laptops or consumer-grade GPUs without sacrificing performance. SmolVLM strikes an ideal balance between performance and efficiency, addressing issues that previous models have struggled to overcome.
Compared to Qwen2-VL 2B, SmolVLM generates tokens 7.5 to 16 times faster, thanks to an optimized architecture built for lightweight inference. This efficiency translates into practical benefits for end users and a noticeably smoother experience.
From a technical perspective, SmolVLM features an optimized architecture that supports efficient on-device inference. Users can even fine-tune it easily on Google Colab, significantly lowering the barriers for experimentation and development.
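To give a concrete sense of how little setup on-device inference requires, here is a minimal sketch using the standard transformers vision-to-sequence API. It assumes the instruct checkpoint is published as SmolVLM-Instruct under the same HuggingFaceTB organization that hosts the demo, and "example.jpg" is a placeholder image path:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed checkpoint name under the demo's org
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# "example.jpg" is a placeholder; any local image works.
image = Image.open("example.jpg")

# Build a chat-style prompt with one image slot and a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

On a laptop without a GPU, the same code runs on CPU (just expect slower generation); the low memory footprint is what makes this practical at all on consumer hardware.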
Due to its low memory footprint, SmolVLM can run smoothly on devices that previously could not support similar models. In a test on 50-frame YouTube videos, SmolVLM achieved a score of 27.14% while consuming far fewer resources than the two heavier models it was compared against, demonstrating its adaptability and flexibility.
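The video result above relies on sampling a fixed number of frames and feeding them to the model as a sequence of images. The exact evaluation pipeline is not described here, but a rough sketch of that approach might look like the following, using OpenCV for frame extraction, "clip.mp4" as a placeholder path, and the processor and model from the previous sketch:

```python
import cv2
from PIL import Image

def sample_frames(video_path, num_frames=50):
    """Uniformly sample frames from a video file and return them as PIL images."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = [int(i * total / num_frames) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames

# "clip.mp4" is a placeholder path for a locally downloaded video.
frames = sample_frames("clip.mp4", num_frames=50)

# One image slot per sampled frame, followed by the question about the clip.
messages = [
    {
        "role": "user",
        "content": [{"type": "image"} for _ in frames]
        + [{"type": "text", "text": "Describe what happens in this video."}],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=frames, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```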
SmolVLM represents a significant milestone in the field of visual language models. Its launch enables complex visual language tasks to be performed on everyday devices, filling an important gap in current AI tools.
Not only does SmolVLM excel in speed and efficiency, but it also provides developers and researchers with a powerful tool for visual language processing without the need for expensive hardware. As AI technology continues to proliferate, models like SmolVLM will make powerful machine learning capabilities more accessible.
Demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM
Key Points:
🌟 SmolVLM is a 2B parameter visual language model launched by Hugging Face, designed for on-device inference, running efficiently without high-end hardware.
⚡ Its token generation speed is 7.5 to 16 times faster than similar models, greatly enhancing user experience and application efficiency.
📊 In testing, SmolVLM demonstrated strong adaptability, achieving good scores even without training on video data.