Generative large language models (LLMs) are renowned for their exceptional performance across a variety of tasks, including complex natural language processing, creative writing, question answering, and code generation. Increasingly, LLMs also run on user-friendly local systems, including home PCs equipped with consumer-grade GPUs. PowerInfer is a GPU-CPU hybrid inference engine that exploits the skewed distribution of neuron activations: it preloads hot-activated neurons onto the GPU for fast access while cold-activated neurons remain on the CPU for computation. Evaluations show that PowerInfer runs up to 11.69 times faster than the llama.cpp system while maintaining model fidelity. In summary, PowerInfer significantly enhances LLM inference speed, demonstrating its potential for execution on desktop computers with limited GPU capabilities.
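The hot/cold split described above can be illustrated with a minimal sketch. This is not PowerInfer's actual API; the function name, the frequency-count input, and the capacity parameter are all hypothetical, standing in for whatever profiling and placement machinery the real engine uses.

```python
def partition_neurons(activation_counts, gpu_capacity):
    """Hypothetical illustration of a hot/cold neuron split.

    `activation_counts` maps neuron id -> observed activation frequency.
    The most frequently activated ("hot") neurons, up to `gpu_capacity`,
    are assigned to the GPU; the remaining ("cold") neurons stay on the CPU.
    """
    # Rank neurons from most to least frequently activated.
    ranked = sorted(activation_counts, key=activation_counts.get, reverse=True)
    hot = set(ranked[:gpu_capacity])   # preloaded onto the GPU for fast access
    cold = set(ranked[gpu_capacity:])  # computed on the CPU on demand
    return hot, cold

# Toy profile: neuron ids with assumed activation frequencies.
counts = {0: 980, 1: 12, 2: 450, 3: 3, 4: 770}
hot, cold = partition_neurons(counts, gpu_capacity=2)
# hot  -> {0, 4}  (the two most frequently activated neurons)
# cold -> {1, 2, 3}
```

The design point this sketch captures is that placement follows activation frequency, so the small GPU memory budget is spent on the neurons most likely to be needed on every inference step.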