In an era where smart devices are ubiquitous, we increasingly want smartphones, tablets, and even smart home devices to process information intelligently. However, the hardware resources of these edge devices are limited, particularly memory and computational power, which restricts deploying and running large language models (LLMs) on them. Imagine how our world might change if these devices could harness powerful models that understand natural language, answer questions, and even take on creative tasks.


This is the backdrop against which T-MAC was born. T-MAC, short for "Table-Lookup-based MAC" (where MAC stands for multiply-accumulate), is a lookup-table-based method that lets low-bit large language models run efficiently on CPUs, paving the way for intelligent upgrades on edge devices.

Large language models typically contain billions or even hundreds of billions of parameters, which require substantial memory to store. To deploy these models on edge devices, the model weights must be quantized, that is, represented with fewer bits, to reduce the memory footprint. A quantized model, however, must perform mixed-precision matrix multiplication (mpGEMM) at run time, multiplying low-bit weights by higher-precision activations, an operation that existing hardware and software stacks do not support efficiently.
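As a rough illustration of what quantization buys, the sketch below rounds float32 weights to signed 4-bit integers with one shared scale per group of 32, cutting storage roughly 8x. The helper names (`quantize_group`, `dequantize`) and the grouping scheme are illustrative assumptions, not the exact scheme used by T-MAC:

```python
import numpy as np

def quantize_group(w, bits=4, group_size=32):
    """Symmetric per-group quantization: each group of `group_size` weights
    shares one float scale, and values round to signed `bits`-bit integers.
    Hypothetical helper for illustration, not T-MAC's actual scheme."""
    qmax = 2 ** (bits - 1) - 1
    g = w.reshape(-1, group_size)
    scale = np.abs(g).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(g / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    """Recover approximate float weights from ints and per-group scales."""
    return (q * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_group(w, bits=4)
w_hat = dequantize(q, scale, w.shape)
# 4-bit storage is ~8x smaller than float32, at the cost of rounding error
max_err = np.abs(w - w_hat).max()
```

The rounding error per weight is bounded by half the group's scale, which is the trade-off mpGEMM must then handle: the stored integers are 4-bit, but the activations they multiply remain higher precision.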


The core idea of T-MAC is to replace traditional data-type-based multiplication with bit-level lookup table (LUT) lookups. This not only eliminates multiplications but also reduces the number of additions, significantly improving computational efficiency.

Specifically, T-MAC achieves this through the following steps:

1. Decompose the weight matrix into multiple one-bit matrices.

2. Precompute the products of the activation vector with all possible one-bit patterns and store the results in a lookup table.

3. During inference, obtain the final matrix multiplication result through table indexing and accumulation.
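Under simplifying assumptions (unsigned weights, no per-group scales), the steps above can be sketched in plain NumPy. The function names and the group size `g` are illustrative choices, a sketch of the idea rather than T-MAC's optimized kernels:

```python
import numpy as np

def decompose_bits(W_q, bits):
    """Step 1: split a matrix of unsigned b-bit integer weights into
    b one-bit planes; plane t holds bit t of every weight (0 or 1)."""
    return [((W_q >> t) & 1).astype(np.int64) for t in range(bits)]

def build_table(xg):
    """Step 2: precompute the dot product of activation subvector xg with
    every possible one-bit pattern of length g (bit set -> include xg[t])."""
    g = len(xg)
    table = np.zeros(2 ** g)
    for p in range(2 ** g):
        table[p] = sum(xg[t] for t in range(g) if (p >> t) & 1)
    return table

def lut_gemv(W_q, x, bits=2, g=4):
    """Step 3: y = W_q @ x with no weight-activation multiplications in the
    inner loop; each one-bit plane is resolved by table lookups, and planes
    are combined with power-of-two shifts. Illustrative sketch only."""
    M, K = W_q.shape
    assert K % g == 0
    planes = decompose_bits(W_q, bits)
    y = np.zeros(M)
    for j in range(0, K, g):
        table = build_table(x[j:j + g])        # one table per activation group
        for t, plane in enumerate(planes):
            idx = np.zeros(M, dtype=np.int64)  # pack each row's g bits
            for u in range(g):
                idx |= plane[:, j + u] << u
            y += table[idx] * (1 << t)         # plane t contributes 2^t
    return y

# sanity check against an ordinary matrix multiplication
rng = np.random.default_rng(0)
W_q = rng.integers(0, 4, size=(8, 16))         # 2-bit unsigned weights in [0, 3]
x = rng.standard_normal(16)
y = lut_gemv(W_q, x, bits=2, g=4)
```

Note the payoff: for a group of `g` activations, the `2**g`-entry table is built once and then reused across every row and every bit plane, so the per-weight cost collapses to an index computation and an addition.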

Tests on a range of edge devices show that T-MAC delivers significant performance gains. Compared with the existing llama.cpp implementation, T-MAC quadruples throughput while cutting energy consumption by 70%. Even a low-end device such as the Raspberry Pi 5 can generate tokens faster than an average adult reads.

T-MAC is not only theoretically appealing but also practically useful. Whether for real-time speech recognition and natural language processing on smartphones or for smarter interactions on smart home devices, T-MAC can play a key role.

T-MAC technology offers an efficient, energy-saving solution for deploying low-bit large language models on edge devices. It raises the intelligence of the devices themselves and brings users richer, smoother smart experiences. As the technology continues to mature, there is good reason to believe T-MAC will play an increasingly important role in edge intelligence.

Open Source Address: https://github.com/microsoft/T-MAC

Paper Address: https://www.arxiv.org/pdf/2407.00088