BitNet (bitnet.cpp) is Microsoft's official inference framework for 1-bit large language models (LLMs). It provides a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPUs, with NPU and GPU support planned. On ARM CPUs, BitNet achieves speedups of 1.37x to 5.07x while reducing energy consumption by 55.4% to 70.0%; on x86 CPUs, speedups range from 2.37x to 6.17x, with energy reductions of 71.9% to 82.2%. BitNet can also run a 100B-parameter BitNet b1.58 model on a single CPU at speeds comparable to human reading, opening up the possibility of running large language models entirely on local devices.
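The "1.58-bit" figure comes from the ternary weight format used by BitNet b1.58: each weight takes one of three values {-1, 0, +1}, so it carries log2(3) ≈ 1.58 bits of information. The sketch below is a simplified NumPy illustration of the absmean weight quantization described in the BitNet b1.58 paper, not code from the framework itself; the helper name `absmean_quantize` is hypothetical.

```python
# Illustrative sketch (not bitnet.cpp code): ternary quantization of weights
# to {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits of information per weight.
import numpy as np

def absmean_quantize(W: np.ndarray, eps: float = 1e-6):
    """Quantize a weight matrix to ternary values {-1, 0, +1}.

    gamma is the mean absolute value of W; each weight is scaled by
    1/gamma, rounded to the nearest integer, and clipped to [-1, 1].
    """
    gamma = np.abs(W).mean()                            # absmean scale
    W_ternary = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return W_ternary, gamma

# Example: quantize a random weight matrix.
W = np.random.randn(4, 4)
W_q, gamma = absmean_quantize(W)
print(W_q)            # entries are -1.0, 0.0, or 1.0
print(np.log2(3))     # ≈ 1.585 bits per ternary weight
```

Because the quantized weights are ternary, matrix multiplication reduces to additions and subtractions with no weight multiplications, which is what makes the CPU speedups and energy savings above possible.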