T-MAC

Accelerates low-bit large language model inference on CPUs.

Premium, New Product, Programming, Low-bit inference, CPU optimization
T-MAC is a kernel library that uses lookup tables to perform mixed-precision matrix multiplication directly, without dequantizing the weights, in order to accelerate low-bit large language model inference on CPUs. It supports a range of low-bit models, including W4A16 from GPTQ/gguf, W2A16 from BitDistiller/EfficientQAT, and BitNet W1(.58)A8, on ARM/Intel CPUs across OSX, Linux, and Windows. For 3B BitNet on the Surface Laptop 7, T-MAC reaches a token generation throughput of 20 tokens per second on a single core and 48 tokens per second on four cores, a 4-5x speedup over llama.cpp, an existing state-of-the-art low-bit CPU framework.
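The lookup-table idea can be illustrated with a small Python sketch. This is not T-MAC's kernel code or API: the function names, the group size of 4, and the choice of 1-bit weights encoded as +1/-1 are all assumptions made for illustration. For each group of activations, every possible signed partial sum is precomputed once; each weight row then only needs table lookups and additions, with no multiplications and no conversion of the packed weights back to floating point.

```python
G = 4  # group size: activations covered by one table lookup (illustrative choice)

def build_tables(activations):
    """For each group of G activations, precompute all 2**G signed partial sums."""
    tables = []
    for start in range(0, len(activations), G):
        group = activations[start:start + G]
        table = []
        for pattern in range(2 ** G):
            # Bit i of the pattern set means weight +1, clear means weight -1.
            total = 0.0
            for i, a in enumerate(group):
                total += a if (pattern >> i) & 1 else -a
            table.append(total)
        tables.append(table)
    return tables

def pack_row(weights):
    """Pack a row of +1/-1 weights into one G-bit table index per group."""
    indices = []
    for start in range(0, len(weights), G):
        idx = 0
        for i, w in enumerate(weights[start:start + G]):
            if w == 1:
                idx |= 1 << i
        indices.append(idx)
    return indices

def lut_matvec(packed_rows, tables):
    """Matrix-vector product using only table lookups and additions."""
    return [sum(table[idx] for table, idx in zip(tables, row))
            for row in packed_rows]

def reference_matvec(weight_rows, activations):
    """Plain multiply-accumulate, used only to check the lookup result."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weight_rows]

# Hypothetical toy data: 2 weight rows, 8 activations (a multiple of G).
activations = [0.5, -1.0, 2.0, 0.25, 1.5, -0.5, 0.0, 3.0]
weight_rows = [
    [1, -1, 1, 1, -1, -1, 1, 1],
    [-1, -1, -1, 1, 1, 1, -1, 1],
]

tables = build_tables(activations)            # built once per activation vector
packed = [pack_row(r) for r in weight_rows]   # done once, offline, per model
lut_result = lut_matvec(packed, tables)
print(lut_result)
```

The tables depend only on the activations, so their cost is shared across every weight row. That sharing is the reason a lookup-based scheme can pay off when the weight matrix has many rows.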

T-MAC Visit Over Time

Monthly Visits: 503,747,431
Bounce Rate: 37.31%
Pages per Visit: 5.7
Visit Duration: 00:06:44
