
Compressed-Transformers


In this repository, we explore model compression for transformer architectures via quantization. Specifically, we apply quantization-aware training to the linear layers and report performance for 8-bit, 4-bit, 2-bit, and 1-bit (binary) quantization.
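To illustrate the idea behind quantization-aware training of linear-layer weights (this is a minimal sketch, not the repository's actual code), a symmetric uniform "fake quantization" step can be written as follows. The function name and the binarization scale (mean absolute weight) are illustrative assumptions:

```python
import numpy as np

def fake_quantize(w, bits):
    # Symmetric uniform fake quantization: round weights to 2^bits levels,
    # then dequantize back to floats for the forward pass (as in QAT).
    if bits == 1:
        # Binary case: sign of the weight, scaled by the mean magnitude
        # (one common choice; the scaling scheme is an assumption here).
        scale = np.mean(np.abs(w))
        return np.sign(w) * scale
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax        # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # dequantized weights

w = np.array([0.9, -0.45, 0.12, -0.88])
w8 = fake_quantize(w, 8)   # near-lossless at 8 bits
w1 = fake_quantize(w, 1)   # binary: sign(w) * mean(|w|)
```

During training, the backward pass typically treats the rounding step as the identity (a straight-through estimator), so gradients flow to the underlying full-precision weights while the forward pass sees quantized values.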

Created: 2020-11-07
Updated: 2025-03-19
Stars: 24