Model-Quantization
Quantization is a technique to reduce the computational and memory cost of running inference by representing the weights and activations with low-precision data types, such as 8-bit integers (int8), instead of the usual 32-bit floating point (float32).
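As a minimal sketch of the idea, the snippet below performs symmetric per-tensor quantization of float32 weights to int8 with NumPy: the largest absolute weight is mapped to 127, and all weights are rounded to the nearest step of the resulting scale. The function names are illustrative, not part of any specific library.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> (int8, scale)."""
    scale = np.max(np.abs(weights)) / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

w = np.array([-0.51, 0.02, 0.27, 0.49], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w; error bounded by half a scale step
```

Storing `q` instead of `w` cuts memory 4x (1 byte vs. 4 per value), and integer matrix multiplies on int8 tensors are typically much faster on hardware with int8 support.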
Created: 2023-08-07T12:38:21
Updated: 2025-02-05T11:01:43