This is the AWQ-quantized version of the Saiga Mistral 7B model developed by Ilya Gusev. AWQ (Activation-aware Weight Quantization) is an efficient, accurate, and fast low-bit weight quantization method; this release uses 4-bit weights. It aims to provide faster inference and lower resource consumption while maintaining quality comparable to the original model. It is supported by a range of inference tools and environments, including text-generation-webui, vLLM, Hugging Face TGI, and Transformers.
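As a minimal sketch of loading the model with Transformers (the repo id and the Saiga chat template below are assumptions; check the model card for the exact values):

```python
# Hypothetical sketch: loading an AWQ-quantized checkpoint with Transformers.
# MODEL_ID is a placeholder; substitute the actual AWQ repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "IlyaGusev/saiga_mistral_7b"  # placeholder, not the real AWQ repo

def build_prompt(user_message: str,
                 system: str = "Ты — Сайга, русскоязычный ассистент.") -> str:
    # Saiga-style chat template (an assumption; verify against the model card).
    return f"<s>system\n{system}</s><s>user\n{user_message}</s><s>bot\n"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Transformers detects AWQ weights from the checkpoint's quantization config.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate(build_prompt("Привет! Кто ты?")))
```

The same checkpoint should also load in vLLM or TGI by pointing them at the repo id; the AWQ kernels require a CUDA-capable GPU.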