As large language models (LLMs) have grown larger, with ever more parameters, new techniques to reduce their memory usage have been proposed.
One of the most effective ways to shrink a model's memory footprint is quantization. You can think of quantization as a compression technique for LLMs. In...
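To make the compression intuition concrete, here is a minimal sketch (not any particular library's implementation) of symmetric int8 quantization with NumPy: a float32 weight matrix is mapped onto 8-bit integers plus a single scale factor, cutting its memory use to roughly a quarter at the cost of a small rounding error.

```python
import numpy as np

# Toy float32 weight matrix standing in for one LLM layer.
weights = np.random.randn(4, 4).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto the int8 range [-127, 127].
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)

# Dequantize to approximate the original float values.
dequantized = q_weights.astype(np.float32) * scale

print("bytes before:", weights.nbytes)      # 4x4 float32 -> 64 bytes
print("bytes after: ", q_weights.nbytes)    # 4x4 int8    -> 16 bytes
print("max abs error:", np.abs(weights - dequantized).max())
```

The maximum reconstruction error is bounded by half a quantization step (about `scale / 2`), which is why quantization is lossy compression: the model gets smaller, and inference quality degrades only slightly if the error stays small relative to the weights.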