As large language models (LLMs) have grown larger, with ever more parameters, new techniques to reduce their memory usage have been proposed.
One of the most effective ways to shrink a model's memory footprint is quantization. You can think of quantization as a compression technique for LLMs. In...
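To make the compression intuition concrete, here is a minimal sketch (not any particular library's implementation) of symmetric int8 quantization with NumPy: a float32 weight matrix is mapped onto 8-bit integers plus a single scale factor, cutting its memory use to roughly a quarter at the cost of a small rounding error.

```python
import numpy as np

# Toy float32 weight matrix standing in for one LLM layer.
weights = np.random.randn(4, 4).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto the int8 range [-127, 127].
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)

# Dequantize to approximate the original float values.
dequantized = q_weights.astype(np.float32) * scale

print("bytes before:", weights.nbytes)      # 4x4 float32 -> 64 bytes
print("bytes after: ", q_weights.nbytes)    # 4x4 int8    -> 16 bytes
print("max abs error:", np.abs(weights - dequantized).max())
```

The maximum reconstruction error is bounded by half a quantization step (about `scale / 2`), which is why quantization is lossy compression: the model gets smaller, and inference quality degrades only slightly if the error stays small relative to the weights.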