
Run Llama 2 on Your CPU with Rust

A new one-file Rust implementation of Llama 2 is now available, thanks to Sasha Rush. It is a Rust port of Karpathy's llama2.c and already supports the following features:

- 4-bit GPT-Q quantization
- SIMD support for fast CPU inference
- Grouped Query Attention
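
To give a sense of why 4-bit weight quantization makes CPU inference practical, here is a minimal, self-contained Rust sketch of the general idea: pack two 4-bit weights per byte with one shared scale per group, then dequantize on the fly while computing a dot product. This is an illustrative, simplified symmetric scheme, not the GPT-Q algorithm or actual llama2.rs code; the function names, group size, and packing layout are assumptions made for the example.

```rust
// A simplified sketch of 4-bit group quantization (one f32 scale shared by a
// group of weights). Illustrative only; not code from llama2.rs.

fn quantize_4bit(weights: &[f32]) -> (Vec<u8>, f32) {
    // Symmetric quantization: map the largest magnitude onto the 4-bit range [-8, 7].
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    let q = |w: f32| ((w / scale).round().clamp(-8.0, 7.0) as i8 & 0x0F) as u8;
    let packed = weights
        .chunks(2)
        // Two 4-bit values per byte: low nibble first, then high nibble.
        .map(|pair| q(pair[0]) | (q(*pair.get(1).unwrap_or(&0.0)) << 4))
        .collect();
    (packed, scale)
}

fn dot_4bit(packed: &[u8], scale: f32, xs: &[f32]) -> f32 {
    // Dequantize on the fly and accumulate a dot product against xs.
    let mut acc = 0.0f32;
    for (i, &byte) in packed.iter().enumerate() {
        let lo = ((byte << 4) as i8 >> 4) as f32; // sign-extend the low nibble
        let hi = ((byte & 0xF0) as i8 >> 4) as f32; // sign-extend the high nibble
        acc += lo * scale * xs[2 * i];
        if 2 * i + 1 < xs.len() {
            acc += hi * scale * xs[2 * i + 1];
        }
    }
    acc
}

fn main() {
    let weights = [0.8f32, -0.3, 0.05, 0.6];
    let xs = [1.0f32, 2.0, 3.0, 4.0];
    let (packed, scale) = quantize_4bit(&weights);
    println!("approx dot = {}", dot_4bit(&packed, scale, &xs));
    println!("exact dot  = {}", weights.iter().zip(&xs).map(|(w, x)| w * x).sum::<f32>());
}
```

Storing weights at 4 bits per value cuts memory traffic roughly 8x compared with f32, which is typically the bottleneck for CPU inference; the per-group scale keeps the approximation error small.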