A new one-file Rust implementation of Llama 2 is now available, thanks to Sasha Rush. It is a Rust port of Andrej Karpathy's llama2.c and already supports the following features:
- 4-bit GPT-Q quantization
- SIMD support for fast CPU inference
- Support for Grouped ...
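To give a flavor of what 4-bit weight quantization involves, here is a minimal, hypothetical Rust sketch (not code from llama2.rs or GPT-Q itself): each group of weights is stored as one f32 scale plus signed 4-bit integers packed two per byte, which is the basic storage layout such schemes build on.

```rust
// Hypothetical sketch of 4-bit block quantization; function names are
// illustrative and do not come from llama2.rs.

/// Quantize a slice of weights to signed 4-bit values with one shared scale.
fn quantize_4bit(weights: &[f32]) -> (f32, Vec<u8>) {
    // Scale so the largest magnitude maps into the signed 4-bit range [-8, 7].
    let max = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = if max == 0.0 { 1.0 } else { max / 7.0 };
    let mut packed = Vec::with_capacity((weights.len() + 1) / 2);
    for pair in weights.chunks(2) {
        // Round to the nearest 4-bit value and keep only the low nibble.
        let q = |w: f32| ((w / scale).round().clamp(-8.0, 7.0) as i8 & 0x0F) as u8;
        let lo = q(pair[0]);
        let hi = if pair.len() == 2 { q(pair[1]) } else { 0 };
        packed.push(lo | (hi << 4)); // two 4-bit values per byte
    }
    (scale, packed)
}

/// Recover approximate f32 weights from the packed representation.
fn dequantize_4bit(scale: f32, packed: &[u8], n: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(n);
    for byte in packed {
        for nibble in [byte & 0x0F, byte >> 4] {
            if out.len() == n {
                break;
            }
            // Sign-extend the 4-bit value back to i8 before scaling.
            let v = ((nibble << 4) as i8) >> 4;
            out.push(v as f32 * scale);
        }
    }
    out
}

fn main() {
    let w = [0.7f32, -0.8, 0.1, 0.35];
    let (scale, packed) = quantize_4bit(&w);
    // Four weights fit in two bytes; the round trip stays within scale/2.
    println!("scale = {scale}, packed = {packed:?}");
    println!("{:?}", dequantize_4bit(scale, &packed, w.len()));
}
```

Real schemes like GPT-Q add per-group scales and error-compensating rounding on top of this layout, but the 4x storage reduction over f32 already comes from the packing shown here.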