Open-Source Foundation Models
We have seen an explosion of open-source foundation models, with the likes of Llama-2, Falcon, and BLOOM, to name a few. However, the largest of these models are practically out of reach for anyone without substantial hardware resources.
Large language models have, as the name suggests, a large number of parameters. Take Llama-2, for instance: its largest version has 70 billion parameters.
At that scale, the hardware requirements are a significant barrier for most researchers, hobbyists, and engineers.
If you’re reading this, you have probably tried to run one of these models and found that you couldn’t. Let’s look at the hardware requirements for Meta’s Llama-2 to understand why.
Why you can’t use Llama-2
Loading a model in full precision, i.e. 32-bit floating point (float32), onto a GPU for downstream training or inference costs about 4GB of memory per 1 billion parameters¹. So, just to load Llama-2 at 70 billion parameters, you need around 280GB of memory at full precision.
Now, there is the option to load models at lower precision (at some sacrifice in model quality). In 8-bit, each billion parameters costs about 1GB of memory, so loading the 70-billion-parameter model still requires around 70GB of GPU memory.
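This rule-of-thumb arithmetic can be sketched in a few lines of Python (the helper name and byte counts are illustrative; real-world usage is higher because activations, optimizer state, and framework overhead are not counted):

```python
# Rough estimate of the memory needed just to hold a model's weights.
# bytes_per_param: 4 for float32 (full precision), 2 for float16/bfloat16,
# 1 for 8-bit quantization. Since 1 billion parameters at 1 byte each is
# roughly 1GB, the estimate is simply (billions of params) x (bytes per param).
def weight_memory_gb(num_params_billion: float, bytes_per_param: int) -> float:
    return num_params_billion * bytes_per_param

# Llama-2 70B at different precisions:
print(weight_memory_gb(70, 4))  # float32
print(weight_memory_gb(70, 2))  # float16
print(weight_memory_gb(70, 1))  # 8-bit
```

Running this reproduces the figures above: roughly 280GB at full precision and 70GB in 8-bit, with 16-bit precision falling in between at about 140GB.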