This Is Why You Can’t Use Llama-2
<h1>Open-Source Foundation Models</h1>
<p>We have seen an explosion of open-source foundation models with the likes of <a href="https://huggingface.co/meta-llama/Llama-2-70b-chat-hf" rel="noopener ugc nofollow" target="_blank">Llama-2</a>, <a href="https://huggingface.co/tiiuae/falcon-40b" rel="noopener ugc nofollow" target="_blank">Falcon</a>, and <a href="https://huggingface.co/bigscience/bloom" rel="noopener ugc nofollow" target="_blank">Bloom</a>, to name a few. However, the largest of these models are pretty much impossible to use for a person of modest means.</p>
<p>Large language models, as the name implies, have an enormous number of parameters. Take Llama-2, for instance: its largest version has 70 billion parameters.</p>
<p>At that scale, the hardware requirements are a significant barrier for most researchers, hobbyists, and engineers.</p>
<p>If you&rsquo;re reading this, chances are you have already tried, and failed, to run one of these models yourself. Let&rsquo;s look at the hardware requirements for Meta&rsquo;s Llama-2 to understand why.</p>
<h1>Why you can’t use Llama-2</h1>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/0*YHJwR3Ce9OLKmzQ7" style="height:467px; width:700px" /></p>
<p>Photo by <a href="https://unsplash.com/@crinitus?utm_source=medium&utm_medium=referral" rel="noopener ugc nofollow" target="_blank">Ilias Gainutdinov</a> on <a href="https://unsplash.com/?utm_source=medium&utm_medium=referral" rel="noopener ugc nofollow" target="_blank">Unsplash</a></p>
<p>To load a model in full precision, i.e. 32-bit (float32), on a GPU for downstream training or inference, each parameter takes 4 bytes, which works out to about 4GB of memory per 1 billion parameters&sup1;. So, just to load Llama-2 at 70 billion parameters, you need around 280GB of GPU memory at full precision.</p>
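<p>This back-of-the-envelope arithmetic is easy to reproduce. The sketch below is a minimal illustration of the calculation; the function name and constants are my own, and it counts only the weights themselves (no activations, optimizer state, or framework overhead).</p>
<pre>
# Rough GPU-memory estimate for holding model weights.
# Illustrative sketch only: ignores activations, optimizer state,
# and framework overhead, which add to the real requirement.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str = "float32") -> float:
    """Approximate memory (in GB) needed just to load the weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Llama-2 at 70 billion parameters, full precision: ~280 GB
print(weight_memory_gb(70e9, "float32"))  # 280.0
# The same model loaded in 8-bit: ~70 GB
print(weight_memory_gb(70e9, "int8"))     # 70.0
</pre>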
<p>Now, there is the option to load models at lower precision (sacrificing some model quality). If you load in 8-bit, each parameter costs 1 byte, i.e. 1GB of memory per billion parameters, so Llama-2 at 70 billion parameters would still require roughly 70GB of GPU memory.</p>
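<p>In practice, 8-bit loading is usually done through a quantization library rather than by hand. Here is a minimal sketch assuming the Hugging Face <em>transformers</em>, <em>accelerate</em>, and <em>bitsandbytes</em> packages are installed; exact argument names can differ across library versions, and the Llama-2 weights themselves are gated behind Meta&rsquo;s license agreement.</p>
<pre>
# Sketch: loading Llama-2 with 8-bit quantized weights via bitsandbytes.
# Assumes transformers, accelerate, and bitsandbytes are installed,
# and that you have been granted access to the gated Llama-2 weights.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-chat-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # spread layers across available GPUs (and CPU if needed)
)
</pre>
<p>Even with this trick, 70GB of GPU memory is far beyond a single consumer card, which is exactly the barrier this article is about.</p>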
<p><a href="https://pub.aimind.so/this-is-why-you-cant-use-llama-2-d33701ce0766"><strong>Website</strong></a></p>