This Is Why You Can’t Use Llama-2

# Open-Source Foundation Models

We have seen an explosion of open-source foundation models, with the likes of [Llama-2](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf), [Falcon](https://huggingface.co/tiiuae/falcon-40b), and [BLOOM](https://huggingface.co/bigscience/bloom), to name a few. However, the largest of these models are practically impossible to run for a person of modest means.

Large language models have, by definition, a large number of parameters. Take Llama-2: its largest version has 70 billion parameters.

The scale of these models means that for most researchers, hobbyists, and engineers, the hardware requirements are a significant barrier.

If you're reading this, I gather you have probably tried, and failed, to use these models. Let's look at the hardware requirements for Meta's Llama-2 to understand why.

# Why you can't use Llama-2

![](https://miro.medium.com/v2/resize:fit:700/0*YHJwR3Ce9OLKmzQ7)

Photo by [Ilias Gainutdinov](https://unsplash.com/@crinitus) on Unsplash

Loading a model at full precision, i.e. 32-bit (float32), on a GPU for downstream training or inference costs about 4 GB of memory per 1 billion parameters¹, since each parameter takes 4 bytes. So just loading Llama-2 at 70 billion parameters costs around 280 GB of GPU memory at full precision.

There is the option to load models at lower precision (at some sacrifice in quality). If you load in 8-bit, each parameter takes 1 byte, i.e. 1 GB of memory per billion parameters, which still requires 70 GB of GPU memory for the 70B model.
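To make the arithmetic concrete, here is a small Python sketch (my own illustration, not from the original post) that computes the approximate weight-loading footprint at the two precisions discussed above:

```python
# Back-of-the-envelope memory needed just to hold a model's weights.
# This covers weights only; inference also needs activation and KV-cache
# memory, and training needs gradients and optimizer states on top.

BYTES_PER_PARAM = {
    "float32": 4,  # full precision: 4 bytes per parameter
    "int8": 1,     # 8-bit quantized: 1 byte per parameter
}

def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate GB of memory to load the weights at a given precision."""
    return params_billions * BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(70, precision)  # Llama-2 at 70B parameters
    print(f"Llama-2 70B @ {precision}: ~{gb:.0f} GB")
# Llama-2 70B @ float32: ~280 GB
# Llama-2 70B @ int8: ~70 GB
```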
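And if you do have roughly 70 GB of GPU memory on hand, 8-bit loading looks something like the sketch below. This is a minimal illustration, assuming the Hugging Face `transformers` and `bitsandbytes` libraries and approved access to the gated `meta-llama` checkpoint; it is not from the original post.

```python
# Minimal sketch: loading Llama-2 with 8-bit quantization via
# transformers + bitsandbytes (assumes: pip install transformers
# accelerate bitsandbytes, plus access granted to the gated repo).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 1 byte/param
    device_map="auto",  # shard layers across available GPUs (and CPU if needed)
)
```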