ReLoRa: Pre-train a Large Language Model on Your GPU

In 2021, [Hu et al.](https://arxiv.org/abs/2106.09685) proposed low-rank adapters (LoRa) for large language models (LLMs). This method significantly reduces the cost of fine-tuning by training only a small set of added parameters (the low-rank adapters) while keeping the LLM's original, full-rank parameters frozen, as sketched in the code below.

LoRa still requires an existing pre-trained model to fine-tune: because of its low-rank restriction, it cannot pre-train a good LLM from scratch. Pre-training therefore remains unaffordable for most individuals and organizations.

To reduce this cost, [Lialin et al. (2023)](https://arxiv.org/pdf/2307.05695.pdf) propose ReLoRa, a modification of LoRa that makes it possible to pre-train LLMs from scratch.

[**Read More**](https://medium.com/towards-data-science/relora-pre-train-a-large-language-model-on-your-gpu-d104756f9ddf)
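To make the low-rank adapter idea above concrete, here is a minimal sketch of a LoRa-style linear layer, assuming PyTorch. The class name `LoRaLinear`, the `rank` and `alpha` hyperparameters, and the `merge_and_reinit` helper are illustrative choices, not taken from the original papers or any particular library; the merge step reflects my reading of the ReLoRa paper, where the learned low-rank update is periodically folded into the main weights so that successive low-rank updates can accumulate into a higher-rank change.

```python
import torch
import torch.nn as nn


class LoRaLinear(nn.Module):
    """Illustrative LoRa-style layer: frozen full-rank weight + trainable low-rank update."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Original full-rank weight, kept frozen (as in a pre-trained model).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        # Trainable low-rank factors: their product lora_B @ lora_A has rank <= `rank`.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank update; only lora_A and lora_B receive gradients.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

    @torch.no_grad()
    def merge_and_reinit(self) -> None:
        # ReLoRa-style restart (my reading of Lialin et al., 2023): fold the learned
        # low-rank update into the frozen weight, then re-initialize the factors so
        # training continues in a fresh low-rank subspace.
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)
        nn.init.normal_(self.lora_A, std=0.01)
        nn.init.zeros_(self.lora_B)
```

During training only `lora_A` and `lora_B` receive gradients, which is where the memory savings come from. For ReLoRa-style pre-training, `merge_and_reinit` would be called every few thousand steps; the paper also partially resets the optimizer state and uses a jagged learning-rate schedule, which this sketch omits.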