Dive Into LoRA Adapters
<p>Large Language Models (LLMs) have taken the world by storm. Over the last year we have witnessed a massive leap in what they can do, going from quite narrow and restricted applications to now engaging in fluent, multi-turn conversations.</p>
<p>Isn’t it amazing how these models have shifted from extractive summarization, which copies the source verbatim, to abstractive summarization? They now rewrite the summary entirely to match the reader’s style preferences and existing knowledge. What’s even more astonishing is that these new models can not only generate new code but also explain your existing code. Fascinating.</p>
<p>Frequently these large models are so powerful that they yield impressive results even when queried in a <strong>zero-shot</strong> or <strong>few-shot</strong> manner. Although this allows for rapid experimentation and immediate results, for many tasks it is often followed by <strong>finetuning</strong> a model to achieve the best performance and efficiency. However, <strong>finetuning every single one of their billions of parameters</strong> is impractical and inefficient. Moreover, given the size of these models, do we even have enough labeled data to train them without overfitting?</p>
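<p>As a quick illustration of what few-shot querying means, the sketch below packs a handful of labeled examples into the prompt and lets the model continue the pattern with no gradient updates at all. The model name and the example reviews are illustrative assumptions, not taken from this article.</p>
<pre><code># A minimal sketch of few-shot prompting; the model and examples are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The prompt itself carries a few labeled examples; the model is expected
# to continue the pattern for the final, unlabeled review.
prompt = (
    "Review: The battery dies within an hour. Sentiment: negative\n"
    "Review: Crisp screen and great speakers. Sentiment: positive\n"
    "Review: Shipping took forever and the box was damaged. Sentiment:"
)

print(generator(prompt, max_new_tokens=3)[0]["generated_text"])
</code></pre>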
<p><strong>Parameter Efficient Finetuning (PEFT)</strong> to the rescue: you can now achieve great performance while <strong>tuning only a small fraction of the weights</strong>. Not having to tune billions of parameters across multiple machines makes the whole process of finetuning practical and economically viable again. Combining PEFT with quantization allows models with billions of parameters to be finetuned on a single GPU.</p>
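<p>To make this concrete, here is a minimal sketch of attaching LoRA adapters with the Hugging Face <code>peft</code> library. The base model, target modules, and hyperparameters are illustrative assumptions, not choices made in this article.</p>
<pre><code># A minimal LoRA/PEFT sketch; model name and hyperparameters are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (illustrative)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)

# Only the small LoRA matrices are trainable; the base weights stay frozen.
model.print_trainable_parameters()
</code></pre>
<p>With a setup like this, <code>print_trainable_parameters()</code> typically reports that only a fraction of a percent of the total weights require gradients, which is exactly what makes single-GPU finetuning feasible.</p>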
<p><a href="https://towardsdatascience.com/dive-into-lora-adapters-38f4da488ede">Read the full article on Towards Data Science</a></p>