# LoRA: Low-Rank Adaptation of Large Language Models

Prompt engineering, fine-tuning, and model training are all viable ways to get domain- or task-specific results from a Large Language Model (LLM). One training technique worth considering is Low-Rank Adaptation of Large Language Models (LoRA).

## Background

First introduced by Microsoft researchers in the paper [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685), LoRA is a technique for adapting language models to new tasks more efficiently. Imagine you have a big language model that knows a great deal about language and can understand and generate sentences. This model is like a big brain that has been trained on a lot of data.

Let's say you want to use this language model for different tasks, like summarizing articles or answering questions. The problem is that the model is so big and has so many parameters that fine-tuning and storing a separate copy of it for each task becomes difficult and expensive.

That's where LoRA comes in. LoRA makes the language model more adaptable and efficient. Instead of retraining the whole model for each task, LoRA freezes the pre-trained weights and injects small trainable low-rank matrices into each layer. These matrices let the model adapt to a new task without changing any of the original parameters.
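Concretely, the paper constrains the weight update to a low-rank product: a frozen pre-trained weight W₀ is augmented with ΔW = BA, where B has shape d×r and A has shape r×k with r much smaller than d and k, so the forward pass becomes h = W₀x + BAx and only A and B are trained. Here is a minimal PyTorch sketch of that idea; the class name `LoRALinear` and the defaults `r=8`, `alpha=16` are illustrative choices for this post, not something prescribed by the paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update.

    Forward pass: h = W0 x + (alpha / r) * B A x, where W0 is the frozen
    pre-trained weight and A, B are the small trainable matrices.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weights: only A and B receive gradients.
        for p in self.base.parameters():
            p.requires_grad = False
        # A projects down to rank r, B projects back up. B starts at zero,
        # so the adapted layer initially behaves exactly like the original.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


# Usage sketch: wrap one layer and compare trainable vs. total parameters.
layer = LoRALinear(nn.Linear(768, 768), r=8)
x = torch.randn(2, 768)
print(layer(x).shape)  # torch.Size([2, 768])
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # 12288 vs. 602880
```

With r = 8 on a 768×768 layer, the trainable matrices add only 2 × 8 × 768 = 12,288 parameters, a small fraction of the 590,592 frozen ones, which is where the efficiency gain comes from: per task you train and store only A and B.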