Fine Tuning LLM: Parameter Efficient Fine Tuning (PEFT) — LoRA & QLoRA — Part 2

*In this blog, we will implement the idea behind Parameter Efficient Fine Tuning (PEFT) and explore LoRA and QLoRA, two of the most important PEFT methods. We will also use “Weights and Biases” to capture training metrics. We will fine-tune the small Salesforce CodeGen 350M model to improve its ability to generate Python code.*

In Part 1, we discussed how LoRA introduces modularity and reduces training time by letting us enhance the base model with an adapter module of significantly lower dimensions. QLoRA takes this approach a step further by also shrinking the base model itself. It does so through quantization, which converts the weights from the 32-bit floating-point format to smaller data types such as 8-bit or 4-bit.

In this blog post, we will take the Salesforce CodeGen 350M model and fine-tune it to generate complex Python code. To improve its performance, we will fine-tune it on the Alpaca instruction set so that it generates Python code more effectively.

Let’s begin by setting up the “Hugging Face” account and the “Weights and Biases” account. We will use Weights and Biases to capture training metrics.

First, create a Hugging Face account and generate an access token with Write permissions. This token lets us save our trained model on the Hugging Face Hub; we will use it to log in to Hugging Face, pull the base model, and push the trained model.
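With both accounts in place, the two logins can be done in a few lines. Below is a minimal sketch, assuming the `huggingface_hub` and `wandb` Python packages are installed; the token value and the W&B project name are placeholders, not values from this article.

```python
# Minimal sketch of the two logins (assumes: pip install huggingface_hub wandb).
from huggingface_hub import login
import wandb

# Paste the Hugging Face access token created with Write permissions;
# it is used to pull the base model and later push the fine-tuned one.
login(token="hf_...")  # placeholder token

# Log in to Weights and Biases and open a run to capture training metrics.
wandb.login()
wandb.init(project="codegen-350m-lora-finetune")  # project name is an arbitrary example
```

Later, when training with the `transformers` `Trainer`, passing `report_to="wandb"` in `TrainingArguments` streams the loss curves into this run automatically.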
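To make the QLoRA idea from Part 1 concrete, here is a hedged sketch of loading the base model in 4-bit and attaching a low-rank adapter, assuming the `transformers`, `bitsandbytes`, and `peft` libraries. The LoRA hyperparameters shown are illustrative choices, not the article's exact settings.

```python
# Sketch: 4-bit base model + LoRA adapter (the QLoRA recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "Salesforce/codegen-350M-mono"

# Quantize the frozen base model to 4-bit NF4; this is where QLoRA
# shrinks the base model's memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adapter: only these small matrices are trained.
# qkv_proj is the fused attention projection in the CodeGen architecture;
# r and lora_alpha here are example values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows the tiny trainable fraction
```

The key property to notice is the output of `print_trainable_parameters()`: the adapter typically amounts to well under one percent of the model's weights, which is what makes this fine-tuning parameter-efficient.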
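The Alpaca-style instruction data also needs to be folded into prompt strings before training. The sketch below assumes the `datasets` library and uses the public `tatsu-lab/alpaca` dataset on the Hugging Face Hub as a stand-in; the exact Python-code instruction set used for this fine-tune may differ.

```python
# Sketch: turn Alpaca records (instruction / input / output) into prompts.
from datasets import load_dataset

dataset = load_dataset("tatsu-lab/alpaca", split="train")

def format_example(example):
    # Fold instruction, optional input, and output into one training string.
    if example["input"]:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    else:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return {"text": prompt}

dataset = dataset.map(format_example)
```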