Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #1: Supervised Fine-tuning
<p>Instruct large language models (LLMs) have become extremely popular since OpenAI released ChatGPT. Many chat models that mimic ChatGPT's behavior (often because they are actually trained on ChatGPT's outputs) and are fine-tuned for specific domains can now be found online.</p>
<p>OpenAI describes the procedure to train instruct LLMs in this paper:</p>
<p><a href="https://arxiv.org/abs/2203.02155" rel="noopener ugc nofollow" target="_blank">Training language models to follow instructions with human feedback (Ouyang et al., 2022)</a></p>
<p>The paper summarizes the procedure with a figure illustrating a 3-step process:</p>
<ol>
<li><strong>Supervised fine-tuning (SFT)</strong>: Typical fine-tuning performed on prompts (e.g., questions) paired with expected outputs (e.g., answers); see the first sketch after this list.</li>
<li><strong>Reward model (RM) training</strong>: A model trained to compute a scalar reward for a prompt paired with an output, learned from human rankings of outputs. Typically, the datasets for this task pair each prompt with a preferred (chosen) output and a rejected one; see the second sketch after this list.</li>
<li><strong>Reinforcement learning from human feedback (RLHF)</strong>: The SFT model is further optimized with a reinforcement learning algorithm (PPO in the paper) to maximize the reward assigned by the RM.</li>
</ol>
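<p>To make step 1 concrete, here is a minimal sketch of supervised fine-tuning with Hugging Face Transformers. It is not the DeepSpeed Chat training script itself: the model name and the toy prompt/answer pair are placeholder assumptions, and a real run would iterate over a full dataset with batching.</p>
<pre><code># Minimal SFT sketch: causal language modeling on a prompt/answer pair.
# Assumptions: model name and data are placeholders; one training step only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One toy example; a real dataset holds many such prompt/answer pairs.
prompt = "Human: What is supervised fine-tuning?\n\nAssistant:"
answer = " Training a model on prompts paired with expected outputs."

inputs = tokenizer(prompt + answer, return_tensors="pt")
# For causal LM training, the labels are the input ids themselves;
# the loss is the next-token cross-entropy over the sequence.
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
</code></pre>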
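<p>For step 2, the standard reward-model objective used in the InstructGPT paper is a pairwise ranking loss: the model should assign a higher scalar reward to the chosen output than to the rejected one. Below is a sketch of just that loss in PyTorch, with hypothetical reward values standing in for the reward model's outputs.</p>
<pre><code>import torch
import torch.nn.functional as F

# Hypothetical scalar rewards the RM produced for the same prompt
# paired with a chosen (preferred) and a rejected output.
reward_chosen = torch.tensor([1.7], requires_grad=True)
reward_rejected = torch.tensor([0.3], requires_grad=True)

# Pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)).
# Minimizing it pushes the chosen reward above the rejected one.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()
</code></pre>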