End-to-End ML with GPT-3.5

<p>A lot of repetitive boilerplate code exists in the model development phase of any machine learning application. Popular libraries such as PyTorch Lightning have been created to standardize the operations performed when training and evaluating neural networks, leading to much cleaner code. However, boilerplate extends far beyond training loops. Even the data acquisition phase of machine learning projects is full of steps that are necessary but time-consuming. One way to deal with this challenge would be to create a library similar to PyTorch Lightning for the entire model development process. It would have to be general enough to work with a variety of model types beyond neural networks, and capable of integrating a variety of data sources.</p>
<p>Code examples for extracting data, preprocessing, model training, and deployment are readily available on the internet, though gathering them and integrating them into a project takes time. Since such code is on the internet, chances are it has been trained on by a large language model (LLM) and can be rearranged in a variety of useful ways through natural language commands. The goal of this post is to show how easy it is to automate many of the steps common to ML projects by using the GPT-3.5 API from OpenAI. I'll show some failure cases along the way, and how to tune prompts to fix bugs when possible. Starting from scratch, without even so much as a dataset, we'll end up with a model that is ready to be deployed on AWS SageMaker. If you're following along, make sure to set up the OpenAI API as follows.</p>
<p><a href="https://towardsdatascience.com/end-to-end-ml-with-gpt-3-5-8334db3d78e2">Website</a></p>
Tags: GPT ML