How to Automate Your Data Science Workflow with CI/CD in GitLab and Dataiku

Continuous Integration and Continuous Delivery (CI/CD) let teams automate every step needed to build, test, and deploy code to production. The result is a faster development cycle and fewer errors. For a data science team, a CI/CD pipeline is crucial for delivering machine learning models to production quickly and with high quality. We recently began investing in Dataiku and GitLab and building our data science products with them. The project brought its own set of complexities, and CI/CD was one of them. Because many stakeholders with diverse backgrounds and roles were involved, we had to stay flexible while keeping things simple and automating a large part of our workflow. In this blog post, I share the step-by-step approach we took and the challenges we faced in building the CI/CD pipeline in GitLab and deploying the project and its models in Dataiku.

<h1>Introduction to Dataiku</h1>

Large companies adopt a variety of new tools and platforms to build products efficiently, and they are often less constrained by those platforms' pricing. Dataiku is one such emerging platform; for machine learning development, it is comparable to AWS SageMaker. With Dataiku, you can explore, visualize, and wrangle data, build machine learning models, and deploy them as real-time APIs or as batch prediction jobs.
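To make the build, test, and deploy sequence that CI/CD automates more concrete: GitLab runs a pipeline's stages in order and, by default, does not start jobs in later stages once a job in an earlier stage fails. The minimal Python sketch below models only that fail-fast ordering. The stage names and the `run_pipeline` helper are hypothetical illustrations, not part of the GitLab or Dataiku APIs.

```python
from typing import Callable, List, Tuple

# A hypothetical stage: a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure, mark the rest as skipped."""
    results: List[Tuple[str, str]] = []
    failed = False
    for name, job in stages:
        if failed:
            # Later stages are not executed once an earlier one has failed.
            results.append((name, "skipped"))
            continue
        ok = job()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for real build, test, and deploy jobs.
    pipeline = [
        ("build", lambda: True),
        ("test", lambda: False),   # simulate a failing test suite
        ("deploy", lambda: True),  # never called, because "test" failed
    ]
    for name, status in run_pipeline(pipeline):
        print(f"{name}: {status}")
```

Running the script prints `build: passed`, `test: failed`, and `deploy: skipped`, which is the behavior that keeps an untested model from reaching production.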