Building an End-to-End Data Pipeline with Delta Lake and Databricks

Introduction

In this article, we will walk through building a data pipeline with Delta Lake and Databricks, using the COVID-19 dataset for the USA from Kaggle. The pipeline demonstrates how to ingest raw data, clean and transform it, and finally visualize the results.

If you are new to the concept of Delta Lake, I suggest you start with my previous article:

Delta Lake for Beginners: Data Lake + Data Warehouse And More


Prerequisites

Before we begin, ensure you have the following:

  • A Databricks account (this article uses Azure Databricks).
  • The COVID-19 dataset for the USA from Kaggle.

