Introduction
In this article, we will build a data pipeline with Delta Lake on Databricks, using the COVID-19 dataset for the USA, available on Kaggle. The pipeline demonstrates how to ingest raw data, clean and transform it, and finally visualize it.
If you are new to the concept of Delta Lake, I suggest you start with my previous article:
Delta Lake for Beginners: Data Lake + Data Warehouse And More
Prerequisites
Before we begin, ensure you have the following:
- A Databricks account (this walkthrough uses Azure Databricks).
- The COVID-19 dataset for the USA from Kaggle.