Building an End-to-End Data Pipeline with Delta Lake and Databricks
<h1>Introduction</h1>
<p>This article walks through building an end-to-end data pipeline with Delta Lake and Databricks. Our dataset is the COVID-19 data for the USA, available on <a href="https://www.kaggle.com/datasets/sudalairajkumar/covid19-in-usa?resource=download" rel="noopener ugc nofollow" target="_blank">Kaggle</a>. The pipeline covers three stages: ingesting the raw data, cleaning and transforming it, and finally visualizing the results.</p>
<p>If you are new to the concept of Delta Lake, I suggest you start with my previous article:</p>
<p><a href="https://towardsdev.com/delta-lake-for-beginners-data-lake-data-warehouse-and-more-4017099b87a5?source=post_page-----337202a110a8--------------------------------" rel="noopener ugc nofollow" target="_blank">Delta Lake for Beginners: Data Lake + Data Warehouse And More</a> (towardsdev.com)</p>
<h1>Prerequisites</h1>
<p>Before we begin, ensure you have the following:</p>
<ul>
<li>A Databricks account (Azure Databricks).</li>
<li>The COVID-19 dataset for the USA from Kaggle.</li>
</ul>
<p><a href="https://towardsdev.com/building-an-end-to-end-data-pipeline-with-delta-lake-and-databricks-337202a110a8"><strong>Read More</strong></a></p>