End-to-End Data Engineering Project — Airflow, Snowflake, DBT, Docker and DockerOperator
<p>In this article, we are going to build an end-to-end data engineering pipeline using Airflow, dbt, and Snowflake, with everything running in Docker. Because Airflow and dbt have conflicting Python dependencies, we will run them in separate containers, and to trigger dbt from an Airflow DAG we will use the DockerOperator. A sketch of that setup follows below.</p>
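<p>To make the architecture concrete, here is a minimal sketch of what such a DAG could look like. The image name, network mode, and profiles directory are assumptions for illustration, not the exact values used in this project; the key point is that the Airflow container only schedules the task, while the dbt commands execute inside a separate container reached through the host's Docker socket.</p>

```python
# Minimal sketch: Airflow schedules the task, dbt runs in its own container.
# Image name, network, and profiles path are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="dbt_snowflake_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = DockerOperator(
        task_id="dbt_run",
        image="my-dbt-image:latest",              # hypothetical dbt image
        command="dbt run --profiles-dir /dbt",    # run all dbt models
        docker_url="unix://var/run/docker.sock",  # talk to the host Docker daemon
        network_mode="bridge",
    )
```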
<p>Our data engineering pipeline will be based on the Airbnb Istanbul dataset, which is available at the following link:</p>
<p><a href="http://insideairbnb.com/get-the-data/" rel="noopener ugc nofollow" target="_blank">Inside Airbnb: Get the Data</a></p>
<p>The dataset comes with many fields. Given the scope of this tutorial, however, it has been simplified, and the simplified version is available in the GitHub repo shared above.</p>
<p>We will load the data manually into Snowflake, into a schema called <strong>raw</strong>, and then build nine models on top of it. The models will be run by the DockerOperator we define in the Airflow DAG.</p>
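<p>The manual load itself can be done through the Snowflake web UI, but if you prefer to script it, a minimal sketch with the snowflake-connector-python package might look like the following. The credentials, database, file path, and table name are placeholders, and the target table is assumed to exist already in the <strong>raw</strong> schema.</p>

```python
# Sketch of a scripted alternative to loading the CSV through the Snowflake UI.
# All connection values and the file path below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    account="YOUR_ACCOUNT",
    warehouse="COMPUTE_WH",
    database="AIRBNB",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Stage the local file in the table's internal stage, then copy it in.
    cur.execute("PUT file:///data/listings.csv @%LISTINGS")
    cur.execute(
        "COPY INTO LISTINGS "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"')"
    )
finally:
    cur.close()
    conn.close()
```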