In this article, we are going to build an end-to-end data engineering pipeline using Airflow, dbt, and Snowflake, with everything running in Docker. Because of dependency conflicts between the tools, Airflow and its components will run in separate containers from dbt. To run the dbt project from an Airflow DAG in another container, we will use the DockerOperator.
Our data engineering pipeline will be based on the Airbnb Istanbul dataset, which is available at the following link:
Get the Data
The raw data comes with many fields. Given the scope of this tutorial, however, it has been simplified, and the simplified version is available in the GitHub repo shared above.
We will load the data manually into Snowflake under a schema called raw, then build nine dbt models on top of it. The models will be run by a DockerOperator defined in our Airflow DAG.