Data Engineering End-to-End Project — PostgreSQL, Airflow, Docker, Pandas

<p>In this article, we will fetch a CSV file from a remote repo, download it to the local working directory, create a local PostgreSQL table, and write the CSV data to that table with the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/python_scripts/write_csv_to_postgres.py" rel="noopener ugc nofollow" target="_blank"><strong>write_csv_to_postgres.py</strong></a> script.</p>
<p>Then, we will read the data back from the table. After some modifications and pandas practice, we will create 3 separate data frames with the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/python_scripts/create_df_and_modify.py" rel="noopener ugc nofollow" target="_blank"><strong>create_df_and_modify.py</strong></a> script.</p>
<p>Finally, we will take these 3 data frames, create the related tables in the PostgreSQL database, and insert the data frames into these tables with the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/python_scripts/write_df_to_postgres.py" rel="noopener ugc nofollow" target="_blank"><strong>write_df_to_postgres.py</strong></a> script.</p>
<p>All these scripts will run as Airflow DAG tasks, defined in the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/dags/airflow_dag.py" rel="noopener ugc nofollow" target="_blank"><strong>DAG script</strong></a>.</p>
<p>Think of this project as pandas practice and as an alternative way of storing data on the local machine.</p>
<p><a href="https://medium.com/@dogukannulu/data-engineering-end-to-end-project-postgresql-airflow-docker-pandas-91c6aa529030"><strong>Website</strong></a></p>
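<p>The first stage described above (create a table, write the CSV rows into it, read them back) can be sketched roughly as follows. This is a minimal illustration, not the code from the linked scripts. To stay runnable without a database server, it swaps PostgreSQL for Python's built-in in-memory SQLite. The table name, column names, and sample rows are all hypothetical, and the CSV is an inline string rather than a file downloaded from the remote repo.</p>

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real project downloads its CSV from a remote repo.
SAMPLE_CSV = """customer_id,age,balance
1,42,1500.50
2,35,0.00
3,58,230.75
"""


def create_table(conn):
    """Create the destination table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS customers (
            customer_id INTEGER PRIMARY KEY,
            age INTEGER,
            balance REAL
        )
        """
    )


def write_csv_to_table(conn, csv_text):
    """Insert every CSV row into the table and return the number of rows read."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["customer_id"]), int(r["age"]), float(r["balance"]))
        for r in reader
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO customers (customer_id, age, balance) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


def read_table(conn):
    """Read the table back as a list of tuples, ordered by id."""
    return conn.execute(
        "SELECT customer_id, age, balance FROM customers ORDER BY customer_id"
    ).fetchall()


def run_pipeline(csv_text=SAMPLE_CSV):
    """Run create -> write -> read against an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    try:
        create_table(conn)
        write_csv_to_table(conn, csv_text)
        return read_table(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

<p>Against a real PostgreSQL instance, the same three steps would go through a PostgreSQL driver and connection settings instead of <code>sqlite3.connect</code>, and in the project each step runs as its own Airflow task rather than as one function call.</p>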
Tags: Docker Pandas