Data Engineering End-to-End Project — PostgreSQL, Airflow, Docker, Pandas
<p>In this article, we will fetch a CSV file from a remote repository, download it to the local working directory, create a local PostgreSQL table, and write the CSV data to that table with the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/python_scripts/write_csv_to_postgres.py" rel="noopener ugc nofollow" target="_blank"><strong>write_csv_to_postgres.py</strong></a> script.</p>
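<p>Below is a minimal sketch of this first step, assuming a hypothetical CSV URL, connection parameters, and table schema (the real script defines its own). It downloads the file, creates the table if needed, and bulk-loads the rows with <strong>COPY</strong>:</p>
<pre>
import io

import pandas as pd
import psycopg2
import requests

# Hypothetical URL and connection parameters -- placeholders, not the
# project's actual values.
CSV_URL = "https://raw.githubusercontent.com/example/repo/main/data.csv"
PG_PARAMS = dict(host="localhost", port=5432, dbname="airflow_db",
                 user="postgres", password="postgres")


def write_csv_to_postgres():
    # Download the CSV into the local working directory
    response = requests.get(CSV_URL, timeout=30)
    response.raise_for_status()
    with open("data.csv", "wb") as f:
        f.write(response.content)

    df = pd.read_csv("data.csv")

    conn = psycopg2.connect(**PG_PARAMS)
    cur = conn.cursor()
    # Illustrative schema only; match it to the real CSV's columns
    cur.execute("CREATE TABLE IF NOT EXISTS raw_data (id INTEGER, value TEXT);")

    # Stream the data frame into the table via COPY
    buffer = io.StringIO(df.to_csv(index=False, header=False))
    cur.copy_expert("COPY raw_data FROM STDIN WITH CSV", buffer)
    conn.commit()
    cur.close()
    conn.close()
</pre>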
<p>Then, we will read the data back from the table. After some modifications and pandas practice, we will create three separate data frames with the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/python_scripts/create_df_and_modify.py" rel="noopener ugc nofollow" target="_blank"><strong>create_df_and_modify.py</strong></a> script.</p>
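<p>A hedged sketch of this step: read the table back with pandas, apply some typical clean-up, and split the result into three data frames. The connection string, column names, and groupings below are assumptions for illustration, not the script's actual transformations:</p>
<pre>
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; the real script uses its own credentials
engine = create_engine(
    "postgresql+psycopg2://postgres:postgres@localhost:5432/airflow_db"
)


def create_dataframes():
    df = pd.read_sql("SELECT * FROM raw_data", engine)

    # Common pandas practice: deduplicate and fill missing values
    df = df.drop_duplicates().fillna(0)

    # Three illustrative data frames derived from the same source
    df_subset = df[["id", "value"]]
    df_counts = df.groupby("value", as_index=False).size()
    df_stats = df.describe().reset_index()
    return df_subset, df_counts, df_stats
</pre>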
<p>Finally, we will take these three data frames, create the corresponding tables in the PostgreSQL database, and insert the data frames into them with the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/python_scripts/write_df_to_postgres.py" rel="noopener ugc nofollow" target="_blank"><strong>write_df_to_postgres.py</strong></a> script.</p>
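<p>For this final load, <strong>pandas.DataFrame.to_sql</strong> can create each table and insert the rows in one call. The table names below are placeholders:</p>
<pre>
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://postgres:postgres@localhost:5432/airflow_db"
)


def write_dataframes_to_postgres(df_subset, df_counts, df_stats):
    # to_sql creates the table if needed; if_exists="replace" makes
    # the task idempotent across reruns
    df_subset.to_sql("subset_table", engine, if_exists="replace", index=False)
    df_counts.to_sql("counts_table", engine, if_exists="replace", index=False)
    df_stats.to_sql("stats_table", engine, if_exists="replace", index=False)
</pre>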
<p>All of these scripts will run as Airflow DAG tasks defined in the <a href="https://github.com/dogukannulu/csv_extract_airflow_docker/blob/main/dags/airflow_dag.py" rel="noopener ugc nofollow" target="_blank"><strong>DAG script</strong></a>.</p>
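<p>A sketch of what such a DAG could look like, assuming each script exposes a no-argument entry point and the tasks hand data to each other through the PostgreSQL tables rather than in memory (the DAG id, schedule, and dates are placeholders):</p>
<pre>
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumed no-argument entry points; in practice each script would wrap
# its logic in a main() that reads from / writes to PostgreSQL itself
from write_csv_to_postgres import main as extract_csv
from create_df_and_modify import main as create_dataframes
from write_df_to_postgres import main as load_dataframes

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="csv_extract_dag",
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="@once",
    catchup=False,
) as dag:
    extract_task = PythonOperator(
        task_id="write_csv_to_postgres",
        python_callable=extract_csv,
    )
    transform_task = PythonOperator(
        task_id="create_df_and_modify",
        python_callable=create_dataframes,
    )
    load_task = PythonOperator(
        task_id="write_df_to_postgres",
        python_callable=load_dataframes,
    )

    # Run the three scripts in order
    extract_task >> transform_task >> load_task
</pre>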
<p>Think of this project as pandas practice and an alternative way of storing data on the local machine.</p>
<p><a href="https://medium.com/@dogukannulu/data-engineering-end-to-end-project-postgresql-airflow-docker-pandas-91c6aa529030"><strong>Website</strong></a></p>