Data Engineering Project — IMDB Movie Analysis

<p>In this article, I will build a data pipeline that ingests and analyzes movie data from IMDb.</p>
<p>The pipeline uses the following tools:</p>
<ol> <li><strong>Data ingestion</strong>: Web scraping from IMDb using Python</li> <li><strong>Data storage</strong>: Google BigQuery</li> <li><strong>Data analysis</strong>: dbt</li> <li><strong>Data visualization</strong>: Power BI</li> <li><strong>Data orchestration</strong>: Apache Airflow</li> <li><strong>Container deployment</strong>: Docker</li> </ol>
<p>The project has the following structure:</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*v9qDFqXJrArNutZGIGc-5g.png" style="height:239px; width:700px" /></p>
<p>Project Workflow</p>
<p>See my GitHub repository, <strong>bardadon/imdb_data_engineering</strong>, for the full code and explanation.</p>
<h1>1. Deploying Airflow</h1>
<p>In this project, I used Docker Compose to deploy Airflow across several containers. I prepared a bash file to assist with the deployment. Execute the following file in your project folder to get started:</p>
<p><a href="https://towardsdev.com/data-engineering-project-imdb-movie-analysis-3f79de2f4ce7">Website</a></p>