Data Engineering Project — IMDB Movie Analysis
<p>In this article, I will create a data pipeline for transferring and analyzing movie data from IMDb.</p>
<p>The data pipeline will be created using the following tools:</p>
<ol>
<li><strong>Data ingestion</strong>: Web scraping from IMDb using Python</li>
<li><strong>Data storage</strong>: Google BigQuery</li>
<li><strong>Data analysis</strong>: dbt</li>
<li><strong>Data visualization</strong>: Power BI</li>
<li><strong>Data orchestration</strong>: Apache Airflow</li>
<li><strong>Container deployment</strong>: Docker</li>
</ol>
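<p>One way to picture how these tools fit together is as a chain of stages, each depending on the one before it. The sketch below is only an illustration: the stage names are hypothetical labels for the tools listed above, the ordering is an assumption, and in the actual project this ordering is handled by Airflow rather than by a script like this.</p>

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
# The names are hypothetical labels for the tools in the list above,
# and the dependency order is an assumption made for illustration.
PIPELINE = {
    "scrape_imdb": set(),
    "load_to_bigquery": {"scrape_imdb"},
    "transform_with_dbt": {"load_to_bigquery"},
    "refresh_power_bi": {"transform_with_dbt"},
}


def execution_order(pipeline):
    """Return the stage names in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(execution_order(PIPELINE), start=1):
        print(f"{step}. {stage}")
```

<p>Writing the dependencies down explicitly, instead of hard-coding a sequence, is the same idea an orchestrator relies on: a stage only runs once everything it depends on has finished.</p>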
<p>The project will have the following structure:</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*v9qDFqXJrArNutZGIGc-5g.png" style="height:239px; width:700px" /></p>
<p>Project Workflow</p>
<p>Check out my GitHub repository for the full code and explanation.</p>
<h2>GitHub - bardadon/imdb_data_engineering</h2>
<p>github.com</p>
<h1>1. Deploying Airflow</h1>
<p>In this project I used Docker Compose to deploy Airflow across several containers. I prepared a Bash script to assist with the deployment. Simply execute the following file in your project folder to get started:</p>
<p><a href="https://towardsdev.com/data-engineering-project-imdb-movie-analysis-3f79de2f4ce7">Website</a></p>
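<p>For readers who want to see the moving parts, here is a minimal Python sketch of the steps a Docker Compose deployment of Airflow typically involves. It is not the script from the repository. It assumes the official Airflow <code>docker-compose.yaml</code> is already in the project folder, and the helper names (<code>prepare_project</code>, <code>compose_commands</code>, <code>deploy</code>) are hypothetical.</p>

```python
import os
import subprocess
from pathlib import Path

# Folders that the official Airflow docker-compose.yaml mounts into the
# containers (assumption: the project uses that file unchanged).
AIRFLOW_DIRS = ("dags", "logs", "plugins")


def prepare_project(project_dir):
    """Create the mounted folders and an .env file holding the host user id."""
    root = Path(project_dir)
    for name in AIRFLOW_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # AIRFLOW_UID keeps files written by the containers owned by the host
    # user; 50000 is the fallback the Airflow docs suggest on other systems.
    uid = os.getuid() if hasattr(os, "getuid") else 50000
    (root / ".env").write_text(f"AIRFLOW_UID={uid}\n")


def compose_commands():
    """Return the Docker Compose commands in the order they should run."""
    return [
        ["docker", "compose", "up", "airflow-init"],  # one-off database and user setup
        ["docker", "compose", "up", "-d"],  # start all services in the background
    ]


def deploy(project_dir, dry_run=True):
    """Prepare the folder, then print (dry run) or execute each command."""
    prepare_project(project_dir)
    for command in compose_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, cwd=project_dir, check=True)
```

<p>Calling <code>deploy(".")</code> creates the folders and the <code>.env</code> file and only prints the commands, while <code>deploy(".", dry_run=False)</code> actually runs them. The dry run is the default so the sketch can be tried on a machine without Docker installed.</p>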