Building an ETL Job: Transferring Data from MySQL to Redshift Using Python

<p>Extract, Transform, Load (ETL) is a data pipeline process that involves extracting data from a source system, transforming it in some way, and then loading it into a target system. In this article, we&rsquo;ll demonstrate how to build an ETL job that extracts data from a MySQL database and loads it into a Redshift data warehouse. We&rsquo;ll also apply the Change Data Capture (CDC) concept to capture delta changes and trigger this ETL job every hour.</p>

<h1>Using a Plain Python Script</h1>

<h1>Prerequisites</h1>

<ul>
<li>Python 3 installed on your local machine</li>
<li>MySQL and AWS Redshift instances up and running</li>
<li>The <code>mysql-connector-python</code> and <code>psycopg2</code> Python libraries installed</li>
<li>An <code>orders</code> table in your MySQL database with <code>create_date</code> and <code>update_date</code> columns</li>
</ul>

<h1>Step 1: Extract Data from MySQL</h1>

<p>First, we will extract the data from the MySQL database using the <code>mysql-connector-python</code> library. Here&rsquo;s a simple Python function that connects to a MySQL database and fetches new records from the <code>orders</code> table:</p>

<pre>
import mysql.connector
from datetime import datetime

# set last_update to the start of the epoch so the first run fetches everything
last_update = datetime(1970, 1, 1)

def extract_new_records():
    global last_update
    connection = mysql.connector.connect(user='mysql_user',
                                         password='mysql_password',
                                         host='mysql_host',
                                         database='mysql_database')
    cursor = connection.cursor()
    # record the extraction time *before* querying, so rows updated while the
    # query is running are picked up by the next run instead of being skipped
    extraction_time = datetime.now()
    # use a parameterized query rather than string interpolation
    query = "SELECT * FROM orders WHERE update_date > %s"
    cursor.execute(query, (last_update,))
    records = cursor.fetchall()
    last_update = extraction_time
    cursor.close()
    connection.close()
    return records
</pre>

<h1>Step 2: Load Data to Redshift</h1>

<p>The next step is to load the extracted data into Redshift.
We&rsquo;ll use the <code>psycopg2</code> library to insert the records into the Redshift table.</p>

<p><a href="https://blog.devgenius.io/building-etl-job-transferring-data-from-mysql-to-redshift-using-python-e28fad1a0b8e">Read More</a></p>
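<p>The excerpt above ends before showing the load code. As a rough sketch of how Step 2 might look with <code>psycopg2</code> (the connection credentials, the <code>build_insert_sql</code> helper, and the <code>executemany</code> approach are this sketch&rsquo;s assumptions, not from the original article):</p>

```python
def build_insert_sql(table, n_columns):
    """Build a parameterized INSERT statement for a table with n_columns columns."""
    placeholders = ', '.join(['%s'] * n_columns)
    return f'INSERT INTO {table} VALUES ({placeholders})'

def load_records(records):
    """Insert the rows returned by extract_new_records() into Redshift."""
    if not records:
        return
    # imported lazily so build_insert_sql stays usable without the driver installed
    import psycopg2
    # Redshift speaks the PostgreSQL wire protocol, so psycopg2 can connect;
    # the credentials below are placeholders for your own cluster endpoint
    connection = psycopg2.connect(user='redshift_user',
                                  password='redshift_password',
                                  host='redshift_host',
                                  port=5439,
                                  dbname='redshift_database')
    cursor = connection.cursor()
    # executemany is adequate for small hourly deltas; for bulk loads,
    # Redshift's COPY from S3 is the recommended path
    cursor.executemany(build_insert_sql('orders', len(records[0])), records)
    connection.commit()
    cursor.close()
    connection.close()
```

<p>To run the job every hour as the introduction describes, the two functions can be chained and scheduled with cron, Airflow, or a simple loop; note that this sketch appends rows only, so handling updated rows (e.g. via a staging table and a merge) is left to the full article.</p>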