Building an ETL Job: Transferring Data from MySQL to Redshift Using Python
<p>Extract, Transform, Load (ETL) is a data pipeline process that involves extracting data from a source system, transforming it in some way, and then loading it into a target system. In this article, we’ll demonstrate how to build an ETL job that extracts data from a MySQL database and loads it into a Redshift data warehouse. We’ll also implement the Change Data Capture (CDC) concept to capture delta changes and trigger this ETL job every hour.</p>
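<p>To make the CDC idea concrete, here is a minimal sketch of timestamp-based change capture on in-memory data. The <code>run_cycle</code> helper, the <code>EPOCH</code> constant, and the sample rows are hypothetical and exist only for illustration. The sketch assumes that every insert or update to a row sets its <code>update_date</code>, and that some scheduler (for example cron) calls the job once an hour.</p>

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def run_cycle(rows, watermark):
    """Return (changed_rows, new_watermark) for one ETL run."""
    delta = [row for row in rows if row["update_date"] > watermark]
    if delta:
        # Advance to the newest change actually seen in this run.
        watermark = max(row["update_date"] for row in delta)
    return delta, watermark

# Hypothetical in-memory stand-in for the orders table.
orders = [
    {"id": 1, "update_date": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "update_date": datetime(2024, 1, 1, 9, 30)},
]

# First run: the watermark starts at the epoch, so every row counts as a change.
first_delta, watermark = run_cycle(orders, EPOCH)

# Order 2 is modified before the next hourly run.
orders[1]["update_date"] = datetime(2024, 1, 1, 10, 45)

# Second run: only the modified row is newer than the watermark.
second_delta, watermark = run_cycle(orders, watermark)

print([row["id"] for row in first_delta])   # [1, 2]
print([row["id"] for row in second_delta])  # [2]
```

<p>Each run moves only the delta, so the cost of a run depends on how much changed in the last hour, not on the size of the table.</p>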
<h1>Using a Plain Python Script</h1>
<h1>Prerequisites</h1>
<ul>
<li>Python 3 installed on your local machine</li>
<li>A MySQL instance and an Amazon Redshift cluster up and running</li>
<li><code>mysql-connector-python</code> and <code>psycopg2</code> Python libraries installed</li>
<li>An <code>orders</code> table in your MySQL database with <code>create_date</code> and <code>update_date</code> columns</li>
</ul>
<h1>Step 1: Extract Data from MySQL</h1>
<p>First, we extract the data from MySQL using the <code>mysql-connector-python</code> library. The following function connects to the database and fetches the rows of the <code>orders</code> table whose <code>update_date</code> is later than the previous run:</p>
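<pre>
import mysql.connector
from datetime import datetime

# Start from the Unix epoch so the first run picks up every row.
last_update = datetime(1970, 1, 1)

def extract_new_records():
    """Fetch rows from orders that changed since the previous call."""
    global last_update
    # Capture the cutoff before querying, so rows updated while the query
    # runs are picked up (possibly again) by the next call, not skipped.
    # This assumes the script's clock and the MySQL server's clock agree.
    run_started = datetime.now()
    connection = mysql.connector.connect(user='mysql_user', password='mysql_password',
                                         host='mysql_host', database='mysql_database')
    try:
        cursor = connection.cursor()
        # Pass the timestamp as a query parameter instead of formatting it into the SQL.
        query = "SELECT * FROM orders WHERE update_date > %s"
        cursor.execute(query, (last_update,))
        records = cursor.fetchall()
        cursor.close()
    finally:
        connection.close()
    # Advance the watermark only after a successful fetch.
    last_update = run_started
    return records</pre>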
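<p>The function above keeps <code>last_update</code> in a module-level variable, so the value is lost whenever the process exits. If the job is started fresh every hour, the watermark has to be stored somewhere between runs. The following is one possible sketch that keeps it in a small JSON file. The helper names <code>load_watermark</code> and <code>save_watermark</code> and the file name <code>etl_state.json</code> are assumptions made for this example, not part of the original script.</p>

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

EPOCH = datetime(1970, 1, 1)

def load_watermark(path):
    """Return the stored watermark, or the epoch if no run has completed yet."""
    path = Path(path)
    if not path.exists():
        return EPOCH
    data = json.loads(path.read_text())
    return datetime.fromisoformat(data["last_update"])

def save_watermark(path, value):
    """Persist the watermark as an ISO 8601 string."""
    Path(path).write_text(json.dumps({"last_update": value.isoformat()}))

# Demonstration in a temporary directory; a real job would use a fixed path.
with tempfile.TemporaryDirectory() as state_dir:
    state_file = Path(state_dir) / "etl_state.json"  # hypothetical file name
    first = load_watermark(state_file)               # no file yet, falls back to the epoch
    save_watermark(state_file, datetime(2024, 1, 1, 12, 0, 0))
    restored = load_watermark(state_file)            # value survives a "restart"
```

<p>Saving the watermark only after the load step has succeeded means a failed run is retried with the same window on the next trigger.</p>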
<h1>Step 2: Load Data to Redshift</h1>
<p>The next step is to load the extracted data into Redshift. We’ll use the <code>psycopg2</code> library to insert the records into the target Redshift table.</p>
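<p>A minimal sketch of such a load step might look like the following. It is not the original article's code: the function names, the connection values, and the assumption that the Redshift <code>orders</code> table has the same column order as the MySQL one are all placeholders for illustration. <code>psycopg2</code> is imported lazily so that <code>load_records</code> works with any DB-API connection.</p>

```python
def connect_redshift():
    """Open a Redshift connection; every connection value here is a placeholder."""
    import psycopg2  # imported lazily so the rest of the sketch runs without the driver
    return psycopg2.connect(host="redshift_host", port=5439,
                            dbname="redshift_database",
                            user="redshift_user", password="redshift_password")

def load_records(connection, records, table="orders"):
    """Insert the extracted rows into the target table and return the row count.

    Assumes each record has the same column order as the target table, and
    that `table` is a trusted constant (it is formatted into the SQL text).
    """
    if not records:
        return 0
    # One %s placeholder per column; the driver fills in the values safely.
    placeholders = ", ".join(["%s"] * len(records[0]))
    sql = f"INSERT INTO {table} VALUES ({placeholders})"
    cursor = connection.cursor()
    try:
        cursor.executemany(sql, records)
        connection.commit()
    except Exception:
        # Undo the partial batch so a retry does not load rows twice.
        connection.rollback()
        raise
    finally:
        cursor.close()
    return len(records)

# Usage (requires the psycopg2 package and a reachable cluster):
# connection = connect_redshift()
# try:
#     load_records(connection, extract_new_records())
# finally:
#     connection.close()
```

<p>Two caveats apply to this sketch. Redshift does not enforce primary key or unique constraints, so a row that was updated in MySQL after an earlier load will appear twice unless the old version is deleted first. Row-by-row inserts are also slow on Redshift for large batches, where staging files in S3 and using <code>COPY</code> is the usual alternative.</p>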