Sticking with Success using AWS Glue:
<p>In one of my previous articles on using AWS Glue, I showed how you could use an external Python database library (pg8000) in your AWS Glue job to perform database operations. Click on the link below to read that story.</p>
<p><a href="https://awstip.com/mastering-aws-glue-advanced-data-management-using-3rd-party-database-libraries-8043bf84f489?source=post_page-----a8aa38171e8d--------------------------------" rel="noopener ugc nofollow" target="_blank">Connecting external database libraries with AWS Glue: Using pg8000</a> (awstip.com)</p>
<p>At the end of that article, I warned that such database operations are not parallelisable by the Glue job and that you might run into issues when processing large data sets.</p>
<p>In fact, that is exactly what happened to me, and this is how I fixed it.</p>
<p>One particular daily job needed to clear stale data out of a Postgres table after inserting new data, keeping about 6 days’ worth of data in the table at any one time. Effectively, I needed to run something like this at the end of the job.</p>
<pre>
delete from myschema.mytable
where last_update_date &lt; current_date - 6</pre>
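<p>As a rough illustration, the statement above could be issued from Python through a DB-API 2.0 connection, such as one opened with pg8000. The helper name <code>purge_stale_rows</code>, the <code>retention_days</code> parameter and the explicit integer cast are assumptions made for this sketch, not code from the original job. It also assumes the driver uses the <code>%s</code> placeholder style.</p>

```python
# Hypothetical helper (not from the original job): runs the retention delete
# through any DB-API 2.0 connection, for example one from pg8000.dbapi.connect.
def purge_stale_rows(conn, retention_days=6):
    """Delete rows older than `retention_days` days and return the row count."""
    # The cast tells Postgres the parameter is an integer number of days,
    # so the expression resolves to "date - integer" as in the plain SQL.
    sql = (
        "DELETE FROM myschema.mytable "
        "WHERE last_update_date < current_date - CAST(%s AS integer)"
    )
    cursor = conn.cursor()
    try:
        cursor.execute(sql, (retention_days,))
        deleted = cursor.rowcount
        conn.commit()      # make the delete permanent
    except Exception:
        conn.rollback()    # leave the table untouched on failure
        raise
    finally:
        cursor.close()
    return deleted


# Example wiring (connection details are placeholders, not from the article):
# import pg8000.dbapi
# conn = pg8000.dbapi.connect(user="...", password="...", host="...", database="...")
# print(purge_stale_rows(conn))
```

<p>Passing the retention period as a parameter keeps the 6-day window configurable without building SQL strings by hand. Note that this still runs as a single statement on the Glue driver, so it is not parallelised.</p>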
<p>My first try used the pg8000 Python library, but I hit an issue and the Glue job started to fail with an error similar to this.</p>
<p><a href="https://awstip.com/sticking-with-success-using-aws-glue-a8aa38171e8d"><strong>Website</strong></a></p>