Sticking with Success using AWS Glue:

<p>In one of my previous articles on AWS Glue, I showed how you could use an external Python database library (pg8000) in your AWS Glue job to perform database operations. Click on the link below to read that story.</p>
<p><a href="https://awstip.com/mastering-aws-glue-advanced-data-management-using-3rd-party-database-libraries-8043bf84f489?source=post_page-----a8aa38171e8d--------------------------------" rel="noopener ugc nofollow" target="_blank">Connecting external database libraries with AWS Glue: Using pg8000</a> (awstip.com)</p>
<p>At the end of that article, I warned that the Glue job cannot parallelise such database operations and that you might run into issues when processing large data sets.</p>
<p>In fact, that is exactly what happened to me, and this is what I did to fix it.</p>
<p>At the end of one particular job that ran daily, I wanted to clear out stale data from a Postgres table after inserting new data, keeping about 6 days&rsquo; worth of data in the table at any one time. So, effectively, I needed to run something like this at the end of my job.</p>
<pre>delete from myschema.mytable where last_update_date &lt; current_date - 6</pre>
<p>My first attempt used the pg8000 Python library, but I hit an issue and the Glue job started to fail with an error similar to this.</p>
<p><a href="https://awstip.com/sticking-with-success-using-aws-glue-a8aa38171e8d"><strong>Website</strong></a></p>
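<p>For reference, the clean-up step described above could be sketched roughly as follows. This is a minimal illustration, not necessarily the code used in the job: the function name <code>purge_stale_rows</code> and its parameters are hypothetical, and it assumes a DB-API style connection (such as the one pg8000 provides) with the <code>%s</code> parameter style. It also computes the cutoff date in Python rather than using <code>current_date - 6</code> on the server, so the result can differ slightly if the client and database clocks or time zones disagree.</p>

```python
import datetime


def purge_stale_rows(conn, retention_days=6, today=None):
    """Delete rows older than the retention window and return the cutoff date.

    `conn` is assumed to be a DB-API connection (for example one created
    with pg8000). The table and column names come from the article's
    example statement; everything else here is illustrative.
    """
    # Allow the caller to inject "today" so the function is easy to test.
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=retention_days)

    cursor = conn.cursor()
    try:
        # Pass the cutoff as a bound parameter instead of building SQL by hand.
        cursor.execute(
            "DELETE FROM myschema.mytable WHERE last_update_date < %s",
            (cutoff,),
        )
        conn.commit()
    finally:
        cursor.close()
    return cutoff
```

<p>In a Glue job, this would be called once after the new data has been inserted, passing in a connection opened with the database credentials for that job. Note that it runs as a single statement on the driver, which is the non-parallel behaviour the earlier article warned about.</p>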
Tags: AWS Glue