Simplify Airflow DAG Creation and Maintenance with Hamilton in 8 minutes

<p>This post walks you through the benefits of having two open source projects, Hamilton and Airflow, and their directed acyclic graphs (DAGs) work in tandem. At a high level, Airflow is responsible for orchestration (think macro) and Hamilton helps author clean and maintainable data transformations (think micro).</p> <p>For those who are unfamiliar with Hamilton, we point you to an interactive overview on tryhamilton.dev, or our other posts, such as this <a href="https://towardsdatascience.com/functions-dags-introducing-hamilton-a-microframework-for-dataframe-generation-more-8e34b84efc1d" rel="noopener" target="_blank">one</a>. Otherwise, we will talk about Hamilton at a high level and point to reference documentation for more details. For reference, I&rsquo;m one of the co-creators of Hamilton.</p> <p>For those still trying to grasp how the two can run together: you can run Hamilton with Airflow because Hamilton is just a library with a small dependency footprint, so one can add Hamilton to an Airflow setup in no time!</p> <p>Just to recap, Airflow is the industry standard for orchestrating data pipelines. It powers all sorts of data initiatives, including ETL, ML pipelines, and BI. Since its inception in 2014, Airflow users have faced certain rough edges with regard to authoring and maintaining data pipelines:</p> <ol> <li>Maintainably managing the evolution of workflows; what starts simple often gets complex.</li> <li>Writing modular, reusable, and testable code that runs within an Airflow task.</li> <li>Tracking lineage of code and data artifacts that an Airflow DAG produces.</li> </ol> <p>This is where we believe Hamilton can help! Hamilton is a Python micro-framework for writing data transformations. In short, one writes Python functions in a &ldquo;declarative&rdquo; style, which Hamilton parses and connects into a graph based on their names, arguments and type annotations. 
Specific outputs can be requested and Hamilton will execute the required function path to produce them. Because it doesn&rsquo;t provide macro orchestration capabilities, it pairs nicely with Airflow by helping data professionals write cleaner, more reusable code for Airflow DAGs.</p> <p><a href="https://towardsdatascience.com/simplify-airflow-dag-creation-and-maintenance-with-hamilton-in-8-minutes-e6e48c9c2cb0">Read More</a></p>
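<p>To make the &ldquo;declarative&rdquo; idea concrete, here is a minimal sketch in plain Python of how functions can be wired into a graph by matching parameter names to function names, and how only the path needed for a requested output gets executed. This is not Hamilton&rsquo;s implementation or API: the <code>execute</code> helper and the example functions (<code>spend</code>, <code>signups</code>, <code>spend_per_signup</code>, <code>total_spend</code>) are hypothetical names chosen for illustration, and the sketch skips type checking and cycle detection.</p>

```python
import inspect
from typing import Any, Callable, Dict, List


# Hypothetical transformations: each function name is a node in the graph,
# and each parameter name refers to the upstream node it depends on.
def spend() -> List[float]:
    return [10.0, 20.0, 30.0]


def signups() -> List[int]:
    return [1, 2, 3]


def spend_per_signup(spend: List[float], signups: List[int]) -> List[float]:
    return [s / n for s, n in zip(spend, signups)]


def total_spend(spend: List[float]) -> float:
    return sum(spend)


def execute(functions: List[Callable], requested: List[str]) -> Dict[str, Any]:
    """Compute only the nodes needed to produce the requested outputs."""
    nodes = {fn.__name__: fn for fn in functions}
    computed: Dict[str, Any] = {}

    def resolve(name: str) -> Any:
        # Walk upstream dependencies first, caching each result once.
        if name not in computed:
            fn = nodes[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            computed[name] = fn(**kwargs)
        return computed[name]

    for name in requested:
        resolve(name)
    return computed


results = execute(
    [spend, signups, spend_per_signup, total_spend],
    requested=["spend_per_signup"],
)
print(results["spend_per_signup"])  # [10.0, 10.0, 10.0]
print("total_spend" in results)     # False: not on the requested path
```

<p>Because <code>total_spend</code> is not upstream of the requested output, it is never called, which mirrors the &ldquo;execute the required function path&rdquo; behavior described above.</p>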
Tags: Airflow DAG Code