Integrating Airflow with Databricks: Creating Custom Operators

<h1>Introduction</h1>

<p>Apache Airflow provides robust capabilities for designing and managing workflows. However, external integrations sometimes require a more tailored approach than what&rsquo;s available out of the box. This article focuses on the practical implementation of custom Airflow operators, using Databricks integration as a case study. We&rsquo;ll create a custom operator and make it deferrable for better resource utilization.</p>

<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*hL6mgLFqgTFiYRmjXs8dXQ.png" style="height:394px; width:700px" /></p>

<h1>Setup</h1>

<p>To follow this example, you will need:</p>

<ol>
<li>Airflow: <code>pip install apache-airflow</code></li>
<li>Databricks Python SDK: <code>pip install databricks-sdk</code></li>
<li>A <a href="https://accounts.cloud.databricks.com/" rel="noopener ugc nofollow" target="_blank">Databricks</a> account</li>
</ol>

<h1>Writing the Hook</h1>

<p>The best practice for interacting with an external service from Airflow is the <a href="https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/connections.html" rel="noopener ugc nofollow" target="_blank">Hook abstraction</a>. Hooks provide a unified interface for acquiring connections and integrate with the built-in <a href="https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html" rel="noopener ugc nofollow" target="_blank">connection management</a>. We&rsquo;ll create a hook for connecting to Databricks via the Databricks Python SDK:</p>

<p><a href="https://itnext.io/integrating-airflow-with-databricks-creating-custom-operators-90c2fea8399d"><strong>Read More</strong></a></p>