Integrating Airflow with Databricks: Creating Custom Operators
<h1>Introduction</h1>
<p>Apache Airflow provides robust capabilities for designing and managing workflows. However, external integrations sometimes require a more tailored approach than what&rsquo;s available out of the box. This article focuses on the practical implementation of custom Airflow operators, using Databricks integration as a case study. We&rsquo;ll build a custom operator and then make it deferrable for better resource utilization.</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*hL6mgLFqgTFiYRmjXs8dXQ.png" style="height:394px; width:700px" /></p>
<h1>Setup</h1>
<p>To follow this example, you will need:</p>
<ol>
<li>Airflow: <code>pip install apache-airflow</code></li>
<li>Databricks Python SDK: <code>pip install databricks-sdk</code></li>
<li>A <a href="https://accounts.cloud.databricks.com/" rel="noopener ugc nofollow" target="_blank">Databricks</a> account (a quick connectivity check follows this list)</li>
</ol>
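<p>With those installed, it&rsquo;s worth confirming that the SDK can reach your workspace before wiring anything into Airflow. The snippet below is a quick sanity check, not part of the final code; the host and token values are placeholders for your own workspace URL and personal access token:</p>
<pre><code>from databricks.sdk import WorkspaceClient

# Placeholders: substitute your workspace URL and a personal access token.
client = WorkspaceClient(
    host="https://YOUR-WORKSPACE.cloud.databricks.com",
    token="YOUR-PERSONAL-ACCESS-TOKEN",
)

# A cheap round-trip that fails fast if the credentials are wrong.
print(client.current_user.me().user_name)
</code></pre>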
<h1>Writing the Hook</h1>
<p>The best practice for interacting with an external service from Airflow is the <a href="https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/connections.html" rel="noopener ugc nofollow" target="_blank">Hook abstraction</a>. Hooks provide a unified interface for acquiring connections and integrate with Airflow&rsquo;s built-in <a href="https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html" rel="noopener ugc nofollow" target="_blank">connection management</a>. We&rsquo;ll create a hook that connects to Databricks via the Databricks Python SDK.</p>
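<p>A minimal sketch of such a hook is shown below. The class name <code>DatabricksSdkHook</code> and the field mapping are illustrative assumptions: the Airflow connection is taken to store the workspace URL in its host field and a personal access token in its password field.</p>
<pre><code>from airflow.hooks.base import BaseHook
from databricks.sdk import WorkspaceClient


class DatabricksSdkHook(BaseHook):
    """Hand out a databricks-sdk WorkspaceClient built from an Airflow connection."""

    conn_name_attr = "databricks_conn_id"
    default_conn_name = "databricks_default"
    hook_name = "Databricks SDK"

    def __init__(self, databricks_conn_id: str = default_conn_name) -> None:
        super().__init__()
        self.databricks_conn_id = databricks_conn_id

    def get_conn(self) -> WorkspaceClient:
        # Resolve host and token from Airflow's connection store,
        # so credentials never live in DAG code.
        conn = self.get_connection(self.databricks_conn_id)
        return WorkspaceClient(host=conn.host, token=conn.password)
</code></pre>
<p>Keeping client construction inside <code>get_conn</code> means the same hook can later be shared by the operator and its deferrable trigger, with credentials resolved from Airflow&rsquo;s connection store rather than hard-coded in DAG files.</p>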
<p><a href="https://itnext.io/integrating-airflow-with-databricks-creating-custom-operators-90c2fea8399d"><strong>Read More</strong></a></p>