Databricks Workflows: Orchestration Made Easy
<p>When it comes to orchestration frameworks for data engineering, there are many options. Airflow is either loved or hated, depending on who you ask, as I’ve discussed in a <a href="https://dagster.io/" rel="noopener ugc nofollow" target="_blank">previous post</a>. As someone who has used Airflow for close to six years now, I haven’t seen enough downsides to put myself in the “we need another engine” camp (yet).</p>
<p>However, our pipelines have become a lot simpler than they were some time back. As we’ve moved as much processing as we can into Databricks, the majority of our Airflow jobs now follow a model where all they do is call Databricks to submit a job run, as in the sketch below.</p>
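<p>For illustration, here is a minimal sketch of what such a thin Airflow DAG looks like, assuming the Databricks provider package is installed and a <code>databricks_default</code> connection is configured in Airflow. The notebook path and cluster spec are placeholders, not our actual setup.</p>
<pre>
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="daily_databricks_job",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # The entire DAG is one task: Airflow does nothing but hand the
    # real work off to Databricks as a one-time job run.
    submit_run = DatabricksSubmitRunOperator(
        task_id="submit_databricks_run",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "13.3.x-scala2.12",  # placeholder runtime
            "node_type_id": "i3.xlarge",          # placeholder node type
            "num_workers": 2,
        },
        notebook_task={
            "notebook_path": "/Shared/etl/daily_load",  # placeholder notebook
        },
    )
</pre>
<p>When a DAG is reduced to this, Airflow is adding scheduling and little else, which is exactly what raises the question below.</p>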
<p>Why use Airflow when this can all be done in Databricks? Databricks Workflows was fairly limited for quite some time, lacking some of the features I consider mandatory for proper job orchestration. However, Databricks has recently released more features for Workflows which, in my opinion, give engineers the green light to orchestrate everything within Databricks itself, when feasible. Let’s explore those features.</p>
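<p>To make the comparison concrete, a multi-task workflow with dependencies can be defined natively in Databricks through the Jobs API 2.1, with no external orchestrator involved. Below is a minimal sketch; the workspace host, token handling, notebook paths, and cluster spec are placeholder assumptions for illustration.</p>
<pre>
import os

import requests

# Placeholder: a real host looks like https://&lt;workspace&gt;.cloud.databricks.com
HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]

CLUSTER_SPEC = {
    "spark_version": "13.3.x-scala2.12",  # placeholder runtime
    "node_type_id": "i3.xlarge",          # placeholder node type
    "num_workers": 2,
}

# A two-task workflow: "transform" runs only after "ingest" succeeds,
# expressed as task dependencies inside Databricks itself.
job_spec = {
    "name": "daily_etl_workflow",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Shared/etl/ingest"},
            "new_cluster": CLUSTER_SPEC,
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/Shared/etl/transform"},
            "new_cluster": CLUSTER_SPEC,
        },
    ],
}

response = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
response.raise_for_status()
print(response.json())  # returns the new job_id
</pre>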