Successful Engineering With Databricks

<h1>Introduction</h1>
<p>Databricks is a platform used by many different personas. Analysts use it for queries, data engineers use it to create jobs and pipelines, and data scientists and machine learning engineers use it to build models.</p>
<p>Supporting that many personas brings the challenge of how to manage it all. Quite frankly, there&rsquo;s no single right answer, and in most cases, you&rsquo;re best off leaving people to their own devices (with proper guardrails in place, of course). Still, here are some tips on how to really take advantage of Databricks from an engineering perspective.</p>
<h1>Version Control</h1>
<p>Version control is every developer&rsquo;s dream (and occasional nightmare). All critical notebooks, whether they&rsquo;re used just for analysis or for jobs, should ideally be stored in version control so that the rest of the team can access them as necessary. This is even more critical for jobs, as you don&rsquo;t want access to be lost completely when an engineer moves to another team or leaves the company altogether.</p>
<p>Databricks makes this easy with its&nbsp;<a href="https://docs.databricks.com/en/repos/index.html" rel="noopener ugc nofollow" target="_blank">Repos</a>&nbsp;feature. You can connect to a repo by supplying your Git credentials (your username or email plus a password or, more commonly, a personal access token), and then branch, push, and pull as necessary. Databricks also recently added support for&nbsp;<a href="https://www.databricks.com/blog/new-support-conflict-resolution-repos-merge-rebase-and-pull" rel="noopener ugc nofollow" target="_blank">merging/rebasing</a>, the lack of which was a big pain point before. All in all, there&rsquo;s no reason not to use version control for your critical Databricks resources.</p>
<p><a href="https://medium.com/@matt_weingarten/successful-engineering-with-databricks-9fd2d7444096"><strong>Website</strong></a></p>
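<p>For teams that would rather script the Repos setup described under Version Control than click through the UI, here is a minimal sketch of what that could look like against the Databricks Repos REST API (<code>/api/2.0/repos</code>). It only builds the HTTP requests and does not send them. The workspace host, token, Git URL, and workspace path are hypothetical placeholders, and the helper names are made up for illustration, so treat this as a starting point rather than a definitive implementation.</p>

```python
import json
import urllib.request

# Path of the Repos REST API on a Databricks workspace.
REPOS_API = "/api/2.0/repos"


def _json_request(host, token, path, body, method):
    """Build (but do not send) an authenticated JSON request."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


def build_create_repo_request(host, token, git_url, provider, workspace_path):
    """Request that links a remote Git repository into the workspace."""
    body = {"url": git_url, "provider": provider, "path": workspace_path}
    return _json_request(host, token, REPOS_API, body, "POST")


def build_checkout_request(host, token, repo_id, branch):
    """Request that switches an existing workspace repo to a branch."""
    body = {"branch": branch}
    return _json_request(host, token, f"{REPOS_API}/{repo_id}", body, "PATCH")


if __name__ == "__main__":
    # All values below are placeholders (assumptions), not real resources.
    req = build_create_repo_request(
        host="https://example.cloud.databricks.com",
        token="<personal-access-token>",
        git_url="https://github.com/acme/pipelines.git",
        provider="gitHub",
        workspace_path="/Repos/data-eng/pipelines",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

<p>Keeping request construction separate from sending makes the payloads easy to review and unit test before anything touches a real workspace.</p>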