Git And Databricks

<h1>Introduction</h1>
<p>Databricks is one of the most popular data platforms, largely because people of all backgrounds can get up and running on it quickly. Its UI is intuitive for analysts and even product managers who just need to go in and run the occasional query.</p>
<p>For those who want to use Databricks as a production-level tool, the UI shouldn&rsquo;t come into play that often. Everything you need to run Databricks on a regular basis can be configured in Git, so how can we take advantage of that?</p>
<h1>Notebooks</h1>
<p>Notebooks used for ad hoc analysis or testing don&rsquo;t necessarily need to be stored in version control. However, I believe that every notebook used in a job should be stored in Git.</p>
<p>First, you&rsquo;ll need to set up the Git integration within the Databricks console. To do so, go to the User Settings tab under your name in the right-hand corner. Under Git Integration, you&rsquo;ll pass in your email, the Git provider you&rsquo;re using, and the Personal Access Token (PAT) associated with your Git user, which you can generate within your Git platform. Once this is established, you can go to the Repos tab and clone a repo to start working with it in the Databricks UI. This is convenient if you don&rsquo;t want to manage everything within an IDE (although Databricks has created various plugins that make the IDE workflow a lot cleaner than it used to be).</p>
<p><a href="https://medium.com/@matt_weingarten/git-and-databricks-4d076173b2c3"><strong>Learn More</strong></a></p>
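<p>The Git integration and repo clone steps described in the Notebooks section can also be scripted instead of clicked through. What follows is a minimal sketch, not the article&rsquo;s own method: it assumes the Databricks REST endpoints <code>/api/2.0/git-credentials</code> and <code>/api/2.0/repos</code> with the field names shown, and every host, username, token, and path is a hypothetical placeholder. Check the endpoints and fields against your workspace&rsquo;s API documentation before relying on them.</p>

```python
import json
import urllib.request


def git_credential_payload(git_username, git_provider, personal_access_token):
    """Body for registering a Git PAT (assumed fields of /api/2.0/git-credentials)."""
    return {
        "git_username": git_username,
        "git_provider": git_provider,
        "personal_access_token": personal_access_token,
    }


def repo_payload(repo_url, git_provider, workspace_path):
    """Body for cloning a repo into the workspace (assumed fields of /api/2.0/repos)."""
    return {"url": repo_url, "provider": git_provider, "path": workspace_path}


def build_request(host, databricks_token, endpoint, payload):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical values for illustration only; nothing is sent here.
HOST = "https://example-workspace.cloud.databricks.com"
credential_request = build_request(
    HOST,
    "<databricks-token>",
    "/api/2.0/git-credentials",
    git_credential_payload("jane@example.com", "gitHub", "<git-pat>"),
)
clone_request = build_request(
    HOST,
    "<databricks-token>",
    "/api/2.0/repos",
    repo_payload(
        "https://github.com/example-org/example-repo.git",
        "gitHub",
        "/Repos/jane@example.com/example-repo",
    ),
)
# To actually run the steps: send(credential_request), then send(clone_request).
```

<p>Keeping request construction separate from sending makes it possible to inspect or log the payloads before anything touches the workspace.</p>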
Tags: Git Databricks