<h1>5 reasons to choose Delta format (on Databricks)</h1>
<p>In this blog post, I will explain 5 reasons to prefer the Delta format over Parquet or ORC when you are using Databricks for your analytics workloads.</p>
<p>But before we start, let’s have a look at what the Delta format is.</p>
<h1>Delta … an introduction</h1>
<p>Delta is a data format based on Apache Parquet. It’s an open source project (<a href="https://github.com/delta-io/delta" rel="noopener ugc nofollow" target="_blank">https://github.com/delta-io/delta</a>), delivered with the Databricks runtimes, and it’s the default table format from Databricks Runtime 8.0 onwards.</p>
<p>You can use the Delta format through notebooks and applications executed in Databricks with various APIs (<a href="http://spark.apache.org/docs/latest/api/python/index.html" rel="noopener ugc nofollow" target="_blank">Python</a>, <a href="http://spark.apache.org/docs/latest/api/scala/index.html" rel="noopener ugc nofollow" target="_blank">Scala</a>, <a href="https://spark.apache.org/docs/latest/api/sql/index.html" rel="noopener ugc nofollow" target="_blank">SQL</a>, etc.) and also with Databricks SQL.</p>
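<p>As a quick illustration of the Python API, here is a minimal sketch of writing and reading a Delta table with PySpark. It assumes a Spark environment with the <code>delta-spark</code> package available (as on a Databricks cluster, where the session is preconfigured); the table path and data are hypothetical.</p>
<pre><code>
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip
import os, tempfile

# On Databricks, `spark` already exists; this builder setup is only
# needed when running delta-spark locally.
builder = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write a small DataFrame as a Delta table (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "events")
spark.range(5).write.format("delta").save(path)

# Read it back with the same format identifier.
df = spark.read.format("delta").load(path)
count = df.count()
</code></pre>
<p>The only Delta-specific part is <code>format("delta")</code>; the rest is standard Spark DataFrame code, which is why existing Parquet pipelines migrate easily.</p>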
<p>Under the hood, Delta is made of several components:</p>
<p><a href="https://medium.com/datalex/5-reasons-to-use-delta-lake-format-on-databricks-d9e76cf3e77d"><strong>Read More</strong></a></p>