Accelerating Spark: Databricks Photon Runtime

<p>Databricks is a ~$40B company built around Apache Spark, the open-source distributed computation engine. Its core offering is a high-level interface that lets organizations use Spark without incurring the operational cost of managing Spark clusters. Databricks&rsquo; clusters run the&nbsp;<a href="https://www.databricks.com/glossary/what-is-databricks-runtime" rel="noopener ugc nofollow" target="_blank">Databricks Runtime (DBR)</a>, a fork of Spark that remains API-compatible while adding improved security, optimized I/O, GPU extensions, and raw engine performance improvements. Part of the broader&nbsp;<a href="https://www.databricks.com/glossary/data-lakehouse" rel="noopener ugc nofollow" target="_blank">data lakehouse</a>&nbsp;initiative at Databricks, the&nbsp;<a href="https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf" rel="noopener ugc nofollow" target="_blank">Photon</a>&nbsp;project provides a high-performance operator framework that integrates with the DBR to deliver warehouse-like performance on simple&nbsp;<a href="https://www.databricks.com/discover/data-lakes" rel="noopener ugc nofollow" target="_blank">data lakes</a>. In 2021, Photon enabled Databricks to set the&nbsp;<a href="https://www.databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record.html" rel="noopener ugc nofollow" target="_blank">world record for the 100 TB TPC-DS benchmark</a>, an industry-standard OLAP evaluation.</p> <p><a href="https://blog.devgenius.io/accelerating-spark-databricks-photon-runtime-9a7a53824d1b"><strong>Read More</strong></a></p>