Accelerating Spark: Databricks Photon Runtime

Databricks is a ~$40B company built around Apache Spark, the open-source distributed computation engine. Its core offering is a high-level interface that lets organizations use Spark without incurring the operational costs of managing Spark clusters themselves. Databricks clusters run the Databricks Runtime (DBR), a fork of Spark that remains API-compatible with it while adding improved security, optimized I/O, GPU extensions, and raw engine performance improvements. Photon, part of the broader data lakehouse initiative at Databricks, is a high-performance operator framework that integrates with the DBR to deliver warehouse-like performance on plain data lakes. In 2021, Photon enabled Databricks to set the world record for the 100 TB TPC-DS benchmark, an industry-standard OLAP evaluation.
