Using Spark in client-side applications via Databricks Connect

<p>The Data + AI Summit 2023 presented some interesting new capabilities of the Databricks platform and the Spark ecosystem as a whole. One of the most interesting releases for application development is Spark Connect, which is available as a service in Databricks via Databricks Connect. Databricks Connect expands the use cases of Spark by making it possible to run large-scale Spark jobs from virtually any application: anywhere you can install the Databricks Connect library, you can run Spark jobs on the connected cluster and retrieve the results in your application. In this article we introduce Databricks Connect and, to showcase its capabilities, we show how it can facilitate two workflows: ETL development and executing Spark workloads on Celery workers.</p>

<h1>What is Databricks Connect?</h1>

<p>Databricks Connect is built on the open-source Spark Connect, introduced in Apache Spark 3.4. Spark Connect provides a decoupled client-server architecture for Apache Spark that allows remote connectivity to Spark clusters using the DataFrame API, with unresolved logical plans as the protocol. That is, the client sends logical plans to the server, which executes them on the cluster and returns the results to the client.</p>
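<p>To make this client-server model concrete, the sketch below shows roughly what a client application using Databricks Connect might look like, assuming Databricks Connect for Databricks Runtime 13 or above is installed (<code>pip install databricks-connect</code>). The workspace URL, access token and cluster ID are placeholders to be replaced with your own configuration.</p>

<pre><code>
# Minimal sketch of a remote Spark session via Databricks Connect.
# The host, token and cluster_id values below are placeholders, not real
# credentials; they would normally come from your workspace configuration.
from databricks.connect import DatabricksSession
from pyspark.sql.functions import col

spark = DatabricksSession.builder.remote(
    host="https://your-workspace.cloud.databricks.com",  # placeholder workspace URL
    token="your-personal-access-token",                  # placeholder access token
    cluster_id="your-cluster-id",                        # placeholder cluster ID
).getOrCreate()

# The DataFrame API builds an unresolved logical plan on the client side.
# Calling an action such as show() sends that plan to the cluster, which
# executes it and streams the results back to the client.
df = spark.range(10).filter(col("id") % 2 == 0)
df.show()
</code></pre>

<p>Note that nothing runs locally until an action such as <code>show()</code> is called: the query travels to the cluster as a logical plan, and only the results come back to the application.</p>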