Delta-RS and DuckDB — Read and Write Delta Without Spark
<p>I have used Apache Spark (often as Azure Databricks) for some years and see it as the sledgehammer of data processing. It is a reliable tool built on the JVM that processes data in memory and can spin up multiple workers to distribute the workload across a wide range of use cases. Whether the datasets are small or large, Spark gets the job done, and it has earned a reputation as the de facto standard processing engine for running Data Lakehouses.</p>
<p>There is an alternative to Java, Scala, and the JVM, though: open-source libraries like <code>delta-rs</code>, <code>duckdb</code>, <code>pyarrow</code>, and <code>polars</code>, written in lower-level, more performant languages such as Rust and C++. These newcomers can be the faster option in specific scenarios, such as low-latency ETL on small to medium-sized datasets or data exploration.</p>
<p>This article is a proof-of-concept exploration with a small benchmark to see what is currently achievable outside of Spark.</p>
<p><a href="https://betterprogramming.pub/delta-rs-duckdb-read-and-write-delta-without-spark-c4d3db580b25"><strong>Website</strong></a></p>