Delta-RS and DuckDB — Read and Write Delta Without Spark

<p>I have used Apache Spark (often via Azure Databricks) for some years and see it as the sledgehammer of data processing. It is a reliable tool built on the JVM that processes data in memory and can spin up multiple workers to distribute the workload across a variety of use cases. Whether the dataset is small or large, Spark does the job, and it has a reputation as the de facto standard processing engine for running Data Lakehouses.</p> <p>There is an alternative to Java, Scala, and the JVM, though: open-source libraries like <code>delta-rs</code>, <code>duckdb</code>, <code>pyarrow</code>, and <code>polars</code>, which are written in more performant languages such as Rust and C++. These newcomers can be the faster option in specific scenarios, such as low-latency ETL on small to medium-sized datasets or data exploration.</p> <p>This article is a proof-of-concept exploration, with a bit of benchmarking, to see what else is currently achievable outside of Spark.</p> <p><a href="https://betterprogramming.pub/delta-rs-duckdb-read-and-write-delta-without-spark-c4d3db580b25"><strong>Website</strong></a></p>
Tags: Without Spark