Delta-RS and DuckDB — Read and Write Delta Without Spark

I have used Apache Spark (often as Azure Databricks) for some years and see it as the sledgehammer of data processing. It is a reliable tool built on the JVM that processes data in memory and can spin up multiple workers to distribute the workload across a wide range of use cases. Whether the dataset is small or large, Spark gets the job done, and it has a reputation as the de facto standard processing engine for running Data Lakehouses. There is an alternative to Java, Scala, and the JVM, though: open-source libraries like <code>delta-rs</code>, <code>duckdb</code>, <code>pyarrow</code>, and <code>polars</code>, written in more performant languages such as Rust and C++. These newcomers can be the faster option in specific scenarios such as low-latency ETL on small to medium-sized datasets, data exploration, and so on. This article is a proof-of-concept exploration, with a small benchmark, of what is currently achievable outside of Spark. <a href="https://betterprogramming.pub/delta-rs-duckdb-read-and-write-delta-without-spark-c4d3db580b25">Website</a>