Tag: Spark

Accelerating Spark: Databricks Photon Runtime

Databricks is a ~$40B company built around the open-source distributed computation engine Apache Spark. Their core offering is a high-level interface that allows organizations to utilize Spark without incurring the operational costs of managing Spark clusters. Databricks’ clusters use the ...

Delta-RS and DuckDB — Read and Write Delta Without Spark

I have used Apache Spark (often as Azure Databricks) for some years and see it as a sledgehammer in data processing. It is a reliable tool built on the JVM, which does in-memory processing and can spin up multiple workers to distribute the workload and handle various use cases. It does not matter: whether sm...

Spark related quickies

The answer is no. It lasts only for the duration of the Spark application run (that is, while the cluster is up in Databricks). As soon as the cluster terminates, the scope is lost and you can’t access it after the cluster restarts. How can you work around this constraint and set a configuration property permanen...

Different Types of “Join Strategies” in “Apache Spark”

What is “Join Selection Strategy”? When “Any Type” of “Join”, like the “Left Join”, or, the “Inner Join” is “Performed” between “Two DataFrames”, “Apache Spark” “Internally” decides whic...

Writing PySpark logs in Apache Spark and Databricks

The closer your data product gets to production, the more important it becomes to collect and analyse logs properly. Logs help both when debugging in-depth issues and when analysing the behaviour of your application. For general Python applications the classical choice would be to use ...

Spark Performance Tuning: Spill

The spill problem occurs when an RDD (resilient distributed dataset, the fundamental data structure in Spark) is moved from RAM to disk and then back to RAM again. Simply put, this behavior occurs when a given data partition is too large to fit within the RAM of the executor. Spark w...

Introduction to “Partition” in “Apache Spark”

What is the “Importance” of “Partition”? “Apache Spark” is known for its “Speed”. The “Fast Speed” of “Computing” comes from the “Parallel Processing”. “Partition” is the “Key” for ...

Spark Tuning

Spark tuning is the process of configuring Apache Spark to maximize its effectiveness and efficiency for a given application or workflow. The main objective is to optimize the Spark configuration in order to...

The Small Island Man That Helped Spark a Revolution

You might be familiar with Secretary of Transportation Pete Buttigieg. This 41-year-old former mayor of South Bend was born into a relatively comfortable life with well-educated parents in Indiana. However, Pete’s father’s early life was quite contrasting. He was born in 1947 and grew up...

This Brilliantly Simple Wind Turbine Could Spark A Revolution

Renewable energy only accounts for around 11% of global energy production. This is a staggeringly low figure, considering that if we are to reach net-zero by 2050 and save the world, renewable energy needs to account for at least 60% of global energy production in less than seven years’ time. ...

Igniting the Philanthropic Spark: Engaging Young Professionals

When I was in my early 30s, I became the youngest Cleveland Sight Center Board member. As a new addition to the Development Committee, I noticed a concerning trend: our donor demographics skewed toward a much older population, with very few younger professionals donating or volunteering. This problem ...

What If Everything You Know About Reality Is Wrong?

My journey into the quantum world began with a simple question: how can something be in two places simultaneously? This seemingly nonsensical notion, central to the concept of superposition in quantum mechanics, piqued my curiosity and led me down a rabbit hole of exploration. Delving into books,...