There are several ways a Spark job can be optimized, and choosing the right optimization is crucial to reducing overall runtime and compute cost.
In my project, I was tasked with optimizing our Spark jobs and reducing the overall runtime of long-running Airflow DAGs. In this blog, I am going to discuss a few Spark optimizations I used, along with the changes I made to cluster configurations, to reduce runtime and compute cost.

1. JDBC connection:
One of the long-running DAGs was taking about 1 hour. In the Spark code, a JDBC connection was used to fetch data from a table. Using the right parameters and configurations when fetching data over a JDBC connection is important for reading the data faster: by default, Spark pulls the entire table through a single connection on one executor, so the read is not parallelized at all.
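As a rough illustration, here is a minimal PySpark sketch of a partitioned JDBC read. The connection URL, table name, credentials, and partition column are hypothetical placeholders; only the option names themselves are standard Spark JDBC options.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-read-example").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales_db")  # hypothetical URL
    .option("dbtable", "orders")                                # hypothetical table
    .option("user", "etl_user")                                 # hypothetical credentials
    .option("password", "****")
    # Without the next four options, Spark reads the whole table over a
    # single connection. With them, Spark issues numPartitions parallel
    # queries, each scanning one slice of partitionColumn's value range.
    .option("partitionColumn", "order_id")  # must be numeric, date, or timestamp
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "16")
    # fetchsize controls how many rows the JDBC driver fetches per round trip
    .option("fetchsize", "10000")
    .load()
)
```

Note that lowerBound and upperBound only decide how the partition ranges are split; they do not filter rows, so values outside the bounds still land in the first or last partition. Picking bounds close to the column's actual min and max keeps the partitions evenly sized.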