Different Types of Join Strategies in Apache Spark

<h1>What Is a &ldquo;Join Selection Strategy&rdquo;?</h1>
<ul>
<li>When <strong>any type</strong> of <strong>join</strong>, such as a <strong>left join</strong> or an <strong>inner join</strong>, is performed between <strong>two DataFrames</strong>, <strong>Apache Spark</strong> internally decides which <strong>algorithm</strong> to use to perform that <strong>join operation</strong>.</li>
<li>The particular <strong>algorithm</strong> that is responsible for <strong>planning</strong> the join operation between the two DataFrames is called the <strong>Join Selection Strategy</strong>.</li>
</ul>
<h1>Why Is Learning About &ldquo;Join Selection Strategies&rdquo; Important?</h1>
<ul>
<li>To <strong>optimize</strong> a <strong>Spark job</strong> that involves <strong>many joins</strong>, developers need to be aware of the <strong>internal algorithm</strong> that Apache Spark will choose to perform each join operation between two DataFrames.</li>
<li>Developers need to know about the <strong>Join Selection Strategies</strong> so that the <strong>wrong Join Selection Strategy</strong> is <strong>not used</strong> for a join operation between two DataFrames.</li>
<li>An <strong>incorrect Join Selection Strategy</strong> <strong>increases</strong> the <strong>execution time</strong> of the join operation and also makes the join a <strong>heavy operation</strong> on the <strong>executors</strong>. For example, joining a very large table with a tiny one through a sort-merge join pays for shuffling and sorting both sides, even though broadcasting the tiny table would have avoided the shuffle entirely.</li>
</ul>
<p><a href="https://oindrila-chakraborty88.medium.com/different-types-of-join-strategies-in-apache-spark-5c0066999d0d"><strong>Learn More</strong></a></p>
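<p>To make the selection step concrete, here is a minimal Python sketch of the decision order, loosely modelled on Spark's <code>JoinSelection</code> planner strategy for an inner join. It is not Spark's code: the <code>JoinSide</code> class, the <code>choose_join_strategy</code> function and the table sizes are hypothetical, and join hints, other join types (left, right, full), adaptive query execution and several planner checks are deliberately left out. Only the configuration defaults it mirrors (<code>spark.sql.autoBroadcastJoinThreshold</code> = 10 MB, <code>spark.sql.shuffle.partitions</code> = 200, <code>spark.sql.join.preferSortMergeJoin</code> = true) come from Spark.</p>

```python
from dataclasses import dataclass

# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024
# Default value of spark.sql.shuffle.partitions.
DEFAULT_SHUFFLE_PARTITIONS = 200


@dataclass
class JoinSide:
    """Hypothetical summary of one join input: a name and its estimated size."""
    name: str
    size_in_bytes: int


def choose_join_strategy(
    left: JoinSide,
    right: JoinSide,
    is_equi_join: bool,
    broadcast_threshold: int = DEFAULT_BROADCAST_THRESHOLD,
    prefer_sort_merge_join: bool = True,
    num_shuffle_partitions: int = DEFAULT_SHUFFLE_PARTITIONS,
) -> str:
    """Simplified, hypothetical model of how Spark picks an inner-join strategy.

    Assumptions: no join hints, an inner join, sortable join keys and no
    adaptive query execution. The real planner checks more conditions.
    """
    smaller = min(left, right, key=lambda side: side.size_in_bytes)
    larger = right if smaller is left else left
    # A threshold of -1 disables automatic broadcasting entirely.
    can_broadcast = 0 <= smaller.size_in_bytes <= broadcast_threshold

    if is_equi_join:
        # 1. Broadcast the small side to every executor: no shuffle needed.
        if can_broadcast:
            return f"BroadcastHashJoin (build side: {smaller.name})"
        # 2. Shuffle both sides, then hash the smaller side per partition.
        #    Only considered when sort-merge join is not preferred.
        fits_per_partition = (
            smaller.size_in_bytes < broadcast_threshold * num_shuffle_partitions
        )
        much_smaller = smaller.size_in_bytes * 3 <= larger.size_in_bytes
        if not prefer_sort_merge_join and fits_per_partition and much_smaller:
            return f"ShuffledHashJoin (build side: {smaller.name})"
        # 3. Shuffle and sort both sides, then merge them.
        return "SortMergeJoin"

    # Non-equi conditions (e.g. a.x < b.y) cannot be hashed or merged.
    if can_broadcast:
        return f"BroadcastNestedLoopJoin (build side: {smaller.name})"
    return "CartesianProduct"


orders = JoinSide("orders", 50 * 1024**3)       # ~50 GB (made-up size)
countries = JoinSide("countries", 2 * 1024**2)  # ~2 MB (made-up size)
customers = JoinSide("customers", 5 * 1024**3)  # ~5 GB (made-up size)

print(choose_join_strategy(orders, countries, is_equi_join=True))
# BroadcastHashJoin (build side: countries)
print(choose_join_strategy(orders, customers, is_equi_join=True))
# SortMergeJoin
print(choose_join_strategy(orders, countries, is_equi_join=False))
# BroadcastNestedLoopJoin (build side: countries)
```

<p>The ordering is the design point: the sketch tries the cheapest option first (broadcasting a small side avoids a shuffle) and falls back to a sort-merge join, which scales when both sides are large. On a real cluster, calling <code>explain()</code> on the joined DataFrame shows which join Spark actually chose in the physical plan.</p>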