Intuitive Guide to Understanding KL Divergence
<h1>Concept Grounding</h1>
<p>First, let us establish some ground rules by defining a few things we need to know like the back of our hands before tackling KL divergence.</p>
<h2>What is a Distribution</h2>
<p>By "distribution" we can mean different things, such as data distributions or probability distributions; here we are interested in probability distributions. Imagine you draw two axes, <strong><em>X</em></strong> and <strong><em>Y</em></strong>, on a sheet of paper. I like to imagine a distribution as a thread dropped between those two axes. <strong><em>X</em></strong> represents the different values you are interested in obtaining probabilities for, and <strong><em>Y</em></strong> represents the probability of observing a given value on the <strong><em>X</em></strong> axis (that is, <strong><em>y=p(x)</em></strong>). This is visualized below.</p>
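<p>As a minimal sketch of this idea, here is a small (hypothetical) discrete distribution in Python: the array <code>x</code> plays the role of the <strong><em>X</em></strong> axis, and <code>p</code> holds the corresponding probabilities <strong><em>y=p(x)</em></strong>. The particular values are made up for illustration.</p>

```python
import numpy as np

# Hypothetical example: x holds the values we care about,
# and p[i] = p(x[i]) is the probability of observing x[i].
x = np.array([0, 1, 2, 3, 4])
p = np.array([0.05, 0.20, 0.50, 0.20, 0.05])

# A valid probability distribution is non-negative and sums to 1.
assert np.all(p >= 0)
assert np.isclose(p.sum(), 1.0)

# Each value on the X axis pairs with its probability y = p(x).
for xi, yi in zip(x, p):
    print(f"p({xi}) = {yi:.2f}")
```

<p>Plotting <code>p</code> against <code>x</code> would give exactly the "dropped thread" picture described above.</p>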
<p><a href="https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8"><strong>Original article</strong></a></p>