Intuitive Guide to Understanding KL Divergence

<h1>Concept Grounding</h1> <p>First, let us establish some ground rules. We will define a few things we need to know like the back of our hands in order to understand KL divergence.</p> <h2>What Is a Distribution?</h2> <p>By "distribution" we can refer to different things, such as data distributions or probability distributions; here we are interested in probability distributions. Imagine you draw two axes (<strong><em>X</em></strong> and <strong><em>Y</em></strong>) on a sheet of paper. I like to picture a distribution as a thread dropped between those two axes. <strong><em>X</em></strong> represents the different values you are interested in obtaining probabilities for, and <strong><em>Y</em></strong> represents the probability of observing some value on the <strong><em>X</em></strong> axis (that is, <strong><em>y = p(x)</em></strong>). This is visualized below.</p> <p><a href="https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8"><strong>Website</strong></a></p>
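<p>To make the idea of <em>y = p(x)</em> concrete, here is a minimal sketch of a discrete probability distribution in Python. The specific values and probabilities are invented for illustration; the one property every probability distribution must satisfy is that the probabilities are non-negative and sum to 1.</p>

```python
import numpy as np

# Hypothetical values we are interested in (the X axis)
x = np.array([0, 1, 2, 3])

# The probability of observing each value, y = p(x) (the Y axis).
# These numbers are made up for illustration.
p = np.array([0.1, 0.4, 0.3, 0.2])

# A valid probability distribution is non-negative and sums to 1.
assert np.all(p >= 0)
assert np.isclose(p.sum(), 1.0)

print(dict(zip(x.tolist(), p.tolist())))
```

<p>Plotting <em>p</em> against <em>x</em> would give the "thread" shape described above: one probability height for each value on the <em>X</em> axis.</p>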
Tags: KL Divergence