Intuitive Guide to Understanding KL Divergence
<h1>Concept Grounding</h1>
<p>First, let us establish some ground rules by defining a few things we need to know like the back of our hands before tackling KL divergence.</p>
<h2>What is a Distribution</h2>
<p>By "distribution" we can mean different things, such as data distributions or probability distributions; here we are interested in probability distributions. Imagine you draw two axes, <strong><em>X</em></strong> and <strong><em>Y</em></strong>, on a sheet of paper. I like to imagine a distribution as a thread dropped between those two axes. <strong><em>X</em></strong> represents the different values you are interested in obtaining probabilities for, and <strong><em>Y</em></strong> represents the probability of observing a given value on the <strong><em>X</em></strong> axis (that is, <strong><em>y=p(x)</em></strong>). This is visualized below.</p>
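<p>As a minimal sketch of this idea, here is a small (hypothetical) discrete distribution in Python: the array <code>x</code> plays the role of the <strong><em>X</em></strong> axis, and <code>p</code> holds the corresponding probabilities <strong><em>y=p(x)</em></strong>. The particular values are made up for illustration.</p>

```python
import numpy as np

# Hypothetical example: x holds the values we care about,
# and p[i] = p(x[i]) is the probability of observing x[i].
x = np.array([0, 1, 2, 3, 4])
p = np.array([0.05, 0.20, 0.50, 0.20, 0.05])

# A valid probability distribution is non-negative and sums to 1.
assert np.all(p >= 0)
assert np.isclose(p.sum(), 1.0)

# Each value on the X axis pairs with its probability y = p(x).
for xi, yi in zip(x, p):
    print(f"p({xi}) = {yi:.2f}")
```

<p>Plotting <code>p</code> against <code>x</code> would give exactly the "dropped thread" picture described above.</p>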
<p><a href="https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8"><strong>Original article</strong></a></p>