<h1>Everything You Should Know About Evaluating Large Language Models</h1>
<p>As open source language models become more readily available, getting lost in all the options is easy.</p>
<p>How do we determine their performance and compare them? And how can we confidently say that one model is better than another?</p>
<p>This article provides some answers by presenting training and evaluation metrics, along with general and task-specific benchmarks, so you can get a clear picture of your model’s performance.</p>
<p>If you missed it, take a look at the first article in the Open Language Models series:</p>
<p><a href="https://towardsdatascience.com/a-gentle-introduction-to-open-source-large-language-models-3643f5ca774?source=post_page-----dce69ef8b2d2--------------------------------" rel="noopener follow" target="_blank">A Gentle Introduction to Open Source Large Language Models</a></p>
<h1>Perplexity</h1>
<p>Language models define a probability distribution over a vocabulary of words to select the most likely next word in a sequence. Given a text, a language model assigns a probability to every word in its vocabulary, and the most likely word is selected as the continuation.</p>
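<p>As a minimal sketch of this idea, here is a toy next-word distribution over a tiny, made-up vocabulary (the words and probabilities are illustrative, not from any real model), with greedy selection of the most likely word:</p>
<pre>
# Toy next-word distribution over a tiny vocabulary (made-up numbers).
# A real model would produce one probability per word in its vocabulary.
vocab_probs = {"mat": 0.55, "dog": 0.25, "moon": 0.15, "banana": 0.05}

# Greedy decoding: pick the word with the highest probability.
next_word = max(vocab_probs, key=vocab_probs.get)
print(next_word)  # mat
</pre>
<p>In practice, decoding strategies such as sampling or beam search are also used, but greedy selection is the simplest way to see what “most likely next word” means.</p>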
<p><strong>Perplexity</strong> measures how well a language model can predict the next word in a given sequence. As a training metric, it shows how well the model learned its training set.</p>
<p>We won’t go into the mathematical details, but intuitively, <strong>minimizing perplexity means maximizing the predicted probability.</strong></p>
<p>In other words, the best model is the one that is not <em>surprised</em> when it sees new text, because it’s expecting it — meaning it already predicted well which words come next in the sequence.</p>
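<p>To make the intuition concrete, here is a small sketch of how perplexity can be computed from the per-token probabilities a model assigns to a sequence: it is the exponential of the average negative log-probability. The probability values below are invented for illustration, not taken from a real model:</p>
<pre>
import math

def perplexity(token_probs):
    # Perplexity = exp of the average negative log-probability
    # the model assigned to each observed token in the sequence.
    n = len(token_probs)
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_prob)

# Hypothetical per-token probabilities for the same sequence:
confident = [0.9, 0.8, 0.95, 0.85]  # the model "expected" these tokens
uncertain = [0.2, 0.1, 0.3, 0.15]   # the model was "surprised"

print(perplexity(confident))  # low perplexity (close to 1, the minimum)
print(perplexity(uncertain))  # high perplexity
</pre>
<p>A model that assigns probability 1 to every observed token would reach the minimum perplexity of 1 — it is never surprised, which is exactly the intuition described above.</p>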