Everything You Should Know About Evaluating Large Language Models

As open source language models become more readily available, it is easy to get lost in all the options. How do we measure their performance and compare them? And how can we confidently say that one model is better than another?

This article provides some answers by presenting training and evaluation metrics, along with general and task-specific benchmarks, so you can get a clear picture of your model's performance.

If you missed it, take a look at the first article in the Open Language Models series: [A Gentle Introduction to Open Source Large Language Models](https://towardsdatascience.com/a-gentle-introduction-to-open-source-large-language-models-3643f5ca774).

## Perplexity

Language models define a probability distribution over a vocabulary in order to select the most likely next word in a sequence. Given a text, the model assigns a probability to each word in the vocabulary, and the most likely one is selected.

**Perplexity** measures how well a language model can predict the next word in a given sequence. As a training metric, it shows how well the model has learned its training set.

We won't go into the mathematical details, but intuitively, **minimizing perplexity means maximizing the predicted probability.**

In other words, the best model is the one that is not *surprised* when it sees new text, because it is expecting it: it has already predicted well which words come next in the sequence.
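To make the intuition concrete, here is a minimal sketch of how perplexity can be computed in practice, assuming the Hugging Face `transformers` library and a small causal language model such as GPT-2 (the model choice here is just for illustration). The model's loss is the average negative log-likelihood of the tokens in the text, and perplexity is its exponential.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice: any causal language model from the Hub would work the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels provided, the model returns the average cross-entropy
    # (negative log-likelihood) over the predicted tokens.
    outputs = model(**inputs, labels=inputs["input_ids"])

# Perplexity is the exponential of the average negative log-likelihood:
# the lower it is, the higher the probability the model assigned to the text.
perplexity = torch.exp(outputs.loss)
print(f"Perplexity: {perplexity.item():.2f}")
```

As a rough intuition, a perplexity of 20 means the model is about as uncertain at each step as if it had to pick uniformly among 20 equally likely words; lower values mean the model predicts the text more confidently.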