<h1>Topic Modeling with Llama 2</h1>
<p>With the advent of <strong>Llama 2</strong>, running strong LLMs locally has become increasingly practical. Its accuracy approaches that of OpenAI’s GPT-3.5, which is sufficient for many use cases.</p>
<p>In this article, we will explore how we can use Llama 2 for topic modeling without needing to pass every single document to the model. Instead, we are going to leverage <a href="https://github.com/MaartenGr/BERTopic" rel="noopener ugc nofollow" target="_blank"><strong>BERTopic</strong></a>, a modular topic modeling technique that can use any LLM to fine-tune topic representations.</p>
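<p>To make the "fine-tune topic representations" idea concrete, the sketch below builds the kind of prompt one might send to an LLM such as Llama 2: it takes a topic's keywords and a few representative documents and asks the model for a short label. The function name and prompt wording are illustrative assumptions, not BERTopic's built-in prompt.</p>

```python
def build_topic_prompt(keywords, docs):
    """Build a prompt asking an LLM to label a topic.

    `keywords` are the topic's best-representing words; `docs` are a few
    representative documents. The exact wording here is a hypothetical
    example, not a fixed BERTopic template.
    """
    doc_lines = "\n".join(f"- {d}" for d in docs)
    return (
        "I have a topic described by the following keywords: "
        + ", ".join(keywords)
        + "\nHere are some sample documents from this topic:\n"
        + doc_lines
        + "\nGive a short, descriptive label for this topic."
    )

prompt = build_topic_prompt(
    ["cat", "mat", "mouse"],
    ["the cat sat on the mat", "a cat chased a mouse"],
)
```

<p>Because only the keywords and a handful of representative documents per topic are sent to the model, the LLM is called once per topic rather than once per document.</p>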
<p>BERTopic works in a rather straightforward way. It consists of five sequential steps:</p>
<ol>
<li>Embed the documents</li>
<li>Reduce the dimensionality of the embeddings</li>
<li>Cluster the reduced embeddings</li>
<li>Tokenize the documents per cluster</li>
<li>Extract the best-representing words per cluster</li>
</ol>
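<p>The last two steps can be sketched in plain Python. The toy below assumes documents have already been embedded, reduced, and clustered (steps 1–3); it tokenizes each cluster and scores words with a class-based TF-IDF weighting in the spirit of BERTopic's c-TF-IDF. It is an illustrative simplification (no stopword removal, whitespace tokenization), not BERTopic's actual implementation.</p>

```python
import math
from collections import Counter

def top_words_per_cluster(clusters, top_n=3):
    """Toy class-based TF-IDF over pre-clustered documents.

    clusters: dict mapping cluster id -> list of documents (strings).
    Returns dict mapping cluster id -> list of (word, score), best first.
    Illustrative sketch only; BERTopic's real c-TF-IDF differs in detail.
    """
    # Step 4: tokenize documents per cluster, treating each cluster
    # as one large document (no stopword removal, for brevity)
    cluster_counts = {
        cid: Counter(w for doc in docs for w in doc.lower().split())
        for cid, docs in clusters.items()
    }
    # Frequency of each word across all clusters
    total_freq = Counter()
    for counts in cluster_counts.values():
        total_freq.update(counts)
    # Average number of words per cluster
    avg_words = sum(sum(c.values()) for c in cluster_counts.values()) / len(cluster_counts)
    # Step 5: weight each word by term frequency within the cluster,
    # discounted by how common it is across all clusters
    topics = {}
    for cid, counts in cluster_counts.items():
        n_words = sum(counts.values())
        scores = {
            w: (tf / n_words) * math.log(1 + avg_words / total_freq[w])
            for w, tf in counts.items()
        }
        topics[cid] = sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
    return topics

clusters = {
    0: ["the cat sat on the mat", "a cat chased a mouse"],
    1: ["stocks fell as markets dipped", "markets rallied on stocks news"],
}
topics = top_words_per_cluster(clusters)
```

<p>Words frequent in one cluster but rare elsewhere score highest, which is what makes the extracted words characteristic of their topic rather than of the corpus as a whole.</p>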
<p><a href="https://towardsdatascience.com/topic-modeling-with-llama-2-85177d01e174"><strong>Learn More</strong></a></p>