Topic Modeling with Llama 2
<p>With the advent of <strong>Llama 2</strong>, running capable LLMs locally is increasingly feasible. Its accuracy approaches that of OpenAI’s GPT-3.5, which is sufficient for many use cases.</p>
<p>In this article, we will explore how we can use Llama 2 for topic modeling without the need to pass every single document to the model. Instead, we are going to leverage <a href="https://github.com/MaartenGr/BERTopic" rel="noopener ugc nofollow" target="_blank"><strong>BERTopic</strong></a>, a modular topic modeling technique that can use any LLM for fine-tuning topic representations.</p>
<p>BERTopic’s workflow is rather straightforward. It consists of five sequential steps:</p>
<ol>
<li>Embedding documents</li>
<li>Reducing the dimensionality of the embeddings</li>
<li>Clustering the reduced embeddings</li>
<li>Tokenizing documents per cluster</li>
<li>Extracting the best-representing words per cluster</li>
</ol>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*BY9n2IWgoFJ3uNnE4wN7cw.png" style="height:405px; width:700px" /></p>
<p>The 5 main steps of BERTopic.</p>
<p>However, with the rise of LLMs like <strong>Llama 2</strong>, we can do much better than a handful of independent words per topic. At the same time, it is computationally infeasible to pass every document to Llama 2 directly and have it analyze them, and while we could search a vector database instead, we would not know in advance which topics to search for. The trick is to have BERTopic find the topics first and then pass only each topic’s keywords and a few representative documents to the LLM.</p>
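<p>This is roughly how a topic is handed to the LLM: a prompt template is filled with the topic’s keywords and a handful of representative documents. The <code>[DOCUMENTS]</code> and <code>[KEYWORDS]</code> placeholders follow the convention used by BERTopic’s representation models; the topic data below is made up for illustration, and the assembled prompt would then be sent to Llama 2.</p>

```python
# Sketch: build an LLM prompt from one topic's keywords and a few
# representative documents, instead of passing every document.
prompt_template = """I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords: [KEYWORDS]

Based on the information above, give a short label for this topic."""

# Hypothetical topic output from BERTopic, for illustration only.
keywords = ["meat", "beef", "eat", "eating", "chicken"]
representative_docs = [
    "I do not eat red meat but do eat chicken.",
    "Eating beef every day seems unhealthy.",
]

# Fill the placeholders with this topic's data.
prompt = prompt_template.replace(
    "[DOCUMENTS]", "\n".join(f"- {d}" for d in representative_docs)
).replace("[KEYWORDS]", ", ".join(keywords))

print(prompt)
```

<p>Because each topic contributes only a short prompt like this, the LLM is called once per topic rather than once per document, which is what makes local models like Llama 2 practical here.</p>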
<p><a href="https://towardsdatascience.com/topic-modeling-with-llama-2-85177d01e174"><strong>Read the full article on Towards Data Science</strong></a></p>