Topic Modeling with Llama 2

<p>With the advent of <strong>Llama 2</strong>, running capable LLMs locally has become increasingly feasible. Its accuracy approaches that of OpenAI&rsquo;s GPT-3.5, which is good enough for many use cases.</p>
<p>In this article, we will explore how to use Llama 2 for topic modeling without passing every single document to the model. Instead, we will leverage <a href="https://github.com/MaartenGr/BERTopic" rel="noopener ugc nofollow" target="_blank"><strong>BERTopic</strong></a>, a modular topic modeling technique that can use any LLM to fine-tune topic representations.</p>
<p>BERTopic is rather straightforward. It consists of five sequential steps:</p>
<ol>
<li>Embedding documents</li>
<li>Reducing the dimensionality of the embeddings</li>
<li>Clustering the reduced embeddings</li>
<li>Tokenizing documents per cluster</li>
<li>Extracting the best-representing words per cluster</li>
</ol>
<p><img alt="The 5 main steps of BERTopic." src="https://miro.medium.com/v2/resize:fit:700/1*BY9n2IWgoFJ3uNnE4wN7cw.png" style="height:405px; width:700px" /></p>
<p>The 5 main steps of BERTopic.</p>
<p>However, with the rise of LLMs like <strong>Llama 2</strong>, we can do much better than a bag of independent words per topic. Passing all documents to Llama 2 directly and having it analyze them is computationally infeasible. We could employ vector databases for search, but we would not know in advance which topics to search for.</p>
<p><a href="https://towardsdatascience.com/topic-modeling-with-llama-2-85177d01e174"><strong>Read the full article</strong></a></p>
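The five steps above can be sketched end to end with lightweight stand-ins. Note the assumptions: TF-IDF vectors stand in for sentence-transformer embeddings, truncated SVD for UMAP, and k-means for HDBSCAN (the components BERTopic uses by default), and the final class-based TF-IDF weighting is a simplified variant of BERTopic's c-TF-IDF.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat", "dogs and cats are pets", "my cat chased a dog",
    "stocks fell on wall street", "the market rallied today", "investors sold stocks",
]

# 1. Embed documents (stand-in: TF-IDF vectors instead of sentence transformers).
embeddings = TfidfVectorizer().fit_transform(docs)

# 2. Reduce the dimensionality of the embeddings (stand-in: SVD instead of UMAP).
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(embeddings)

# 3. Cluster the reduced embeddings (stand-in: k-means instead of HDBSCAN).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# 4. Tokenize documents per cluster: concatenate each cluster into one "document".
cluster_docs = [" ".join(d for d, l in zip(docs, labels) if l == c) for c in (0, 1)]
cv = CountVectorizer(stop_words="english")
counts = cv.fit_transform(cluster_docs).toarray()

# 5. Extract the best-representing words per cluster with a class-based TF-IDF:
#    term frequency within the cluster, weighted by how rare the term is
#    across clusters (a simplified take on BERTopic's c-TF-IDF formula).
tf = counts / counts.sum(axis=1, keepdims=True)
idf = np.log(1 + counts.shape[0] / (1 + (counts > 0).sum(axis=0)))
ctfidf = tf * idf
vocab = cv.get_feature_names_out()
topics = [[vocab[i] for i in row.argsort()[::-1][:3]] for row in ctfidf]
print(topics)
```

Each inner list holds the three highest-scoring words for one cluster, which is exactly the kind of keyword-based topic representation that the next section argues an LLM can improve upon.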
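The key idea for going beyond independent keywords is that we never need to show Llama 2 every document: a handful of keywords plus a few representative documents per topic is enough context for the model to generate a human-readable label. The helper and prompt template below are illustrative, not BERTopic's exact internals.

```python
def build_topic_prompt(keywords, rep_docs):
    """Fold a topic's keywords and representative documents into a prompt
    asking Llama 2 for a short topic label. Illustrative template only."""
    doc_lines = "\n".join(f"- {d}" for d in rep_docs)
    return (
        "I have a topic described by the keywords: " + ", ".join(keywords) + ".\n"
        "The topic contains these sample documents:\n" + doc_lines + "\n"
        "Based on the keywords and documents, give a short label for this topic."
    )

prompt = build_topic_prompt(
    ["stocks", "market", "investors"],
    ["stocks fell on wall street", "the market rallied today"],
)
print(prompt)
```

In practice, this prompt would be sent to a locally running Llama 2 (BERTopic can wrap such a generator as a representation model), so the expensive LLM call happens once per topic rather than once per document.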
Tags: Llama 2, Topic Modeling