Natural Language Processing For Absolute Beginners

<p>It is mostly true that NLP (Natural Language Processing) is a complex area of computer science. Frameworks like spaCy or NLTK are large and take some time to learn. But with the help of open-source large language models (LLMs) and modern Python libraries, many tasks can be solved much more easily. Even better, results that only a few years ago appeared only in scientific papers can now be achieved with just 10 lines of Python code.</p>
<p>Without further ado, let&rsquo;s get into it.</p>
<h2>1. Language Translation</h2>
<p>Have you ever wondered how Google Translate works? Google&nbsp;<a href="https://blog.research.google/2020/06/recent-advances-in-google-translate.html" rel="noopener ugc nofollow" target="_blank">uses</a>&nbsp;a deep learning model trained on a vast amount of text. Now, with the help of the&nbsp;<a href="https://huggingface.co/docs/transformers/index" rel="noopener ugc nofollow" target="_blank">Transformers library</a>, this can be done not only in Google&rsquo;s labs but also on an ordinary PC. In this example, I will use a pre-trained&nbsp;<a href="https://huggingface.co/t5-base" rel="noopener ugc nofollow" target="_blank">T5-base</a>&nbsp;(Text-to-Text Transfer Transformer) model. This model was first trained on raw text data, then fine-tuned on source-target pairs like (&ldquo;translate English to German: The house is wonderful.&rdquo;, &ldquo;Das Haus ist wunderbar.&rdquo;). Here, &ldquo;translate English to German&rdquo; is a prefix that &ldquo;tells&rdquo; the model which task to perform, and the sentence pairs are the actual content the model learns from.</p>
<blockquote> <p><strong>Important warning</strong>. Large language models are, quite literally, large. The&nbsp;<em>T5ForConditionalGeneration</em>&nbsp;class used in this example will automatically download the &ldquo;t5-base&rdquo; model, which is about 900 MB in size, the first time it is loaded. Before running the code, make sure that there is enough disk space and that your internet traffic is not limited.</p> </blockquote>
<p>A pre-trained T5 model can be used in Python.</p>
<p><a href="https://towardsdatascience.com/natural-language-processing-for-absolute-beginners-a195549a3164">Website</a></p>
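<p>The translation steps described above (load the pre-trained model, add the task prefix, generate, decode) can be sketched roughly as follows. This is a minimal sketch under my own assumptions, not the original article's code: it uses the <em>T5Tokenizer</em> and <em>T5ForConditionalGeneration</em> classes from Hugging Face Transformers, assumes that <code>transformers</code>, <code>torch</code> and <code>sentencepiece</code> are installed, and the helper names <code>build_t5_prompt</code>, <code>load_t5</code> and <code>translate</code> are hypothetical. Keep in mind that T5 was fine-tuned on only a few translation directions (English to German, French and Romanian), so other prefixes are unlikely to work well.</p>

```python
# Minimal sketch (assumption, not the article's original code): English-to-German
# translation with the pre-trained "t5-base" model from Hugging Face Transformers.
# Assumes: pip install transformers torch sentencepiece
# The first call to load_t5() downloads roughly 900 MB of model files.
from functools import lru_cache


def build_t5_prompt(text: str, source: str = "English", target: str = "German") -> str:
    """Hypothetical helper: prepend the task prefix that tells T5 what to do."""
    return f"translate {source} to {target}: {text}"


@lru_cache(maxsize=None)
def load_t5(model_name: str = "t5-base"):
    """Hypothetical helper: load tokenizer and model once and reuse them."""
    # Imported lazily so build_t5_prompt() works without the heavy dependencies.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def translate(text: str, source: str = "English", target: str = "German",
              model_name: str = "t5-base") -> str:
    """Hypothetical helper: translate one sentence with a pre-trained T5 model."""
    tokenizer, model = load_t5(model_name)
    # Tokenize the prefixed sentence into a PyTorch tensor of input IDs.
    input_ids = tokenizer(build_t5_prompt(text, source, target),
                          return_tensors="pt").input_ids
    # Generate the output token IDs, then decode them back into plain text.
    output_ids = model.generate(input_ids, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(translate("The house is wonderful."))
```

<p>Running the script should print the model&rsquo;s German translation of the sample sentence; the exact wording may vary between model versions and generation settings. Loading the model through a cached helper avoids reloading 900 MB of weights for every sentence, and importing Transformers lazily keeps the prompt-building step usable on its own.</p>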