Inside GPT — I: Understanding text generation

Regularly engaging with colleagues across diverse domains, I enjoy the challenge of conveying machine learning concepts to people with little to no background in data science. Here, I attempt to explain how GPT is wired in simple terms, only this time in written form.

Behind ChatGPT's popular magic, there is an unpopular logic. You write a prompt and ChatGPT generates text that, whether or not it is accurate, resembles a human answer. How is it able to understand your prompt and generate coherent, comprehensible answers?

**Transformer Neural Networks.** This is the architecture designed to process vast amounts of unstructured data, in our case text. By architecture, we essentially mean a series of mathematical operations carried out in several layers in parallel. Through this system of equations, several innovations were introduced that helped us overcome long-standing challenges of text generation, challenges we were still struggling with as recently as five years ago.

If GPT has already been around for five years (indeed, the first GPT paper was published in 2018), isn't GPT old news? Why has it become immensely popular only recently? And what is the difference between GPT-1, 2, 3, 3.5 (ChatGPT), and 4?

All GPT versions were built on the same architecture; however, each subsequent model contained more parameters and was trained on larger text datasets. Later GPT releases also introduced other novelties, especially in the training process, such as reinforcement learning from human feedback, which we will explain in the third part of this blog series. (The full article is available on [Towards Data Science](https://towardsdatascience.com/inside-gpt-i-1e8840ca8093).)
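
To make the idea of text generation a little more concrete before the deeper dive in the next parts, below is a minimal sketch of the autoregressive loop at the heart of every GPT model: the model scores all possible next tokens, one token is picked, it is appended to the input, and the process repeats. This is illustrative code rather than anything from the article; it assumes PyTorch and the Hugging Face `transformers` library are installed, and it uses the small public `gpt2` checkpoint with simple greedy decoding.

```python
# A minimal sketch of autoregressive (next-token) text generation.
# Illustrative only, not code from the article: assumes PyTorch and the
# Hugging Face "transformers" library, and uses the small "gpt2" checkpoint.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Behind ChatGPT's popular magic, there is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")  # prompt -> token ids

with torch.no_grad():
    for _ in range(20):                      # generate 20 new tokens, one at a time
        logits = model(input_ids).logits     # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()     # greedy choice: the most probable next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tokenizer.decode(input_ids[0]))
```

In practice, GPT models typically sample from the next-token probability distribution (with temperature, top-k, or nucleus sampling) instead of always taking the single most likely token, which is part of why answers vary between runs.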