Forget GPT-4's 32K: LongNet Has a Billion-Token Context

<p>On 19 July 2023, Microsoft published a paper that is widely seen as a major step toward architectures for large language models with practically unlimited context length. Microsoft proposed and developed LongNet, a Transformer variant that can theoretically scale to a billion tokens. This removes a major obstacle to the practical use of large language models: the context-length restriction.</p>

<p>In this article, we will walk through:</p>

<ol>
<li>Large Language Models (LLMs)</li>
<li>Remember me! context matters</li>
<li>How to Achieve a Larger Context</li>
<li>Current Networks For LLMs</li>
<li>Difficulty of Scaling</li>
<li>Microsoft&rsquo;s solution LongNet</li>
<li>Distributed Trainer</li>
<li>Results and Verification of Scaling to 1B Tokens</li>
<li>Closing Thoughts</li>
</ol>

<p>So, let&#39;s get started.</p>

<h1>Large Language Models (LLMs)</h1>

<p>Large Language Models are deep neural networks with millions, if not billions, of parameters. They are generally trained on a &ldquo;general text&rdquo; corpus drawn from the internet. Such a corpus may contain up to a trillion tokens (in other words, if text exists on the internet, it has likely been used to train a large language model). A short tokenization sketch below shows what a &ldquo;token&rdquo; looks like in practice.</p>

<p><a href="https://pub.towardsai.net/longnet-a-billion-token-context-a6470f33e844">Website</a></p>
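<p>To make the notion of a &ldquo;token&rdquo; concrete, here is a minimal tokenization sketch. It uses OpenAI&rsquo;s <code>tiktoken</code> library with the GPT-2 encoding purely as an illustrative assumption; the LongNet paper does not prescribe any particular tokenizer.</p>

<pre><code class="language-python">
# Minimal tokenization sketch (illustrative assumption: tiktoken with the GPT-2 encoding).
import tiktoken

enc = tiktoken.get_encoding("gpt2")

text = "LongNet scales Transformers to a billion tokens."
token_ids = enc.encode(text)

print(len(token_ids))                         # number of tokens the model actually "sees"
print(token_ids)                              # the integer IDs fed to the model
print([enc.decode([t]) for t in token_ids])   # the text pieces behind each ID
</code></pre>

<p>Counting in tokens rather than characters is what context-length limits such as GPT-4&rsquo;s 32K refer to, and it is the same unit in which LongNet&rsquo;s billion-token claim is measured.</p>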