Forget GPT-4's 32K: LongNet Has a Billion-Token Context
<p>On 19th July 2023, Microsoft published a paper widely regarded as a major step toward architectures for large language models with practically unlimited context length. Microsoft proposed a transformer variant, LongNet, that can theoretically scale to a billion tokens, removing a major obstacle to the practical use of large language models: the context length restriction.</p>
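<p>To see why a billion-token context is such a leap, note that vanilla self-attention compares every token with every other token, so compute and memory grow quadratically with sequence length. The back-of-envelope sketch below is an illustrative calculation, not something taken from the paper:</p>
<pre><code># Vanilla self-attention builds an n x n score matrix per head per layer,
# so the number of pairwise attention entries grows quadratically with n.
for n in (32_000, 1_000_000, 1_000_000_000):
    entries = n * n
    print(f"{n:>13,} tokens -> {entries:.2e} attention entries")
</code></pre>
<p>At 32K tokens that is already on the order of a billion entries; at a billion tokens it would be 10^18, which is why a fundamentally different attention pattern is needed to go further.</p>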
<p>In this article, we will walk through —</p>
<ol>
<li>Large Language Models (LLMs)</li>
<li>Remember Me! Context Matters</li>
<li>How to Achieve a Larger Context</li>
<li>Current Networks For LLMs</li>
<li>Difficulty of Scaling</li>
<li>Microsoft’s Solution: LongNet</li>
<li>Distributed Trainer</li>
<li>Results and Verification of Scaling to 1B Tokens</li>
<li>Closing Thoughts</li>
</ol>
<p>So, let's get started.</p>
<h1>Large Language Models (LLMs)</h1>
<p>Large Language Models are deep learning models with millions, if not billions, of parameters. They are generally trained on general-purpose text corpora scraped from the internet, which may contain upwards of a trillion tokens (in other words, if a piece of text exists on the internet, it has likely been used to train a large language model).</p>
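<p>As a concrete illustration of what a “token” is, the short sketch below uses OpenAI's open-source tiktoken tokenizer (chosen here purely as an example; every LLM family has its own tokenizer, and context limits are counted in these tokens, not in characters or words):</p>
<pre><code># Minimal tokenization example using the tiktoken library (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models
text = "Large Language Models are trained on trillions of tokens."
ids = enc.encode(text)

print(ids)                 # the integer IDs a model actually consumes
print(len(ids), "tokens")  # context-length limits count these IDs
</code></pre>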