The Secret Sauce behind 100K context window in LLMs: all tricks in one place

> **tl;dr:** Techniques that speed up training and inference of LLMs so they can use a large context window of up to 100K input tokens: ALiBi positional embedding, Sparse Attention, FlashAttention, Multi-Query Attention, Conditional Computation, and 80GB A100 GPUs.

Recently there were several announcements about new Large Language Models (LLMs) that can consume an extremely large context window, such as **65K tokens** ([MPT-7B-StoryWriter-65k+](https://www.mosaicml.com/blog/mpt-7b) by MosaicML) or even **100K tokens** ([Introducing 100K Context Windows](https://www.anthropic.com/index/100k-context-windows) by Anthropic). In the PaLM 2 [technical report](https://ai.google/static/documents/palm2techreport.pdf), Google doesn't reveal the context size but mentions that they "*increase the context length of the model significantly*."

For comparison, the current GPT-4 model [can work](https://help.openai.com/en/articles/7127966-what-is-the-difference-between-the-gpt-4-models) with a context length of **32K input tokens**, while most open-source LLMs have a context length of **2K tokens**.

That's impressive, because such a large context length means **the prompt can literally be the size of a book**. The Great Gatsby is roughly 72K tokens, 210 pages, and 6 hours of reading at a pace of 1.7 minutes per page. So the model can scan and keep this amount of "custom" information in context while processing queries!

I was trying to wrap my head around how that is technically possible, so in this blog post I collect scattered pieces of information (this [thread](https://twitter.com/finbarrtimbers/status/1656758500868112384?s=52&t=pAgoD9yNOpl6mP7UZRSX6Q) was the first clue) and cover the following:

[**Read More**](https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c)
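As a taste of one of the techniques named in the tl;dr, here is a minimal NumPy sketch of the idea behind ALiBi (Attention with Linear Biases): instead of adding positional embeddings to the token embeddings, a penalty proportional to the query-key distance is added directly to the attention scores, with a fixed slope per head. The function names, slope schedule, and toy shapes below are my own illustration, not code from the full article.

```python
import numpy as np

def alibi_slopes(n_heads):
    # Head-specific slopes from the ALiBi paper: a geometric sequence
    # 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads (power-of-two head counts).
    return np.array([2 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])

def alibi_attention(q, k, v):
    # q, k, v: (n_heads, seq_len, head_dim). Standard scaled dot-product
    # attention, but with a linear distance penalty added to the scores
    # instead of any positional embedding.
    n_heads, seq_len, head_dim = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)             # (H, L, L)

    # ALiBi bias: -slope * (i - j) for query i attending to past key j.
    dist = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i <= 0
    scores = scores + alibi_slopes(n_heads)[:, None, None] * dist     # (H, L, L)

    # Causal mask: a query cannot attend to future keys.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Softmax over keys, then weighted sum of values.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 4 heads, 8 tokens, 16-dim heads.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8, 16)) for _ in range(3))
print(alibi_attention(q, k, v).shape)  # (4, 8, 16)
```

Because the bias depends only on the relative distance between positions, nothing in this formulation is tied to the sequence length seen during training, which is what allows ALiBi-based models to extrapolate to much longer contexts at inference time.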
Tags: LLMs window