The Secret Sauce behind 100K context window in LLMs: all tricks in one place
<blockquote>
<p><strong>tldr;</strong> techniques for speeding up training and inference of LLMs so they can use a large context window of up to 100K input tokens: ALiBi positional embedding, Sparse Attention, FlashAttention, Multi-Query attention, Conditional computation, and 80GB A100 GPUs.</p>
</blockquote>
<p>Recently, there have been several announcements of new Large Language Models (LLMs) that can consume an extremely large context window, such as <strong>65K tokens </strong>(<a href="https://www.mosaicml.com/blog/mpt-7b" rel="noopener ugc nofollow" target="_blank">MPT-7B-StoryWriter-65k+</a> by MosaicML) or <strong>even 100K tokens</strong> (<a href="https://www.anthropic.com/index/100k-context-windows" rel="noopener ugc nofollow" target="_blank">Introducing 100K Context Windows</a> by Anthropic). In the Palm-2 <a href="https://ai.google/static/documents/palm2techreport.pdf" rel="noopener ugc nofollow" target="_blank">technical report</a>, Google doesn’t reveal the context size but mentions that they “<em>increase the context length of the model significantly</em>.”</p>
<p>For comparison, the current GPT-4 model <a href="https://help.openai.com/en/articles/7127966-what-is-the-difference-between-the-gpt-4-models" rel="noopener ugc nofollow" target="_blank">can work</a> with a context length of <strong>32K input tokens</strong>, and most open-source LLMs have a context length of <strong>2K tokens</strong>.</p>
<p>That’s impressive, since having such a large context length means <strong>the prompt can literally be the size of a book</strong>. The Great Gatsby is about 72K tokens and 210 pages, or roughly 6 hours of reading at a pace of 1.7 min/page. So the model can scan and retain a book’s worth of “custom” information while processing queries!</p>
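<p>As a quick sanity check on the back-of-envelope numbers above (the token count, page count, and reading speed are the rough figures quoted in the text, not measurements):</p>

```python
# Rough figures for The Great Gatsby, as quoted above
tokens = 72_000
pages = 210
minutes_per_page = 1.7

reading_hours = pages * minutes_per_page / 60
tokens_per_page = tokens / pages

print(f"{reading_hours:.1f} hours of reading")   # ~6 hours
print(f"~{tokens_per_page:.0f} tokens per page") # ~343 tokens/page
```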
<p>I was trying to wrap my head around how that is technically possible, so in this blog post I’ve collected scattered pieces of information (this <a href="https://twitter.com/finbarrtimbers/status/1656758500868112384?s=52&t=pAgoD9yNOpl6mP7UZRSX6Q" rel="noopener ugc nofollow" target="_blank">thread</a> was the first clue) and cover the following:</p>
<p><a href="https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c"><strong>Read More</strong></a></p>