The Secret Sauce behind 100K context window in LLMs: all tricks in one place
<blockquote>
<p><strong>tldr;</strong> techniques for speeding up training and inference of LLMs so they can use a large context window of up to 100K input tokens: ALiBi positional embedding, Sparse Attention, FlashAttention, Multi-Query attention, Conditional computation, and 80GB A100 GPUs.</p>
</blockquote>
<p>Recently, there have been several announcements of new Large Language Models (LLMs) that can consume an extremely large context window, such as <strong>65K tokens </strong>(<a href="https://www.mosaicml.com/blog/mpt-7b" rel="noopener ugc nofollow" target="_blank">MPT-7B-StoryWriter-65k+</a> by MosaicML) or <strong>even 100K tokens</strong> (<a href="https://www.anthropic.com/index/100k-context-windows" rel="noopener ugc nofollow" target="_blank">Introducing 100K Context Windows</a> by Anthropic). In the Palm-2 <a href="https://ai.google/static/documents/palm2techreport.pdf" rel="noopener ugc nofollow" target="_blank">technical report</a>, Google doesn’t reveal the context size but mentions that they “<em>increase the context length of the model significantly</em>.”</p>
<p>For comparison, the current GPT-4 model <a href="https://help.openai.com/en/articles/7127966-what-is-the-difference-between-the-gpt-4-models" rel="noopener ugc nofollow" target="_blank">can work</a> with a context length of <strong>32K input tokens</strong>, and most open-source LLMs have a context length of <strong>2K tokens</strong>.</p>
<p>That’s impressive, since having such a large context length means <strong>the prompt can literally be the size of a book</strong>. The Great Gatsby is about 72K tokens and 210 pages, or roughly 6 hours of reading at a pace of 1.7 min/page. So the model can scan and retain a book’s worth of “custom” information while processing queries!</p>
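<p>As a quick sanity check on the back-of-envelope numbers above (the token count, page count, and reading speed are the rough figures quoted in the text, not measurements):</p>

```python
# Rough figures for The Great Gatsby, as quoted above
tokens = 72_000
pages = 210
minutes_per_page = 1.7

reading_hours = pages * minutes_per_page / 60
tokens_per_page = tokens / pages

print(f"{reading_hours:.1f} hours of reading")   # ~6 hours
print(f"~{tokens_per_page:.0f} tokens per page") # ~343 tokens/page
```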
<p>I was trying to wrap my head around how that is technically possible, so in this blog post I’ve collected scattered pieces of information (this <a href="https://twitter.com/finbarrtimbers/status/1656758500868112384?s=52&t=pAgoD9yNOpl6mP7UZRSX6Q" rel="noopener ugc nofollow" target="_blank">thread</a> was the first clue) and cover the following:</p>
<p><a href="https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c"><strong>Read More</strong></a></p>