<h1>How GPT works: A Metaphoric Explanation of Key, Value, Query in Attention, using a Tale of Potion</h1>
<p>The backbone of ChatGPT is the GPT model, which is built on the <strong>Transformer</strong> architecture. The backbone of the Transformer, in turn, is the <strong>Attention</strong> mechanism. For many, the hardest concept to grok in Attention is <strong>Key, Value, and Query</strong>. In this post, I will use a potion analogy to internalize these concepts. Even if you already understand the maths of the Transformer mechanically, I hope that by the end of this post you will have a more intuitive, end-to-end understanding of the inner workings of GPT.</p>
<blockquote>
<p>This explanation requires no maths background. For the technically inclined, I add more technical explanations in [brackets]; you can safely skip those notes, as well as side notes in quote blocks like this one. Throughout my writing, I make up human-readable interpretations of the transformer model’s intermediary states to aid the explanation, but GPT doesn’t think exactly like that.</p>
<p>[When I talk about “attention”, I exclusively mean “self-attention”, as that is what’s behind GPT. But the same analogy explains the general concept of “attention” just as well.]</p>
</blockquote>
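<p>Before the tale begins, a peek at the actual computation for the technically inclined: below is a minimal NumPy sketch of single-head, unmasked scaled dot-product self-attention. The matrix names and sizes here are made up for illustration; real GPT models add causal masking, multiple attention heads, and far larger dimensions.</p>
<pre><code>import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head, unmasked scaled dot-product self-attention."""
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token can be matched on
    V = X @ W_v  # values: the content each token carries
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # query-key match scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted blend of values

# Toy numbers: 4 tokens with 8-dimensional embeddings (sizes are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
</code></pre>
<p>Keep these three projections in mind: Query, Key, and Value are exactly what the potion analogy will put faces on.</p>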
<h2>The Setup</h2>
<p>GPT can spew out paragraphs of coherent content because it does one task superbly well: “Given a text, what word comes next?” Let’s role-play GPT: <em>“Sarah lies still on the bed, feeling ____”</em>. Can you fill in the blank?</p>
<p>One reasonable answer, among many, is <em>“tired”</em>. In the rest of the post, I will unpack how GPT arrives at this answer. (For fun, I put this prompt in ChatGPT and it wrote a short <a href="https://chat.openai.com/share/169f2702-3811-4388-b3d4-67064903f4b2" rel="noopener ugc nofollow" target="_blank">story</a> out of it.)</p>
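<p>If you want to try this fill-in-the-blank task on a real model, here is a rough sketch using the Hugging Face transformers library with the small open-source GPT-2 model (an assumption purely for illustration; it is not the model behind ChatGPT). It prints the model’s top-scoring candidates for the next token.</p>
<pre><code>import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Small open-source GPT-2, used here only for illustration.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Sarah lies still on the bed, feeling"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position score every vocabulary token
# as a candidate for the next word.
top = torch.topk(logits[0, -1], k=5)
for token_id, score in zip(top.indices, top.values):
    print(f"{tokenizer.decode(int(token_id))!r}  (logit {score.item():.2f})")
</code></pre>
<p>Whatever the exact ranking on your run, the point stands: the model assigns a score to every token in its vocabulary, and generating text is just repeatedly picking the next word from that distribution.</p>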