<h1>How GPT works: A Metaphoric Explanation of Key, Value, Query in Attention, using a Tale of Potion</h1>
<p>The backbone of ChatGPT is the GPT model, which is built on the <strong>Transformer</strong> architecture. The backbone of the Transformer, in turn, is the <strong>Attention</strong> mechanism. For many, the hardest concept to grok in Attention is <strong>Key, Value, and Query</strong>. In this post, I will use a potion analogy to internalize these concepts. Even if you already understand the maths of the Transformer mechanically, I hope that by the end of this post you will have a more intuitive, end-to-end understanding of the inner workings of GPT.</p>
<blockquote>
<p>This explanation requires no maths background. For the technically inclined, I add more technical explanations in [brackets]; you can safely skip those notes, as well as side notes in quote blocks like this one. Throughout my writing, I make up human-readable interpretations of the transformer model’s intermediary states to aid the explanation, but GPT doesn’t think exactly like that.</p>
<p>[When I talk about “attention”, I exclusively mean “self-attention”, as that is what’s behind GPT. But the same analogy explains the general concept of “attention” just as well.]</p>
</blockquote>
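<p>Before the tale begins, a peek at the actual computation for the technically inclined: below is a minimal NumPy sketch of single-head, unmasked scaled dot-product self-attention. The matrix names and sizes here are made up for illustration; real GPT models add causal masking, multiple attention heads, and far larger dimensions.</p>
<pre><code>import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head, unmasked scaled dot-product self-attention."""
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token can be matched on
    V = X @ W_v  # values: the content each token carries
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # query-key match scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted blend of values

# Toy numbers: 4 tokens with 8-dimensional embeddings (sizes are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
</code></pre>
<p>Keep these three projections in mind: Query, Key, and Value are exactly what the potion analogy will put faces on.</p>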
<h2>The Setup</h2>
<p>GPT can spew out paragraphs of coherent content because it does one task superbly well: “Given a text, what word comes next?” Let’s role-play GPT: <em>“Sarah lies still on the bed, feeling ____”</em>. Can you fill in the blank?</p>
<p>One reasonable answer, among many, is <em>“tired”</em>. In the rest of the post, I will unpack how GPT arrives at this answer. (For fun, I put this prompt in ChatGPT and it wrote a short <a href="https://chat.openai.com/share/169f2702-3811-4388-b3d4-67064903f4b2" rel="noopener ugc nofollow" target="_blank">story</a> out of it.)</p>
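<p>If you want to try this fill-in-the-blank task on a real model, here is a rough sketch using the Hugging Face transformers library with the small open-source GPT-2 model (an assumption purely for illustration; it is not the model behind ChatGPT). It prints the model’s top-scoring candidates for the next token.</p>
<pre><code>import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Small open-source GPT-2, used here only for illustration.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Sarah lies still on the bed, feeling"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position score every vocabulary token
# as a candidate for the next word.
top = torch.topk(logits[0, -1], k=5)
for token_id, score in zip(top.indices, top.values):
    print(f"{tokenizer.decode(int(token_id))!r}  (logit {score.item():.2f})")
</code></pre>
<p>Whatever the exact ranking on your run, the point stands: the model assigns a score to every token in its vocabulary, and generating text is just repeatedly picking the next word from that distribution.</p>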