How GPT Works: A Metaphoric Explanation of Key, Value, Query in Attention, Using a Tale of Potion

<p>The backbone of ChatGPT is the GPT model, which is built on the <strong>Transformer</strong> architecture. The backbone of the Transformer, in turn, is the <strong>Attention</strong> mechanism. For many people, the hardest concepts to grok in Attention are <strong>Key, Value, and Query</strong>. In this post, I will use a potion analogy to help you internalize these concepts. Even if you already understand the maths of the Transformer mechanically, I hope that by the end of this post you will have a more intuitive understanding of the inner workings of GPT from end to end.</p> <blockquote> <p>This explanation requires no maths background. For the technically inclined, I add more technical explanations in [&hellip;]. You can safely skip notes in [brackets] and side notes in quote blocks like this one. Throughout the post, I make up some human-readable interpretations of the intermediate states of the Transformer model to aid the explanation, but GPT doesn&rsquo;t think exactly like that.</p> <p>[When I talk about &ldquo;attention&rdquo;, I exclusively mean &ldquo;self-attention&rdquo;, as that is what&rsquo;s behind GPT. But the same analogy explains the general concept of &ldquo;attention&rdquo; just as well.]</p> </blockquote> <h2>The Set Up</h2> <p>GPT can spew out paragraphs of coherent content because it does one task superbly well: &ldquo;Given a text, what word comes next?&rdquo; Let&rsquo;s role-play GPT: <em>&ldquo;Sarah lies still on the bed, feeling ____&rdquo;.</em> Can you fill in the blank?</p> <p>One reasonable answer, among many, is <em>&ldquo;tired&rdquo;</em>. In the rest of the post, I will unpack how GPT arrives at this answer. (For fun, I put this prompt into ChatGPT and it wrote a short <a href="" rel="noopener ugc nofollow" target="_blank">story</a> out of it.)</p>