Understanding Transformers: A Step-by-Step Math Example — Part 1

<p>The transformer architecture can seem intimidating, and you may already have come across explanations of it on YouTube or in other blogs. In this post, I will try to make it clearer by working through a complete numerical example, step by step.</p>
<p>Shoutout to&nbsp;<a href="https://www.youtube.com/@HeduMathematicsofIntelligence" rel="noopener ugc nofollow" target="_blank">HeduAI</a>&nbsp;for the clear explanations that helped sharpen my own understanding!</p>
<p><strong>Let&rsquo;s get started!</strong></p>
<h1>Inputs and Positional Encoding</h1>
<p>Let&rsquo;s begin with the first part, where we define our inputs and calculate the positional encoding for them.</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*eBg0WY6510NaFwP94G9Zog.png" style="height:445px; width:700px" /></p>
<h2>Step 1 (Defining the data)</h2>
<p>The first step is to define our&nbsp;<strong>dataset (corpus)</strong>.</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*SlziEl8zomWrZbraMkWA3A.png" style="height:138px; width:700px" /></p>
<p>Our dataset contains&nbsp;<strong>3 sentences (lines of dialogue)</strong>&nbsp;taken from the&nbsp;<strong>Game of Thrones</strong>&nbsp;TV show. The dataset is deliberately small: its size keeps the numbers in the upcoming equations manageable, so every result can be followed step by step.</p>
<h2>Step 2 (Finding the Vocab Size)</h2>
<p>To determine the vocabulary size, we count the number of unique words in our dataset. This count is needed for encoding (i.e., converting the words into numbers), because each unique word is assigned its own number.</p>
<p><a href="https://blog.gopenai.com/understanding-transformers-a-step-by-step-math-example-part-1-a7809015150a"><strong>Click Here</strong></a></p>
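<p>As a rough illustration of Step 2, here is a minimal Python sketch of counting unique words and assigning each one an integer. The three sentences in <code>corpus</code> are hypothetical placeholders, not the Game of Thrones lines from the dataset image, and the names <code>build_vocab</code> and <code>word_to_id</code> are made up for this example. It also assumes words are lowercased and split on whitespace, which is only one of several possible tokenization choices.</p>

```python
# Hypothetical placeholder corpus; NOT the article's actual dataset.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
]


def build_vocab(sentences):
    """Return the sorted list of unique words.

    Assumption: words are lowercased and separated by whitespace.
    """
    words = set()
    for sentence in sentences:
        words.update(sentence.lower().split())
    return sorted(words)


vocab = build_vocab(corpus)

# The vocabulary size is the number of unique words in the corpus.
vocab_size = len(vocab)

# Encoding: map each unique word to an integer index.
word_to_id = {word: index for index, word in enumerate(vocab)}

# Each sentence becomes a list of integers.
encoded = [[word_to_id[w] for w in s.lower().split()] for s in corpus]

print("vocab size:", vocab_size)
print("encoded corpus:", encoded)
```

<p>Sorting the vocabulary before numbering is a design choice made here so that the word-to-number mapping is the same on every run; any fixed ordering would work equally well.</p>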