Load Balancing: The Intuition Behind the Power of Two Random Choices

<p>In dynamic resource allocation and load balancing, one of the best-known and most fascinating algorithms is the so-called &ldquo;power of two random choices.&rdquo; It proposes a very simple change to random allocation: instead of sending each request to one server picked at random, pick two servers at random and send the request to the less loaded of the two. That small change yields an exponential improvement. When n requests are spread over n servers, the load on the busiest server drops from roughly log n / log log n to roughly log log n. I was lucky enough to implement this technique at a very large scale to improve the resource utilization of AWS Lambda, and yet for a long time I struggled to &ldquo;feel&rdquo; it intuitively.</p>
<p>In this article, I&rsquo;d like to share my mental model for understanding this algorithm, which also helps build a good intuition for many other advanced techniques in this area.</p>
<p><strong>Update:</strong> I have a <a href="https://medium.com/@mihsathe/load-balancing-a-very-counterintuitive-improvement-to-the-best-of-k-algorithm-608ffbdb7c8a" rel="noopener">follow-up article</a> on a very cool optimization to this technique. In addition to these articles, I&rsquo;ve started writing short-form content <a href="https://twitter.com/_mihir_sathe" rel="noopener ugc nofollow" target="_blank">on Twitter</a> for a faster feedback loop. Follow along if you&rsquo;d like to keep up.</p>
<h1>What is Load Balancing?</h1>
<p>Let&rsquo;s say you run a large-scale AI chatbot service like ChatGPT. You have thousands of servers with expensive GPUs generating text from user prompts. If your routing algorithm sent about 80% of incoming requests to only 20% of the servers, those servers would be overloaded, and requests hitting them would likely wait a long time for a GPU. Meanwhile, the other 80% of servers would sit mostly idle, wasting precious GPU hours. This is what we call bad load balancing.</p>
<p><a href="https://betterprogramming.pub/load-balancing-the-intuition-behind-the-power-of-two-random-choices-6de2e139ac2f"><strong>Read More</strong></a></p>
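<p>To make the &ldquo;exponential improvement&rdquo; of the power of two random choices concrete, here is a small, hypothetical simulation of the classic balls-into-bins setting. It is not the author&rsquo;s implementation or anything from AWS Lambda. Requests are assigned to servers either by picking one server uniformly at random, or by sampling two servers at random and choosing the less loaded one. The function names and parameters are illustrative assumptions, and every request is assumed to cost the same and never complete, which is the standard textbook simplification.</p>

```python
import random


def max_load_random(n_servers: int, n_requests: int, rng: random.Random) -> int:
    """One random choice: send each request to a uniformly random server.

    Hypothetical helper for illustration; returns the busiest server's load.
    """
    loads = [0] * n_servers
    for _ in range(n_requests):
        loads[rng.randrange(n_servers)] += 1
    return max(loads)


def max_load_two_choices(n_servers: int, n_requests: int, rng: random.Random) -> int:
    """Two random choices: sample two servers, send the request to the less loaded one.

    Hypothetical helper for illustration. The two samples are drawn independently
    (with replacement), as in the classic analysis, so they may occasionally be
    the same server. Returns the busiest server's load.
    """
    loads = [0] * n_servers
    for _ in range(n_requests):
        a = rng.randrange(n_servers)
        b = rng.randrange(n_servers)
        # Ties go to the first sample; any tie-breaking rule works here.
        target = a if loads[a] <= loads[b] else b
        loads[target] += 1
    return max(loads)


if __name__ == "__main__":
    n = 100_000  # n requests spread over n servers
    print("one random choice:  max load =", max_load_random(n, n, random.Random(42)))
    print("two random choices: max load =", max_load_two_choices(n, n, random.Random(42)))
```

<p>With one random choice, the busiest server&rsquo;s load grows roughly like log n / log log n; with two choices it grows roughly like log log n, so the two-choice maximum should stay noticeably smaller as n increases. Sampling two servers costs only one extra load lookup per request, which is why the technique is so attractive in practice.</p>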