Sequential A/B Testing Keeps the World Streaming Netflix Part 1: Continuous Data

<p>Can you spot any difference between the two data streams below? Each observation is the time interval between a Netflix member hitting the play button and playback commencing, known as <em>play-delay</em>. These observations come from a particular type of A/B test that Netflix runs, called a software canary or regression-driven experiment. More on that below. For now, what&rsquo;s important is that we want to <strong>quickly</strong> and <strong>confidently</strong> identify any difference in the distribution of play-delay, or conclude that, within some tolerance, there is no difference.</p>
<p>In this blog post, we will develop a statistical procedure to do just that, and describe the impact of these developments at Netflix. The key idea is to switch from a &ldquo;fixed time horizon&rdquo; to an &ldquo;any-time valid&rdquo; framing of the problem, one in which conclusions remain valid no matter when, or how often, we look at the data.</p>