Building Real-time Machine Learning Foundations at Lyft
<p>In early 2022, Lyft already had a comprehensive Machine Learning Platform called LyftLearn composed of <a href="https://eng.lyft.com/powering-millions-of-real-time-decisions-with-lyftlearn-serving-9bb1f73318dc" rel="noopener ugc nofollow" target="_blank">model serving</a>, <a href="https://eng.lyft.com/lyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb" rel="noopener ugc nofollow" target="_blank">training</a>, CI/CD, <a href="https://eng.lyft.com/ml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a" rel="noopener ugc nofollow" target="_blank">feature serving</a>, and <a href="https://eng.lyft.com/full-spectrum-ml-model-monitoring-at-lyft-a4cdaf828e8f" rel="noopener ugc nofollow" target="_blank">model monitoring</a> systems.</p>
<p>On the real-time front, LyftLearn supported real-time inference and input feature validation. However, streaming data was not supported as a first-class citizen across many of the platform’s systems — such as training, complex monitoring, and others.</p>
<p>While several teams were using streaming data in their Machine Learning (ML) workflows, doing so was a laborious process, sometimes requiring weeks or months of engineering effort. At the same time, developers at Lyft had a substantial appetite for building real-time ML systems.</p>
<p><strong>Lyft is a real-time marketplace and many teams benefit from enhancing their machine learning models with real-time signals.</strong></p>
<p>To meet the needs of our customers, we kicked off the <code>Real-time Machine Learning with Streaming</code> initiative. Our goal was to develop foundations that would enable the hundreds of ML developers at Lyft to efficiently develop new models and enhance existing models with streaming data.</p>
<p>In this blog post, we will discuss some of what we built in support of that goal and the lessons we learned along the way.</p>
<h1><strong>Capabilities of Real-time Machine Learning</strong></h1>
<p>One of the first questions we asked ourselves was: what are the general use cases within the ML ecosystem that can leverage streaming data?</p>
<p>We identified three main capabilities of real-time ML applications which could leverage streaming:</p>
<ol>
<li><strong>Real-time Features</strong> → Computing features with real-time streaming data</li>
<li><strong>Real-time Learning </strong>→ Training models with real-time streaming data</li>
<li><strong>Event Driven Decisions </strong>→ Making decisions, e.g. retraining a model, triggering an alert, or running an inference call, with real-time streaming data</li>
</ol>
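<p>To make the distinction between the three capabilities concrete, here is a minimal Python sketch. It is not Lyft's implementation: it assumes a hypothetical in-memory stream of <code>Event</code> records, and every name in it (<code>WindowedCount</code>, <code>OnlineLinearModel</code>, <code>run</code>, plus the window length, learning rate, and alert threshold) is invented for illustration. For each event, it computes a windowed count as a real-time feature, takes one online gradient step on a linear model as real-time learning, and records an alert when the prediction error crosses a threshold as an event-driven decision.</p>

```python
# Illustrative sketch only: all names and parameter values are hypothetical.
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Event:
    """Hypothetical streaming event: timestamp in seconds plus an observed label."""
    ts: float
    label: float


class WindowedCount:
    """Real-time feature: number of events in the trailing `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.timestamps: deque = deque()

    def update(self, ts: float) -> int:
        # Assumes events arrive in timestamp order (no late or out-of-order data).
        self.timestamps.append(ts)
        cutoff = ts - self.window_s
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


class OnlineLinearModel:
    """Real-time learning: one SGD step on squared error per labeled event."""

    def __init__(self, lr: float) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def learn(self, x: float, y: float) -> float:
        # Predict first, then update with the gradient of 0.5 * (pred - y) ** 2.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err
        return err


def run(
    events: Iterable[Event],
    window_s: float = 60.0,
    lr: float = 0.01,
    alert_threshold: float = 5.0,
) -> Tuple[OnlineLinearModel, List[float]]:
    feature = WindowedCount(window_s)
    model = OnlineLinearModel(lr)
    alerts: List[float] = []
    for ev in events:
        x = feature.update(ev.ts)          # 1. real-time feature
        err = model.learn(x, ev.label)     # 2. real-time learning
        if abs(err) > alert_threshold:     # 3. event-driven decision
            alerts.append(ev.ts)
    return model, alerts
```

<p>Folding all three steps into one loop keeps the sketch short; it deliberately ignores concerns such as out-of-order events, state persistence, and model versioning that a production streaming system has to handle.</p>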