Building Real-time Machine Learning Foundations at Lyft

In early 2022, Lyft already had a comprehensive Machine Learning (ML) platform, LyftLearn, composed of <a href="https://eng.lyft.com/powering-millions-of-real-time-decisions-with-lyftlearn-serving-9bb1f73318dc" rel="noopener ugc nofollow" target="_blank">model serving</a>, <a href="https://eng.lyft.com/lyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb" rel="noopener ugc nofollow" target="_blank">training</a>, CI/CD, <a href="https://eng.lyft.com/ml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a" rel="noopener ugc nofollow" target="_blank">feature serving</a>, and <a href="https://eng.lyft.com/full-spectrum-ml-model-monitoring-at-lyft-a4cdaf828e8f" rel="noopener ugc nofollow" target="_blank">model monitoring</a> systems. On the real-time front, LyftLearn supported real-time inference and input feature validation. However, streaming data was not a first-class citizen in many of the platform’s systems, including training and complex monitoring. Several teams were already using streaming data in their ML workflows, but doing so was laborious, sometimes requiring weeks or months of engineering effort. At the same time, developers at Lyft had a substantial appetite for building real-time ML systems. <a href="https://medium.com/lyft-engineering/building-real-time-machine-learning-foundations-at-lyft-6dd99b385a4e">Read More</a>