Monitoring Machine Learning Models in Production: Why and How?

<p>Machine Learning (ML) model development often takes time and requires technical expertise. As data science enthusiasts, when we acquire a dataset to explore and analyze, we eagerly train and validate it using diverse&nbsp;<a href="https://medium.com/geekculture/whats-the-public-sentiment-under-inflationary-economic-environment-5edc899efa29" rel="noopener">state-of-the-art models</a>&nbsp;or employing&nbsp;<a href="https://medium.com/mlearning-ai/how-to-apply-data-centric-ai-mindset-to-text-classification-problems-b41656c70c16" rel="noopener">data-centric strategies</a>. It feels incredibly fulfilling when we optimize the model&rsquo;s performance as if all the tasks have been accomplished.</p> <p>However, after deploying the model to production, there are plenty of reasons that contribute to lower model performance or degradation.</p> <p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/0*0bV_DzPbXJsjS1sE" style="height:394px; width:700px" /></p> <p>Photo by&nbsp;<a href="https://unsplash.com/@adriendlf?utm_source=medium&amp;utm_medium=referral" rel="noopener ugc nofollow" target="_blank">Adrien Delforge</a>&nbsp;on&nbsp;<a href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral" rel="noopener ugc nofollow" target="_blank">Unsplash</a></p> <p><strong>#1 The training data is generated through simulation</strong></p> <p>Data scientists often&nbsp;<a href="https://www.cio.com/article/465251/how-can-cios-protect-personal-identifiable-information-pii-for-a-new-class-of-data-consumers.html" rel="noopener ugc nofollow" target="_blank">face limitations</a>&nbsp;in accessing the production data, which results in training the model using simulated or sample data instead. While data engineers bear the responsibility of ensuring the representativeness of the training data in terms of scale and complexity, the training data still deviates to some extent from the production data. There is also a risk of systematic flaws in upstream data processing, such as data collection and labeling. These factors can impact the extraction of additional useful input features or hinder the model&rsquo;s ability to generalize well.</p> <p><a href="https://towardsdatascience.com/monitoring-machine-learning-models-in-production-why-and-how-13d07a5ff0c6"><strong>Website</strong></a></p>