Tidying Up the Framework of Dataset Shifts: The Example

<p>I recently wrote about the causes of model performance degradation, that is, the drop in a model's prediction quality relative to the time it was trained and deployed.&nbsp;<a href="https://medium.com/towards-data-science/tidying-up-the-framework-of-dataset-shifts-cd9f922637b7" rel="noopener">In this other post</a>, I proposed a new way of thinking about the causes of model degradation. In that framework, the so-called conditional probability emerges as the global cause.</p> <p>The conditional probability is, by definition, composed of three probabilities, which I call the specific causes. The key takeaway of this restructuring of concepts is that&nbsp;<em>covariate shift</em>&nbsp;and&nbsp;<em>conditional shift</em>&nbsp;are not two separate, parallel concepts:&nbsp;<em>conditional shift</em>&nbsp;can occur as a function of&nbsp;<em>covariate shift</em>.</p> <p>With this restructuring, I believe it becomes easier to reason about the causes and more natural to interpret the shifts we observe in our applications.</p> <p>This is the scheme of causes and model performance for machine learning models:</p> <p><a href="https://towardsdatascience.com/tidying-up-the-framework-of-dataset-shifts-the-example-77807ee952f5"><strong>Visit Now</strong></a></p>
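<p>To make the distinction concrete, here is a minimal sketch with an illustrative discrete distribution (the numbers and variable names are my own assumptions, not from the article): under pure covariate shift only the marginal P(X) moves and P(Y | X) stays fixed, whereas under conditional shift P(Y | X) itself changes.</p>

```python
import numpy as np

# Hypothetical joint distribution P(X, Y) at training time.
# Rows index a binary covariate X, columns a binary label Y.
p_joint_train = np.array([[0.32, 0.08],   # X = 0
                          [0.12, 0.48]])  # X = 1

def conditional(p_joint):
    """P(Y | X): divide each row of the joint by the marginal P(X)."""
    p_x = p_joint.sum(axis=1, keepdims=True)
    return p_joint / p_x

p_y_given_x = conditional(p_joint_train)   # [[0.8, 0.2], [0.2, 0.8]]

# Pure covariate shift: P(X) changes, P(Y | X) is unchanged,
# so the shifted joint factorizes as P_new(X) * P(Y | X).
p_x_new = np.array([[0.7], [0.3]])         # shifted marginal P(X)
p_joint_cov_shift = p_x_new * p_y_given_x

# The conditional recovered from the shifted joint is identical.
assert np.allclose(conditional(p_joint_cov_shift), p_y_given_x)

# Conditional shift: P(Y | X) itself changes, here for X = 1 only.
p_y_given_x_shifted = p_y_given_x.copy()
p_y_given_x_shifted[1] = [0.5, 0.5]
assert not np.allclose(p_y_given_x_shifted, p_y_given_x)
```

<p>The point of the sketch is the asymmetry: replacing P(X) leaves the conditional untouched, while any edit to a row of P(Y | X) is, by construction, a conditional shift, regardless of what the marginal does.</p>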