Tidying Up the Framework of Dataset Shifts: The Example
<p>I recently wrote about the causes of model performance degradation, that is, when a model's prediction quality drops relative to the moment we trained and deployed it. <a href="https://medium.com/towards-data-science/tidying-up-the-framework-of-dataset-shifts-cd9f922637b7" rel="noopener">In that post</a>, I proposed a new way of thinking about the causes of model degradation. In that framework, the conditional probability P(Y|X) emerges as the global cause.</p>
<p>The conditional probability is, by definition, composed of three probabilities, which I call the specific causes. The most important lesson from this restructuring of concepts is that <em>covariate shift</em> and <em>conditional shift</em> are not separate or parallel concepts: <em>conditional shift</em> can happen as a function of <em>covariate shift</em>.</p>
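<p>Assuming the definition meant here is Bayes' rule, the conditional probability splits into the three specific causes: the class-conditional distribution P(X|Y), the label distribution P(Y), and the covariate distribution P(X). The block below is a standard statement of that rule, added for reference; the second identity (the law of total probability) shows that P(X) is itself tied to the other two components.</p>

```latex
P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)},
\qquad
P(X) = \sum_{y} P(X \mid Y = y)\,P(Y = y)
```

<p>Because P(X) appears in the denominator, a change in the covariate distribution enters P(Y|X) directly, which is one way to read the claim that conditional shift can happen as a function of covariate shift.</p>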
<p>With this restructuring, I believe it becomes easier to reason about the causes and to interpret the shifts we observe in our applications.</p>
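<p>To make the link between the components concrete, here is a minimal sketch with made-up numbers: a binary label Y and a discrete feature X taking the values "low" and "high". The function name <code>bayes_components</code> and all probabilities are hypothetical, chosen only for illustration. P(X|Y) is held fixed and only P(Y) changes between a reference period and a later one; because P(X) and P(Y|X) are both computed from the same components, the covariate shift in P(X) comes together with a change in P(Y|X).</p>

```python
# Minimal sketch with made-up numbers: binary label Y, discrete feature X.
# The component values below are hypothetical, chosen only for illustration.


def bayes_components(p_x_given_y, p_y):
    """Build P(X) and P(Y|X) from P(X|Y) and P(Y) using Bayes' rule."""
    labels = list(p_y)
    values = list(p_x_given_y[labels[0]])  # feature values X can take
    # Law of total probability: P(x) = sum over y of P(x|y) * P(y)
    p_x = {x: sum(p_x_given_y[y][x] * p_y[y] for y in labels) for x in values}
    # Bayes' rule: P(y|x) = P(x|y) * P(y) / P(x)
    p_y_given_x = {
        x: {y: p_x_given_y[y][x] * p_y[y] / p_x[x] for y in labels}
        for x in values
    }
    return p_x, p_y_given_x


# Class-conditional distributions P(X|Y), held fixed in this sketch.
P_X_GIVEN_Y = {
    0: {"low": 0.8, "high": 0.2},
    1: {"low": 0.3, "high": 0.7},
}

# Reference period (training) vs. a later period in which only P(Y) moved.
p_x_ref, post_ref = bayes_components(P_X_GIVEN_Y, {0: 0.5, 1: 0.5})
p_x_new, post_new = bayes_components(P_X_GIVEN_Y, {0: 0.2, 1: 0.8})

for x in ("low", "high"):
    print(
        f"P(X={x}): {p_x_ref[x]:.2f} -> {p_x_new[x]:.2f}; "
        f"P(Y=1|X={x}): {post_ref[x][1]:.2f} -> {post_new[x][1]:.2f}"
    )
```

<p>Running it prints <code>P(X=low): 0.55 -> 0.40; P(Y=1|X=low): 0.27 -> 0.60</code> and <code>P(X=high): 0.45 -> 0.60; P(Y=1|X=high): 0.78 -> 0.93</code>: the observed covariate shift and the conditional shift move together. Holding P(X|Y) fixed is only one scenario; in real data, several components can change at once.</p>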
<p>The following scheme connects these causes to the performance of machine learning models:</p>