Tidying Up the Framework of Dataset Shifts: The Example

I recently wrote about the causes of model performance degradation, that is, the drop in prediction quality relative to the moment we trained and deployed our models. In that post, I proposed a new way of thinking about these causes. In that framework, the so-called conditional probability comes out as the global cause.

The conditional probability is, by definition, composed of three probabilities, which I call the specific causes. The most important lesson from this restructuring of concepts is that covariate shift and conditional shift are not two separate or parallel concepts: conditional shift can happen as a function of covariate shift.
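To make the "three probabilities" concrete, here is a sketch of the decomposition, assuming it refers to Bayes' theorem with X as the input features and Y as the target (the symbols X and Y are my notation here, not taken from the text above):

```latex
% Assumption: the "three probabilities" are the terms of Bayes' theorem.
% X = input features (covariates), Y = target variable.
P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}
```

Read this way, a change in P(X), which is what covariate shift means, appears inside the expression for P(Y | X). That is why a conditional shift can arise as a function of a covariate shift rather than alongside it.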

With this restructuring, I believe it becomes easier to think about the causes and it becomes more logical to interpret the shifts that we observe in our applications.

This is the scheme of causes of model performance degradation for machine learning models:
