Probabilistic ML with Quantile Matching: an Example with Python

When we train regression models, we obtain point predictions. In practice, however, we are often interested in the uncertainty associated with each prediction. To estimate it, we treat the value we are trying to predict as a random variable, and the goal becomes estimating its distribution.

There are many methods available to estimate uncertainty from predictions, such as variance estimation, Bayesian methods, conformal prediction, etc. Quantile regression is one of these well-known methods.

Quantile regression

Quantile regression consists of training one model for each quantile you are interested in. This is achieved with an asymmetric loss function known as the pinball loss. Quantile regression is simple, easy to understand, and readily available in high-performing libraries such as LightGBM. However, it presents some issues:

  • There is no guarantee that the order of the quantiles will be correct. For example, your prediction for the 50% quantile could be greater than the one you get for the 60% quantile, which is absurd.
  • To obtain an estimate of the entire distribution, you need to train many models. For instance, if you need an estimate at every percentile, you have to train 99 models.
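To make the pinball loss concrete, here is a minimal NumPy sketch (the data and grid search are illustrative, not part of any library API). It shows the key property that makes quantile regression work: minimizing the pinball loss at level q over a constant prediction recovers the empirical q-quantile of the data.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Asymmetric pinball (quantile) loss for quantile level q in (0, 1).

    Under-predictions are weighted by q, over-predictions by (1 - q),
    so the minimizer shifts toward the q-quantile of y_true.
    """
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Illustrative data: minimize the loss over constant predictions
# and compare the minimizer with the empirical quantile.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)

q = 0.9
candidates = np.linspace(-3, 3, 601)  # grid of constant predictions
losses = [pinball_loss(y, c, q) for c in candidates]
best = candidates[np.argmin(losses)]

print(best)               # minimizer of the pinball loss
print(np.quantile(y, q))  # empirical 90% quantile, for comparison
```

In a gradient-boosting library such as LightGBM, the same idea is exposed through the quantile objective (e.g. `objective="quantile"` with the quantile level as a parameter), and you would fit one such model per quantile, which is exactly where the issues listed above come from.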


Tags: ML Python