Probability Calibration And Isotonic Regression


What is this Isotonic Regression

Isotonic Regression (IR) is a special case of a degree 0 spline (aka piece-wise function), such that it is monotonically increasing.

Mathematicaly it tries to fit a weighted least-squares via Quadratic Programming.

Basically, the degree 0 spline needs to be increasing, as illustrated via the green line below.

Application 1: Probability Calibration

In the classification setting, we’d often like to obtain an accurate probability of a class. A well tuned probability score is the foundation for correctly assessing a situation. For instance, in advertising, we want to know the exact probability that an user clicks on an ad, in order to figure out how much an impression is worth.

However, many classifiers, such as SVM, output a score instead of a probability. Unlike logistic regression, the output of the classifier can’t be directly interpreted as probability. For instance, an svm score indicates how far the sample is from the hyperplane; a random forest classifier rarely gives extremely scores because bagging averages the predictions.

Isotonic Regression comes in as function that maps the scores to probabilities. In the below diagram, we can see that IR transforms the green curve into the red curve, which is generalizes the scores to probabilties.

Note: We chose Isotonic Regression instead of a normal spline because we want the probability function of the score to be strictly increasing.

Application 2: Non-metric Multi Dimensional Scaling

First, let’s go over what non-metric and Multi-Dimensional Scaling (MDS) mean respectively.

MDS is a powerful visualization technique that maps high dimensional data into the 2D space. In modern MDS, we typically construct an item-item similarity matrix from high dimensions.

  site 1 site 2 site 3
site 1 1 0.8 0.4
site 2 0.8 1 0.6
site 3 0.4 0.6 1

Then we learn via SGD low dimensional points such that the similarity between items in low dimensions approximates the similarity in high dimensions.

Non-metric on the other hand means that the quatitative variable at hand (e.g. spiciness: 0, 1, 2, 3, 4) is not scaled properly. The difference between spiciness lv1 and lv2 might be 20 times the difference between lv2 and lv3. Since the data is ordinal, we care more about the ranks instead of distance metrics. So the similarity of items in high dimensions loses its meaning.

Isotonic Regression comes handy here by mapping these ordinal data into a monotonically increasing function, which preserves order. Essentially, we construct the item-item similarity matrix from the isotonic Regression function instead of metrics like Euclidean distance.

Conclusion

Isotonic Regression is a special case of splines. It is typically solved via quadratic programming and has some niche applications because of the monotonically increasing constraint.