Quantile Regression: Principles, Examples, and Python Implementation

Summary

Quantile regression is a supervised learning technique that estimates the conditional quantiles of a target variable, going beyond the simple mean. Unlike ordinary linear regression, it provides a more complete view of the conditional distribution of the target. It is particularly useful for building prediction intervals, analyzing heteroscedasticity, and modeling risk.


Mathematical principle

Quantile regression was introduced by Roger Koenker and Gilbert Bassett in 1978. Its fundamental idea is to replace the minimization of mean squared error with the minimization of an asymmetric loss function, called pinball loss or check loss.

The conditional quantile

In classical linear regression, we model:

$$
\mathbb{E}[Y \mid X=x]
$$

that is, the conditional mean of $Y$ given $X=x$.

In quantile regression, we instead seek to model:

$$
Q_\tau(Y \mid X=x)
$$

where $Q_\tau$ denotes the quantile of order $\tau$, with $\tau \in (0,1)$.

For example:

  • $\tau = 0.5$ corresponds to the conditional median;
  • $\tau = 0.25$ corresponds to the first quartile;
  • $\tau = 0.75$ corresponds to the third quartile.
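
As a quick illustration of what these orders mean, the sketch below computes ordinary (unconditional) sample quantiles with NumPy. The array `y` is a toy sample invented for this example. Quantile regression estimates the same kind of quantity, but conditionally on $X$.

```python
import numpy as np

# Toy sample: the integers 0..100 (purely illustrative)
y = np.arange(0, 101)

# First quartile, median, and third quartile of the sample
q25, q50, q75 = np.quantile(y, [0.25, 0.50, 0.75])
print(q25, q50, q75)
```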

The pinball loss function

For a quantile of order $\tau$, the loss function is defined by:

$$
\rho_\tau(u) = u \, (\tau - \mathbf{1}_{u < 0})
$$

where:

  • $u = y_i - \hat{y}_i$ is the residual;
  • $y_i$ is the observed value;
  • $\hat{y}_i$ is the prediction;
  • $\mathbf{1}_{u < 0}$ is the indicator function, equal to 1 if $u < 0$ and 0 otherwise.

This function can also be written in piecewise form:

$$
\rho_\tau(u) =
\begin{cases}
\tau u & \text{if } u \ge 0 \\
(\tau - 1)u & \text{if } u < 0
\end{cases}
$$

The asymmetry of this loss is the central point of the model:

  • for $\tau = 0.5$, the loss becomes symmetric and corresponds to minimizing absolute deviation;
  • for $\tau = 0.75$, underestimates are penalized more than overestimates;
  • for $\tau = 0.25$, it is the opposite: overestimates are penalized more than underestimates.
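
The asymmetry described above can be checked numerically. The following is a minimal sketch of the piecewise definition; the helper name `pinball_loss` is chosen for this example and is not a library function. It compares the cost of an underestimate ($u = +1$) and an overestimate ($u = -1$) of the same size.

```python
import numpy as np

def pinball_loss(u, tau):
    """Pinball (check) loss for residuals u = y - y_hat at quantile level tau."""
    u = np.asarray(u, dtype=float)
    # tau * u when u >= 0, (tau - 1) * u when u < 0
    return np.where(u >= 0, tau * u, (tau - 1) * u)

# Underestimate by 1 (u = +1) versus overestimate by 1 (u = -1)
for tau in (0.25, 0.5, 0.75):
    under = float(pinball_loss(1.0, tau))
    over = float(pinball_loss(-1.0, tau))
    print(f"tau={tau}: underestimate costs {under:.2f}, overestimate costs {over:.2f}")
```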

Problem formulation

In its linear form, quantile regression seeks a parameter vector $\beta_\tau$ such that:

$$
\hat{\beta}_\tau = \arg\min_\beta \sum_{i=1}^{n} \rho_\tau(y_i - x_i^\top \beta)
$$

In other words, we fit a line, or more generally a hyperplane, not to minimize mean squared error, but to target a particular level of the distribution of the target variable.
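
To see why minimizing this sum targets a quantile, consider the simplest case with no features, where the model is a single constant $c$. The sketch below scans the observed values as candidate constants and keeps the one with the lowest total pinball loss. The helper names `total_pinball` and `best_constant` are hypothetical, and the sample is a toy one. On this sample, the minimizer coincides with the corresponding sample quantile.

```python
import numpy as np

def total_pinball(y, c, tau):
    """Sum of pinball losses when every observation is predicted by the constant c."""
    u = y - c
    return np.sum(np.where(u >= 0, tau * u, (tau - 1) * u))

def best_constant(y, tau):
    """Brute-force search over the observed values for the loss-minimizing constant."""
    y = np.asarray(y, dtype=float)
    losses = [total_pinball(y, c, tau) for c in y]
    return y[int(np.argmin(losses))]

y = np.arange(1, 10)  # toy sample: 1, 2, ..., 9
for tau in (0.25, 0.5, 0.75):
    print(f"tau={tau}: best constant = {best_constant(y, tau)}")
```

A real solver does not enumerate candidates like this; the brute-force search is only meant to make the link between the loss and the quantile visible.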

Relationship with the median and the mean

This point is fundamental:

  • Ordinary linear regression estimates the conditional mean;
  • Quantile regression at $\tau = 0.5$ estimates the conditional median.

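Since the median is more robust to extreme values than the mean, median regression is typically less affected by outliers in the target than ordinary least squares. Fitting several quantiles also describes non-constant variance, which a single mean fit cannot show.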
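
The robustness argument can be illustrated without any model. In this small sketch, with two toy arrays invented for the example, a single extreme value moves the mean substantially while the median stays put.

```python
import numpy as np

y_clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # last value replaced by an outlier

print("mean:  ", y_clean.mean(), "->", y_outlier.mean())
print("median:", np.median(y_clean), "->", np.median(y_outlier))
```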


Intuition

Imagine you are trying to predict the price of an apartment based on its area.

Classical linear regression will give you the expected average price for 50 m², 80 m², or 120 m². But this information is incomplete: two apartments of the same size can have very different prices depending on the neighborhood, floor, condition of the property, or the view.

Quantile regression allows you to go further:

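  • the 0.25 quantile estimates the price below which about a quarter of comparable apartments fall (the more affordable end);
  • the 0.50 quantile gives the median price;
  • the 0.75 quantile estimates the price below which about three quarters of comparable apartments fall (the more expensive end).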

You thus obtain a prediction band rather than a single point. This allows you to answer richer questions:

  • “What is the typical price?”
  • “In what range do 50% of observations fall?”
  • “What is the plausible upper bound?”

This is particularly useful when the dispersion increases with the explanatory variable. For example, larger apartments often have much more dispersed prices than studios. Quantile regression naturally captures this phenomenon of heteroscedasticity.


Python Implementation

There are two common approaches with scikit-learn:

  1. QuantileRegressor (available since scikit-learn 1.0): for linear quantile regression;
  2. HistGradientBoostingRegressor(loss="quantile") (quantile loss available since scikit-learn 1.1): for non-linear quantile regression.

Installing dependencies

pip install scikit-learn numpy matplotlib

Example 1 — Linear quantile regression with QuantileRegressor

Generating heteroscedastic data

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import QuantileRegressor, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_pinball_loss, mean_absolute_error

# Reproducibility
rng = np.random.RandomState(42)

# Synthetic data
n_samples = 400
X = rng.uniform(0, 10, size=n_samples).reshape(-1, 1)

# Heteroscedastic noise: the larger X, the greater the variance
noise = rng.normal(loc=0, scale=0.8 + 0.35 * X.ravel(), size=n_samples)
y = 3 + 2.0 * X.ravel() + noise

# Train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

Training multiple quantiles

quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
models = {}

for q in quantiles:
    model = QuantileRegressor(
        quantile=q,
        alpha=0.0,       # no regularization initially
        solver="highs"
    )
    model.fit(X_train, y_train)
    models[q] = model

# Classical linear regression for comparison
ols = LinearRegression()
ols.fit(X_train, y_train)

Visualizing prediction bands

x_grid = np.linspace(X.min(), X.max(), 300).reshape(-1, 1)

pred_quantiles = {q: models[q].predict(x_grid) for q in quantiles}
pred_ols = ols.predict(x_grid)

plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, alpha=0.35, s=18, label="Training data")

# Wide 10%-90% band
plt.fill_between(
    x_grid.ravel(),
    pred_quantiles[0.1],
    pred_quantiles[0.9],
    alpha=0.15,
    label="10%-90% interval"
)

# Central 25%-75% band
plt.fill_between(
    x_grid.ravel(),
    pred_quantiles[0.25],
    pred_quantiles[0.75],
    alpha=0.25,
    label="25%-75% interval"
)

# Quantile curves
for q in quantiles:
    plt.plot(x_grid, pred_quantiles[q], linestyle="--", linewidth=2, label=f"Quantile {q:.2f}")

# OLS mean
plt.plot(x_grid, pred_ols, linewidth=2.5, label="Linear regression (mean)")

plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear quantile regression")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

Evaluation on the test set

# Evaluate the median quantile
y_pred_median = models[0.5].predict(X_test)

print("MAE on quantile 0.5 (median):",
      round(mean_absolute_error(y_test, y_pred_median), 4))

# Evaluate each quantile with pinball loss
for q in [0.1, 0.5, 0.9]:
    y_pred = models[q].predict(X_test)
    loss = mean_pinball_loss(y_test, y_pred, alpha=q)
    print(f"Pinball loss for quantile {q:.1f}: {loss:.4f}")

Interpretation

In this plot:

  • the OLS line models the mean;
  • the quantile lines model different parts of the distribution;
  • the gap between the low and high quantiles increases with $X$, revealing increasing variance.

Example 2 — Non-linear quantile regression with HistGradientBoostingRegressor

When the relationship between $X$ and $y$ is non-linear, linear quantile regression becomes insufficient. A more flexible model can then be used.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(42)

# Non-linear data
X = np.linspace(0, 10, 500).reshape(-1, 1)
y_true = np.sin(X).ravel() * 3 + X.ravel()

# Heteroscedastic Gaussian noise: symmetric, with a spread that grows with X
noise = rng.normal(0, 0.8 + 0.2 * X.ravel(), size=X.shape[0])
y = y_true + noise

quantiles = [0.1, 0.5, 0.9]
preds = {}

for q in quantiles:
    model = HistGradientBoostingRegressor(
        loss="quantile",
        quantile=q,
        max_iter=300,
        max_depth=4,
        learning_rate=0.05,
        random_state=42
    )
    model.fit(X, y)
    preds[q] = model.predict(X)

plt.figure(figsize=(10, 6))
plt.scatter(X, y, s=10, alpha=0.25, label="Observations")
plt.plot(X, preds[0.5], linewidth=2.5, label="Quantile 0.5")
plt.fill_between(
    X.ravel(),
    preds[0.1],
    preds[0.9],
    alpha=0.2,
    label="10%-90% interval"
)

plt.xlabel("X")
plt.ylabel("y")
plt.title("Non-linear quantile regression with gradient boosting")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

When to choose this approach?

Quantile gradient boosting is often preferable when:

  • the relationship between variables is non-linear;
  • interactions between variables are important;
  • the goal is primarily good predictive performance rather than purely linear interpretation.

Hyperparameters

For QuantileRegressor

| Hyperparameter | Typical values | Description |
| --- | --- | --- |
| quantile | between 0 and 1 | Target quantile to estimate |
| alpha | 0.0 to 1.0 | L1 regularization on coefficients |
| fit_intercept | True / False | Adds an intercept |
| solver | highs | Linear optimization solver |

For HistGradientBoostingRegressor

| Hyperparameter | Typical values | Description |
| --- | --- | --- |
| loss | "quantile" | Enables quantile regression |
| quantile | between 0 and 1 | Target quantile |
| max_iter | 100 to 500 | Number of boosting iterations |
| max_depth | 3 to 8 | Maximum tree depth |
| learning_rate | 0.01 to 0.1 | Learning rate |
| min_samples_leaf | 10 to 50 | Minimum leaf size |

Practical recommendations

  • Start with quantiles 0.1 / 0.5 / 0.9 to get a simple view of the distribution.
  • Use alpha=0 initially with QuantileRegressor, then add regularization if the model becomes unstable.
  • Visually check if the quantile bands cross. This phenomenon, called quantile crossing, can occur when models are trained separately.
  • For complex data, test the boosting version before manually adding polynomial terms to a linear regression.
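
Quantile crossing can also be detected programmatically rather than only by eye. The sketch below uses a small hand-written array of hypothetical predictions, one column per quantile. It flags rows where the predictions are not in increasing order, then applies one common post-processing heuristic: sorting each row. This rearrangement step is not part of the scikit-learn estimators above.

```python
import numpy as np

# Hypothetical predictions: one row per sample, one column per quantile (0.1, 0.5, 0.9)
preds = np.array([
    [1.0, 2.0, 3.0],   # consistent ordering
    [2.5, 2.0, 4.0],   # 0.1 prediction above the 0.5 prediction: crossing
    [0.5, 1.5, 1.2],   # 0.9 prediction below the 0.5 prediction: crossing
])

# A row crosses if its predictions decrease somewhere along the quantile axis
crossing = np.any(np.diff(preds, axis=1) < 0, axis=1)
print("Rows with crossing:", np.where(crossing)[0])

# Post-hoc fix: sort each row so the predicted quantiles are monotone again
fixed = np.sort(preds, axis=1)
print(fixed)
```

Sorting restores a consistent ordering but does not address the cause. Frequent crossing usually signals too little data or an overly flexible model at the extreme quantiles.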

Advantages and limitations

Advantages

  • Robustness to extreme values: especially for the median quantile.
  • No normality assumption on residuals.
  • Captures heteroscedasticity: dispersion can vary depending on inputs.
  • Natural prediction intervals: by combining several quantiles.
  • Richer view than the mean: useful in risk, forecasting, and uncertainty.

Limitations

  • One model per quantile: multiple models need to be trained to get a complete band.
  • Higher computational cost than OLS.
  • Possible quantile crossing: predicted quantiles can become inconsistent.
  • More subtle interpretation: coefficients change depending on the quantile.
  • Extreme quantiles are more unstable: estimates near 0 or 1 require more data.

Concrete use cases

1. Sales forecasting with prediction intervals

In retail, a mean forecast is not enough. Supply chain teams need a low, central, and high scenario to size inventory. Quantile regression directly provides these bounds.

2. Finance and Value at Risk (VaR)

Value at Risk is a quantile of a loss distribution. Quantile regression allows estimating this quantile as a function of market explanatory variables, making it useful for risk management.
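
As a minimal illustration of the definition, the sketch below computes an unconditional 95% VaR as the 0.95 quantile of simulated losses. The losses are synthetic, drawn from a standard normal purely for the example. A quantile regression model would replace this single number with a quantile that depends on explanatory variables.

```python
import numpy as np

rng = np.random.RandomState(42)
# Simulated daily losses (positive values are losses); purely synthetic
losses = rng.normal(loc=0.0, scale=1.0, size=10_000)

# 95% VaR: the loss level exceeded on roughly 5% of days
var_95 = np.quantile(losses, 0.95)
exceed_rate = np.mean(losses > var_95)
print(f"95% VaR: {var_95:.3f}")
print(f"Share of days with a larger loss: {exceed_rate:.3f}")
```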

3. Real estate

Instead of predicting a single real estate price, one can predict a plausible range based on area, location, or condition of the property. This is often more relevant for buyers, sellers, and agencies.

4. Probabilistic weather forecasting

Weather services do not only provide an expected value, but also uncertainty. Quantile regression is a natural tool for producing probabilistic forecasts on temperature, precipitation, or wind.


Comparison with classical linear regression

| Aspect | Linear regression | Quantile regression |
| --- | --- | --- |
| Target estimated | Conditional mean | Conditional quantile |
| Cost function | Squared error | Pinball loss |
| Sensitivity to outliers | High | Lower, especially at $\tau = 0.5$ |
| Heteroscedasticity handling | Limited | Natural |
| Prediction intervals | Indirect | Directly estimable |

Conclusion

Quantile regression is a useful extension of classical regression. Where linear regression provides only a conditional mean, quantile regression describes several points of the conditional distribution, which is often much more informative in real-world contexts.

It is particularly relevant when:

  • uncertainty is important;
  • the variance of the target is not constant;
  • decisions depend on pessimistic, median, or optimistic scenarios.

In other words, whenever you want to predict a plausible range rather than a simple mean value, quantile regression becomes a first-choice tool.

