Quantile Regression: Complete Guide — Principles, Examples, and Python Implementation
Summary — Quantile regression is a supervised learning technique that estimates the conditional quantiles of a target variable, going beyond the simple mean. Unlike ordinary linear regression, it provides a more complete view of the conditional distribution of the target. It is particularly useful for building prediction intervals, analyzing heteroscedasticity, and modeling risk.
Mathematical principle
Quantile regression was introduced by Roger Koenker and Gilbert Bassett in 1978. Its fundamental idea is to replace the minimization of mean squared error with the minimization of an asymmetric loss function, called pinball loss or check loss.
The conditional quantile
In classical linear regression, we model:
$$
\mathbb{E}[Y \mid X=x]
$$
that is, the conditional mean of $Y$ given $X=x$.
In quantile regression, we instead seek to model:
$$
Q_\tau(Y \mid X=x)
$$
where $Q_\tau$ denotes the quantile of order $\tau$, with $\tau \in (0,1)$.
For example:
- $\tau = 0.5$ corresponds to the conditional median;
- $\tau = 0.25$ corresponds to the first quartile;
- $\tau = 0.75$ corresponds to the third quartile.
The pinball loss function
For a quantile of order $\tau$, the loss function is defined by:
$$
\rho_\tau(u) = u \, (\tau - \mathbf{1}_{u < 0})
$$
where:
- $u = y_i - \hat{y}_i$ is the residual;
- $y_i$ is the observed value;
- $\hat{y}_i$ is the prediction;
- $\mathbf{1}_{u < 0}$ is the indicator function, equal to 1 if $u < 0$ and 0 otherwise.
This function can also be written in piecewise form:
$$
\rho_\tau(u) =
\begin{cases}
\tau u & \text{if } u \ge 0 \\
(\tau - 1)u & \text{if } u < 0
\end{cases}
$$
The asymmetry of this loss is the central point of the model:
- for $\tau = 0.5$, the loss becomes symmetric, $\rho_{0.5}(u) = |u|/2$, so minimizing it amounts to minimizing absolute deviation;
- for $\tau = 0.75$, underestimates ($\hat{y}_i < y_i$, i.e. $u > 0$) are penalized three times more heavily than overestimates of the same size;
- for $\tau = 0.25$, it is the opposite.
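This asymmetry is what pulls the fit toward a given quantile. A simple way to see it: if the prediction is a single constant $c$, the value of $c$ that minimizes the average pinball loss is the empirical $\tau$-quantile of the sample. The brute-force search below is only an illustration (real solvers use linear programming). It tries every observation as a candidate, which is enough because the objective is convex and piecewise linear with its kinks at the observations. The sample size n = 2001 is chosen so that NumPy's default quantile interpolation falls exactly on an observation.

```python
import numpy as np


def mean_pinball(y, c, tau):
    """Average pinball loss of the constant prediction c."""
    u = y - c
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))


rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=2001)  # a skewed sample: mean != median

best_constant = {}
for tau in (0.25, 0.5, 0.75):
    # Evaluate the loss at every observation and keep the minimizer
    losses = np.array([mean_pinball(y, c, tau) for c in y])
    best_constant[tau] = y[np.argmin(losses)]
    print(f"tau={tau}: argmin = {best_constant[tau]:.4f}, "
          f"np.quantile = {np.quantile(y, tau):.4f}, mean = {y.mean():.4f}")
```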
Problem formulation
In its linear form, quantile regression seeks a parameter vector $\beta_\tau$ such that:
$$
\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - x_i^\top \beta)
$$
In other words, we fit a line, or more generally a hyperplane, not to minimize mean squared error, but to target a particular level of the distribution of the target variable.
Relationship with the median and the mean
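This point is fundamental:
- Ordinary linear regression estimates the conditional mean;
- Quantile regression at $\tau = 0.5$ estimates the conditional median.
Since the median is robust to extreme values, median regression is often more stable than OLS when the response contains outliers. This robustness concerns outliers in $y$ only: like OLS, quantile regression can still be strongly pulled by extreme values of the explanatory variables (high-leverage points).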
Intuition
Imagine you are trying to predict the price of an apartment based on its area.
Classical linear regression will give you the expected (average) price for 50 m², 80 m², or 120 m². But this information is incomplete: two apartments of the same size can have very different prices depending on the neighborhood, floor, condition of the property, or view.
Quantile regression allows you to go further:
- the 0.25 quantile describes the affordable end of the market: about 25% of comparable apartments are cheaper;
- the 0.50 quantile gives the median price;
- the 0.75 quantile describes the more expensive end: about 25% of comparable apartments cost more.
You thus obtain a prediction band rather than a single point. This allows you to answer richer questions:
- “What is the typical price?”
- “In what range do 50% of observations fall?”
- “What is the plausible upper bound?”
This is particularly useful when the dispersion increases with the explanatory variable. For example, larger apartments often have much more dispersed prices than studios. Quantile regression naturally captures this phenomenon of heteroscedasticity.
Python Implementation
There are two common approaches with scikit-learn:
- QuantileRegressor, for linear quantile regression;
- HistGradientBoostingRegressor(loss="quantile"), for non-linear quantile regression.
Installing dependencies
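```bash
pip install scikit-learn numpy matplotlib
```
The loss="quantile" option of HistGradientBoostingRegressor requires scikit-learn 1.1 or later.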
Example 1 — Linear quantile regression with QuantileRegressor
Generating heteroscedastic data
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_pinball_loss, mean_absolute_error

# Reproducibility
rng = np.random.RandomState(42)

# Synthetic data
n_samples = 400
X = rng.uniform(0, 10, size=n_samples).reshape(-1, 1)

# Heteroscedastic noise: the larger X, the greater the variance
noise = rng.normal(loc=0, scale=0.8 + 0.35 * X.ravel(), size=n_samples)
y = 3 + 2.0 * X.ravel() + noise

# Train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```
Training multiple quantiles
```python
quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
models = {}

for q in quantiles:
    model = QuantileRegressor(
        quantile=q,
        alpha=0.0,  # no regularization initially
        solver="highs"
    )
    model.fit(X_train, y_train)
    models[q] = model

# Classical linear regression for comparison
ols = LinearRegression()
ols.fit(X_train, y_train)
```
Visualizing prediction bands
```python
x_grid = np.linspace(X.min(), X.max(), 300).reshape(-1, 1)
pred_quantiles = {q: models[q].predict(x_grid) for q in quantiles}
pred_ols = ols.predict(x_grid)

plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, alpha=0.35, s=18, label="Training data")

# Wide 10%-90% band
plt.fill_between(
    x_grid.ravel(),
    pred_quantiles[0.1],
    pred_quantiles[0.9],
    alpha=0.15,
    label="10%-90% interval"
)

# Central 25%-75% band
plt.fill_between(
    x_grid.ravel(),
    pred_quantiles[0.25],
    pred_quantiles[0.75],
    alpha=0.25,
    label="25%-75% interval"
)

# Quantile curves
for q in quantiles:
    plt.plot(x_grid, pred_quantiles[q], linestyle="--", linewidth=2, label=f"Quantile {q:.2f}")

# OLS mean
plt.plot(x_grid, pred_ols, linewidth=2.5, label="Linear regression (mean)")

plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear quantile regression")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```
Evaluation on the test set
```python
# Evaluate the median quantile
y_pred_median = models[0.5].predict(X_test)
print("MAE on quantile 0.5 (median):",
      round(mean_absolute_error(y_test, y_pred_median), 4))

# Evaluate each quantile with pinball loss
for q in [0.1, 0.5, 0.9]:
    y_pred = models[q].predict(X_test)
    loss = mean_pinball_loss(y_test, y_pred, alpha=q)
    print(f"Pinball loss for quantile {q:.1f}: {loss:.4f}")
```
Interpretation
In this plot:
- the OLS line models the mean;
- the quantile lines model different parts of the distribution;
- the gap between the low and high quantiles increases with $X$, revealing increasing variance.
Example 2 — Non-linear quantile regression with HistGradientBoostingRegressor
When the relationship between $X$ and $y$ is non-linear, linear quantile regression becomes insufficient. A more flexible model can then be used.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(42)

# Non-linear data
X = np.linspace(0, 10, 500).reshape(-1, 1)
y_true = np.sin(X).ravel() * 3 + X.ravel()

# Gaussian (symmetric) noise whose standard deviation grows with X
noise = rng.normal(0, 0.8 + 0.2 * X.ravel(), size=X.shape[0])
y = y_true + noise

quantiles = [0.1, 0.5, 0.9]
preds = {}

for q in quantiles:
    model = HistGradientBoostingRegressor(
        loss="quantile",
        quantile=q,
        max_iter=300,
        max_depth=4,
        learning_rate=0.05,
        random_state=42
    )
    model.fit(X, y)
    # In-sample predictions, used here only for visualization
    preds[q] = model.predict(X)

plt.figure(figsize=(10, 6))
plt.scatter(X, y, s=10, alpha=0.25, label="Observations")
plt.plot(X, preds[0.5], linewidth=2.5, label="Quantile 0.5")
plt.fill_between(
    X.ravel(),
    preds[0.1],
    preds[0.9],
    alpha=0.2,
    label="10%-90% interval"
)

plt.xlabel("X")
plt.ylabel("y")
plt.title("Non-linear quantile regression with gradient boosting")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```
When to choose this approach?
Quantile gradient boosting is often preferable when:
- the relationship between variables is non-linear;
- interactions between variables are important;
- the goal is primarily good predictive performance rather than purely linear interpretation.
Hyperparameters
For QuantileRegressor
| Hyperparameter | Typical values | Description |
|---|---|---|
| quantile | between 0 and 1 | Target quantile to estimate |
| alpha | 0.0 to 1.0 | L1 regularization on coefficients |
| fit_intercept | True / False | Adds an intercept |
| solver | "highs" | Linear programming solver |
For HistGradientBoostingRegressor
| Hyperparameter | Typical values | Description |
|---|---|---|
| loss | "quantile" | Enables quantile regression |
| quantile | between 0 and 1 | Target quantile |
| max_iter | 100 to 500 | Number of boosting iterations |
| max_depth | 3 to 8 | Maximum tree depth |
| learning_rate | 0.01 to 0.1 | Learning rate |
| min_samples_leaf | 10 to 50 | Minimum number of samples per leaf |
Practical recommendations
- Start with quantiles 0.1 / 0.5 / 0.9 to get a simple view of the distribution.
- Use alpha=0 initially with QuantileRegressor, then add regularization if the model becomes unstable.
- Visually check whether the quantile bands cross. This phenomenon, called quantile crossing, can occur when models are trained separately.
- For complex data, try the boosting version before manually adding polynomial terms to a linear regression.
Advantages and limitations
Advantages
- Robustness to outliers in the response: especially for the median quantile.
- No normality assumption on residuals.
- Captures heteroscedasticity: dispersion can vary depending on inputs.
- Natural prediction intervals: by combining several quantiles.
- Richer view than the mean: useful in risk, forecasting, and uncertainty.
Limitations
- One model per quantile: multiple models need to be trained to get a complete band.
- Higher computational cost than OLS.
- Possible quantile crossing: predicted quantiles can become inconsistent.
- More subtle interpretation: coefficients change depending on the quantile.
- Extreme quantiles are more unstable: estimates near 0 or 1 require more data.
Concrete use cases
1. Sales forecasting with prediction intervals
In retail, a mean forecast is not enough. Supply chain teams need a low, central, and high scenario to size inventory. Quantile regression directly provides these bounds.
2. Finance and Value at Risk (VaR)
Value at Risk is a quantile of a loss distribution. Quantile regression allows estimating this quantile as a function of market explanatory variables, making it useful for risk management.
3. Real estate
Instead of predicting a single real estate price, one can predict a plausible range based on area, location, or condition of the property. This is often more relevant for buyers, sellers, and agencies.
4. Probabilistic weather forecasting
Weather services do not only provide an expected value, but also uncertainty. Quantile regression is a natural tool for producing probabilistic forecasts on temperature, precipitation, or wind.
Comparison with classical linear regression
| Aspect | Linear regression | Quantile regression |
|---|---|---|
| Target estimated | Conditional mean | Conditional quantile |
| Cost function | Squared error | Pinball loss |
| Sensitivity to outliers | High | Lower, especially at $\tau=0.5$ |
| Heteroscedasticity handling | Limited | Natural |
| Prediction intervals | Indirect | Directly estimable |
Conclusion
Quantile regression is an extremely useful extension of classical regression. Where linear regression provides a mean, quantile regression provides a partial picture of the conditional distribution, which is often much more informative in real-world contexts.
It is particularly relevant when:
- uncertainty is important;
- the variance of the target is not constant;
- decisions depend on pessimistic, median, or optimistic scenarios.
In other words, whenever you want to predict a plausible range rather than a simple mean value, quantile regression becomes a first-choice tool.