Polynomial Regression: Complete Guide — Principles, Examples, and Python Implementation

Introductory Summary

Polynomial regression extends linear regression by adding powers of the input variable, allowing it to capture non-linear relationships between features and the target. It is one of the first techniques to learn when your linear model fails to follow the curvature of your data.

Polynomial regression is a supervised learning algorithm that lets a linear regression model fit curved relationships while remaining, fundamentally, a linear model solved by least squares. It is used whenever a scatter plot reveals a curved trend: a parabola or, over a limited range of x, shapes such as exponential growth or asymptotic decay.

Mathematical Principle

The Polynomial Model

In simple linear regression, we predict the target variable y from a single feature x according to the formula:

ŷ = w₀ + w₁ · x

Polynomial regression generalizes this formula by adding successive powers of x:

ŷ = w₀ + w₁·x + w₂·x² + w₃·x³ + … + w_d·x^d

where:

d is the degree of the polynomial, a positive integer hyperparameter chosen by the practitioner.
w₀ is the bias (intercept).
w₁, w₂, …, w_d are the coefficients learned by the model.

Feature Expansion

The central idea is as follows: we transform the input variable x into an extended feature vector:

x  ⟶  [x, x², x³, …, x^d]

Then we apply ordinary linear regression to this new dataset. In other words, polynomial regression is linear with respect to the coefficients w, even though the relationship with x is non-linear. This trick makes the problem convex and easily solvable.
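The expansion can be sketched in a few lines of plain NumPy, without scikit-learn. In this minimal example (the helper `expand` and the noiseless data are illustrative assumptions), x is mapped to [x, x²], a column of 1s is prepended for w₀, and an ordinary least-squares solve recovers the coefficients of y = 2 + x + 0.5x².

```python
import numpy as np

def expand(x, d):
    """Hypothetical helper: map a 1-D array x to the columns [x, x**2, ..., x**d]."""
    return np.column_stack([x**k for k in range(1, d + 1)])

# Noiseless data from y = 2 + 1*x + 0.5*x^2
x = np.linspace(-3, 3, 20)
y = 2 + x + 0.5 * x**2

# Design matrix: a column of 1s for the intercept w0, then the expanded features
X = np.column_stack([np.ones_like(x), expand(x, 2)])

# Plain linear least squares on the expanded features
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 6))  # w0, w1, w2: close to 2, 1, 0.5
```

Nothing in the solve step knows that the columns are powers of x, which is exactly why the model is linear in w.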

In the case of multivariate polynomial regression (multiple features x₁, x₂, …), the expansion also generates interaction terms:

[x₁, x₂, x₁², x₁·x₂, x₂², x₁³, …]

The number of features then grows combinatorially with degree d and the initial number of variables p, following the binomial coefficient formula:

Number of features = C(p + d, d) = (p + d)! / (p! · d!)

For example, with 2 variables and degree 4, we go from 2 to C(6, 4) = 15 features, a count that includes the constant term (14 without it).

Cost Function: Mean Squared Error (MSE)

As in linear regression, we minimize the mean squared error:

MSE = (1/n) · Σ (y_i − ŷ_i)²

When Xᵀ · X is invertible, the ordinary least squares (OLS) solution has the closed form:

w = (Xᵀ · X)⁻¹ · Xᵀ · y

where X is the design matrix containing the polynomial features of each example, and y is the target vector.
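In numerical practice the inverse (Xᵀ · X)⁻¹ is rarely formed explicitly. The sketch below, on synthetic data mirroring the later example, compares three routes to the same w: solving the normal equations as a linear system, NumPy's SVD-based `np.linalg.lstsq` (more robust when the columns of X are nearly collinear), and scikit-learn's `LinearRegression`. For a low degree such as 2, all three should agree to many decimal places.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x.ravel()**2 + x.ravel() + 2 + rng.normal(0, 1.5, 50)

# Design matrix with a leading column of 1s, so w[0] plays the role of w0
X = PolynomialFeatures(degree=2, include_bias=True).fit_transform(x)

# 1) Normal equations (X^T X) w = X^T y, solved without forming the inverse
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) SVD-based least-squares solver applied directly to X
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3) scikit-learn learns the intercept itself, so drop the column of 1s
reg = LinearRegression().fit(X[:, 1:], y)
w_sklearn = np.concatenate([[reg.intercept_], reg.coef_])

print(w_normal, w_lstsq, w_sklearn, sep="\n")
```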

The Bias-Variance Tradeoff Related to Degree

The choice of degree d is the main challenge of polynomial regression:

Degree too low (underfitting): the model is too rigid to follow the curvature of the data. The bias is high, and the error on training and test data is significant.
Well-chosen degree: the model captures the true form of the relationship without memorizing noise. The bias-variance tradeoff is optimal.
Degree too high (overfitting): the model oscillates wildly to pass through every training point, including the noise. The variance explodes, and performance on unseen data collapses.

This bias-variance tradeoff structures all reasoning around polynomial regression and justifies the rigorous use of cross-validation to select the optimal degree.

Intuition — How to Understand It?

Imagine plotting a scatter diagram of a vehicle's speed against its braking distance. The points do not lie on a straight line: they form a curve that bends upward. A straight regression line would cut across this curvature, overestimating the distance at intermediate speeds and underestimating it at low and high speeds.

Now add the square of the speed as a new feature. The model can weight x and x² separately, so it can draw a parabola; adding x³ allows an inflection point. Each additional degree gives the curve one more degree of freedom to bend and follow the data.

Geometric Analogy

Think of degree d as a flexible ruler with d − 1 hinges, since a degree-d polynomial can change direction at most d − 1 times:

d = 1: a rigid ruler — a straight line.
d = 2: a hinge in the middle — a parabola.
d = 3: two hinges — a cubic with an inflection point.
d = 10: nine hinges — a very flexible curve that can oscillate erratically.
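The hinge picture matches a standard property: the derivative of a degree-d polynomial has degree d − 1, so the curve has at most d − 1 points where its slope is zero. A quick check with NumPy's `Polynomial` class on three hand-picked polynomials (chosen for illustration; `n_real_stationary_points` is a hypothetical helper):

```python
import numpy as np
from numpy.polynomial import Polynomial

def n_real_stationary_points(p, tol=1e-9):
    """Count the real roots of p', i.e. the points where the curve is flat."""
    roots = p.deriv().roots()
    return int(np.sum(np.abs(roots.imag) < tol))

# Coefficients are listed from the constant term upward
examples = {
    "d=2: x^2 - 1": Polynomial([-1, 0, 1]),
    "d=3: x^3 - 3x": Polynomial([0, -3, 0, 1]),
    "d=4: x^4 - 5x^2 + 4": Polynomial([4, 0, -5, 0, 1]),
}

for name, p in examples.items():
    print(f"{name}: {n_real_stationary_points(p)} flat point(s), bound d - 1 = {p.degree() - 1}")
```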

The Overfitting Trap

With a high degree, the polynomial has so much flexibility that it fits not only the underlying signal but also the random noise. The result is a curve that oscillates strongly between training points and generalizes poorly. This is why, when the degree is high, polynomial regression is usually paired with regularization (Ridge, Lasso) and with cross-validation to choose the degree.
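As a sketch of how regularization tames a high degree, the pipeline below fits a degree-15 polynomial to noisy quadratic data with and without an L2 penalty (Ridge). The value alpha=1.0 is an arbitrary assumption for illustration; in practice it would be tuned by cross-validation, like the degree. Comparing the norms of the learned coefficients shows the shrinkage effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Noisy quadratic data, 100 points on [-3, 3]
rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel()**2 + X.ravel() + 2 + rng.normal(0, 1.5, 100)

def degree15_model(estimator):
    """Degree-15 expansion, then standardization, then the given linear estimator."""
    # Standardizing puts x, x^2, ..., x^15 on comparable scales, so the L2
    # penalty of Ridge shrinks all coefficients on an equal footing.
    return make_pipeline(
        PolynomialFeatures(degree=15, include_bias=False),
        StandardScaler(),
        estimator,
    )

ols = degree15_model(LinearRegression()).fit(X, y)
ridge = degree15_model(Ridge(alpha=1.0)).fit(X, y)  # alpha=1.0 chosen arbitrarily

# The last pipeline step is the fitted linear model
ols_norm = np.linalg.norm(ols[-1].coef_)
ridge_norm = np.linalg.norm(ridge[-1].coef_)
print(f"coefficient norm, no penalty:     {ols_norm:.1f}")
print(f"coefficient norm, Ridge(alpha=1): {ridge_norm:.1f}")
```

For any alpha > 0, Ridge's coefficient vector is shorter than the unpenalized least-squares solution on the same features, which typically yields a smoother curve. A `Lasso` estimator can be dropped into the same pipeline to obtain an L1 penalty instead.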

Python Implementation — Complete Example

Installing Dependencies

pip install scikit-learn numpy matplotlib

Complete Code: Noisy Quadratic Data

The script below generates data following a quadratic model, applies polynomial regression with different degrees, visualizes the fitted curves, and compares MSE errors.

# ============================================================
# Polynomial Regression — Complete Example with scikit-learn
# ============================================================

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline

# ----------------------------------------------------------
# 1. Generating noisy synthetic quadratic data
# ----------------------------------------------------------
# We create 100 points following y = 0.5 * x² + x + 2 + Gaussian noise
np.random.seed(42)
n_samples = 100
X = np.linspace(-3, 3, n_samples).reshape(-1, 1)
y_true = 0.5 * X.ravel()**2 + X.ravel() + 2  # true quadratic relationship
noise = np.random.normal(0, 1.5, n_samples)   # Gaussian noise
y = y_true + noise

# ----------------------------------------------------------
# 2. Training polynomial models for several degrees
# ----------------------------------------------------------
degrees = [1, 2, 3, 8, 15]  # degrees to test
colors = ['gray', 'blue', 'green', 'orange', 'red']

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# --- Subplot: fitted curves ---
ax1 = axes[0]
ax1.scatter(X, y, alpha=0.4, s=20, color='black', label='Noisy data')
ax1.plot(X, y_true, linestyle='--', linewidth=2, color='purple',
         label='True function (y = 0.5x² + x + 2)')

mse_train = []

for degree, color in zip(degrees, colors):
    # Pipeline: polynomial expansion + linear regression
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression()
    )
    model.fit(X, y)

    # Predictions on a fine grid for smooth plotting
    X_grid = np.linspace(-3.5, 3.5, 300).reshape(-1, 1)
    y_pred = model.predict(X_grid)

    # Calculate MSE on training data
    y_pred_train = model.predict(X)
    mse = mean_squared_error(y, y_pred_train)
    mse_train.append(mse)

    # Plot the fitted curve
    label = f'degree {degree} — MSE = {mse:.2f}'
    ax1.plot(X_grid, y_pred, linewidth=1.5, color=color, label=label)

ax1.set_title('Polynomial fit for different degrees')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
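ax1.legend(fontsize=8, loc='upper left')
# Clip the y-axis: high-degree fits can explode outside [-3, 3] and flatten the plot
ax1.set_ylim(y.min() - 3, y.max() + 3)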
ax1.grid(True, alpha=0.3)

# --- Subplot: MSE by degree ---
ax2 = axes[1]
ax2.bar([str(d) for d in degrees], mse_train, color=colors, edgecolor='black')
ax2.set_title('MSE Error as a function of degree')
ax2.set_xlabel('Polynomial degree')
ax2.set_ylabel("MSE (training data)")
ax2.grid(True, axis='y', alpha=0.3)

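# Annotation: training MSE keeps falling, which hides overfitting.
# The text offset is given in points: a position like ('10', ...) is not an
# existing bar category and would add a spurious '10' tick to the x-axis.
ax2.annotate('Overfitting from degree 8:\nlow training MSE is misleading',
             xy=('8', mse_train[3]), xytext=(-70, 60), textcoords='offset points',
             arrowprops=dict(arrowstyle='->', color='red'),
             fontsize=9, color='red')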

plt.tight_layout()
plt.savefig('regression_polynomiale.png', dpi=150)
plt.show()

# ----------------------------------------------------------
# 3. Detailed results display
# ----------------------------------------------------------
print('=' * 50)
print('Polynomial Regression — Detailed Results')
print('=' * 50)
for degree, mse in zip(degrees, mse_train):
    print(f"Degree {degree:2d}  |  Training MSE : {mse:.4f}")

Step-by-Step Explanation

Data generation: We create 100 points following a true quadratic relationship (y = 0.5x² + x + 2) with Gaussian noise of standard deviation 1.5. This simulates a realistic scenario where the underlying signal is masked by noise.
scikit-learn Pipeline: The make_pipeline tool automatically chains PolynomialFeatures and LinearRegression. For each degree, the data X is first transformed into polynomial features, then the linear regression model is fitted on these new features.
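Evaluation: We calculate the MSE for each degree on the training set. For degree 2, it should land close to the noise variance (1.5² = 2.25), which is roughly the error the true function itself would make on these noisy points. Higher degrees push the training MSE below that level: they are fitting noise, not signal.
Visualization: The left chart overlays the fitted curves; the right chart shows training MSE by degree. Training MSE drops sharply from degree 1 to degree 2, then keeps decreasing slightly, because each model contains all lower-degree models and its training error therefore can never increase with the degree. This chart alone cannot reveal overfitting; the symptom appears on the left chart, where the high-degree curves become erratic, especially near the edges of the data.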

Key Takeaways

Degree 2 recovers the true quadratic form. Degree 1 is too simple (high bias). Degrees 8 and 15 overfit: their curves oscillate between points, a classic sign of overfitting in polynomial regression.
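Because training MSE cannot reveal overfitting, a held-out set is the usual way to confirm these takeaways. The sketch below regenerates the same data as the main script, holds out 30% of the points (the split ratio and `random_state` are arbitrary choices), and reports training and test MSE side by side. Training MSE should keep decreasing with the degree, while test MSE is expected to be clearly worse for degree 1 and to deteriorate again for the highest degrees; the exact figures depend on the split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same data-generating process and seed as the main script
np.random.seed(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel()**2 + X.ravel() + 2 + np.random.normal(0, 1.5, 100)

# train_test_split shuffles by default, which matters because X is sorted
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}  # degree -> (train MSE, test MSE)
for degree in [1, 2, 3, 8, 15]:
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    ).fit(X_train, y_train)
    results[degree] = (
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_test, model.predict(X_test)),
    )
    train_mse, test_mse = results[degree]
    print(f"degree {degree:2d} | train MSE {train_mse:.3f} | test MSE {test_mse:.3f}")
```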

Automatic Degree Selection by Cross-Validation

In practice, we don’t know the true form of the function. Here’s how to select the optimal degree through cross-validation:

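from sklearn.model_selection import KFold, cross_val_score

# Reuses np, X, y, make_pipeline, PolynomialFeatures and LinearRegression
# from the previous script. X comes from np.linspace, so it is sorted:
# shuffle before splitting, otherwise each fold is a contiguous slice of x
# and the score measures extrapolation rather than generalization.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

optimal_degree = 0
best_score = -np.inf

for d in range(1, 11):
    modele = make_pipeline(
        PolynomialFeatures(degree=d, include_bias=False),
        LinearRegression()
    )
    # Mean R² over 5-fold cross-validation
    scores = cross_val_score(modele, X, y, cv=cv, scoring='r2')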
    mean_score = scores.mean()
    if mean_score > best_score:
        best_score = mean_score
        optimal_degree = d

print(f'Optimal degree : {optimal_degree}  |  Mean R² : {best_score:.4f}')

This approach evaluates the model’s generalization ability for each degree and selects the one that maximizes the mean R² score across validation folds.
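The same search can be expressed with scikit-learn's `GridSearchCV`, which also refits the best pipeline on all the data. This sketch assumes the data of the main script, scores with negative MSE instead of R² (an arbitrary choice here), and uses a shuffled `KFold` because X is sorted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel()**2 + X.ravel() + 2 + np.random.normal(0, 1.5, 100)

pipe = make_pipeline(PolynomialFeatures(include_bias=False), LinearRegression())

# make_pipeline names each step after its class in lowercase, hence the
# "polynomialfeatures__degree" parameter name
search = GridSearchCV(
    pipe,
    param_grid={"polynomialfeatures__degree": list(range(1, 11))},
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_degree = search.best_params_["polynomialfeatures__degree"]
print(f"Optimal degree: {best_degree} | mean CV MSE: {-search.best_score_:.3f}")
```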

Hyperparameters

The table below summarizes the key hyperparameters of polynomial regression in scikit-learn:

Name	Role	Typical Values	Impact
degree	Degree of the polynomial: number of powers of x to generate	1 to 10 (rarely beyond)	Directly controls model flexibility. A low degree underfits, a high degree overfits. This is the most critical hyperparameter.
include_bias	Whether to add a column of 1s (constant term) to the feature matrix	True (default), False	If True, a column of 1s is added to the features. In practice, False is preferred so that LinearRegression(fit_intercept=True) handles the bias separately, avoiding a redundant constant column.
interaction_only	Generates only interaction terms between different variables, without pure powers	False (default), True	Useful in multivariate regression when you want cross-interactions (x₁·x₂) but not pure quadratic terms (x₁²). Drastically reduces the number of generated features.
fit_intercept (LinearRegression)	Indicates whether the model should learn a bias w₀	True (default), False	If PolynomialFeatures(include_bias=False) is used, keep fit_intercept=True. Conversely, if include_bias=True, you can set fit_intercept=False.

Advantages and Limitations

Advantages

Conceptual simplicity: natural extension of linear regression, easy to explain and implement.
No iterative optimization: the coefficients come directly from least squares, so there is no learning rate to tune and no convergence to monitor.
Interpretability: the coefficients w₁, w₂, … directly indicate the contribution of each power of x to the prediction.
Native scikit-learn pipeline: PolynomialFeatures combined with any linear estimator is coded in just a few lines.
Excellent educational foundation: ideal entry point for understanding overfitting, the bias-variance tradeoff, and regularization mechanisms.
Extensible: naturally pairs with Ridge, Lasso, or ElasticNet to control overfitting when the degree is high.

Limitations

Combinatorial explosion: with p variables and degree d, the number of features is C(p + d, d). Already for p = 10, d = 4, this reaches 1,001 features (1,000 without the constant term).
Sensitivity to extreme values: high powers amplify extreme x values, making the model unstable in the presence of outliers.
Dangerous extrapolation: a high-degree polynomial diverges rapidly outside the training interval, producing totally inconsistent predictions.
Multicollinearity: the columns x, x², x³, … are strongly correlated with each other, degrading the numerical stability of matrix inversion.
Not automatic: degree selection and regularization require manual search or systematic cross-validation.
Unsuitable for complex structures: does not capture discontinuities, sharp thresholds, or non-polynomial interactions. For these cases, random forests or neural networks are more appropriate.
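The multicollinearity point can be made concrete. The sketch below (the helper `design_matrix` is a hypothetical name) measures the condition number of the matrix [x, x², …, x^d] on the grid used earlier: adding columns can only increase it, and it grows very quickly with d. Rescaling x to [-1, 1] before the expansion, or standardizing the expanded features, are common mitigations.

```python
import numpy as np

def design_matrix(x, d):
    """Hypothetical helper: columns [x, x^2, ..., x^d], no intercept column."""
    return np.column_stack([x**k for k in range(1, d + 1)])

x = np.linspace(-3, 3, 100)

# Condition number (ratio of extreme singular values) for increasing degrees
conds = {d: np.linalg.cond(design_matrix(x, d)) for d in (1, 2, 4, 8, 15)}
for d, c in conds.items():
    print(f"degree {d:2d}: condition number = {c:.2e}")

# Same degree-15 expansion after rescaling x to [-1, 1]
x_scaled = x / np.abs(x).max()
print(f"degree 15, x scaled to [-1, 1]: {np.linalg.cond(design_matrix(x_scaled, 15)):.2e}")
```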

Concrete Use Cases

1. Ballistic Trajectory and Experimental Physics

In experimental physics, the position of a free-falling object follows a quadratic law of time (y = ½gt² + v₀t + y₀). Polynomial regression of degree 2 allows estimating gravity g and initial velocity v₀ from noisy measurements. This is a case where the degree is determined by physical theory, not by grid search.
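A minimal sketch of this estimation on simulated measurements (the values of g, v₀, y₀ and the 5 cm noise level are assumptions for illustration), using NumPy's `np.polyfit` with the degree fixed at 2 by the physics. Matching the fitted coefficients a, b, c of at² + bt + c to y = ½gt² + v₀t + y₀ gives g = 2a, v₀ = b and y₀ = c.

```python
import numpy as np

# Simulated fall along a downward axis: y = 1/2 g t^2 + v0 t + y0
g_true, v0_true, y0_true = 9.81, 2.0, 0.5
rng = np.random.default_rng(1)
t = np.linspace(0, 2, 50)                         # time (s)
y_exact = 0.5 * g_true * t**2 + v0_true * t + y0_true
y_measured = y_exact + rng.normal(0, 0.05, t.size)  # position (m), 5 cm noise

# np.polyfit returns the coefficients from the highest power down
a, b, c = np.polyfit(t, y_measured, deg=2)
g_hat, v0_hat, y0_hat = 2 * a, b, c
print(f"g ≈ {g_hat:.2f} m/s², v0 ≈ {v0_hat:.2f} m/s, y0 ≈ {y0_hat:.2f} m")
```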

2. Crop Yield Curve

In agronomy, crop yield as a function of fertilizer amount often follows an inverted-U curve: each extra unit of fertilizer brings smaller gains (law of diminishing returns), and excessive amounts can even reduce yield. A polynomial of degree 2 or 3 effectively models this relationship and helps identify the optimal application rate, the one that maximizes yield without waste or pollution.
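With a degree-2 model ŷ = w₀ + w₁·N + w₂·N², the optimal dose is the vertex of the parabola, N* = −w₁ / (2w₂), which is a maximum when w₂ < 0. The sketch below uses invented data (the coefficients, the 0 to 300 range, the units and the noise level are assumptions) and reads the coefficients from the scikit-learn pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Invented data: yield = -0.002 N^2 + 0.8 N + 20, true optimum at N = 200
rng = np.random.default_rng(7)
N = np.linspace(0, 300, 40).reshape(-1, 1)  # fertilizer dose (e.g. kg/ha)
yield_obs = -0.002 * N.ravel()**2 + 0.8 * N.ravel() + 20 + rng.normal(0, 2, 40)

model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
model.fit(N, yield_obs)

# With include_bias=False the coefficients follow the feature order [N, N^2]
w1, w2 = model.named_steps["linearregression"].coef_
N_opt = -w1 / (2 * w2)  # vertex of the parabola; a maximum only if w2 < 0
print(f"w2 = {w2:.5f}, estimated optimal dose ≈ {N_opt:.0f}")
```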

3. Industrial Sensor Calibration

Industrial sensors (temperature, pressure, humidity) often exhibit a non-linear response relative to the measured quantity. Polynomial regression is used to establish a calibration curve that transforms the raw sensor signal into an accurate physical measurement. Degrees 2 to 4 are most common in this application.
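A sketch of a calibration workflow: fit a polynomial mapping the raw signal to a reference temperature, then apply it to new raw readings. The sensor law `true_temperature`, the voltage range, the noise level and the choice of degree 3 are all hypothetical; the calibration curve should only be trusted inside the range covered during calibration, since polynomials extrapolate poorly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def true_temperature(raw):
    """Hypothetical non-linear sensor law, used only to simulate calibration data."""
    return -12 + 35 * raw + 2.5 * raw**2

# Calibration run: raw signal (V) paired with a reference instrument (°C)
rng = np.random.default_rng(3)
raw = np.linspace(0.5, 4.5, 30).reshape(-1, 1)
reference = true_temperature(raw.ravel()) + rng.normal(0, 0.1, 30)

# Cubic calibration curve: raw signal -> temperature
calibration = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
).fit(raw, reference)

# Convert new raw readings, staying inside the calibrated range [0.5, 4.5] V
new_raw = np.array([[1.0], [2.5], [4.0]])
temperatures = calibration.predict(new_raw)
print(np.round(temperatures, 2))
```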

4. Energy Demand Modeling

A building’s electricity consumption as a function of outdoor temperature forms a U-shaped curve: high in winter (heating), low in spring and autumn, high in summer (air conditioning). A polynomial of degree 2 or 3 captures this U-curve and feeds load forecasting models for power grid managers.

About Salah YAHIAOUI

Polynomial Regression: Complete Guide — Principles, Examples, and Python Implementation

Introductory Summary

Mathematical Principle

The Polynomial Model

Feature Expansion

Cost Function: Mean Squared Error (MSE)

The Bias-Variance Tradeoff Related to Degree

Intuition — How to Understand It?

Geometric Analogy

The Overfitting Trap

Python Implementation — Complete Example

Installing Dependencies

Complete Code: Noisy Quadratic Data

Step-by-Step Explanation

Key Takeaways

Automatic Degree Selection by Cross-Validation

Hyperparameters

Advantages and Limitations

Advantages

Limitations

Concrete Use Cases

1. Ballistic Trajectory and Experimental Physics

2. Crop Yield Curve

3. Industrial Sensor Calibration

4. Energy Demand Modeling

See Also

Partager :

Articles similaires

Related Posts

Linear Regression: Principles, Examples, and Python Implementation

Régression Logistique : Guide Complet — Principes, Exemples et Implémentation Python

Flow Matching: Generation by Flow Matching

About Salah YAHIAOUI