Linear Discriminant Analysis
Summary — Linear Discriminant Analysis (LDA) is a fundamental statistical method that combines supervised classification and dimensionality reduction. This guide explores in depth the mathematical principles, geometric intuition, and practical implementation of linear discriminant analysis with Python and scikit-learn. You will learn to maximize separation between classes while minimizing within-class variation, an elegant and effective approach for many classification problems.
Mathematical Principle
Linear Discriminant Analysis is based on a simple yet elegant idea formulated by Ronald Fisher in 1936: find the linear projection that best separates the classes. This optimal separation is expressed through the Fisher criterion, a ratio of between-class variance to within-class variance.
The Fisher Criterion
Given a projection space defined by a vector w, the Fisher criterion is written as:
J(w) = wᵀ · S_B · w / (wᵀ · S_W · w)
where:
- S_B is the between-class scatter matrix, which measures the dispersion of class centers around the global center. It captures how far the classes are from each other.
- S_W is the within-class scatter matrix, which measures the dispersion of points around their respective class center. It reflects the compactness of each class.
The goal is to find the vector w that maximizes J(w). Intuitively, we are looking for a projection direction where the classes are well separated (large S_B) and where the data within each class are tightly grouped (small S_W).
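To make the criterion concrete, here is a minimal NumPy sketch that builds S_W and S_B on the Iris dataset and evaluates J(w) for two candidate directions. The helper names `scatter_matrices` and `fisher_criterion` are illustrative, and weighting each class by its size in S_B is one common convention, assumed here.

```python
import numpy as np
from sklearn.datasets import load_iris


def scatter_matrices(X, y):
    """Return the within-class and between-class scatter matrices (S_W, S_B)."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_k = X[y == label]
        mean_k = X_k.mean(axis=0)
        centered = X_k - mean_k
        # Spread of the points around their own class center
        S_W += centered.T @ centered
        # Spread of the class center around the global center, weighted by class size
        diff = (mean_k - overall_mean).reshape(-1, 1)
        S_B += X_k.shape[0] * (diff @ diff.T)
    return S_W, S_B


def fisher_criterion(w, S_W, S_B):
    """Evaluate J(w) = (w^T S_B w) / (w^T S_W w) for a direction w."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)


X, y = load_iris(return_X_y=True)
S_W, S_B = scatter_matrices(X, y)

# Two candidate directions, each aligned with a single original feature
w_petal_length = np.array([0.0, 0.0, 1.0, 0.0])
w_sepal_width = np.array([0.0, 1.0, 0.0, 0.0])

j_petal = fisher_criterion(w_petal_length, S_W, S_B)
j_sepal = fisher_criterion(w_sepal_width, S_W, S_B)
print(f"J(petal length) = {j_petal:.2f}")
print(f"J(sepal width)  = {j_sepal:.2f}")
```

Petal length scores far higher than sepal width, which matches the intuition that it separates the species better. LDA searches over all directions, not only the coordinate axes.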
Analytical Solution
Maximizing the Fisher criterion admits a closed-form analytical solution. For a two-class problem, the optimal vector is directly expressed as:
w = S_W⁻¹ · (μ₁ - μ₂)
where μ₁ and μ₂ are the mean vectors of the two classes. For K > 2 classes, this generalizes to projection onto a subspace of dimension at most K – 1, obtained by generalized eigenvalue decomposition.
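The following sketch checks, on two Iris species, that the closed-form direction and the leading generalized eigenvector of the pair (S_B, S_W) point the same way. It assumes SciPy is available and uses `scipy.linalg.eigh` for the generalized problem; the variable names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
keep = y > 0  # keep versicolor (1) and virginica (2) only
X2, y2 = X[keep], y[keep]

mu_1 = X2[y2 == 1].mean(axis=0)
mu_2 = X2[y2 == 2].mean(axis=0)
overall_mean = X2.mean(axis=0)

S_W = np.zeros((4, 4))
S_B = np.zeros((4, 4))
for label, mu in ((1, mu_1), (2, mu_2)):
    X_k = X2[y2 == label]
    centered = X_k - mu
    S_W += centered.T @ centered
    diff = (mu - overall_mean).reshape(-1, 1)
    S_B += len(X_k) * (diff @ diff.T)

# Closed form for two classes: solve S_W w = (mu_1 - mu_2) instead of inverting S_W
w_closed = np.linalg.solve(S_W, mu_1 - mu_2)

# General route: solve S_B v = lambda S_W v (eigenvalues are returned in ascending order)
eigvals, eigvecs = eigh(S_B, S_W)
w_eig = eigvecs[:, -1]

cosine = abs(w_closed @ w_eig) / (np.linalg.norm(w_closed) * np.linalg.norm(w_eig))
print(f"|cos(angle between the two directions)| = {cosine:.6f}")
print("Generalized eigenvalues:", eigvals)
```

Only one eigenvalue is meaningfully above zero here, consistent with the K – 1 = 1 bound for two classes. The direction is defined up to sign and scale, which is why the comparison uses the absolute cosine.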
Fundamental Assumptions
Linear Discriminant Analysis relies on precise probabilistic assumptions:
- Gaussian distribution: each class follows a multivariate normal distribution N(μ_k, Σ).
- Common covariance: all classes share the same covariance matrix Σ. This assumption is what makes the decision boundaries linear (hence the name of the method).
- Independent observations: samples are drawn independently given their class. This is not an independence assumption between features, since Σ can encode correlations.
If the common covariance assumption is violated, the boundaries become quadratic and one must then turn to Quadratic Discriminant Analysis (QDA).
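As a rough illustration of what happens when the common covariance assumption fails, the sketch below builds two synthetic Gaussian classes that share a center but differ strongly in spread, then compares LDA with scikit-learn's QuadraticDiscriminantAnalysis. The data, seed, and sizes are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Same center, very different spreads: a deliberate violation of the shared covariance
X_narrow = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=1000)
X_wide = rng.multivariate_normal([0.0, 0.0], 9.0 * np.eye(2), size=1000)
X_gauss = np.vstack([X_narrow, X_wide])
y_gauss = np.array([0] * 1000 + [1] * 1000)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_gauss, y_gauss, test_size=0.3, random_state=0, stratify=y_gauss
)

lda_acc = LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
qda_acc = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
print(f"LDA accuracy: {lda_acc:.3f}")
print(f"QDA accuracy: {qda_acc:.3f}")
```

No straight line can separate a tight cluster from the wider cloud surrounding it, so LDA stays close to chance, while QDA can learn a roughly circular boundary.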
Geometric Intuition
Imagine a two-dimensional point cloud belonging to two distinct classes, for example flowers of two different species. Each class forms a cluster of points.
The core idea of Linear Discriminant Analysis is as follows: we project all the data onto a single axis (a line) chosen optimally. This axis must satisfy two simultaneous criteria:
- Maximize the distance between the projected centers of the classes (between-class criterion). The farther apart the centers are on this axis, the easier it will be to distinguish the classes.
- Minimize the dispersion of points around each projected center (within-class criterion). The more tightly grouped the points of the same class, the less confusion with the other class.
Let’s visualize this in 2D: if you draw two slightly overlapping point clouds, LDA finds the projection direction that minimizes the overlap under the model's assumptions. With equal priors, the decision threshold sits at the midpoint of the two projected centers; with unequal priors, it shifts toward the center of the less probable class.
This linear projection makes LDA particularly useful not only for classification, but also for dimensionality reduction: one moves from a high-dimensional space to a space of dimension K – 1 while preserving as much discriminative information as possible.
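The projection and threshold described in this section can be sketched by hand for two classes. The example below uses synthetic Gaussian data (an assumption of this sketch, as are all variable names), projects it onto w, and places the cut point at the midpoint of the projected centers plus a log-prior correction.

```python
import numpy as np

rng = np.random.default_rng(0)
shared_cov = np.array([[1.0, 0.6], [0.6, 1.0]])
X_a = rng.multivariate_normal([2.0, 0.0], shared_cov, size=200)  # class 1
X_b = rng.multivariate_normal([0.0, 2.0], shared_cov, size=200)  # class 2

mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)

# Pooled within-class covariance estimate
S_W = (X_a - mu_a).T @ (X_a - mu_a) + (X_b - mu_b).T @ (X_b - mu_b)
pooled_cov = S_W / (len(X_a) + len(X_b) - 2)

# Projection axis, then the 1D coordinate of every point on that axis
w = np.linalg.solve(pooled_cov, mu_a - mu_b)
z_a, z_b = X_a @ w, X_b @ w


def threshold(prior_a):
    """Cut point on the projected axis; points above it are assigned to class 1."""
    midpoint = 0.5 * (w @ mu_a + w @ mu_b)
    return midpoint + np.log((1.0 - prior_a) / prior_a)


t_equal = threshold(0.5)   # equal priors: exactly the midpoint
t_skewed = threshold(0.9)  # class 1 more likely a priori: cut moves toward class 2

accuracy = 0.5 * ((z_a > t_equal).mean() + (z_b <= t_equal).mean())
print(f"Projected centers: {z_a.mean():.2f} (class 1), {z_b.mean():.2f} (class 2)")
print(f"Threshold with equal priors: {t_equal:.2f}")
print(f"Threshold with prior 0.9 for class 1: {t_skewed:.2f}")
print(f"Training accuracy with equal priors: {accuracy:.3f}")
```

Raising the prior of class 1 lowers the threshold, so a larger part of the axis is assigned to the class that is more likely a priori.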
Python Implementation
Let’s see how to implement Linear Discriminant Analysis with scikit-learn, first for classification and then for dimensionality reduction.
Classification with LinearDiscriminantAnalysis
Let’s start with a complete example using the Iris dataset:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
# Create and train the LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
# Predictions
y_pred = lda.predict(X_test)
# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print(f"\nConfusion matrix:\n{confusion_matrix(y_test, y_pred)}")
print(f"\nClassification report:\n{classification_report(y_test, y_pred)}")
With the Iris dataset, Linear Discriminant Analysis typically reaches an accuracy of about 97–98% (the exact figure depends on the split), because the species are well separated and the Gaussian, shared-covariance assumptions hold approximately.
Decision and Boundaries
The model exposes several useful attributes for understanding the decisions made:
# Projection scalings (one column per discriminant axis)
print("Scalings:\n", lda.scalings_)
# Class means in the original feature space
print("Class means:\n", lda.means_)
# Posterior probabilities for the first 5 test samples
probabilities = lda.predict_proba(X_test[:5])
print(f"\nProbabilities for the first 5 predictions:\n{probabilities}")
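To see how these pieces fit together, the sketch below recomputes the decision scores and posterior probabilities by hand from `coef_` and `intercept_`, then compares them with the model's own output. It assumes SciPy is installed for `softmax` and reuses the same split parameters as above so that it can run on its own.

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# One linear score per class: x . coef_k + intercept_k
scores = X_test @ lda.coef_.T + lda.intercept_

# With more than two classes, the posteriors are the softmax of these scores
manual_proba = softmax(scores, axis=1)
manual_pred = lda.classes_[np.argmax(scores, axis=1)]

print("Scores match decision_function:",
      np.allclose(scores, lda.decision_function(X_test)))
print("Probabilities match predict_proba:",
      np.allclose(manual_proba, lda.predict_proba(X_test)))
print("Predictions match predict:",
      np.array_equal(manual_pred, lda.predict(X_test)))
```

The boundary between two classes is the set of points where their scores are equal, which is a hyperplane because each score is linear in x.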
Dimensionality Reduction with LDA
Linear Discriminant Analysis also shines as a dimensionality reduction technique. For the Iris dataset (4 dimensions, 3 classes), the LDA projection reduces to at most 2 dimensions:
# LDA projection in 2D
lda_proj = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda_proj.fit_transform(X, y)
# Projection visualization
plt.figure(figsize=(10, 7))
colors = ["#FF6B6B", "#4ECDC4", "#45B7D1"]
labels = iris.target_names
for class_idx, color, label in zip(range(3), colors, labels):
    mask = y == class_idx
    plt.scatter(
        X_lda[mask, 0], X_lda[mask, 1],
        c=color, label=label, alpha=0.7, edgecolors="black", s=80
    )
# Add projected centroids
centroids = lda_proj.transform(lda_proj.means_)
plt.scatter(
    centroids[:, 0], centroids[:, 1],
    c="black", marker="X", s=200, label="Centroids"
)
plt.title("LDA Projection: Dimensionality Reduction (Iris)", fontsize=14)
plt.xlabel("Discriminant Component 1")
plt.ylabel("Discriminant Component 2")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
This visualization clearly shows how Linear Discriminant Analysis separates the three iris species into an optimal two-dimensional space. The centroids of each class appear well distinct, reflecting the quality of the discriminant projection.
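A quick way to quantify what each discriminant axis contributes is the `explained_variance_ratio_` attribute, available with the default "svd" solver. The sketch below also shows, under the same setup, that requesting more than K – 1 components is rejected at fit time.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda_proj = LinearDiscriminantAnalysis(n_components=2).fit(X, y)

# Share of between-class variance carried by each discriminant axis
ratios = lda_proj.explained_variance_ratio_
for i, ratio in enumerate(ratios, start=1):
    print(f"Discriminant component {i}: {ratio:.4f}")

# Iris has 3 classes, so at most 2 components exist; asking for 3 fails
try:
    LinearDiscriminantAnalysis(n_components=3).fit(X, y)
    third_component_allowed = True
except ValueError as exc:
    third_component_allowed = False
    print("n_components=3 rejected:", exc)
```

On Iris, nearly all of the between-class variance lies on the first axis, which is why the species already separate well along the horizontal direction of the plot.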
Comparison with Synthetic Data
To illustrate the behavior of LDA on a synthetic two-class problem:
from sklearn.datasets import make_classification
# Generate synthetic two-class data
X_syn, y_syn = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    n_redundant=2, n_classes=2, random_state=42
)
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=42
)
lda_syn = LinearDiscriminantAnalysis()
lda_syn.fit(X_train_s, y_train_s)
print(f"Synthetic accuracy: {lda_syn.score(X_test_s, y_test_s):.4f}")
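A single train/test split gives a noisy estimate. As a possible complement, the sketch below scores the same kind of synthetic data with 5-fold stratified cross-validation; the fold count and seeds are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_syn, y_syn = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    n_redundant=2, n_classes=2, random_state=42
)

# Stratified folds keep the class proportions similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(LinearDiscriminantAnalysis(), X_syn, y_syn, cv=cv)

print("Accuracy per fold:", cv_scores.round(4))
print(f"Mean accuracy: {cv_scores.mean():.4f} (std {cv_scores.std():.4f})")
```

The spread across folds gives a sense of how much the single-split figure could vary.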
Hyperparameters
The LinearDiscriminantAnalysis constructor offers several essential hyperparameters to master:
| Hyperparameter | Possible Values | Description |
|---|---|---|
| solver | "svd", "lsqr", "eigen" | Solver algorithm. "svd" (default) does not compute the covariance matrix explicitly and supports dimensionality reduction, but not shrinkage. "lsqr" and "eigen" allow shrinkage; "lsqr" supports classification only, while "eigen" also supports transform. |
| shrinkage | None, "auto", float ∈ [0, 1] | Covariance matrix regularization. "auto" uses the Ledoit-Wolf lemma to estimate the shrinkage intensity automatically. A float sets it manually (e.g., 0.5). Only available with solver="lsqr" or "eigen". |
| n_components | int (1 ≤ n ≤ min(K – 1, p)) or None | Number of components for dimensionality reduction. Default is None, which resolves to min(K – 1, p), where K is the number of classes and p the number of features. Only affects transform, not classification. |
| priors | array-like or None | Prior probabilities of classes. If None, estimated from training frequencies. Useful in case of known class imbalance. |
| store_covariance | True or False | With solver="svd", explicitly computes and stores the within-class covariance matrix (covariance_), which is useful for interpretation. Default is False; the other solvers always compute and store it. |
| tol | float (default 1e-4) | Threshold below which a singular value is treated as zero when estimating the rank of the data. Only used with solver="svd". |
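The solver constraints in the table can be checked directly. The sketch below, run on Iris with illustrative variable names, shows which combinations scikit-learn rejects and how explicit priors are passed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# "svd" does not accept shrinkage: the error is raised at fit time
try:
    LinearDiscriminantAnalysis(solver="svd", shrinkage="auto").fit(X, y)
    svd_accepts_shrinkage = True
except NotImplementedError:
    svd_accepts_shrinkage = False

# "lsqr" accepts shrinkage but cannot project the data
lda_lsqr = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
try:
    lda_lsqr.transform(X)
    lsqr_can_transform = True
except NotImplementedError:
    lsqr_can_transform = False

# "eigen" supports both shrinkage and projection
lda_eigen = LinearDiscriminantAnalysis(
    solver="eigen", shrinkage="auto", n_components=2
).fit(X, y)
X_eigen = lda_eigen.transform(X)

# Explicit priors replace the class frequencies estimated from the training data
lda_priors = LinearDiscriminantAnalysis(priors=[0.2, 0.3, 0.5]).fit(X, y)

print("svd accepts shrinkage:", svd_accepts_shrinkage)
print("lsqr can transform:", lsqr_can_transform)
print("eigen projection shape:", X_eigen.shape)
print("priors used:", lda_priors.priors_)
```

In practice this means choosing "eigen" when both regularization and a low-dimensional projection are needed.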
When to Use Shrinkage?
Shrinkage is particularly useful in the following situations:
- p >> n: the number of variables greatly exceeds the number of observations, making covariance estimation unstable.
- Strong collinearity: variables highly correlated with each other.
- Noisy data: presence of significant noise in measurements.
Shrinkage combines the empirical covariance matrix with a target matrix (usually diagonal), weighted by the shrinkage coefficient. This stabilizes the inversion of S_W and improves generalization.
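# Example with automatic shrinkage
lda_regularized = LinearDiscriminantAnalysis(
    solver="lsqr",
    shrinkage="auto"
)
lda_regularized.fit(X_train, y_train)
# The fitted model does not expose the estimated shrinkage coefficient;
# the regularized within-class covariance is available as covariance_
print(f"Regularized covariance shape: {lda_regularized.covariance_.shape}")
print(f"Accuracy with shrinkage: {lda_regularized.score(X_test, y_test):.4f}")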
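To look at the shrinkage combination itself, one option is scikit-learn's `LedoitWolf` estimator, which does expose the estimated coefficient as `shrinkage_`. The sketch below applies it to random data with more features than observations and rebuilds the shrunk matrix by hand. The data is synthetic, and LDA applies its own internal variant of this step, so the coefficient here is only indicative.

```python
import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance, shrunk_covariance

rng = np.random.default_rng(0)
X_small = rng.normal(size=(20, 50))  # n = 20 observations, p = 50 features

emp_cov = empirical_covariance(X_small)
lw = LedoitWolf().fit(X_small)
alpha = lw.shrinkage_  # estimated shrinkage coefficient, between 0 and 1

# Shrunk covariance = (1 - alpha) * empirical + alpha * mu * identity,
# where mu is the average variance (trace divided by the number of features)
n_features = emp_cov.shape[0]
mu = np.trace(emp_cov) / n_features
manual_cov = (1.0 - alpha) * emp_cov + alpha * mu * np.eye(n_features)

print(f"Estimated shrinkage coefficient: {alpha:.4f}")
print("Rank of empirical covariance:", np.linalg.matrix_rank(emp_cov))
print("Rank of shrunk covariance:   ", np.linalg.matrix_rank(manual_cov))
print("Matches LedoitWolf.covariance_:", np.allclose(manual_cov, lw.covariance_))
```

The empirical matrix is rank-deficient when p > n and therefore cannot be inverted; adding the scaled identity restores full rank, which is what makes the inversion stable.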
Advantages and Limitations
Like any method, Linear Discriminant Analysis has strengths and constraints that are important to understand.
Advantages
- Computational efficiency: the closed-form analytical solution makes training extremely fast, even on large datasets. No iterative optimization required.
- Low risk of overfitting: with few parameters to estimate (class means + common covariance), LDA generalizes well even with a modest training sample.
- Integrated dimensionality reduction: LDA naturally projects data into a K – 1 dimensional space, providing immediate visualization of separability.
- Calibrated probabilities: the posterior probabilities come from an explicit probabilistic model and tend to be well calibrated when the Gaussian assumptions hold (unlike the raw scores of methods such as SVM), which facilitates decision threshold analysis.
- Interpretability: the projection coefficients (scalings_) directly indicate the weight of each variable in the discrimination, making the model easy to explain.
- Robustness with shrinkage: integrated regularization allows handling cases where p > n, a common situation in genomics and signal processing.
Limitations
- Gaussian assumption: if the real distributions strongly deviate from normality, the performance of Linear Discriminant Analysis may degrade significantly.
- Common covariance: the assumption of the same covariance for all classes is restrictive. When classes have very different shapes (ellipsoids of different sizes or orientations), QDA or other methods are preferable.
- Linear boundaries only: LDA cannot learn complex nonlinear decision boundaries. For nonlinearly separated problems, kernel methods or neural networks must be used.
- Sensitivity to outliers: since mean and covariance estimators are sensitive to outliers, rigorous data preprocessing is essential.
- Limited maximum projection dimension: the LDA projection can only produce K – 1 components. For a binary problem, one obtains a single dimension, which can be restrictive for visualization.
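The linear-boundary limitation is easy to reproduce. The sketch below generates an XOR-style pattern (synthetic data chosen for the example) and compares LDA with an RBF-kernel SVC, used here as one possible kernel method.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# XOR pattern: the label depends on whether the two coordinates share the same sign
X_xor = rng.uniform(-1.0, 1.0, size=(1000, 2))
y_xor = (X_xor[:, 0] * X_xor[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_xor, y_xor, test_size=0.3, random_state=0, stratify=y_xor
)

lda_xor_acc = LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
svc_xor_acc = SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te)

print(f"LDA accuracy on XOR data: {lda_xor_acc:.3f}")
print(f"RBF SVC accuracy on XOR data: {svc_xor_acc:.3f}")
```

Both class centers sit near the origin, so no single projection axis separates them and LDA stays close to chance level.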
Use Cases
Linear Discriminant Analysis finds remarkable applications in many scientific and industrial fields.
1. Facial Recognition and Biometrics
In facial recognition systems, Linear Discriminant Analysis is used via the Fisherfaces algorithm, a direct variant of LDA applied to face images. Each pixel (or extracted feature) constitutes a dimension, and LDA finds the directions that maximize separation between individuals while minimizing intra-person variation (expressions, lighting). This approach is historically competitive with Eigenface-based methods (based on PCA), because it explicitly targets discrimination between identities rather than reconstruction.
2. Assisted Biomedical Diagnosis
LDA is widely used in medical diagnostic assistance. For example, one can train an LDA model to distinguish benign tumors from malignant ones based on clinical measurements (size, texture, regularity of contours). The posterior probabilities provide a confidence score that the doctor can use to prioritize ambiguous cases. The transparency of the LDA coefficients also helps identify the most discriminating biomedical variables, adding explanatory value to the diagnosis.
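As an illustration of this workflow, the sketch below uses the breast cancer dataset bundled with scikit-learn as a stand-in for clinical measurements. The 0.9 confidence cut-off is an arbitrary value chosen for the example, not a clinical recommendation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

data = load_breast_cancer()  # target 0 = malignant, 1 = benign
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
test_accuracy = clf.score(X_te, y_te)

# Posterior probability of the predicted class, used as a confidence score
proba = clf.predict_proba(X_te)
confidence = proba.max(axis=1)

# Flag low-confidence cases for manual review (illustrative cut-off)
ambiguous = np.flatnonzero(confidence < 0.9)

print(f"Test accuracy: {test_accuracy:.3f}")
print(f"Cases flagged as ambiguous: {len(ambiguous)} of {len(X_te)}")
```

Sorting the test cases by `confidence` gives a simple review order that starts with the least certain predictions.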
3. Document Classification and Text Filtering
In natural language processing, Linear Discriminant Analysis can classify text documents after TF-IDF vectorization or word embeddings. Although modern neural methods dominate today, LDA remains a relevant benchmark for small-volume tasks: spam detection, thematic categorization, content filtering. Its speed and robustness make it an excellent baseline to beat.
4. Genetic Microarray Analysis and Genomics
Modern genomics produces very high-dimensional data: thousands of genes measured on a few dozen samples (p >> n). Linear Discriminant Analysis with shrinkage is particularly suited to this context, because covariance matrix regularization stabilizes estimation despite the small number of observations. Researchers use it to classify cancer types from gene expression profiles, identify discriminating biomarkers, and visualize the structure of molecular subtypes in the reduced K – 1 dimensional space.
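A rough stand-in for this setting can be built with `make_classification`, using far more features than samples. The sketch below compares plain LDA with shrinkage LDA under cross-validation; the dimensions, seed, and fold count are arbitrary, and synthetic data only approximates real expression profiles.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# 60 samples described by 500 features, of which only 20 carry signal (p >> n)
X_wide, y_wide = make_classification(
    n_samples=60, n_features=500, n_informative=20,
    n_redundant=0, n_classes=2, random_state=0
)

plain_lda = LinearDiscriminantAnalysis(solver="svd")
shrunk_lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

plain_scores = cross_val_score(plain_lda, X_wide, y_wide, cv=5)
shrunk_scores = cross_val_score(shrunk_lda, X_wide, y_wide, cv=5)

print(f"Plain LDA, mean CV accuracy:     {plain_scores.mean():.3f}")
print(f"Shrinkage LDA, mean CV accuracy: {shrunk_scores.mean():.3f}")
```

With so few samples, the fold-to-fold variance is large, so the two means should be read as a rough indication and not as a benchmark.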

