Elliptic Envelope: Principles, Examples and Python Implementation

Summary

The Elliptic Envelope is a parametric anomaly detection method built on a strong assumption: normal data follows a multivariate Gaussian distribution. Rather than isolating points one by one, it models the distribution of typical observations as a whole and flags as anomalies the points that deviate significantly from that model.

The algorithm uses a robust estimate of the mean and covariance matrix via the Minimum Covariance Determinant (MCD), making it resistant to anomalies present in the training data. Points whose Mahalanobis distance from the robust center exceeds a threshold are classified as anomalies. In theory this threshold is a quantile of the χ² distribution; scikit-learn's EllipticEnvelope sets it from the empirical distribution of the training distances, according to the contamination parameter.

In this guide, we will explore the mathematical foundations of the Elliptic Envelope, its geometric intuition, its practical implementation in Python with scikit-learn, its key hyperparameters, as well as its advantages, limitations, and concrete use cases.


Mathematical principle

Fundamental assumption

The Elliptic Envelope assumes that “normal” (non-anomalous) data follows a multivariate Gaussian distribution in p dimensions:

x ~ N(μ, Σ)

where μ is the mean vector and Σ is the covariance matrix. Under this assumption, points that have a low probability of belonging to this distribution are considered anomalies.

Robust covariance estimation — Minimum Covariance Determinant (MCD)

If we used the standard empirical mean and covariance to estimate μ and Σ, anomalies present in the data would heavily bias these estimators: a single extreme point can shift the mean and inflate the covariance, making detection ineffective. This is the classic breakdown-point problem in robust statistics.

The Minimum Covariance Determinant (MCD), proposed by Rousseeuw in 1985, solves this problem elegantly. The principle is as follows:

  1. We search for the subset of h observations (with h ≤ n, where n is the total number of observations) whose covariance matrix determinant is minimal.
  2. The mean and covariance calculated on this subset constitute the robust estimators μ̂ and Σ̂.

The intuition behind this algorithm is subtle: by searching for the subset whose covariance has the smallest determinant, we identify the “tightest”, most concentrated group of points. Anomalies, being by definition dispersed and far from the core of the data, tend to be excluded from this optimal subset. The parameter h controls how many observations are treated as the “normal” core: a smaller h tolerates more contamination but uses fewer points for estimation. One natural choice is h ≈ n × (1 − contamination); scikit-learn's default is more conservative, with h close to n/2.

The exact computation of the MCD is combinatorially expensive (all possible subsets would need to be examined). In practice, iterative approximate algorithms such as FAST-MCD are used, which quickly converge to a good approximation.
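To make the effect of contamination concrete, here is a minimal sketch comparing the empirical estimator with scikit-learn's MinCovDet on synthetic data. The data set (450 Gaussian points plus 50 clustered outliers) and the seed are illustrative assumptions, not part of the original guide.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(0)

# 450 "normal" points around the origin, 50 outliers clustered far away
X_clean = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=450)
X_out = rng.normal(loc=[8, 8], scale=0.5, size=(50, 2))
X = np.vstack([X_clean, X_out])

emp = EmpiricalCovariance().fit(X)
mcd = MinCovDet(random_state=0).fit(X)

# The true center of the normal data is (0, 0), so the norm is the error
emp_error = np.linalg.norm(emp.location_)
mcd_error = np.linalg.norm(mcd.location_)

print(f"Empirical mean error: {emp_error:.2f}")
print(f"MCD location error:   {mcd_error:.2f}")
```

On data like this, the empirical mean is pulled toward the outlier cluster while the MCD location should stay close to the core of the normal points.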

Mahalanobis distance

Once the robust estimators μ̂ and Σ̂ are obtained, we compute for each observation xₖ its Mahalanobis distance from the center of the distribution:

D²(xₖ) = (xₖ − μ̂)ᵀ Σ̂⁻¹ (xₖ − μ̂)

This distance has several essential properties:

  • It takes into account the correlation between variables, unlike Euclidean distance.
  • It is invariant under invertible affine transformations of the data (rescaling, rotation, translation), provided μ̂ and Σ̂ are transformed accordingly.
  • It weights each direction inversely to its variance: a deviation of one unit in a high-variance direction is less suspicious than the same deviation in a low-variance direction.

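If x truly follows a multivariate Gaussian with mean μ and covariance Σ, then D²(x) computed with these true parameters follows a χ² distribution with p degrees of freedom (where p is the dimensionality of the data). With the estimates μ̂ and Σ̂, the result holds approximately when n is large.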
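The formula and two of its properties can be checked numerically. The sketch below uses a hypothetical helper, mahalanobis_sq, and arbitrary illustrative values for the mean, covariance and affine map; it uses the true parameters rather than robust estimates, to keep the check simple.

```python
import numpy as np

def mahalanobis_sq(X, mu, sigma):
    """Squared Mahalanobis distance of each row of X from mu under covariance sigma."""
    diff = X - mu
    # Solve sigma @ z = diff.T instead of forming the explicit inverse
    z = np.linalg.solve(sigma, diff.T)
    return np.einsum("ij,ji->i", diff, z)

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(mu, sigma, size=20000)

d2 = mahalanobis_sq(X, mu, sigma)
print(f"Mean of D²: {d2.mean():.2f} (a χ² with p = 2 has mean 2)")

# Invariance check: apply an invertible affine map y = A x + b
A = np.array([[3.0, 1.0], [-1.0, 2.0]])
b = np.array([5.0, 5.0])
Y = X @ A.T + b
d2_mapped = mahalanobis_sq(Y, A @ mu + b, A @ sigma @ A.T)
print("Distances unchanged by the affine map:", np.allclose(d2, d2_mapped))
```

Solving the linear system rather than inverting Σ is a common numerical choice; either approach gives the same distances when Σ is well conditioned.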

Decision threshold

The decision threshold is determined by the quantile of the χ² distribution with p degrees of freedom corresponding to the expected contamination rate:

Threshold = χ²_p, 1−α

The point xₖ is classified as an anomaly if D²(xₖ) > Threshold, and as normal data otherwise.

The parameter α corresponds to the expected rate of anomalies. For example, if 5% of anomalies are expected, we set α ≈ 0.05 and the threshold is the 95th percentile of the χ² distribution.
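As a small worked example, the theoretical threshold can be computed with scipy.stats.chi2. The values p = 2 and α = 0.05 and the helper is_anomaly are illustrative assumptions. Note that this is the textbook rule; scikit-learn's EllipticEnvelope derives its cutoff from the training distances instead.

```python
from scipy.stats import chi2

p = 2          # number of features
alpha = 0.05   # expected anomaly rate

# Quantile of order 1 - alpha of the chi-squared distribution with p degrees of freedom
threshold = chi2.ppf(1 - alpha, df=p)
print(f"chi2 threshold for p={p}, alpha={alpha}: {threshold:.3f}")

def is_anomaly(d2, threshold=threshold):
    """Flag a squared Mahalanobis distance that exceeds the threshold."""
    return d2 > threshold
```

For p = 2 the quantile has the closed form −2 ln(α), which gives about 5.99 for α = 0.05.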


Geometric intuition

Let’s visualize what happens in two dimensions to properly understand the Elliptic Envelope.

Imagine a point cloud representing normal observations. Most cluster together in a compact mass at the center of the chart, with a few isolated points far from this cluster.

The Elliptic Envelope fits an ellipsoid around the core of normal data. This ellipsoid is centered on the robust mean μ̂ and its shape is dictated by the robust covariance matrix Σ̂:

  • The principal axes of the ellipsoid correspond to the eigenvectors of Σ̂.
  • The length of each axis is proportional to the square root of the corresponding eigenvalue.
  • The ellipsoid represents the level surface of the Gaussian distribution containing a certain percentage of probability mass (e.g. 95%).

Points located inside this ellipsoid are considered normal — they belong to the estimated Gaussian distribution. Points located outside are classified as anomalies — they are statistically too far from the center to be consistent with the normal distribution.

The crucial advantage of robust estimation (MCD) is that anomalies do not distort the fitted ellipsoid. If the standard covariance were used, anomalous points could “stretch” the ellipsoid in their direction and paradoxically end up inside the region considered normal. This is known as the masking effect.

In two dimensions, the level lines of the Mahalanobis distance form concentric ellipses (hence the name “elliptic envelope”). In higher dimensions, these are nested ellipsoids.
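The link between the covariance matrix and the shape of the ellipse can be sketched as follows. The covariance and center below are illustrative stand-ins for Σ̂ and μ̂, and the 95% χ² quantile is assumed as the level.

```python
import numpy as np
from scipy.stats import chi2

sigma_hat = np.array([[2.0, 0.8], [0.8, 1.0]])  # illustrative covariance
mu_hat = np.array([0.0, 0.0])                   # illustrative center
threshold = chi2.ppf(0.95, df=2)

# eigh returns eigenvalues in ascending order and eigenvectors as columns
eigvals, eigvecs = np.linalg.eigh(sigma_hat)
semi_axes = np.sqrt(threshold * eigvals)

# Parametrize the boundary: mu + V @ diag(semi_axes) @ [cos t, sin t]
t = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(t), np.sin(t)])
boundary = (mu_hat[:, None] + eigvecs @ (semi_axes[:, None] * circle)).T

# Every boundary point should sit at squared Mahalanobis distance == threshold
diff = boundary - mu_hat
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma_hat), diff)
print("Semi-axis lengths:", semi_axes)
print("Boundary lies on the level set:", np.allclose(d2, threshold))
```

The semi-axis along each eigenvector has length √(threshold × eigenvalue), which is why the axis lengths scale with the square root of the eigenvalues.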


Python implementation with scikit-learn

Basic example

Scikit-learn provides the EllipticEnvelope class in the sklearn.covariance module:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

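# Generate data: one Gaussian cluster plus uniformly scattered anomalies
rng = np.random.default_rng(42)  # seeded so the anomalies are reproducible
X_normal, _ = make_blobs(n_samples=500, centers=1,
                         cluster_std=1.0, random_state=42)
X_anomalies = rng.uniform(-8, 8, size=(30, 2))
X = np.vstack([X_normal, X_anomalies])

# Create and train the Elliptic Envelope
# contamination matches the injected anomaly rate: 30 / 530 ≈ 0.057
ee = EllipticEnvelope(contamination=0.057, random_state=42)
ee.fit(X)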

# Predict labels: 1 = normal, -1 = anomaly
labels = ee.predict(X)

# Count anomalies
n_anomalies = np.sum(labels == -1)
print(f"Anomalies detected: {n_anomalies} / {len(X)}")

2D envelope visualization

Plotting the decision boundary in two dimensions makes the fitted envelope easy to inspect:

# Grid for the decision boundary
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300)
)
Z = ee.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

fig, ax = plt.subplots(figsize=(10, 7))

# Decision boundary
ax.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 10),
            cmap=plt.cm.Blues, alpha=0.4)
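ax.contour(xx, yy, Z, levels=[0], linewidths=2, colors='red',
           linestyles='--')
# contour() does not create a legend entry, so add an empty proxy line for it
ax.plot([], [], color='red', linestyle='--', linewidth=2,
        label='Decision boundary')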

# Normal points
ax.scatter(X[labels == 1, 0], X[labels == 1, 1],
           c='steelblue', s=20, label='Normal')
# Anomalies
ax.scatter(X[labels == -1, 0], X[labels == -1, 1],
           c='tomato', s=40, marker='x', label='Anomaly')

ax.set_title("Elliptic Envelope — Anomaly detection")
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.legend()
plt.tight_layout()
plt.savefig('elliptic_envelope_2d.png', dpi=150)
plt.show()

The red dashed line traces the decision ellipse (up to the resolution of the grid): the boundary between the normal region and the anomalous region.

Comparison with non-parametric methods

Unlike Isolation Forest (article 055), which isolates points through random splits without distribution assumptions, or the One-Class SVM (article 056), which learns a complex boundary in a feature space, the Elliptic Envelope imposes a specific geometric structure (ellipsoid) on normal data.

This constraint is both its strength and its weakness:

  • If the normal data is approximately Gaussian, the Elliptic Envelope is extremely efficient, fast, and interpretable.
  • If the normal data has a complex shape (banana-shaped, ring-shaped, multi-modal), the Gaussian assumption is violated and performance drops. In this case, non-parametric methods like Isolation Forest or LOF (article 057) are preferable.
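The second case can be illustrated with ring-shaped data. In this sketch, the ring data set, the seed and the choice of LocalOutlierFactor (in novelty mode) as the non-parametric comparison are assumptions made for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# "Normal" data lies on a noisy ring of radius 3; the center is empty
angles = rng.uniform(0, 2 * np.pi, size=500)
radii = 3 + rng.normal(scale=0.1, size=500)
X_ring = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

center_point = np.array([[0.0, 0.0]])  # clearly atypical for ring data

ee = EllipticEnvelope(contamination=0.05, random_state=0).fit(X_ring)
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_ring)

# 1 = normal, -1 = anomaly
ee_label = ee.predict(center_point)[0]
lof_label = lof.predict(center_point)[0]
print("EllipticEnvelope:", ee_label, "| LOF:", lof_label)
```

Because the fitted ellipse is convex and must contain most of the ring, it also contains the empty center, so the Elliptic Envelope cannot flag a point placed there. A density-based method has no such constraint.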

Key hyperparameters

contamination

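The contamination parameter sets the expected proportion of anomalies in the data. In scikit-learn it fixes the decision threshold (the offset_ attribute) as the corresponding empirical quantile of the training Mahalanobis distances, so roughly that fraction of the training points is flagged.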

ee = EllipticEnvelope(contamination=0.05, random_state=42)
  • scikit-learn's default is 0.1; a value such as 0.05 is a reasonable starting point when little is known about the anomaly rate.
  • If you approximately know the anomaly rate, use this knowledge.
  • EllipticEnvelope expects a float in the interval (0, 0.5]; there is no 'auto' option for this model.
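The following sketch inspects how contamination translates into a cutoff. It relies on the fitted attributes offset_ and dist_ as they behave in the scikit-learn releases this example was written against; the data and seed are illustrative.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=1000)

contamination = 0.05
ee = EllipticEnvelope(contamination=contamination, random_state=0).fit(X)

# dist_ holds the squared Mahalanobis distances of the training points
empirical_cutoff = np.percentile(-ee.dist_, 100 * contamination)
flagged_fraction = np.mean(ee.predict(X) == -1)

print(f"offset_:              {ee.offset_:.3f}")
print(f"empirical percentile: {empirical_cutoff:.3f}")
print(f"fraction flagged:     {flagged_fraction:.3f}")
```

A practical consequence is that the fraction of training points flagged tracks contamination closely, whether or not the data actually contains that many anomalies.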

support_fraction

This parameter controls the fraction of points included in the MCD subset. It corresponds to the ratio h/n in the MCD estimator.

ee = EllipticEnvelope(support_fraction=0.95, random_state=42)
  • 0.95 means the MCD subset contains 95% of the points, so only the 5% most outlying points are left out when estimating the mean and covariance.
  • By default (None), scikit-learn uses a little over half of the data, about (n_samples + n_features + 1) / 2 points; this is not simply 1 - contamination.
  • A lower value makes the estimator more robust but reduces statistical efficiency (fewer points to estimate covariance).
  • A higher value improves estimation accuracy but risks including anomalies in the MCD subset.
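To see the effect of support_fraction, one can count the points in the raw MCD subset through the raw_support_ attribute. This is a sketch on illustrative Gaussian data; the sample size and the value 0.75 are arbitrary choices.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=400)

ee_default = EllipticEnvelope(random_state=0).fit(X)
ee_custom = EllipticEnvelope(support_fraction=0.75, random_state=0).fit(X)

# raw_support_ is the boolean mask of points used for the raw MCD estimate
n_default = int(ee_default.raw_support_.sum())
n_custom = int(ee_custom.raw_support_.sum())

print(f"Raw MCD support with default settings:      {n_default} / {len(X)}")
print(f"Raw MCD support with support_fraction=0.75: {n_custom} / {len(X)}")
```

The support_ attribute (without the raw_ prefix) reflects a later reweighting step and usually contains more points, so raw_support_ is the one that corresponds to h.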

random_state

Since the FAST-MCD algorithm uses random initialization, random_state ensures reproducibility of results:

ee = EllipticEnvelope(contamination=0.05, random_state=42)

Setting random_state is essential for production environments where result consistency is critical.

assume_centered

If your data is already centered (zero mean by construction, for example model residuals), set assume_centered=True so that the location is fixed at the origin instead of being estimated:

ee = EllipticEnvelope(contamination=0.05, assume_centered=True)

In this case, the algorithm does not re-estimate the mean and focuses only on the covariance.
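A quick way to confirm this behavior is to compare the fitted location_ attribute with and without the option. The zero-mean data below is an illustrative assumption.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
# Residual-like data whose true mean is zero by construction
X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=500)

ee_free = EllipticEnvelope(contamination=0.05, random_state=0).fit(X)
ee_centered = EllipticEnvelope(contamination=0.05, assume_centered=True,
                               random_state=0).fit(X)

print("Estimated location:      ", ee_free.location_)
print("assume_centered location:", ee_centered.location_)
```

If the data is not actually centered, forcing the location to zero will bias the distances, so the option is best reserved for cases where the zero mean is guaranteed.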


Advantages of the Elliptic Envelope

  1. Execution speed — The algorithm is significantly faster than the One-Class SVM, with complexity that remains reasonable even for thousands of observations.
  2. Interpretability — The decision boundary is an ellipsoid defined by clear statistical parameters (mean, covariance). The principal axes can be analyzed to understand which directions are most discriminative.
  3. No need for exclusively normal data — Unlike the One-Class SVM or the Autoencoder, the Elliptic Envelope can be trained on mixed data (normal + anomalies) thanks to MCD robust estimation.
  4. Continuous decision score — The Mahalanobis distance provides a continuous score for ranking anomalies by severity, not just detecting them binarily.
  5. Well suited to moderate dimensionality — Works well up to a few dozen dimensions, as long as the Gaussian assumption is respected.
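The continuous score mentioned in point 4 can be used to rank observations, as in this sketch. The three planted outliers and the contamination value are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=300),
    np.array([[6.0, -6.0], [5.0, 5.5], [10.0, 0.0]]),  # three planted outliers
])

ee = EllipticEnvelope(contamination=0.02, random_state=0).fit(X)

d2 = ee.mahalanobis(X)          # squared Mahalanobis distance, higher = more unusual
scores = ee.score_samples(X)    # negated distance, lower = more unusual
ranking = np.argsort(d2)[::-1]  # most anomalous first

print("Top 3 most anomalous rows:", ranking[:3])
```

The same ranking can be obtained from decision_function, which is score_samples shifted by the fitted offset_ so that negative values correspond to predicted anomalies.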

Limitations of the Elliptic Envelope

  1. Strong Gaussian assumption — This is the main limitation. If the normal data does not follow a multivariate Gaussian distribution, results will be poor. Always verify this assumption (multivariate normality tests, Q-Q plots) before using the method.
  2. Curse of dimensionality — Beyond 30-50 dimensions, covariance matrix estimation becomes unstable and computing the inverse is numerically difficult. Dimensionality reduction (PCA) is often necessary.
  3. Sensitivity to multicollinear data — If features are strongly correlated, the covariance matrix can be singular (non-invertible), causing numerical errors.
  4. Rigid decision shape — The boundary is always an ellipsoid. It cannot capture disconnected clusters or complex shapes.
  5. Performance on multimodal data: if the normal data forms several separate clusters, the single ellipsoid will necessarily encompass the empty regions between clusters, so anomalies falling there go undetected (false negatives).
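For limitations 2 and 3, one common workaround is to reduce dimensionality before fitting. The sketch below chains StandardScaler, PCA and EllipticEnvelope in a scikit-learn pipeline; the synthetic 60-feature data set and the 90% variance target are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# 60 observed features driven by 5 latent factors plus small noise,
# so the features are strongly correlated
latent = rng.normal(size=(800, 5))
mixing = rng.normal(size=(5, 60))
X = latent @ mixing + 0.1 * rng.normal(size=(800, 60))

detector = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90),  # keep enough components for 90% of the variance
    EllipticEnvelope(contamination=0.05, random_state=0),
)
labels = detector.fit(X).predict(X)

n_components = detector.named_steps["pca"].n_components_
print(f"Components kept: {n_components}, anomalies flagged: {np.sum(labels == -1)}")
```

Fitting the envelope on a handful of principal components keeps the covariance matrix small and well conditioned, at the cost of ignoring anomalies that only show up in the discarded directions.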

4 concrete use cases

1. Fraud detection in financial transactions

Legitimate transactions often form a roughly Gaussian cloud in feature space (amount, time, location, frequency), especially after transformations such as taking the logarithm of the amount. A fraudulent transaction will typically have an atypical profile with a large Mahalanobis distance. The Elliptic Envelope is particularly well suited because transaction data naturally contains anomalies (actual fraud) in the training data, and the MCD is resistant to them.

2. Industrial quality control in manufacturing

In a manufacturing process, quality measurements typically follow a Gaussian distribution centered around target specifications. A defective product manifests as a simultaneous deviation across multiple characteristics. The Elliptic Envelope detects these multivariate deviations that individual univariate tests would miss — a product may be within tolerances on each characteristic individually but abnormal in their combination.
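The last point, a part that passes every univariate check but fails the multivariate one, can be sketched as follows. The two correlated measurements, the ±3σ tolerance rule and the test part are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)

# Two strongly correlated (standardized) measurements, e.g. length and weight
cov = [[1.0, 0.9], [0.9, 1.0]]
X = rng.multivariate_normal([0, 0], cov, size=1000)

ee = EllipticEnvelope(contamination=0.01, random_state=0).fit(X)

# Each coordinate is only 2 standard deviations from its mean,
# but the combination contradicts the positive correlation
part = np.array([[2.0, -2.0]])

within_univariate_limits = bool(np.all(np.abs(part) < 3.0))  # 3-sigma rule per feature
label = ee.predict(part)[0]
d2 = ee.mahalanobis(part)[0]

print(f"Within ±3σ on each feature: {within_univariate_limits}")
print(f"Squared Mahalanobis distance: {d2:.1f}, label: {label}")
```

The part is flagged because a long piece that is also unusually light goes against the learned correlation, even though neither measurement is extreme on its own.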

3. Equipment health monitoring (predictive monitoring)

Machine sensors (temperature, vibration, pressure, current) produce correlated signals that, in normal operation, are often approximately Gaussian. The appearance of a mechanical defect simultaneously modifies several sensors. The Elliptic Envelope monitors the multivariate Mahalanobis distance and alerts as soon as the operating point exits the normal elliptical envelope, enabling predictive maintenance.

4. Data cleaning before model training

Before training a machine learning model, it is crucial to identify and remove outliers that could bias learning. The Elliptic Envelope effectively filters aberrant observations in datasets of moderate size. The Mahalanobis distance score also makes it possible to decide which observations are suspicious enough to be excluded.


Best practices

  1. Verify the multivariate normality assumption before applying the Elliptic Envelope. Use Mardia’s test or multivariate Q-Q plots.
  2. Standardize your data: in theory the Mahalanobis distance is unaffected by rescaling the features, but standardization improves numerical conditioning and makes results easier to interpret and compare.
  3. Reduce dimensionality if you have more than 30 variables. A principal component analysis (PCA) retaining 80-95% of the variance is generally sufficient.
  4. Compare with non-parametric methods on a subset of your data to validate that the Gaussian assumption is appropriate.
  5. Carefully adjust contamination — a poorly estimated rate can lead to either too many false positives or too many false negatives.
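For the first practice, a rough numerical stand-in for a multivariate Q-Q plot is to compare sorted robust distances with χ² quantiles. The helper chi2_qq_correlation below is hypothetical, and it is an informal diagnostic rather than a formal test such as Mardia's.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def chi2_qq_correlation(X, random_state=0):
    """Correlation between sorted robust D² values and χ² quantiles (1.0 = perfect fit)."""
    n, p = X.shape
    d2 = np.sort(MinCovDet(random_state=random_state).fit(X).mahalanobis(X))
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = chi2.ppf(probs, df=p)
    return np.corrcoef(theoretical, d2)[0, 1]

rng = np.random.default_rng(0)
X_gauss = rng.multivariate_normal(np.zeros(3), np.eye(3), size=1000)
X_skewed = rng.exponential(size=(1000, 3))  # clearly non-Gaussian

r_gauss = chi2_qq_correlation(X_gauss)
r_skewed = chi2_qq_correlation(X_skewed)
print(f"Gaussian data:    r = {r_gauss:.3f}")
print(f"Exponential data: r = {r_skewed:.3f}")
```

A correlation close to 1 suggests the distances behave as the Gaussian model predicts; a visibly lower value, or a curved Q-Q plot of the same quantities, is a reason to prefer a non-parametric detector.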
