MDS (Multi-Dimensional Scaling): Complete Guide — Principles, Examples and Python Implementation
Summary
MDS (Multi-Dimensional Scaling) is a family of unsupervised dimensionality reduction techniques whose central objective is to project data into a low-dimensional space (usually 2D or 3D) while best preserving the pairwise distances between observations. Unlike PCA, which maximizes projected variance, MDS starts from a dissimilarity matrix and attempts to reconstruct a spatial configuration consistent with those distances. This guide covers the mathematical principle in detail, the geometric intuition, Python implementation with scikit-learn, key hyperparameters, as well as four concrete use cases.
Mathematical principle
The dissimilarity matrix
The starting point of MDS is a dissimilarity matrix D of size n × n, where each element d_ij represents the distance (or dissimilarity) between observations i and j. This matrix is symmetric (d_ij = d_ji), with zeros on the diagonal (d_ii = 0), and all elements are non-negative.
These distances can be computed in many ways: Euclidean distance, Manhattan distance, cosine dissimilarity, geodesic distances (as in Isomap), or any dissimilarity measure suited to the application domain.
The fundamental objective
The goal of MDS is to find a configuration Z in a low-dimensional space p (typically p = 2 or p = 3), where each observation i is represented by a vector z_i ∈ ℝ^p, such that the Euclidean distances in that projection space best approximate the original distances:
||z_i − z_j|| ≈ d_ij for all pairs (i, j)
In other words, we seek the coordinates z_1, z_2, …, z_n that faithfully reproduce the known distance structure.
The stress function
To measure the quality of a configuration Z, a stress function (also called criterion function or loss function) is used. The most common form, proposed by Kruskal, is stress-1:
Stress = √[ Σ_{i<j} (d_ij − ||z_i − z_j||)² / Σ_{i<j} d_ij² ]
A simplified variant, the raw stress, is written:
Raw stress = Σ_{i<j} (d_ij − ||z_i − z_j||)²
Stress quantifies the gap between the original distances and the distances in the projection space. A stress close to 0 indicates excellent distance preservation, while a high stress signals significant information loss.
Minimization by gradient descent
Minimizing the stress function is a nonlinear optimization problem. The classic algorithm used is SMACOF (Scaling by MAjorizing a COmplicated Function), which relies on an iterative majorization strategy:
- Initialization: we start with a random configuration Z^(0) (or a PCA-based initialization).
- Majorization: at each iteration t, we construct a function G_t(Z) that upper-bounds the stress and coincides with it at Z^(t).
- Minimization: we minimize G_t(Z), which gives a new configuration Z^(t+1).
- Convergence: we repeat until the decrease in stress is below a threshold ε or the maximum number of iterations is reached.
This approach guarantees a monotonic decrease in stress at each iteration, leading to a local minimum (and often the global minimum thanks to multiple initializations via n_init).
Metric MDS vs non-metric MDS
There are two fundamental variants of MDS:
Metric MDS: we assume the distances d_ij are measured on an interval or ratio scale. The objective is to directly minimize the stress between the original distances and the projected distances, possibly with a linear transformation of the original distances: β · d_ij ≈ ||z_i − z_j||.
Non-metric MDS: we assume the distances d_ij are only known on an ordinal scale. Only the ranks of distances matter: if d_ij < d_kl, then we expect ||z_i − z_j|| < ||z_k − z_l||. To enforce this constraint, a monotonic regression (monotonic regression or isotonic regression) is used to find the best monotonic transformation f such that f(d_ij) ≈ ||z_i − z_j||. The stress then becomes:
Non-metric stress = √[ Σ_{i<j} (f(d_ij) − ||z_i − z_j||)² / Σ_{i<j} d_ij² ]
where f is a monotonically increasing function optimized by isotonic regression (PAVA algorithm — Pool Adjacent Violators Algorithm).
Geometric intuition
Imagine you want to draw a map of France, but you have no GPS coordinates for the cities. However, you have a complete table of road distances between every known pair of cities.
MDS is exactly that: from the distance table alone, the algorithm will “reconstruct” the map. It places each city on a plane so that the distances between the points on paper correspond as closely as possible to the actual road distances.
If the Paris–Lyon distance is 465 km and the Paris–Marseille distance is 775 km, MDS will position Lyon closer to Paris than Marseille on the reconstructed map. And if the Lyon–Marseille distance is 315 km, MDS will naturally form a consistent triangle.
What is remarkable is that MDS never needs the original coordinates. It only works with distances. If the distances come from a 100-dimensional space, MDS projects them into 2D while preserving the relative structure of proximities. This property is what makes MDS so powerful for visualization and exploration of complex data.
Python implementation with scikit-learn
Basic example with sklearn MDS
The simplest implementation uses the MDS class from scikit-learn:
from sklearn.manifold import MDS
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import numpy as np
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Metric MDS in 2D
mds = MDS(n_components=2, metric=True, random_state=42,
n_init=4, max_iter=300)
Z = mds.fit_transform(X)
# Visualization
plt.figure(figsize=(10, 6))
for label, marker, color in zip(range(3), ['o', 's', 'D'],
['blue', 'green', 'red']):
plt.scatter(Z[y == label, 0], Z[y == label, 1],
marker=marker, c=color,
label=iris.target_names[label],
alpha=0.7, edgecolors='k', s=80)
plt.title('Metric MDS on Iris', fontsize=14)
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.legend()
plt.tight_layout()
plt.show()
# Display the final stress
print(f"Final stress: {mds.stress_:.4f}")
MDS from a custom dissimilarity matrix
MDS directly accepts a pre-computed dissimilarity matrix, which is extremely useful when your distances are not Euclidean:
from sklearn.metrics import pairwise_distances
# Compute a dissimilarity matrix with Manhattan distance
D = pairwise_distances(X, metric='cityblock')
# MDS with pre-computed dissimilarity
mds_custom = MDS(n_components=2, metric=True,
dissimilarity='precomputed',
random_state=42, n_init=4, max_iter=300)
Z_custom = mds_custom.fit_transform(D)
print(f"Stress with Manhattan distance: {mds_custom.stress_:.4f}")
Comparison of metric vs non-metric MDS
Let’s compare the two variants to see their impact on projection quality:
# Metric MDS
mds_metric = MDS(n_components=2, metric=True, random_state=42,
n_init=4, max_iter=300,
normalized_stress=True)
Z_metric = mds_metric.fit_transform(X)
stress_metric = mds_metric.stress_
# Non-metric MDS
mds_nonmetric = MDS(n_components=2, metric=False, random_state=42,
n_init=4, max_iter=300,
normalized_stress=True)
Z_nonmetric = mds_nonmetric.fit_transform(X)
stress_nonmetric = mds_nonmetric.stress_
print(f"Metric MDS stress : {stress_metric:.6f}")
print(f"Non-metric MDS stress: {stress_nonmetric:.6f}")
# Comparative visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, Z, stress, title in [
(axes[0], Z_metric, stress_metric, 'Metric MDS'),
(axes[1], Z_nonmetric, stress_nonmetric, 'Non-metric MDS')
]:
for label, marker, color in zip(range(3), ['o', 's', 'D'],
['blue', 'green', 'red']):
ax.scatter(Z[y == label, 0], Z[y == label, 1],
marker=marker, c=color,
label=iris.target_names[label],
alpha=0.7, edgecolors='k', s=80)
ax.set_title(f'{title}\nStress = {stress:.4f}', fontsize=12)
ax.legend()
plt.tight_layout()
plt.show()
Plotting stress as a function of the number of dimensions
An essential diagnostic tool for choosing the optimal number of dimensions:
stresses = []
dimensions = range(1, 8)
for dim in dimensions:
mds = MDS(n_components=dim, metric=True, random_state=42,
n_init=4, max_iter=300, normalized_stress=True)
mds.fit(X)
stresses.append(mds.stress_)
plt.figure(figsize=(8, 5))
plt.plot(dimensions, stresses, 'bo-', linewidth=2, markersize=8)
plt.xlabel('Number of dimensions', fontsize=12)
plt.ylabel('Normalized stress', fontsize=12)
plt.title('Stress scree plot — MDS on Iris', fontsize=14)
plt.xticks(dimensions)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
This plot helps identify the “elbow” — the point beyond which adding dimensions no longer significantly reduces stress.
Key hyperparameters
Understanding the hyperparameters of MDS in scikit-learn is essential to obtain optimal results:
n_components (int, default: 2)
Number of dimensions in the projection space. Typically 2 or 3 for visualization. A higher number better preserves distances but reduces visual interpretability.
metric (bool, default: True)
True for metric MDS, False for non-metric MDS (with monotonic regression). Non-metric MDS is more robust when distances are only ordinal judgments.
n_init (int, default: 4)
Number of random initializations performed. The initialization with the best stress is kept. Increasing this value improves the chances of finding the global minimum, at the cost of increased computation time.
max_iter (int, default: 300)
Maximum number of SMACOF algorithm iterations for each initialization. If the algorithm reaches this limit without convergence, either increase max_iter or check that the eps threshold is not too strict.
verbose (int, default: 0)
Verbosity level. 0 = silent, 1 = basic convergence information, 2+ = full iteration details. Useful for debugging.
eps (float, default: 1e-3)
Convergence threshold on the relative change in stress. If the stress decreases by less than eps between two consecutive iterations, the algorithm stops.
n_jobs (int, default: None)
Number of parallelized threads for distance computations and multiple initializations. -1 uses all available cores.
dissimilarity (str, default: ‘euclidean’)
Type of dissimilarity: 'euclidean' (automatically computes Euclidean distances from the data) or 'precomputed' (directly uses the dissimilarity matrix provided as input).
normalized_stress (bool/str, default: ‘auto’)
Uses normalized stress (stress-1) instead of raw stress. Recommended for comparing datasets of different sizes. The 'auto' value automatically enables normalized stress.
Advantages and limitations
Advantages
- Distance flexibility: MDS accepts any dissimilarity measure, not just Euclidean distance. You can use custom distances, transformed similarities, or even human similarity judgments.
- Geometric interpretability: the projection axes have a direct interpretation — spatial proximity reflects the similarity of observations.
- No linearity assumption: unlike PCA, MDS does not assume that relationships are linear. Non-metric MDS in particular is entirely non-parametric.
- Applicable to non-numerical data: as long as you can define a dissimilarity measure, MDS can be applied (texts, graphs, categorical data).
- Stress-based diagnosis: the stress function provides a direct quantitative measure of projection quality.
Limitations
- High computational complexity: computing the dissimilarity matrix is O(n²) and each SMACOF iteration is O(n²). For n = 10,000, the matrix already contains 100 million elements. MDS does not scale to very large datasets.
- Sensitivity to local minima: stress optimization is non-convex. Different initializations can give different results, hence the importance of
n_init. - Information loss under heavy compression: projecting from 100 dimensions to 2D inevitably involves loss. A high stress signals that the data structure cannot be faithfully represented in 2D.
- No invertible projection function: unlike PCA, MDS does not produce a reusable projection matrix to transform new data. The full MDS must be recomputed or approximation techniques used (out-of-sample extension).
- Arbitrary rotation: the MDS solution is not unique — any rotation or reflection of the configuration preserves distances and therefore stress. The orientation of the axes has no intrinsic meaning.
Concrete use cases
Case 1: Product similarity mapping
An e-commerce company wants to visualize the perceived similarity between its products to optimize its catalog. By computing a dissimilarity matrix based on co-purchases (how often two products are bought together), MDS makes it possible to project products onto a 2D plane. Similar products naturally cluster together, revealing clusters invisible in the raw data. This visualization guides merchandising and recommendation decisions.
# Concept: product similarity from co-purchases
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS
import numpy as np
# Binary client × product matrix (1 = purchased, 0 = not purchased)
# Dissimilarity = 1 − Jaccard coefficient
# After computing matrix D:
mds = MDS(n_components=2, metric=True,
dissimilarity='precomputed', random_state=42)
Z = mds.fit_transform(D)
Case 2: Perception analysis in sensory marketing
In a study on olfactory perceptions, participants rate the perceived similarity between 50 perfumes on a scale of 1 to 10. Non-metric MDS is ideal here because human judgments are ordinal (we know that perfume A is closer to B than to C, but not in what exact proportion). The projection reveals the underlying dimensions of olfactory perception (floral vs. woody, fresh vs. warm, etc.) without having predefined them.
# judgment_matrix: 50×50 matrix of perceived similarities
# Convert to dissimilarity
dissimilarity = np.max(judgment_matrix) - judgment_matrix
mds = MDS(n_components=2, metric=False,
random_state=42, n_init=10)
Z = mds.fit_transform(dissimilarity)
Case 3: Visualization of genetic distances between species
In evolutionary biology, we have genetic distances (number of mutations per base pair) between many species. These distances are not Euclidean — they reflect complex evolutionary processes. MDS allows them to be visualized in 2D or 3D, revealing consistent taxonomic groups and unexpected evolutionary relationships.
# D_genetic: distance matrix between species
mds = MDS(n_components=2, metric=True,
dissimilarity='precomputed',
random_state=42, n_init=8)
Z = mds.fit_transform(D_genetic)
# Color by taxonomic clade to verify consistency
Case 4: Exploration of multidimensional health data
In a medical record, each patient is described by hundreds of variables: blood results, history, vital signs, genomic data. Computing a distance matrix (e.g., Gower distance, suited to mixed continuous and categorical variables) and then applying non-metric MDS makes it possible to visualize the heterogeneity of the cohort and identify subgroups of similar patients for personalized medicine.
from sklearn.manifold import MDS
# D_gower: Gower distance matrix between patients
mds = MDS(n_components=2, metric=False,
dissimilarity='precomputed',
random_state=42, n_init=10,
normalized_stress=True)
Z = mds.fit_transform(D_gower)
print(f"Stress: {mds.stress_:.4f}")
See also
- Become a ‘Trillionaire’ by Coding: How Python Can Revolutionize Your Financial Projects
- Mastering Colored Charts in Python: Complete Guide for Beginners and Experts

