Isomap: Complete Guide — Principles, Examples, and Python Implementation

Summary

Isomap (Isometric Mapping) is a nonlinear dimensionality reduction algorithm introduced by Joshua Tenenbaum, Vin de Silva, and John Langford in 2000. Unlike Principal Component Analysis (PCA), which projects data onto linear subspaces, Isomap preserves the intrinsic geometry of nonlinear manifolds by estimating geodesic distances along the data structure. This approach allows discovering the true dimensionality of the data while respecting its natural curvature. Isomap belongs to the family of manifold learning methods and represents a fundamental step between classical linear techniques and modern approaches like t-SNE and UMAP.

Mathematical principle of the Isomap algorithm

The Isomap algorithm rests on three fundamental steps that transform high-dimensional data into a low-dimensional embedding while preserving the global geometry of the manifold.

Step 1: Construction of the k-nearest neighbor graph

The first step consists of constructing a neighborhood graph where each data point is a vertex. Two vertices are connected by an edge if one of the two points is among the k nearest neighbors of the other. The weight of each edge corresponds to the Euclidean distance between the two connected points.

Formally, for a dataset X = {x₁, x₂, …, xₙ} ⊂ ℝᴰ, we construct a graph G = (V, E) where:
– V = {1, 2, …, n} represents the set of vertices
– (i, j) ∈ E if xⱼ ∈ Nₖ(xᵢ) or xᵢ ∈ Nₖ(xⱼ), where Nₖ(xᵢ) denotes the set of k nearest neighbors of xᵢ
– The edge weight wᵢⱼ = ||xᵢ – xⱼ||₂ (Euclidean distance)

There are two main variants for determining neighborhoods: the k-nearest neighbor (k-NN) method where k is a fixed parameter, and the ε-neighborhood method where all points separated by a distance less than a threshold ε are connected. The first approach is the most commonly used in practice due to its direct control over graph connectivity.

Step 2: Computation of geodesic distances via shortest path

This is where the central innovation of Isomap lies. Instead of using direct Euclidean distances between points (which cut through the ambient space), Isomap estimates geodesic distances — that is, distances along the curved surface of the manifold.

To compute these geodesic distances, Isomap uses Dijkstra’s algorithm (or alternatively the Floyd-Warshall algorithm) on the graph constructed in the previous step. Dijkstra’s algorithm computes the shortest path between a source node and all other nodes in the graph, considering the sum of the weights of the traversed edges.

The geodesic distance matrix Dᴳ is of size n × n, where Dᴳᵢⱼ represents the length of the shortest path between vertices i and j in the graph. If two vertices are not connected (disconnected graph), the distance is defined as infinite.

The Floyd-Warshall algorithm offers an alternative: it simultaneously computes the shortest paths between all pairs of vertices, with a complexity of O(n³). Dijkstra, applied n times (once per source), achieves a complexity of O(n · (m + n log n)) with a priority queue, where m is the number of edges. In practice, Isomap generally uses Floyd-Warshall for its ease of implementation, although Dijkstra is more efficient for large datasets.

Step 3: Classical MDS on the geodesic distance matrix

The last step applies the Multidimensional Scaling (MDS) algorithm to the geodesic distance matrix Dᴜ. Classical MDS proceeds as follows:

The geodesic distance matrix is transformed into a similarity matrix via double centering: B = -½ · J · Dᴳ² · J, where J = I – (1/n)·1·1ᵀ is the centering matrix.
An eigenvalue decomposition of matrix B is performed: B = VΛVᵀ.
The d largest eigenvalues λ₁ ≥ λ₂ ≥ … ≥ λ_d > 0 and their corresponding eigenvectors v₁, v₂, …, v_d are selected.
The final embedding in dimension d is obtained by: Y = V_d · Λ_d^(½), where V_d contains the d eigenvectors and Λ_d is the diagonal matrix of the d eigenvalues.

The result is a low-dimensional representation where the Euclidean distances between points best approximate the geodesic distances of the original manifold.

Intuition: why Isomap changes everything

Imagine you want to measure the distance between two cities. Euclidean distance is the straight-line distance: a straight line on a map, cutting through mountains, rivers, and buildings as if they didn’t exist. Geodesic distance is the distance following the roads: it respects the terrain, the contours, the actual structure of the land.

Isomap does exactly this for your data. Take the classic Swiss roll example: imagine a sheet of paper rolled into a spiral in three-dimensional space. Two points can be very close in ambient space (as the crow flies, they almost touch) while being very far from each other if you travel along the surface of the paper. PCA, which only sees straight lines, would flatten the roll and mix points that are actually very far apart on the manifold. Isomap, on the other hand, “unrolls” the roll by following its surface, preserving the true structure of the data.

This intuition is fundamental: Isomap doesn’t project the data — it unrolls it. It respects the topology of the manifold instead of flattening it blindly into a linear subspace.

Python implementation with scikit-learn

Basic example with the Swiss roll

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import Isomap
from sklearn.decomposition import PCA

# Generate the Swiss roll dataset (300 points)
X, color = datasets.make_swiss_roll(n_samples=300, noise=0.1, random_state=42)

# Apply Isomap with 12 neighbors and 2 components
isomap = Isomap(n_neighbors=12, n_components=2)
X_isomap = isomap.fit_transform(X)

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 3D view of the original Swiss roll
ax = axes[0]
ax.scatter(X[:, 0], X[:, 2], c=color, cmap='viridis')
ax.set_title('Original Swiss Roll (3D)')
ax.set_xlabel('X')
ax.set_ylabel('Z')

# Isomap result
ax = axes[1]
ax.scatter(X_isomap[:, 0], X_isomap[:, 1], c=color, cmap='viridis')
ax.set_title(f'Isomap (n_neighbors=12)')
ax.set_xlabel('Component 1')
ax.set_ylabel('Component 2')

# Comparison with PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
ax = axes[2]
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=color, cmap='viridis')
ax.set_title('PCA (2 components)')
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')

plt.tight_layout()
plt.savefig('isomap_vs_pca.png', dpi=150)
plt.show()

Detailed comparison: Isomap vs PCA

The difference between Isomap and PCA is radical on nonlinear data. PCA projects data onto the directions of greatest variance — it flattens the Swiss roll like a pancake, losing all the spiral structure. Isomap, on the other hand, progressively unrolls the spiral and recovers the underlying rectangular structure.

We can quantify this difference by measuring the correlation between the original distances (geodesic for Isomap, Euclidean for PCA) and the distances in the embedding:

from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Distances in the original space (geodesic for Isomap)
dist_orig = isomap.dist_matrix_
dist_isomap_2d = pdist(X_isomap)

# Spearman correlation for PCA
dist_pca_2d = pdist(X_pca)
rho_pca, _ = spearmanr(dist_orig.ravel(), pdist(X).ravel())
rho_isomap, _ = spearmanr(dist_orig.ravel(), dist_isomap_2d)

print(f"PCA correlation: {rho_pca:.3f}")
print(f"Isomap correlation: {rho_isomap:.3f}")

In practice, we typically observe a significantly higher correlation for Isomap on the Swiss roll, confirming that it better preserves the intrinsic geometric structure.

Impact of the n_neighbors parameter

The choice of n_neighbors is crucial and deserves special attention:

Too small (k < 5): the graph becomes fragmented, with many disconnected subgraphs. Geodesic distances become infinite between disconnected components, and the algorithm fails.
Just right (k ≈ 10-20): the graph is well connected, geodesic paths reasonably follow the surface of the manifold. This is the ideal zone.
Too large (k > 50): the graph becomes too dense, edges create “shortcuts” through holes in the manifold, and geodesic distances become confused with Euclidean distances — the nonlinear advantage is lost.

# Study the impact of n_neighbors
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
for i, k in enumerate([5, 8, 12, 20, 50, 100]):
    ax = axes[i // 3, i % 3]
    try:
        iso = Isomap(n_neighbors=k, n_components=2)
        X_emb = iso.fit_transform(X)
        ax.scatter(X_emb[:, 0], X_emb[:, 1], c=color, cmap='viridis')
        ax.set_title(f'n_neighbors={k}')
    except Exception as e:
        ax.text(0.5, 0.5, f'Error: {str(e)[:50]}',
                ha='center', va='center', transform=ax.transAxes)
plt.tight_layout()
plt.savefig('isomap_neighbors_impact.png', dpi=150)
plt.show()

Impact of the n_components parameter

The n_components parameter determines the dimensionality of the output embedding. Unlike PCA where one examines the percentage of explained variance, with Isomap we can rely on the MDS eigenvalues to choose the intrinsic dimensionality:

iso_full = Isomap(n_neighbors=12, n_components=min(X.shape[0]-1, 10))
iso_full.fit(X)

# Examine the eigenvalues
eigenvalues = iso_full.eigenvalues_
plt.figure(figsize=(8, 4))
plt.bar(range(len(eigenvalues)), np.abs(eigenvalues), color='steelblue')
plt.xlabel('Component')
plt.ylabel('Eigenvalue')
plt.title('Eigenvalue Spectrum — Isomap')
plt.tight_layout()
plt.savefig('isomap_eigenvalues.png', dpi=150)
plt.show()

Eigenvalues that drop sharply indicate the intrinsic dimensionality of the manifold. For the Swiss roll, we expect a drop after the second component, confirming that the intrinsic dimensionality is 2 (a two-dimensional surface rolled up in ℝ³).

Isomap hyperparameters in detail

n_neighbors

The number of neighbors used to construct the graph. This is the most important and most sensitive hyperparameter. Typical values: 5 to 30. There is no universal rule — systematic scanning with visual or quantitative validation is recommended. Some practitioners recommend choosing k at the edge of connectivity (the smallest k guaranteeing a connected graph).

n_components

The dimensionality of the projection. Corresponds to the number of dimensions of the output embedding. For visualization, we generally set n_components=2 or 3. For preprocessing before classification, we can choose a higher value based on analysis of the eigenvalue spectrum.

eigen_solver

The algorithm used for computing the eigenvalues of the similarity matrix:
– ‘auto’ (default): automatic selection between ‘arpack’ and ‘dense’ based on data size
– ‘arpack’: uses ARPACK decomposition, efficient for large sparse matrices
– ‘dense’: full eigenvalue decomposition, more precise but memory-intensive

path_method

The shortest path computation algorithm:
– ‘auto’ (default): automatic selection
– ‘FW’: Floyd-Warshall algorithm, O(n³) complexity, suitable for moderately sized data
– ‘D’: Dijkstra’s algorithm, O(n · (m + n log n)) complexity, more efficient for large datasets with sparse graphs

neighbors_algorithm

The algorithm used for nearest neighbor search:
– ‘auto’: automatic selection
– ‘ball_tree’: Ball tree, efficient for moderate-dimensional data
– ‘kd_tree’: k-d tree, performant in low dimensions
– ‘brute’: exhaustive search, reliable regardless of dimensionality but expensive

Advantages and limitations of Isomap

Advantages

Preservation of global geometry: Unlike t-SNE which only preserves local structure, Isomap captures the global geometric structure of the manifold through geodesic distances.
No linearity assumption: Works on curved manifolds where PCA fails completely.
Deterministic algorithm: No randomness, results are perfectly reproducible.
Clear interpretation: The three-step principle is mathematically transparent and easy to explain.
Intrinsic dimensionality estimation: The eigenvalue spectrum provides a natural indication of the true dimensionality of the data.

Limitations

Sensitivity to the n_neighbors parameter: A poor choice can fragment the graph or create artifactual shortcuts.
High computational cost: Shortest path computation (Floyd-Warshall at O(n³)) and eigenvalue decomposition make Isomap poorly suited to very large datasets (n > 10,000).
Sensitivity to noise and outliers: Aberrant points can create parasitic edges in the neighborhood graph, distorting geodesic distances.
Failure on manifolds with complex topology: Isomap assumes the manifold is homeomorphic to a convex subspace. Structures with holes (like a torus) can cause problems.
No embedding for new data: Isomap is a transductive method — there is no direct projection function for points not seen during training.

4 concrete use cases of Isomap

1. Analysis of face images

The classic dataset of faces under different poses and lighting conditions forms a nonlinear manifold. Each intrinsic dimension corresponds to a factor of variation: rotation angle, light intensity, facial expression. Isomap allows recovering these factors by unrolling the image manifold, revealing that faces vary along only 2 or 3 intrinsic dimensions despite the apparent dimensionality of thousands of pixels.

2. Dimensionality reduction in bioinformatics

Analysis of genomic sequencing data (RNA-seq) produces matrices of tens of thousands of genes. Cells follow differentiation trajectories that form curved manifolds in this high-dimensional space. Isomap can reveal the structure of these cell development trajectories, identifying intermediate differentiation states that PCA would miss.

3. Natural language processing

Word embeddings (such as Word2Vec or GloVe) live in 300-dimensional spaces where the semantic structure is nonlinear. Isomap can project these embeddings into 2D for visualization, preserving global semantic relationships. Similar words remain close, and the axes of the embedding can reveal interpretable semantic directions (gender, intensity, temporality).

4. Robotics and navigation

In mobile robotics, a robot’s perceptions (camera images, sensor readings) form a manifold whose intrinsic dimensionality corresponds to the robot’s degrees of freedom in its environment. Isomap can extract this latent structure, allowing the robot to localize itself and plan trajectories in perception space rather than in direct physical space.

Isomap: Principles, Examples, and Python Implementation

Isomap: Complete Guide — Principles, Examples, and Python Implementation

Summary

Mathematical principle of the Isomap algorithm

Step 1: Construction of the k-nearest neighbor graph

Step 2: Computation of geodesic distances via shortest path

Step 3: Classical MDS on the geodesic distance matrix

Intuition: why Isomap changes everything

Python implementation with scikit-learn

Basic example with the Swiss roll

Detailed comparison: Isomap vs PCA

Impact of the n_neighbors parameter

Impact of the n_components parameter

Isomap hyperparameters in detail

n_neighbors

n_components

eigen_solver

path_method

neighbors_algorithm

Advantages and limitations of Isomap

Advantages

Limitations

4 concrete use cases of Isomap

1. Analysis of face images

2. Dimensionality reduction in bioinformatics

3. Natural language processing

4. Robotics and navigation

See also

Articles similaires

About Salah YAHIAOUI

Isomap: Complete Guide — Principles, Examples, and Python Implementation

Summary

Mathematical principle of the Isomap algorithm

Step 1: Construction of the k-nearest neighbor graph

Step 2: Computation of geodesic distances via shortest path

Step 3: Classical MDS on the geodesic distance matrix

Intuition: why Isomap changes everything

Python implementation with scikit-learn

Basic example with the Swiss roll

Detailed comparison: Isomap vs PCA

Impact of the n_neighbors parameter

Impact of the n_components parameter

Isomap hyperparameters in detail

n_neighbors

n_components

eigen_solver

path_method

neighbors_algorithm

Advantages and limitations of Isomap

Advantages

Limitations

4 concrete use cases of Isomap

1. Analysis of face images

2. Dimensionality reduction in bioinformatics

3. Natural language processing

4. Robotics and navigation

See also

Partager :

Articles similaires

Related Posts

Linear Regression: Principles, Examples, and Python Implementation

Régression Logistique : Guide Complet — Principes, Exemples et Implémentation Python

Flow Matching: Generation by Flow Matching

About Salah YAHIAOUI