SOM (Self-Organizing Maps): Complete Guide — Principles, Examples and Python Implementation
Summary
Self-Organizing Maps (SOM), also called Kohonen maps after their inventor Teuvo Kohonen (1982), are an unsupervised learning algorithm that projects high-dimensional data onto a low-dimensional grid. Unlike linear dimensionality reduction methods such as PCA, SOMs aim to preserve the topology of the data: two samples that are close in the original space tend to land on the same or neighboring cells of the two-dimensional map. This property makes SOMs well suited to visualization, exploration, and segmentation of complex datasets.
In this guide, we will explore the mathematical principle of SOMs, their deep intuition, a complete Python implementation with MiniSom, the hyperparameters to master, as well as four concrete use cases.
Mathematical principle of self-organizing maps
A Kohonen map is an artificial neural network organized on a regular grid (usually 1D or 2D). Each neuron i has a weight vector w_i of the same dimension as the input data. The algorithm relies on a competitive learning mechanism where neurons compete to represent each sample.
Step 1: Random weight initialization
At startup, the weight vectors of all neurons are initialized randomly:
∀i : w_i(0) ∼ uniform(0, 1) or sampled from the data
Initialization by sampling from the real data often gives better results than purely random initialization, as it places the weights directly in the domain of the observed data.
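The two initialization strategies can be compared with a short NumPy sketch. The function name `init_weights` and the toy data are illustrative assumptions, not part of any library.

```python
import numpy as np

def init_weights(grid_shape, data, method="sample", seed=0):
    """Return initial weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n_features = data.shape[1]
    if method == "uniform":
        # Purely random: every component drawn from uniform(0, 1).
        return rng.uniform(0.0, 1.0, size=(rows, cols, n_features))
    # Data sampling: each neuron copies a randomly chosen training sample.
    idx = rng.integers(0, len(data), size=rows * cols)
    return data[idx].reshape(rows, cols, n_features).astype(float)

# Toy data far from the [0, 1] range, to make the difference visible.
data = np.random.default_rng(1).normal(loc=50.0, scale=5.0, size=(200, 3))
w_uniform = init_weights((4, 4), data, method="uniform")
w_sample = init_weights((4, 4), data, method="sample")
print("uniform init mean:", w_uniform.mean())
print("sampled init mean:", w_sample.mean())
```

With unscaled data like this, the uniform weights start far from every sample, while the sampled weights already lie inside the data cloud.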
Step 2: Finding the BMU (Best Matching Unit)
For each input vector x presented to the network, we identify the winning neuron — the BMU — that is, the neuron whose weight vector is closest to x according to a distance metric (usually Euclidean distance):
BMU(x) = argmin_i ||x - w_i(t)||
The BMU is the neuron that “most resembles” the presented data x. The entire map responds, but the BMU is the most strongly activated.
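As a minimal sketch, the BMU search reduces to an argmin over distances. The helper `find_bmu` and the tiny 2×2 map below are hypothetical and only serve the illustration.

```python
import numpy as np

def find_bmu(weights, x):
    """Return the (row, col) grid index of the neuron closest to x."""
    # Euclidean distance between x and every weight vector on the grid.
    distances = np.linalg.norm(weights - x, axis=-1)
    # argmin works on the flattened array; unravel_index recovers (row, col).
    flat_index = np.argmin(distances)
    row, col = np.unravel_index(flat_index, distances.shape)
    return int(row), int(col)

# Hypothetical 2x2 map with 2-dimensional weights.
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
x = np.array([0.9, 0.2])
bmu = find_bmu(weights, x)
print(bmu)  # (1, 0): the neuron with weights [1.0, 0.0] is nearest
```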
Step 3: Weight update — the Kohonen rule
Once the BMU is identified, the weights of all neurons in the grid are adjusted according to the following formula:
w_i(t+1) = w_i(t) + α(t) · h(i, BMU, t) · (x - w_i(t))
This equation is at the heart of how SOMs work. Let’s break down each term:
- α(t): the learning rate, which decreases over time. It controls the magnitude of the modifications.
- h(i, BMU, t): the neighborhood function, typically a Gaussian centered on the BMU. It determines the spatial influence of the BMU on its neighbors on the grid.
- (x - w_i(t)): the difference vector between the data and the current weight, which indicates the direction of the correction.
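A single application of the update rule can be sketched as follows, assuming a Gaussian neighborhood (described in the next subsection) and a rectangular grid. The function `update_weights` is an illustrative helper, not a library call.

```python
import numpy as np

def update_weights(weights, x, bmu, alpha, sigma):
    """Apply one Kohonen update and return the new weight array."""
    rows, cols, _ = weights.shape
    # Grid coordinates r_i of every neuron, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    # Squared grid distance ||r_i - r_BMU||^2 to the winning neuron.
    d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
    # Gaussian neighborhood h(i, BMU, t).
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    # w_i(t+1) = w_i(t) + alpha * h * (x - w_i(t))
    return weights + alpha * h[..., np.newaxis] * (x - weights)

weights = np.zeros((3, 3, 2))
x = np.array([1.0, 1.0])
new_weights = update_weights(weights, x, bmu=(1, 1), alpha=0.5, sigma=1.0)
print(new_weights[1, 1])  # BMU moves halfway toward x: [0.5 0.5]
print(new_weights[0, 0])  # corner neuron moves less, scaled by exp(-1)
```

The BMU receives the full step α·(x - w), and every other neuron receives a fraction of it that shrinks with grid distance.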
Neighborhood function
The most common neighborhood function is a Gaussian:
h(i, BMU, t) = exp(-||r_i - r_BMU||² / (2σ(t)²))
where r_i and r_BMU are the positions of the neurons on the grid, and σ(t) is the neighborhood radius that decreases over iterations.
Two variants exist:
- Gaussian neighborhood (soft): progressive influence, all neurons receive an update.
- Bubble neighborhood (hard): only neurons within a radius σ around the BMU are updated.
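The difference between the two variants is easy to see numerically. The sketch below evaluates both as a function of grid distance to the BMU; the function names are illustrative.

```python
import numpy as np

def gaussian_neighborhood(grid_dist, sigma):
    """Soft neighborhood: influence decays smoothly and never reaches zero."""
    return np.exp(-grid_dist ** 2 / (2.0 * sigma ** 2))

def bubble_neighborhood(grid_dist, sigma):
    """Hard neighborhood: 1 inside the radius sigma, 0 outside."""
    return (grid_dist <= sigma).astype(float)

grid_dist = np.arange(6, dtype=float)  # distances 0, 1, ..., 5 from the BMU
g = gaussian_neighborhood(grid_dist, sigma=2.0)
b = bubble_neighborhood(grid_dist, sigma=2.0)
print(np.round(g, 3))
print(b)
```

With σ = 2, the bubble function updates only the neurons at distance 0, 1, and 2, while the Gaussian still assigns a small positive weight to neurons at distance 5.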
Decay of the learning rate and neighborhood radius
SOMs follow an explicit decay schedule during training: both the learning rate and the neighborhood radius shrink as the iteration count t grows, for example exponentially:
α(t) = α_0 · exp(-t / n_iterations)
σ(t) = σ_0 · exp(-t / n_iterations)
At the beginning, the learning rate and neighborhood radius are high: the map organizes globally, forming the large structures. Progressively, these two parameters decrease, allowing the map to locally refine the positions of the neurons. This dual decay is essential: without it, the map would never converge to a stable organization.
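The exponential schedules above can be tabulated with the standard library alone. The starting values α_0 = 0.5 and σ_0 = 3.0 are taken from the Iris example in this guide; the helper name is an assumption.

```python
import math

def exponential_decay(initial, t, n_iterations):
    """Value at iteration t of a quantity decaying as initial * exp(-t / n_iterations)."""
    return initial * math.exp(-t / n_iterations)

alpha_0, sigma_0, n_iterations = 0.5, 3.0, 2000
for t in (0, 500, 1000, 2000):
    alpha_t = exponential_decay(alpha_0, t, n_iterations)
    sigma_t = exponential_decay(sigma_0, t, n_iterations)
    print(f"t={t:5d}  alpha={alpha_t:.3f}  sigma={sigma_t:.3f}")
```

With this exact formula, both quantities end at 1/e (about 37%) of their initial value when t reaches n_iterations. Implementations often use a shorter time constant or a different decay law so that the final updates are smaller.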
Intuition: a lecture hall of students that self-organizes
Imagine an amphitheatre in which hundreds of students enter in random order. They are asked to place themselves spontaneously according to their field of study — but without talking, without coordinating, simply by observing their neighbors and moving closer to those who resemble them.
What happens is remarkable. Mathematicians naturally gather in one corner. Literary scholars settle in another corner. Computer scientists position themselves between the two — because their field borrows from mathematics as much as from the humanities. Physicists end up near the mathematicians, with a few biologists at the border. Philosophers settle near the literary scholars.
The transitions between neighboring fields are smooth and natural. There is no sharp cut: between mathematics and computer science, there will be an “applied mathematics” zone. Between literature and philosophy, a “literary theory” zone.
This is, in spirit, what a SOM does with data, with one difference: neurons keep fixed seats on the 2D grid, and it is their weight vectors that shift toward similar data, so that neighboring seats end up representing similar samples. The topology of the relationships between neurons then reflects the topology of the similarities between the data.
This intuition explains why SOMs are so powerful for visualization: they reveal the hidden structure of data in a format that the human eye can naturally interpret.
Python implementation with MiniSom
Installation
For this article, we will use MiniSom, a lightweight and well-designed Python library for Kohonen maps:
pip install minisom matplotlib seaborn scikit-learn
Complete example on the Iris dataset
Here is a complete implementation that trains a SOM on the famous Iris dataset and visualizes the results:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from minisom import MiniSom
import seaborn as sns
# 1. Loading and preparing the data
iris = load_iris()
X = iris.data
y = iris.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# 2. Creating and training the SOM
som = MiniSom(x=10, y=10, input_len=4,
              sigma=3.0, learning_rate=0.5,
              activation_distance='euclidean',  # metric used to find the BMU
              topology='hexagonal',
              neighborhood_function='gaussian',
              random_seed=42)  # fixed seed for reproducible maps
som.random_weights_init(X_scaled)
som.train_random(X_scaled, num_iteration=2000)
# 3. Visualization — cluster map
plt.figure(figsize=(12, 4))
# 3a. Mapping samples onto the grid
plt.subplot(1, 2, 1)
colors_map = {0: 'red', 1: 'green', 2: 'blue'}
for i, x in enumerate(X_scaled):
    winner = som.winner(x)
    plt.plot(winner[0] + 0.5, winner[1] + 0.5,
             marker='o', color=colors_map[y[i]],
             markersize=6, alpha=0.6, zorder=2)
plt.title('SOM Map — Iris Data', fontsize=14, fontweight='bold')
plt.xlabel('Grid x coordinate')
plt.ylabel('Grid y coordinate')
plt.grid(True, alpha=0.3)
# 3b. U-Matrix
def compute_umatrix(som):
    """Mean weight-space distance from each neuron to its grid neighbors."""
    weights = som.get_weights()  # shape: (x, y, input_len)
    n_x, n_y = weights.shape[:2]
    um = np.zeros((n_x, n_y))
    for i in range(n_x):
        for j in range(n_y):
            neighbors = []
            # 8-neighborhood on array indices (a rectangular approximation
            # of the hexagonal layout used above)
            for di in [-1, 0, 1]:
                for dj in [-1, 0, 1]:
                    ni, nj = i + di, j + dj
                    if (di != 0 or dj != 0) and 0 <= ni < n_x and 0 <= nj < n_y:
                        dist = np.linalg.norm(weights[i, j] - weights[ni, nj])
                        neighbors.append(dist)
            um[i, j] = np.mean(neighbors) if neighbors else 0
    return um
plt.subplot(1, 2, 2)
umatrix = compute_umatrix(som)
# Transpose so the first grid index runs along the horizontal axis,
# matching the orientation of the sample map on the left
plt.imshow(umatrix.T, cmap='viridis', origin='lower', interpolation='nearest')
plt.colorbar(label='Average distance to neighbors')
plt.title('U-Matrix', fontsize=14, fontweight='bold')
plt.xlabel('x coordinate')
plt.ylabel('y coordinate')
plt.tight_layout()
plt.savefig('som_iris_visualization.png', dpi=150)
plt.show()
# 4. Component Planes
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
feature_names = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width']
weights = som.get_weights()  # shape: (x, y, input_len)
for k in range(4):
    ax = axes[k // 2, k % 2]
    plane = weights[:, :, k]  # value of feature k for every neuron
    im = ax.imshow(plane.T, cmap='RdYlBu_r', origin='lower',
                   interpolation='bilinear')
    ax.set_title(f'Component Plane — {feature_names[k]}',
                 fontweight='bold')
    fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.savefig('som_component_planes.png', dpi=150)
plt.show()
The U-Matrix: understanding cluster boundaries
The U-Matrix (Unified Distance Matrix) is the most powerful visualization tool for SOMs. For each cell in the grid, we compute the average distance between the weight of that cell and those of its neighbors. Areas where these distances are high correspond to boundaries between clusters — these are the regions where the map abruptly transitions from one group of data to another. Conversely, areas of low distance indicate homogeneous clusters.
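To make the boundary effect concrete, here is a small vectorized sketch on a hand-built map with two flat regions. It assumes a rectangular grid and 4-connected neighbors (up, down, left, right), which is a simplification. The function `u_matrix` is illustrative and expects a grid with at least two neurons.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each neuron's weights to its 4-connected neighbors."""
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    # Distances between vertically adjacent neurons, credited to both cells.
    dv = np.linalg.norm(weights[1:] - weights[:-1], axis=-1)
    total[1:] += dv
    total[:-1] += dv
    count[1:] += 1
    count[:-1] += 1
    # Distances between horizontally adjacent neurons, credited to both cells.
    dh = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=-1)
    total[:, 1:] += dh
    total[:, :-1] += dh
    count[:, 1:] += 1
    count[:, :-1] += 1
    return total / count

# Hand-built 4x6 map: left half has weight 0, right half has weight 5.
weights = np.zeros((4, 6, 1))
weights[:, 3:, 0] = 5.0
um = u_matrix(weights)
print(np.round(um, 2))
```

The printed matrix is zero inside each half and positive only in the two columns on either side of the jump, which is the ridge a U-Matrix plot would show between two clusters.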
Component Planes: reading one dimension at a time
Component planes show, for each original variable, the value of that variable in every neuron's weight vector. By mentally overlaying the component planes, we understand how each feature shapes the spatial organization of the map. For the Iris data, the “petal width” component plane typically shows a clear transition across the species regions, which is consistent with the petal measurements being the most discriminating variables in this dataset.
Alternative: sklearn-som
Another option is the sklearn-som package, which offers a scikit-learn-style interface:
from sklearn_som.som import SOM
som_sk = SOM(m=10, n=10, dim=4, max_iter=2000,
             lr=0.5, random_state=42)
som_sk.fit(X_scaled)
# Weights are accessible via som_sk.cluster_centers_ (shape: m × n × dim)
This approach is simpler but offers less fine-grained control over the neighborhood function and topology.
Key SOM hyperparameters
The choice of hyperparameters determines the quality of the final map. Here are the five parameters to master:
1. Grid dimensions (x, y)
The shape of the grid determines the resolution of the map.
- Small grid (5×5): overview, high generalization.
- Large grid (20×20): fine details, risk of overfitting.
- Empirical rule: about 5·√n neurons in total, where n is the number of samples.
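Reading the empirical rule as "about 5·√n neurons for n samples", a square grid side can be derived as below. Rounding up to a square grid is an assumption made for the illustration.

```python
import math

def suggested_grid_side(n_samples):
    """Side of a square grid holding about 5 * sqrt(n_samples) neurons."""
    n_neurons = 5 * math.sqrt(n_samples)
    return math.ceil(math.sqrt(n_neurons))

for n in (150, 10_000):
    side = suggested_grid_side(n)
    print(f"{n} samples -> about {5 * math.sqrt(n):.0f} neurons -> {side}x{side} grid")
```

For the 150 Iris samples, this gives about 61 neurons, so an 8×8 grid. The 10×10 grid used in the example above is somewhat larger than the rule suggests.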
2. Initial neighborhood radius (σ_0)
This parameter controls how many neurons are influenced by each sample at the beginning of training.
- σ too small: the map fragments, loses its topology.
- σ too large: the map organizes too globally, without local details.
- Recommended value: about half the longest side of the grid.
3. Initial learning rate (α_0)
- Typical value: between 0.3 and 0.7.
- A high α gives fast but coarse organization.
- A low α gives slow but precise organization.
4. Number of iterations (n_iterations)
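- Common rule of thumb: at least 500 × number of neurons (50,000 iterations for a 10×10 grid). The Iris example above uses only 2,000 to keep the demonstration fast.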
- Too few iterations: incomplete convergence.
- Too many iterations: unnecessary computation time.
5. Distance metric
- Euclidean (default): the most common, suited to normalized continuous data.
- Manhattan: more robust to outliers.
- Cosine: ideal for text data or normalized vectors.
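The choice of metric can change which neuron wins. The sketch below implements the three distances by hand on a hypothetical set of three neurons; it does not rely on any SOM library.

```python
import numpy as np

def distances(weights, x, metric="euclidean"):
    """Distance from x to each row of weights (shape: n_neurons x n_features)."""
    diff = weights - x
    if metric == "euclidean":
        return np.sqrt(np.sum(diff ** 2, axis=1))
    if metric == "manhattan":
        return np.sum(np.abs(diff), axis=1)
    if metric == "cosine":
        # 1 - cosine similarity: depends on direction only, not on length.
        num = weights @ x
        denom = np.linalg.norm(weights, axis=1) * np.linalg.norm(x)
        return 1.0 - num / denom
    raise ValueError(f"unknown metric: {metric}")

weights = np.array([[1.0, 1.0], [10.0, 0.5], [0.0, 3.0]])
x = np.array([4.0, 0.2])
bmu_by_metric = {
    m: int(np.argmin(distances(weights, x, m)))
    for m in ("euclidean", "manhattan", "cosine")
}
print(bmu_by_metric)
```

Here the Euclidean and Manhattan distances pick neuron 0, the nearest point, while the cosine distance picks neuron 1, which points in the same direction as x despite being much longer.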
Advantages and limitations of Kohonen maps
Advantages
- Topological preservation: neighborhood relationships in the data space are preserved on the 2D map. This is the distinctive property of SOMs.
- Intuitive visualization: a colored 2D grid is immediately interpretable, even by non-specialists.
- Unsupervised learning: no labels are needed. SOMs discover the structure of the data on their own.
- Robustness to noise: thanks to the neighborhood function, outliers have a limited impact on the overall map.
- Interpretability: component planes reveal the role of each variable in cluster organization.
- Flexibility: adaptable to many types of data through the choice of distance metric.
Limitations
- Hyperparameter selection: grid size, σ_0, and α_0 are difficult to optimize automatically. The practitioner’s experience matters.
- Sensitivity to initialization: different initializations can produce distinct maps. It is often necessary to run the algorithm multiple times.
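- No single objective evaluation criterion: K-Means optimizes one inertia score, whereas a SOM is usually judged on several complementary measures, such as quantization error (how well neurons fit the data) and topographic error (how well neighborhoods are preserved), together with visual inspection. MiniSom exposes both as quantization_error and topographic_error.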
- Computational cost: each iteration requires computing distances between the sample and all neurons. For very large maps, this becomes expensive.
- Border interpretation: neurons at the edge of the grid have fewer neighbors, creating an edge effect that can distort the analysis.
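Although map quality is often judged visually, two numerical measures are commonly reported alongside the plots: quantization error (mean distance from each sample to its BMU) and topographic error (fraction of samples whose two closest neurons are not adjacent on the grid). The sketch below is a from-scratch NumPy version under the assumption of a rectangular grid with 8-neighbor adjacency; the function names are illustrative.

```python
import numpy as np

def _distances_to_neurons(weights, data):
    """Matrix of distances, shape (n_samples, n_neurons)."""
    flat = weights.reshape(-1, weights.shape[-1])
    return np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)

def quantization_error(weights, data):
    """Mean distance between each sample and its best matching unit."""
    return _distances_to_neurons(weights, data).min(axis=1).mean()

def topographic_error(weights, data):
    """Fraction of samples whose two closest neurons are not grid neighbors."""
    cols = weights.shape[1]
    d = _distances_to_neurons(weights, data)
    best_two = np.argsort(d, axis=1)[:, :2]
    # Convert flat neuron indices back to (row, col) grid positions.
    r1, c1 = np.divmod(best_two[:, 0], cols)
    r2, c2 = np.divmod(best_two[:, 1], cols)
    not_adjacent = np.maximum(np.abs(r1 - r2), np.abs(c1 - c2)) > 1
    return not_adjacent.mean()

data = np.array([[0.4], [1.6], [2.9]])
ordered = np.array([0.0, 1.0, 2.0, 3.0]).reshape(1, 4, 1)    # well-ordered 1x4 map
scrambled = np.array([0.0, 2.0, 1.0, 3.0]).reshape(1, 4, 1)  # same weights, shuffled
for name, w in (("ordered", ordered), ("scrambled", scrambled)):
    print(name, quantization_error(w, data), topographic_error(w, data))
```

Both maps contain the same weight values, so their quantization error is identical. Only the topographic error reveals that the scrambled map has lost its neighborhood structure.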
4 concrete use cases
Use case 1: Customer segmentation
An e-commerce company has millions of customers described by hundreds of variables: purchase frequency, average basket, product category, seniority, return rate, engagement scores… A SOM makes it possible to project this multidimensional population onto a 2D map where each homogeneous zone corresponds to a customer segment. A marketer can literally “read” the map: “the customers in the top left are high-value occasional buyers, those in the bottom right are frequent buyers with small baskets.” Unlike a flat list of K-Means clusters, the map also shows which segments lie close to one another.
Use case 2: Image analysis and remote sensing
In satellite remote sensing, each pixel of a hyperspectral image contains hundreds of spectral bands. SOMs make it possible to group pixels of the same material composition to produce thematic classification maps: forests, urban areas, water bodies, agricultural crops. The preserved topology ensures that pixels with similar spectra (neighbors in spectral space) remain neighbors on the map.
Use case 3: Exploratory analysis in biology
In genomics, expression data from thousands of genes are projected onto a SOM. Genes expressed in a similar way cluster together, revealing functional modules — sets of genes that participate in the same metabolic pathways. This approach has been used successfully in cancer studies to identify molecular subtypes.
Use case 4: Industrial anomaly detection
In a predictive maintenance context, industrial sensor data (temperature, vibration, pressure, current) are projected onto a SOM. In normal operation, the data fall into well-identified areas of the map. When an atypical sample is projected onto a rarely visited neuron — with a high U-Matrix value — it is a sign of a potential anomaly. This unsupervised method detects drifts without needing labeled failure examples.
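One simple way to turn this idea into a rule is to flag samples whose distance to their BMU exceeds a threshold learned on normal data. This complements the rarely-visited-neuron criterion rather than implementing it. The sketch below uses a hand-built grid of weights as a stand-in for a trained map, and the 99th-percentile threshold is an arbitrary assumption.

```python
import numpy as np

def bmu_distance(weights, samples):
    """Distance from each sample to its best matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(samples[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1)

# Stand-in for a trained 5x5 map: weights evenly cover the unit square.
axis = np.linspace(0.0, 1.0, 5)
weights = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1)

# "Normal operation" data, assumed to fall inside the unit square.
rng = np.random.default_rng(0)
normal_data = rng.uniform(0.0, 1.0, size=(500, 2))

# Threshold: 99th percentile of BMU distances observed on normal data.
threshold = np.percentile(bmu_distance(weights, normal_data), 99)

new_samples = np.array([[0.5, 0.5],   # typical reading
                        [3.0, 3.0]])  # far outside the normal range
flags = bmu_distance(weights, new_samples) > threshold
print(flags)  # [False  True]
```

In a real pipeline the weights would come from a SOM trained on historical sensor data, and the percentile would be tuned to the acceptable false-alarm rate.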
Comparison with other methods
| Criterion | SOM | K-Means | PCA | t-SNE |
|---|---|---|---|---|
| Type | Unsupervised | Unsupervised | Unsupervised | Unsupervised |
| Visualization | Structured 2D grid | Abstract clusters | Linear axes | 2D/3D projection |
| Topology | ✅ Preserved | ❌ | ❌ | ⚠️ Local only |
| Interpretability | High | Medium | High | Low |
| Hyperparameters | Moderate | Low | None | Numerous |
| Massive data | Moderate | Fast | Fast | Slow |
Unlike K-Means which produces disjoint clusters, SOMs offer spatial continuity between regions. Unlike PCA which is linear, SOMs capture nonlinear relationships. And unlike t-SNE which sacrifices global structure, SOMs offer a compromise between local and global fidelity.
Conclusion
Kohonen’s Self-Organizing Maps remain, forty years after their invention, one of the most fascinating algorithms in unsupervised learning. Their beauty lies in their conceptual simplicity — a neural network that self-organizes on a grid — and in the richness of the information they reveal.
In practice, SOMs excel in situations where visual interpretability is as important as algorithmic performance. When a decision-maker needs to understand the structure of their data, when an explorer wants to navigate a complex space, when a scientist seeks to discover hidden patterns — the Kohonen map is a companion of choice.
The key to success with SOMs? Experiment. Test several grid sizes, vary the hyperparameters, compare visualizations. The first map will give you an intuition, the tenth will give you answers.