One-Class SVM anomaly detection: Complete guide — Principles, Examples and Python Implementation
Summary
The One-Class SVM is an unsupervised learning algorithm designed specifically for anomaly detection. Unlike traditional classifiers that distinguish between multiple classes, the One-Class SVM only learns the boundary enclosing the data of the normal class. Any sample located outside that boundary is flagged as an anomaly. This approach makes it particularly powerful in scenarios where abnormal examples are rare, unknown, or unlabeled — an extremely common situation in real-world industrial applications.
In this guide, we will explore the mathematical principles of the One-Class SVM, its fundamental intuition, its practical implementation with scikit-learn, the influence of its hyperparameters, as well as four concrete production use cases.
Mathematical principle
The One-Class SVM, introduced by Schölkopf and his collaborators in 2001, is based on classical Support Vector Machines (SVM). The fundamental problem is the following: in a high-dimensional feature space, find the hyperplane that separates the data from the origin with the largest possible margin.
Optimization formulation
The primal problem of the One-Class SVM is written as follows:
min over (w, ξ, ρ): (1/2) ||w||² - ρ + (1/(ν·n)) Σᵢ ξᵢ
subject to the constraints:
w · φ(xᵢ) ≥ ρ - ξᵢ, ∀i
ξᵢ ≥ 0, ∀i
Where:
- w is the normal vector to the hyperplane in the feature space.
- φ(xᵢ) represents the nonlinear transformation of the input data via a kernel function.
- ρ is the offset of the hyperplane; the distance of the hyperplane from the origin is ρ / ||w||.
- ξᵢ are slack variables that allow certain constraint violations, enabling a more flexible boundary.
- ν (nu), with 0 < ν ≤ 1, is the key parameter: it is an upper bound on the fraction of training points treated as anomalies and a lower bound on the fraction of support vectors.
- n is the number of training samples.
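For reference, the dual of this problem, as usually stated for the Schölkopf formulation, is sketched below in the same notation. The αᵢ are the Lagrange multipliers attached to the first constraint, and K(xᵢ, xⱼ) = φ(xᵢ) · φ(xⱼ) is the kernel.

```latex
\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le \frac{1}{\nu n}, \qquad \sum_{i=1}^{n} \alpha_i = 1,
\qquad \text{with} \quad w = \sum_{i=1}^{n} \alpha_i \, \varphi(x_i)
```

The training points with αᵢ > 0 are the support vectors. Since each αᵢ is at most 1/(ν·n) and the αᵢ sum to 1, at least ν·n of them must be nonzero, which is where the lower bound on the fraction of support vectors comes from.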
The role of the kernel
Like classical SVMs, the One-Class SVM uses the kernel trick to implicitly project the data into a higher-dimensional space. The RBF (Radial Basis Function) kernel is the most commonly used:
K(xᵢ, xⱼ) = exp(-γ ||xᵢ - xⱼ||²)
The γ (gamma) parameter is inversely related to the width of the Gaussian and directly influences the flexibility of the decision boundary. A high gamma (narrow Gaussian) produces a very flexible boundary that can capture complex structures but may overfit, while a low gamma (wide Gaussian) generates a smoother boundary that generalizes better.
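To make the formula concrete, here is a minimal sketch that computes the RBF kernel by hand and compares it with scikit-learn's rbf_kernel. The helper name rbf_by_hand and the demo points are illustrative assumptions, not part of the guide's main example.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf_by_hand(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), computed explicitly."""
    # Pairwise squared Euclidean distances via broadcasting
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 2))  # arbitrary demo points

K_manual = rbf_by_hand(X_demo, X_demo, gamma=0.1)
K_sklearn = rbf_kernel(X_demo, X_demo, gamma=0.1)
print("matches scikit-learn:", np.allclose(K_manual, K_sklearn))

# Similarity of two points one unit apart, for a low and a high gamma
a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 0.0]])
for g in (0.1, 10.0):
    print(f"gamma={g}: K(a, b) = {rbf_by_hand(a, b, g)[0, 0]:.6f}")
```

With a high gamma, the similarity between two points drops off much faster with distance, which is why each training point only influences its immediate neighborhood and the boundary becomes more irregular.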
Decision function
Once the model is trained, the decision function for a sample x is written:
f(x) = sign(Σ αᵢ K(xᵢ, x) - ρ)
If f(x) = +1, the sample is classified as normal. If f(x) = -1, it is classified as an anomaly. The continuous value decision_function(x) provides a normality score: the higher the score, the more the point is at the heart of the normal distribution.
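The decision function can be rebuilt from a fitted scikit-learn model as a sanity check. The sketch below assumes an RBF kernel with a fixed numeric gamma, and relies on the fitted attributes support_vectors_, dual_coef_ (the fitted coefficients) and intercept_ (which stores -ρ). The data and variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

X_fit, _ = make_blobs(n_samples=300, centers=2, cluster_std=0.8, random_state=0)
gamma = 0.1  # fixed numeric value so the kernel can be recomputed by hand
model = OneClassSVM(kernel="rbf", gamma=gamma, nu=0.05).fit(X_fit)

rng = np.random.default_rng(0)
X_new = rng.uniform(-8, 8, size=(20, 2))  # arbitrary query points

# Sum over support vectors of alpha_i * K(x_i, x), then add intercept_ (= -rho)
K = rbf_kernel(X_new, model.support_vectors_, gamma=gamma)
manual_scores = K @ model.dual_coef_.ravel() + model.intercept_[0]

# The sign of the score gives the predicted label
manual_labels = np.where(manual_scores > 0, 1, -1)

print("scores match:", np.allclose(manual_scores, model.decision_function(X_new)))
```

Only the support vectors enter the sum, which is why the rest of the training set can be discarded after fitting.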
Geometric intuition
Rather than trying to distinguish multiple classes from each other, the One-Class SVM adopts a radically different strategy: it learns what is normal by surrounding the training data with the most compact boundary possible.
Imagine a point cloud on a plane. The One-Class SVM will draw a boundary around that cloud. This boundary hugs the shape of the cloud thanks to the RBF kernel — it is not necessarily circular or elliptical. Any point located outside this boundary is flagged as suspicious, because it does not resemble the normal samples observed during training.
This approach is particularly relevant when:
- Anomalies are rare and therefore difficult to collect for training.
- Anomalies are unpredictable — their nature may evolve over time.
- We only have positive examples (normal data) without negative labels.
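The ν parameter acts like a thermostat: a low value (ν = 0.01) lets very few training points fall outside the boundary, so the boundary is wide and permissive (few false positives, but a higher risk of missing anomalies that lie close to the normal data), while a high value (ν = 0.5) pushes up to half of the training points outside, which produces a tight, strict boundary (many false positives).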
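The effect of ν can be checked empirically. The sketch below fits the model for three values of ν and reports the share of normal training points that end up flagged, and the share of synthetic anomalies that are missed. The uniformly drawn anomalies and the fixed gamma are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

X_norm, _ = make_blobs(n_samples=500, centers=2, cluster_std=0.8, random_state=42)
rng = np.random.default_rng(42)
X_anom = rng.uniform(-8, 8, size=(50, 2))  # synthetic anomalies (assumption)

flagged_normal = {}
for nu in (0.01, 0.1, 0.5):
    model = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit(X_norm)
    # Share of normal training points left outside the boundary
    flagged_normal[nu] = np.mean(model.predict(X_norm) == -1)
    # Share of synthetic anomalies that fall inside the boundary
    missed = np.mean(model.predict(X_anom) == 1)
    print(f"nu={nu}: normal flagged={flagged_normal[nu]:.2%}, "
          f"anomalies missed={missed:.2%}")
```

The share of normal points flagged grows with ν, since ν bounds the fraction of training points allowed outside the boundary.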
Python implementation with scikit-learn
Fundamental example: anomaly detection on blobs
Here is a complete implementation using OneClassSVM from scikit-learn:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report
# --- 1. Data generation ---
# Normal data: two Gaussian blobs
X_normal, _ = make_blobs(
    n_samples=500, centers=2,
    cluster_std=0.8, random_state=42
)
# Artificial anomalies: uniformly distributed points
# (a few may land inside a blob and are then indistinguishable from normal points)
np.random.seed(42)
X_outliers = np.random.uniform(
    low=-8, high=8, size=(50, 2)
)
# Training set (only normal data)
X_train = X_normal
# Test set (mix of normal + anomalies)
# The normal points are reused from training for simplicity,
# so the scores on them are optimistic
X_test = np.vstack([X_normal, X_outliers])
y_true = np.concatenate([
    np.ones(len(X_normal)),
    -np.ones(len(X_outliers))
])
# --- 2. Training the One-Class SVM ---
ocsvm = OneClassSVM(
    kernel='rbf', gamma=0.1, nu=0.05
)
ocsvm.fit(X_train)
# Predictions (+1 = normal, -1 = anomaly)
y_pred = ocsvm.predict(X_test)
print("Classification Report (One-Class SVM):")
print(classification_report(
    y_true, y_pred,
    target_names=['Anomaly', 'Normal']
))
# --- 3. Visualizing the boundary ---
# Build the grid from the data range rather than a fixed [-8, 8] window,
# so that no blob is cut off by the plot limits
x_min, x_max = X_test[:, 0].min() - 1, X_test[:, 0].max() + 1
y_min, y_max = X_test[:, 1].min() - 1, X_test[:, 1].max() + 1
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 200),
    np.linspace(y_min, y_max, 200)
)
Z = ocsvm.decision_function(
    np.c_[xx.ravel(), yy.ravel()]
)
Z = Z.reshape(xx.shape)
plt.figure(figsize=(10, 7))
# Shade the region outside the boundary (negative scores)
plt.contourf(
    xx, yy, Z, levels=np.linspace(Z.min(), 0, 50),
    cmap='Blues', alpha=0.7
)
# Decision boundary: score = 0
plt.contour(xx, yy, Z, levels=[0], colors='red', linewidths=2)
normal_mask = (y_pred == 1)
outlier_mask = (y_pred == -1)
plt.scatter(
    X_test[normal_mask, 0], X_test[normal_mask, 1],
    c='steelblue', s=30, label='Normal', alpha=0.7
)
plt.scatter(
    X_test[outlier_mask, 0], X_test[outlier_mask, 1],
    c='red', s=50, marker='x', label='Anomaly',
    linewidths=2, alpha=0.9
)
plt.title(
    "One-Class SVM: decision boundary and "
    "anomaly detection"
)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.tight_layout()
plt.show()
Comparison with Isolation Forest
It is instructive to compare the One-Class SVM with Isolation Forest, another flagship anomaly detection algorithm:
# Isolation Forest for comparison
iforest = IsolationForest(
    contamination=0.05, random_state=42
)
iforest.fit(X_train)
y_pred_if = iforest.predict(X_test)
print("\nClassification Report (Isolation Forest):")
print(classification_report(
    y_true, y_pred_if,
    target_names=['Anomaly', 'Normal']
))
# Comparative visualization (reuses the xx, yy grid and Z from the previous block)
Z_if = iforest.decision_function(
    np.c_[xx.ravel(), yy.ravel()]
)
Z_if = Z_if.reshape(xx.shape)
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
for ax, Z_val, title in [
    (axes[0], Z, "One-Class SVM"),
    (axes[1], Z_if, "Isolation Forest")
]:
    ax.contourf(xx, yy, Z_val, levels=40, cmap='coolwarm', alpha=0.7)
    ax.contour(xx, yy, Z_val, levels=[0], colors='black', linewidths=2)
    ax.scatter(
        X_test[y_true == 1, 0], X_test[y_true == 1, 1],
        c='steelblue', s=20, label='Normal', alpha=0.5
    )
    ax.scatter(
        X_test[y_true == -1, 0], X_test[y_true == -1, 1],
        c='red', s=60, marker='x', linewidths=2,
        label='Real anomaly'
    )
    ax.set_title(title)
    ax.legend()
plt.suptitle(
    "Comparison of decision boundaries"
)
plt.tight_layout()
plt.show()
Interpreting the decision function
The decision_function() method returns an extremely useful continuous score:
scores = ocsvm.decision_function(X_test)
# Score distribution, split by true label
plt.figure(figsize=(10, 5))
plt.hist(
    scores[y_true == 1], bins=50, alpha=0.6,
    color='steelblue', label='Normal'
)
plt.hist(
    scores[y_true == -1], bins=30, alpha=0.6,
    color='red', label='Anomalies'
)
plt.axvline(x=0, color='black', linestyle='--',
            linewidth=1.5, label='Threshold = 0')
plt.xlabel("Decision score")
plt.ylabel("Frequency")
plt.title(
    "Distribution of decision scores: "
    "One-Class SVM"
)
plt.legend()
plt.tight_layout()
plt.show()
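This chart typically shows two groups: normal points concentrated at positive scores, and anomalies mostly at negative scores, more or less spread out depending on how far they lie from the training data. The threshold of 0 marks the boundary between the two regions. Because ν = 0.05 here, a small share of the normal points (up to roughly 5%) also falls below 0.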
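Because the score is continuous, the cut-off does not have to stay at 0. The sketch below picks a custom threshold from a quantile of the scores on reference data. The 1% target rate and the helper flag_anomalies are hypothetical choices made for illustration, not a recommended setting.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

X_ref, _ = make_blobs(n_samples=500, centers=2, cluster_std=0.8, random_state=42)
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X_ref)

ref_scores = model.decision_function(X_ref)

# Hypothetical business rule: flag only the lowest-scoring 1% of reference data
target_rate = 0.01
threshold = np.quantile(ref_scores, target_rate)


def flag_anomalies(model, X, threshold):
    """Return True where the decision score falls below the custom threshold."""
    return model.decision_function(X) < threshold


custom_flags = flag_anomalies(model, X_ref, threshold)
default_flags = model.predict(X_ref) == -1
print(f"threshold = {threshold:.4f}")
print(f"flagged with default threshold 0: {default_flags.mean():.2%}")
print(f"flagged with custom threshold:    {custom_flags.mean():.2%}")
```

Moving the threshold below 0 reduces the number of alerts at the cost of missing borderline anomalies; moving it above 0 does the opposite.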
Hyperparameters of the One-Class SVM
The performance of the One-Class SVM depends heavily on tuning its hyperparameters. Here is a detailed analysis of each:
kernel
The kernel defines the transformation of the input space:
- “rbf”: the default and most widely used choice. Adapts the boundary to complex shapes.
- “linear”: linear boundary in the original space. Fast but limited in expressiveness.
- “poly”: polynomial kernel. Useful when the data exhibits polynomial interactions.
- “sigmoid”: sigmoid kernel, inspired by neural networks. Rarely used in practice.
nu
The most critical parameter. It sets two theoretical bounds at once:
- Upper bound: ν is an upper bound on the fraction of training samples classified as anomalies.
- Lower bound: ν is a lower bound on the fraction of training samples that become support vectors.
In practice, values of ν between 0.01 and 0.1 are a reasonable starting point when anomalies are expected to make up 1% to 10% of the data. With ν = 0.05, at most about 5% of the training data ends up outside the boundary and is classified as anomalous.
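Both bounds can be observed on a fitted model. This is a minimal sketch on synthetic blobs with a fixed gamma (both assumptions); small deviations from the bounds are possible because the solver stops at a finite tolerance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

X_chk, _ = make_blobs(n_samples=1000, centers=2, cluster_std=0.8, random_state=0)

results = []
for nu in (0.01, 0.05, 0.1, 0.2):
    model = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit(X_chk)
    # Fraction of training points with a negative score (treated as anomalies)
    outlier_frac = np.mean(model.decision_function(X_chk) < 0)
    # Fraction of training points kept as support vectors
    sv_frac = len(model.support_) / len(X_chk)
    results.append((nu, outlier_frac, sv_frac))
    print(f"nu={nu:.2f}  outliers={outlier_frac:.3f}  support vectors={sv_frac:.3f}")
```

For each row, the outlier fraction should sit at or slightly below ν, and the support vector fraction at or above it.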
gamma
The gamma parameter of the RBF kernel:
gamma = 1 / (2 × σ²)
- Low gamma: smooth boundary, good generalization but risk of underfitting.
- High gamma: very flexible boundary, can capture fine details but risk of overfitting.
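- Practical rule: gamma='scale' (the scikit-learn default) uses 1 / (n_features × X.var()), and gamma='auto' uses 1 / n_features. Either is only a starting point, and it is often necessary to validate the value via cross-validation.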
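The string presets for gamma are shorthand for simple formulas. The sketch below computes them by hand for a small synthetic dataset (an assumption) and checks that passing the explicit value behaves like gamma='scale'.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

X_g, _ = make_blobs(n_samples=300, centers=2, cluster_std=0.8, random_state=1)
n_features = X_g.shape[1]

gamma_auto = 1.0 / n_features                 # what gamma='auto' resolves to
gamma_scale = 1.0 / (n_features * X_g.var())  # what gamma='scale' resolves to
print(f"auto = {gamma_auto:.4f}, scale = {gamma_scale:.4f}")

by_name = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_g)
by_value = OneClassSVM(kernel="rbf", gamma=gamma_scale, nu=0.05).fit(X_g)
same = np.allclose(by_name.decision_function(X_g), by_value.decision_function(X_g))
print("explicit value matches gamma='scale':", same)
```

Because 'scale' depends on the variance of the data, rescaling the features changes the effective gamma, whereas 'auto' ignores the data spread entirely.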
degree
Degree of the polynomial kernel. Only relevant when kernel='poly'. The default value is 3. Higher degrees increase the model’s capacity to model complex interactions, but at the cost of increased computational complexity.
shrinking
Enables the shrinking heuristic optimization. Significantly speeds up training, especially on large datasets. The default value is True. It is rarely necessary to disable it.
cache_size
Kernel cache size in memory (in MB). The default value is 200 MB. Increasing it (e.g. to 500 MB) can speed up training on large datasets, at the cost of higher memory consumption.
tol
Tolerance for the stopping criterion. Default: 1e-3. A stricter tolerance (1e-4) ensures more precise convergence but extends training time.
Advantages and Limitations
Advantages
- No need for labeled anomalies: Works with normal examples only.
- Nonlinear boundaries: The RBF kernel captures complex shapes.
- Solid theoretical foundation: Derived from SVM theory, with convergence guarantees.
- Continuous score: The decision_function allows adjusting the threshold to business needs.
- Support vectors: Only points near the boundary are stored in memory, making inference efficient.
Limitations
- Computational complexity: Training is O(n²) to O(n³), prohibitive beyond ~50,000 samples.
- Sensitivity to hyperparameters: Tuning nu and gamma is critical and requires careful validation.
- Difficult to interpret: The boundary in feature space is not directly visualizable in high dimensions.
- Memory: Support vectors must be retained for inference, which can be costly.
- No probabilities: Does not natively return membership probabilities (requires additional calibration).
4 concrete use cases
Case 1: Financial fraud detection
In banking transactions, fraud typically represents less than 0.1% of volume. The One-Class SVM is trained on months of legitimate transactions. Each new transaction is evaluated by the decision function: a negative score triggers an alert for manual investigation. The ν parameter is set to limit false positives that would overwhelm analysis teams.
Case 2: Industrial predictive maintenance
Sensors on an industrial machine generate time series of vibrations, temperature, and pressure. The One-Class SVM models the normal behavior of the equipment in healthy operation. A deviation from the normal boundary signals early wear, imminent failure, or a maintenance need, allowing intervention before breakdown.
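As a rough illustration of this use case, the sketch below trains on simulated healthy readings and scores two new readings. All sensor values, units and column choices are made up, and the StandardScaler step is an added assumption so that no single sensor's units dominate the RBF distance.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Simulated healthy readings (values are invented for illustration):
# columns = vibration (mm/s), temperature (deg C), pressure (bar)
healthy = np.column_stack([
    rng.normal(0.5, 0.05, size=1000),
    rng.normal(70.0, 2.0, size=1000),
    rng.normal(5.0, 0.2, size=1000),
])

# Scale each sensor, then learn the boundary of healthy operation
monitor = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.01),
)
monitor.fit(healthy)

new_readings = np.array([
    [0.50, 70.0, 5.0],  # nominal operating point
    [1.50, 95.0, 3.0],  # strong vibration, overheating, pressure drop
])
labels = monitor.predict(new_readings)
reading_scores = monitor.decision_function(new_readings)
for reading, label, score in zip(new_readings, labels, reading_scores):
    status = "normal" if label == 1 else "ALERT"
    print(f"{reading} -> {status} (score={score:.4f})")
```

In a real deployment the raw time series would usually be summarized into features (for example statistics over sliding windows) before being passed to the model; that step is left out here.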
Case 3: Cybersecurity — Intrusion detection
The network traffic of an information system exhibits recognizable patterns: data volumes, schedules, protocols used. The One-Class SVM learns this legitimate traffic. A zero-day attack, by definition unknown to classic signatures, manifests as a deviation from normal behavior and is thus detected — even without a prior signature.
Case 4: Manufacturing quality control
On a production line, defective parts follow measurement distributions (dimensions, weight, strength) different from those of compliant parts. The One-Class SVM, trained on measurements of accepted parts, automatically identifies parts whose characteristics deviate from the normal distribution, enabling real-time sorting on the line.

