Anomaly Detection

Introduction to Anomaly Detection

🦄 An anomaly is a rare item, event, or observation that deviates significantly from the majority of the data and does not conform to a well-defined notion of normal behavior.

Note: Such observations may raise the suspicion that they were generated by a different mechanism, or they may appear inconsistent with the rest of the dataset.

Anomaly detection (also called outlier detection or novelty detection) is the identification of unusual patterns, anomalies, or outliers in a given dataset.

Remove Outliers:

Rejecting or omitting outliers to aid statistical analysis, for example before computing the mean or standard deviation of a dataset, both of which are sensitive to extreme values.
Removing outliers to improve the predictions of models that are sensitive to them, such as linear regression.
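To see why a single outlier distorts summary statistics and a linear regression fit, here is a minimal sketch using only the Python standard library. The data are made up for illustration (y = 2x with one corrupted reading), and `least_squares_slope` is a hypothetical helper written for this example, not part of any library.

```python
import statistics


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (closed form)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Made-up data: y = 2x exactly, except one corrupted reading (100) at x = 9.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 2, 4, 6, 8, 10, 12, 14, 16, 100]
clean_xs, clean_ys = xs[:-1], ys[:-1]  # same data with the outlier removed

for label, px, py in [("with outlier", xs, ys), ("outlier removed", clean_xs, clean_ys)]:
    print(
        f"{label:>16}: mean={statistics.fmean(py):.2f} "
        f"stdev={statistics.stdev(py):.2f} slope={least_squares_slope(px, py):.2f}"
    )
```

With the single bad reading, the mean rises from 8.0 to 17.2, the sample standard deviation from about 5.5 to about 29.5, and the fitted slope from 2.0 to roughly 6.5.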

Focus on Outliers:

Fraud detection in banking and financial services.
Cybersecurity: detecting intrusions, malware, or unusual user access patterns.

Learning Approaches:

Supervised
Semi-Supervised
✅ Unsupervised (most common)

Note: Labeled anomaly data is often unavailable in real-world scenarios, which is why unsupervised methods are the most common.

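Statistical Methods: Z-score, where a point whose z-score (its distance from the mean in standard deviations) is large in absolute value, commonly |z| > 3, is flagged as an outlier; IQR, where a point beyond the fences (below Q1 - 1.5*IQR or above Q3 + 1.5*IQR) is flagged as an outlier.
Distance Based: KNN, where points far from their k nearest neighbors are treated as potential anomalies.
Density Based: DBSCAN, where points in low-density regions, which DBSCAN labels as noise, are considered outliers.
Clustering Based: K-Means, where points far from their nearest cluster centroid, i.e., points that do not fit well into any cluster, are considered anomalies.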
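The Z-score, IQR, and KNN rules above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the threshold of 3 for z-scores and k = 1.5 for the IQR fences are common conventions rather than fixed rules, the data are invented, the function names are hypothetical, and the KNN score used here (distance to the k-th nearest neighbor) is only one of several possible distance-based scores.

```python
import math
import statistics


def zscore_outliers(data, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs(x - mean) / std > threshold]


def iqr_outliers(data, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor (higher = more anomalous)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Invented 1-D data: values around 10 plus one extreme value.
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 12, 10, 11, 10, 9, 11, 10, 50]
print(zscore_outliers(values))  # [50]
print(iqr_outliers(values))     # [50]

# Invented 2-D data: a tight group of points plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = knn_scores(points, k=2)
print(points[scores.index(max(scores))])  # (8, 8)
```

One caveat on the Z-score rule: with the sample standard deviation, no point in a sample of size n can have |z| larger than (n - 1)/sqrt(n), so in very small samples (n = 10 or fewer) a threshold of 3 can never be reached, while the IQR fences do not have this limit.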

Algorithms:

Elliptic Envelope (based on MCD, the Minimum Covariance Determinant estimator)
One-Class SVM (OC-SVM)
Local Outlier Factor (LOF)
Isolation Forest (iForest)
RANSAC (Random Sample Consensus)
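RANSAC is simple enough to sketch from scratch. The following is a minimal pure-Python version for fitting a line y = a*x + b, assuming 2-D points, a fixed number of iterations, and an inlier threshold on the vertical residual. `ransac_line` and its parameters are hypothetical names chosen for this example; library implementations such as scikit-learn's `RANSACRegressor` add refinements like early stopping criteria. Each iteration fits a line through a random minimal sample of two points, counts how many points lie within `threshold` of it, and keeps the largest consensus set; the final line is refit to that set with ordinary least squares, and the points outside it are the outliers.

```python
import random


def ransac_line(points, n_iters=200, threshold=1.0, seed=0):
    """Fit y = a*x + b with RANSAC; return (a, b, inlier_mask)."""
    rng = random.Random(seed)
    best_mask, best_count = None, -1
    for _ in range(n_iters):
        # Minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line cannot be written as y = a*x + b; skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points whose vertical residual is within the threshold.
        mask = [abs(y - (a * x + b)) <= threshold for x, y in points]
        count = sum(mask)
        if count > best_count:
            best_mask, best_count = mask, count
    if best_mask is None:
        raise ValueError("no valid two-point sample was drawn")

    # Refit on the largest consensus set with ordinary least squares.
    inliers = [p for p, keep in zip(points, best_mask) if keep]
    n = len(inliers)
    mx = sum(x for x, _ in inliers) / n
    my = sum(y for _, y in inliers) / n
    a = sum((x - mx) * (y - my) for x, y in inliers) / sum((x - mx) ** 2 for x, _ in inliers)
    b = my - a * mx
    return a, b, best_mask


# Invented data: ten points on y = 2x + 1 plus two gross outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(3, 30), (7, -5)]
a, b, mask = ransac_line(points)
print(round(a, 3), round(b, 3))  # 2.0 1.0
print([p for p, keep in zip(points, mask) if not keep])  # [(3, 30), (7, -5)]
```

Unlike ordinary least squares on all twelve points, the two outliers do not pull the fitted line, because they never join the winning consensus set.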
