Anomaly Detection Introduction

What is an Anomaly?

🦄 An anomaly is a rare item, event, or observation that deviates significantly from the majority of the data and does not conform to a well-defined notion of normal behavior.

Note: Such observations may raise the suspicion that they were generated by a different mechanism, or may appear inconsistent with the rest of the dataset.

Anomaly Detection

🐙 Anomaly detection (also called outlier detection or novelty detection) is the identification of unusual patterns, anomalies, or outliers in a given dataset.

What to do with Outliers?

❌ Remove Outliers:

Reject or omit outliers from the data to aid statistical analysis, for example when computing the mean or standard deviation of the dataset.
Remove outliers to improve the predictions of models such as linear regression.
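
To illustrate why omitting outliers can help a statistical summary, here is a minimal sketch using only Python's standard library. The `readings` values are invented for illustration, and the value 95 is assumed to have already been flagged as an outlier by some earlier step.

```python
import statistics

# Hypothetical sensor readings; 95 is assumed to be a known outlier.
readings = [10, 12, 11, 13, 12, 95]
cleaned = [x for x in readings if x != 95]

# The mean and standard deviation are pulled strongly toward the outlier.
mean_with = statistics.fmean(readings)
mean_without = statistics.fmean(cleaned)
std_with = statistics.pstdev(readings)
std_without = statistics.pstdev(cleaned)

# The median, a robust statistic, barely moves.
median_with = statistics.median(readings)
median_without = statistics.median(cleaned)

print("mean:  ", mean_with, "->", mean_without)
print("stdev: ", round(std_with, 2), "->", round(std_without, 2))
print("median:", median_with, "->", median_without)
```

In this toy data a single extreme value roughly doubles the mean and inflates the standard deviation many times over, while the median stays the same.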

🔦 Focus on Outliers:

Fraud detection in banking and financial services.
Cyber-security: intrusion detection, malware, or unusual user access patterns.

Anomaly Detection Methods 🐉

Supervised
Semi-Supervised
Unsupervised (most common) ✅

Note: Labeled anomaly data is often unavailable in real-world scenarios.

Known Methods 🐈

Statistical Methods: Z-Score, where a large absolute value (commonly |z| > 3) indicates an outlier; IQR, where a point beyond the fences (below Q1 - 1.5*IQR or above Q3 + 1.5*IQR) is flagged as an outlier.
Distance Based: KNN, where points far from their nearest neighbors are flagged as potential anomalies.
Density Based: DBSCAN, where points in low-density regions are considered outliers.
Clustering Based: K-Means, where points far from every cluster centroid, and so not fitting any cluster, are treated as anomalies.
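
The statistical and distance-based ideas listed above can be sketched in a few lines of standard-library Python. This is an illustrative sketch on one-dimensional data, not a reference implementation: the function names, the `data` values, the z-score threshold of 3, and the choice of k = 2 neighbors are all assumptions made for the example.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs((v - mean) / std) > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

def knn_scores(values, k=2):
    """Score each point by its mean distance to its k nearest neighbors."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical data: eleven ordinary readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

print(zscore_outliers(data))  # [95]
print(iqr_outliers(data))     # [95]

scores = knn_scores(data)
print(data[scores.index(max(scores))])  # 95, the point farthest from its neighbors
```

On a small sample the z-score rule is fragile, because the outlier itself inflates the standard deviation; here 95 only just clears the threshold. The IQR fences rely on quartiles, so they are less affected by the extreme value.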

Unsupervised Methods 🦅

Elliptic Envelope (MCD - Minimum Covariance Determinant)
One-Class SVM (OC-SVM)
Local Outlier Factor (LOF)
Isolation Forest (iForest)
RANSAC (Random Sample Consensus)
