Anomaly Detection

Anomaly Detection Introduction

What is an Anomaly?

🦄 An anomaly is a rare item, event, or observation that deviates significantly from the majority of the data and does not conform to a well-defined notion of normal behavior.

Note: Such observations may arouse suspicion of having been generated by a different mechanism, or may appear inconsistent with the rest of the dataset.

Anomaly Detection
🐙 Anomaly detection (also called outlier detection or novelty detection) is the identification of unusual patterns, i.e. anomalies or outliers, in a given dataset.
What to do with Outliers?

Remove Outliers:

  • Reject or omit outliers to aid statistical analysis, for example when computing the mean or standard deviation of the dataset.
  • Remove outliers so that models sensitive to them, such as linear regression, make better predictions.
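
A small sketch of why removal helps summary statistics. The readings below are made-up illustration data, and the outlier is dropped by hand rather than detected:

```python
import statistics

# Hypothetical sample: nine typical readings plus one extreme value (95).
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
cleaned = [x for x in data if x != 95]  # outlier removed by hand for illustration

mean_all = statistics.mean(data)        # pulled upward by the single extreme value
mean_clean = statistics.mean(cleaned)   # close to the typical readings
std_all = statistics.stdev(data)        # sample standard deviation, inflated by 95
std_clean = statistics.stdev(cleaned)
median_all = statistics.median(data)    # the median barely moves, for comparison

print(f"mean:   {mean_all:.2f} with outlier, {mean_clean:.2f} without")
print(f"stdev:  {std_all:.2f} with outlier, {std_clean:.2f} without")
print(f"median: {median_all} with outlier")
```

A single extreme value is enough to shift the mean and inflate the standard deviation, while the median stays near the typical readings.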

🔦Focus on Outliers:

  • Fraud detection in banking and financial services.
  • Cyber-security: intrusion detection, malware, or unusual user access patterns.
Anomaly Detection Methods 🐉
  • Supervised
  • Semi-Supervised
  • Unsupervised (most common) ✅

Note: Labeled anomaly data is often unavailable in real-world scenarios.

Known Methods 🐈
  • Statistical Methods: Z-Score (a large absolute Z-score, commonly |z| > 3, indicates an outlier) and IQR (a point beyond the fences, i.e. below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, is flagged as an outlier).
  • Distance Based: KNN, points far from their nearest neighbors are flagged as potential anomalies.
  • Density Based: DBSCAN, points in low-density regions are considered outliers.
  • Clustering Based: K-Means, points that lie far from every cluster centroid and so do not fit any cluster are treated as anomalies.
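
The statistical and distance-based ideas above can be sketched in a few lines of plain Python. This is a minimal illustration on made-up one-dimensional data; the function names, the |z| > 3 threshold, and k = 3 are assumptions chosen for the example, not fixed rules:

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []
    return [x for x in values if abs(x - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

def knn_distance_scores(values, k=3):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, x in enumerate(values):
        dists = sorted(abs(x - y) for j, y in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical data: 19 readings between 10 and 13, plus one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 12,
        13, 11, 10, 12, 11, 13, 12, 11, 12, 95]

z_flagged = zscore_outliers(data)
iqr_flagged = iqr_outliers(data)
scores = knn_distance_scores(data)
most_isolated = data[scores.index(max(scores))]

print("Z-score outliers:", z_flagged)
print("IQR outliers:", iqr_flagged)
print("Largest k-NN distance belongs to:", most_isolated)
```

Note that on very small samples the Z-score can miss an obvious outlier, because the outlier itself inflates the standard deviation; the IQR fences are less affected by this.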
Unsupervised Methods 🦅
  • Elliptic Envelope (MCD - Minimum Covariance Determinant)
  • One-Class SVM (OC-SVM)
  • Local Outlier Factor (LOF)
  • Isolation Forest (iForest)
  • RANSAC (Random Sample Consensus)
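
As a rough sketch, three of the methods listed above can be tried through scikit-learn, assuming it is installed. The synthetic data, the planted outliers, and the contamination value of 0.02 are assumptions made for this illustration:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 200 points from a standard 2-D Gaussian,
# plus four planted outliers far from the cluster (rows 200..203).
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0]])
X = np.vstack([inliers, planted])

# Assumed fraction of outliers; in practice this is usually unknown.
contamination = 0.02

detectors = {
    "EllipticEnvelope": EllipticEnvelope(contamination=contamination, random_state=0),
    "LocalOutlierFactor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "IsolationForest": IsolationForest(contamination=contamination, random_state=0),
}

flagged = {}
for name, detector in detectors.items():
    labels = detector.fit_predict(X)  # +1 for inliers, -1 for outliers
    flagged[name] = set(np.flatnonzero(labels == -1).tolist())
    print(name, "flagged rows:", sorted(flagged[name]))
```

Each detector returns -1 for points it considers outliers. No labels are used anywhere, which is what makes these methods unsupervised. `OneClassSVM` from `sklearn.svm` exposes the same `fit_predict` interface.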