K Nearest Neighbors (KNN) Introduction

Issues with Linear/Logistic Regression
  • Parametric models:
    • Rely on the assumption that the relationship between features and target is linear.
    • For polynomial regression we must choose the degree of the polynomial.
  • Training:
    • We need to train 🏃‍♂️ the model before it can make predictions.
K Nearest Neighbors
  • Simple: An intuitive way to classify data or predict values by finding similar existing data points (neighbors).

  • Non-Parametric: Makes no assumptions about the underlying data distribution.

  • No Training Required: KNN is a ‘lazy learner’: it does not require a formal training 🏃‍♂️ phase.

KNN Algorithm

Given a query point \(x_q\) and a dataset \(D = \{(x_i, y_i)\}_{i=1}^{n}\), where \(x_i \in \mathbb{R}^d\), the algorithm finds the set of ‘k’ nearest neighbors \(\mathcal{N}_k(x_q) \subseteq D\).

Inference:

  1. Choose a value of ‘k’ (hyper-parameter); an odd ‘k’ is preferred in binary classification to avoid ties.
  2. Calculate the distance (Euclidean, cosine, etc.) between \(x_q\) and every point in the dataset, and store the results in a distance list.
  3. Sort the distance list in ascending order and choose the top ‘k’ data points.
  4. Make prediction:
    • Classification: Take a majority vote of the ‘k’ nearest neighbors’ labels and assign the winning label.
    • Regression: Take the mean/median of the ‘k’ nearest neighbors’ target values (formalized just below).
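
Written in the notation above, the two step-4 prediction rules (using the mean for regression) are:

\[ \hat{y}_q = \operatorname{mode}\{\, y_i : (x_i, y_i) \in \mathcal{N}_k(x_q) \,\} \qquad \text{(classification)} \]

\[ \hat{y}_q = \frac{1}{k} \sum_{(x_i, y_i) \in \mathcal{N}_k(x_q)} y_i \qquad \text{(regression)} \]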

Note: The entire dataset must be stored, since every point is needed at inference time.
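
A minimal Python/NumPy sketch of the brute-force procedure above; the function name knn_predict and the toy data are illustrative choices, not from the original notes.

import numpy as np
from collections import Counter

def knn_predict(X, y, x_q, k=3, task="classification"):
    """Brute-force KNN following the four steps above.
    X: (n, d) array of points, y: (n,) targets, x_q: (d,) query point."""
    # Step 2: distance from the query to every stored point -- O(nd).
    distances = np.linalg.norm(X - x_q, axis=1)
    # Step 3: sort and keep the indices of the k nearest points -- O(n log n).
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote (classification) or mean (regression).
    if task == "classification":
        return Counter(y[nearest]).most_common(1)[0][0]
    return y[nearest].mean()

# Toy example: two classes in 2 dimensions.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.1]), k=3))  # -> 0

For regression, call it with task="regression" to return the mean of the neighbors’ target values instead of a vote.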

Time & Space Complexity
  • Storing Data: Space Complexity: O(nd)
  • Inference: Time Complexity ⏰: O(nd + n log n)

Explanation:

  • Distance to all ‘n’ points in ‘d’ dimensions: O(nd)
  • Sorting all ‘n’ data points: O(n log n)

Note: Brute-force 🔨 KNN is impractical when ‘n’ is very large, say billions of points.
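
One partial remedy, shown as a hedged sketch: since only the ‘k’ smallest distances matter, the full O(n log n) sort can be replaced with a partial selection using NumPy’s argpartition (the helper name k_nearest_indices is my own).

import numpy as np

def k_nearest_indices(X, x_q, k):
    """Indices of the k nearest points to x_q, without a full sort (assumes k < n)."""
    distances = np.linalg.norm(X - x_q, axis=1)     # O(nd)
    candidates = np.argpartition(distances, k)[:k]  # O(n) average, unordered
    # Order only the k candidates if ranked neighbors are needed -- O(k log k).
    return candidates[np.argsort(distances[candidates])]

This cuts the sorting term to roughly O(n + k log k), but the O(nd) distance pass remains; at billion-point scale, approximate nearest-neighbor indexes are the usual fix.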



End of Section