K Nearest Neighbors Introduction
Issues with Linear/Logistic Regression
- Parametric models:
  - Rely on the assumption that the relationship between the features and the target is linear.
  - For polynomial regression, we need to choose the degree of the polynomial.
- Training:
  - We need to train 🏃♂️ the model before we can make predictions.
K Nearest Neighbors
Simple: Intuitive way to classify data or predict values by finding similar existing data points (neighbors).
Non-Parametric: Makes no assumptions about the underlying data distribution.
No Training Required: KNN is a ‘lazy learner’; it does not require a formal training 🏃♂️ phase.


KNN Algorithm
Given a query point \(x_q\) and a dataset \(D = \{(x_i, y_i)\}_{i=1}^{n}\), where \(x_i \in \mathbb{R}^d\) and \(y_i\) is a class label (classification) or a real value (regression), the algorithm finds a set of ‘k’ nearest neighbors \(\mathcal{N}_k(x_q) \subseteq D\).
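For concreteness, the Euclidean distance between the query and a data point is \(d(x_q, x_i) = \lVert x_q - x_i \rVert_2 = \sqrt{\sum_{j=1}^{d} (x_{q,j} - x_{i,j})^2}\).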
Inference:
- Choose a value of ‘k’ (a hyper-parameter); an odd ‘k’ helps avoid ties in binary classification.
- Calculate the distance (Euclidean, cosine, etc.) between \(x_q\) and every point in the dataset, and store the results in a distance list.
- Sort the distance list in ascending order and take the top ‘k’ data points.
- Make a prediction:
  - Classification: Take a majority vote over the labels of the ‘k’ nearest neighbors and assign the winning label.
  - Regression: Take the mean/median of the ‘k’ nearest neighbors’ target values.

Note: KNN stores the entire dataset; all of the work happens at inference time. A minimal sketch of these steps follows below.
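Here is a brute-force sketch of the steps above in Python (NumPy is assumed; `knn_predict` and the toy data are illustrative, not from any library):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=3, task="classification"):
    """Brute-force KNN prediction for a single query point."""
    # Euclidean distance from the query point to every training point: O(nd)
    distances = np.linalg.norm(X_train - x_q, axis=1)
    # Sort and keep the indices of the 'k' nearest neighbors: O(n log n)
    nearest = np.argsort(distances)[:k]
    if task == "classification":
        # Majority vote over the neighbors' labels
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Regression: mean of the neighbors' target values
    return y_train[nearest].mean()

# Toy usage (hypothetical data): two clusters with binary labels
X = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.0]])
y = np.array([0, 0, 0, 1, 1])
print(knn_predict(X, y, np.array([2.0, 2.0]), k=3))  # -> 0
```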
Time & Space Complexity
- Storing Data: Space Complexity: O(nd)
- Inference: Time Complexity ⏰: O(nd + n log n)
Explanation:
- Distance to all ‘n’ points in ‘d’ dimensions: O(nd)
- Sorting all ‘n’ data points: O(n log n)
Note: Brute-force 🔨 KNN is unacceptable when ‘n’ is very large, say in the billions.
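One easy improvement: a full sort is unnecessary, since we only need the ‘k’ smallest distances. A small sketch of that optimization (again assuming NumPy; this trims the O(n log n) term, but the O(nd) distance pass remains, so it does not rescue the billion-point case):

```python
import numpy as np

def k_nearest_indices(distances, k):
    # np.argpartition moves the k smallest distances into the first k slots
    # in average O(n) time, instead of fully sorting in O(n log n)
    return np.argpartition(distances, k)[:k]
```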