Bias Variance Tradeoff
2 minute read
Mean Squared Error (MSE) = \(\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2\)
Total Error = \(\text{Bias}^2 + \text{Variance} + \text{Irreducible Error}\)
- Bias = Systematic Error
- Variance = Sensitivity to Data
- Irreducible Error = Sensor noise, Human randomness
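This decomposition can be checked empirically: train the same model on many freshly sampled training sets, then compare its average prediction to the truth (that gap is Bias²) and measure how much the predictions scatter around their own average (that is Variance). A minimal sketch below, assuming a toy \(\sin\) target, Gaussian noise, and a degree-1 polynomial model; the helper names `true_f` and `fit_and_predict` are hypothetical, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):                       # assumed "ground truth" for the toy problem
    return np.sin(2 * np.pi * x)

def fit_and_predict(x_test, degree, n_train=30, sigma=0.3):
    """Fit one polynomial on a fresh noisy training set, predict on x_test."""
    x = rng.uniform(0, 1, n_train)
    y = true_f(x) + rng.normal(0, sigma, n_train)   # irreducible noise
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x_test)

x_test = np.linspace(0, 1, 100)
preds = np.array([fit_and_predict(x_test, degree=1) for _ in range(200)])

mean_pred = preds.mean(axis=0)
bias_sq  = np.mean((mean_pred - true_f(x_test)) ** 2)   # Bias^2: average model vs truth
variance = np.mean(preds.var(axis=0))                   # Variance: scatter across datasets
print(f"Bias^2 ≈ {bias_sq:.3f}, Variance ≈ {variance:.3f}, Noise ≈ {0.3**2:.3f}")
```

Re-running with a higher `degree` should shrink Bias² and inflate Variance, which is the tradeoff in action.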
Bias: systematic error from overly simplistic assumptions or a strong opinion baked into the model.
e.g. House 🏠 prices 💰 = Rs. 10,000 * Area (sq. ft).
Note: This is an oversimplified view because it ignores amenities, location, age, etc.
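A small sketch of how such a fixed rule produces systematic error, on synthetic data where price also depends on a location score (the coefficients and the extra feature are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic houses: price depends on area AND a location score (assumed data).
area     = rng.uniform(500, 2500, 1000)            # sq. ft
location = rng.uniform(0, 1, 1000)                 # 0 = outskirts, 1 = prime
price    = 10_000 * area + 2_000_000 * location    # Rs.

# High-bias model: the fixed rule "Rs. 10,000 * Area", ignoring location entirely.
pred = 10_000 * area

residual = price - pred
print("Mean residual (systematic error):", residual.mean())   # ≈ Rs. 10 lakh, not 0
```

No matter how much data you collect, this rule keeps under-pricing well-located houses by the same amount — that persistent, one-sided error is bias.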
Variance: error from sensitivity to small fluctuations 📈 in the data.
e.g. Deep neural 🧠 network trained on a small dataset.
Note: Memorizes everything, including noise.
Say a house 🏠 in XYZ street was sold for very low price 💰.
Reason: Distress selling (outlier), or incorrect entry (noise).
Note: The model will make wrong (lower) price 💰 predictions for all houses in XYZ street.
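Here is a rough sketch of that failure mode, using a 1-nearest-neighbour regressor as the "memorizing" model and a made-up five-house street (the numbers are assumptions, not real data); the single distress sale drags down predictions for its neighbours:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny synthetic dataset: feature = position along XYZ street, target = sale price (Rs.).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([5_000_000, 5_100_000, 1_000_000, 5_200_000, 5_050_000])  # one distress sale

# High-variance model: 1-nearest-neighbour memorizes every point, including the outlier.
model = KNeighborsRegressor(n_neighbors=1).fit(X, y)
print(model.predict([[2.2]]))   # ≈ 1,000,000 — the outlier poisons nearby predictions
```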


Goal 🎯 is to minimize total error.
Find a sweet-spot balance ⚖️ between Bias and Variance.
A good model ‘generalizes’ well, i.e.,
- Is neither too simple nor built around a strong opinion.
- Does not memorize 🧠 everything in the data, including noise.
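One way to locate that sweet spot is to compare training and validation error as model complexity grows. A rough sketch, assuming a toy \(\sin\) dataset with noise and polynomial models of increasing degree (all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 40)         # small noisy training set
x_val = rng.uniform(0, 1, 200)
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0, 0.3, 200)  # held-out validation set

for degree in [1, 3, 6, 12]:
    coefs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coefs, x) - y) ** 2)
    val_mse   = np.mean((np.polyval(coefs, x_val) - y_val) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```

Training MSE keeps falling as the degree grows, while validation MSE traces a U-shape; the degree at the bottom of the U is the balance point between bias and variance.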
To reduce Bias (underfitting), see the sketch after this list:
- Make the model more complex.
- Add more features, add polynomial features.
- Decrease Regularization.
- Train 🏃‍♂️ longer; the model may not have converged yet.
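A minimal sketch of the first three fixes using scikit-learn, assuming a toy \(\sin\) dataset; the specific degrees and `alpha` values are illustrative, not recommendations:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, 60)

# Underfitting baseline: plain linear features with heavy regularization.
simple = make_pipeline(PolynomialFeatures(degree=1), Ridge(alpha=10.0)).fit(X, y)

# Bias fixes: add polynomial features and decrease regularization.
richer = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=0.01)).fit(X, y)

print("simple train MSE:", np.mean((simple.predict(X) - y) ** 2))
print("richer train MSE:", np.mean((richer.predict(X) - y) ** 2))  # should drop noticeably
```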
To reduce Variance (overfitting), see the sketch below:
- Add more data (most effective).
- Harder to memorize 🧠 1 million examples than 100.
- Use data augmentation if getting more data is difficult.
- Increase Regularization.
- Early stopping 🛑 prevents memorization 🧠.
- Dropout (DL): randomly drop neurons during training, which prevents co-adaptation.
- Use Ensembles.
- Averaging reduces variance.
Note: Co-adaptation refers to a phenomenon where neurons in a neural network become highly dependent on each other to detect features, rather than learning independently.
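As a rough illustration of "averaging reduces variance", the sketch below compares a single unpruned decision tree with a bagged ensemble of trees on a toy noisy dataset; the data and model choices are assumptions made for this example:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 200)

# Single deep tree: low bias, high variance (memorizes noise).
tree = DecisionTreeRegressor(random_state=0)

# Bagging: average 100 trees, each trained on a bootstrap sample of the data.
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=0)

for name, model in [("single tree", tree), ("bagged trees", bagged)]:
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE ≈ {mse:.3f}")
```

Each tree still overfits its own bootstrap sample, but their errors partly cancel when averaged, so the ensemble's cross-validated MSE should come out lower than the single tree's.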