Bias Variance Tradeoff

Total Error

Mean Squared Error (MSE) = \(\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2\)
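
As a quick illustration of the formula, here is a minimal sketch in plain Python. The function name `mean_squared_error` and the price values are made up for this example and are not from the original notes.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Hypothetical house prices (in lakhs of rupees) and model predictions.
y_true = [50.0, 62.0, 48.0, 75.0]
y_pred = [52.0, 60.0, 45.0, 80.0]

mse = mean_squared_error(y_true, y_pred)
print(mse)  # (4 + 4 + 9 + 25) / 4 = 10.5
```

Squaring the errors means one large miss (here, 5) contributes more than several small ones.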

Total Error = \(\text{Bias}^2\) + Variance + Irreducible Error

  • Bias = Systematic Error
  • Variance = Sensitivity to Data
  • Irreducible Error = Noise inherent in the data (e.g., sensor noise, human randomness)
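
The decomposition can be written a little more formally. This is a sketch under the usual textbook assumptions, which the notes do not spell out: the data follow \(y = f(x) + \varepsilon\) with \(E[\varepsilon] = 0\) and \(\text{Var}(\varepsilon) = \sigma^2\), the model \(\hat{f}\) is fitted on a randomly drawn training set, and the noise on the new point is independent of that training set. The symbols \(f\), \(\hat{f}\), \(\varepsilon\) and \(\sigma\) are introduced here for illustration.

\[
E\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(E[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible Error}}
\]

The expectations are taken over training sets and over the noise. Only the first two terms depend on the choice of model.
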
Bias

Systematic error caused by overly simplistic assumptions, i.e., a strong opinion built into the model.

e.g. House 🏠 price 💰 = Rs. 10,000 * Area (sq. ft).

Note: This is an oversimplified view, because it ignores amenities, location, age, etc.

Variance

Error from sensitivity to small fluctuations 📈 in the data.

e.g. Deep neural 🧠 network trained on a small dataset.

Note: Such a model memorizes everything, including noise.

Say a house 🏠 on XYZ street was sold for a very low price 💰.

Reason: Distress selling (outlier), or incorrect entry (noise).

Note: A model that memorized this sale will predict wrong (lower) prices 💰 for all houses on XYZ street.

Linear (High Bias), Polynomial (High Variance)
images/machine_learning/supervised/linear_regression/bias_variance_tradeoff/slide_06_01.png
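
The linear versus polynomial contrast can also be checked numerically. The sketch below estimates the average squared bias and the variance of a degree-1 and a degree-9 polynomial fit by refitting each on many noisy copies of the same data. The sine-shaped ground truth, the noise level, the fixed input grid and the helper name `bias_variance` are all assumptions made for this illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Assumed ground truth for the simulation (not from the original notes).
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=200, n_train=20, noise_sd=0.3, seed=0):
    """Estimate average bias^2 and variance of a polynomial fit of a given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # fixed training inputs
    x_test = np.linspace(0.05, 0.95, 50)       # points where predictions are compared
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Fresh noise in each trial plays the role of "a different training set".
        y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, n_train)
        model = Polynomial.fit(x_train, y_train, degree)
        preds[t] = model(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)  # systematic error
    variance = np.mean(preds.var(axis=0))                  # spread across trials
    return bias_sq, variance


bias_lin, var_lin = bias_variance(degree=1)
bias_poly, var_poly = bias_variance(degree=9)
print(f"degree 1: bias^2={bias_lin:.4f}, variance={var_lin:.4f}")
print(f"degree 9: bias^2={bias_poly:.4f}, variance={var_poly:.4f}")
```

In this setup the straight line cannot follow the curve, so its bias dominates. The degree-9 fit tracks the curve on average, but its predictions move more from one noisy dataset to the next.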
Bias Variance Table
images/machine_learning/supervised/linear_regression/bias_variance_tradeoff/bias_variance_table.png
Bias-Variance Trade-Off

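The goal 🎯 is to minimize total error.

Reducing one usually increases the other: a more complex model tends to have lower Bias but higher Variance. Find the sweet spot that balances ⚖️ Bias and Variance.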

A good model ‘generalizes’ well, i.e.,

  • Is neither too simple nor too strongly opinionated.
  • Does not memorize 🧠 everything in the data, including noise.
Fix 🩹 High Bias (Under-Fitting)
  • Make model more complex.
    • Add more features, add polynomial features.
  • Decrease Regularization.
  • Train 🏃‍♂️ longer if the model has not yet converged.
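
To illustrate the "add polynomial features" fix, here is a minimal NumPy sketch. The curved data, the noise level and the helper name `fit_and_mse` are hypothetical choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with a curved relationship: y = 2 + 0.5 * x^2 + noise.
x = np.linspace(-3.0, 3.0, 40)
y = 2.0 + 0.5 * x**2 + rng.normal(0.0, 0.5, x.size)


def fit_and_mse(features, target):
    """Least-squares fit on the given feature matrix; returns the training MSE."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    residuals = target - features @ coef
    return np.mean(residuals**2)


ones = np.ones_like(x)
X_linear = np.column_stack([ones, x])            # intercept + x (under-fits)
X_quadratic = np.column_stack([ones, x, x**2])   # adds the polynomial feature x^2

mse_linear = fit_and_mse(X_linear, y)
mse_quadratic = fit_and_mse(X_quadratic, y)
print(f"linear MSE:    {mse_linear:.3f}")
print(f"quadratic MSE: {mse_quadratic:.3f}")
```

The numbers here are training errors. In practice the comparison should be repeated on held-out data, since a lower training error alone does not show that the model generalizes.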
Fix 🩹 High Variance (Over-Fitting)
  • Add more data (often the most effective fix).
    • Harder to memorize 🧠 1 million examples than 100.
    • Use data augmentation if getting more data is difficult.
  • Increase Regularization.
  • Early stopping 🛑: halt training before the model starts to memorize 🧠 the training data.
  • Dropout (DL): randomly drop neurons during training to prevent co-adaptation.
  • Use Ensembles.
    • Averaging reduces variance.

Note: Co-adaptation refers to a phenomenon where neurons in a neural network become highly dependent on each other to detect features, rather than learning independently.
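
The claim that averaging reduces variance can be checked with a small simulation. The sketch assumes an idealized ensemble whose members make independent, equally noisy errors around the same true value. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 100.0   # hypothetical true price
noise_sd = 10.0      # assumed error spread of a single model
n_models = 25        # ensemble size
n_repeats = 10_000   # number of simulated experiments

# Each row is one experiment; each column is one model's prediction.
preds = true_value + rng.normal(0.0, noise_sd, size=(n_repeats, n_models))

single_var = preds[:, 0].var()            # variance of one model alone
ensemble_var = preds.mean(axis=1).var()   # variance of the averaged prediction

print(f"single model variance: {single_var:.2f}")
print(f"ensemble variance:     {ensemble_var:.2f}")
```

With independent errors the variance of the average is about \(\sigma^2 / M\) for \(M\) models. Real ensemble members are trained on related data, so their errors are correlated and the reduction is smaller.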


