Random Forest
2 minute read
💡If one feature is extremely predictive (e.g., ‘Area’ for house prices), almost every bootstrap tree will split on that feature at the root.
👉This makes the trees (models) very similar, leading to a high correlation ‘\(\rho\)’ between their predictions.
\[Var(f_{bagging})=\rho\sigma^{2}+\frac{1-\rho}{B}\sigma^{2}\]

💡Choose a random subset of ‘m’ features from the total ‘d’ features at each split, reducing the correlation ‘\(\rho\)’ between trees.
👉By forcing trees to split on ‘sub-optimal’ features, we intentionally increase the variance of the individual trees; the bias also increases slightly (the trees are effectively simpler).
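A minimal sketch of the mechanism, assuming scikit-learn and NumPy are available: each tree is fit on a bootstrap sample and restricted to ‘m’ randomly chosen features per split. The toy dataset and names such as `n_trees` and `m` are illustrative, not from the text above.

```python
# Sketch: bagging + per-split feature subsampling (the core Random Forest idea).
# The toy dataset, n_trees, and m are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
d = X.shape[1]
m = int(np.sqrt(d))      # size of the random feature subset per split
n_trees = 100
rng = np.random.default_rng(0)

trees = []
for _ in range(n_trees):
    # Bootstrap sample of the rows (bagging).
    idx = rng.integers(0, len(X), size=len(X))
    # max_features=m makes each split consider only m randomly chosen features,
    # which decorrelates the trees even when one feature is dominant.
    tree = DecisionTreeClassifier(max_features=m,
                                  random_state=int(rng.integers(1_000_000)))
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote over the ensemble.
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy:", (ensemble_pred == y).mean())
```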
Standard heuristics for choosing ‘m’:
- Classification: \(m = \sqrt{d}\)
- Regression: \(m = \frac{d}{3}\)
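In scikit-learn these heuristics map onto the `max_features` parameter; a hedged sketch (the library defaults differ between versions, so it is safer to set the value explicitly):

```python
# Sketch: applying the standard m heuristics via scikit-learn's max_features.
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: m = sqrt(d)
clf = RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=0)

# Regression: m = d / 3, passed as a fraction of the feature count
reg = RandomForestRegressor(n_estimators=300, max_features=1 / 3, random_state=0)
```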
💡Because the ‘\(\rho\sigma^{2}\)’ term dominates the ensemble variance when B is large, reducing ‘\(\rho\)’ pushes the overall variance \(Var(f_{rf})\) significantly below that of standard Bagging.
\[Var(f_{rf})=\rho\sigma^{2}+\frac{1-\rho}{B}\sigma^{2}\]

💡A Random Forest does not overfit as more trees (B) are added; its variance only converges to the limit ‘\(\rho\sigma^{2}\)’.
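A quick numeric check of the formula, with made-up values of ‘\(\rho\)’ and ‘\(\sigma^{2}\)’: growing B only drives the variance down to the ‘\(\rho\sigma^{2}\)’ floor, so lowering ‘\(\rho\)’ is what lowers the floor itself.

```python
# Evaluate Var = rho*sigma^2 + (1 - rho)/B * sigma^2 for illustrative values.
sigma2 = 1.0
for rho in (0.9, 0.3):              # e.g. highly correlated trees vs. a decorrelated forest
    for B in (1, 10, 100, 10_000):
        var = rho * sigma2 + (1 - rho) / B * sigma2
        print(f"rho={rho:.1f}  B={B:>6}  Var={var:.4f}")
    print(f"  limit as B -> infinity: {rho * sigma2:.4f}")
```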
Overfitting is controlled by:
- depth of the individual trees.
- size of the feature subset ‘m’.
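A sketch of tuning those knobs with a plain grid search; the parameter grid and toy dataset below are illustrative assumptions, not recommended values.

```python
# Sketch: overfitting is controlled through the trees (depth, m, leaf size), not B.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

param_grid = {
    "max_depth": [4, 8, None],        # depth of the individual trees
    "max_features": ["sqrt", 0.5],    # size of the feature subset m
    "min_samples_leaf": [1, 5, 20],   # also smooths out noisy points
}
search = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=0),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```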
💡When does a Random Forest shine?

- High Dimensionality: 100s or 1000s of features; RF’s feature sampling prevents a few strong features from masking the others.
- Tabular Data (with Complex Interactions): Captures non-linear relationships without needing manual feature engineering.
- Noisy Datasets: The averaging process makes RF robust to outliers (especially if min_samples_leaf is increased).
- Automatic Validation: The OOB (out-of-bag) error gives a quick estimate of generalization error without running 10-fold CV (see the sketch below).
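A sketch of the OOB estimate in scikit-learn; the dataset and settings are illustrative.

```python
# Sketch: oob_score=True reuses the ~37% of rows left out of each bootstrap sample
# to estimate generalization accuracy without a separate CV loop.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                            min_samples_leaf=5, random_state=0)
rf.fit(X, y)
print("OOB accuracy estimate:", rf.oob_score_)
```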