XGBoost
⭐️An optimized and highly efficient implementation of gradient boosting.
👉 Widely used in competitive data science (like Kaggle) due to its speed and performance.
Note: XGBoost began as a research project developed by Tianqi Chen during his doctoral studies at the University of Washington.
🔵 Regularization
🔵 Sparsity-Aware Split Finding
⭐️Uses the second derivative (Hessian), i.e., curvature, in addition to the first derivative (gradient) to optimize the objective function more quickly and accurately than GBDT.
Let’s illustrate this with the problem of minimizing \(f(x) = x^4\), comparing (see the sketch after this list):

- Gradient descent, which uses only the 1st-order derivative \(f'(x) = 4x^3\) and updates \(x \leftarrow x - \eta\, f'(x)\)
- Newton’s method, which also uses the 2nd-order derivative \(f''(x) = 12x^2\) and updates \(x \leftarrow x - f'(x)/f''(x)\)
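For a concrete feel of the difference, here is a minimal sketch in plain Python; the starting point, learning rate, and step count are arbitrary choices for illustration:

```python
def f(x):
    return x**4          # objective

def grad(x):
    return 4 * x**3      # first derivative f'(x)

def hess(x):
    return 12 * x**2     # second derivative f''(x)

x_gd = x_newton = 2.0    # same (arbitrary) starting point for both methods
lr = 0.01                # learning rate for gradient descent

for _ in range(10):
    x_gd -= lr * grad(x_gd)                      # x <- x - lr * f'(x)
    x_newton -= grad(x_newton) / hess(x_newton)  # x <- x - f'(x) / f''(x)

print(f"gradient descent after 10 steps: x = {x_gd:.4f}, f(x) = {f(x_gd):.6f}")
print(f"Newton's method  after 10 steps: x = {x_newton:.4f}, f(x) = {f(x_newton):.6f}")
```

Because Newton’s method rescales each step by the curvature, it closes in on the minimum at \(x = 0\) in far fewer iterations than gradient descent with a small fixed learning rate.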

- Adds explicit regularization terms (L1/L2 on leaf weights and tree complexity) into the boosting objective, helping reduce over-fitting. \[ \text{Objective} = \underbrace{\sum_{i=1}^{n} L(y_i, \hat{y}_i)}_{\text{Training Loss}} + \underbrace{\gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|}_{\text{Regularization (The Tax)}} \]
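In practice these terms map onto XGBoost hyperparameters: `gamma` is the per-leaf penalty \(\gamma\) (the minimum loss reduction required to make a split), while `reg_lambda` and `reg_alpha` are the L2 and L1 penalties on leaf weights. A minimal sketch using the scikit-learn wrapper on synthetic data (parameter values here are arbitrary, for illustration only):

```python
import numpy as np
import xgboost as xgb

# Synthetic regression data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

# gamma      -> per-leaf penalty (gamma * T): minimum loss reduction to split
# reg_lambda -> L2 penalty (lambda) on leaf weights
# reg_alpha  -> L1 penalty (alpha) on leaf weights
model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=4,
    gamma=1.0,
    reg_lambda=1.0,
    reg_alpha=0.1,
)
model.fit(X, y)
print(model.predict(X[:5]))
```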
💡Real-world data often contains many missing values or zero-entries (sparse data).
👉 XGBoost introduces a ‘default direction’ for each node.
➡️During training, it learns the best default direction (left or right) for missing values at each split, making it significantly faster and more robust on sparse or missing data.
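A small sketch of this with the scikit-learn wrapper, using synthetic data where `np.nan` entries are passed directly to the model with no imputation step:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Blank out ~20% of the entries to mimic sparse / missing data.
mask = rng.random(size=X.shape) < 0.2
X[mask] = np.nan

# No imputation needed: at each split XGBoost learns a default direction for
# missing values during training and routes NaNs that way at prediction time.
clf = xgb.XGBClassifier(n_estimators=100, max_depth=3)
clf.fit(X, y)
print(clf.predict(X[:5]))
```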