CatBoost
CatBoost (Categorical Boosting)
Developed by Yandex, this gradient-boosting algorithm is specifically optimized for handling categorical features without requiring extensive preprocessing (such as one-hot encoding).
Algorithmic Optimizations
🔵 Ordered Target Encoding
🔵 Symmetric (Oblivious) Trees
🔵 Handling Missing Values
Ordered Target Encoding
- ❌ Standard target encoding can lead to target leakage: the model uses target information during training that would not be available at inference (it ‘cheats’ by using a row’s own label to predict itself).
- ✅ CatBoost computes the target statistic (average target value) for each category using only the training examples that precede the current row in a random permutation of the data.
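
The idea of ordered target statistics can be sketched in a few lines of Python. This is an illustrative simplification, not CatBoost's actual implementation: the function name `ordered_target_encode`, the smoothing parameters `prior` and `prior_weight`, and the use of a single permutation are all assumptions made for the example.

```python
import numpy as np

def ordered_target_encode(categories, targets, prior=0.5, prior_weight=1.0,
                          order=None, seed=0):
    """Encode each row using only the rows that precede it in a permutation.

    `prior` and `prior_weight` are hypothetical smoothing parameters: they
    give a sensible value to rows whose category has not been seen yet.
    """
    n = len(categories)
    if order is None:
        # Random permutation that defines each row's "history".
        order = np.random.default_rng(seed).permutation(n)

    sums = {}    # running sum of targets per category
    counts = {}  # running number of rows seen per category
    encoded = np.empty(n, dtype=float)

    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # The statistic is computed BEFORE the row's own label is added,
        # so a row never sees its own target.
        encoded[idx] = (s + prior * prior_weight) / (c + prior_weight)
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1

    return encoded

cats = ["a", "b", "a", "a", "b"]
y = [1, 0, 0, 1, 1]
# Fixed order so the result is reproducible by hand.
print(ordered_target_encode(cats, y, order=[0, 1, 2, 3, 4]))
```

With the fixed order above, the first row of each category receives only the prior (0.5), and later rows are encoded from the labels of earlier rows in the same category.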
Symmetric (Oblivious) Trees
CatBoost uses symmetric decision trees by default.
In a symmetric tree, the same split condition is applied to every node at a given level. Instead of walking down the tree with ‘if-else’ logic, CatBoost evaluates one split condition per level to build a binary index (e.g. 101) and jumps directly to the corresponding leaf in memory.
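
A minimal sketch of that leaf lookup, assuming a tree described by one (feature, threshold) pair per level and a flat array of leaf values. The names `oblivious_leaf_index`, `feature_ids`, `thresholds`, and `leaf_values` are hypothetical and chosen for illustration; CatBoost's internal layout may differ.

```python
def oblivious_leaf_index(x, feature_ids, thresholds):
    """Build the leaf index bit by bit, one split condition per tree level."""
    index = 0
    for feature, threshold in zip(feature_ids, thresholds):
        bit = int(x[feature] > threshold)  # 1 if the condition holds, else 0
        index = (index << 1) | bit         # append the bit to the index
    return index

def predict_oblivious(x, feature_ids, thresholds, leaf_values):
    """Look the prediction up directly instead of traversing nodes."""
    return leaf_values[oblivious_leaf_index(x, feature_ids, thresholds)]

# Depth-3 tree: 3 shared conditions, 2**3 = 8 leaves.
feature_ids = [0, 1, 2]
thresholds = [0.5, 10.0, 3.0]
leaf_values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

x = [0.9, 4.0, 7.0]  # conditions evaluate to 1, 0, 1 -> binary 101 -> leaf 5
print(oblivious_leaf_index(x, feature_ids, thresholds))
print(predict_oblivious(x, feature_ids, thresholds, leaf_values))
```

Because every level shares one condition, a tree of depth d needs only d comparisons and one array lookup per sample, which is what makes this structure fast at prediction time.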

Handling Missing Values
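CatBoost offers built-in handling of missing values and sparse features, which often require manual preprocessing in other GBDT libraries.
For numerical features, missing values are by default treated as smaller than every observed value (the `nan_mode` parameter, default `Min`), so a split can separate them from the rest, much like a distinct category. This reduces the need for imputation.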
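
For numerical features, CatBoost's `nan_mode` parameter (default `Min`) treats missing values as smaller than any observed value. The NumPy sketch below imitates that idea outside of CatBoost; the helper `fill_nan_as_min` and its `margin` argument are hypothetical and only show why a single threshold can then isolate the missing rows.

```python
import numpy as np

def fill_nan_as_min(column, margin=1.0):
    """Replace NaN with a value below the observed minimum (illustrative only)."""
    column = np.asarray(column, dtype=float)
    missing = np.isnan(column)
    observed = column[~missing]
    # If nothing was observed, any constant works as the sentinel.
    sentinel = observed.min() - margin if observed.size else 0.0
    return np.where(missing, sentinel, column)

values = [3.0, float("nan"), 1.0, float("nan")]
filled = fill_nan_as_min(values)
print(filled)

# A threshold between the sentinel and the smallest observed value
# separates missing rows from all other rows in a single split.
threshold = 0.5
print(filled > threshold)
```

After this transformation the missing rows sit on their own side of the lowest possible split, so the tree can learn a separate leaf value for them without imputation.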
