

In a previous article, you learned about rewriting decision trees using a differentiable programming approach, as suggested by the NODE paper. The idea of this paper is to replace XGBoost with a neural network.
More specifically, after explaining why the process of building decision trees is not differentiable, that article introduced the necessary mathematical tools to regularize the two main elements associated with a decision node:
- Feature Selection
- Branch detection
The NODE paper shows that both can be handled using the entmax function.
To summarize, we have shown how to create a binary tree without using comparison operators.
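To make this concrete, here is a quick comparison of softmax and entmax on a vector of feature-selection logits. This is a minimal sketch assuming PyTorch and the third-party `entmax` package (`pip install entmax`); the logit values are purely illustrative:

```python
import torch
from entmax import entmax15  # pip install entmax

# Illustrative feature-selection logits: two strong features, two weak ones.
logits = torch.tensor([2.0, 1.8, 0.1, -1.0])

# softmax gives every feature a strictly positive weight...
print(torch.softmax(logits, dim=-1))  # ≈ [0.50, 0.41, 0.07, 0.02]

# ...while entmax (here alpha = 1.5) can assign exactly zero weight to weak
# features, so the selection is sparse yet still differentiable.
print(entmax15(logits, dim=-1))       # ≈ [0.57, 0.43, 0.00, 0.00]
```

This sparsity is what lets a regularized node behave like a genuine feature selector while remaining trainable by gradient descent.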
The previous article ended with open questions regarding training a regularized decision tree. It’s time to answer these questions.
If you’re interested in a deep dive into Gradient Boosting methods, have a look at my book.
First, based on what we presented in the previous article, let’s create a new Python class: SmoothBinaryNode.
This class encodes the behavior of a smooth binary node. There are two key parts in its code:
- The selection of the features, handled by the function `_choices`
- The evaluation of these features with respect to a given threshold, and the identification of the path to follow: `left` or `right`. All this is managed by the methods `left` and `right`, as sketched below.
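As a reference point, here is a minimal sketch of what such a class could look like. This is an assumption-laden illustration, not the article's exact code: the constructor parameters (`selection_logits`, `threshold`, `temperature`) and the internal details are hypothetical; only the class name and the `_choices`, `left`, and `right` methods come from the description above.

```python
import torch
from entmax import entmax15  # pip install entmax


class SmoothBinaryNode:
    """A smooth (differentiable) binary decision node -- illustrative sketch."""

    def __init__(self, selection_logits, threshold, temperature=1.0):
        # Hypothetical parameters: one logit per input feature, a split
        # threshold, and a temperature controlling how sharp the branching is.
        self.selection_logits = selection_logits
        self.threshold = threshold
        self.temperature = temperature

    def _choices(self, x):
        # Smooth feature selection: entmax turns the logits into a sparse
        # probability vector, so the node reads a weighted combination of
        # the features instead of picking one with a hard argmax.
        weights = entmax15(self.selection_logits, dim=-1)
        return (x * weights).sum(dim=-1)

    def right(self, x):
        # Smooth branching: a two-class entmax over (z, -z) replaces the
        # non-differentiable comparison "selected feature > threshold".
        z = (self._choices(x) - self.threshold) / self.temperature
        both = torch.stack([z, -z], dim=-1)
        return entmax15(both, dim=-1)[..., 0]

    def left(self, x):
        # The two routing probabilities sum to one.
        return 1.0 - self.right(x)


# Usage: two samples, three features each; the logits favor feature 1.
x = torch.tensor([[0.2, 2.5, -1.0],
                  [0.2, 1.5, -1.0]])
node = SmoothBinaryNode(
    selection_logits=torch.tensor([0.0, 4.0, 0.0]),
    threshold=2.0,
)
print(node.right(x))  # ≈ [0.83, 0.17]: sample 0 mostly routes right
print(node.left(x))   # ≈ [0.17, 0.83]: sample 1 mostly routes left
```

In a trainable version, `selection_logits` and `threshold` would be learnable parameters (e.g. `torch.nn.Parameter`), which is exactly what makes gradient-based training of the tree possible.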