binary variable case. In this case, it takes some extra work to decide
on the information gain because it depends on the threshold as well
as the feature.
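To make the dependence on the threshold concrete, here is a minimal sketch in Python. It assumes a single numeric feature and binary labels, and it scores each candidate threshold by information gain. The function names (entropy, information_gain, best_threshold) and the toy data are hypothetical, chosen only for illustration; trying midpoints between consecutive distinct values is one common convention, not the only one.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Gain from splitting into values <= threshold and values > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Pick the midpoint between consecutive distinct values with the highest gain.

    Assumes at least two distinct values; otherwise there is nothing to split.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))

# Hypothetical toy data: a continuous feature and a binary outcome.
ages = [22, 25, 30, 35, 40, 52]
clicked = [1, 1, 1, 0, 0, 0]

t = best_threshold(ages, clicked)
print(t, information_gain(ages, clicked, t))
```

The same feature yields a different gain at every threshold, which is why the search has to run over both features and thresholds.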
In fact, you could think of the decision of where the threshold should
live as a separate submodel. It's possible to optimize this choice by
maximizing the entropy on individual attributes, but that's not clearly
the best way to deal with continuous variables. Indeed, this kind of
question can be as complicated as feature selection itself—instead of
a single threshold, you might want to create bins of the value of your
attribute, for example. What to do? It will always depend on the
situation.
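As a rough illustration of the binning alternative, the sketch below compares two simple schemes in plain Python: equal-width bins and equal-frequency bins. Both function names and the sample values are hypothetical, and neither scheme is claimed to be the right choice; they only show how the same attribute can be discretized in different ways.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls in one bin.
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds roughly the same number of values.

    Tied values may end up in different bins under this simple scheme.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Hypothetical skewed attribute: a few small values and a few large ones.
values = [1, 2, 3, 100, 200, 300]
print(equal_width_bins(values, 3))
print(equal_frequency_bins(values, 3))
```

On skewed data like this, equal-width bins crowd most points into the first bin, while equal-frequency bins spread them evenly. This is one reason the choice depends on the situation.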
Surviving the Titanic
For fun, Will pointed us to this decision tree for surviving on the
Titanic on the BigML website. The original data is from the Encyclopedia
Titanica; source code and data are available there. Figure 7-6
provides just a snapshot of it, but if you go to the site, it is interactive.
Figure 7-6. Surviving the Titanic
 