But if X denotes the event of rainfall in a desert, then X has low entropy. In other words, the bag of day-long weather events is not highly mixed in deserts.
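As a rough illustration (the 1% rainfall probability here is purely an assumed figure), suppose it rains on about 1% of days. Then the entropy of X is far below the 1 bit of a 50/50 event:

$$H(X) = -0.01 \log_2 0.01 - 0.99 \log_2 0.99 \approx 0.08 \text{ bits}$$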
Using this concept of entropy, we will be thinking of X as the target of
our model. So, X could be the event that someone buys something on
our site. We'd like to know which attribute of the user will tell us the
most information about this event X. We will define the information gain, denoted IG(X, a), for a given attribute a, as the entropy we lose if we know the value of that attribute:

$$IG(X, a) = H(X) - H(X \mid a)$$
To compute this we need to define H(X | a). We can do this in two steps. For any actual value a_0 of the attribute a, we can compute the specific conditional entropy H(X | a = a_0) as you might expect:

$$H(X \mid a = a_0) = -\, p(X = 1 \mid a = a_0) \log_2 p(X = 1 \mid a = a_0) \;-\; p(X = 0 \mid a = a_0) \log_2 p(X = 0 \mid a = a_0)$$
and then we can put it all together, for all possible values of a, to get the conditional entropy H(X | a):

$$H(X \mid a) = \sum_{a_i} p(a = a_i) \cdot H(X \mid a = a_i)$$
In words, the conditional entropy asks: how mixed is our bag really if we know the value of attribute a? And then information gain can be described as: how much information do we learn about X (or how much entropy do we lose) once we know a?
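To make these definitions concrete, here is a minimal Python sketch that computes H(X), H(X | a), and IG(X, a). The helper names and the toy buy/returning-visitor data are made up for illustration, not taken from the text:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(X): entropy of a list of 0/1 labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(attr_values, labels):
    """H(X | a): entropy within each attribute value, weighted by p(a = a_i)."""
    n = len(labels)
    h = 0.0
    for value in set(attr_values):
        subset = [x for v, x in zip(attr_values, labels) if v == value]
        h += (len(subset) / n) * entropy(subset)
    return h

def information_gain(attr_values, labels):
    """IG(X, a) = H(X) - H(X | a)."""
    return entropy(labels) - conditional_entropy(attr_values, labels)

# Toy data (made up): did the user buy (1) or not (0), and were they a returning visitor?
bought    = [1, 1, 0, 0, 1, 0, 0, 1]
returning = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
print(information_gain(returning, bought))
```

On this toy data, knowing whether the visitor is returning reduces the entropy of the purchase outcome from 1 bit to about 0.81 bits, so the information gain is roughly 0.19 bits.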
Going back to how we use the concept of entropy to build decision
trees: it helps us decide what feature to split our tree on, or in other
words, what's the most informative question to ask?
The Decision Tree Algorithm
You build your decision tree iteratively, starting at the root. You need an algorithm to decide which attribute to split on; i.e., which node should be the next one to identify. You choose that attribute in order to maximize information gain, because you're getting the most bang for your buck that way. You keep going until all the points at the end are in the same class, or you run out of attributes to split on.
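The sketch below shows one minimal, ID3-style way to implement that greedy loop in Python; the function names and the dictionary-of-rows data layout are assumptions for illustration, not the book's own code:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG(X, a) for splitting `rows` on the attribute named `attr`."""
    n = len(labels)
    h_cond = 0.0
    for value in set(row[attr] for row in rows):
        subset = [x for row, x in zip(rows, labels) if row[attr] == value]
        h_cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_cond

def build_tree(rows, labels, attrs):
    """Greedily split on the attribute with the largest information gain,
    recursing until the leaf is pure or no attributes remain; then return
    the majority label."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return tree

# Toy usage (made-up data): "returning" perfectly predicts buying, "mobile" does not.
rows   = [{"returning": "yes", "mobile": "no"},
          {"returning": "yes", "mobile": "yes"},
          {"returning": "no",  "mobile": "no"},
          {"returning": "no",  "mobile": "yes"}]
bought = [1, 1, 0, 0]
print(build_tree(rows, bought, ["returning", "mobile"]))
```

On the toy data, the root split is chosen on "returning" because it has the higher information gain, and the tree stops immediately since both resulting leaves are pure.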