But if X denotes the event of rainfall in a desert, then X has low entropy. In other words, the bag of day-long weather events is not highly mixed in deserts.
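As a rough illustration (the 1% rainfall probability here is purely an assumed figure), suppose it rains on about 1% of days. Then the entropy of X is far below the 1 bit of a 50/50 event:

$$H(X) = -0.01 \log_2 0.01 - 0.99 \log_2 0.99 \approx 0.08 \text{ bits}$$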
Using this concept of entropy, we will be thinking of X as the target of
our model. So, X could be the event that someone buys something on
our site. We'd like to know which attribute of the user will tell us the
most information about this event X. We will define the information gain, denoted IG(X, a), for a given attribute a, as the entropy we lose if we know the value of that attribute:

$$IG(X, a) = H(X) - H(X \mid a)$$
To compute this we need to define H(X | a). We can do this in two steps. For any actual value a_0 of the attribute a, we can compute the specific conditional entropy H(X | a = a_0) as you might expect:

$$H(X \mid a = a_0) = -\, p(X = 1 \mid a = a_0) \log_2 p(X = 1 \mid a = a_0) \;-\; p(X = 0 \mid a = a_0) \log_2 p(X = 0 \mid a = a_0)$$
and then we can put it all together, for all possible values of a, to get the conditional entropy H(X | a):

$$H(X \mid a) = \sum_{a_i} p(a = a_i) \cdot H(X \mid a = a_i)$$
In words, the conditional entropy asks: how mixed is our bag really if we know the value of attribute a? And then information gain can be described as: how much information do we learn about X (or how much entropy do we lose) once we know a?
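To make these definitions concrete, here is a minimal Python sketch that computes H(X), H(X | a), and IG(X, a). The helper names and the toy buy/returning-visitor data are made up for illustration, not taken from the text:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(X): entropy of a list of 0/1 labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(attr_values, labels):
    """H(X | a): entropy within each attribute value, weighted by p(a = a_i)."""
    n = len(labels)
    h = 0.0
    for value in set(attr_values):
        subset = [x for v, x in zip(attr_values, labels) if v == value]
        h += (len(subset) / n) * entropy(subset)
    return h

def information_gain(attr_values, labels):
    """IG(X, a) = H(X) - H(X | a)."""
    return entropy(labels) - conditional_entropy(attr_values, labels)

# Toy data (made up): did the user buy (1) or not (0), and were they a returning visitor?
bought    = [1, 1, 0, 0, 1, 0, 0, 1]
returning = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
print(information_gain(returning, bought))
```

On this toy data, knowing whether the visitor is returning reduces the entropy of the purchase outcome from 1 bit to about 0.81 bits, so the information gain is roughly 0.19 bits.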
Going back to how we use the concept of entropy to build decision
trees: it helps us decide what feature to split our tree on, or in other
words, what's the most informative question to ask?
The Decision Tree Algorithm
You build your decision tree iteratively, starting at the root. You need an algorithm to decide which attribute to split on; i.e., which node should be the next one to identify. You choose that attribute in order to maximize information gain, because you're getting the most bang for your buck that way. You keep going until all the points at the end are in the same class, or you run out of attributes to split on.
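The sketch below shows one minimal, ID3-style way to implement that greedy loop in Python; the function names and the dictionary-of-rows data layout are assumptions for illustration, not the book's own code:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG(X, a) for splitting `rows` on the attribute named `attr`."""
    n = len(labels)
    h_cond = 0.0
    for value in set(row[attr] for row in rows):
        subset = [x for row, x in zip(rows, labels) if row[attr] == value]
        h_cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_cond

def build_tree(rows, labels, attrs):
    """Greedily split on the attribute with the largest information gain,
    recursing until the leaf is pure or no attributes remain; then return
    the majority label."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return tree

# Toy usage (made-up data): "returning" perfectly predicts buying, "mobile" does not.
rows   = [{"returning": "yes", "mobile": "no"},
          {"returning": "yes", "mobile": "yes"},
          {"returning": "no",  "mobile": "no"},
          {"returning": "no",  "mobile": "yes"}]
bought = [1, 1, 0, 0]
print(build_tree(rows, bought, ["returning", "mobile"]))
```

On the toy data, the root split is chosen on "returning" because it has the higher information gain, and the tree stops immediately since both resulting leaves are pure.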