Table 7.1 Conditional Entropy Example

                               Cellular   Telephone   Unknown
P(contact)                     0.6435     0.0680      0.2885
P(subscribed=yes | contact)    0.1399     0.0809      0.0347
P(subscribed=no | contact)     0.8601     0.9192      0.9653
The conditional entropy of the contact attribute is computed as shown here:

H_{S \mid \text{contact}} = \sum_{x \in \{\text{cellular},\, \text{telephone},\, \text{unknown}\}} P(\text{contact}=x) \cdot H(S \mid \text{contact}=x)
  = 0.6435 \times (-0.1399 \log_2 0.1399 - 0.8601 \log_2 0.8601)
  + 0.0680 \times (-0.0809 \log_2 0.0809 - 0.9192 \log_2 0.9192)
  + 0.2885 \times (-0.0347 \log_2 0.0347 - 0.9653 \log_2 0.9653)
  \approx 0.4661

where S denotes the subscribed outcome. The computation inside each pair of parentheses is the entropy of the class labels within a single contact value. Note that the conditional entropy is always less than or equal to the base entropy, that is, H_{S \mid \text{contact}} \leq H_S. The conditional entropy is smaller than the base entropy when the attribute and the outcome are correlated. In the worst case, when the attribute is independent of the outcome, the conditional entropy equals the base entropy.
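To make the arithmetic above concrete, here is a minimal Python sketch that reproduces the conditional entropy computation from the values in Table 7.1; the names contact_probs, class_probs, and entropy are illustrative choices, not from the text.

import math

# P(contact) for each contact value, from Table 7.1
contact_probs = {"cellular": 0.6435, "telephone": 0.0680, "unknown": 0.2885}

# (P(subscribed=yes | contact), P(subscribed=no | contact)), from Table 7.1
class_probs = {
    "cellular":  (0.1399, 0.8601),
    "telephone": (0.0809, 0.9192),
    "unknown":   (0.0347, 0.9653),
}

def entropy(probs):
    """Base-2 entropy of a discrete distribution, skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Weight each per-value class entropy by P(contact) and sum
h_conditional = sum(contact_probs[v] * entropy(class_probs[v])
                    for v in contact_probs)
print(f"H(subscribed | contact) = {h_conditional:.4f}")  # prints ~0.4661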
The information gain of an attribute A is defined as the difference between the base entropy and the conditional entropy of the attribute, as shown in Equation 7.3.

\text{InfoGain}_A = H_S - H_{S \mid A} \quad (7.3)
In the bank marketing example, the information gain of the contact attribute is shown in Equation 7.4.

\text{InfoGain}_{\text{contact}} = H_S - H_{S \mid \text{contact}} = 0.4862 - 0.4661 = 0.0201 \quad (7.4)
Information gain compares the degree of purity of the parent node before a split with the degree of purity of the child nodes after a split. At each split, the attribute with the greatest information gain is considered the most informative attribute. Information gain thus measures how much splitting on an attribute increases the purity of the class labels.
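As a sketch of how this selection criterion might be coded, the following self-contained Python example computes the information gain of Equation 7.4 from the Table 7.1 values. The helper names (info_gain, attr_probs, cond) are hypothetical, and the base entropy is derived here from the class distribution implied by the table, whereas the text computes it from the full dataset.

import math

def entropy(probs):
    """Base-2 entropy of a discrete distribution, skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(attr_probs, cond_class_probs, base_class_probs):
    """Information gain of an attribute (Equation 7.3): base entropy
    minus the attribute's conditional entropy."""
    h_cond = sum(attr_probs[v] * entropy(cond_class_probs[v])
                 for v in attr_probs)
    return entropy(base_class_probs) - h_cond

# Values for the contact attribute, from Table 7.1
attr_probs = {"cellular": 0.6435, "telephone": 0.0680, "unknown": 0.2885}
cond = {
    "cellular":  (0.1399, 0.8601),
    "telephone": (0.0809, 0.9192),
    "unknown":   (0.0347, 0.9653),
}

# Marginal class distribution P(subscribed) implied by Table 7.1
p_yes = sum(attr_probs[v] * cond[v][0] for v in attr_probs)

gain = info_gain(attr_probs, cond, (p_yes, 1.0 - p_yes))
print(f"InfoGain(contact) = {gain:.4f}")
# prints ~0.0202; Equation 7.4's 0.4862 - 0.4661 = 0.0201 differs only by rounding

At each candidate split, a decision tree learner would repeat this computation for every remaining attribute and branch on the one with the largest gain.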