9.1 Decision Trees with Nominal Attributes
Describing data with a decision tree is both easy on the eye and an excellent
way of understanding the data. Consider, for instance, the play tennis data
presented in Table 9.1, a famous toy dataset in decision tree induction (Quinlan
1986). The dataset records which weather conditions are suitable for playing
tennis. As you can see, there are four attributes - OUTLOOK, TEMPERATURE,
HUMIDITY, and WINDY - and the decision whether or not to play depends on
their values. In this particular case, all four attributes take nominal values:
OUTLOOK can be “sunny”, “overcast”, or “rainy”; TEMPERATURE can be “hot”,
“mild”, or “cool”; HUMIDITY can be “high” or “normal”; and WINDY can
be “true” or “false”. (One way such a table might be encoded in code is
sketched just after Table 9.1.)
Table 9.1  A small training set with nominal attributes.

OUTLOOK    TEMPERATURE  HUMIDITY  WINDY  Play
sunny      hot          high      false  No
sunny      hot          high      true   No
overcast   hot          high      false  Yes
rainy      mild         high      false  Yes
rainy      cool         normal    false  Yes
rainy      cool         normal    true   No
overcast   cool         normal    true   Yes
sunny      mild         high      false  No
sunny      cool         normal    false  Yes
rainy      mild         normal    false  Yes
sunny      mild         normal    true   Yes
overcast   mild         high      true   Yes
overcast   hot          normal    false  Yes
rainy      mild         high      true   No
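To make the idea of nominal attributes concrete, here is one way the training set of Table 9.1 might be represented in code. This is a minimal sketch in Python, not taken from the text: each example becomes a dictionary mapping an attribute name to one of its nominal values, plus the Play label, and the names `ATTRIBUTES`, `ROWS`, and `examples` are illustrative choices.

```python
from collections import Counter

ATTRIBUTES = ["OUTLOOK", "TEMPERATURE", "HUMIDITY", "WINDY"]

# The 14 training examples of Table 9.1, one tuple per row:
# (OUTLOOK, TEMPERATURE, HUMIDITY, WINDY, Play).
ROWS = [
    ("sunny",    "hot",  "high",   "false", "No"),
    ("sunny",    "hot",  "high",   "true",  "No"),
    ("overcast", "hot",  "high",   "false", "Yes"),
    ("rainy",    "mild", "high",   "false", "Yes"),
    ("rainy",    "cool", "normal", "false", "Yes"),
    ("rainy",    "cool", "normal", "true",  "No"),
    ("overcast", "cool", "normal", "true",  "Yes"),
    ("sunny",    "mild", "high",   "false", "No"),
    ("sunny",    "cool", "normal", "false", "Yes"),
    ("rainy",    "mild", "normal", "false", "Yes"),
    ("sunny",    "mild", "normal", "true",  "Yes"),
    ("overcast", "mild", "high",   "true",  "Yes"),
    ("overcast", "hot",  "normal", "false", "Yes"),
    ("rainy",    "mild", "high",   "true",  "No"),
]
examples = [dict(zip(ATTRIBUTES + ["Play"], row)) for row in ROWS]

# Each attribute takes a small, fixed set of nominal values.
for attr in ATTRIBUTES:
    print(attr, sorted({ex[attr] for ex in examples}))

# Class distribution over the whole table: 9 "Yes" versus 5 "No".
print(Counter(ex["Play"] for ex in examples))
```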
A decision tree learned from this dataset is presented in Figure 9.1. The
classification rules of such a tree are obtained by following every path from
the root node down to a leaf node. For the decision tree of Figure 9.1 there
are five different paths, and hence five classification rules. Starting with
the leftmost path and moving towards the right, they are as follows:

1. IF OUTLOOK = sunny AND HUMIDITY = high THEN Play = No
2. IF OUTLOOK = sunny AND HUMIDITY = normal THEN Play = Yes
3. IF OUTLOOK = overcast THEN Play = Yes
4. IF OUTLOOK = rainy AND WINDY = true THEN Play = No
5. IF OUTLOOK = rainy AND WINDY = false THEN Play = Yes
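The text describes the result of tree induction; the sketch below shows, under stated assumptions, how such a tree and its rules could be obtained automatically. It is a minimal ID3-style learner in the spirit of Quinlan (1986): it greedily splits on the attribute with the highest information gain and then prints one IF-THEN rule per root-to-leaf path. All function names are illustrative, the Table 9.1 encoding is repeated so the sketch runs on its own, and the order in which the rules are printed need not match the left-to-right layout of Figure 9.1.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["OUTLOOK", "TEMPERATURE", "HUMIDITY", "WINDY"]
ROWS = [  # Table 9.1 again: (OUTLOOK, TEMPERATURE, HUMIDITY, WINDY, Play)
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rainy", "mild", "high", "false", "Yes"),
    ("rainy", "cool", "normal", "false", "Yes"),
    ("rainy", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rainy", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rainy", "mild", "high", "true", "No"),
]
examples = [dict(zip(ATTRIBUTES + ["Play"], row)) for row in ROWS]

def entropy(exs):
    """Shannon entropy (in bits) of the Play labels in `exs`."""
    counts = Counter(ex["Play"] for ex in exs)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def information_gain(exs, attr):
    """Reduction in entropy obtained by splitting `exs` on `attr`."""
    remainder = 0.0
    for value in {ex[attr] for ex in exs}:
        subset = [ex for ex in exs if ex[attr] == value]
        remainder += len(subset) / len(exs) * entropy(subset)
    return entropy(exs) - remainder

def build_tree(exs, attrs):
    """ID3-style induction: a leaf is a label string; an internal node is
    a pair (attribute, {value: subtree})."""
    labels = {ex["Play"] for ex in exs}
    if len(labels) == 1:              # pure node: stop with a leaf
        return labels.pop()
    if not attrs:                     # attributes exhausted: majority leaf
        return Counter(ex["Play"] for ex in exs).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(exs, a))
    rest = [a for a in attrs if a != best]
    return (best, {
        value: build_tree([ex for ex in exs if ex[best] == value], rest)
        for value in {ex[best] for ex in exs}
    })

def print_rules(tree, conditions=()):
    """Turn every root-to-leaf path into an IF-THEN classification rule."""
    if isinstance(tree, str):         # leaf reached: emit one rule
        body = " AND ".join(f"{a} = {v}" for a, v in conditions)
        print(f"IF {body} THEN Play = {tree}")
        return
    attr, branches = tree
    for value, subtree in branches.items():
        print_rules(subtree, conditions + ((attr, value),))

print_rules(build_tree(examples, ATTRIBUTES))
```

On this training set the information gain of OUTLOOK (about 0.247 bits) exceeds that of the other three attributes, so the sketch places OUTLOOK at the root and recovers exactly the five rules listed above.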