Decision Trees - Data Mining for the Masses

eReader_Adoption: This attribute exists only in the training data set. It consists of data for customers who purchased the previous-gen eReader. Those who purchased within a week of the product's release are recorded in this attribute as 'Innovator'. Those who purchased after the first week but within the second or third week are entered as 'Early Adopter'. Those who purchased after three weeks but within the first two months are 'Early Majority'. Those who purchased after the first two months are 'Late Majority'. This attribute will serve as our label when we apply our training data to our scoring data.
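The four adoption categories amount to a simple rule on purchase timing. The Python sketch below spells that rule out. The book gives no exact day cut-offs, so the boundaries here are assumptions: day 0 is the release day, a week is 7 days, and "two months" is approximated as 60 days. The function name `adoption_category` is hypothetical.

```python
# Hypothetical helper: the day cut-offs are assumptions, not values from the book
# (a week = 7 days, "two months" approximated as 60 days; day 0 = release day).
INNOVATOR_DAYS = 7
EARLY_ADOPTER_DAYS = 21
EARLY_MAJORITY_DAYS = 60


def adoption_category(days_after_release: int) -> str:
    """Map days between release and purchase to an eReader_Adoption value."""
    if days_after_release < 0:
        raise ValueError("purchase cannot precede the product's release")
    if days_after_release < INNOVATOR_DAYS:        # within the first week
        return "Innovator"
    if days_after_release < EARLY_ADOPTER_DAYS:    # second or third week
        return "Early Adopter"
    if days_after_release < EARLY_MAJORITY_DAYS:   # after three weeks, within ~two months
        return "Early Majority"
    return "Late Majority"                         # after the first two months


for d in (0, 10, 30, 90):
    print(d, adoption_category(d))
```

With these assumed boundaries, purchases on days 0, 10, 30, and 90 fall into Innovator, Early Adopter, Early Majority, and Late Majority respectively.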

With Richard's data and an understanding of what it means, we can now proceed to…

DATA PREPARATION

This chapter's example consists of two data sets: Chapter10DataSet_Training.csv and Chapter10DataSet_Scoring.csv. Download these from the companion web site now, then complete the following steps:

1) Import both data sets into your RapidMiner repository. You do not need to worry about attribute data types because the Decision Tree operator can handle all types of data. As you import, be sure to designate the first row of each data set as the attribute names. Save the data sets in the repository under descriptive names so that you can tell what they are.

2) Drag and drop both data sets into a new main process window. Rename the Retrieve objects Training and Scoring, respectively. Run your model to examine the data and familiarize yourself with the attributes.
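The steps above happen in RapidMiner. For readers who want to check the files outside RapidMiner as well, the Python sketch below is a rough stand-in, not part of the book's workflow. It reads each CSV with its first row as attribute names and prints a row count and the number of distinct values per attribute, loosely analogous to a meta data view. It also confirms that the eReader_Adoption label appears only in the training data. The helper names and the tiny in-memory sample data (including the `Customer` column) are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

LABEL = "eReader_Adoption"  # label attribute named in the text


def load_with_header(stream: TextIO) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read a CSV whose first row holds the attribute names (as in step 1)."""
    reader = csv.DictReader(stream)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def summarize(name: str, fields: List[str], rows: List[Dict[str, str]]) -> None:
    """Print the row count and the number of distinct values per attribute."""
    print(f"{name}: {len(rows)} rows")
    for field in fields:
        distinct = {row[field] for row in rows}
        print(f"  {field}: {len(distinct)} distinct value(s)")


# Hypothetical stand-ins for the real files; the Customer column is invented.
training_csv = io.StringIO("Customer,eReader_Adoption\nc1,Innovator\nc2,Late Majority\n")
scoring_csv = io.StringIO("Customer\nc3\nc4\n")

train_fields, train_rows = load_with_header(training_csv)
score_fields, score_rows = load_with_header(scoring_csv)
summarize("Training", train_fields, train_rows)
summarize("Scoring", score_fields, score_rows)

# The label should be present in the training data only.
print(LABEL in train_fields, LABEL in score_fields)
```

To inspect the downloaded files, open each one with `open("Chapter10DataSet_Training.csv", newline="")` (and likewise for the scoring file) and pass the file object to `load_with_header` in place of the in-memory samples.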

Figure 10-2. Meta data for the scoring data set.
