eReader_Adoption : This attribute exists only in the training data set. It describes
customers who purchased the previous-generation eReader. Those who purchased within a
week of the product's release are recorded in this attribute as 'Innovator'. Those who
purchased after the first week but within the first three weeks are entered as 'Early
Adopter'. Those who purchased after three weeks but within the first two months are
'Early Majority'. Those who purchased after the first two months are 'Late Majority'. This
attribute will serve as our label when we apply the model built from our training data to
our scoring data.
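The category boundaries described above can be sketched as a small Python function, outside of RapidMiner. This is only an illustration: the function name and the use of a day count are assumptions, and treating 'one week' as 7 days, 'three weeks' as 21 days and 'two months' as 60 days is our own reading of the text, not something the data set documents.

```python
def adoption_category(days_after_release: int) -> str:
    """Map the number of days between release and purchase to a label.

    Hypothetical helper; the cut-offs (7, 21, 60 days) are assumptions
    based on the attribute description, not values taken from the data.
    """
    if days_after_release < 0:
        raise ValueError("a purchase cannot precede the product's release")
    if days_after_release <= 7:    # within the first week
        return "Innovator"
    if days_after_release <= 21:   # second or third week
        return "Early Adopter"
    if days_after_release <= 60:   # assumption: two months = 60 days
        return "Early Majority"
    return "Late Majority"         # after the first two months
```

Because each check runs only when the earlier ones have failed, every purchase falls into exactly one category.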
With Richard's data and an understanding of what it means, we can now proceed to…
DATA PREPARATION
This chapter's example consists of two data sets: Chapter10DataSet_Training.csv and
Chapter10DataSet_Scoring.csv. Download these from the companion web site now, then
complete the following steps:
1) Import both data sets into your RapidMiner repository. You do not need to worry about
attribute data types because the Decision Tree operator can handle all types of data. Be
sure to designate the first row of each data set as the attribute names when you import.
Save them in the repository with descriptive names, so that you will be able to tell what
they are.
2) Drag and drop both of the data sets into a new main process window. Rename the
Retrieve objects as Training and Scoring respectively. Run your model to examine the data
and familiarize yourself with the attributes.
Figure 10-2. Meta data for the scoring data set.
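If you would like to confirm outside of RapidMiner that the two files differ only by the label attribute, a minimal Python sketch along the following lines could help. The function names are our own, and the sketch assumes that the first row of each file holds the attribute names, as specified during the import in step 1.

```python
import csv


def read_header(csv_lines):
    """Return the attribute names found in the first row of a CSV source.

    csv_lines may be an open file or any iterable of text lines.
    """
    return next(csv.reader(csv_lines))


def training_only_attributes(training_lines, scoring_lines):
    """List attributes present in the training data but absent from scoring."""
    training = read_header(training_lines)
    scoring = set(read_header(scoring_lines))
    # Preserve the column order of the training file.
    return [name for name in training if name not in scoring]
```

To use it, open Chapter10DataSet_Training.csv and Chapter10DataSet_Scoring.csv with Python's built-in open() and pass the two file objects to training_only_attributes. Based on the attribute descriptions, we would expect eReader_Adoption to be reported.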
 