set. That may seem a little confusing, but our chapter example should help clarify it, so let's move
on to the next CRISP-DM step.
DATA PREPARATION
This chapter's example departs slightly from those in other chapters. Instead of a single example
data set in CSV format for you to download, there are two this time. You can access the Chapter 7
data sets on the book's companion web site (https://sites.google.com/site/dataminingforthemasses/).
They are labeled Chapter07DataSet_Scoring.csv and Chapter07DataSet_Training.csv. Go ahead and
download both now, and import them into your RapidMiner repository as you have in past chapters.
As you import them, be sure to designate the first row of each data set as the attribute names.
Also give each data set a descriptive name, so that you can tell it belongs to Chapter 7 and can
tell the training data set apart from the scoring data set. After importing them, drag only the
training data set into a new process window, and then follow the steps below to prepare for and
create a discriminant analysis data mining model.
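For readers who want to inspect the two files outside RapidMiner, the import described above can be sketched in Python. This is only a minimal illustration, not RapidMiner's own mechanism: csv.DictReader treats the first row as attribute names (mirroring the first-row option in the import wizard), and the two data sets are kept under separate descriptive keys. The key names Chapter07_Training and Chapter07_Scoring and the inline sample columns are hypothetical placeholders, not the real Chapter 7 attributes.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_with_header(source: TextIO) -> List[Dict[str, str]]:
    """Read CSV rows, using the first row as the attribute names."""
    return list(csv.DictReader(source))


def load_chapter7(training: TextIO, scoring: TextIO) -> Dict[str, List[Dict[str, str]]]:
    # Hypothetical descriptive keys keep the training and scoring data
    # sets apart, much as distinct repository names do in RapidMiner.
    return {
        "Chapter07_Training": load_with_header(training),
        "Chapter07_Scoring": load_with_header(scoring),
    }


if __name__ == "__main__":
    # With the downloaded files you would open them instead, e.g.:
    # with open("Chapter07DataSet_Training.csv", newline="") as tr, \
    #      open("Chapter07DataSet_Scoring.csv", newline="") as sc:
    #     data = load_chapter7(tr, sc)
    # Placeholder inline samples (hypothetical columns) so the sketch runs alone.
    training_csv = io.StringIO("attr_a,attr_b,label\n1,2,x\n3,4,y\n")
    scoring_csv = io.StringIO("attr_a,attr_b\n5,6\n")
    data = load_chapter7(training_csv, scoring_csv)
    for name, rows in data.items():
        # Print each data set's name, row count, and attribute names.
        print(name, len(rows), list(rows[0].keys()))
```

Keeping both data sets in one mapping with clearly different names serves the same purpose as the descriptive repository names: it is hard to accidentally score the training data or train on the scoring data.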
1) Thus far, when we have added data to a new process, we have left the operator with the label
RapidMiner assigns by default, 'Retrieve'. For the first time, our model will contain more than
one Retrieve operator, because we have both a training data set and a scoring data set. To make
the two easy to tell apart, let's start by renaming the Retrieve operator for the training data
set that you dragged and dropped into your main process window. Right-click on this operator and
select Rename. You will then be able to type in a new name for the operator. For this example, we
will name the operator 'Training', as depicted in Figure 7-1.