DATA PREPARATION
You should already have downloaded and imported the Chapter 4 data set, but if not, you can get
it from the book's companion web site (https://sites.google.com/site/dataminingforthemasses/).
Download and import the Chapter 8 data set from the companion web site as well. Once you
have both the Chapter 4 and Chapter 8 data sets imported into your RapidMiner data repository,
complete the following steps:
1) Drag and drop both data sets into a new process window in RapidMiner. Rename the
Chapter 4 data set to 'Training (CH4)' and the Chapter 8 data set to 'Scoring (CH8)'.
Connect both out ports to res ports, as shown in Figure 8-1, and then run your model.
Figure 8-1. Using both Chapter 4 and 8 data sets to set up a linear regression model.
2) Figures 8-2 and 8-3 show side-by-side comparisons of the training and scoring data sets.
When using linear regression as a predictive model, it is extremely important to remember
that the ranges for all attributes in the scoring data must fall within the ranges of the
corresponding attributes in the training data. A model built from the training data cannot
be relied upon to predict the target attribute for observations whose values fall outside
the range of values seen in the training data set.
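The range comparison in step 2 can be sketched outside RapidMiner as a simple min/max check. This is a minimal Python sketch, assuming each data set is a list of rows keyed by attribute name; the functions `training_ranges` and `out_of_range`, the attribute names, and the values are hypothetical placeholders, not the Chapter 4 or Chapter 8 data.

```python
from typing import Dict, List, Tuple

# Each row maps an attribute name to a numeric value (assumed layout).
Row = Dict[str, float]


def training_ranges(training: List[Row], attributes: List[str]) -> Dict[str, Tuple[float, float]]:
    """Return the (min, max) observed in the training rows for each attribute."""
    return {
        attr: (min(row[attr] for row in training), max(row[attr] for row in training))
        for attr in attributes
    }


def out_of_range(scoring: List[Row], ranges: Dict[str, Tuple[float, float]]) -> List[Tuple[int, str, float]]:
    """List (row_index, attribute, value) for every scoring value outside its training range.

    Values equal to the training min or max count as in range.
    """
    problems = []
    for i, row in enumerate(scoring):
        for attr, (lo, hi) in ranges.items():
            value = row[attr]
            if value < lo or value > hi:
                problems.append((i, attr, value))
    return problems


if __name__ == "__main__":
    attrs = ["attr_a", "attr_b"]  # hypothetical attribute names
    training = [
        {"attr_a": 10.0, "attr_b": 1.0},
        {"attr_a": 20.0, "attr_b": 3.0},
        {"attr_a": 15.0, "attr_b": 2.0},
    ]
    scoring = [
        {"attr_a": 12.0, "attr_b": 2.5},  # within both training ranges
        {"attr_a": 25.0, "attr_b": 0.5},  # both values outside the training ranges
    ]
    ranges = training_ranges(training, attrs)
    print(ranges)                          # {'attr_a': (10.0, 20.0), 'attr_b': (1.0, 3.0)}
    print(out_of_range(scoring, ranges))   # [(1, 'attr_a', 25.0), (1, 'attr_b', 0.5)]
```

Any row this check flags is an observation for which a linear regression prediction would be an extrapolation beyond the training data.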