Figure 8-2. Value ranges for the training data set's attributes.
Figure 8-3. Value ranges for the scoring data set's attributes.
3) Comparing Figures 8-2 and 8-3, we can see that the ranges are the same for all attributes
except Avg_Age. In the scoring data set, some observations have an Avg_Age slightly below
the training data set's lower bound of 15.1, and some have an Avg_Age slightly above the
training data set's upper bound of 72.2. You might think that these values are so close to
the training data set's range that it would not matter if we used our training data set to
predict heating oil usage for the homes represented by these observations. While it is likely
that such a slight deviation from the range on this attribute would not yield wildly
inaccurate results, the model was built only from values inside the training range, so its
own predictions cannot serve as evidence to support such an assumption. Thus, we will need
to remove these observations from our scoring data set. Add two Filter Examples operators,
each set to attribute_value_filter: one with the condition Avg_Age>=15.1 and the other with
Avg_Age<=72.2. Because the two filters run in sequence, only observations that satisfy both
conditions remain. When you run your model now, you should have 42,042 observations
remaining. Check the ranges again to ensure that none of the scoring attributes now have
ranges outside those of the training attributes. Then return to design perspective.
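The range check and filtering in step 3 can also be sketched outside RapidMiner. The short Python example below is only an illustration under stated assumptions: the function names attribute_ranges and keep_within_ranges are hypothetical, and the rows are made-up values chosen so that the training bounds match the 15.1 and 72.2 quoted above. It is not the actual heating oil data and not how RapidMiner implements Filter Examples.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def attribute_ranges(rows: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Return the (min, max) observed for every attribute in the rows."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for row in rows:
        for name, value in row.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


def keep_within_ranges(rows: List[Row],
                       ranges: Dict[str, Tuple[float, float]]) -> List[Row]:
    """Keep only rows whose values all lie inside the given ranges.

    Both bounds must hold for a row to survive, which mirrors chaining
    one >= filter and one <= filter in sequence.
    """
    return [
        row for row in rows
        if all(ranges[name][0] <= row[name] <= ranges[name][1]
               for name in ranges if name in row)
    ]


# Made-up rows for illustration only (not the real data sets).
training = [{"Avg_Age": 15.1}, {"Avg_Age": 40.0}, {"Avg_Age": 72.2}]
scoring = [{"Avg_Age": 14.8}, {"Avg_Age": 15.1},
           {"Avg_Age": 55.3}, {"Avg_Age": 72.9}]

train_ranges = attribute_ranges(training)
filtered = keep_within_ranges(scoring, train_ranges)

print(train_ranges)
print(filtered)
```

Here the two out-of-range scoring rows (14.8 and 72.9) are dropped, while a row sitting exactly on a bound (15.1) is kept, just as the >= and <= conditions would keep it.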
4) As was the case with discriminant analysis, linear regression is a predictive model, and thus
will need an attribute to be designated as the label—this is the target, the thing we want to
predict. Search for the Set Role operator in the Operators tab and drag it into your training