Database Reference
3) Switch back to design perspective. While there are no missing or apparently inconsistent
values in the data set, there is still some data preparation yet to do. First of all, the
User_ID is an arbitrarily assigned value for each customer. The customer doesn't use this
value for anything; it is simply a way to uniquely identify each customer in the data set. It
does not relate to each person in any way that would correlate with, or be
predictive of, their buying and technology adoption tendencies. As such, it should not be
included in the model as an independent variable.
We can handle this attribute in one of two ways. First, we can remove the attribute using a
Select Attributes operator, as was demonstrated back in Chapter 3. Alternatively, we can
try a new way of handling a non-predictive attribute: the Set Role operator. Using the
search field in the Operators tab, find Set Role and add one to each of your training and
scoring streams. In the Parameters area on the right-hand side of the screen, set the role
of the User_ID attribute to 'id'. This leaves the attribute in the data set throughout the
model, but the model won't consider it as a predictor for the label attribute. Be sure to do
this for both the training and scoring data sets, since the User_ID attribute is found in
both of them (Figure 10-3).
Figure 10-3. Setting the User_ID attribute to an 'id' role, so
it won't be considered in the predictive model.
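
For readers who think in code, the effect of an 'id' role can be sketched roughly as follows. This is only an illustration in plain Python, not how RapidMiner implements Set Role. The helper split_by_role, the label name Adopter_Class, and the predictor columns Age and Online_Shopping are hypothetical; only User_ID comes from the text above.

```python
# Minimal sketch: an attribute with the 'id' role stays with each record,
# but is kept out of the predictors handed to the model.
# All names except User_ID are assumptions made for illustration.
ROLES = {"User_ID": "id", "Adopter_Class": "label"}


def split_by_role(rows, roles):
    """Return (ids, features, labels); attributes with no special role are predictors."""
    id_col = next(name for name, role in roles.items() if role == "id")
    label_col = next((name for name, role in roles.items() if role == "label"), None)
    ids, features, labels = [], [], []
    for row in rows:
        ids.append(row[id_col])
        # Scoring rows have no label yet, so this yields None for them.
        labels.append(row.get(label_col) if label_col else None)
        # Drop every attribute that has a special role (id or label).
        features.append({k: v for k, v in row.items() if k not in roles})
    return ids, features, labels


# Hypothetical training and scoring records.
training = [
    {"User_ID": 1001, "Age": 34, "Online_Shopping": "Y", "Adopter_Class": "Early Adopter"},
    {"User_ID": 1002, "Age": 51, "Online_Shopping": "N", "Adopter_Class": "Late Majority"},
]
scoring = [
    {"User_ID": 2001, "Age": 27, "Online_Shopping": "Y"},
]

# Apply the same roles to both streams, as the step above requires.
train_ids, train_X, train_y = split_by_role(training, ROLES)
score_ids, score_X, _ = split_by_role(scoring, ROLES)
```

The identifiers remain available in train_ids and score_ids, so predictions can later be matched back to customers, while train_X and score_X contain only the predictor attributes.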