Decision Trees - Data Mining for the Masses


3) Switch back to design perspective. While there are no missing or apparently inconsistent

values in the data set, there is still some data preparation yet to do. First of all, the

User_ID is an arbitrarily assigned value for each customer. The customer doesn't use this

value for anything; it is simply a way to uniquely identify each customer in the data set. It

is not something that relates to each person in any way that would correlate to, or be

predictive of, their buying and technology adoption tendencies. As such, it should not be

included in the model as an independent variable.

We can handle this attribute in one of two ways. First, we can remove the attribute using a

Select Attributes operator, as was demonstrated back in Chapter 3. Alternatively, we can

try a new way of handling a non-predictive attribute. This is accomplished using the Set

Role operator. Using the search field in the Operators tab, find and add Set Role operators

to both your training and scoring streams. In the Parameters area on the right-hand side of

the screen, set the role of the User_ID attribute to 'id'. This will leave the attribute in the

data set throughout the model, but the model won't consider the attribute as a predictor for the

label attribute. Be sure to do this for both the training and scoring data sets, since the

User_ID attribute is found in both of them (Figure 10-3).

Figure 10-3. Setting the User_ID attribute to an 'id' role, so

it won't be considered in the predictive model.
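
For readers who think in code, the difference between the two approaches can be sketched outside of RapidMiner. The following is only an illustration in plain Python, not how RapidMiner implements Set Role: the rows, the Age and Adoption columns, and the helper functions drop_attribute and predictors are all hypothetical, and only User_ID comes from the text above.

```python
# Hypothetical sample rows; only User_ID is taken from the text,
# the Age and Adoption columns are invented for illustration.
training = [
    {"User_ID": 101, "Age": 34, "Adoption": "Early"},
    {"User_ID": 102, "Age": 58, "Adoption": "Late"},
]


def drop_attribute(rows, name):
    """Option 1: remove the attribute entirely (similar in spirit to Select Attributes)."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


# Option 2: keep the attribute, but tag it with a special role
# (similar in spirit to Set Role). Attributes without a role are predictors.
roles = {"User_ID": "id", "Adoption": "label"}


def predictors(rows, roles):
    """Return the names of attributes that have no special role."""
    return [name for name in rows[0] if name not in roles]


dropped = drop_attribute(training, "User_ID")
predictor_names = predictors(training, roles)
```

With the role-based approach, User_ID is still present in every row, so predictions can later be traced back to individual customers, yet it is left out of the list of predictors. With the removal approach, the identifier is gone from the data altogether.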
