Decision Trees
As we explained above, we want to build a decision tree model that predicts
whether or not a new customer is likely to place an order with a total amount
of more than $3,500. For this we use the TargetCustomers view. The decision
tree algorithm requires us to indicate the class attribute to be predicted, the
attributes that must be used as input, and the attributes that will be ignored
by the algorithm but that can be used for visualization of the results. The
TargetCustomers view includes the attribute HighValueCust. A value of '1' in
this attribute means that the customer is a high-value one; otherwise, the
variable takes the value '0'. This is the variable to be predicted.
Figure 9.7 shows how the attributes to be used for building the model
are defined in Analysis Services, both for the decision tree model explained
in this section and the clustering model explained in the next section. Note
that the attribute HighValueCust is defined as PredictOnly. Also, for example,
BusinessType will be used as a predictor variable; therefore, it is defined as
Input. Finally, Address will only be used for visualization purposes, and it is
marked as Ignore.
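The three roles can be mimicked outside Analysis Services. The following is a minimal sketch using scikit-learn instead; the column names mirror the TargetCustomers view, but the rows, the 0/1 encoding of BusinessType, and the helper structure are invented for illustration.

```python
# Sketch of the PredictOnly / Input / Ignore roles with scikit-learn.
# Rows are invented; column names mirror the TargetCustomers view.
from sklearn.tree import DecisionTreeClassifier

rows = [
    # (BusinessType,  YearEstablished, Address,       HighValueCust)
    ("Restaurant",    1978, "12 Elm St.",  1),
    ("Grocery Store", 1981, "3 Oak Ave.",  0),
    ("Restaurant",    1985, "77 Pine Rd.", 1),
    ("Grocery Store", 1992, "5 Main St.",  0),
    ("Restaurant",    1979, "9 Lake Dr.",  1),
    ("Grocery Store", 1988, "41 Hill St.", 0),
]

# Input role: predictor attributes. BusinessType is encoded as 0/1 here
# rather than one-hot encoded, just to keep the sketch short.
X = [[1 if bt == "Restaurant" else 0, year] for bt, year, _, _ in rows]

# PredictOnly role: the class attribute.
y = [label for *_, label in rows]

# Ignore role: Address is simply left out of X; it could still be joined
# back to the records when visualizing the results.
model = DecisionTreeClassifier(random_state=0).fit(X, y)
```

In this toy data set BusinessType perfectly separates the classes, so the fitted tree classifies a restaurant established in any year as a high-value customer.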
With this input, the model is deployed. Figure 9.8 shows an excerpt of
the decision tree obtained. We can see that the root (the whole data set) is
first split using the attribute YearEstablished , resulting in six subsets. Then,
the nodes are further split according to the distribution of the HighValueCust
values. When the class distribution in a node becomes stable, splitting stops.
We can see, for example, that all the records in the path
YearEstablished >= 1975 and YearEstablished < 1990 and
BusinessType = 'Restaurant' have HighValueCust = 1. However, if
BusinessType = 'Grocery Store', the algorithm continues splitting.
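The stopping rule above can be illustrated with a small, self-contained impurity computation (the entropy criterion is one common choice; the labels below are invented). A node whose records all share one HighValueCust value has entropy 0 and is not split further, like the Restaurant leaf; a mixed node, like the Grocery Store one, remains a candidate for splitting.

```python
# Illustration of the stopping rule: a pure node (entropy 0) is a leaf,
# while a mixed node is a candidate for further splitting.
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

mixed = [1, 1, 0, 0, 1, 0]   # like the Grocery Store node: keep splitting
pure = [1, 1, 1, 1]          # like the Restaurant leaf: splitting stops

print(entropy(mixed))  # 1.0, maximal impurity for two equally frequent classes
print(entropy(pure))   # 0.0, a leaf
```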
Fig. 9.7 Attributes for the decision tree and the clustering models in the Northwind
case study