Decision Trees
As we explained above, we want to build a decision tree model that predicts whether or not a new customer is likely to place an order with a total amount of more than $3,500. For this we use the TargetCustomers view. The decision tree algorithm requires us to indicate the class attribute to be predicted, the attributes to be used as input, and the attributes that will be ignored by the algorithm but can be used for visualizing the results. The TargetCustomers view includes the attribute HighValueCust. A value of '1' in this attribute means that the customer is a high-value one; otherwise, the attribute takes the value '0'. This is the variable to be predicted.
Figure 9.7 shows how the attributes used for building the model are defined in Analysis Services, both for the decision tree model explained in this section and for the clustering model explained in the next section. Note that the attribute HighValueCust is defined as PredictOnly. Also, for example, BusinessType will be used as a predictor variable and is therefore defined as Input. Finally, Address will only be used for visualization purposes and is marked as Ignore.
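Outside the Analysis Services interface, the same three attribute roles can be mimicked when preparing data by hand. The following pure-Python sketch is only an illustration of the idea; the customer records and the helper function are invented for this example, and the role names simply mirror those in Figure 9.7.

```python
# Hypothetical mirror of the attribute roles shown in Figure 9.7.
# PredictOnly: the class attribute; Input: predictor variables;
# Ignore: kept for visualization but hidden from the algorithm.
ROLES = {
    "HighValueCust": "PredictOnly",
    "BusinessType": "Input",
    "YearEstablished": "Input",
    "Address": "Ignore",
}

# Made-up customer records for illustration only.
customers = [
    {"HighValueCust": 1, "BusinessType": "Restaurant",
     "YearEstablished": 1980, "Address": "12 Main St"},
    {"HighValueCust": 0, "BusinessType": "Grocery Store",
     "YearEstablished": 1995, "Address": "3 Oak Ave"},
]

def split_by_role(record):
    """Return (class value, input features) for one record,
    dropping Ignore attributes from the model's view of the data."""
    target_attr = next(a for a, r in ROLES.items() if r == "PredictOnly")
    inputs = {a: v for a, v in record.items() if ROLES.get(a) == "Input"}
    return record[target_attr], inputs

y, x = split_by_role(customers[0])
# y is the HighValueCust label; x contains only the Input attributes,
# so Address never reaches the learning algorithm.
```

In Analysis Services itself these roles are set declaratively on the mining structure rather than in code, but the effect on what the algorithm sees is the same.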
With this input, the model is deployed. Figure 9.8 shows an excerpt of the decision tree obtained. We can see that the root (the whole data set) is first split using the attribute YearEstablished, resulting in six subsets. Then, the nodes are further split according to the distribution of the HighValueCust values. When the class contents of a node are stable, the splitting stops. We can see, for example, that all the records in the path YearEstablished >= 1975 and YearEstablished < 1990 and BusinessType = 'Restaurant' have HighValueCust = 1. However, if BusinessType = 'Grocery Store', the algorithm continued splitting.
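The stopping behavior described above (a pure node such as the 'Restaurant' path stops splitting, while a mixed node such as the 'Grocery Store' path is split further) can be sketched with a standard impurity measure. This is a minimal illustrative sketch using the Gini index, not the actual Microsoft Decision Trees scoring method; the toy label lists are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels:
    0.0 for a pure node, higher for mixed nodes."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def should_split(labels, threshold=0.0):
    """Keep splitting only while impurity exceeds the threshold
    (here: stop as soon as the node is pure)."""
    return gini(labels) > threshold

# Toy HighValueCust labels mirroring the two paths in the text.
restaurant_node = [1, 1, 1, 1]      # all high-value: pure, stop
grocery_node = [1, 0, 1, 0, 0]      # mixed: continue splitting

# should_split(restaurant_node) is False; should_split(grocery_node) is True.
```

Real implementations also stop on other conditions, such as a minimum node size, but the purity criterion is the one visible in the tree excerpt of Figure 9.8.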
Fig. 9.7 Attributes for the decision tree and the clustering models in the Northwind case study