Classification Models in VisMiner - Visual Data Mining: The VisMiner Approach

As the tree approaches the leaf nodes, there is sometimes not enough room to show the split criteria. In these cases, the criteria box either displays an ellipsis (“...”) or is left blank.

To see the split criteria and node contents, hover over any of the nodes.

All leaf nodes of the tree are homogeneous, each containing observations of a single category, except for the node containing two Versicolor observations and one Virginica observation. This node accounts for the one error in the confusion matrix. When the decision tree is used to make a prediction, the observation's input attribute values determine the path from the root to a leaf node. The most frequently occurring category in that leaf node then becomes the predicted value.
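The prediction procedure just described can be sketched in a few lines of Python. This is not VisMiner's implementation; it is a minimal illustration assuming a binary tree whose internal nodes test `value <= threshold`. The class names `Leaf` and `Split`, the attribute names, the thresholds, and all leaf counts except the mixed leaf (two Versicolor, one Virginica) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    # Category counts of the training observations that reached this leaf
    counts: Counter


@dataclass
class Split:
    # Hypothetical split rule: go left if observation[attribute] <= threshold
    attribute: str
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict(node: Node, observation: dict) -> str:
    # Navigate from the root to a leaf using the input attribute values
    while isinstance(node, Split):
        value = observation[node.attribute]
        node = node.left if value <= node.threshold else node.right
    # The most frequently occurring category in the leaf is the prediction
    category, _ = node.counts.most_common(1)[0]
    return category


def training_errors(node: Node) -> int:
    # Observations outside a leaf's majority category are misclassified
    if isinstance(node, Leaf):
        return sum(node.counts.values()) - max(node.counts.values())
    return training_errors(node.left) + training_errors(node.right)


# Hypothetical tree; only the mixed leaf's counts come from the text above
tree = Split("petal_length", 2.5,
             Leaf(Counter({"Setosa": 50})),
             Split("petal_width", 1.7,
                   Split("petal_length", 5.0,
                         Leaf(Counter({"Versicolor": 48})),
                         Leaf(Counter({"Versicolor": 2, "Virginica": 1}))),
                   Leaf(Counter({"Virginica": 49}))))

print(predict(tree, {"petal_length": 5.2, "petal_width": 1.5}))  # Versicolor
print(training_errors(tree))  # 1
```

Because each leaf predicts its majority category, every training observation belonging to a minority category in its leaf is misclassified. Summing those minority counts over all leaves gives the off-diagonal total of the training-set confusion matrix, which is one in this illustration.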

Exercise 5.1

Use the OliveOil dataset to generate classification models based on the acid measures.

a. Build classification models to predict Region using the decision tree, ANN, and SVM classification modelers. Note: The modelers automatically use all attributes in the dataset for model construction. Since you do not want to use Area to classify Region, you will first need to create a derived set that excludes Area, then build the models using the derived set. Look at the confusion matrices for all three models. How well do they predict the training set values?

b. Build classification models to predict Area using the decision tree, ANN, and SVM classification modelers. Look at the confusion matrices. How well do they predict the training set values? Which modeler performs best? Which Areas do the models have the most trouble predicting? Hint: The cells off the main diagonal of the matrix (excluding the totals column and row) with the tallest bars represent the observations most frequently misclassified.

c. View the tree graph for the decision tree model. Which acid best distinguishes the South Apulia oils? Describe the primary distinguishing acid characteristics of the Inland Sardinia oils.
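The hint in part (b) amounts to ranking the off-diagonal cells of a confusion matrix by count. The following is a hedged Python sketch of that idea, not a VisMiner feature: the function names `confusion_cells`, `accuracy`, and `most_confused` and the labels `A`, `B`, `C` with their actual and predicted values are hypothetical placeholders, not OliveOil results. Because only (actual, predicted) pair counts are stored, the totals row and column are excluded automatically.

```python
from collections import Counter


def confusion_cells(actual, predicted):
    # Count each (actual, predicted) pair; pairs with a == p lie on the diagonal
    return Counter(zip(actual, predicted))


def accuracy(cells):
    # Fraction of observations that fall on the main diagonal
    total = sum(cells.values())
    correct = sum(n for (a, p), n in cells.items() if a == p)
    return correct / total


def most_confused(cells, top=3):
    # Off-diagonal cells, largest count first: the "tallest bars" in the hint
    off_diagonal = [(pair, n) for pair, n in cells.items() if pair[0] != pair[1]]
    off_diagonal.sort(key=lambda item: item[1], reverse=True)
    return off_diagonal[:top]


# Hypothetical training-set categories and model predictions
actual = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
predicted = ["A", "A", "B", "B", "B", "A", "C", "B", "B", "C"]

cells = confusion_cells(actual, predicted)
print(accuracy(cells))          # 0.6
print(most_confused(cells, 1))  # [(('C', 'B'), 2)]
```

In this made-up example the model most often predicts `B` for observations whose actual category is `C`; in a confusion matrix display, that cell would carry the tallest off-diagonal bar.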

Prediction Likelihoods

So far, we have evaluated input contributions only for the decision tree models, using the tree graph. Decision trees are relatively simple structures. The structure of other models is not as easy to visualize due to the complexity