Prediction Algorithms for Data Mining - Visual Data Mining: The VisMiner Approach - page 97

Databases Reference

In-Depth Information

Tests for model significance - at this time, there is no published test of

model significance. By significance, we mean: “Can we measure the

probability that the model predictions are anything more than chance

occurrences?” Unlike the well-studied linear regression models, there is

no “null hypothesis” type test. When using ANNs, the only work-around is

to build the model with enough observations that we feel comfortable that

the model actually will predict accurately when applied to new data. The use

of validation data, discussed in Chapter 5, can give us that assurance.

Support Vector Machines

The algorithm for support vector machines (SVM) was developed to improve

classifier performance. Look back at the scatter plot of Figure 4.3. If a decision

tree algorithm is applied to this dataset, it will perform poorly. This is because

the algorithm considers input attributes in isolation when searching for the best

split attribute. As a result, the splits are always parallel to one of the attribute

axes. On the other hand, in its simplest form, the SVM computationally locates

that

line that will best split

the observations according to classification

attribute values.

In the search for that dividing line, there are a number of issues that the

algorithm needs to deal with. Consider Figure 4.10 - a slightly modified version

of Figure 4.3. Three candidate dividing lines are drawn separating the red dots

(class A) from the blue dots (class B). Each would work equally well as a

class A

class B

Y

X

Figure 4.10 Potential Dividing Lines

Next Page

Visual Data Mining: The VisMiner Approach

Search WWH ::

Custom Search

Home