product in the formula, we obtain non-linear PCA (KPCA). An example of the
visualization of the Segment interval dataset (class 7 against all) with KPCA
using the RBF kernel function is shown in figure 5.3.
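To make the kernel substitution concrete, the following is a minimal NumPy sketch of KPCA with an RBF kernel: the kernel matrix replaces the linear inner products, is centered in feature space, and its leading eigenvectors give the projections. The random dataset, gamma value, and number of components are illustrative assumptions, not the chapter's actual settings.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Pairwise squared Euclidean distances, then the RBF kernel matrix
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=0.5):
    """Project X onto the top principal directions in RBF feature space."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigen-decomposition; eigh returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Scale eigenvectors so the feature-space directions have unit norm
    alphas = vecs[:, :n_components] / np.sqrt(np.maximum(vals[:n_components], 1e-12))
    return Kc @ alphas  # 2-D projections of the training points

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Z = kernel_pca(X, n_components=2)
print(Z.shape)  # (50, 2)
```

Because the kernel matrix is centered, the resulting projections are zero-mean, which is what a 2D visualization such as figure 5.3 plots.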
In linear FDA, we consider projecting all the multidimensional data onto a
generic direction w , and then separately observing the mean and the variance of
the projections of the two classes. By substituting the kernel function for a linear
inner product into the linear FDA formula, we have non-linear FDA (KFDA).
An example of the visualization of the Segment interval dataset (class 7 against
all) with KFDA using the RBF kernel function is shown in figure 5.4.
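The same substitution can be sketched for KFDA. In the two-class dual formulation, the discriminant direction is expressed through coefficients alpha obtained from the kernel class means and a within-class scatter matrix built from kernel columns; the regularization constant mu and the synthetic data below are assumptions for illustration, not the chapter's settings.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2 * A @ B.T)
    return np.exp(-gamma * d2)

def kernel_fda(X, y, gamma=0.5, mu=1e-3):
    """Binary kernel Fisher discriminant: returns dual coefficients alpha.
    New points project as rbf_kernel(X, X_new).T @ alpha."""
    K = rbf_kernel(X, X, gamma)
    n = len(X)
    M, N = [], np.zeros((n, n))
    for c in (0, 1):
        Kc = K[:, y == c]                     # kernel columns of class c
        l = Kc.shape[1]
        M.append(Kc.mean(axis=1))             # class mean in the dual space
        # Within-class scatter contribution of class c
        N += Kc @ (np.eye(l) - np.full((l, l), 1.0 / l)) @ Kc.T
    # Regularized scatter; solving this maximizes the Rayleigh quotient
    return np.linalg.solve(N + mu * np.eye(n), M[0] - M[1])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (30, 3)), rng.normal(1.0, 1.0, (30, 3))])
y = np.array([0] * 30 + [1] * 30)
alpha = kernel_fda(X, y)
proj = rbf_kernel(X, X).T @ alpha             # 1-D projections of training data
```

The projections `proj` separate the two class means, which is exactly what a KFDA visualization such as figure 5.4 displays along the discriminant axis.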
These kernel-based methods can also be extended to learning models in which the input data are corrupted by noise, e.g. data sampled with measurement noise, which makes the input uncertain. Interval data can represent this kind of uncertainty, so the kernel-based methods and the SVM using the RBF kernel function from section 2 can also deal with uncertain data.
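One way to apply an RBF kernel directly to interval data is to replace the Euclidean distance with a dissimilarity between intervals. The sketch below uses the per-dimension Hausdorff distance between intervals, combined Euclidean-style; this specific combination rule, like the data and gamma value, is an assumption for illustration rather than the construction used in section 2.

```python
import numpy as np

def interval_rbf_kernel(A, B, gamma=0.5):
    """RBF kernel on interval data via a Hausdorff-style distance.

    A, B: arrays of shape (n, d, 2) -- each cell holds [low, high].
    Per dimension, d_H([a,b],[c,e]) = max(|a-c|, |b-e|); the per-dimension
    distances are combined Euclidean-style before exp(-gamma * d^2).
    """
    n, m = A.shape[0], B.shape[0]
    K = np.empty((n, m))
    for i in range(n):
        # Hausdorff distance per dimension against every row of B
        dh = np.maximum(np.abs(A[i, :, 0] - B[:, :, 0]),
                        np.abs(A[i, :, 1] - B[:, :, 1]))
        K[i] = np.exp(-gamma * np.sum(dh**2, axis=1))
    return K

rng = np.random.default_rng(2)
lo = rng.normal(size=(10, 3))
X = np.stack([lo, lo + rng.uniform(0, 1, (10, 3))], axis=2)  # (10, 3, 2)
K = interval_rbf_kernel(X, X)
```

Since the distance of an interval vector to itself is zero, the kernel matrix has ones on its diagonal, as an RBF kernel should.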
5.4 Inductive Rules Extraction for Explaining SVM Results
Although SVM algorithms have been shown to build accurate models, their results may be very difficult to understand. Most of the time, the user only obtains information regarding the support vectors, which are used as a "black box" to classify the data with good accuracy. The user does not know how the SVM model works. For many data mining applications, understanding the model obtained by the algorithm is as important as its accuracy, yet up to now very few methods have been proposed [6], [8], [22].
We propose here to use interactive decision tree algorithms [23], [24] to try to explain the SVM results. The SVM's classification behavior becomes understandable through IF-THEN rules extracted intuitively from the graphical representation of the decision trees, which can be easily interpreted by humans.
Figure 5.5 is an example of the inductive rule extraction explaining support
vector classification results with the Segment interval dataset. The SVM algo-
rithm using the RBF kernel function classifies the class 7 (considered as +1 class)
against all other classes (considered as -1 class) with 100.00 % accuracy. CIAD
uses 2D scatter-plot matrices [25] for visualizing interval data: the data points
are displayed in all possible pair-wise combinations of dimensions in 2D scatter-
plot matrices. For n
1) / 2
matrices. A data point in two interval dimensions is represented by a cross and
color corresponds to the class.
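The panel count comes from enumerating all unordered pairs of dimensions; a short sketch with an illustrative dimension count:

```python
from itertools import combinations

n = 4  # number of interval dimensions (illustrative)
pairs = list(combinations(range(n), 2))  # every unordered pair of dimensions
print(len(pairs), pairs)  # 6 pairs = n*(n-1)/2, one 2D scatter plot per pair
```

Each pair indexes one 2D scatter plot in the matrix, so n = 4 dimensions yield 6 panels.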
The user interactively chooses the best separating split (parallel to an axis) to construct the decision tree, either based on human pattern-recognition capabilities or with the help of automatic algorithms. The obtained decision tree, having 4 leaves (corresponding to 4 rules), can explain the SVM model. One rule is created for each path from the root to a leaf: the dimension tests along the path are combined in a conjunction, and the leaf node holds the class prediction.
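The path-to-rule conversion can be sketched as follows. The tree below is a hypothetical 4-leaf tree over invented interval-feature names (x1_max, x3_min); only the extraction procedure itself reflects the text.

```python
# A small hand-built decision tree: internal nodes test one dimension,
# leaves hold the predicted class (+1 = class 7, -1 = all other classes).
# Feature names and thresholds are hypothetical, for illustration only.
tree = {
    "feature": "x1_max", "threshold": 0.42,
    "left": {"class": -1},
    "right": {
        "feature": "x3_min", "threshold": 0.18,
        "left": {
            "feature": "x1_max", "threshold": 0.75,
            "left": {"class": -1},
            "right": {"class": +1},
        },
        "right": {"class": +1},
    },
}

def extract_rules(node, conditions=()):
    """One IF-THEN rule per root-to-leaf path; conditions form a conjunction."""
    if "class" in node:  # leaf: emit the accumulated rule
        body = " AND ".join(conditions) or "TRUE"
        return [f"IF {body} THEN class = {node['class']:+d}"]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + (f"{f} <= {t}",))
            + extract_rules(node["right"], conditions + (f"{f} > {t}",)))

rules = extract_rules(tree)
for r in rules:
    print(r)
# 4 leaves -> 4 rules, e.g. "IF x1_max <= 0.42 THEN class = -1"
```

Each rule is a conjunction of the axis-parallel splits encountered on the way down, terminated by the leaf's class prediction, matching the construction described above.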