Kernel-Based Algorithms and Visualization for Interval Data Mining - Mining Complex Data


this new data type. We construct a new radial basis function (RBF) kernel for interval data, used for classification, regression and novelty detection tasks. The numerical test results are obtained on real and artificial datasets.

Furthermore, in many applications the sampled data are corrupted with noise, which makes the input data uncertain. The interval data concept can also represent this uncertainty, so kernel-based methods and SVM can also deal with uncertain data.

Although SVM gives high-quality results, interpreting these results is not easy. The support vectors found by the algorithms provide limited information. Most of the time, the user only obtains information regarding the support vectors and the accuracy, and cannot explain or understand why a model constructed by SVM makes a good prediction. Understanding the model obtained by the algorithm is as important as its accuracy, because a user who comprehends the discovered knowledge well has more confidence in it [6], [7]. Our investigation aims at using visualization methods to try to explain the SVM results. We use interactive graphical decision tree algorithms and visualization methods [6], [8] to give insight into classification, regression and novelty detection tasks with SVM. We illustrate how to combine the strengths of different visualization methods to help the user improve the comprehensibility of SVM results.

This paper is organized as follows. In section 2, we present a new Gaussian RBF kernel construction to deal with interval data. In section 3, we briefly introduce classification, regression and novelty detection for interval data with SVM algorithms and other kernel-based methods. Section 4 presents a way to explain SVM results by using interactive decision tree algorithms. In section 5, we propose an approach based on different visualization methods to try to interpret SVM results, before the conclusion and future work.

5.2 Non Linear Kernel Function for Interval Data

SVM and kernel-based methods are a powerful paradigm and have shown practical relevance for classification and regression, but the learning task is difficult to perform on large datasets. We propose to scale up their training tasks based on the interval data concept. Because large datasets are aggregated into smaller ones, we need to use a more complex data type, e.g. the interval type, instead of the standard ones.

The simplest way, depicted in figure 5.1, is to summarize large datasets into a high-level data type, e.g. clusters obtained with a clustering algorithm (e.g. k-means [9]). We can use the interval data concept to represent the clusters: an interval vector corresponds to a cluster, and the low and high values of each interval are computed from the low and high bounds of the data points inside this cluster. Then, we need to construct a non-linear kernel function for dealing with interval datasets.
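The cluster-to-interval summarization described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `clusters_to_intervals`, the toy data and the cluster labels are assumptions, and the labels are taken as given (for example, as produced by a k-means run) rather than computed here.

```python
from collections import defaultdict


def clusters_to_intervals(points, labels):
    """Summarize each cluster as an interval vector (hypothetical helper).

    points: list of equal-length numeric sequences, one per data point.
    labels: cluster id of each point, assumed to come from a clustering
            algorithm such as k-means.
    Returns {cluster_id: [(low, high), ...]} with one pair per dimension.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    # Group the data points by their cluster id.
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)

    intervals = {}
    for label, cluster in members.items():
        # zip(*cluster) iterates over the dimensions (columns) of the cluster;
        # the low and high bounds are the min and max along each dimension.
        intervals[label] = [(min(col), max(col)) for col in zip(*cluster)]
    return intervals


# Toy data (assumed): five 2-D points already assigned to two clusters.
points = [(1.0, 2.0), (1.5, 1.8), (0.9, 2.4), (8.0, 9.0), (8.5, 8.7)]
labels = [0, 0, 0, 1, 1]
intervals = clusters_to_intervals(points, labels)
print(intervals[0])  # [(0.9, 1.5), (1.8, 2.4)]
print(intervals[1])  # [(8.0, 8.5), (8.7, 9.0)]
```

Each cluster is thus replaced by a single interval vector, so a dataset of many points shrinks to one interval vector per cluster before any kernel is applied.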

We are interested in the RBF kernel function because it is general and efficient [10]. Assume we have two data points x and y ∈ R^n. The RBF kernel formula
