this new data type. We construct a new radial basis kernel function (RBF) for
interval data used for classification, regression and novelty detection tasks. The
numerical test results are obtained on real and artificial datasets.
Furthermore, for many applications, sampling data corrupted with noise makes
the input data uncertain. The interval data concept can also represent this uncer-
tainty. So the kernel-based methods and SVM can also deal with uncertain data.
Although SVM gives high-quality results, interpreting these results is not
easy. The support vectors found by the algorithms provide limited information.
Most of the time, the user only obtains information regarding support vectors
and accuracy, and cannot explain or understand why a model constructed by SVM
makes a good prediction. Understanding the model obtained by the algorithm is
as important as its accuracy, because it gives the user a good comprehension
of the knowledge discovered and more confidence in this knowledge [6], [7].
Our investigation aims at using visualization methods to try to explain the
SVM results. We use interactive graphical decision tree algorithms and
visualization methods [6], [8] to give an insight into classification,
regression and novelty detection tasks with SVM. We illustrate how to combine
some strengths of different visualization methods to help the user improve
the comprehensibility of SVM results.
This paper is organized as follows. In section 2, we present a new Gaussian
RBF kernel construction to deal with interval data. In section 3, we briefly
introduce classification, regression and novelty detection for interval data with
SVM algorithms and other kernel-based methods. Section 4 presents a way to
explain SVM results by using interactive decision tree algorithms. We propose
to use an approach based on different visualization methods to try to interpret
SVM results in section 5 before the conclusion and future work.
5.2 Non Linear Kernel Function for Interval Data
SVM and kernel-based methods are a powerful paradigm and have shown practical
relevance for classification and regression, but the learning task is not easy
to perform on large datasets. We propose to scale up their training tasks
based on the interval data concept. When large datasets are aggregated into
smaller ones, we need to use a more complex data type, e.g. the interval type,
instead of standard ones.
The simplest way, depicted in figure 5.1, is to summarize large datasets into
a high-level data type, e.g. clusters obtained with a clustering algorithm
(e.g. k-means [9]). We can use the interval data concept to represent the
clusters: an interval vector corresponds to a cluster, and the low and high
values of each interval are computed from the lower and upper bounds of the
data points inside this cluster. Then, we need to construct a non-linear
kernel function for dealing with interval datasets.
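The summarization step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names kmeans and clusters_to_intervals, the toy data, and the evenly spaced initialization of the centers are all assumptions made for the example. It runs a plain Lloyd's k-means and then turns each cluster into one interval vector by taking the per-dimension minimum and maximum of its members.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: centers start at evenly spaced input points, a
    simplification chosen only to keep this sketch deterministic.
    """
    n = len(points)
    centers = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def clusters_to_intervals(points, labels):
    """Build one interval vector per cluster.

    Each interval vector is a list of (low, high) pairs, one pair per
    dimension, bounding all data points assigned to that cluster.
    """
    intervals = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        intervals[c] = [(min(col), max(col)) for col in zip(*members)]
    return intervals


# Hypothetical toy data: two well-separated groups in R^2.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
labels = kmeans(points, k=2)
intervals = clusters_to_intervals(points, labels)
for cluster, box in intervals.items():
    print(cluster, box)
```

Each of the six points is replaced by membership in one of two interval vectors, so a kernel method trained on the summary sees two interval-valued examples instead of six standard ones.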
We are interested in the RBF kernel function because it is general and
efficient [10]. Assume we have two data points x and y ∈ R^n. The RBF kernel
formula
 