\[
\text{s.t.}\quad \sum_{i=1}^{m} (\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C \quad (i = 1, \ldots, m) \tag{5.6}
\]
The solution of (5.6) gives $\alpha_i$ and $\alpha_i^{*}$. Thus the regression function is given by:
\[
f(x) = \sum_{i=1}^{SV} (\alpha_i - \alpha_i^{*}) \, K(x, x_i) + b
\]
where the scalar b is determined by the support vectors.
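To make the formula concrete, the following is a minimal sketch of evaluating this regression function, assuming the dual coefficients $\alpha_i$, $\alpha_i^{*}$ and the bias b have already been obtained by solving (5.6); the function names and the generic kernel argument are illustrative only, not part of any particular toolkit.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    # Standard RBF kernel on continuous vectors: K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svr_predict(x, support_vectors, alpha, alpha_star, b, kernel=rbf_kernel):
    # f(x) = sum_i (alpha_i - alpha_i^*) * K(x, x_i) + b, summed over the support vectors
    return sum((a - a_s) * kernel(x, x_i)
               for a, a_s, x_i in zip(alpha, alpha_star, support_vectors)) + b
```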
A survey [14] and reference [15] provide more details about SVMs and other kernel-based learning methods.
These SVMs deal only with continuous data. To handle interval data, no algorithmic changes are required beyond substituting the RBF kernel function for interval data described in Section 2 into the classical SVM algorithms, including SVC, one-class SVM and SVR (a hedged illustration of such a kernel is sketched below). All the benefits of the classical SVMs are retained, so they can be used to deal with interval data.
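The exact kernel for interval data is the one defined in Section 2 and is not reproduced here. As an illustration only, the sketch below plugs a Hausdorff-style distance between intervals into the usual RBF form, which is one common way to build such a kernel; the distance choice and function names are assumptions, not the definition used in this work.

```python
import numpy as np

def interval_rbf_kernel(lo_x, hi_x, lo_z, hi_z, gamma=0.5):
    # Illustrative interval kernel (assumption): replace the squared Euclidean
    # distance in the RBF kernel by a Hausdorff-style distance between intervals,
    # d([a, b], [c, d]) = max(|a - c|, |b - d|), accumulated over the features.
    d2 = np.sum(np.maximum(np.abs(lo_x - lo_z), np.abs(hi_x - hi_z)) ** 2)
    return np.exp(-gamma * d2)
```

Because the change is confined to the kernel, a function of this shape can be dropped into SVC, one-class SVM and SVR without touching the optimization routines.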
For the evaluation of our proposed approach, we have added the new non-
linear kernel for interval data to the publicly available toolkit, LibSVM [16]. The
software program is able to deal with interval data in classification, regression
and novelty detection tasks. To apply the SVM algorithms to the multi-class classification problem (more than two classes), LibSVM uses the one-against-one strategy. Assume that we have k classes; LibSVM then constructs k(k-1)/2 models, each of which separates the i-th class from the j-th class. To predict the class of a new data point, LibSVM simply predicts with each model and finds out which one separates the point furthest into the positive region (a minimal sketch of this step is given below). We have used datasets from Statlog [17],
the UCI machine learning repository [18], regression datasets [19] and Delve [20].
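As referenced above, here is a minimal sketch of the one-against-one prediction step, assuming each of the k(k-1)/2 models exposes a signed decision value (positive for its first class, negative for its second); the data structures are illustrative and are not LibSVM internals.

```python
def one_vs_one_predict(x, models):
    # models: list of ((class_i, class_j), decision_function) pairs, one per class pair.
    # Following the description above, pick the class whose model pushes x
    # furthest into the positive region, i.e. has the largest margin for x.
    best_class, best_margin = None, float("-inf")
    for (class_i, class_j), decision in models:
        value = decision(x)
        winner = class_i if value >= 0 else class_j
        if abs(value) > best_margin:
            best_class, best_margin = winner, abs(value)
    return best_class
```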
By using the k-means algorithm [9], the large datasets are aggregated into smaller ones. A data point in the interval datasets corresponds to a cluster; the low and high values of an interval are computed from the cluster's data points (a minimal aggregation sketch is given below). Some other methods for creating interval data can be found in [5].
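The sketch below shows one way such an aggregation could be done, assuming scikit-learn's KMeans and taking the per-feature minimum and maximum of each cluster as the interval bounds; the exact aggregation rule used in the experiments may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregate_to_intervals(X, n_clusters):
    # Cluster the continuous data, then turn each cluster into one interval-valued
    # point: per feature, [min, max] over the cluster's members (an assumed rule).
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    lows, highs = [], []
    for c in range(n_clusters):
        members = X[labels == c]
        lows.append(members.min(axis=0))
        highs.append(members.max(axis=0))
    return np.array(lows), np.array(highs)
```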
Furthermore, we generated an uncertain dataset for evaluating our algorithm. This dataset, called Ringnoise, is 4-dimensional with 2 classes: class 1 is multivariate normal with mean 0 and covariance 4 times the identity matrix, and class 2 has unit covariance and mean (0.5, 0.5, 0.5, 0.5). Gaussian noise is then added with mean (0, 0, 0, 0) and covariance matrix σ_i I, where σ_i is randomly chosen from [0.1, 0.8] and I denotes the 4×4 identity matrix. The interval data concept can also represent the uncertainty in this dataset; a generation sketch is given below.
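A minimal sketch of how a Ringnoise-style dataset could be generated with NumPy, following the description above; the sample counts and the exact scheme for drawing each σ_i are assumptions, not details given in the text.

```python
import numpy as np

def make_ringnoise(n_per_class=500, seed=0):
    rng = np.random.default_rng(seed)
    dim = 4
    # Class 1: N(0, 4I); class 2: N((0.5, 0.5, 0.5, 0.5), I).
    x1 = rng.multivariate_normal(np.zeros(dim), 4 * np.eye(dim), n_per_class)
    x2 = rng.multivariate_normal(np.full(dim, 0.5), np.eye(dim), n_per_class)
    X = np.vstack([x1, x2])
    y = np.array([1] * n_per_class + [2] * n_per_class)
    # Gaussian noise with mean 0 and covariance sigma_i * I, sigma_i ~ U[0.1, 0.8],
    # drawn independently for each point (an assumption about how sigma_i is sampled).
    sigmas = rng.uniform(0.1, 0.8, size=len(X))
    X += rng.standard_normal(X.shape) * np.sqrt(sigmas)[:, None]
    return X, y, sigmas
```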
Table 5.1 presents the dataset descriptions and their aggregations (interval data). We report the cross-validation accuracy of the classification results and the mean squared error of the regression results in Table 5.2. The results of the novelty detection task are presented in Table 5.3, together with the number of outliers (the points furthest from the other data points in the dataset). To our knowledge, there is no other available algorithm able to deal with interval data in non-linear
 