product in the formula, we obtain non-linear PCA (KPCA). An example of the
visualization of the Segment interval dataset (class 7 against all) with KPCA
using the RBF kernel function is shown in figure 5.3.
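To make the kernel substitution concrete, the following is a minimal NumPy sketch of KPCA with an RBF kernel: the kernel matrix replaces the linear inner products, is centered in feature space, and its leading eigenvectors give the projections. The random dataset, gamma value, and number of components are illustrative assumptions, not the chapter's actual settings.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Pairwise squared Euclidean distances, then the RBF kernel matrix
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=0.5):
    """Project X onto the top principal directions in RBF feature space."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigen-decomposition; eigh returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Scale eigenvectors so the feature-space directions have unit norm
    alphas = vecs[:, :n_components] / np.sqrt(np.maximum(vals[:n_components], 1e-12))
    return Kc @ alphas  # 2-D projections of the training points

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Z = kernel_pca(X, n_components=2)
print(Z.shape)  # (50, 2)
```

Because the kernel matrix is centered, the resulting projections are zero-mean, which is what a 2D visualization such as figure 5.3 plots.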
In linear FDA, we consider projecting all the multidimensional data onto a
generic direction w , and then separately observing the mean and the variance of
the projections of the two classes. By substituting the kernel function for a linear
inner product into the linear FDA formula, we have non-linear FDA (KFDA).
An example of the visualization of the Segment interval dataset (class 7 against
all) with KFDA using the RBF kernel function is shown in figure 5.4.
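The same substitution can be sketched for KFDA. In the two-class dual formulation, the discriminant direction is expressed through coefficients alpha obtained from the kernel class means and a within-class scatter matrix built from kernel columns; the regularization constant mu and the synthetic data below are assumptions for illustration, not the chapter's settings.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2 * A @ B.T)
    return np.exp(-gamma * d2)

def kernel_fda(X, y, gamma=0.5, mu=1e-3):
    """Binary kernel Fisher discriminant: returns dual coefficients alpha.
    New points project as rbf_kernel(X, X_new).T @ alpha."""
    K = rbf_kernel(X, X, gamma)
    n = len(X)
    M, N = [], np.zeros((n, n))
    for c in (0, 1):
        Kc = K[:, y == c]                     # kernel columns of class c
        l = Kc.shape[1]
        M.append(Kc.mean(axis=1))             # class mean in the dual space
        # Within-class scatter contribution of class c
        N += Kc @ (np.eye(l) - np.full((l, l), 1.0 / l)) @ Kc.T
    # Regularized scatter; solving this maximizes the Rayleigh quotient
    return np.linalg.solve(N + mu * np.eye(n), M[0] - M[1])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (30, 3)), rng.normal(1.0, 1.0, (30, 3))])
y = np.array([0] * 30 + [1] * 30)
alpha = kernel_fda(X, y)
proj = rbf_kernel(X, X).T @ alpha             # 1-D projections of training data
```

The projections `proj` separate the two class means, which is exactly what a KFDA visualization such as figure 5.4 displays along the discriminant axis.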
These kernel-based methods can also be extended to learning models in which the input data are corrupted by noise, e.g. data sampled with measurement noise, which makes the input uncertain. Interval data can represent this kind of uncertainty, so the kernel-based methods and the SVM using the RBF kernel function from section 2 can also deal with uncertain data.
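One way to apply an RBF kernel directly to interval data is to replace the Euclidean distance with a dissimilarity between intervals. The sketch below uses the per-dimension Hausdorff distance between intervals, combined Euclidean-style; this specific combination rule, like the data and gamma value, is an assumption for illustration rather than the construction used in section 2.

```python
import numpy as np

def interval_rbf_kernel(A, B, gamma=0.5):
    """RBF kernel on interval data via a Hausdorff-style distance.

    A, B: arrays of shape (n, d, 2) -- each cell holds [low, high].
    Per dimension, d_H([a,b],[c,e]) = max(|a-c|, |b-e|); the per-dimension
    distances are combined Euclidean-style before exp(-gamma * d^2).
    """
    n, m = A.shape[0], B.shape[0]
    K = np.empty((n, m))
    for i in range(n):
        # Hausdorff distance per dimension against every row of B
        dh = np.maximum(np.abs(A[i, :, 0] - B[:, :, 0]),
                        np.abs(A[i, :, 1] - B[:, :, 1]))
        K[i] = np.exp(-gamma * np.sum(dh**2, axis=1))
    return K

rng = np.random.default_rng(2)
lo = rng.normal(size=(10, 3))
X = np.stack([lo, lo + rng.uniform(0, 1, (10, 3))], axis=2)  # (10, 3, 2)
K = interval_rbf_kernel(X, X)
```

Since the distance of an interval vector to itself is zero, the kernel matrix has ones on its diagonal, as an RBF kernel should.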
5.4 Inductive Rules Extraction for Explaining SVM Results
Although SVM algorithms have been shown to build accurate models, their results may be very difficult to understand. Most of the time, the user only obtains information regarding the support vectors, which are used as a "black box" to classify the data with good accuracy. The user does not know how the SVM model works. For many data mining applications, understanding the model obtained by the algorithm is as important as its accuracy, yet up to now very few methods have been proposed [6], [8], [22].
We propose here to use interactive decision tree algorithms [23], [24] to try to explain the SVM results. The SVM's classification behavior becomes understandable through IF-THEN rules extracted intuitively from the graphical representation of the decision trees, which can be easily interpreted by humans.
Figure 5.5 is an example of the inductive rule extraction explaining support
vector classification results with the Segment interval dataset. The SVM algo-
rithm using the RBF kernel function classifies the class 7 (considered as +1 class)
against all other classes (considered as -1 class) with 100.00 % accuracy. CIAD
uses 2D scatter-plot matrices [25] for visualizing interval data: the data points
are displayed in all possible pair-wise combinations of dimensions in 2D scatter-
plot matrices. For n
1) / 2
matrices. A data point in two interval dimensions is represented by a cross and
color corresponds to the class.
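The panel count comes from enumerating all unordered pairs of dimensions; a short sketch with an illustrative dimension count:

```python
from itertools import combinations

n = 4  # number of interval dimensions (illustrative)
pairs = list(combinations(range(n), 2))  # every unordered pair of dimensions
print(len(pairs), pairs)  # 6 pairs = n*(n-1)/2, one 2D scatter plot per pair
```

Each pair indexes one 2D scatter plot in the matrix, so n = 4 dimensions yield 6 panels.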
The user interactively chooses the best separating split (parallel to an axis) to construct the decision tree, either based on human pattern-recognition capabilities or with the help of automatic algorithms. The obtained decision tree, having 4 leaves (corresponding to 4 rules), can explain the SVM model. One rule is created for each path from the root to a leaf: the dimension tests along the path are combined in a conjunction, and the leaf node holds the class prediction.
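The path-to-rule conversion can be sketched as follows. The tree below is a hypothetical 4-leaf tree over invented interval-feature names (x1_max, x3_min); only the extraction procedure itself reflects the text.

```python
# A small hand-built decision tree: internal nodes test one dimension,
# leaves hold the predicted class (+1 = class 7, -1 = all other classes).
# Feature names and thresholds are hypothetical, for illustration only.
tree = {
    "feature": "x1_max", "threshold": 0.42,
    "left": {"class": -1},
    "right": {
        "feature": "x3_min", "threshold": 0.18,
        "left": {
            "feature": "x1_max", "threshold": 0.75,
            "left": {"class": -1},
            "right": {"class": +1},
        },
        "right": {"class": +1},
    },
}

def extract_rules(node, conditions=()):
    """One IF-THEN rule per root-to-leaf path; conditions form a conjunction."""
    if "class" in node:  # leaf: emit the accumulated rule
        body = " AND ".join(conditions) or "TRUE"
        return [f"IF {body} THEN class = {node['class']:+d}"]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + (f"{f} <= {t}",))
            + extract_rules(node["right"], conditions + (f"{f} > {t}",)))

rules = extract_rules(tree)
for r in rules:
    print(r)
# 4 leaves -> 4 rules, e.g. "IF x1_max <= 0.42 THEN class = -1"
```

Each rule is a conjunction of the axis-parallel splits encountered on the way down, terminated by the leaf's class prediction, matching the construction described above.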