For example, Figure 5.7 visualizes the regression result on the Shuttle interval
dataset. The data points far from the regression function are outliers. Thus, the
model has high quality. Dimensions 1 and 8 are the interesting ones in the obtained
model.
5.5.3 Novelty Detection Results
In a novelty detection task, we visualize the outliers so that the user can
validate them. The approach is based on interactive linking and brushing of the
histogram and 2D scatter-plot views. The histogram displays the data distribution
according to the distance from the hyper-sphere obtained by the one-class SVM.
When the data points far from the hyper-sphere are brushed in the histogram view,
they are automatically selected in the 2D scatter-plot view, where the user can
interpret and validate them as outliers. The dimensions whose projection shows
the outliers clearly are the interesting ones in the obtained model.
Figure 5.8 visualizes the one-class SVM result on Bank8FM. The user can verify
the outliers obtained by novelty detection with SVM. For example, the projection
on dimensions 5 and 7 clearly shows some outliers, so these dimensions are
interesting in the obtained model.
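The linking and brushing step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the distance values and 2D coordinates are made up, the function names (`histogram_bins`, `brush`, `linked_scatter_points`) are hypothetical, and each point is reduced to a single coordinate per dimension rather than an interval.

```python
from collections import defaultdict

def histogram_bins(distances, n_bins):
    """Assign each point index to an equal-width bin of its distance value."""
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    bins = defaultdict(list)
    for idx, d in enumerate(distances):
        b = min(int((d - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        bins[b].append(idx)
    return bins

def brush(bins, selected_bins):
    """Return the point indices covered by the brushed histogram bins."""
    return sorted(i for b in selected_bins for i in bins.get(b, []))

def linked_scatter_points(data, selected, dim_x, dim_y):
    """Project the brushed points onto two dimensions for the scatter-plot view."""
    return [(data[i][dim_x], data[i][dim_y]) for i in selected]

# Hypothetical distances of six points from the hyper-sphere (not real results).
distances = [0.05, 0.10, 0.08, 0.95, 0.12, 0.90]
# Hypothetical 2D coordinates of the same six points.
data = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (5.0, 7.0), (1.2, 2.2), (4.8, 6.5)]

bins = histogram_bins(distances, n_bins=3)
outliers = brush(bins, selected_bins=[2])  # brush the bin farthest from the sphere
highlighted = linked_scatter_points(data, outliers, dim_x=0, dim_y=1)
```

Brushing the last bin selects points 3 and 5, and the same indices drive the scatter-plot selection, which is what keeps the two views linked.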
5.6 Conclusion and Future Works
We have presented an interval data mining approach using kernel-based and
visualization methods. Our investigation aims at scaling up kernel-based
algorithms to mine very large datasets and data corrupted with noise. The
approaches are based on the interval data concept: massive datasets or uncertain
data are represented as interval data. We have therefore proposed a new RBF
kernel for interval data. This modification substantially extends the reach of
kernel-based algorithms, yet no algorithmic changes are required from the usual
case of continuous data other than the modification of the RBF kernel
evaluation. The kernel-based algorithms can then deal with interval data in
classification, regression and novelty detection. Algorithms able to construct
non-linear models on interval data for all three problems are extremely rare.
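To make the idea of changing only the kernel evaluation concrete, here is a minimal sketch of an RBF kernel on interval vectors. This passage does not state the kernel's formula, so the sketch assumes a Hausdorff-style distance between intervals (a common choice in interval data analysis). The names `hausdorff_sq`, `interval_rbf` and `gamma` are illustrative, not taken from the text.

```python
import math

def hausdorff_sq(u, v):
    """Squared Hausdorff-style distance between two vectors of intervals.

    Each vector is a list of (low, high) pairs, one pair per dimension.
    This distance is an assumption of the sketch, not a formula from the text.
    """
    return sum(max(abs(ul - vl), abs(uh - vh)) ** 2
               for (ul, uh), (vl, vh) in zip(u, v))

def interval_rbf(u, v, gamma=1.0):
    """RBF kernel with the Euclidean distance replaced by hausdorff_sq."""
    return math.exp(-gamma * hausdorff_sq(u, v))

a = [(0.0, 1.0), (2.0, 3.0)]
b = [(0.0, 2.0), (2.0, 3.0)]
k_same = interval_rbf(a, a)           # identical intervals give distance 0
k_ab = interval_rbf(a, b, gamma=0.5)  # squared distance 1, so exp(-0.5)
```

When every interval degenerates to a point (low equals high), `hausdorff_sq` reduces to the squared Euclidean distance, so the usual RBF kernel for continuous data is recovered as a special case.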
We have also proposed two ways to explain SVM results, which are well known to
be “black boxes”. The first is to use interactive decision tree algorithms. The
user can interpret the SVM performance through IF-THEN rules extracted
intuitively from the graphical representation of the decision trees. The second
is based on a set of visualization techniques combined with linking and
brushing, which give insight into classification, regression and novelty
detection tasks with SVM. The graphical representation shows the interesting
dimensions in the obtained model.
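As a rough illustration of how IF-THEN rules can be read off a decision tree, the sketch below walks a small tree and emits one rule per leaf. The tree is hypothetical: its nested-dictionary layout, dimensions, thresholds and labels are invented for the example, and it uses plain scalar tests rather than interval tests for simplicity.

```python
def extract_rules(node, conditions=()):
    """Walk a decision tree and return one IF-THEN rule per leaf."""
    if "label" in node:  # leaf: emit the accumulated path as a rule
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN class = {node['label']}"]
    test = f"x{node['dim']} <= {node['threshold']}"
    negated = f"x{node['dim']} > {node['threshold']}"
    return (extract_rules(node["left"], conditions + (test,))
            + extract_rules(node["right"], conditions + (negated,)))

# Hypothetical tree fitted to the labels predicted by an SVM (not from the text).
tree = {
    "dim": 1, "threshold": 0.5,
    "left": {"label": "+1"},
    "right": {
        "dim": 8, "threshold": 2.0,
        "left": {"label": "-1"},
        "right": {"label": "+1"},
    },
}
rules = extract_rules(tree)
```

Each root-to-leaf path becomes one rule, so the number of rules equals the number of leaves, and the dimensions that appear in the premises are the ones the tree found useful.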
To deal with the histogram data type, [26] proposed to represent each
histogram individual with k bins by a succession of k interval individuals (the first
 