Kernel-Based Algorithms and Visualization for Interval Data Mining - Mining Complex Data


For example, Figure 5.7 visualizes the regression result on the Shuttle
interval dataset. The data points far from the regression function are
outliers. Thus, the model has high quality. Dimensions 1 and 8 are the
interesting ones in the obtained model.

5.5.3 Novelty Detection Results

In a novelty detection task, we visualize the outliers so that the user can
validate them. The approach is based on interactive linking and brushing of
the histogram and 2D scatter-plot views. The histogram displays the data
distribution according to the distance from the hyper-sphere obtained by the
one-class SVM. When the data points far from the hyper-sphere are brushed in
the histogram view, they are automatically selected in the 2D scatter-plot
view, where the user can interpret and validate them. The dimensions whose
projection clearly shows the outliers are the interesting ones in the
obtained model.
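The linking and brushing of the histogram and scatter-plot views can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' tool: the sphere center and radius are hypothetical stand-ins for a trained one-class SVM, and the helper names (signed_distances, histogram, brush, project) are invented for the example.

```python
import math

def signed_distances(points, center, radius):
    """Distance of each point from the sphere surface (positive = outside)."""
    return [math.dist(p, center) - radius for p in points]

def histogram(values, n_bins):
    """Return (edges, bins), where bins[i] lists the point indices in bin i."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values match
    bins = [[] for _ in range(n_bins)]
    for idx, v in enumerate(values):
        b = min(int((v - lo) / width), n_bins - 1)
        bins[b].append(idx)
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, bins

def brush(bins, first_bin):
    """Select every point whose histogram bin index is >= first_bin."""
    return sorted(i for b in bins[first_bin:] for i in b)

def project(points, selected, dim_x, dim_y):
    """Linked 2D view: coordinates of the brushed points on two dimensions."""
    return [(points[i][dim_x], points[i][dim_y]) for i in selected]

# Hypothetical data: four points near the origin and one far away.
points = [(0.1, 0.0, 0.2), (0.0, -0.1, 0.1), (-0.2, 0.1, 0.0),
          (0.1, 0.2, -0.1), (3.0, 0.1, 2.5)]
# Assumed sphere (stand-in for the one-class SVM result).
distances = signed_distances(points, center=(0.0, 0.0, 0.0), radius=1.0)
edges, bins = histogram(distances, n_bins=4)
brushed = brush(bins, first_bin=3)          # brush the farthest bin
linked = project(points, brushed, 0, 2)     # show them on dimensions 0 and 2
print(brushed, linked)
```

Brushing the last histogram bin selects only point 4 here, and the linked view returns its coordinates on the two chosen dimensions. Trying several dimension pairs in project is the manual counterpart of looking for the projection in which the outliers stand out.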

Figure 5.8 visualizes the one-class SVM result on Bank8FM. The user can
verify the outliers obtained by novelty detection with SVM. For example, the
projection on dimensions 5 and 7 clearly shows some outliers, so these
dimensions are interesting in the obtained model.

5.6 Conclusion and Future Works

We have presented an interval data mining approach using kernel-based and
visualization methods. Our investigation aims at scaling up kernel-based
algorithms to mine very large datasets and data corrupted with noise. The
approach relies on the interval data concept, which is used to represent both
massive datasets and uncertain data. We have therefore proposed a new RBF
kernel for interval data. This modification considerably extends kernel-based
algorithms: no algorithmic changes are required from the usual case of
continuous data other than the modification of the RBF kernel evaluation. The
kernel-based algorithms can thus deal with interval data in classification,
regression and novelty detection. Algorithms able to construct non-linear
models on interval data for all three problems are extremely rare.
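To make the idea of changing only the kernel evaluation concrete, here is a minimal Python sketch of an RBF kernel on interval data. This section does not state which interval distance is used, so the sketch assumes the Hausdorff distance between two closed intervals, max(|low1 - low2|, |high1 - high2|), combined over dimensions as a sum of squares. The function names and the gamma parameter are hypothetical.

```python
import math

def interval_distance(a, b):
    """Hausdorff distance between two closed intervals given as (low, high).

    Assumption: the choice of this distance is illustrative only.
    """
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def interval_rbf_kernel(x, y, gamma=1.0):
    """RBF kernel between two vectors of intervals of equal length."""
    sq = sum(interval_distance(a, b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def gram_matrix(data, gamma=1.0):
    """Pairwise kernel values, usable by any solver that accepts a Gram matrix."""
    return [[interval_rbf_kernel(x, y, gamma) for y in data] for x in data]

# Hypothetical interval individuals with two dimensions each.
data = [
    [(0.0, 2.0), (1.0, 1.5)],
    [(1.0, 5.0), (1.0, 2.0)],
    [(0.5, 2.5), (0.0, 1.0)],
]
gram = gram_matrix(data, gamma=0.1)
```

When every interval is degenerate (low equals high), interval_distance reduces to the absolute difference and the kernel becomes the usual RBF kernel on continuous data, which is why the rest of a kernel-based algorithm can stay unchanged.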

We have also proposed two ways to explain the results of SVMs, which are well
known to be “black boxes”. The first uses interactive decision tree
algorithms: the user can interpret the SVM results in the form of IF-THEN
rules extracted from the graphical representation of the decision trees,
which is easy to read. The second is based on a set of visualization
techniques combined with linking and brushing, which give insight into
classification, regression and novelty detection tasks with SVM. The
graphical representation shows the interesting dimensions in the obtained
model.
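The first explanation strategy can be illustrated with a very small surrogate model. The sketch below is an automatic, one-split stand-in for the interactive decision trees mentioned above, not the authors' algorithm: it labels the data with a black-box classifier, then searches for the single threshold rule that best reproduces those labels. The black_box function is a hypothetical placeholder for a trained SVM, and fit_stump and as_rule are names invented for the example.

```python
def black_box(row):
    """Hypothetical stand-in for the decision of a trained SVM."""
    return 1 if 2.0 * row[1] - 0.1 * row[0] > 1.0 else -1

def fit_stump(rows, labels):
    """Find the split 'x[f] <= t' that best reproduces the given labels.

    Returns (feature, threshold, left_label, right_label, agreement).
    """
    classes = sorted(set(labels))
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = max(classes, key=left.count)    # majority label on the left
            r_lab = max(classes, key=right.count)   # majority label on the right
            hits = left.count(l_lab) + right.count(r_lab)
            if best is None or hits > best[4]:
                best = (f, t, l_lab, r_lab, hits)
    if best is None:
        raise ValueError("no split possible: every feature is constant")
    f, t, l_lab, r_lab, hits = best
    return f, t, l_lab, r_lab, hits / len(rows)

def as_rule(f, t, l_lab, r_lab):
    """Render the split as an IF-THEN rule."""
    return f"IF x{f} <= {t:.2f} THEN class {l_lab} ELSE class {r_lab}"

# Hypothetical data, labelled by the black box rather than by ground truth.
rows = [(0.0, 0.1), (1.0, 0.3), (2.0, 0.4), (0.5, 0.9), (1.5, 1.2), (2.5, 1.0)]
svm_labels = [black_box(r) for r in rows]
feature, threshold, left_label, right_label, agreement = fit_stump(rows, svm_labels)
rule = as_rule(feature, threshold, left_label, right_label)
print(rule, f"(agreement with black box: {agreement:.0%})")
```

The agreement score measures how faithfully the rule mimics the black box on the given data, not how accurate either model is. The feature chosen by the split also hints at which dimension matters most to the black box.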

For dealing with histogram data type, [26] proposed to represent each
histogram individual with k bins by a succession of k interval individuals (the first
