Kernel-Based Algorithms and Visualization for Interval Data Mining - Mining Complex Data

The interactive brushing technique lets the user focus on an area (the brush)
of the displayed data in order to highlight groups of data points. Because the
views are linked, points brushed in one view are also highlighted in the others,
so multiple linked views provide more information than a single view. We use
interactive brushing and linking together with different visualization methods
to try to explain SVM results.

5.5.1 Support Vector Classification Results

For classification tasks with SVM algorithms, understanding the margin (the
largest separation between the +1 class and the -1 class) is one of the keys to
interpreting support vector classification. For this purpose, we need to display
the points near the separating boundary between the two classes. To achieve
this, we propose to use the data distribution according to the distance from
the separating surface, which we compute while the classification task is
processed (based on the support vectors). For each class, the positive
distribution is the set of correctly classified data points and the negative
distribution is the set of misclassified data points. The data points near the
frontier correspond to the bars near the origin. When the bars corresponding to
the points near the frontier are selected, the same data points are also
selected in the other views (visualization methods) through the brushing and
linking technique. We use 2D scatter-plot matrices for visualizing interval
data. The user can then see approximately where the boundary between the
classes lies and how wide the margin is, which helps to evaluate the robustness
of the model obtained by support vector classification. The user can also
identify the interesting dimensions in the obtained model, that is, those whose
projections provide a clear boundary between the two classes.
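
The distribution described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the authors'
implementation: the function name `distance_distribution`, the `bin_width`
parameter and the toy values are hypothetical, and the decision values f(x) are
assumed to be already expressed as distances to the separating surface. The
signed quantity y * f(x) is positive for a correctly classified point and
negative for a misclassified one, which gives the positive and negative
distributions for each class.

```python
import math
from collections import Counter


def distance_distribution(decision_values, labels, bin_width=0.5):
    """Histogram of signed distances to the separating surface, per class.

    decision_values: f(x) for each point, assumed to be a distance already
                     (hypothetical input, e.g. produced by a trained SVM).
    labels:          true class of each point, +1 or -1.
    Returns {label: Counter({bar_index: count})}, where bar k covers
    [k * bin_width, (k + 1) * bin_width). Bars with k >= 0 hold correctly
    classified points, bars with k < 0 hold misclassified points.
    """
    hist = {+1: Counter(), -1: Counter()}
    for f, y in zip(decision_values, labels):
        signed = y * f  # > 0 when the point lies on the side of its own class
        hist[y][math.floor(signed / bin_width)] += 1
    return hist


# Toy values, invented for illustration only.
decision_values = [1.7, 0.2, -0.3, -1.2, -0.1, 0.4]
labels = [+1, +1, +1, -1, -1, -1]
hist = distance_distribution(decision_values, labels)
```

Bars whose index is close to 0 hold the points near the frontier; these are the
bars a user would brush first.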

Figure 5.6 shows an example of visualizing support vector classification results
on the Segment interval dataset (class 7 against all). In the data distribution
according to the distance from the separating surface, the four bars near the
origin are brushed, and the corresponding points are then linked and displayed
in 2D scatter-plot matrices. From the upper part of Figure 5.6, we can conclude
that there is a clear boundary between the two classes (there is no
misclassified data point). From the lower part, we can see that dimensions 2
and 16 show a clear boundary between the two classes and are therefore
interesting in the obtained model.
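
The brushing and linking step in this example might look as follows. The sketch
makes several assumptions that are not in the text: each interval-valued
attribute is stored as a `(low, high)` pair, the four brushed bars are taken to
be the two on each side of the origin, and all names and data values are
hypothetical. The key idea is that brushing yields a set of point indices, and
every linked view is drawn from those same indices.

```python
import math


def brush_near_origin(signed_distances, bin_width=0.5, n_bars=4):
    """Indices of the points that fall in the n_bars bars closest to the origin.

    Bar k covers [k * bin_width, (k + 1) * bin_width); with n_bars=4 the
    brushed bars are k = -2, -1, 0, 1 (an assumption about which bars count
    as "near the origin").
    """
    lo = -(n_bars // 2)
    hi = lo + n_bars  # exclusive upper bar index
    return [i for i, d in enumerate(signed_distances)
            if lo <= math.floor(d / bin_width) < hi]


def linked_rectangles(interval_points, selected, dim_a, dim_b):
    """Project the brushed points onto one scatter-plot panel (dim_a, dim_b).

    Each returned item is a pair of intervals, i.e. the rectangle that an
    interval-valued point occupies in that 2D panel.
    """
    return [(interval_points[i][dim_a], interval_points[i][dim_b])
            for i in selected]


# Toy data, invented for illustration: 6 points, 2 interval attributes each.
signed = [1.7, 0.2, -0.3, 1.2, 0.1, -0.4]
interval_points = [
    [(5.0, 5.4), (1.0, 1.3)],
    [(2.1, 2.6), (0.8, 1.1)],
    [(1.9, 2.2), (0.2, 0.5)],
    [(0.1, 0.4), (3.0, 3.2)],
    [(1.8, 2.0), (0.9, 1.4)],
    [(2.0, 2.5), (0.1, 0.3)],
]
brushed = brush_near_origin(signed)
panel = linked_rectangles(interval_points, brushed, 0, 1)
```

Calling `linked_rectangles` once per pair of dimensions would give the contents
of every panel of a scatter-plot matrix for the same brushed points.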

5.5.2 Support Vector Regression Results

We have extended this idea to the visualization of support vector regression
results. Here we compute the data distribution according to the distance from
the regression function, and then combine this histogram with 2D scatter-plot
matrices. By selecting the data points far from the regression function, the
user can see how well the function fits the data. If the function predicts the
data points in high-density regions well, then the obtained model is
interesting.
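
The regression case can be sketched in the same way. The code below is a
minimal illustration, assuming the distance from the regression function is the
absolute residual |y - f(x)|; the names `residual_distribution` and
`far_points`, the bin width, the threshold and the toy values are all
hypothetical.

```python
import math
from collections import Counter


def residual_distribution(y_true, y_pred, bin_width=0.5):
    """Histogram of the distances |y - f(x)| from the regression function.

    Bar k covers [k * bin_width, (k + 1) * bin_width).
    """
    return Counter(math.floor(abs(t - p) / bin_width)
                   for t, p in zip(y_true, y_pred))


def far_points(y_true, y_pred, threshold):
    """Indices of the points lying farther than threshold from the function."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred))
            if abs(t - p) > threshold]


# Toy targets and predictions, invented for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.6, 2.9, 5.5]
res_hist = residual_distribution(y_true, y_pred)
far = far_points(y_true, y_pred, threshold=1.0)
```

The indices returned by `far_points` play the role of the brushed selection:
they can be passed to the scatter-plot views to see where the poorly fitted
points lie.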
