Kernel-Based Algorithms and Visualization for Interval Data Mining - Mining Complex Data


Fig. 5.1. Aggregation of a large dataset

The RBF kernel (kernel width σ) in (5.1) of two data vectors x and y of continuous type, K(x, y), is based on the Euclidean distance between these vectors, d_E(x, y) = ‖x − y‖.

$$K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right) \tag{5.1}$$
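
As a minimal sketch of the RBF kernel (5.1) for continuous data, the snippet below evaluates the kernel for two plain Python sequences. The function name `rbf_kernel` and the default `sigma=1.0` are illustrative assumptions, not part of the original text.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel: exp(-||x - y||^2 / (2 * sigma^2)).

    `rbf_kernel` and the default sigma are hypothetical names/values
    chosen for this sketch.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


if __name__ == "__main__":
    print(rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=5.0))
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the distance grows relative to σ.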

To deal with interval data, we only need to measure the distance between two vectors of interval type and then substitute this distance for the Euclidean distance in the RBF kernel formula (5.1). The resulting RBF kernel can then handle interval data. We propose to use the Hausdorff (1868-1942) distance to measure the dissimilarity between two data vectors of interval type. Given two intervals represented by their low and high values, I_1 = [low_1, high_1] and I_2 = [low_2, high_2], the Hausdorff distance between I_1 and I_2 is defined by (5.2):

$$d_H(I_1, I_2) = \max\left(\lvert \mathit{low}_1 - \mathit{low}_2 \rvert,\ \lvert \mathit{high}_1 - \mathit{high}_2 \rvert\right) \tag{5.2}$$
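
The interval distance (5.2) can be sketched in a few lines. Representing an interval as a `(low, high)` tuple and the name `hausdorff_interval` are assumptions made for this example only.

```python
def hausdorff_interval(i1, i2):
    """Hausdorff distance between two intervals given as (low, high) pairs.

    The tuple representation and the function name are hypothetical
    choices for this sketch.
    """
    low1, high1 = i1
    low2, high2 = i2
    # The larger of the two endpoint gaps, as in (5.2).
    return max(abs(low1 - low2), abs(high1 - high2))


if __name__ == "__main__":
    print(hausdorff_interval((1.0, 3.0), (2.0, 6.0)))
```

For degenerate intervals with low = high, this reduces to the ordinary absolute difference between two numbers.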

Let us consider two data vectors u, v ∈ Ω having n dimensions of interval type:

$$u = \left([u_{1,low}, u_{1,high}],\ [u_{2,low}, u_{2,high}],\ \ldots,\ [u_{n,low}, u_{n,high}]\right)$$

$$v = \left([v_{1,low}, v_{1,high}],\ [v_{2,low}, v_{2,high}],\ \ldots,\ [v_{n,low}, v_{n,high}]\right)$$

The Hausdorff distance between two vectors u and v is defined by (5.3):

$$d_H(u, v) = \sqrt{\sum_{i=1}^{n} \max\left(\lvert u_{i,low} - v_{i,low} \rvert^{2},\ \lvert u_{i,high} - v_{i,high} \rvert^{2}\right)} \tag{5.3}$$
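
The vector distance (5.3), and its substitution for the Euclidean distance in the RBF kernel, can be sketched as follows. This assumes that (5.3) takes the square root of the sum, over i = 1..n, of the larger squared endpoint difference. The names `hausdorff_vector` and `interval_rbf_kernel`, the list-of-`(low, high)`-tuples layout, and the default `sigma` are illustrative assumptions.

```python
import math


def hausdorff_vector(u, v):
    """Hausdorff distance between two interval vectors.

    u and v are sequences of (low, high) pairs of equal length.
    Assumption: square root of the summed per-dimension maxima of
    squared endpoint differences.
    """
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of dimensions")
    total = 0.0
    for (u_low, u_high), (v_low, v_high) in zip(u, v):
        total += max((u_low - v_low) ** 2, (u_high - v_high) ** 2)
    return math.sqrt(total)


def interval_rbf_kernel(u, v, sigma=1.0):
    """RBF kernel with the Hausdorff distance in place of the Euclidean one.

    Hypothetical helper for this sketch; sigma is the kernel width.
    """
    d = hausdorff_vector(u, v)
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    u = [(0.0, 1.0), (2.0, 4.0)]
    v = [(3.0, 1.0), (2.0, 0.0)]
    print(hausdorff_vector(u, v), interval_rbf_kernel(u, v, sigma=5.0))
```

Only the distance computation differs from the continuous case, so any learner that accepts a user-supplied kernel function could use `interval_rbf_kernel` unchanged.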

By substituting the Hausdorff distance measure d_H into the RBF kernel formula, we obtain a new RBF kernel for dealing with interval data. This single modification is what allows kernel algorithms to mine interval data: no algorithmic changes are required from the usual case of continuous data other than the modification of the RBF kernel evaluation. All the benefits of the original kernel methods are kept. Kernel-based learning algorithms including Support Vector
