Fig. 5.1. Aggregation of a large dataset
(kernel width σ) in (5.1) of two data vectors x and y of continuous type, K(x, y), is based on the Euclidean distance between these vectors, $d_E(x, y) = \|x - y\|$:

$$K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right) \qquad (5.1)$$
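As a minimal sketch of (5.1), assuming data vectors are plain Python lists of floats, the kernel can be evaluated as below. The function names `euclidean_distance` and `rbf_kernel` are illustrative, not taken from the text.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """d_E(x, y) = ||x - y|| for two continuous vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian RBF kernel (5.1): exp(-||x - y||^2 / (2 sigma^2))."""
    if sigma <= 0:
        raise ValueError("kernel width sigma must be positive")
    d = euclidean_distance(x, y)
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


print(rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=5.0))  # exp(-25 / 50) = exp(-0.5), about 0.6065
```

The kernel depends on the data only through the distance d, which is what makes it possible to swap in a different distance measure later.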
To handle interval data, we only need a way to measure the distance between two vectors of interval type; we then substitute this distance measure for the Euclidean distance in the RBF kernel formula (5.1). The resulting RBF kernel can thus deal with interval data. We propose to use the Hausdorff distance (named after Felix Hausdorff, 1868-1942) to measure the dissimilarity between two data vectors of interval type.
Suppose we have two intervals represented by their low and high values, $I_1 = [low_1, high_1]$ and $I_2 = [low_2, high_2]$. The Hausdorff distance between the two intervals $I_1$ and $I_2$ is defined by (5.2):
$$d_H(I_1, I_2) = \max\left(|low_1 - low_2|,\ |high_1 - high_2|\right) \qquad (5.2)$$
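A small sketch of (5.2), assuming (hypothetically) that an interval is stored as a `(low, high)` tuple. Note that for degenerate intervals [a, a] and [b, b] the distance reduces to |a - b|, the usual distance between two real numbers.

```python
from typing import Tuple

# Hypothetical representation: an interval is a (low, high) pair of floats.
Interval = Tuple[float, float]


def hausdorff_interval(i1: Interval, i2: Interval) -> float:
    """Hausdorff distance (5.2) between two intervals on the real line."""
    low1, high1 = i1
    low2, high2 = i2
    if low1 > high1 or low2 > high2:
        raise ValueError("each interval must satisfy low <= high")
    # The larger of the two gaps between corresponding bounds.
    return max(abs(low1 - low2), abs(high1 - high2))


print(hausdorff_interval((1.0, 4.0), (2.0, 7.0)))  # max(1.0, 3.0) -> 3.0
print(hausdorff_interval((2.0, 2.0), (5.0, 5.0)))  # degenerate intervals -> |2 - 5| = 3.0
```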
Let us consider two data vectors $u, v \in \Omega$ having n dimensions of interval type:

$$u = ([u_{1,low}, u_{1,high}], [u_{2,low}, u_{2,high}], \ldots, [u_{n,low}, u_{n,high}])$$

$$v = ([v_{1,low}, v_{1,high}], [v_{2,low}, v_{2,high}], \ldots, [v_{n,low}, v_{n,high}])$$
The Hausdorff distance between two vectors u and v is defined by (5.3):
$$d_H(u, v) = \sqrt{\sum_{i=1}^{n} \max\left(|u_{i,low} - v_{i,low}|^2,\ |u_{i,high} - v_{i,high}|^2\right)} \qquad (5.3)$$
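The following sketch implements (5.3) and the RBF kernel obtained by substituting $d_H$ for the Euclidean distance in (5.1). It assumes interval vectors are sequences of `(low, high)` tuples; all function names are hypothetical, and `gram_matrix` is only an illustration of how the modified kernel would be evaluated for a learner.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]        # (low, high); representation assumed
IntervalVector = Sequence[Interval]   # n dimensions of interval type


def hausdorff_vector(u: IntervalVector, v: IntervalVector) -> float:
    """Hausdorff distance (5.3) between two interval vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    total = 0.0
    for (u_low, u_high), (v_low, v_high) in zip(u, v):
        # Per-dimension term: the larger squared gap between corresponding bounds.
        total += max((u_low - v_low) ** 2, (u_high - v_high) ** 2)
    return math.sqrt(total)


def interval_rbf_kernel(u: IntervalVector, v: IntervalVector, sigma: float) -> float:
    """RBF kernel (5.1) with d_H substituted for the Euclidean distance."""
    if sigma <= 0:
        raise ValueError("kernel width sigma must be positive")
    d = hausdorff_vector(u, v)
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


def gram_matrix(data: Sequence[IntervalVector], sigma: float) -> List[List[float]]:
    """Kernel matrix with entries K(data[i], data[j])."""
    return [[interval_rbf_kernel(a, b, sigma) for b in data] for a in data]


u = [(1.0, 4.0), (0.0, 2.0)]
v = [(2.0, 7.0), (0.0, 2.0)]
print(hausdorff_vector(u, v))                # sqrt(max(1, 9) + max(0, 0)) = 3.0
print(interval_rbf_kernel(u, v, sigma=3.0))  # exp(-9 / 18) = exp(-0.5), about 0.6065
```

With degenerate intervals (low = high in every dimension), `hausdorff_vector` returns the ordinary Euclidean distance, so the interval kernel coincides with (5.1) on continuous data. A precomputed kernel matrix of this kind could, for instance, be passed to a learner that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`, without changing the learning algorithm itself.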
By substituting the Hausdorff distance measure $d_H$ for the Euclidean distance in the RBF kernel formula (5.1), we obtain a new RBF kernel for dealing with interval data, $K(u, v) = \exp\left(-d_H(u, v)^2 / (2\sigma^2)\right)$. This simple modification has a large impact on mining interval data with kernel algorithms: no algorithmic changes are required relative to the usual case of continuous data other than the modified RBF kernel evaluation. All the benefits of the original kernel methods are kept. Kernel-based learning algorithms including Support Vector