Graphics Reference
In-Depth Information
As
Φ
(
x j
)
(
x j
)
=
κ
(
x j , x j
)
,thedualproblemcanberewrittenasasimple
Z
quadratic programming problem:
max
β
W
(
β
)=−
j
β j β j κ
(
x j , x j
)
( . )
j
s.t.
β j
=
and
β j
C .
( . )
j
Solutions for β's are not unique, unless the kernel matrix K
=[
κ
(
x j , x j
)]
is of full
rank. In an optimal solution to problem ( . ), if
<
β j
<
C,thenξ j
=
and
the corresponding data point x j and its image Φ
(
x j
)
lie, respectively, on the cluster
boundaries and the surface of the sphere in
Z
. Such a point is called a support vector
(SV). he image of a point x j with ξ j
lies outside the sphere. his implies that
the corresponding β j
C.Suchanx j is called a bounded support vector (BSV). Data
points can be classified into three types: SVs lie on the cluster boundaries, BSVs are
outside the boundaries, and the other points lie inside clusters. Since
=
β j
C and
n
j = β j
=
, there is no BSV when C
. Moreover,
(
nC
)
is an upper bound on the
fraction of BSVs.
In three-dimensional space, we can easily visualize the clusters once the bound-
aries are drawn. However, in a data space with more dimensions, it is hard to picture
and determine which data points are inside a specific cluster. hus, we need an algo-
rithm for cluster assignment. Ben-Hur et al. ( ) introduced the adjacency matrix
for cluster assignment. Let R
Φ
(
y
)=
Φ
(
y
)−
Z be the feature distance of Φ
(
y
)
to the data centroid Φ. Denote the adjacency matrix by A
=[
A jj
]
,whereA jj
=
j, j
indicates that the pair
(
)
is in the same cluster and A jj
=
otherwise. We need
j
only the upper triangular part of the A matrix. For j
<
if R
(
y
)
R,
y on the line segment connecting x j and x j ,
A jj
=
otherwise .
hisdefinition is based on a geometric observation byBen-Huret al. For a given pair
of data points, say x j and x j , belonging to different clusters, any path that connects
them must exit fromthe enclosing sphere.herefore,sucha path contains a segment
of points y such that R
R. Checking all points on a line segment is impossible.
In practice, Ben-Hur et al., suggest that - uniformly distributed points should
be used to check whether R
(
y
)
R or not. Once the adjacency matrix A is formed,
the clusters can be defined as the connected components of the graph induced by
A. his procedure will leave the BSVs as outliers. One can either assign them to the
nearest clusters or leave them alone. We will illustrate an example to show how the
SVC works and the effect of the window width of the Gaussian kernel.
(
y
)
Example 9 We synthesize points in R space, among which points are in the
upperhemisphereoftheballwithradius . andwithitscenterattheorigin.heother
points are generated from three areas of the XY plane and then mapped into the
lower hemisphere of the ball. We vary the parameter q from to and the number
9
Search WWH ::




Custom Search