As $R$ is the radius of the enclosing sphere, the corresponding pre-images of the enclosing sphere consist of the points

$$C = \{\, x : \|\Phi(x) - \bar{\Phi}\|^2 = R^2 \,\}.$$

For $x \in C$ we have

$$\|\Phi(x) - \bar{\Phi}\|^2 = \kappa(x, x) - 2\sum_{j=1}^{n} \beta_j \kappa(x_j, x) + \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \kappa(x_j, x_{j'}) = R^2.$$
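The middle expression follows by writing the weighted centroid as $\bar{\Phi} = \sum_{j=1}^{n} \beta_j \Phi(x_j)$, expanding the squared norm, and replacing every inner product by a kernel evaluation, $\langle \Phi(u), \Phi(v) \rangle = \kappa(u, v)$:

$$\|\Phi(x) - \bar{\Phi}\|^2 = \langle \Phi(x), \Phi(x) \rangle - 2\sum_{j=1}^{n} \beta_j \langle \Phi(x), \Phi(x_j) \rangle + \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \langle \Phi(x_j), \Phi(x_{j'}) \rangle.$$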
Or equivalently,

$$C = \Big\{\, x : \sum_{j=1}^{n} \beta_j \kappa(x_j, x) = \rho \,\Big\},$$

where $\rho = \Big(\kappa(x, x) + \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \kappa(x_j, x_{j'}) - R^2\Big)\Big/2$. (For the Gaussian kernel used below, $\kappa(x, x) = 1$, so $\rho$ is a constant and $C$ is a level set of the kernel mixture.)
When the enclosing sphere is mapped back to the data space $X$, it forms a set of probability contours. These contours are used as cluster boundaries, and data points inside each contour are assigned to the same cluster. The SVC forms contours via the kernel mixture above, with the mixing coefficients $\beta_j$ solved from the dual problem given below. Note that $\bar{\Phi} = \sum_{j=1}^{n} \beta_j \Phi(x_j)$ is a weighted centroid in the feature space, and $\sum_{j=1}^{n} \beta_j \|\Phi(x_j) - \bar{\Phi}\|^2$ can be regarded as a weighted measure of data dispersion in the feature space. In other words, the SVC algorithm finds mixing coefficients that make the data dispersion as large as possible in the feature space, while it draws the kernel mixture contours in the original data space $X$ to form clusters. The set $C$ defines the cluster boundaries. Data points lying on the boundaries are called support vectors. Note that the nonlinear transformation $\Phi$ is implicitly defined by a Gaussian kernel, $\kappa(x_j, x_{j'}) = e^{-q \|x_j - x_{j'}\|^2}$, $q > 0$. (The normalizing constant for $\kappa$ is not relevant to cluster analysis and is dropped for simplicity.) A larger value of $q$ corresponds to a smaller window width and leads to more clusters in the analysis.
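To make the contour test concrete, here is a minimal NumPy sketch (the function names are our own, not from the text) that evaluates the kernel mixture $\sum_j \beta_j \kappa(x_j, x)$ at a query point and compares it with the threshold $\rho$; by the equivalence above, a point lies inside a cluster boundary exactly when the mixture is at least $\rho$.

```python
import numpy as np

def gaussian_kernel(u, v, q):
    """Unnormalized Gaussian kernel exp(-q * ||u - v||^2)."""
    return np.exp(-q * np.sum((np.asarray(u) - np.asarray(v)) ** 2))

def kernel_mixture(x, X, beta, q):
    """Evaluate the kernel mixture sum_j beta_j * kappa(x_j, x)."""
    return sum(b * gaussian_kernel(xj, x, q) for xj, b in zip(X, beta))

def inside_contour(x, X, beta, q, rho):
    """Inside the cluster contour iff the mixture exceeds rho:
    ||Phi(x) - Phi_bar||^2 <= R^2  <=>  sum_j beta_j kappa(x_j, x) >= rho."""
    return kernel_mixture(x, X, beta, q) >= rho
```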
Unlike the $k$-means algorithm, where the number of clusters $k$ must be prescribed by the user, the window width in SVC can vary continuously, resulting in hierarchical clusters. The number of clusters depends on the window width of the Gaussian kernel: decreasing the width leads to more clusters. Also, in contrast with the $k$-means procedure, no initial centroids are required as algorithm input. Therefore, a deterministic result, independent of initial conditions, can be expected. SVC also has the ability to deal with outliers by employing slack variables, allowing some data points to remain outside the enclosing sphere in the feature space. This is the same as the "soft margin" idea in support vector machines. With the introduction of slack variables, the optimization problem becomes
$$\min_{a,\, R,\, \xi} \; R^2 + C \sum_{j=1}^{n} \xi_j, \quad \text{subject to} \quad \|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \;\; \xi_j \ge 0, \;\; \forall j.$$
It is straightforward to derive the corresponding dual problem:
$$\max_{\beta} \; W(\beta) = \sum_{j=1}^{n} \beta_j \kappa(x_j, x_j) - \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \kappa(x_j, x_{j'})$$

subject to $\bar{\Phi} = \sum_{j=1}^{n} \beta_j \Phi(x_j)$, $\sum_{j=1}^{n} \beta_j = 1$, and $0 \le \beta_j \le C$.
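As an illustration only: the dual is a small quadratic program over the simplex with box constraints, and for the Gaussian kernel ($\kappa(x_j, x_j) = 1$) maximizing $W(\beta)$ is equivalent to minimizing $\beta^\top K \beta$. Below is a sketch using scipy.optimize.minimize with SLSQP; the helper name svc_dual and the support-vector tolerance are our choices, not from the text.

```python
import numpy as np
from scipy.optimize import minimize

def svc_dual(X, q=1.0, C=1.0, tol=1e-6):
    """Sketch: solve the SVC dual for a Gaussian kernel, return (beta, rho).
    max W(beta) = 1 - beta^T K beta  <=>  min beta^T K beta,
    subject to sum(beta) = 1 and 0 <= beta_j <= C."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-q * sq_dist)                       # n x n kernel matrix
    res = minimize(
        lambda b: b @ K @ b,                       # W(beta) up to a constant
        np.full(n, 1.0 / n),                       # feasible starting point
        bounds=[(0.0, C)] * n,
        constraints=({"type": "eq", "fun": lambda b: b.sum() - 1.0},),
        method="SLSQP",
    )
    beta = res.x
    quad = beta @ K @ beta
    # A support vector (0 < beta_j < C) lies exactly on the sphere,
    # so it determines R^2 and hence the contour threshold rho.
    sv = np.flatnonzero((beta > tol) & (beta < C - tol))[0]
    R2 = 1.0 - 2.0 * (K[sv] @ beta) + quad         # ||Phi(x_sv) - Phi_bar||^2
    rho = 0.5 * (1.0 + quad - R2)                  # equals (K @ beta)[sv]
    return beta, rho
```

The returned $\beta$ and $\rho$ feed directly into the contour test sketched earlier; points with $\beta_j = C$ are the outliers permitted to stay outside the enclosing sphere.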