As $R$ is the radius of the enclosing sphere, the corresponding pre-images of the enclosing sphere consist of the points

$$C = \{\, x : \|\Phi(x) - \bar{\Phi}\|^2 = R^2 \,\}.$$

For $x \in C$ we have

$$\|\Phi(x) - \bar{\Phi}\|^2 = \kappa(x, x) - 2\sum_{j=1}^{n} \beta_j \kappa(x_j, x) + \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \kappa(x_j, x_{j'}) = R^2.$$
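The middle expression follows by writing the weighted centroid as $\bar{\Phi} = \sum_{j=1}^{n} \beta_j \Phi(x_j)$, expanding the squared norm, and replacing every inner product by a kernel evaluation, $\langle \Phi(u), \Phi(v) \rangle = \kappa(u, v)$:

$$\|\Phi(x) - \bar{\Phi}\|^2 = \langle \Phi(x), \Phi(x) \rangle - 2\sum_{j=1}^{n} \beta_j \langle \Phi(x), \Phi(x_j) \rangle + \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \langle \Phi(x_j), \Phi(x_{j'}) \rangle.$$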
Or equivalently,

$$C = \Big\{\, x : \sum_{j=1}^{n} \beta_j \kappa(x_j, x) = \rho \,\Big\},$$

where $\rho = \Big(\kappa(x, x) + \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \kappa(x_j, x_{j'}) - R^2\Big)\Big/2$. (For the Gaussian kernel used below, $\kappa(x, x) = 1$, so $\rho$ is a constant and $C$ is a level set of the kernel mixture.)
When the enclosing sphere is mapped back to the data space $X$, it forms a set of probability contours. These contours are used as cluster boundaries, and data points inside each contour are assigned to the same cluster. The SVC forms contours via the kernel mixture above, with the mixing coefficients $\beta_j$ solved from the dual problem given below. Note that $\bar{\Phi} = \sum_{j=1}^{n} \beta_j \Phi(x_j)$ is a weighted centroid in the feature space, and $\sum_{j=1}^{n} \beta_j \|\Phi(x_j) - \bar{\Phi}\|^2$ can be regarded as a weighted measure of data dispersion in the feature space. In other words, the SVC algorithm finds mixing coefficients that make the data dispersion as large as possible in the feature space, while it draws the kernel mixture contours in the original data space $X$ to form clusters. The set $C$ defines the cluster boundaries. Data points lying on the boundaries are called support vectors. Note that the nonlinear transformation $\Phi$ is implicitly defined by a Gaussian kernel, $\kappa(x_j, x_{j'}) = e^{-q \|x_j - x_{j'}\|^2}$, $q > 0$. (The normalizing constant for $\kappa$ is not relevant to cluster analysis and is dropped for simplicity.) A larger value of $q$ corresponds to a smaller window width and leads to more clusters in the analysis.
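To make the contour test concrete, here is a minimal NumPy sketch (the function names are our own, not from the text) that evaluates the kernel mixture $\sum_j \beta_j \kappa(x_j, x)$ at a query point and compares it with the threshold $\rho$; by the equivalence above, a point lies inside a cluster boundary exactly when the mixture is at least $\rho$.

```python
import numpy as np

def gaussian_kernel(u, v, q):
    """Unnormalized Gaussian kernel exp(-q * ||u - v||^2)."""
    return np.exp(-q * np.sum((np.asarray(u) - np.asarray(v)) ** 2))

def kernel_mixture(x, X, beta, q):
    """Evaluate the kernel mixture sum_j beta_j * kappa(x_j, x)."""
    return sum(b * gaussian_kernel(xj, x, q) for xj, b in zip(X, beta))

def inside_contour(x, X, beta, q, rho):
    """Inside the cluster contour iff the mixture exceeds rho:
    ||Phi(x) - Phi_bar||^2 <= R^2  <=>  sum_j beta_j kappa(x_j, x) >= rho."""
    return kernel_mixture(x, X, beta, q) >= rho
```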
Unlike the $k$-means algorithm, where the number of clusters $k$ must be prescribed by the user, the window width in SVC can vary continuously, resulting in hierarchical clusters. The number of clusters depends on the window width of the Gaussian kernel: decreasing the width leads to more clusters. Also, in contrast with the $k$-means procedure, no initial centroids are required as algorithm input. Therefore, a deterministic result, independent of initial conditions, can be expected. SVC also has the ability to deal with outliers by employing slack variables, allowing some data points to remain outside the enclosing sphere in the feature space. This is the same as the "soft margin" idea in support vector machines. With the introduction of slack variables, the optimization problem becomes
$$\min_{a,\, R,\, \xi} \; R^2 + C \sum_{j=1}^{n} \xi_j, \quad \text{subject to} \quad \|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \;\; \xi_j \ge 0, \;\; \forall j.$$
It is straightforward to derive the corresponding dual problem:
$$\max_{\beta} \; W(\beta) = \sum_{j=1}^{n} \beta_j \kappa(x_j, x_j) - \sum_{j, j'=1}^{n} \beta_j \beta_{j'} \kappa(x_j, x_{j'})$$

subject to $\bar{\Phi} = \sum_{j=1}^{n} \beta_j \Phi(x_j)$, $\sum_{j=1}^{n} \beta_j = 1$, and $0 \le \beta_j \le C$.
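As an illustration only: the dual is a small quadratic program over the simplex with box constraints, and for the Gaussian kernel ($\kappa(x_j, x_j) = 1$) maximizing $W(\beta)$ is equivalent to minimizing $\beta^\top K \beta$. Below is a sketch using scipy.optimize.minimize with SLSQP; the helper name svc_dual and the support-vector tolerance are our choices, not from the text.

```python
import numpy as np
from scipy.optimize import minimize

def svc_dual(X, q=1.0, C=1.0, tol=1e-6):
    """Sketch: solve the SVC dual for a Gaussian kernel, return (beta, rho).
    max W(beta) = 1 - beta^T K beta  <=>  min beta^T K beta,
    subject to sum(beta) = 1 and 0 <= beta_j <= C."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-q * sq_dist)                       # n x n kernel matrix
    res = minimize(
        lambda b: b @ K @ b,                       # W(beta) up to a constant
        np.full(n, 1.0 / n),                       # feasible starting point
        bounds=[(0.0, C)] * n,
        constraints=({"type": "eq", "fun": lambda b: b.sum() - 1.0},),
        method="SLSQP",
    )
    beta = res.x
    quad = beta @ K @ beta
    # A support vector (0 < beta_j < C) lies exactly on the sphere,
    # so it determines R^2 and hence the contour threshold rho.
    sv = np.flatnonzero((beta > tol) & (beta < C - tol))[0]
    R2 = 1.0 - 2.0 * (K[sv] @ beta) + quad         # ||Phi(x_sv) - Phi_bar||^2
    rho = 0.5 * (1.0 + quad - R2)                  # equals (K @ beta)[sv]
    return beta, rho
```

The returned $\beta$ and $\rho$ feed directly into the contour test sketched earlier; points with $\beta_j = C$ are the outliers permitted to stay outside the enclosing sphere.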