Analysis of Text Patterns Using Kernel Methods - Text Mining: Classification, Clustering, and Applications


Distance between Feature Vectors.

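The squared length of the line joining two images φ(x) and φ(z) can be computed as

$$
\begin{aligned}
\|\phi(x) - \phi(z)\|^2 &= \langle \phi(x) - \phi(z),\, \phi(x) - \phi(z) \rangle \\
&= \langle \phi(x), \phi(x) \rangle - 2\,\langle \phi(x), \phi(z) \rangle + \langle \phi(z), \phi(z) \rangle \\
&= \kappa(x, x) - 2\,\kappa(x, z) + \kappa(z, z).
\end{aligned}
\tag{1.1}
$$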

It is easy to see that this is a special case of the norm, applied to the difference vector φ(x) − φ(z). The algorithms demonstrated at the end of this chapter are based on distance.
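
As a minimal sketch of equation (1.1), the following Python snippet computes the feature-space distance from kernel evaluations alone, without ever forming φ(x) explicitly. The linear kernel and the names `linear_kernel` and `kernel_distance` are illustrative assumptions, not part of the original text; any valid kernel function could be passed in instead.

```python
import math

def linear_kernel(x, z):
    """Illustrative kernel (an assumption): the ordinary dot product."""
    return sum(a * b for a, b in zip(x, z))

def kernel_distance(kernel, x, z):
    """Distance between phi(x) and phi(z), using kernel evaluations only."""
    squared = kernel(x, x) - 2.0 * kernel(x, z) + kernel(z, z)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))

# With the linear kernel, phi is the identity map, so the result
# coincides with the ordinary Euclidean distance between the inputs.
d = kernel_distance(linear_kernel, [1.0, 2.0], [4.0, 6.0])
print(d)  # 5.0
```

With the linear kernel the feature map is the identity, which makes the result easy to check by hand: the two points differ by (3, 4), so the distance is 5.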

Norm and Distance from the Center of Mass.

Consider now the center of mass of the set φ(S). This is the vector

$$
\phi_S = \frac{1}{\ell} \sum_{i=1}^{\ell} \phi(x_i),
$$

where ℓ is the number of points in S.

As with all points in the feature space, we do not have an explicit vector representation of this point; moreover, in this case there may not exist a point in X whose image under φ is φ_S. However, we can compute the norm of φ_S using only evaluations of the kernel on the inputs:

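$$
\begin{aligned}
\|\phi_S\|^2 &= \langle \phi_S, \phi_S \rangle
= \left\langle \frac{1}{\ell} \sum_{i=1}^{\ell} \phi(x_i),\; \frac{1}{\ell} \sum_{j=1}^{\ell} \phi(x_j) \right\rangle \\
&= \frac{1}{\ell^2} \sum_{i,j=1}^{\ell} \langle \phi(x_i), \phi(x_j) \rangle
= \frac{1}{\ell^2} \sum_{i,j=1}^{\ell} \kappa(x_i, x_j).
\end{aligned}
$$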

Hence, the square of the norm of the center of mass is equal to the average of the entries in the kernel matrix. It follows that this average is zero if the center of mass is at the origin of the coordinate system and strictly positive otherwise. The squared distance of the image of a point x from the center of mass φ_S is:

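$$
\begin{aligned}
\|\phi(x) - \phi_S\|^2 &= \langle \phi(x), \phi(x) \rangle + \langle \phi_S, \phi_S \rangle - 2\,\langle \phi(x), \phi_S \rangle \\
&= \kappa(x, x) + \frac{1}{\ell^2} \sum_{i,j=1}^{\ell} \kappa(x_i, x_j) - \frac{2}{\ell} \sum_{i=1}^{\ell} \kappa(x, x_i).
\end{aligned}
\tag{1.2}
$$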
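
The two center-of-mass quantities above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the linear kernel, the function names, and the toy data set `S` are all hypothetical and chosen so that the results can be verified by hand.

```python
def linear_kernel(x, z):
    """Illustrative kernel (an assumption): the ordinary dot product."""
    return sum(a * b for a, b in zip(x, z))

def center_norm_squared(kernel, S):
    """Squared norm of the center of mass: the average kernel matrix entry."""
    ell = len(S)  # number of points in S
    return sum(kernel(xi, xj) for xi in S for xj in S) / (ell * ell)

def distance_to_center_squared(kernel, x, S):
    """Squared distance of phi(x) from the center of mass, as in (1.2)."""
    ell = len(S)
    cross = sum(kernel(x, xi) for xi in S) / ell
    return kernel(x, x) + center_norm_squared(kernel, S) - 2.0 * cross

# Toy data: the corners of a square whose center is (1, 1).
S = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
norm_sq = center_norm_squared(linear_kernel, S)
dist_sq = distance_to_center_squared(linear_kernel, [4.0, 5.0], S)
print(norm_sq, dist_sq)  # 2.0 25.0
```

With the linear kernel the center of mass is simply the point (1, 1), whose squared norm is 2, and the point (4, 5) lies at squared distance 3² + 4² = 25 from it. For a nonlinear kernel the same functions apply unchanged, even though the center of mass may then have no pre-image in the input space.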

Linear Classification. Classification, also called categorization in text analysis, is one of the tasks that can be solved using the kernel approach. The aim is to assign any input of our training set to one of a finite set of categories; the classification is binary if there are two categories, and multi-class otherwise.
