{⟨w · x_j⟩ + b : y_j = i} for i ∈ {−1, +1}, i.e., the standard deviations of the projection ⟨w · x⟩ + b computed separately on the two classes; the term λ‖w‖² has the aim of regularizing the norm of the weight vector. Also in this case, it is possible to rewrite the weight vector w as a linear combination of the training examples and to express the function g(x) in dual coordinates. For an explicit derivation see (27).
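Since the same dual rewriting applies across these regularized methods, here is a minimal sketch (our own illustration, using kernel ridge regression rather than the Fisher criterion; all names and data are ours) of evaluating g(x) = Σ_j α_j κ(x_j, x) without ever forming w explicitly:

    import numpy as np

    def rbf(x, z, gamma=1.0):
        # Gaussian kernel kappa(x, z) = exp(-gamma * ||x - z||^2).
        return np.exp(-gamma * np.sum((x - z) ** 2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))                      # training inputs x_j
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)   # training targets y_j

    # Kernel matrix K_ij = kappa(x_i, x_j) on the training set.
    K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

    # Dual coefficients of kernel ridge regression: alpha = (K + lam*I)^{-1} y,
    # so the weight vector w = sum_j alpha_j phi(x_j) never has to be built.
    lam = 0.1
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

    def g(x):
        # g(x) = sum_j alpha_j kappa(x_j, x): evaluation in dual coordinates.
        return sum(a * rbf(xj, x) for a, xj in zip(alpha, X))

    print(round(g(X[0]), 3), round(y[0], 3))          # fitted value vs. target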
The Perceptron procedure is guaranteed to converge only if there exists a hyperplane that correctly classifies the data; furthermore, the algorithm may give different results depending on the order in which the elements are processed (as the sketch below illustrates), since several different solutions exist. Fisher's Discriminant does not suffer from these problems: its solution is unique, because it finds the hyperplane (w, b) on which the projection of the data is maximally separated.
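To see the order dependence concretely, the following toy sketch (our own example, not from the text) trains the classical Perceptron with two different visiting orders and obtains two different separating hyperplanes:

    import numpy as np

    def perceptron(X, y, order, epochs=100):
        # Classical perceptron on labels y in {-1, +1}, visiting the
        # examples in the given order on each pass over the data.
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            mistakes = 0
            for j in order:
                if y[j] * (w @ X[j] + b) <= 0:   # misclassified: update
                    w += y[j] * X[j]
                    b += y[j]
                    mistakes += 1
            if mistakes == 0:                    # data correctly separated
                break
        return w, b

    # A linearly separable toy set.
    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.5]])
    y = np.array([1, 1, -1, -1])

    print(perceptron(X, y, order=[0, 1, 2, 3]))  # one separating hyperplane
    print(perceptron(X, y, order=[3, 2, 1, 0]))  # a different one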
Fisher's Linear Discriminant (FDA), Partial Least Squares (PLS), Ridge Regression (RR), Principal Components Analysis (PCA), K-means and Spectral Clustering (SC), Canonical Correlation Analysis (CCA), Novelty Detection (ND), and many others can all be implemented in a dual form following the approaches outlined here. We refer the reader to (25; 18; 29; 6; 19; 1; 27) for more information on these methods, to (3) for a tutorial on kernel methods based on eigenvalue problems (PCA, CCA, PLS, FDA and SC), and to (33; 32) for two nice examples of the use of kernel methods in real-life problems.
Owing to the level of maturity already achieved in these algorithmic domains, the focus of kernel methods research has recently been shifting towards the design of kernels defined on general data types (such as strings, text, nodes of a graph, trees, graphs, ...). Major issues in kernel design are expressive power and efficiency of evaluation (10; 13; 30; 17; 12).
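As one concrete illustration of a kernel defined directly on strings, here is a minimal sketch of the k-spectrum kernel, which counts shared substrings of length k; it is the inner product of the two k-mer count vectors, hence a valid kernel (the example strings are our own):

    from collections import Counter

    def spectrum_kernel(s, t, k=3):
        # k-spectrum kernel: inner product of k-mer count vectors,
        # a kernel defined directly on strings rather than vectors.
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        return sum(cs[m] * ct[m] for m in cs if m in ct)

    print(spectrum_kernel("kernel methods", "kernel design"))  # shared 3-mers
    print(spectrum_kernel("abcabc", "bcabca", k=2))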
1.2.2 Formal Properties of Kernel Functions
So far, the only way of verifying that a given function is a kernel has been to construct a feature space for which the function corresponds to first performing the feature mapping and then computing the inner product between the two images. An alternative method of demonstrating that a candidate function is a kernel is Mercer's Theorem, which provides a characterization of when a function κ(x, z) is a kernel. This is an important theoretical tool, useful both to create new kernels and to combine different kernels to form new ones.
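As a small numerical illustration of the feature-space construction (our own example), the candidate function κ(x, z) = (x · z)² on R² is a kernel, since it coincides with the inner product of the images under the explicit feature map φ(x) = (x₁², x₂², √2 x₁x₂):

    import numpy as np

    def kappa(x, z):
        # Candidate kernel: squared inner product.
        return (x @ z) ** 2

    def phi(x):
        # Explicit feature map on R^2 with <phi(x), phi(z)> = (x . z)^2.
        return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

    x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print(kappa(x, z), phi(x) @ phi(z))   # both equal (1*3 + 2*(-1))^2 = 1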
The kernel matrix K_ij = κ(x_i, x_j), formed by evaluating a kernel on all pairs of any set of inputs, is a positive semi-definite matrix.
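This property is easy to check numerically; the following sketch (with an arbitrary sample of inputs and a Gaussian kernel of our choosing) verifies that the resulting kernel matrix is symmetric with non-negative eigenvalues, up to floating-point tolerance:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(10, 4))
    # Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2).
    K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))

    eigvals = np.linalg.eigvalsh(K)
    print(np.allclose(K, K.T), eigvals.min() >= -1e-10)   # True True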
Finitely Positive Semi-Definite Functions
A function κ : X × X → R
satisfies the finitely positive semi-definite property if it is a symmetric function
for which the matrices formed by restriction to any finite subset of the space
X are positive semi-definite. Note that this definition does not require the set
X to be a vector space.
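In particular, X may be a finite set of strings: restricting the k-spectrum kernel sketched earlier to such a set still yields a positive semi-definite matrix (a toy check of our own):

    import numpy as np
    from collections import Counter

    def kmer_kernel(s, t, k=2):
        # Inner product of k-mer counts: a symmetric kernel on strings.
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        return sum(cs[m] * ct[m] for m in cs if m in ct)

    # X is a finite set of strings -- not a vector space.
    strings = ["banana", "bandana", "ananas", "kernel"]
    K = np.array([[kmer_kernel(s, t) for t in strings] for s in strings])
    print(np.linalg.eigvalsh(K).min() >= -1e-10)   # PSD on this finite subset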