The first step of the kernel approach is to embed the data items (e.g.,
documents) into a Euclidean space where the patterns can be represented by
a linear relation. This step reduces many complex problems to a class of linear
problems, and the algorithms used to solve them are efficient and well understood. Depending on the data and on the patterns that are expected, it is necessary to choose a suitable function that defines the embedding map.
The second step is to detect relations within the embedded data set, using
a robust and efficient pattern analysis algorithm. Once again, the choice of a
particular pattern analysis algorithm depends on the problem at hand.
The strength of the kernel approach is that the embedding and subsequent
analysis are performed in a modular fashion, so it is possible to treat these two parts as separate; moreover, the embedding step does not need to be performed explicitly, as will be described shortly.
Given a general input space $X \subseteq \mathbb{R}^n$ and a linear pattern analysis algorithm, we first embed $X$ into a high-dimensional feature space $F \subseteq \mathbb{R}^N$, and then detect relations in the embedded data using the linear pattern analysis algorithm. The feature space can be defined as

$F = \{\phi(x) : x \in X\} \subseteq \mathbb{R}^N$

where $\phi : X \to F \subseteq \mathbb{R}^N$ is the embedding map and $\phi(x)$ is a vector containing the values of the features.
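As a concrete illustration (not drawn from the text), the following minimal Python sketch defines one possible embedding map: a degree-2 polynomial feature map for inputs in $\mathbb{R}^2$, so here $n = 2$ and $N = 6$. The name `phi` and the particular choice of features and data are assumptions made for illustration only.

```python
import numpy as np

def phi(x):
    """Illustrative embedding map: degree-2 polynomial features of x = (x1, x2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

# Embed a small data set: the feature space F is the set {phi(x) : x in X}.
X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
F = np.array([phi(x) for x in X])   # each row is a feature vector in R^6
print(F.shape)                      # (3, 6)
```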
Linear algorithms are preferred because of their efficiency and because they are well understood, both from a statistical and a computational perspective. Since φ can be non-linear, any linear relation in F obtained by a linear algorithm can correspond to a non-linear relation in X. Examples of such algorithms include classical methods such as Least Squares and Linear Regression.
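To make this concrete, the sketch below (an illustrative example; `phi`, the scalar input space, and the toy data are all assumed for the example) runs ordinary least squares, a purely linear algorithm, on explicitly embedded data; the fitted relation is linear in F but non-linear in the original input.

```python
import numpy as np

def phi(x):
    """Illustrative embedding of a scalar input: x -> (1, x, x^2)."""
    return np.array([1.0, x, x * x])

# A relation that is non-linear in the input space X: y is roughly x^2.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 20)
y = x ** 2 + 0.05 * rng.standard_normal(x.size)

# Ordinary least squares in the feature space F is a linear problem, yet the
# fitted relation y = <w, phi(x)> is non-linear as a function of the input x.
Phi = np.array([phi(v) for v in x])           # design matrix of embedded points
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # a plain linear algorithm in F
print(np.round(w, 2))                         # roughly [0, 0, 1]
```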
Duality. The fundamental observation of the kernel approach is that linear relations can be represented by using the inner products $\langle \phi(x), \phi(z) \rangle$ between all pairs of observed points $x, z \in X$, without explicitly using their coordinates in $\mathbb{R}^N$. This is called the dual representation of linear relations, and it has far-reaching consequences for algorithm application. It is possible to apply most linear pattern analysis algorithms given only the relative positions of the data points in the feature space, without ever needing to know their actual coordinates.
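One way to see the dual representation at work is the dual form of regularized least squares (ridge regression); this particular algorithm is chosen here only as an example, and the helper names `phi` and `inner`, the toy data, and the regularization constant are assumptions of the sketch. Both fitting and prediction use only pairwise inner products, never explicit coordinates in $\mathbb{R}^N$.

```python
import numpy as np

def phi(x):
    """Illustrative embedding of a scalar input: x -> (1, x, x^2)."""
    return np.array([1.0, x, x * x])

def inner(x, z):
    """Inner product <phi(x), phi(z)> between the images of two points."""
    return float(phi(x) @ phi(z))

# Dual form of regularized least squares: the solution is expressed as
# w = sum_i alpha_i * phi(x_i), so fitting and prediction need only the
# pairwise inner products (the Gram matrix), never coordinates in R^N.
x = np.linspace(-2.0, 2.0, 20)
y = x ** 2
lam = 0.1

G = np.array([[inner(a, b) for b in x] for a in x])    # Gram matrix
alpha = np.linalg.solve(G + lam * np.eye(len(x)), y)   # dual coefficients

x_new = 1.5
y_pred = sum(a * inner(xi, x_new) for a, xi in zip(alpha, x))
print(round(y_pred, 2))   # close to 1.5 ** 2 = 2.25
```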
The function that returns the inner product between the images of any two data points in the feature space is called a kernel. Examples include kernels for text and kernels for images, which induce a similarity between objects based on different aspects of them.
Kernel Function. A kernel is a function $\kappa$ that for all $x, z \in X$ satisfies

$\kappa(x, z) = \langle \phi(x), \phi(z) \rangle$
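For instance (an example kernel, not one given in the text), the degree-2 polynomial kernel $\kappa(x, z) = (\langle x, z \rangle + 1)^2$ on $\mathbb{R}^2$ returns the inner product of a suitable six-dimensional embedding, so the feature vectors never have to be constructed; the feature space is used only implicitly, which is why the embedding step need not be performed explicitly.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 embedding whose inner product the kernel reproduces."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2])

def kappa(x, z):
    """Degree-2 polynomial kernel: kappa(x, z) = (<x, z> + 1)^2."""
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The kernel evaluates <phi(x), phi(z)> directly from x and z in R^2,
# without ever constructing the 6-dimensional feature vectors.
print(kappa(x, z))                               # 4.0
print(np.isclose(kappa(x, z), phi(x) @ phi(z)))  # True: same inner product
```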