using the new feature vector \(\tilde{\phi}(d) = S\phi(d)\):

\[
\kappa(d_1, d_2) = \phi(d_1)' S' S \phi(d_2) = \tilde{\phi}(d_1)' \tilde{\phi}(d_2).
\]
Different choices of S lead to different variants of the VSMs. We can consider S as a product of successive embeddings. We define it as S = RP, where R is a diagonal matrix giving the term weightings and P is a proximity matrix defining semantic spreading between different terms of the corpus.
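As an illustration, the composed embedding S = RP can be sketched in plain Python; the vocabulary, weights, and proximity values below are made-up examples, not data from the text:

```python
# Toy 3-term vocabulary; phi(d) is a term-frequency vector.
# All weights and proximity values below are illustrative assumptions.
phi_d1 = [2.0, 0.0, 1.0]
phi_d2 = [1.0, 1.0, 0.0]

w = [0.5, 1.2, 0.8]  # diagonal of the term-weighting matrix R

# Proximity matrix P: off-diagonal entries > 0 couple related terms.
P = [[1.0, 0.3, 0.0],
     [0.3, 1.0, 0.1],
     [0.0, 0.1, 1.0]]

def embed(phi):
    """Compute S*phi with S = R*P, i.e. (S phi)_i = w[i] * sum_j P[i][j]*phi[j]."""
    n = len(phi)
    return [w[i] * sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]

def kernel(phi_a, phi_b):
    """kappa(d1, d2) = (S phi_a) . (S phi_b) = phi_a' S'S phi_b."""
    sa, sb = embed(phi_a), embed(phi_b)
    return sum(x * y for x, y in zip(sa, sb))

print(kernel(phi_d1, phi_d2))
```

Because the kernel is an inner product of the embedded vectors, it is symmetric in its two arguments by construction.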
In Information Retrieval (IR), the term frequency is considered a local feature of the document. In particular tasks, terms need to carry absolute information across the documents of the corpus or of a given topic. Several measures have been proposed for term weighting, such as mutual information (8), entropy (26), or term frequency of words across the documents. We consider an absolute measure known as idf (11) that weights terms as a function of their inverse document frequency. If the corpus contains N documents, and df(t) is the number of documents that contain the term t, the idf weight is

\[
w(t) = \ln\frac{N}{\mathrm{df}(t)}.
\]
Idf is implicitly able to downweight the stop words: if a term is present in each document, then w(t) = 0. In general it is preferable to create a stop word list and remove the stop words before computing the vector representation. This helps to decrease the dictionary size.
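A minimal sketch of the idf weighting, assuming a toy corpus of short tokenized documents (the corpus and tokens are illustrative):

```python
import math

# Toy corpus: each document is a list of tokens (illustrative data).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]

N = len(corpus)

def df(term):
    """Number of documents that contain the term."""
    return sum(1 for doc in corpus if term in doc)

def idf(term):
    """w(t) = ln(N / df(t)); 0 for a term present in every document."""
    return math.log(N / df(term))

print(idf("the"))  # appears in every document -> 0.0
print(idf("cat"))  # appears in 2 of 3 documents -> ln(3/2)
```

Note how "the", which occurs in every document, receives weight zero, matching the stop-word behaviour described above.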
The idf rule is just an example of a kind of term weighting. In general, we can develop a new VSM by choosing the term weightings matrix R as a diagonal matrix in the following way: R_tt = w(t).
The associated kernel computes the inner product

\[
\kappa(d_1, d_2) = \phi(d_1)' R' R \phi(d_2) = \sum_{t} w(t)^2 \,\mathrm{tf}(t, d_1)\,\mathrm{tf}(t, d_2).
\]
This kernel merges the tf and idf representations, well known in IR as tf-idf. It is implementable by a weighted version A_w of the algorithm A:

\[
\kappa(d_1, d_2) = A_w(L(d_1), L(d_2)).
\]
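Under the same toy-corpus assumption as before, the tf-idf kernel can be sketched as follows (the corpus and helper names are illustrative, not the text's own code):

```python
import math

# Toy corpus of tokenized documents (illustrative data).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
N = len(corpus)
vocab = sorted({t for doc in corpus for t in doc})

def idf(term):
    """w(t) = ln(N / df(t))."""
    dft = sum(1 for doc in corpus if term in doc)
    return math.log(N / dft)

def tf(term, doc):
    """Raw term frequency of term in doc."""
    return doc.count(term)

def tfidf_kernel(d1, d2):
    """kappa(d1, d2) = sum_t w(t)^2 * tf(t, d1) * tf(t, d2)."""
    return sum(idf(t) ** 2 * tf(t, d1) * tf(t, d2) for t in vocab)

print(tfidf_kernel(corpus[0], corpus[2]))
```

Documents that share only stop-word-like terms (here, "the") get kernel value 0, since those terms carry zero idf weight.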
The tf-idf representation is able to highlight discriminative terms and downweight irrelevant terms, but it is not able to take into account semantic information about two or more terms or about two or more documents. This semantic information can be introduced into the semantic kernel using the proximity matrix P. This matrix needs to have non-zero off-diagonal entries, P_ij > 0 for i ≠ j, when the term i is semantically correlated with the term j.
Given P, the vector space kernel becomes