of the length of the document. Long documents contain more words than short ones, and hence they are represented by feature vectors with a greater norm. This effect can be removed by normalizing the kernel (for more details see (27)). Stop word removal and normalization are two examples of operations that can be performed and repeated as a series of successive embedding steps.
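Concretely, the normalization alluded to here is usually the standard cosine-style kernel normalization (the details in (27) may differ), which divides the kernel value by the feature space norms of the two documents:

$$\hat{\kappa}(d_1, d_2) = \frac{\kappa(d_1, d_2)}{\sqrt{\kappa(d_1, d_1)\,\kappa(d_2, d_2)}} = \left\langle \frac{\phi(d_1)}{\|\phi(d_1)\|}, \frac{\phi(d_2)}{\|\phi(d_2)\|} \right\rangle,$$

so that all documents are mapped to unit vectors and document length no longer affects the kernel value.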
1.3.1.1 Vector Space Kernel
We have just defined the function φ, which maps a document into a row vector in which each entry is the term frequency of the corresponding term in that document. This vector has a number of entries equal to the number of words in the dictionary, but only a few of them have a non-zero value.
The matrix D can be created using this representation. We refer to X as a matrix of training examples by features; there is a direct correspondence between X and D, where features become terms and training examples become documents.
We create a kernel matrix $K = DD^\top$ corresponding to the vector space kernel

$$\kappa(d_1, d_2) = \phi(d_1)\,\phi(d_2)^\top = \sum_{j=1}^{N} \mathrm{tf}(t_j, d_1)\,\mathrm{tf}(t_j, d_2).$$
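As an illustrative sketch (not from the original text), the following Python code builds the document-by-term matrix D for a toy corpus and computes the kernel matrix $K = DD^\top$; the whitespace tokenization and the tiny vocabulary are assumptions made only for this example.

```python
from collections import Counter

import numpy as np

def tf_vector(doc, vocabulary):
    """Map a document to its term-frequency row vector phi(d)."""
    counts = Counter(doc.split())  # whitespace tokenization (assumed)
    return np.array([counts[t] for t in vocabulary], dtype=float)

docs = ["the cat sat on the mat", "the dog sat"]
vocabulary = sorted({t for d in docs for t in d.split()})

# Document-by-term matrix D: one row per document, one column per term.
D = np.vstack([tf_vector(d, vocabulary) for d in docs])

# Kernel matrix K = D D^T; entry (i, j) is kappa(d_i, d_j).
K = D @ D.T
```

Note that D is typically very sparse, since each document uses only a small fraction of the dictionary; this sparsity is exactly what the list-based evaluation below exploits.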
An interesting property of the vector space kernel is its computational cost: the time to compute the kernel is proportional to the sum of the lengths of the two documents, $O(|d_1| + |d_2|)$. This is due to the sparse vector representation. Each document is preprocessed and split into a list of terms, using spaces as term separators. Each word in the vocabulary is associated with a unique numeric id, which allows a document to be transformed into a sequence of ids together with term frequencies, sorted in ascending order of id. A document d thus becomes a list L(d) of pairs (id, frequency). It is now a simple and efficient task to compute $\kappa(d_1, d_2) = A(L(d_1), L(d_2))$, where A(·) is an algorithm that traverses the two lists, computing products of frequencies whenever the term ids match. This means that computing the kernel does not involve evaluating the feature vector φ(d), but only the representation as a list of terms L(d). When we work with a high-dimensional space, this ensures a cost proportional to the sum of the lengths of the documents.
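A minimal sketch of the traversal algorithm A(·) described above, assuming each document has already been converted into its sorted list L(d) of (id, frequency) pairs; the function name and the example ids are illustrative, not part of the original text.

```python
def vector_space_kernel(l1, l2):
    """Traverse both sorted (id, frequency) lists once, multiplying
    frequencies whenever the term ids match; cost is O(|d1| + |d2|)."""
    i, j, k = 0, 0, 0.0
    while i < len(l1) and j < len(l2):
        id1, tf1 = l1[i]
        id2, tf2 = l2[j]
        if id1 == id2:
            k += tf1 * tf2   # ids match: accumulate the product
            i += 1
            j += 1
        elif id1 < id2:
            i += 1           # advance the list with the smaller id
        else:
            j += 1
    return k

# Example with made-up term ids standing in for dictionary entries.
L_d1 = [(0, 2.0), (3, 1.0), (7, 4.0)]
L_d2 = [(0, 1.0), (7, 2.0), (9, 5.0)]
print(vector_space_kernel(L_d1, L_d2))  # 2*1 + 4*2 = 10.0
```

Because each step advances at least one of the two indices, the loop runs at most $|d_1| + |d_2|$ times, independently of the dictionary size N.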
1.3.2 Semantic Kernels
An important problem with the bag-of-words representation is that it contains no information about the semantic content of words. Semantic kernels are an evolution of the vector space kernel: they expand the basic VS kernel using the linear transformation $\tilde{\phi}(d) = \phi(d)S$, where S is an $N \times k$ matrix that we refer to as the semantic matrix. We can then rewrite the definition of the kernel as $\tilde{\kappa}(d_1, d_2) = \phi(d_1)\,S S^\top \phi(d_2)^\top$.
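As a sketch, this kernel can be evaluated by mapping both term-frequency vectors through S and taking an inner product in the k-dimensional semantic space; the random S and document vectors below are placeholders for illustration only (in practice S might encode, e.g., term-term similarity or a latent semantic projection).

```python
import numpy as np

rng = np.random.default_rng(0)

N, k = 6, 3              # N dictionary terms, k semantic dimensions (assumed)
S = rng.random((N, k))   # hypothetical semantic matrix

phi_d1 = rng.random(N)   # stand-ins for term-frequency vectors phi(d1), phi(d2)
phi_d2 = rng.random(N)

# kappa~(d1, d2) = phi(d1) S S^T phi(d2)^T: project both documents into
# the semantic space, then take the inner product there.
kernel_value = (phi_d1 @ S) @ (S.T @ phi_d2)
```

When k < N this also reduces the cost of explicit feature space computations, since the inner product is taken between k-dimensional projections rather than N-dimensional term vectors.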