Using the new feature vector $\tilde{\phi}(d) = \phi(d)S$, the kernel becomes

$$\kappa(d_1, d_2) = \phi(d_1)\, S S'\, \phi(d_2)' = \tilde{\phi}(d_1)\, \tilde{\phi}(d_2)'.$$
Different choices of $S$ lead to different variants of the VSM. We can consider $S$ as a product of successive embeddings. We define it as $S = RP$, where $R$ is a diagonal matrix giving the term weightings and $P$ is a proximity matrix defining semantic spreading between the different terms of the corpus.
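As a concrete illustration, here is a minimal numerical sketch of the composed embedding, assuming numpy and a toy three-term vocabulary; the particular values in $R$ and $P$ are hypothetical:

```python
import numpy as np

# Hypothetical 3-term vocabulary; phi(d) is the row vector of term frequencies.
phi_d1 = np.array([2.0, 0.0, 1.0])
phi_d2 = np.array([1.0, 1.0, 0.0])

R = np.diag([0.5, 1.2, 0.9])         # diagonal term-weighting matrix R
P = np.array([[1.0, 0.3, 0.0],       # proximity matrix P: non-zero
              [0.3, 1.0, 0.1],       # off-diagonal entries link
              [0.0, 0.1, 1.0]])      # semantically related terms

S = R @ P                            # composed embedding S = RP

# kappa(d1, d2) = phi(d1) S S' phi(d2)'
print(phi_d1 @ S @ S.T @ phi_d2)
```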
In Information Retrieval (IR), the term frequency is considered a local feature of the document. In particular tasks, terms need to carry absolute information across the documents in the corpus or a given topic. Several measures have been proposed for term weighting, such as mutual information (8), entropy (26), or the term frequency of words across the documents. We consider an absolute measure known as idf (11), which weights terms as a function of their inverse document frequency. If the corpus contains $\ell$ documents, and $df(t)$ is the number of documents that contain the term $t$, the idf weight is

$$w(t) = \ln\left(\frac{\ell}{df(t)}\right).$$
Idf is implicitly able to downweight stop words: if a term is present in every document, then $w(t) = 0$. In general, it is preferable to create a stop word list and remove the stop words before computing the vector representation. This also helps to decrease the dictionary size.
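As an illustration, here is a minimal sketch of the idf computation over a toy corpus; the corpus and the whitespace tokenization are illustrative assumptions only:

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird flew away",
]

# df(t): number of documents that contain term t
df = Counter()
for doc in corpus:
    df.update(set(doc.split()))

ell = len(corpus)                             # corpus size
idf = {t: math.log(ell / df[t]) for t in df}  # w(t) = ln(ell / df(t))

print(idf["the"])   # 0.0: present in every document, downweighted like a stop word
print(idf["cat"])   # ln(3/2): present in two of three documents
```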
The idf rule is just one example of a term weighting. In general, we can derive a new VSM by choosing the term-weighting matrix $R$ as a diagonal matrix in the following way:

$$R_{tt} = w(t).$$
The associated kernel computes the inner product
$$\kappa(d_1, d_2) = \phi(d_1)\, R R'\, \phi(d_2)' = \sum_t w(t)^2\, tf(t, d_1)\, tf(t, d_2).$$
This kernel merges the tf and idf representations, well known in IR as tf-idf. It can be implemented by a weighted version $A_w$ of the algorithm $A$:

$$\kappa(d_1, d_2) = A_w(L(d_1), L(d_2)).$$
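A minimal sketch of this kernel in Python; the function name `tfidf_kernel` and the toy idf weights are illustrative assumptions, not from the text:

```python
import math
from collections import Counter

def tfidf_kernel(d1, d2, idf):
    """kappa(d1, d2) = sum over t of w(t)^2 * tf(t, d1) * tf(t, d2)."""
    tf1, tf2 = Counter(d1.split()), Counter(d2.split())
    return sum(idf.get(t, 0.0) ** 2 * tf1[t] * tf2[t]
               for t in set(tf1) & set(tf2))   # only shared terms contribute

# Toy idf weights; "the" gets weight 0, as a stop word would.
idf = {"the": 0.0, "cat": math.log(1.5), "dog": math.log(3.0)}
print(tfidf_kernel("the cat sat", "the dog chased the cat", idf))
```

Note that only terms shared by both documents contribute to the sum, which is what makes a weighted list-matching algorithm $A_w$ a natural implementation.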
The tf-idf representation is able to highlight discriminative terms and downweight irrelevant terms, but it is not able to take into account semantic information about two or more terms or about two or more documents. This semantic information can be introduced into the semantic kernel using the proximity matrix $P$. This matrix needs to have non-zero off-diagonal entries, $P_{ij} > 0$ for $i \neq j$, when the term $i$ is semantically correlated with the term $j$.
Given $P$, the vector space kernel becomes

$$\kappa(d_1, d_2) = \phi(d_1)\, P P'\, \phi(d_2)'.$$
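To see the effect of $P$, here is a small sketch assuming a hypothetical three-term vocabulary in which two synonymous terms are correlated; the vocabulary and the value 0.8 are illustrative:

```python
import numpy as np

# Vocabulary: ["car", "automobile", "bird"]. "car" and "automobile" are
# treated as semantically correlated, so P has non-zero off-diagonal entries.
P = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

phi_d1 = np.array([1.0, 0.0, 0.0])   # document containing only "car"
phi_d2 = np.array([0.0, 1.0, 0.0])   # document containing only "automobile"

print(phi_d1 @ phi_d2)               # 0.0: the plain VSM sees no overlap
print(phi_d1 @ P @ P.T @ phi_d2)     # 1.6: the semantic kernel relates the documents
```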