using the new feature vector \(\tilde{\phi}(d) = S\phi(d)\):

\[
\kappa(d_1, d_2) = \phi(d_1)' S' S \phi(d_2) = \tilde{\phi}(d_1)' \tilde{\phi}(d_2).
\]
Different choices of S lead to different variants of the VSMs. We can consider S as a product of successive embeddings. We define it as S = RP, where R is a diagonal matrix giving the term weightings and P is a proximity matrix defining semantic spreading between different terms of the corpus.
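As an illustration, the composed embedding S = RP can be sketched in plain Python; the vocabulary, weights, and proximity values below are made-up examples, not data from the text:

```python
# Toy 3-term vocabulary; phi(d) is a term-frequency vector.
# All weights and proximity values below are illustrative assumptions.
phi_d1 = [2.0, 0.0, 1.0]
phi_d2 = [1.0, 1.0, 0.0]

w = [0.5, 1.2, 0.8]  # diagonal of the term-weighting matrix R

# Proximity matrix P: off-diagonal entries > 0 couple related terms.
P = [[1.0, 0.3, 0.0],
     [0.3, 1.0, 0.1],
     [0.0, 0.1, 1.0]]

def embed(phi):
    """Compute S*phi with S = R*P, i.e. (S phi)_i = w[i] * sum_j P[i][j]*phi[j]."""
    n = len(phi)
    return [w[i] * sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]

def kernel(phi_a, phi_b):
    """kappa(d1, d2) = (S phi_a) . (S phi_b) = phi_a' S'S phi_b."""
    sa, sb = embed(phi_a), embed(phi_b)
    return sum(x * y for x, y in zip(sa, sb))

print(kernel(phi_d1, phi_d2))
```

Because the kernel is an inner product of the embedded vectors, it is symmetric in its two arguments by construction.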
In Information Retrieval (IR), the term frequency is considered a local feature of the document. In particular tasks, terms need to carry absolute information across the documents of the corpus or of a given topic. Several measures have been proposed for term weighting, such as mutual information (8), entropy (26), or term frequency of words across the documents. We consider an absolute measure known as idf (11) that weights terms as a function of their inverse document frequency. If the corpus contains N documents, and df(t) is the number of documents that contain the term t, the idf weight is

\[
w(t) = \ln\frac{N}{\mathrm{df}(t)}.
\]
Idf is implicitly able to downweight the stop words: if a term is present in each document, then w(t) = 0. In general it is preferable to create a stop word list and remove the stop words before computing the vector representation. This helps to decrease the dictionary size.
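A minimal sketch of the idf weighting, assuming a toy corpus of short tokenized documents (the corpus and tokens are illustrative):

```python
import math

# Toy corpus: each document is a list of tokens (illustrative data).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]

N = len(corpus)

def df(term):
    """Number of documents that contain the term."""
    return sum(1 for doc in corpus if term in doc)

def idf(term):
    """w(t) = ln(N / df(t)); 0 for a term present in every document."""
    return math.log(N / df(term))

print(idf("the"))  # appears in every document -> 0.0
print(idf("cat"))  # appears in 2 of 3 documents -> ln(3/2)
```

Note how "the", which occurs in every document, receives weight zero, matching the stop-word behaviour described above.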
The idf rule is just an example of a kind of term weighting. In general, we can develop a new VSM by choosing the term weightings matrix R as a diagonal matrix in the following way: R_tt = w(t).
The associated kernel computes the inner product

\[
\kappa(d_1, d_2) = \phi(d_1)' R' R \phi(d_2) = \sum_{t} w(t)^2 \,\mathrm{tf}(t, d_1)\,\mathrm{tf}(t, d_2).
\]
This kernel merges the tf and idf representations, well known in IR as tf-idf. It is implementable by a weighted version A_w of the algorithm A:

\[
\kappa(d_1, d_2) = A_w(L(d_1), L(d_2)).
\]
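Under the same toy-corpus assumption as before, the tf-idf kernel can be sketched as follows (the corpus and helper names are illustrative, not the text's own code):

```python
import math

# Toy corpus of tokenized documents (illustrative data).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
N = len(corpus)
vocab = sorted({t for doc in corpus for t in doc})

def idf(term):
    """w(t) = ln(N / df(t))."""
    dft = sum(1 for doc in corpus if term in doc)
    return math.log(N / dft)

def tf(term, doc):
    """Raw term frequency of term in doc."""
    return doc.count(term)

def tfidf_kernel(d1, d2):
    """kappa(d1, d2) = sum_t w(t)^2 * tf(t, d1) * tf(t, d2)."""
    return sum(idf(t) ** 2 * tf(t, d1) * tf(t, d2) for t in vocab)

print(tfidf_kernel(corpus[0], corpus[2]))
```

Documents that share only stop-word-like terms (here, "the") get kernel value 0, since those terms carry zero idf weight.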
The tf-idf representation is able to highlight discriminative terms and downweight irrelevant terms, but it is not able to take into account semantic information about two or more terms or about two or more documents. This semantic information can be introduced into the semantic kernel using the proximity matrix P. This matrix needs to have non-zero off-diagonal entries, P_ij > 0 for i ≠ j, when the term i is semantically correlated with the term j.
Given P, the vector space kernel becomes