where $D$ is the document-term matrix, equivalent to taking $P = D^\top$. This definition does not make immediately clear that it implements a semantic similarity, but if we compute the corresponding kernel
$$\kappa(d_1, d_2) = \phi(d_1)\, D^\top D\, \phi(d_2)^\top,$$
we can observe that the matrix $D^\top D$ has a non-zero $(i,j)$-th entry if and only if there is a document in the corpus in which the $i$-th and $j$-th terms co-occur, since
$$(D^\top D)_{ij} = \sum_{d} \mathrm{tf}(i,d)\,\mathrm{tf}(j,d).$$
The strength of the semantic relationship between two terms that co-occur in a document is measured by the frequency and number of their co-occurrences. This approach can also be used to reduce the dimension of the space: if we have fewer documents than terms, we map from the vectors indexed by terms to a lower-dimensional space indexed by the documents of the corpus.
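As a concrete sketch, the co-occurrence kernel above can be computed directly from a term-frequency matrix. The toy corpus and values below are illustrative assumptions, not data from the text:

```python
import numpy as np

# Toy document-term matrix D: rows = documents, columns = terms.
# Entry D[d, t] is the term frequency tf(t, d).
D = np.array([
    [2, 1, 0, 0],   # doc 1: terms 0 and 1 co-occur
    [0, 1, 1, 0],   # doc 2: terms 1 and 2 co-occur
    [0, 0, 1, 1],   # doc 3: terms 2 and 3 co-occur
], dtype=float)

# Term-by-term co-occurrence matrix: (D^T D)_{ij} = sum_d tf(i,d) * tf(j,d).
DtD = D.T @ D

# Kernel between two documents given by their term-frequency vectors:
# kappa(d1, d2) = phi(d1) D^T D phi(d2)^T.
def gvsm_kernel(phi1, phi2):
    return phi1 @ DtD @ phi2

# Terms 0 and 3 never co-occur in any document, so the entry is zero.
print(DtD[0, 3])                 # → 0.0
print(gvsm_kernel(D[0], D[1]))   # → 7.0 (documents 1 and 2 share term 1)
```

Note that two documents can have a non-zero kernel value even with no terms in common, as long as their terms co-occur somewhere in the corpus; this is what makes the similarity "semantic".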
Latent Semantic Kernels.
Another approach based on the use of co-occurrence information is Latent Semantic Indexing (LSI) (7). This method is very close to GSVM; the main difference is that it uses singular value decomposition (SVD) to extract the semantic information from the co-occurrences. The SVD of a matrix considers the first $k$ columns of the left and right singular vector matrices $U$ and $V$, corresponding to the $k$ largest singular values.
Thus, the word-by-document matrix $D$ is factorized as $D = U \Sigma V^\top$, where $U$ and $V$ are unitary matrices whose columns are the eigenvectors of $D D^\top$ and $D^\top D$ respectively. LSI now projects the documents into the space spanned by the first $k$ columns of $U$, using these new $k$-dimensional vectors for subsequent processing:
$$\phi(d) \mapsto \phi(d)\, U_k,$$
where $U_k$ is the matrix containing the first $k$ columns of $U$. The eigenvectors define the subspace that minimizes the sum of squared differences between the points and their projections, so it defines the subspace with minimal sum-squared residuals. Hence, the eigenvectors for a set of documents can be viewed as concepts described by linear combinations of terms, chosen in such a way that the documents are described as accurately as possible using only $k$ such concepts. The aim of SVD is to extract a few highly correlated dimensions/concepts able to approximately reconstruct the whole feature vector.
The new kernel can be defined as
$$\kappa(d_1, d_2) = \phi(d_1)\, U_k U_k^\top\, \phi(d_2)^\top.$$
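A minimal sketch of this latent semantic kernel, using NumPy's SVD on a toy word-by-document matrix (the corpus and the choice of `k` are illustrative assumptions):

```python
import numpy as np

# Word-by-document matrix D: rows = terms, columns = documents (toy corpus).
D = np.array([
    [2, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)

# SVD: D = U @ diag(S) @ Vt, where the columns of U are eigenvectors of
# D D^T and the columns of V (rows of Vt) are eigenvectors of D^T D.
U, S, Vt = np.linalg.svd(D, full_matrices=False)

# Keep the first k columns of U, i.e. the k largest singular values.
k = 2
Uk = U[:, :k]

# LSI projection phi(d) -> phi(d) U_k, giving the kernel
# kappa(d1, d2) = phi(d1) U_k U_k^T phi(d2)^T.
def lsi_kernel(phi1, phi2):
    return (phi1 @ Uk) @ (Uk.T @ phi2)

phi1 = D[:, 0]   # document 1 as a term-frequency vector
phi2 = D[:, 1]
print(lsi_kernel(phi1, phi2))
```

With $k$ equal to the rank of $D$ the kernel reduces to the plain inner product between term-frequency vectors; truncating to a smaller $k$ smooths the similarity through the dominant concepts.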