where D is the document-term matrix, equivalent to taking P = D. This definition does not make it immediately clear that it implements a semantic similarity, but if we compute the corresponding kernel

\kappa(d_1, d_2) = \phi(d_1)^\top D^\top D \, \phi(d_2),

we can observe that the matrix D^\top D has a non-zero (i, j)-th entry if and only if there is a document in the corpus in which the i-th and j-th terms co-occur, since
(D^\top D)_{ij} = \sum_{d} \mathrm{tf}(i, d) \, \mathrm{tf}(j, d).
The strength of the semantic relationship between two terms that co-occur in a document is measured by the frequency and number of their co-occurrences. This approach can also be used to reduce the dimensionality of the space. In fact, if we have fewer documents than terms, we map the vectors indexed by terms to a lower-dimensional space indexed by the documents of the corpus.
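As a concrete illustration, the following is a minimal sketch of this construction, assuming a hypothetical three-document corpus and a naive whitespace tokenizer (neither comes from the text): it builds the document-term matrix D, forms the co-occurrence matrix D^\top D, and evaluates the kernel for two new documents.

```python
import numpy as np

# Hypothetical toy corpus; rows of D are documents, columns are terms,
# and entries are the term frequencies tf(i, d).
corpus = ["machine learning theory",
          "learning kernel methods",
          "kernel theory"]
vocab = sorted({t for doc in corpus for t in doc.split()})
index = {t: i for i, t in enumerate(vocab)}

def phi(doc):
    """Bag-of-words vector of term frequencies for a document."""
    v = np.zeros(len(vocab))
    for t in doc.split():
        if t in index:
            v[index[t]] += 1.0
    return v

D = np.array([phi(doc) for doc in corpus])  # document-term matrix

# (D^T D)_{ij} = sum_d tf(i, d) * tf(j, d): non-zero exactly when the
# i-th and j-th terms co-occur in some corpus document.
cooc = D.T @ D

def gvsm_kernel(d1, d2):
    """kappa(d1, d2) = phi(d1)^T (D^T D) phi(d2)."""
    return phi(d1) @ cooc @ phi(d2)

print(gvsm_kernel("kernel learning", "machine theory"))
```

Note that the kernel rewards pairs of terms, one from each document, that co-occur somewhere in the corpus, even when the two documents share no terms at all.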
Latent Semantic Kernels. Another approach based on the use of co-occurrence information is Latent Semantic Indexing (LSI) (7). This method is very close to GSVM; the main difference is that it uses singular value decomposition (SVD) to extract the semantic information from the co-occurrences. LSI considers the first k columns of the left and right singular vector matrices U and V, corresponding to the k largest singular values.
Thus, the word-by-document matrix D (the transpose of the document-term matrix used above) is factorized as

D = U \Sigma V^\top,

where U and V are unitary matrices whose columns are the eigenvectors of D D^\top and D^\top D, respectively. LSI now projects the documents into the space spanned by the first k columns of U, using these new k-dimensional vectors for subsequent processing:

d \mapsto U_k^\top \phi(d),
where U_k is the matrix containing the first k columns of U. These eigenvectors define the subspace that minimizes the sum-squared differences between the points and their projections, i.e. the subspace with minimal sum-squared residuals. Hence, the eigenvectors for a set of documents can be viewed as concepts, described by linear combinations of terms chosen in such a way that the documents are described as accurately as possible using only k such concepts. The aim of the SVD is to extract a few highly correlated dimensions/concepts that can approximately reconstruct the whole feature vector.
The new kernel can be defined as

\kappa(d_1, d_2) = \phi(d_1)^\top U_k U_k^\top \phi(d_2).
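A minimal sketch of this latent semantic kernel on the same hypothetical toy corpus as above: it takes the thin SVD of the word-by-document matrix with NumPy, keeps the first k left singular vectors as U_k (k = 2 is an arbitrary choice here), and compares two documents through their k-dimensional projections.

```python
import numpy as np

# Hypothetical toy corpus, as in the previous sketch.
corpus = ["machine learning theory",
          "learning kernel methods",
          "kernel theory"]
vocab = sorted({t for doc in corpus for t in doc.split()})
index = {t: i for i, t in enumerate(vocab)}

def phi(doc):
    """Bag-of-words vector of term frequencies for a document."""
    v = np.zeros(len(vocab))
    for t in doc.split():
        if t in index:
            v[index[t]] += 1.0
    return v

# Word-by-document matrix: rows are terms, columns are documents.
W = np.array([phi(doc) for doc in corpus]).T

# Thin SVD; the columns of U are the eigenvectors of W W^T.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2               # number of latent concepts (chosen ad hoc)
U_k = U[:, :k]      # first k left singular vectors

def lsk_kernel(d1, d2):
    """kappa(d1, d2) = phi(d1)^T U_k U_k^T phi(d2)."""
    return (U_k.T @ phi(d1)) @ (U_k.T @ phi(d2))

print(lsk_kernel("kernel learning", "machine theory"))
```

Since U_k U_k^\top is the orthogonal projection onto the span of the first k concepts, the kernel is simply the inner product of the two documents after projection onto that concept subspace.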