of the length of the document. Long documents contain more words than short ones, and hence they are represented by feature vectors with a greater norm. This effect can be removed by normalizing the kernel (for more details see (27)). Stop word removal and normalization are two examples of operations that can be performed and repeated as a series of successive embedding steps.
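Concretely, the normalization alluded to here is usually the standard cosine-style kernel normalization (the details in (27) may differ), which divides the kernel value by the feature space norms of the two documents:

$$\hat{\kappa}(d_1, d_2) = \frac{\kappa(d_1, d_2)}{\sqrt{\kappa(d_1, d_1)\,\kappa(d_2, d_2)}} = \left\langle \frac{\phi(d_1)}{\|\phi(d_1)\|}, \frac{\phi(d_2)}{\|\phi(d_2)\|} \right\rangle,$$

so that all documents are mapped to unit vectors and document length no longer affects the kernel value.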
1.3.1.1 Vector Space Kernel
We have just defined the function φ, which maps a document into a row vector in which each entry is the term frequency of the corresponding term in that document. This vector has a number of entries equal to the number of words in the dictionary, but only a few of them have a non-zero value.
The matrix D can be created using this representation. We refer to X as a matrix of training examples by features; there is a direct correspondence between X and D, where features become terms and training examples become documents.
We create a kernel matrix $K = DD^\top$ corresponding to the vector space kernel

$$\kappa(d_1, d_2) = \phi(d_1)\,\phi(d_2)^\top = \sum_{j=1}^{N} \mathrm{tf}(t_j, d_1)\,\mathrm{tf}(t_j, d_2).$$
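As an illustrative sketch (not from the original text), the following Python code builds the document-by-term matrix D for a toy corpus and computes the kernel matrix $K = DD^\top$; the whitespace tokenization and the tiny vocabulary are assumptions made only for this example.

```python
from collections import Counter

import numpy as np

def tf_vector(doc, vocabulary):
    """Map a document to its term-frequency row vector phi(d)."""
    counts = Counter(doc.split())  # whitespace tokenization (assumed)
    return np.array([counts[t] for t in vocabulary], dtype=float)

docs = ["the cat sat on the mat", "the dog sat"]
vocabulary = sorted({t for d in docs for t in d.split()})

# Document-by-term matrix D: one row per document, one column per term.
D = np.vstack([tf_vector(d, vocabulary) for d in docs])

# Kernel matrix K = D D^T; entry (i, j) is kappa(d_i, d_j).
K = D @ D.T
```

Note that D is typically very sparse, since each document uses only a small fraction of the dictionary; this sparsity is exactly what the list-based evaluation below exploits.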
An interesting property of the vector space kernel is its computational cost: the time to compute the kernel is proportional to the sum of the lengths of the two documents, $O(|d_1| + |d_2|)$. This is due to the sparse vector representation. Each document is preprocessed and split into a list of terms, using spaces as term separators. Each word in the vocabulary is associated with a unique numeric id, which allows a document to be transformed into a sequence of ids together with term frequencies, sorted in ascending order of id. A document d thus becomes a list L(d) of pairs (id, frequency). It is now a simple and efficient task to compute $\kappa(d_1, d_2) = A(L(d_1), L(d_2))$, where A(·) is an algorithm that traverses the two lists, computing products of frequencies whenever the term ids match. This means that computing the kernel does not involve evaluating the feature vector φ(d), but only the representation as a list of terms L(d). When we work with a high-dimensional space, this ensures a cost proportional to the sum of the lengths of the documents.
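A minimal sketch of the traversal algorithm A(·) described above, assuming each document has already been converted into its sorted list L(d) of (id, frequency) pairs; the function name and the example ids are illustrative, not part of the original text.

```python
def vector_space_kernel(l1, l2):
    """Traverse both sorted (id, frequency) lists once, multiplying
    frequencies whenever the term ids match; cost is O(|d1| + |d2|)."""
    i, j, k = 0, 0, 0.0
    while i < len(l1) and j < len(l2):
        id1, tf1 = l1[i]
        id2, tf2 = l2[j]
        if id1 == id2:
            k += tf1 * tf2   # ids match: accumulate the product
            i += 1
            j += 1
        elif id1 < id2:
            i += 1           # advance the list with the smaller id
        else:
            j += 1
    return k

# Example with made-up term ids standing in for dictionary entries.
L_d1 = [(0, 2.0), (3, 1.0), (7, 4.0)]
L_d2 = [(0, 1.0), (7, 2.0), (9, 5.0)]
print(vector_space_kernel(L_d1, L_d2))  # 2*1 + 4*2 = 10.0
```

Because each step advances at least one of the two indices, the loop runs at most $|d_1| + |d_2|$ times, independently of the dictionary size N.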
1.3.2 Semantic Kernels
An important problem with the bag-of-words representation is that it contains no information about the semantic content of words. Semantic kernels are an evolution of the vector space kernel: they expand the basic VS kernel using the linear transformation $\tilde{\phi}(d) = \phi(d)S$, where S is an $N \times k$ matrix that we refer to as the semantic matrix. We can then rewrite the definition of the kernel as $\tilde{\kappa}(d_1, d_2) = \phi(d_1)\,S S^\top \phi(d_2)^\top$.
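As a sketch, this kernel can be evaluated by mapping both term-frequency vectors through S and taking an inner product in the k-dimensional semantic space; the random S and document vectors below are placeholders for illustration only (in practice S might encode, e.g., term-term similarity or a latent semantic projection).

```python
import numpy as np

rng = np.random.default_rng(0)

N, k = 6, 3              # N dictionary terms, k semantic dimensions (assumed)
S = rng.random((N, k))   # hypothetical semantic matrix

phi_d1 = rng.random(N)   # stand-ins for term-frequency vectors phi(d1), phi(d2)
phi_d2 = rng.random(N)

# kappa~(d1, d2) = phi(d1) S S^T phi(d2)^T: project both documents into
# the semantic space, then take the inner product there.
kernel_value = (phi_d1 @ S) @ (S.T @ phi_d2)
```

When k < N this also reduces the cost of explicit feature space computations, since the inner product is taken between k-dimensional projections rather than N-dimensional term vectors.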