2. (27):
$\kappa(x, z) = \exp(\kappa_1(x, z))$
3. Gaussian kernel (4):
$\kappa(x, z) = \exp\left(-\|x - z\|^2 / (2\sigma^2)\right)$ with $x, z \in X$
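As a rough illustration, the exponential and Gaussian kernels listed above could be evaluated as in the sketch below. The function names are hypothetical, and the choice of the ordinary inner product as the base kernel κ1 is an assumption made only for this example; any valid kernel could take its place.

```python
import math

def linear_kernel(x, z):
    """Assumed base kernel kappa_1: the ordinary inner product <x, z>."""
    return sum(xi * zi for xi, zi in zip(x, z))

def exp_kernel(x, z, base_kernel=linear_kernel):
    """kappa(x, z) = exp(kappa_1(x, z)) for a given base kernel kappa_1."""
    return math.exp(base_kernel(x, z))

def gaussian_kernel(x, z, sigma=1.0):
    """kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Two small example vectors (illustrative data only).
x = [1.0, 0.0, 2.0]
z = [0.0, 1.0, 2.0]

k_exp = exp_kernel(x, z)         # <x, z> = 4, so this is exp(4)
k_gauss = gaussian_kernel(x, z)  # ||x - z||^2 = 2, so this is exp(-1)
```

With sigma = 1 the Gaussian kernel of a point with itself is exactly 1, and it decays toward 0 as the two points move apart; a larger sigma makes that decay slower.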
Now we have all the necessary tools to discuss kernel applications in text
problems.
1.3 Kernels for Text
In the last twenty-five years, the constant growth of the Web has produced
an explosion of readily available digital text. This huge amount of data has
become one of the main research interests of Artificial Intelligence. Many
algorithms and text representations have been developed, with successful
results. The goal of this section is to introduce some applications of Kernel
Methods in this area.
Typically, pattern analysis algorithms are originally developed to be applied
to vectorial data. However, for many other types of data it is possible to
explicitly or implicitly construct a feature space capturing relevant
information from this data. Unfortunately, even when it can be expressed
explicitly, this feature space is often so high dimensional that the
algorithms cannot be used in their original form for computational reasons.
However, many of these algorithms can be reformulated into a kernel version.
These kernel versions operate directly on the kernel matrix rather than on the
feature vectors. For many data types, methods have been devised to evaluate
these kernels efficiently, avoiding the explicit construction of the feature
vectors. In this way, the introduction of kernels defined for a much wider
variety of data structures has significantly extended the application domain
of linear algorithms. We now introduce and discuss various kernels that are
commonly used for text.
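To make the idea of operating on the kernel matrix concrete, consider squared distances in feature space. They can be written using kernel evaluations only, since ||φ(x) − φ(z)||² = κ(x, x) − 2κ(x, z) + κ(z, z). The sketch below is a minimal illustration of that identity, not a method taken from the text: the helper names are hypothetical, and the linear kernel is used only so that the result can be checked against ordinary Euclidean distances.

```python
def dot(x, z):
    """Linear kernel, chosen here only so the result is easy to verify."""
    return sum(xi * zi for xi, zi in zip(x, z))

def gram_matrix(items, kernel):
    """Kernel matrix K with K[i][j] = kernel(items[i], items[j])."""
    return [[kernel(a, b) for b in items] for a in items]

def feature_space_sq_distances(K):
    """Squared feature-space distances computed from the kernel matrix alone.

    Uses ||phi(x_i) - phi(x_j)||^2 = K[i][i] - 2 K[i][j] + K[j][j],
    so the feature vectors phi(x) are never constructed.
    """
    n = len(K)
    return [[K[i][i] - 2.0 * K[i][j] + K[j][j] for j in range(n)]
            for i in range(n)]

# Illustrative data: three small vectors.
docs = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [3.0, 1.0, 0.0]]

K = gram_matrix(docs, dot)
D = feature_space_sq_distances(K)
```

Once K has been computed, the second function never looks at the original items again. Swapping `dot` for any other valid kernel leaves it unchanged, which is the sense in which a kernelized algorithm is independent of the explicit feature representation.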
1.3.1 Vector Space Model
The Vector Space Model (VSM) representation for a document d was
introduced by (23) in 1975. The main idea consists of representing a document
as a vector, in particular as a bag of words. This set contains only the
words that belong to the document and their frequencies. This means that a
document is represented by the words that it contains. In this representation,
punctuation is ignored, and a sentence is broken into elementary elements
(words), losing the order and the grammar information. These two observations
are crucial, because they show that it is impossible to reconstruct the
 