Information Technology Reference
In-Depth Information
5.1 Kernel Methods
In kernel-based machines, both learning and classification algorithms only de-
pend on the inner product between instances. In several cases this can be e -
ciently and implicitly computed by kernel functions by exploiting the following
dual formulation: i =1 ..l y i α i φ ( o i ) φ ( o )+ b =0,where o i and o are two objects,
φ is a mapping from the objects to feature vectors x i and φ ( o i ) φ ( o )= K ( o i ,o )is
a kernel function implicitly defining such mapping. In case of structural kernels,
K determines the shape of the substructures that describe the objects above.
In the following section, we are going to first propose a structural representa-
tion of the question and query pairs, then we will illustrate the Syntactic Tree
Kernel (STK) [2], which computes the number of syntactic tree fragments. In
the last subsection we will show how to engineer new kernels from them, while
the reranking kernel is presented in Sec. 5.5
Fig. 5. Question/Query Syntactic trees
5.2 Representing Question and Queries Pairs
In Data Mining and Information Retrieval the so-called bag-of-words (BOW) has
been shown to be effective to represent textual documents, e.g. [12,6]. However,
in case of questions and queries, we deal with small textual objects in which
the semantic content is expressed by means of few words and poorly reliable
probability distributions. In these conditions the use of syntactic representation
improves BOW and should be always used.
Therefore, in addition to BOW, we represent questions and queries using their
syntactic trees, as shown in Figure 5: for questions (a) we used the Charniak's
syntactic parser [1] while for queries (b) we implemented an ad-hoc SQL parser.
The latter builds a SQL parse tree for each query following its syntactic deriva-
tion according to MySQL grammar. The grammar has been slightly modified
to accommodate the usage of the symbol
for the production of items in the
SELECT clause and in WHERE conditions. In such an SQL tree, the internal
nodes are only the SQL keywords of the query plus the special symbol
whereas
the leaves are names of tables and columns of the database, category variables
or operators. Note that, although we eliminated comma and dot from the gram-
mar, it is still possible to obtain the original SQL query, by just performing
a preorder traversal of the tree. The above structures can be represented in a
learning algorithm using the kernel described in the next section.
 
Search WWH ::




Custom Search