Automatic Generation and Reranking of SQL-Derived Answers to NL Questions - Trustworthy Eternal Systems via Evolving, Software Data and Knowledge

Information Technology Reference

In-Depth Information

5.1 Kernel Methods

In kernel-based machines, both learning and classification algorithms only de-

pend on the inner product between instances. In several cases this can be e -

ciently and implicitly computed by kernel functions by exploiting the following

dual formulation: i =1 ..l y i α i φ ( o i ) φ ( o )+ b =0,where o i and o are two objects,

φ is a mapping from the objects to feature vectors x i and φ ( o i ) φ ( o )= K ( o i ,o )is

a kernel function implicitly defining such mapping. In case of structural kernels,

K determines the shape of the substructures that describe the objects above.

In the following section, we are going to first propose a structural representa-

tion of the question and query pairs, then we will illustrate the Syntactic Tree

Kernel (STK) [2], which computes the number of syntactic tree fragments. In

the last subsection we will show how to engineer new kernels from them, while

the reranking kernel is presented in Sec. 5.5

Fig. 5. Question/Query Syntactic trees

5.2 Representing Question and Queries Pairs

In Data Mining and Information Retrieval the so-called bag-of-words (BOW) has

been shown to be effective to represent textual documents, e.g. [12,6]. However,

in case of questions and queries, we deal with small textual objects in which

the semantic content is expressed by means of few words and poorly reliable

probability distributions. In these conditions the use of syntactic representation

improves BOW and should be always used.

Therefore, in addition to BOW, we represent questions and queries using their

syntactic trees, as shown in Figure 5: for questions (a) we used the Charniak's

syntactic parser [1] while for queries (b) we implemented an ad-hoc SQL parser.

The latter builds a SQL parse tree for each query following its syntactic deriva-

tion according to MySQL grammar. The grammar has been slightly modified

to accommodate the usage of the symbol

for the production of items in the

SELECT clause and in WHERE conditions. In such an SQL tree, the internal

nodes are only the SQL keywords of the query plus the special symbol

•

whereas

the leaves are names of tables and columns of the database, category variables

or operators. Note that, although we eliminated comma and dot from the gram-

mar, it is still possible to obtain the original SQL query, by just performing

a preorder traversal of the tree. The above structures can be represented in a

learning algorithm using the kernel described in the next section.

Search WWH ::

Custom Search

Home