Automatic Generation and Reranking
of SQL-Derived Answers to NL Questions
Alessandra Giordani and Alessandro Moschitti
Department of Computer Science and Engineering,
University of Trento, Italy
Abstract. In this paper, given a relational database, we automatically
translate a natural language question into an SQL query retrieving the
correct answer. We exploit the structure of the DB to generate a set of
candidate SQL queries, which we rerank with an SVM ranker based on
tree kernels. In particular, we use linguistic dependencies in the natural
language question and the DB metadata to build a set of plausible
SELECT, WHERE and FROM clauses enriched with meaningful joins.
Then, we combine all the clauses to get the set of all possible SQL queries,
producing candidate queries to answer the question. This approach can
be recursively applied to deal with complex questions, requiring nested
queries. We sort the candidates in terms of scores of correctness using a
weighting scheme applied to the query generation rules. Then, we use an
SVM ranker trained with structural kernels to reorder the list of question
and query pairs, where both members are represented as syntactic trees.
The F-measure of our model on standard benchmarks is in line with the
best models (85% on the first question), which use external and expensive
hand-crafted resources such as the semantic interpretation. Moreover, we
can provide a set of candidate answers with an answer recall of
about 92% and 96% on the first 2 and 5 candidates, respectively.
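
The generate-and-sort step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clause strings, the `state` table, the weights, and the choice to score a query as the product of its clause weights are all assumptions made for illustration, and the SVM reranking stage is not shown.

```python
from itertools import product

# Hypothetical weighted clause candidates for a question such as
# "What is the capital of Texas?". Table names, column names and weights
# are illustrative assumptions, not values from the paper.
select_clauses = [("SELECT state.capital", 0.9), ("SELECT state.state_name", 0.4)]
from_clauses = [("FROM state", 1.0)]
where_clauses = [
    ("WHERE state.state_name = 'texas'", 0.8),
    ("WHERE state.capital = 'texas'", 0.2),
]


def generate_candidates(selects, froms, wheres):
    """Combine every SELECT, FROM and WHERE clause into a candidate query.

    Each candidate is scored by multiplying its clause weights (an assumed
    scoring rule), and the list is returned sorted by descending score.
    """
    candidates = []
    for (s, ws), (f, wf), (w, ww) in product(selects, froms, wheres):
        candidates.append((ws * wf * ww, f"{s} {f} {w}"))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates


ranked = generate_candidates(select_clauses, from_clauses, where_clauses)
for score, query in ranked:
    print(f"{score:.2f}  {query}")
```

In this sketch the initial ordering depends only on the clause weights; in the approach described above, a reranker trained on question and query pairs would then reorder this list.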
1 Introduction
In the last decade, a variety of approaches have been developed to automatically
convert natural language questions into machine-readable instructions. In the
area of databases, question answering systems are supposed to answer a natural
language question by executing an SQL query. This is obviously a complex task, as
systems have to deal with the lexical gap between natural language expressions
and database structure. In this paper, we will demonstrate that it is possible
to fill this gap by relying on (i) the informative metadata embedded in all real
databases, (ii) natural language processing methods, e.g., syntactic parsing, and
(iii) advanced machine learning to build kernel-based rerankers.
When designing a database, domain experts are requested to organize entities
and relationships, naming tables and columns in a meaningful way (e.g.,
state name or capital instead of table 1 or table 2). Moreover, the database
schema also specifies constraints and data types. This metadata is stored in an
underlying database that contains tables of each database. The latter, in turn,
 