Automatic Generation and Reranking of SQL-Derived Answers to NL Questions - Trustworthy Eternal Systems via Evolving Software, Data and Knowledge

Automatic Generation and Reranking

of SQL-Derived Answers to NL Questions

Alessandra Giordani and Alessandro Moschitti

Department of Computer Science and Engineering,

University of Trento, Italy

Abstract. In this paper, given a relational database, we automatically

translate a natural language question into an SQL query retrieving the

correct answer. We exploit the structure of the DB to generate a set of

candidate SQL queries, which we rerank with an SVM ranker based on

tree kernels. In particular, we use linguistic dependencies in the natural

language question and the DB metadata to build a set of plausible

SELECT, WHERE and FROM clauses enriched with meaningful joins.

Then, we combine all the clauses to get the set of all possible SQL queries,

producing candidate queries to answer the question. This approach can

be recursively applied to deal with complex questions, requiring nested

queries. We sort the candidates in terms of scores of correctness using a

weighting scheme applied to the query generation rules. Then, we use an

SVM ranker trained with structural kernels to reorder the list of question

and query pairs, where both members are represented as syntactic trees.

The F-measure of our model on standard benchmarks is in line with the

best models (85% on the first question), which rely on external and expensive

hand-crafted resources such as semantic interpretation. Moreover, we

can provide a set of candidate answers whose recall of the correct answer is

about 92% and 96% within the first 2 and 5 candidates, respectively.
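
The clause-combination step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate clause already carries a weight from the generation rule that produced it, and that a query's score is the product of its clause weights. The clause strings, the weights, the product scoring, and the function name generate_candidates are all hypothetical choices made for illustration; they are not taken from the paper, and the SVM reranking stage is not shown.

```python
from itertools import product

# Hypothetical weighted clause candidates for a question such as
# "What is the capital of Texas?". Weights are illustrative only.
select_clauses = [("SELECT capital", 0.9), ("SELECT state_name", 0.4)]
from_clauses = [("FROM state", 1.0)]
where_clauses = [
    ("WHERE state_name = 'texas'", 0.8),
    ("WHERE capital = 'texas'", 0.3),
]


def generate_candidates(selects, froms, wheres):
    """Combine every SELECT, FROM and WHERE clause into a candidate query.

    The score of a query is assumed to be the product of its clause
    weights; candidates are returned sorted by descending score.
    """
    candidates = []
    for (s, ws), (f, wf), (w, ww) in product(selects, froms, wheres):
        candidates.append((f"{s} {f} {w}", ws * wf * ww))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


ranked = generate_candidates(select_clauses, from_clauses, where_clauses)
for query, score in ranked:
    print(f"{score:.2f}  {query}")
```

In a full system, the ordered list produced here would only be the initial ranking; a learned reranker would then reorder the question and query pairs.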

1 Introduction

In the last decade, a variety of approaches have been developed to automatically

convert natural language questions into machine-readable instructions. In the

area of databases, question answering systems are supposed to answer a natural

language question by executing an SQL query. This is a complex task, as

systems have to deal with the lexical gap between natural language expressions

and database structure. In this paper, we demonstrate that it is possible

to bridge this gap by relying on (i) the informative metadata embedded in all real

databases, (ii) natural language processing methods, e.g., syntactic parsing, and

(iii) advanced machine learning to build kernel-based rerankers.

When designing a database, domain experts are asked to organize entities

and relationships, naming tables and columns in a meaningful way (e.g.,

state name or capital instead of table 1 or table 2). Moreover, the database

schema also specifies constraints and data types. This metadata is stored in an

underlying database that contains tables of each database. The latter, in turn,
