Automatic Generation and Reranking of SQL-Derived Answers to NL Questions - Trustworthy Eternal Systems via Evolving, Software Data and Knowledge


only relevant relations between the stems of the question. Let us consider, for example, the question “What is the capital of the most populous state?” and its associated answering query SELECT capital FROM state WHERE population = (SELECT max(population) FROM state).

The matching words are capital and state, while stemming also allows us to find a mapping through popul. Note that this stem is used both in the l-value and in the r-value of the WHERE expression. In fact, this query requires nesting, which is why the categorizing algorithm needs to be recursive. This stem is classified both as a selection-oriented stem for the outer query and as a projection-oriented one for the inner query (note that the inner query requires aggregation, which is handled when generating the SELECT clause set).
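To make the nested query concrete, the following sketch runs it with Python's built-in sqlite3 module against a toy `state` table. The table layout is a simplified assumption modeled on the column names used in the query, and the rows and population values are arbitrary illustrative numbers, not the paper's dataset. The comments mark where popul plays its two roles.

```python
import sqlite3

# Toy in-memory table; columns and rows are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (state_name TEXT, capital TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO state VALUES (?, ?, ?)",
    [
        ("texas", "austin", 300),
        ("california", "sacramento", 500),
        ("ohio", "columbus", 100),
    ],
)

# Outer query: population is a selection target (l-value of the WHERE).
# Inner query: population is projected with aggregation (max) as the r-value.
query = """
SELECT capital FROM state
WHERE population = (SELECT max(population) FROM state)
"""
answers = [row[0] for row in conn.execute(query)]
print(answers)  # ['sacramento']
```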

Finally, we introduce one last example to clarify Section 3.5. With the other examples it is straightforward to compile the FROM clause, since the other clauses refer to the same table; when we deal with columns belonging to different tables, however, things get complicated. Take the question “What are the capitals of the states bordering Texas?” and its associated query SELECT capital FROM ... WHERE border = 'Texas'. How can we fill in the dots in the FROM clause? Fields capital and border belong to tables state and border_info, respectively. From the database catalog, we learn that these two tables are connected via the foreign key state_name, so the final FROM clause set F will include the following join: state JOIN border_info ON state.state_name = border_info.state_name.
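One way to automate this step is sketched below: collect the tables that own the referenced columns, then join each additional table through a foreign key linking it to a table already in the clause. The catalog structures (`column_table`, and foreign keys as hypothetical `(table, column, referenced_table, referenced_column)` tuples) are assumptions for illustration, and only direct foreign-key links are followed; the actual procedure of Section 3.5 may handle longer join paths.

```python
def build_from_clause(columns, column_table, foreign_keys):
    """Return a FROM clause covering every table that owns one of `columns`.

    Foreign keys are (table, column, referenced_table, referenced_column)
    tuples. Only direct foreign-key links are used (a simplifying assumption).
    """
    # Tables in first-seen order, without duplicates.
    tables = list(dict.fromkeys(column_table[col] for col in columns))
    clause = tables[0]
    joined = {tables[0]}
    for table in tables[1:]:
        # Find a foreign key connecting this table to one already joined.
        for t1, c1, t2, c2 in foreign_keys:
            if t1 == table and t2 in joined:
                condition = f"{t2}.{c2} = {t1}.{c1}"
                break
            if t2 == table and t1 in joined:
                condition = f"{t1}.{c1} = {t2}.{c2}"
                break
        else:
            raise ValueError(f"no direct foreign key links {table} to {sorted(joined)}")
        clause += f" JOIN {table} ON {condition}"
        joined.add(table)
    return clause


# Hypothetical catalog fragment for the example question.
column_table = {"capital": "state", "border": "border_info"}
foreign_keys = [("border_info", "state_name", "state", "state_name")]

print(build_from_clause(["capital", "border"], column_table, foreign_keys))
# state JOIN border_info ON state.state_name = border_info.state_name
```

When all columns come from one table, the function simply returns that table name, which matches the straightforward case of the earlier examples.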

3.1 Optimizing the Dependency List

As introduced in Section 2.1, we do not need all the grammatical relations output by the Stanford Dependency parser. For this reason, before processing the list of dependencies, we prune the useless ones and remove from governors (gov) and dependents (dep) the appended number indicating the position of the word in question q. Then, govs and deps are reduced to stems using the Porter stemmer¹.
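The pruning and normalization steps can be sketched as follows, under several assumptions. Dependencies are taken to arrive as Stanford-style strings such as `nsubj(is-2, capital-4)`. The set `KEPT_RELATIONS` is hypothetical, since this section does not list which relations are useful. `crude_stem` is a toy suffix stripper standing in for the Porter stemmer, not an implementation of it; a real Porter stemmer (for example NLTK's `PorterStemmer().stem`) can be passed through the `stem` parameter. The sample input is hand-written for illustration, not actual parser output.

```python
import re
from typing import Callable, Iterable, List, Tuple

# One typed dependency, e.g. "amod(state-9, populous-8)":
# relation(governor-position, dependent-position).
DEP_PATTERN = re.compile(r"^(\w+)\(([^,]+)-\d+, (.+)-\d+\)$")

# Hypothetical set of relations worth keeping; all others are pruned.
KEPT_RELATIONS = {"nsubj", "amod", "prep_of"}


def crude_stem(word: str) -> str:
    """Toy suffix stripper used as a stand-in; it is NOT the Porter algorithm."""
    word = word.lower()
    for suffix in ("ation", "ous", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def optimize_dependencies(
    deps: Iterable[str], stem: Callable[[str], str] = crude_stem
) -> List[Tuple[str, str, str]]:
    """Prune unneeded relations, strip word positions, and stem gov/dep."""
    optimized = []
    for entry in deps:
        match = DEP_PATTERN.match(entry.strip())
        if match is None:
            continue  # ignore entries that are not well-formed dependencies
        relation, gov, dep = match.groups()  # position numbers are dropped here
        if relation not in KEPT_RELATIONS:
            continue  # prune relations the later steps do not use
        optimized.append((relation, stem(gov), stem(dep)))
    return optimized


# Hand-written dependencies for "What is the capital of the most populous state?"
sample = [
    "root(ROOT-0, is-2)",
    "nsubj(is-2, capital-4)",
    "det(capital-4, the-3)",
    "prep_of(capital-4, state-9)",
    "det(state-9, the-6)",
    "advmod(populous-8, most-7)",
    "amod(state-9, populous-8)",
]
print(optimize_dependencies(sample))
# [('nsubj', 'is', 'capital'), ('prep_of', 'capital', 'state'), ('amod', 'state', 'popul')]
```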

In order to disambiguate the sense of the stems that do not appear in the metadata but could match it, we create a list of synonyms using off-the-shelf resources (such as WordNet and similarity measures) combined with our internal knowledge (represented by database constraints). Using this list, we can substitute certain stems with their stemmed synonyms.
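A minimal sketch of the substitution step follows. The synonym table is a hypothetical hand-written dictionary; in the approach described here it would be derived from resources such as WordNet, similarity measures, and database constraints, none of which are modeled. All inputs are assumed to be already stemmed, and the first synonym found in the metadata wins, which is an arbitrary tie-breaking choice.

```python
def substitute_synonyms(stems, metadata_stems, synonyms):
    """Replace each stem absent from the metadata with a stemmed synonym that
    does appear in it; stems with no such synonym are kept unchanged."""
    result = []
    for stem in stems:
        if stem in metadata_stems:
            result.append(stem)  # already matches the metadata
            continue
        matches = [syn for syn in synonyms.get(stem, []) if syn in metadata_stems]
        result.append(matches[0] if matches else stem)
    return result


# Hypothetical metadata stems (table and column names after stemming).
metadata = {"state", "capit", "popul", "border"}
# Hypothetical synonym list: stem -> candidate stemmed synonyms.
synonym_table = {"peopl": ["inhabit", "popul"], "citi": ["town"]}

print(substitute_synonyms(["peopl", "live", "texa"], metadata, synonym_table))
# ['popul', 'live', 'texa']
```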

The resulting $SDC_q$ is optimized to be processed by the next step. An example showing $SDC^{opt}_{q_1}$ with respect to the original $SDC_{q_1}$ introduced in Section 2.1 can be found in Table 1.

3.2 Categorizing Stems

Before building the S and W sets, we need to identify the stems that are projection and/or selection oriented. These stems will be added respectively to Π and/or

1 http://tartarus.org/martin/PorterStemmer/
