Automatic Generation and Reranking of SQL-Derived Answers to NL Questions - Trustworthy Eternal Systems via Evolving, Software Data and Knowledge - page 74


correct result set in 82% of the cases. For the other questions, the correct result set can be found within the first 10 generated answers for 99% of the questions (once the 33 questions above have been removed). This can be observed in Figure 6, which plots the Recall curve (of the correct query) for the generative approach, i.e., the baseline. As the plot shows, the right query is found among the first three candidates in 93% of the cases.
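A recall curve like the one in Figure 6 can be computed from the rank at which the first correct query appears for each question. The sketch below is illustrative only: the function names and the convention of using None for questions with no correct candidate are assumptions, not the authors' code. Dropping the None entries before computing the curve corresponds to removing the 33 questions mentioned above.

```python
from typing import List, Optional, Sequence


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Percentage of questions whose correct query is among the top k answers.

    ranks[i] is the 1-based position of the first correct query generated for
    question i, or None when no generated candidate is correct (assumed encoding).
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return 100.0 * hits / len(ranks)


def recall_curve(ranks: Sequence[Optional[int]], max_k: int = 10) -> List[float]:
    """Recall@1 ... Recall@max_k, i.e. the points of a recall curve."""
    return [recall_at_k(ranks, k) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    toy_ranks = [1, 1, 2, 3, None]  # hypothetical ranks for five questions
    print(recall_curve(toy_ranks, max_k=4))  # [40.0, 60.0, 80.0, 80.0]
    # Excluding questions that have no correct candidate at all:
    print(recall_curve([r for r in toy_ranks if r is not None], max_k=4))
    # [50.0, 75.0, 100.0, 100.0]
```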

6.3 Reranking Results

Figure 6 also shows the plots for rerankers using the following kernels: STK+STK, STK × STK and (1+STK × STK)², which provide better rankings (the first STK is applied to the question parse trees whereas the second STK is applied to the query derivation tree). For example, the latter kernel retrieves the correct answer 94% of the time using only the first two answers.

To better evaluate our rerankers, we applied standard 10-fold cross-validation and measured the average Recall and standard deviation of selecting the correct query for each question. The results for the different reranking kernel models are reported in Table 2. The first column of Table 2 lists the kernel combinations, built as products and sums of pairs of basic kernels applied to the question and to the query, respectively. The other columns show the percentage of questions for which at least one correct answer was found in the top X positions (average Recall@X over 10 folds ± Std. Dev.).
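The evaluation protocol just described can be sketched as below. The round-robin fold assignment, the rank_fold callback (a hypothetical stand-in for "train the reranker on the training folds and rerank the candidates of the test fold") and the use of the sample standard deviation are all assumptions; the text does not specify them.

```python
import random
import statistics
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Ranks = List[Optional[int]]  # 1-based rank of the first correct query, or None


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Percentage of questions with a correct query among the top k candidates."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return 100.0 * hits / len(ranks)


def kfold_indices(n: int, n_folds: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle question indices and deal them round-robin into n_folds folds."""
    if n < n_folds:
        raise ValueError("need at least one question per fold")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validated_recall(
    n_questions: int,
    rank_fold: Callable[[List[int], List[int]], Ranks],  # hypothetical callback
    ks: Sequence[int] = (1, 2, 3, 4, 5),
    n_folds: int = 10,
    seed: int = 0,
) -> Dict[int, Tuple[float, float]]:
    """Return {k: (mean Recall@k over folds, sample std dev over folds)}."""
    folds = kfold_indices(n_questions, n_folds, seed)
    fold_ranks = []
    for i, test_idx in enumerate(folds):
        # Training questions are all those outside the current test fold.
        train_idx = [q for j, fold in enumerate(folds) if j != i for q in fold]
        fold_ranks.append(rank_fold(train_idx, test_idx))
    results = {}
    for k in ks:
        scores = [recall_at_k(r, k) for r in fold_ranks]
        results[k] = (statistics.mean(scores), statistics.stdev(scores))
    return results
```

Using statistics.pstdev instead of statistics.stdev would give the population standard deviation; with 10 folds the two differ by a factor of about 1.05.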

The results are rather exciting, since they compare favorably with the state of the art. The best system on this dataset was designed in [15] and shows a Precision of 96.3% and a Recall of 79.3%, for an f-measure of 86.9%, while our system shows a Precision of 82.8% and a Recall of 87.2%, for an f-measure of 85.0% (when we include the 33 missing questions in the evaluation). Two main facts should be noted:


- our system's f-measure is only about 2 points lower than that of the system in [15], but it does not need any hand-crafted resource (i.e., the semantic trees manually designed in [15] for each question), and it is very simple to implement.

- unlike previous work, we can also provide multiple ranked answers. If we select the first n candidates, we greatly increase the Recall

Table 2. Kernel combination recall (± Std. Dev.) for the Geo dataset

Combination        Rec@1        Rec@2        Rec@3        Rec@4        Rec@5
NO RERANKING       81.4 ± 5.8   87.6 ± 3.8   90.8 ± 3.1   94.0 ± 2.4   95.0 ± 2.0
STK+STK            83.5 ± 3.6   90.4 ± 3.5   94.2 ± 2.9   95.8 ± 2.0   96.7 ± 1.7
STK × STK          86.5 ± 4.0   92.6 ± 3.7   95.3 ± 3.2   97.0 ± 1.8   97.7 ± 1.4
(1+STK × STK)²     87.2 ± 3.9   94.1 ± 3.4   95.6 ± 2.7   97.1 ± 1.9   97.9 ± 1.4
BOW × STK          86.7 ± 4.1   92.1 ± 3.2   95.6 ± 2.5   97.1 ± 1.4   97.6 ± 1.2
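The f-measures in the comparison with [15] appear to be the balanced F1 score, i.e., the harmonic mean of Precision and Recall. Recomputing them from the rounded percentages quoted in the text is a quick sanity check; the function name is our own.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(round(f_measure(96.3, 79.3), 1))  # system in [15]: 87.0
    print(round(f_measure(82.8, 87.2), 1))  # our system: 84.9
```

Both values agree with the reported 86.9% and 85.0% to within 0.1 points, presumably because the published figures were computed from unrounded Precision and Recall; either way the gap is about 2 points.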
