of the correct answers, e.g., within the first 2 we have an F-measure of 90%
(considering the 33 missing questions).
Other closely related work, e.g., [4], suggests that different approaches obtain
lower results than ours. These rely either on a semantic grammar
specified by an expert user [9], or on enriching the information contained in the
pairs [10] and implementing ad-hoc rules in a semantic parser [7,11]. Our system,
instead, requires no intervention, since the database metadata already contains
all the needed data.
Finally, we report the learning curve of one basic reranker in Figure 7, showing
how the recall of STK × STK increases for larger training sets. The plot reveals that
as soon as we provide a reasonable percentage of training data for reranking
(25% of the available data, corresponding to 9 folds of 700 questions; one fold
is used for testing), the model improves on the baseline.
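The recall plotted in such a learning curve can be illustrated with a minimal sketch. It assumes that recall within the first k candidates is the fraction of questions whose correct query appears among the top k ranked candidates; the function name recall_at_k and the toy data are hypothetical and are not taken from our experiments.

```python
from typing import List, Sequence


def recall_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of questions whose gold query is among the top-k candidates.

    ranked[i] is the ranked candidate list for question i (best first),
    gold[i] is the correct query for question i. Both are illustrative inputs.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not ranked:
        raise ValueError("at least one question is required")
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(ranked)


# Toy data (hypothetical): three questions with ranked candidate queries.
ranked_candidates: List[List[str]] = [
    ["q1_a", "q1_b", "q1_c"],  # gold is ranked first
    ["q2_b", "q2_a"],          # gold is ranked second
    ["q3_c", "q3_b", "q3_a"],  # gold is ranked third
]
gold_queries = ["q1_a", "q2_a", "q3_a"]

recalls = [recall_at_k(ranked_candidates, gold_queries, k) for k in (1, 2, 3)]
```

On this toy input, recall grows as k grows, because each additional position admits one more correct query.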
The main contribution of this research consists in the fact that, given an NL
question, we can generate a set of mapping SQL queries. Moreover, if we can rely
on a relatively small set of correct pairs of questions and queries to train an SVM
classifier, we are able to re-rank the set of generated pairs to select the correct
one with fairly high accuracy.
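As an illustration of how a reranker can compare candidate pairs, the following is a minimal sketch of one common form of preference kernel, K(<a,b>, <c,d>) = K(a,c) + K(b,d) - K(a,d) - K(b,c), where each argument is an ordered pair (better candidate, worse candidate). This is an assumption about the general technique, not a transcription of our implementation: the linear base kernel over toy feature vectors stands in for a structural kernel such as STK × STK, and all names and values are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Dot product; a stand-in for a structural (tree) kernel."""
    return sum(a * b for a, b in zip(x, y))


def preference_kernel(
    p1: Tuple[Vector, Vector],
    p2: Tuple[Vector, Vector],
    base: Kernel = linear_kernel,
) -> float:
    """Preference kernel between two ordered pairs of candidates.

    Each pair is (better candidate, worse candidate). The value is
    base(a, c) + base(b, d) - base(a, d) - base(b, c).
    """
    a, b = p1
    c, d = p2
    return base(a, c) + base(b, d) - base(a, d) - base(b, c)


# Hypothetical feature vectors for two ordered pairs of candidates.
pair_1 = ([1.0, 0.0, 2.0], [0.0, 1.0, 1.0])
pair_2 = ([2.0, 0.0, 1.0], [0.0, 1.0, 0.0])

value = preference_kernel(pair_1, pair_2)

# With a linear base kernel, the preference kernel equals the dot product
# of the two difference vectors (a - b) and (c - d).
diff_1 = [x - y for x, y in zip(*pair_1)]
diff_2 = [x - y for x, y in zip(*pair_2)]
check = linear_kernel(diff_1, diff_2)
```

A kernel of this form can be passed to an SVM so that the classifier learns which member of a pair should be ranked higher; swapping the order inside one pair flips the sign of the kernel value.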
7 Conclusions and Future Work
In this paper, we have approached the question answering task of implementing
an NL interface to databases by automatically generating SQL queries based on
grammatical relations and matching metadata. To our knowledge, the underlying
idea that we have proposed, building and combining clause sets, is novel.
Additionally, we have, for the first time, experimented with a preference
reranking kernel, which is able to boost the accuracy of our generative model.
Given the high accuracy, the simplicity and the practical usefulness of our
approach (e.g., we can generate the correct query in the first 5 candidates in
95% of the cases), we believe that our methods can be successfully used in the
future for real-world applications.
In the future we plan to experiment with datasets in different domains (e.g.,
the ATIS corpus). Moreover, given that current challenges in the Semantic Web
tackle similar problems [5] (scaling question answering approaches to Linked Data,
i.e., Question Answering over Linked Data), it would be interesting to apply our
algorithms to semantic search and question answering over RDF data.
References
1. Charniak, E.: A maximum-entropy-inspired parser. In: Proceedings of NAACL
2000 (2000)
2. Collins, M., Duffy, N.: New ranking algorithms for parsing and tagging: Kernels
over discrete structures, and the voted perceptron. In: Proceedings of ACL 2002
(2002)
 