and exclusively select those tuples that satisfy a condition C. The notation for
theta-joins of relations R and S based on condition C is R ⋈_C S. We use the SQL
keyword ON to keep this condition C separate from the other WHERE conditions,
since it reflects a database requirement and should not match anything in the
NL question (e.g. city JOIN state ON city.city_name = state.capital).
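The separation between the ON condition and the question-derived WHERE condition can be sketched with Python's standard sqlite3 module. The two-table schema and its rows are a toy assumption modeled on the examples in the text, not the actual database, and the helper names are hypothetical.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory toy database (assumed schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE state (state_name TEXT PRIMARY KEY, capital TEXT);
        CREATE TABLE city (city_name TEXT, state_name TEXT);
        INSERT INTO state VALUES ('Texas', 'Austin'), ('Ohio', 'Columbus');
        INSERT INTO city VALUES
            ('Austin', 'Texas'), ('Houston', 'Texas'), ('Columbus', 'Ohio');
        """
    )
    return conn


def capital_cities_of(conn, state_name):
    """Return the capital cities of one state.

    The ON condition encodes a schema requirement (how city and state
    relate); only the WHERE condition comes from the question itself.
    """
    sql = """
        SELECT city.city_name
        FROM city JOIN state ON city.city_name = state.capital
        WHERE state.state_name = ?
    """
    return [row[0] for row in conn.execute(sql, (state_name,))]


conn = build_toy_db()
texas_capitals = capital_cities_of(conn, "Texas")
```

Keeping the join condition in ON means that a component matching question words against WHERE conditions never has to account for it.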
The generated queries can be fairly complex, since we can deal with questions
that require nesting, aggregation and negation in addition to basic projection,
selection and joining (e.g. “How many states have major non-capital cities
excluding Texas”).
2.3 Problem Definition
The question answering task of finding an SQL query that retrieves an answer
for a given NL question reduces to the following problem.
Given a question q represented by means of one typed dependency collapsed
list SDC_q, generate the three sets of clauses S, F, W (arguments of SELECT,
FROM and WHERE, respectively) such that:

    ∃s ∈ S, ∃f ∈ F, ∃w ∈ W s.t. π_s(σ_w(f)) answers q        (3)
The query answer π_s(σ_w(f)) is chosen among the set of all possible queries
A = {SELECT s FROM f WHERE w} in a way that maximizes the probability of
generating a result set answering question q.
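The selection of one query from the set A can be sketched as follows: enumerate every combination of one SELECT, one FROM and one WHERE clause, then keep the highest-scoring candidate. The scoring function below is a hypothetical stand-in for the probability model, which this section does not specify, and the clause lists are toy values.

```python
from itertools import product


def candidate_queries(S, F, W):
    """Yield every query built from one clause of each set (the set A)."""
    for s, f, w in product(S, F, W):
        yield f"SELECT {s} FROM {f} WHERE {w}"


def best_query(S, F, W, score):
    """Return the candidate with the highest score.

    `score` is an assumed callable mapping a query string to a number.
    """
    return max(candidate_queries(S, F, W), key=score)


def toy_score(query):
    """Hypothetical score: reward the expected projection and selection."""
    return int(query.startswith("SELECT capital")) + int(
        "state_name = 'Texas'" in query
    )


S = ["capital", "state_name"]
F = ["state"]
W = ["state.state_name = 'Texas'", "state.capital = 'Texas'"]

all_candidates = list(candidate_queries(S, F, W))
chosen = best_query(S, F, W, toy_score)
```

The number of candidates is |S| · |F| · |W|, so pruning the clause sets before this step directly limits how many queries must be scored.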
3 Building Clauses Sets
In order to generate all possible queries for a question q we need to find their
possible SELECT, FROM and WHERE clauses (S, F and W). We start from a
dependency list SDC_q and (a) prune and stem its components, (b) add synonyms,
(c) create the set of stems used to build S and W and (d) keep only dependencies
possibly used in the recursive step to generate nested queries. Building the set
F from S and W is straightforward.
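One plausible reading of how F follows from S and W is to collect the tables mentioned by the columns in the other two sets. The sketch below assumes that columns are written as table.column and that string literals contain no dots; it is an illustration under those assumptions, not necessarily the authors' procedure, and it ignores the join conditions a multi-table FROM clause would also need.

```python
import re

# Matches qualified column references such as "state.capital".
QUALIFIED_COLUMN = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)")


def tables_referenced(clauses):
    """Return the set of table names used in qualified column references."""
    return {
        match.group(1)
        for clause in clauses
        for match in QUALIFIED_COLUMN.finditer(clause)
    }


def from_candidates(S, W):
    """Derive the tables a FROM clause must cover (assumed rule)."""
    return sorted(tables_referenced(S) | tables_referenced(W))


single_table = from_candidates(["state.capital"], ["state.state_name = 'Texas'"])
two_tables = from_candidates(["city.city_name"], ["state.state_name = 'Texas'"])
```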
We now briefly discuss some examples to introduce the objective of the
individual steps and clarify how the entire process is carried out. The first
question we take into account is the simplest one: “What is the capital of
Texas?”. Its answer can be retrieved by executing the query: SELECT capital
FROM state WHERE state.state_name='Texas'. The question and the query share
only two stems, capital and Texas. The key to categorizing stems (Section 3.2)
is to recognize that the first stem will be used in S and the second one in W.
In particular, since the word Texas is not a value in the IS, it is used as an
r-value in the WHERE expression, while the l-value is derived from the name of
the column under which it appears (Section 3.4).
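The categorization of the two stems can be sketched with Python's sqlite3 module. A stem that matches a column name in the table metadata becomes a projection candidate, while a stem found as a stored value becomes the r-value of a condition whose l-value is the column holding it. The toy table, the function names and the exact matching rule are assumptions made for illustration; Sections 3.2 and 3.4 describe the actual procedure.

```python
import sqlite3


def build_state_table():
    """Create a toy `state` table (assumed schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE state (state_name TEXT PRIMARY KEY, capital TEXT);
        INSERT INTO state VALUES ('Texas', 'Austin'), ('Ohio', 'Columbus');
        """
    )
    return conn


def categorize_stems(conn, table, stems):
    """Split stems into projection (S) and selection (W) candidates."""
    # PRAGMA table_info returns one row per column; index 1 is its name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    S, W = [], []
    for stem in stems:
        if stem.lower() in columns:
            # The stem names a column (metadata): projection oriented.
            S.append(f"{table}.{stem.lower()}")
            continue
        for column in columns:
            # The stem is a stored value: the column holding it is the l-value.
            hit = conn.execute(
                f"SELECT 1 FROM {table} WHERE {column} = ? LIMIT 1", (stem,)
            ).fetchone()
            if hit:
                W.append(f"{table}.{column} = '{stem}'")
    return S, W


conn = build_state_table()
S, W = categorize_stems(conn, "state", ["capital", "Texas"])
query = f"SELECT {S[0]} FROM state WHERE {W[0]}"
answer = [row[0] for row in conn.execute(query)]
```

Here the stem capital lands in S and the stem Texas produces a condition on state.state_name, which reproduces the query given above for the example question.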
The fact that the two stems are respectively projection and selection oriented
can be inferred by looking at their grammatical relations, i.e. by inspecting
the dependency list (e.g. the root of the sentence together with its subject
dependent is typically used for projections). This list needs to be
preprocessed (Section 3.1) to take into account