only relevant relations between the stems of the question. Let us consider, for example, the question "What is the capital of the most populous state?" and its associated answering query SELECT capital FROM state WHERE population = (SELECT max(population) FROM state). The matching words are capital and state, while stemming also allows us to find a mapping through popul. Note that this stem is used both in the l-value and in the r-value of the WHERE expression. In fact, this query requires nesting, and indeed the categorizing algorithm needs to be recursive. This stem is classified both as a selection oriented stem for the outer query and as a projection oriented one for the inner query (note that the latter requires aggregation, which is handled when generating the SELECT clause set).
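To make the nesting concrete, the query above can be run against a toy in-memory SQLite table. The `state` rows below are hypothetical placeholders, not GeoQuery data. The sketch only shows the two roles of the `population` column (stem popul): an aggregated projection inside the subquery, and the l-value of the selection condition in the outer WHERE.

```python
import sqlite3

# In-memory toy version of the `state` table; names and figures are
# hypothetical placeholders, not real GeoQuery data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state (state_name TEXT PRIMARY KEY, capital TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO state VALUES (?, ?, ?)",
    [("alpha", "alpha city", 300), ("beta", "beta town", 900), ("gamma", "gamma ville", 500)],
)

# Inner query: `population` is projected and aggregated with max().
inner = "SELECT max(population) FROM state"
# Outer query: the same column is the l-value of the WHERE condition,
# and the inner query supplies the r-value.
outer = f"SELECT capital FROM state WHERE population = ({inner})"

max_population = conn.execute(inner).fetchone()[0]
capitals = conn.execute(outer).fetchall()
print(max_population)  # 900
print(capitals)        # [('beta town',)]
```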
Finally, we introduce one last example to clarify Section 3.5. In the other examples, compiling the FROM clause is straightforward because the other clauses refer to the same table. When we deal with columns belonging to different tables, however, things get complicated. Take the question "What are the capitals of the states bordering Texas?" and its associated query SELECT capital FROM ... WHERE border = 'Texas'. How can we fill in the dots in the FROM clause? The fields capital and border belong to the tables state and border_info, respectively. From the database catalog, we learn that these two tables are connected via the foreign key state_name, so the final FROM clause will include the following join: state JOIN border_info ON state.state_name = border_info.state_name.
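The catalog lookup can be sketched with SQLite, whose `PRAGMA foreign_key_list(table)` reports the foreign keys declared on a table. The two-table schema, the rows, and the helper names `build_schema` and `join_clause` are hypothetical. This is an illustration of deriving the join from a declared foreign key, not the authors' implementation.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical two-table subset of a GeoQuery-style schema; the declared
    # foreign key is what join_clause() reads back from the catalog.
    conn.executescript(
        """
        CREATE TABLE state (state_name TEXT PRIMARY KEY, capital TEXT);
        CREATE TABLE border_info (
            state_name TEXT REFERENCES state(state_name),
            border TEXT
        );
        """
    )


def join_clause(conn, left, right):
    """Return 'left JOIN right ON ...' using a foreign key from the catalog."""
    # Each row of PRAGMA foreign_key_list is
    # (id, seq, table, from, to, on_update, on_delete, match).
    # `to` is None when REFERENCES omits the column; not handled here.
    for child, parent in ((right, left), (left, right)):
        for row in conn.execute(f"PRAGMA foreign_key_list({child})"):
            ref_table, from_col, to_col = row[2], row[3], row[4]
            if ref_table == parent:
                return f"{left} JOIN {right} ON {parent}.{to_col} = {child}.{from_col}"
    raise ValueError(f"no foreign key links {left} and {right}")


conn = sqlite3.connect(":memory:")
build_schema(conn)
# Hypothetical rows: alpha and gamma border Texas, beta does not.
conn.executemany("INSERT INTO state VALUES (?, ?)",
                 [("alpha", "alpha city"), ("beta", "beta town"), ("gamma", "gamma ville")])
conn.executemany("INSERT INTO border_info VALUES (?, ?)",
                 [("alpha", "Texas"), ("gamma", "Texas"), ("beta", "Ohio")])

from_clause = join_clause(conn, "state", "border_info")
query = f"SELECT capital FROM {from_clause} WHERE border = 'Texas'"
rows = sorted(conn.execute(query).fetchall())
print(from_clause)
# state JOIN border_info ON state.state_name = border_info.state_name
print(rows)
# [('alpha city',), ('gamma ville',)]
```

Trying both orderings of child and parent lets the helper find the link whichever of the two tables declares the foreign key.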
3.1 Optimizing the Dependency List
As introduced in Section 2.1, we do not need all the grammatical relations output by the Stanford Dependency parser. For this reason, before preprocessing the list of dependencies, we prune the useless ones and remove from governors and dependents the appended number (indicating the position of the word in question $q$). Then, govs and deps are reduced to stems (using the Porter stemmer¹).
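A minimal sketch of this preprocessing step makes several assumptions. The dependencies are given as Stanford-style strings such as `amod(state-8, populous-7)`. The set of relations to keep (`KEEP`) is hypothetical, since the actual list is the one discussed in Section 2.1. `toy_stem` is a crude suffix stripper standing in for the Porter stemmer and is not an implementation of it. The input lines are hand-written for illustration, not actual parser output.

```python
import re

# Hypothetical set of relations to keep (the real list is given in Section 2.1).
KEEP = {"nsubj", "amod", "prep_of"}

# Matches strings like "amod(state-8, populous-7)": relation, governor word,
# governor position, dependent word, dependent position. Trailing
# apostrophes (copied nodes) are ignored.
DEP_RE = re.compile(r"^([\w:]+)\((.+?)-(\d+)'*, (.+?)-(\d+)'*\)$")


def toy_stem(word):
    """Crude suffix stripper standing in for the Porter stemmer (illustration only)."""
    w = word.lower()
    for suffix in ("ation", "ous", "ing", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def optimize(deps, keep=KEEP, stem=toy_stem):
    """Prune unneeded relations, drop position numbers, and stem gov/dep."""
    result = []
    for line in deps:
        m = DEP_RE.match(line.strip())
        if m is None or m.group(1) not in keep:
            continue  # unparsable or pruned relation
        rel, gov, _gov_pos, dep, _dep_pos = m.groups()
        result.append((rel, stem(gov), stem(dep)))
    return result


deps = [
    "attr(is-2, What-1)",
    "nsubj(is-2, capital-4)",
    "det(capital-4, the-3)",
    "det(state-8, the-5)",
    "advmod(populous-7, most-6)",
    "amod(state-8, populous-7)",
    "prep_of(capital-4, state-8)",
]
sdc = optimize(deps)
print(sdc)
# [('nsubj', 'is', 'capital'), ('amod', 'state', 'popul'), ('prep_of', 'capital', 'state')]
```

Passing the stemmer as a parameter keeps the pruning logic independent of the stemming library; a real Porter implementation can be substituted without other changes.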
In order to disambiguate the sense of stems that do not appear in the metadata but could match it, we create a list of synonyms using off-the-shelf resources (such as WordNet and similarity measures) combined with our internal knowledge (represented by database constraints). Using this list, we can substitute certain stems with their stemmed synonyms.
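The substitution step can be sketched as a lookup that fires only for stems missing from the metadata and accepts only a synonym that does occur there. The synonym table `SYNONYMS`, the metadata stems `SCHEMA_STEMS`, and the function `substitute_synonyms` are all hypothetical. In the text, the synonyms come from resources such as WordNet and similarity measures combined with database constraints.

```python
# Hypothetical stem-level synonym table (in the text it is built from
# resources such as WordNet plus database constraints).
SYNONYMS = {"provinc": ["region", "state"], "inhabit": ["popul"]}

# Hypothetical stems of the database metadata (table and column names).
SCHEMA_STEMS = {"state", "capital", "popul", "border"}


def substitute_synonyms(sdc, schema_stems=SCHEMA_STEMS, synonyms=SYNONYMS):
    """Replace stems absent from the metadata with a synonym that is present."""

    def resolve(stem):
        if stem in schema_stems:
            return stem  # already matches the metadata
        for candidate in synonyms.get(stem, []):
            if candidate in schema_stems:
                return candidate  # first synonym that matches the metadata
        return stem  # no usable synonym: leave unchanged

    return [(rel, resolve(gov), resolve(dep)) for rel, gov, dep in sdc]


sdc = [("prep_of", "capital", "provinc"), ("amod", "provinc", "larg")]
print(substitute_synonyms(sdc))
# [('prep_of', 'capital', 'state'), ('amod', 'state', 'larg')]
```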
The resulting $SDC_q$ is optimized to be processed by the next step. An example showing $SDC^{opt}_{q_1}$ with respect to the original $SDC_{q_1}$ introduced in Section 2.1 can be found in Table 1.
3.2 Categorizing Stems
Before building the sets, we need to identify those stems that are projection and/or selection oriented. Those stems will be added respectively to $\Pi$ and/or $S$ and $W$