only relevant relations between the stems of the question. Consider, for
example, the question “What is the capital of the most populous state?” and
its associated answering query SELECT capital FROM state WHERE population =
(SELECT max(population) FROM state).
The matching words are capital and state, while stemming also lets us find
a mapping through popul. Note that this stem is used both in the l-value
and in the r-value of the WHERE expression. This query therefore requires
nesting, so the categorizing algorithm needs to be recursive. The stem
is classified both as a selection-oriented stem for the outer query and as a
projection-oriented one for the inner query (note that it requires aggregation,
which is handled when generating the SELECT clause set).
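To make the nesting concrete, the query above can be run against a toy table. This is only a minimal sketch: the rows, the state_name column and the use of an in-memory SQLite database are assumptions made for illustration and are not taken from the original dataset.

```python
import sqlite3

# In-memory database with a toy "state" table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state (state_name TEXT, capital TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO state VALUES (?, ?, ?)",
    [
        ("A", "Alpha City", 100),
        ("B", "Beta Town", 250),
        ("C", "Gamma Ville", 180),
    ],
)

# The stem "popul" maps to the column on the left of "=" (outer selection)
# and to the aggregated column of the inner projection.
query = (
    "SELECT capital FROM state "
    "WHERE population = (SELECT max(population) FROM state)"
)
rows = conn.execute(query).fetchall()
print(rows)
```

The inner SELECT is evaluated first and yields a single value, which the outer WHERE condition then compares against each row.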
Finally, we introduce one last example to clarify Section 3.5. In the
other examples it is straightforward to compile the FROM clause, since the
other clauses all refer to the same table; things get more complicated when
we deal with columns belonging to different tables. Take the question “What
are the capitals of states bordering Texas?” and its associated query SELECT
capital FROM ... WHERE border = 'Texas'. How can we fill in the dots in the
FROM clause? Fields capital and border belong to tables state and border_info,
respectively. From the database catalog, we learn that these two tables are
connected via the foreign key state_name, and so the final F will include the
following join: state JOIN border_info ON state.state_name =
border_info.state_name.
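The catalog lookup described above can be sketched as follows. This is an illustrative assumption rather than the authors' implementation: the dictionaries COLUMN_TABLE and FOREIGN_KEYS and the function build_from_clause are hypothetical, identifiers are assumed to be written with underscores, and only the one-table and two-table cases are handled.

```python
# Hypothetical catalog extract: which table owns each column, and which
# key links a pair of tables.
COLUMN_TABLE = {"capital": "state", "border": "border_info"}
FOREIGN_KEYS = {frozenset(("state", "border_info")): "state_name"}


def build_from_clause(columns):
    """Return a FROM clause body covering the tables of the given columns."""
    tables = []
    for column in columns:
        table = COLUMN_TABLE[column]
        if table not in tables:
            tables.append(table)
    if len(tables) == 1:
        # All columns live in the same table: no join is needed.
        return tables[0]
    if len(tables) != 2:
        raise ValueError("this sketch only handles one or two tables")
    left, right = tables
    # Raises KeyError if the catalog declares no key between the tables.
    key = FOREIGN_KEYS[frozenset((left, right))]
    return f"{left} JOIN {right} ON {left}.{key} = {right}.{key}"


from_clause = build_from_clause(["capital", "border"])
print(from_clause)
```

A fuller version would have to search for a join path across more than two tables; that case is left out here.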
3.1 Optimizing the Dependency List
As introduced in Section 2.1, we do not need all the grammatical relations
output by the Stanford dependency parser. For this reason, before processing
the list of dependencies, we prune the useless ones and remove from
governors (govs) and dependents (deps) the appended number (indicating the
position of the word in question q). Then, govs and deps are reduced to stems
(using the Porter stemmer; see footnote 1).
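The pruning and normalization steps above might look roughly like the sketch below. Several things here are assumptions: the set KEPT_RELATIONS is hypothetical (the relations actually retained are defined in Section 2.1), dependencies are represented as (relation, governor, dependent) tuples, and toy_stem is a crude stand-in for the Porter stemmer, not an implementation of it.

```python
import re

# Hypothetical whitelist of relations considered useful.
KEPT_RELATIONS = {"nsubj", "prep_of", "amod"}


def strip_position(token):
    """Remove the trailing '-<position>' that the parser appends to a word."""
    return re.sub(r"-\d+$", "", token)


def toy_stem(word):
    """Crude suffix stripper used only as a placeholder for a real stemmer."""
    word = word.lower()
    for suffix in ("ous", "ation", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def optimize(dependencies, stem=toy_stem):
    """Prune unwanted relations, drop position numbers, and stem both words."""
    result = []
    for relation, gov, dep in dependencies:
        if relation not in KEPT_RELATIONS:
            continue
        result.append(
            (relation, stem(strip_position(gov)), stem(strip_position(dep)))
        )
    return result


deps = [
    ("det", "capital-4", "the-3"),
    ("prep_of", "capital-4", "state-9"),
    ("amod", "state-9", "populous-8"),
]
optimized = optimize(deps)
print(optimized)
```

Passing the stemmer as a parameter makes it easy to swap in a real Porter implementation without changing the pruning logic.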
To disambiguate the sense of stems that do not appear in the metadata
but could match it, we create a list of synonyms using off-the-shelf resources
(such as WordNet and similarity measures) combined with our internal knowledge
(represented by database constraints). Using this list, we can substitute
certain stems with their stemmed synonyms.
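A minimal sketch of the substitution step, assuming the synonym list has already been compiled into a lookup table. The entries of METADATA_STEMS and SYNONYMS below are invented for illustration and do not come from WordNet or from the original system.

```python
# Stems that occur in the database metadata (illustrative values).
METADATA_STEMS = {"capital", "state", "popul", "border"}

# Hypothetical mapping from a question stem to a stemmed synonym that
# does occur in the metadata.
SYNONYMS = {"citizen": "popul", "inhabit": "popul", "neighbor": "border"}


def substitute(stem):
    """Return the metadata stem for a synonym, or the stem itself."""
    if stem in METADATA_STEMS:
        return stem
    return SYNONYMS.get(stem, stem)


normalized = [substitute(s) for s in ["state", "neighbor", "texas"]]
print(normalized)
```

Stems with no known synonym, such as a value like texas, pass through unchanged so that later steps can still match them against the data.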
The resulting SDC_q is optimized for processing by the next step. An example
showing SDC^opt_q1 alongside the original SDC_q1 introduced in Section 2.1
can be found in Table 1.
3.2 Categorizing Stems
Before building the S and W sets we need to identify those stems that are
projection and/or selection oriented. Those stems will be added respectively
to Π and/or
1 http://tartarus.org/martin/PorterStemmer/
 