Information Technology Reference
In-Depth Information
Moreover there could be redundant queries that, if optimized, allow us to
remove duplicates in the set, reducing its cardinality. For example, the pairing
s 2 ,f 3 ,w 1
requires the columns state.state name and border info.border to be
the same, so w 2 would select the same rows of w 2 (i.e. state.state name = 'new
york' ), but this means that table border info is no longer used and this pairing
is equivalent to
s 2 ,f 1 ,w 2
which, as said above, is meaningless.
4.3 Weighting Scheme
As introduced in the previous sections, we weigh each clause in
S
and
W
by
counting how many stems in the original question originated that clause.
In particular, for the SELECT clause, if there is a table that matches with
a stem, its weight is +2 while the matching with columns weighs +1 (common
stems between table and column are not valid). Superlatives matching with
aggregation operators count as +1.
For the WHERE clause, a weight is computed in the same way as for the
left-hand side of the conditions and a +1 is added for each matching value in
the right-hand side. In addition when dealing with nested queries, the WHERE
clause inherits also the weight of the nested query.
The FROM clauses are not associated with weights. However, we will take
into account how many joins are involved when ordering queries with the same
weight.
When pairing clauses the total weight is obtained just summing up the weight
of its components, and it is used to order the final set
¯
A
of possible useful queries
from the most to the least probable.
Figure 4 highlights this probabilistic score (obtained by the heuristic one by
normalization) through the thickness of connection lines. Dashed lines illustrate
pruned queries. The final ordered set answering q 2 is the following one:
¯
7
6
6
3
2
A
=
{s 1 ,f 3 ,w 2
, s 3 ,f 2 ,w 1
, s 2 ,f 3 ,w 2
, s 1 ,f 1
, s 2 ,f 1
,
1
s 3 ,f 2
}
.
From the pairing with highest weight we derive the answering query, that is:
SELECT state.capital FROM state join border on state.state name =border
info.border WHERE border info.state name='new york' .
It is worth noting that more then a query can have the same weight. To
deal with that, we implemented a comparator that privileges queries involving
less joins and embed the most referenced table (e.g. state in the case of Geo-
Query ). See, for example, the order of the second and third pairings in ¯
A
:they
have been swapped since f 3 contains a join while f 2 doesn't.
5 Kernel Methods for Ranking Question/Query Mapping
Once an initial rank of the candidate SQL queries has been derived, we can rely
on machine learning methods to improve the probability of finding the correct
answer in the top position. The need of designing suitable representations of the
question and query pairs makes this operation quite complex. For this purpose,
we rely on kernel methods.
 
Search WWH ::




Custom Search