Database Reference
In-Depth Information
Figure 11. Tuple trees for query q = 'Hristidis Vagelis and Papakonstantinou Yannis'
each node (tuple) containing a query keyword (e.g., a 2 and a 4 in Figure 11) and executes a Dijkstra's
single source shortest path algorithm for each one of them. The idea is to find a common vertex (e.g.,
p 2 and p 4 in Figure 11) from which a forward path exists to at least one tuple corresponding to a differ-
ent keyword in the query. Such paths will define a rooted directed tree with the common vertex as the
root containing the keyword nodes as leaves, which will be one possible answer (tuple tree) for a given
query. The answer trees are then ranked and displayed to the user. The ranking strategy of BANKS is to
combine nodes and edges weights in a tuple tree to compute a score for ranking.
DBXplorer
DBXplorer (Agrawal, Chaudhuri & Das, 2002) models the relational schema as a graph, in which nodes
map to database relations and edges represent relationships, such as primary-foreign key dependencies.
Given a query consisting of a set of keywords, DBXplorer first searches the symbol table to find the
relations of the database that contain the query keywords. The symbol table serves as an inverted list
and it is built by preprocessing the whole database contents before the search. Then, DBXplorer uses the
schema graph to find join trees that interconnect these relations. A join tree is a subtree of schema graph
that satisfies two conditions: one is that the relation corresponding to a leaf node contains at least one
query keyword; another is that every query keyword is contained by a relation corresponding to a leaf
node. Thus, if all relations in a join tree are joined, the results might contain rows having all keywords.
For each join tree a relevant SQL query is then created and executed. Finally, results are ranked and
displayed to the user. The score function that DBXplorer uses to rank results is very simple. The score
of a result is the number of joins involved. The rationale behind this simple relevance-ranking scheme
is that the more joins are needed to create a row with the query keywords, the less clear it becomes
whether the result might be meaningful or helpful.
DISCOVER
DISCOVER (Hristidis & Papakonstantinou, 2002) also exploits the relational schema graph. It uses the
concept of a candidate network to refer to the schema of a possible answer, which is a tree interconnect-
ing the set of tuples that contain all the keywords, as in DBXplorer. The candidate network generation
algorithm is also similar. However, DISCOVER can be regarded as an improvement of DBXplorer. In
fact, it stores some temporary data to avoid re-executing joins that are common among candidate net-
works. DISCOVER, like DBXplorer, ranks results based on the number of joins of the corresponding
candidate network.
Search WWH ::




Custom Search