Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

Figure 11. Tuple trees for query q = 'Hristidis Vagelis and Papakonstantinou Yannis'

each node (tuple) containing a query keyword (e.g., a 2 and a 4 in Figure 11) and executes a Dijkstra's

single source shortest path algorithm for each one of them. The idea is to find a common vertex (e.g.,

p 2 and p 4 in Figure 11) from which a forward path exists to at least one tuple corresponding to a differ-

ent keyword in the query. Such paths will define a rooted directed tree with the common vertex as the

root containing the keyword nodes as leaves, which will be one possible answer (tuple tree) for a given

query. The answer trees are then ranked and displayed to the user. The ranking strategy of BANKS is to

combine nodes and edges weights in a tuple tree to compute a score for ranking.

DBXplorer

DBXplorer (Agrawal, Chaudhuri & Das, 2002) models the relational schema as a graph, in which nodes

map to database relations and edges represent relationships, such as primary-foreign key dependencies.

Given a query consisting of a set of keywords, DBXplorer first searches the symbol table to find the

relations of the database that contain the query keywords. The symbol table serves as an inverted list

and it is built by preprocessing the whole database contents before the search. Then, DBXplorer uses the

schema graph to find join trees that interconnect these relations. A join tree is a subtree of schema graph

that satisfies two conditions: one is that the relation corresponding to a leaf node contains at least one

query keyword; another is that every query keyword is contained by a relation corresponding to a leaf

node. Thus, if all relations in a join tree are joined, the results might contain rows having all keywords.

For each join tree a relevant SQL query is then created and executed. Finally, results are ranked and

displayed to the user. The score function that DBXplorer uses to rank results is very simple. The score

of a result is the number of joins involved. The rationale behind this simple relevance-ranking scheme

is that the more joins are needed to create a row with the query keywords, the less clear it becomes

whether the result might be meaningful or helpful.

DISCOVER

DISCOVER (Hristidis & Papakonstantinou, 2002) also exploits the relational schema graph. It uses the

concept of a candidate network to refer to the schema of a possible answer, which is a tree interconnect-

ing the set of tuples that contain all the keywords, as in DBXplorer. The candidate network generation

algorithm is also similar. However, DISCOVER can be regarded as an improvement of DBXplorer. In

fact, it stores some temporary data to avoid re-executing joins that are common among candidate net-

works. DISCOVER, like DBXplorer, ranks results based on the number of joins of the corresponding

candidate network.

Search WWH ::

Custom Search

Home