Querying Graph Databases - Advanced Database Query Systems

Database Reference

In-Depth Information

statistical summary information an effective selectivity-aware decomposition process and reduces the

size of the intermediate result of each step.

Mining-Based Graph Indexing Techniques

Graph-Based Mining

The GIndex technique (Yan et al., 2004) is the first work that used pattern-mining techniques to index

graph databases. It makes use of frequent subgraphs as the basic indexing unit. The main observation of

this approach is that graph-based index can significantly improve query performance over a path-based

one due to the high pruning power of graph data structures. However, the limitation of using subgraphs

as the indexing unit is that the number of graph structures is usually much larger than the number of paths

in a graph database. To deal with this problem, GIndex considers only frequent subgraphs. Therefore,

in order to avoid the exponential growth of the number of frequent subgraphs, the support threshold is

progressively increased when the subgraphs grow large. Any subgraph is considered as being frequent

if its support is greater than a minimum support threshold. GIndex uses low support for small subgraphs

and high support for large subgraphs. Given a query graph q , if q is a frequent subgraph, the exact set

of query answers containing q can be retrieved directly without performing any candidate verification

since q is indexed. Otherwise, q probably has a frequent subgraph f whose support maybe close to the

support threshold. A candidate answer set of query q is then retrieved and verified. If the query graph

q is infrequent that means this subgraph is only contained in a small number of graphs in the database.

Therefore, the number of subgraph isomorphism tests is going to be small. Among similar fragments

with the same support, GIndex only index the smallest common fragment since more query graphs may

contain the smallest fragment. Therefore, the subgraph indexes can be more compact and more effective.

Cheng et al. (2007) have extended the ideas of GIndex (Yan et al., 2004) by using nested inverted-

index in a new graph index structure named FG-Index . In this index structure, a memory-resident

inverted-index is built using the set of frequent subgraphs. A disk-resident inverted-index is built on the

closure of the frequent graphs. If the closure is too large, a local set of closure frequent subgraphs can

be computed from the set of frequent graphs and a further nested inverted-index can be constructed.

The main advantage of this approach is that it can answer an important subset of queries (frequent graph

queries) directly without verification.

Tree-Based Mining

Zhang et al. (2007) have presented an approach for using frequent subtrees as the indexing unit for graph

structures named TreePI . The main idea of this approach is based on two main observations: 1) Tree

data structures are more complex patterns than paths and trees can preserve almost equivalent amount

of structural information as arbitrary subgraph patterns. 2) The frequent subtree mining process is rela-

tively easier than general frequent subgraph mining process. Therefore, the TreePI starts by mining the

frequent tree on the graph database and then selecting a set of frequent trees as index patterns. In the

query processing, for a query graph q , the frequent subtrees in q are identified and then and then matched

with the set of indexing features to obtain a candidate set. In the verification phase, the advantage of the

location information partially stored with the feature trees is utilized for devising an efficient subgraph

isomorphism tests. As the canonical form of any tree can be calculated in polynomial time, the indexing

Search WWH ::

Custom Search

Home