Database Reference
In-Depth Information
statistical summary information an effective selectivity-aware decomposition process and reduces the
size of the intermediate result of each step.
Mining-Based Graph Indexing Techniques
Graph-Based Mining
The GIndex technique (Yan et al., 2004) is the first work that used pattern-mining techniques to index
graph databases. It makes use of frequent subgraphs as the basic indexing unit. The main observation of
this approach is that graph-based index can significantly improve query performance over a path-based
one due to the high pruning power of graph data structures. However, the limitation of using subgraphs
as the indexing unit is that the number of graph structures is usually much larger than the number of paths
in a graph database. To deal with this problem, GIndex considers only frequent subgraphs. Therefore,
in order to avoid the exponential growth of the number of frequent subgraphs, the support threshold is
progressively increased when the subgraphs grow large. Any subgraph is considered as being frequent
if its support is greater than a minimum support threshold. GIndex uses low support for small subgraphs
and high support for large subgraphs. Given a query graph q , if q is a frequent subgraph, the exact set
of query answers containing q can be retrieved directly without performing any candidate verification
since q is indexed. Otherwise, q probably has a frequent subgraph f whose support maybe close to the
support threshold. A candidate answer set of query q is then retrieved and verified. If the query graph
q is infrequent that means this subgraph is only contained in a small number of graphs in the database.
Therefore, the number of subgraph isomorphism tests is going to be small. Among similar fragments
with the same support, GIndex only index the smallest common fragment since more query graphs may
contain the smallest fragment. Therefore, the subgraph indexes can be more compact and more effective.
Cheng et al. (2007) have extended the ideas of GIndex (Yan et al., 2004) by using nested inverted-
index in a new graph index structure named FG-Index . In this index structure, a memory-resident
inverted-index is built using the set of frequent subgraphs. A disk-resident inverted-index is built on the
closure of the frequent graphs. If the closure is too large, a local set of closure frequent subgraphs can
be computed from the set of frequent graphs and a further nested inverted-index can be constructed.
The main advantage of this approach is that it can answer an important subset of queries (frequent graph
queries) directly without verification.
Tree-Based Mining
Zhang et al. (2007) have presented an approach for using frequent subtrees as the indexing unit for graph
structures named TreePI . The main idea of this approach is based on two main observations: 1) Tree
data structures are more complex patterns than paths and trees can preserve almost equivalent amount
of structural information as arbitrary subgraph patterns. 2) The frequent subtree mining process is rela-
tively easier than general frequent subgraph mining process. Therefore, the TreePI starts by mining the
frequent tree on the graph database and then selecting a set of frequent trees as index patterns. In the
query processing, for a query graph q , the frequent subtrees in q are identified and then and then matched
with the set of indexing features to obtain a candidate set. In the verification phase, the advantage of the
location information partially stored with the feature trees is utilized for devising an efficient subgraph
isomorphism tests. As the canonical form of any tree can be calculated in polynomial time, the indexing
Search WWH ::




Custom Search