Database Reference
In-Depth Information
Figure 3. Sample graph decomposition
length and records the number of occurrences of each path. Hence, in this index table, each row stands
for a path and each column stands for a graph. Each entry in the table is the number of occurrences of
the path in the graph. In the query processing, the path indexes is used to find a set of candidate graphs
which contains paths in the query structure and to check if the counts of such paths are beyond the
threshold specified in the query. In the verification step, each candidate graph is examined by subgraph
isomorphism to obtain the final results. The main strength of this approach is that the indexing process
of paths with limited lengths is usually fast. However, the size of the indexed paths could drastically
increase with the size of graph database. In addition, the filtering power of paths data structure is limited.
Therefore, the verification cost can be very high due to the large size of the candidate set.
GDIndex
Williams et al. (2007) have presented an approach for graph database indexing using a structured graph
decomposition named GDIndex. In this approach, all connected and induced subgraphs of a given graph
are enumerated. Therefore, a graph of size n is decomposed into at most 2 n subgraphs when each of
the vertices has a unique label. However, due to isomorphism between enumerated graphs, a complete
graph with multiple occurrences of the same label may decompose into fewer subgraphs. If all labels are
identical, a complete graph of size n is decomposed into just n +1 subgraphs. A directed acyclic graph
(DAG) is constructed to model the decomposed graphs and the contained relationships between them. In
this DAG, there is always one node that represents the whole graph G , and one node that represents the
null graph. The children of a node P are all graphs Q where there is a directed link in the DAG between
P and Q . Moreover, the descendants of a node P are all nodes that are reachable from P in the DAG.
Figure 3 depicts a sample of graph decomposition using the GDIndex approach.
A hash table is used to index the subgraphs enumerated during the decomposition process. The hash
key of each subgraph is determined from the string value given by the canonical code of the subgraph.
This canonical code is computed from its adjacency matrix. In this way, all isomorphic graphs produce
the same hash key. Since all entries in the hash table are in canonical form, only one entry is made for
each unique canonical code. This hash table enables the search function to quickly locate any node in
the decomposition DAG, which is isomorphic to a query graph, if it exists. Therefore, in the query-
processing step, the hash key of the graph query q is computed from the query's canonical code. This
Search WWH ::




Custom Search