Querying Graph Databases - Advanced Database Query Systems

Database Reference

In-Depth Information

Figure 3. Sample graph decomposition

length and records the number of occurrences of each path. Hence, in this index table, each row stands

for a path and each column stands for a graph. Each entry in the table is the number of occurrences of

the path in the graph. In the query processing, the path indexes is used to find a set of candidate graphs

which contains paths in the query structure and to check if the counts of such paths are beyond the

threshold specified in the query. In the verification step, each candidate graph is examined by subgraph

isomorphism to obtain the final results. The main strength of this approach is that the indexing process

of paths with limited lengths is usually fast. However, the size of the indexed paths could drastically

increase with the size of graph database. In addition, the filtering power of paths data structure is limited.

Therefore, the verification cost can be very high due to the large size of the candidate set.

GDIndex

Williams et al. (2007) have presented an approach for graph database indexing using a structured graph

decomposition named GDIndex. In this approach, all connected and induced subgraphs of a given graph

are enumerated. Therefore, a graph of size n is decomposed into at most 2 n subgraphs when each of

the vertices has a unique label. However, due to isomorphism between enumerated graphs, a complete

graph with multiple occurrences of the same label may decompose into fewer subgraphs. If all labels are

identical, a complete graph of size n is decomposed into just n +1 subgraphs. A directed acyclic graph

(DAG) is constructed to model the decomposed graphs and the contained relationships between them. In

this DAG, there is always one node that represents the whole graph G , and one node that represents the

null graph. The children of a node P are all graphs Q where there is a directed link in the DAG between

P and Q . Moreover, the descendants of a node P are all nodes that are reachable from P in the DAG.

Figure 3 depicts a sample of graph decomposition using the GDIndex approach.

A hash table is used to index the subgraphs enumerated during the decomposition process. The hash

key of each subgraph is determined from the string value given by the canonical code of the subgraph.

This canonical code is computed from its adjacency matrix. In this way, all isomorphic graphs produce

the same hash key. Since all entries in the hash table are in canonical form, only one entry is made for

each unique canonical code. This hash table enables the search function to quickly locate any node in

the decomposition DAG, which is isomorphic to a query graph, if it exists. Therefore, in the query-

processing step, the hash key of the graph query q is computed from the query's canonical code. This

Search WWH ::

Custom Search

Home