Querying Graph Databases - Advanced Database Query Systems

Database Reference

In-Depth Information

Grafil

Yan et al. (2005) have proposed a feature-based structural filtering algorithm, named Grafil ( Gra ph

Similarity Fil tering) to perform substructure similarity search in graph databases. Grafil models each

query graph as a set of features and transforms the edge deletions into the feature misses in the query

graph. With an upper bound on the maximum allowed feature misses, Grafil can filter many graphs

directly without performing pair-wise similarity computation. It uses two data structures: feature-graph

matrix and edge-feature matrix .

The feature-graph matrix is used to compute the difference in the number of features between a

query graph and graphs in the database. In this matrix, each column corresponds to a target graph in

the graph database, each row corresponds to a feature being indexed and each entry records the number

of the embeddings of a specific feature in a target graph. The edge-feature matrix is used to compute a

bound on the maximum allowed feature misses based on a query relaxation ratio. In this matrix, each

row represents an edge while each column represents an embedding of a feature. Grafil uses a multi-

filter composition strategy, where each filter uses a distinct and complimentary subset of the features.

The filters are constructed by a hierarchical, one-dimensional clustering algorithm that groups features

with similar selectivity into a feature set.

During the query processing, the feature-graph matrix is used to calculate the difference in the

number of features between each graph database member g i and the query q . If the difference is greater

than a user-defined parameter d max then it is discarded while the remaining graphs constitute a candidate

answer set. The substructure similarity is then calculated for each candidate to prune the false positives

candidates. A loop of query relaxation steps can be applied if the user needs more matches than those

returned from the current value of d max .

Closure Tree

He & Singh (2006) have proposed a tree-based index structure named CTree ( C losure- T ree). CTree

index is very similar to the R-tree indexing mechanism (Guttman, 1984) but extended to support graph-

matching queries. In this index structure, each node in the tree contains discriminative information

about its descendants in order to facilitate effective pruning. The closure of a set of vertices is defined

as a generalized vertex whose attribute is the union of the attribute values of the vertices. Similarly, the

closure of a set of edges is defined as a generalized edge whose attribute is the union of the attribute

values of the edges. The closure of two graphs g 1 and g 2 under a mapping M is defined as a generalized

graph (V, E) where V is the set of vertex closures of the corresponding vertices and E is the set of edge

closures of the corresponding edges (Figure 6). Hence, a graph closure has the same characteristics of a

graph. However, the only difference is that the graph database member has singleton labels on vertices

and edges while the graph closure can have multiple labels.

In a closure tree, each node is a graph closure of its children where the children of an internal node

are nodes and the children of a leaf node are database graphs. A subgraph query is processed in two

phases. The first phase traverses the CTree and the nodes are pruned based on a pseudo subgraph iso-

morphism . A candidate answer set is returned. The second phase verifies each candidate answer for

exact subgraph isomorphism and returns the answers. In addition to pruning based on pseudo subgraph

isomorphism, a lightweight histogram-based pruning is also employed. The histogram of a graph is a

vector that counts the number of each distinct attribute of the vertices and edges. For similarity queries,

Search WWH ::

Custom Search

Home