Database Reference
In-Depth Information
Grafil
Yan et al. (2005) have proposed a feature-based structural filtering algorithm, named Grafil ( Gra ph
Similarity Fil tering) to perform substructure similarity search in graph databases. Grafil models each
query graph as a set of features and transforms the edge deletions into the feature misses in the query
graph. With an upper bound on the maximum allowed feature misses, Grafil can filter many graphs
directly without performing pair-wise similarity computation. It uses two data structures: feature-graph
matrix and edge-feature matrix .
The feature-graph matrix is used to compute the difference in the number of features between a
query graph and graphs in the database. In this matrix, each column corresponds to a target graph in
the graph database, each row corresponds to a feature being indexed and each entry records the number
of the embeddings of a specific feature in a target graph. The edge-feature matrix is used to compute a
bound on the maximum allowed feature misses based on a query relaxation ratio. In this matrix, each
row represents an edge while each column represents an embedding of a feature. Grafil uses a multi-
filter composition strategy, where each filter uses a distinct and complimentary subset of the features.
The filters are constructed by a hierarchical, one-dimensional clustering algorithm that groups features
with similar selectivity into a feature set.
During the query processing, the feature-graph matrix is used to calculate the difference in the
number of features between each graph database member g i and the query q . If the difference is greater
than a user-defined parameter d max then it is discarded while the remaining graphs constitute a candidate
answer set. The substructure similarity is then calculated for each candidate to prune the false positives
candidates. A loop of query relaxation steps can be applied if the user needs more matches than those
returned from the current value of d max .
Closure Tree
He & Singh (2006) have proposed a tree-based index structure named CTree ( C losure- T ree). CTree
index is very similar to the R-tree indexing mechanism (Guttman, 1984) but extended to support graph-
matching queries. In this index structure, each node in the tree contains discriminative information
about its descendants in order to facilitate effective pruning. The closure of a set of vertices is defined
as a generalized vertex whose attribute is the union of the attribute values of the vertices. Similarly, the
closure of a set of edges is defined as a generalized edge whose attribute is the union of the attribute
values of the edges. The closure of two graphs g 1 and g 2 under a mapping M is defined as a generalized
graph (V, E) where V is the set of vertex closures of the corresponding vertices and E is the set of edge
closures of the corresponding edges (Figure 6). Hence, a graph closure has the same characteristics of a
graph. However, the only difference is that the graph database member has singleton labels on vertices
and edges while the graph closure can have multiple labels.
In a closure tree, each node is a graph closure of its children where the children of an internal node
are nodes and the children of a leaf node are database graphs. A subgraph query is processed in two
phases. The first phase traverses the CTree and the nodes are pruned based on a pseudo subgraph iso-
morphism . A candidate answer set is returned. The second phase verifies each candidate answer for
exact subgraph isomorphism and returns the answers. In addition to pruning based on pseudo subgraph
isomorphism, a lightweight histogram-based pruning is also employed. The histogram of a graph is a
vector that counts the number of each distinct attribute of the vertices and edges. For similarity queries,
Search WWH ::




Custom Search