Database Reference
In-Depth Information
Figure 4. Sample graph representation using GString basic structures
computed hash value of the graph query is then used to identify and verify the set of queries that
matches the canonical codes of the graph query. A clear advantage of the GDIndex approach is that no
candidate verification is required. However, the index is designed for databases that consist of rela-
tively smaller graphs and do not have a large number of distinct graphs.
GString
The GString approach (Jiang et al., 2007) considers the semantics of the graph structures in the database.
It focuses on modeling graph objects in the context of organic chemistry using basic structures ( Line ,
Star and Cycle ) that have semantic meaning and use them as indexing features. Line structure denotes
a structure consisting of a series of vertices connected end to end, Cycle structure denotes a structure
consisting of a series of vertices that form a close loop and Star structure denotes a structure where a
core vertex directly connects to several vertexes. For a graph g, GString first extracts all Cycle structures,
then it extracts all Star structures, and finally, it identifies the remaining structures as Line structures.
Figure 4 represents a sample graph representation using the GString basic structures. GString represents
both graphs and queries on graphs as string sequences and transforms the subgraph search problem to
the subsequence string-matching domain.
A suffix tree-based index structure for the string representations is then created to support an efficient
string matching process. Given a basic structure, its GString has three components: type , size , and a set
of annotations ( edits ). For Line or Cycle, the size is the number of vertices in the structure. For Star, the
size indicates the fanout of the central vertex. For a query graph q , GString derives its summary string
representation which is then matched against the suffix-tree of the graph database. An element of a sum-
mary string matches a node in the suffix-tree if their types match, sizes are equal or the size in the
query is no more than the size in the node and the counts of corresponding types of edits in the query
are no larger than those in the node. A key disadvantage of the GString approach is that converting
subgraph search queries into a string matching problem could be an inefficient approach especially if
the size of the graph database or the subgraph query is large. Additionally, GString focuses on decom-
posing chemical compounds into basic structures that have semantic meaning in the context of organic
chemistry and it is not trivial to extend this approach in other domain of applications.
Search WWH ::




Custom Search