Information Technology Reference
In-Depth Information
requirement is not a necessary condition. As a closer inspection easily reveals,
extensions closing a ring (that is, extensions by an edge leading to a node that is
already in the base fragment) are also safe and thus can be allowed as candidates
for perfect extensions: since the destination node is already in the fragment, there
cannot be a problem with multiple ways of reaching it. Hence we can slightly
relax the constraints.
Note, however, that these relaxed constraints are still only su cient, but not
necessary. There are other situations in which an extension can be considered
perfect, even though it does not meet the abovementioned requirements. For
example, if an edge leads to a new node and is part of rings of the same size and
composition in all supporting graphs, it is harmless and thus can be considered
a perfect extension. Even the extension by a bond from the nitrogen atom to the
oxygen atom in Figure 11.5 can be considered a safe perfect extension, despite
the fact that the rings have different size. Unfortunately, checking necessary and
sucient conditions for (safe) perfect extensions is complicated and costly and
thus we confine ourselves to the rule that an extension edge must either be a
bridge in all database graphs or must close a ring (cycle) in all database graphs
in order to be considered perfect.
11.4
Canonical Codes for Graphs
As we already mentioned in the introduction, perfect extension pruning, as it
was described in the previous section, is not a problem unless canonical forms are
used to identify redundant fragments. However, since canonical forms are a lot
more elegant than, for example, a repository of already processed fragments, need
less memory and make it easier to parallelize the search, it is desirable to be able
to use perfect extension pruning together with canonical form pruning. In this
section we briefly review some fundamentals of canonical forms for (attributed)
graphs, which are necessary to know in order to understand the code word
reorganization we describe in Section 11.6.
The core idea of canonical forms of graphs is to describe an (attributed) graph
by a code word , which uniquely identifies it up to automorphisms, and from which
the graph can be reconstructed. The letters of such a code word describe the
edges of the graph and which nodes they connect as well as the node and edge
attributes (or labels). In order to capture the connection structure, the nodes
are numbered (or, more generally, endowed with unique labels), since the node
attributes are not enough to identify them uniquely: the same attribute may be
assigned to several nodes in a graph. Of course, there are several possible ways
of numbering the nodes, each of which gives rise to a different code word. In
principle, all of these code words are taken into account and the lexicographically
smallest (or greatest) code word is then defined to be the canonical code word .
Note, however, that due to the way in which code words are used in the search
(see below), the possible node numberings (and thus the possible code words)
one has to consider can actually be restricted to those compatible with traversals
 
Search WWH ::




Custom Search