Information Technology Reference
In-Depth Information
N
O
C S C
2+2 embs.
O C S C
O C S C N
O
C S C N
O C S C
1+1 embs.
1+3 embs.
Fig. 11.3. Example of an imperfect extension
As an example consider again the simple set of molecules shown in Figure 11.1.
If the search is seeded with a single sulfur atom, considering extensions by a single
bond starting at the sulfur atom and leading to an oxygen atom can be postponed
until the structure S-C-N common to all molecules has been grown (provided
that the extensions of this maximal common fragment are not restricted in any
way—a requirement we discuss in detail below).
Technically, the search tree pruning is based on the notion of a perfect exten-
sion . An extension of a fragment, consisting of an edge and possibly a node (if
the edge does not close a ring), is called perfect if all of its embeddings (that is,
occurrences of the fragment in the database graphs) can be extended in exactly
the same way by this edge and node. (Note that there may be several ways of ex-
tending an embedding by this edge and node; then all embeddings of a fragment
must be extendable in the same number of ways.) Obviously, if there is a perfect
extension, all closed super-fragments of a given fragment can, in principle, be
found by searching only the corresponding branch.
Note that one has to be careful when identifying perfect extensions. In the
first place, it does not suce to check whether the number of embeddings of
the extended fragment is equal to or a multiple of the number of embeddings of
the base fragment (as one may be tempted to think at first sight). This is only
a necessary, but not a sucient condition, as the example shown in Figure 11.3
demonstrates. Even though the total number of embeddings in the right branch
is the same as for the root, the extension is not perfect, because the extension can
be done only once in the left molecule, but three times in the right. The extension
in the left branch is not perfect, because the number of extended embeddings,
even though the same for each parent embedding, is reduced from the number
of extensions of its parents. Such a reduction, which also occurs in the right
branch for the left molecule, indicates that some symmetry has been destroyed
by the extension, which therefore cannot be perfect. As a consequence, a test for
perfect extension actually has to count and compare the number of embeddings
per database graph.
A second problem (which was overlooked in both [17] as well as in [4]) is
the behavior of rings (cycles) in the search, as we demonstrate with the example
molecules shown in Figure 11.4. A search tree for these molecules (with only such
fragments that are contained in both molecules) is shown in Figure 11.5. Here
almost all extensions are perfect in the sense that they can be made in the same
way in all molecules. However, the problem becomes clear when one considers
adding a bond from the nitrogen atom to a carbon atom. This extension rules
out certain ways of reaching the carbon atom via the oxygen atom and the
 
Search WWH ::




Custom Search