Information Technology Reference
In-Depth Information
can be pruned from the search tree (see, for example, [1] for a family of such
canonical forms and details of the procedure). As the properties of canonical
forms and code words are widely used throughout this chapter, we briefly review
them in Section 11.4.
To further improve the algorithms one may restrict the search to so-called
closed graph fragments (Section 11.2), which capture all information about the
set of all frequent subgraphs, but lead to considerably smaller output (in terms
of the number of reported fragments). This restriction also enables us to employ
additional pruning techniques, one of which is perfect extension pruning ,aswe
called it in [4], or equivalent occurrence pruning , as it is called in [17]. Unfor-
tunately, neither of these approaches, in the form in which they were originally
described in these papers, works correctly, as they can miss certain fragments.
This flaw we fix in this paper (Section 11.3).
In addition, the approach in [4] avoided redundant search with the help of
a repository of found fragments instead of using the more elegant approach of
canonical form pruning. As a consequence, perfect extension pruning was easier
to perform, since it was not necessary to pay attention to the canonical form.
With canonical form pruning, part of perfect extension pruning is easy to achieve,
namely pruning the search tree branches to the right of the perfect extension
(Section 11.5). This was first shown in Closegraph [17]. In this paper we show how
one may also prune the search tree branches to the left of the perfect extension
by introducing a (strictly limited) code word reorganization (Section 11.6). We
demonstrate the usefulness of the enhanced approach with experiments on two
molecular data sets (Section 11.7).
11.2
Mining Closed Graph Fragments
The notion of a closed fragment is derived from the corresponding notion of a
closed item set, which is defined as an item set no superset of which has the same
support, i.e., is contained in the same number of transactions. Analogously, a
closed graph fragment is a fragment no superstructure of which has the same
support, i.e., is contained in the same number of database graphs.
As an example consider the molecules (no chemical meaning attached—they
were constructed merely for demonstration purposes) shown in Figure 11.1 as
the given database of attributed graphs. A corresponding search tree (starting
from sulfur as a seed and with fragments being extended only if they appear in at
least two molecules) is shown in Figure 11.2 (how the extensions of fragments are
chosen and ordered is explained below). The numbers below or to the left/right
of the fragments state their support, i.e., the number of molecules a fragment is
contained in. Infrequent fragments (i.e. with a support less than two molecules)
S C N C
O
O S C N
F
O S C N
O
Fig. 11.1. Three simple example molecules
 
Search WWH ::




Custom Search