Information Technology Reference
In-Depth Information
S
3
S F
S C
S O
3
1
2
O S C
S C N
S C O
2
3
2
O S C N
O S C
O
S C N
O
S C N C
2
1
1
2
O S C N
O
S C N C
O
1
1
Fig. 11.2. Search tree for the three molecules in Fig. 11.1; infrequent fragments
(contained in only one molecule) are drawn in gray/light colors, closed fragments are
encircled
are drawn in gray/light colors. The encircled fragments are closed and thus
constitute the output of the search (for a minimum support of two molecules).
Note that, for example, the fragment O-S-C is not closed, since the fragment
O-S-C-N , which contains O-S-C as a proper subgraph, has the same support
(namely two molecules).
As for item sets, restricting the search for molecular fragments to closed frag-
ments does not lose any information: all frequent fragments (drawn in black/dark
color in Figure 11.2) can be constructed from the closed ones by simply forming
all substructures of closed fragments that are not closed fragments themselves.
Knowledge of the support of any non-closed frequent fragment is also preserved:
its support is simply the maximum of the support values of those closed frag-
ments of which it is a substructure. Consequently, restricting the search to closed
fragments is a very convenient and lossless way to reduce the size of the output
of a frequent subgraph mining algorithm.
11.3
Perfect Extensions
Perfect extension pruning is based on the observation that sometimes there is a
fairly large common fragment in all currently considered database graphs (that
is, in all graphs considered in a given branch of the search tree). From the
definition of a closed fragment it is clear that in such a situation, if the current
fragment is only a part of the common substructure, then any extension that
does not grow the current fragment towards the maximal common one can be
postponed until this maximal common fragment has been reached. That is, as
long as the search has not grown a fragment to the maximal common one, it
is not necessary to branch in the search tree. The reason is, obviously, that the
maximal common fragment is part of all closed fragments that can be found in
the currently considered set of molecules. Consequently, it suces to follow only
one path in the search tree that leads to this maximal common fragment and to
start branching only from there.
 
Search WWH ::




Custom Search