Database Reference
In-Depth Information
Fig. 14.6 The PUF-tree for
the probabilistic dataset D 2 of
uncertain data
(i.e., some false positives ), PUF-growth then quickly scans the dataset a third time
to check each of them to verify whether or not they are truly frequent (i.e., prune
false positives). As illustrated by Table 14.6 , the prefixed item caps tighten the upper
bound to the expected support of non-singleton patterns (when compared with the
transaction caps in the CUF-tree). Consequently, the number of false positives that
need to be examined by PUF-growth during the third scans of the probabilistic dataset
of uncertain data is usually smaller than that by CUF-growth. Hence, PUF-growth
runs faster than CUF-growth.
6
Constrained Uncertain Frequent Pattern Mining
Recall from Sect. 5 that the UF-growth, UFP-growth, CUF-growth and PUF-growth
algorithms are useful in finding all the frequent patterns from probabilistic datasets of
uncertain data in many situations. However, there are other situations in which users
are interested in only some of the frequent patterns. In these situations, users express
their interest in terms of constraints. This leads to constrained mining [ 20 , 27 , 30 ,
34 , 35 ]. In response, Leung et al. [ 33 , 43 ] extended the UF-growth algorithm to mine
probabilistic datasets of uncertain data for frequent patterns that satisfy user-specified
constraints. The two resulting algorithms, called U-FPS [ 33 ] and U-FIC [ 43 ], push
the constraints in the mining process and exploit properties of different kinds of
constraints (instead of a naive approach of first mining all frequent patterns and then
pruning all uninteresting or invalid ones).
U-FPS exploits properties of (two types of) succinct constraints [ 31 ]. More specif-
ically, by exploiting that “all patterns satisfying any succinct and anti-monotone
( SAM ) constraint C SAM must comprise only items that individually satisfy C SAM ”,
U-FPS stores only these items in the UF-tree when handling C SAM . Similarly, by
exploiting that “all patterns satisfying any succinct but not anti-monotone ( SUC )
constraint C SUC consist of at least one item that individually satisfies C SUC and may
contain other items”, U-FPS partitions the domain items into two groups (one group
contains items individually satisfying C SUC and another group contains those not)
and stores items belonging to each group separately in the UF-tree. See Fig. 14.7 .
Search WWH ::




Custom Search