computes the expected support of frequent patterns on-the-fly so as to reduce the
space requirement.
5 Tree-Based Uncertain Frequent Pattern Mining
Recall that (i) candidate generate-and-test mining algorithms (e.g., the U-Apriori
algorithm) use a levelwise, bottom-up, breadth-first mining paradigm, and
(ii) hyperlinked-structure-based algorithms (e.g., the UH-mine algorithm) recursively
adjust the hyperlinks in the hyperlinked structure (e.g., the UH-struct) to find frequent
patterns from uncertain data in a depth-first fashion. As an alternative to Apriori-based
and hyperlinked-structure-based mining, tree-based mining avoids generating many
candidates and avoids recursively adjusting many hyperlinks. Tree-based algorithms
use a depth-first, divide-and-conquer approach to mine frequent patterns from a tree
structure that captures the contents of the probabilistic dataset.
5.1 UF-growth
To mine frequent patterns from probabilistic datasets of uncertain data, Leung et al.
[42] proposed a tree-based mining algorithm called UF-growth. Similar to its
counterpart for mining precise data (the FP-growth algorithm [24]), UF-growth also
constructs a tree structure to capture the contents of the datasets. However, it does
not use the FP-tree (as in FP-growth) because each node in the FP-tree maintains
only (i) an item and (ii) its occurrence count in the tree path. When mining precise
data, the actual support of a pattern X depends solely on the occurrence counts
of items within X. When mining uncertain data, however, the expected support of
X is the sum, over the transactions containing X, of the product of the existential
probabilities of the items within X. Hence, each node in the UF-tree (the tree
structure for UF-growth) consists of three components: (i) an item, (ii) its existential
probability, and (iii) its occurrence count in the path. See Fig. 14.2. Such a UF-tree
is constructed in a fashion similar to that of the FP-tree, except that a new transaction
is merged with a child node only if the same item and the same existential probability
exist in both the transaction and the child node. Consequently, the UF-tree may
achieve a lower compression ratio than the original FP-tree. Fortunately, the number
of nodes in a UF-tree is bounded above by the total number of items in all transactions
in the probabilistic dataset of uncertain data.
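To make the node layout and merging rule concrete, here is a minimal Python sketch; the class and function names are illustrative assumptions, not the authors' implementation. Each node stores an item, an existential probability, and an occurrence count, and a transaction prefix is merged into an existing child only when both the item and the probability match exactly:

```python
class UFNode:
    """A UF-tree node: (i) item, (ii) existential probability, (iii) count."""
    def __init__(self, item, prob):
        self.item = item
        self.prob = prob
        self.count = 0
        self.children = {}        # children keyed by (item, probability) pairs

def insert_transaction(root, transaction):
    """Insert a transaction, given as a list of (item, probability) pairs.
    A child node is reused only if BOTH the item and the probability match."""
    node = root
    for item, prob in transaction:
        key = (item, prob)
        if key not in node.children:
            node.children[key] = UFNode(item, prob)
        node = node.children[key]
        node.count += 1

def tree_size(node):
    """Number of nodes below `node` (the dummy root itself is not counted)."""
    return len(node.children) + sum(tree_size(c) for c in node.children.values())

def expected_support(transactions, pattern):
    """expSup(X): sum over transactions of the product of the existential
    probabilities of the items of X (a transaction missing an item of X
    contributes nothing)."""
    total = 0.0
    for t in transactions:
        probs = dict(t)
        if all(item in probs for item in pattern):
            product = 1.0
            for item in pattern:
                product *= probs[item]
            total += product
    return total

transactions = [
    [('a', 0.9), ('b', 0.72)],
    [('a', 0.9), ('b', 0.71)],
]
root = UFNode(None, 1.0)
for t in transactions:
    insert_transaction(root, t)
print(tree_size(root))   # 3: the 'a' nodes merge, but 0.72 != 0.71 keeps 'b' split

# Rounding probabilities to k = 1 decimal place turns both 'b' entries into 0.7,
# so the two paths merge entirely (cf. the first improvement technique below):
rounded_root = UFNode(None, 1.0)
for t in transactions:
    insert_transaction(rounded_root, [(item, round(p, 1)) for item, p in t])
print(tree_size(rounded_root))   # 2

print(expected_support(transactions, ['a', 'b']))   # 0.9*0.72 + 0.9*0.71 ≈ 1.287
```

Note that the unrounded tree holds 3 nodes for the 4 items inserted, consistent with the bound stated above: the node count never exceeds the total number of items in all transactions.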
To reduce the memory consumption, UF-growth incorporates two improvement
techniques [42]. The first technique is to discretize the existential probability of
each node (e.g., round the existential probability to k decimal places, such as k = 2
decimal places), which reduces the potentially infinite number of possible existential
probability values to a maximum of 10^k possible values. The second improvement
technique is to limit the construction of UF-trees to only the first two levels (i.e., only
construct the global UF-tree for the original probabilistic dataset D and a UF-tree for