computes the expected support of frequent patterns on-the-fly so as to reduce the
space requirement.
5 Tree-Based Uncertain Frequent Pattern Mining
Recall that (i) candidate generate-and-test mining algorithms (e.g., the U-Apriori
algorithm) use a levelwise, bottom-up, breadth-first mining paradigm, and
(ii) hyperlinked-structure-based algorithms (e.g., the UH-mine algorithm) recursively
adjust the hyperlinks in the hyperlinked structure (e.g., the UH-struct) to find frequent
patterns from uncertain data in a depth-first fashion. As an alternative to Apriori-based
and hyperlinked-structure-based mining, tree-based mining avoids generating many
candidates and avoids recursively adjusting many hyperlinks. Tree-based algorithms
use a depth-first, divide-and-conquer approach to mine frequent patterns from a tree
structure that captures the contents of the probabilistic dataset.
5.1 UF-growth
To mine frequent patterns from probabilistic datasets of uncertain data, Leung et al.
[42] proposed a tree-based mining algorithm called UF-growth. Similar to its
counterpart for mining precise data (the FP-growth algorithm [24]), UF-growth also
constructs a tree structure to capture the contents of the datasets. However, it does
not use the FP-tree (as in FP-growth) because each node in the FP-tree maintains
only (i) an item and (ii) its occurrence count in the tree path. When mining precise
data, the actual support of a pattern X depends solely on the occurrence counts
of items within X. When mining uncertain data, however, the expected support of
X is the sum, over the transactions containing X, of the product of the existential
probabilities of the items within X. Hence, each node in the UF-tree (the tree
structure for UF-growth) consists of three components: (i) an item, (ii) its existential
probability, and (iii) its occurrence count in the path. See Fig. 14.2. Such a UF-tree
is constructed in a fashion similar to that of the FP-tree, except that a new transaction
is merged with a child node only if the same item and the same existential probability
exist in both the transaction and the child node. Consequently, the UF-tree may
achieve a lower compression ratio than the original FP-tree. Fortunately, the number
of nodes in a UF-tree is bounded above by the total number of items in all transactions
in the probabilistic dataset of uncertain data.
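To make the node layout and merging rule concrete, here is a minimal Python sketch; the class and function names are illustrative assumptions, not the authors' implementation. Each node stores an item, an existential probability, and an occurrence count, and a transaction prefix is merged into an existing child only when both the item and the probability match exactly:

```python
class UFNode:
    """A UF-tree node: (i) item, (ii) existential probability, (iii) count."""
    def __init__(self, item, prob):
        self.item = item
        self.prob = prob
        self.count = 0
        self.children = {}        # children keyed by (item, probability) pairs

def insert_transaction(root, transaction):
    """Insert a transaction, given as a list of (item, probability) pairs.
    A child node is reused only if BOTH the item and the probability match."""
    node = root
    for item, prob in transaction:
        key = (item, prob)
        if key not in node.children:
            node.children[key] = UFNode(item, prob)
        node = node.children[key]
        node.count += 1

def tree_size(node):
    """Number of nodes below `node` (the dummy root itself is not counted)."""
    return len(node.children) + sum(tree_size(c) for c in node.children.values())

def expected_support(transactions, pattern):
    """expSup(X): sum over transactions of the product of the existential
    probabilities of the items of X (a transaction missing an item of X
    contributes nothing)."""
    total = 0.0
    for t in transactions:
        probs = dict(t)
        if all(item in probs for item in pattern):
            product = 1.0
            for item in pattern:
                product *= probs[item]
            total += product
    return total

transactions = [
    [('a', 0.9), ('b', 0.72)],
    [('a', 0.9), ('b', 0.71)],
]
root = UFNode(None, 1.0)
for t in transactions:
    insert_transaction(root, t)
print(tree_size(root))   # 3: the 'a' nodes merge, but 0.72 != 0.71 keeps 'b' split

# Rounding probabilities to k = 1 decimal place turns both 'b' entries into 0.7,
# so the two paths merge entirely (cf. the first improvement technique below):
rounded_root = UFNode(None, 1.0)
for t in transactions:
    insert_transaction(rounded_root, [(item, round(p, 1)) for item, p in t])
print(tree_size(rounded_root))   # 2

print(expected_support(transactions, ['a', 'b']))   # 0.9*0.72 + 0.9*0.71 ≈ 1.287
```

Note that the unrounded tree holds 3 nodes for the 4 items inserted, consistent with the bound stated above: the node count never exceeds the total number of items in all transactions.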
To reduce the memory consumption, UF-growth incorporates two improvement
techniques [42]. The first technique is to discretize the existential probability of
each node (e.g., round the existential probability to k decimal places, such as k = 2
decimal places), which reduces the potentially infinite number of possible existential
probability values to a maximum of 10^k possible values. The second improvement
technique is to limit the construction of UF-trees to only the first two levels (i.e., only
construct the global UF-tree for the original probabilistic dataset D and a UF-tree for