Uncertain Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

bigger UF-trees are constructed by UF-growth, more hyperlinks are adjusted by UH-

mine, and more estimated supports are computed by U-Eclat. Hence, longer runtimes

are required. Similarly, when minsup decreases, more frequent patterns are returned

and longer runtimes are also required. Moreover, the density of datasets also affects

runtimes. For instance, when datasets are dense (e.g., connect4 augmented with

existential probability), UF-trees obtain higher compression ratios and thus require

less time to traverse than sparse datasets (e.g., kosarak augmented with existential

probability). Some experimental results showed the following: (i) datasets with a

low number of distinct existential probabilities led to smaller UF-trees and shorter

runtimes for UF-growth (than U-Apriori); (ii) U-Apriori requires shorter runtimes

than UH-mine when minsup was low (e.g., minsup < 0.3 % for kosarak, minsup <

0.6 % for connect4) but vice versa when minsup was high; (iii) depending on the

number of samples, U-Eclat could take longer or shorter to run than U-Apriori.

11

Extension: Probabilistic Frequent Pattern Mining

The aforementioned algorithms all find (expected support-based) frequent patterns

from uncertain data. These are patterns with expected support meeting or exceed-

ing the user-specified threshold minsup . Note that expected support of a pattern

X provides users with frequency information of X summarized over all “possible

worlds”, but it does not reveal the confidence on the likelihood of X being frequent

(i.e., percentage of “possible worlds” in which X is frequent). However, know-

ing the confidence can be helpful in some applications. Hence, in recent years,

there is also algorithmic development on extending the notion of frequent patterns

based on expected support to useful patterns—such as probabilistic heavy hitters and

probabilistic frequent patterns as described below.

11.1

Mining Probabilistic Heavy Hitters

Although the expected support of an item x (i.e., a singleton pattern

) provides

users with an estimate of the frequency of x in many real-life applications, it is also

helpful to know the confidence on the likelihood of x being frequent in the uncertain

data in some other applications. Hence, Zhang et al. [ 56 ] formalized the notion of

probabilistic heavy hitters (i.e., probabilistic frequent items , which are also known as

probabilistic frequent singleton patterns) following the “possible world” semantics

[ 21 ] for probabilistic datasets of uncertain data.

{

x

}

Definition 14.4 Given (i) a probabilistic dataset D of uncertain data, (ii) a user-

specified support threshold φ , and (iii) a user-specified frequentness probability

threshold τ , the problem of mining probabilistic heavy hitters from uncertain

data is to find all ( φ , τ )-probabilistic heavy hitters (PHHs). An item x isa( φ , τ )-

probabilistic heavy hitter (PHH) if P ( sup ( x , W j ) >φ

|

W j |

) >τ (where sup ( x , W j )

is the support of x in a random possible world W j and

| W j |

is the number of items

Frequent Pattern Mining

Search WWH ::

Custom Search

Home