Uncertain Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

patterns that are supersets of p-FPs and deriving p-FPs in a top-down manner (i.e.,

descending cardinality of p-FPs).

To accelerate probabilistic frequent pattern mining, Wang et al. [ 51 ] applied

a model-based approach that supports both tuple uncertainty (as in the TODIS

algorithm [ 49 ]) and attribute uncertainty (as in the aforementioned dynamic compu-

tation technique [ 15 ]). Specifically, they represented the support pmf of a p-FP as

some existing probability models (e.g., Poisson binomial distribution model, normal

distribution). By doing so, Wang et al. quickly found two types of p-FPs: (i) threshold-

based p-FP (i.e., P ( sup ( X )

≥

minsup )

≥

minP rob ) and (ii) rank-based p-FP (e.g.,

top- k p-FP).

In addition to finding p-FPs from static datasets of uncertain data, there are also

algorithms that find p-FPs from dynamic streaming uncertain data. For instance,

Akbarinia and Masseglia [ 14 ] proposed an exact algorithm called FMU for fast

mining of streaming uncertain data with the sliding window model.

12

Conclusions

Frequent pattern mining is an important data mining task. It helps discover implicit,

previously unknown and potentially useful knowledge; it also helps reveal sets of

frequently co-occurring items in numerous real-life applications (e.g., bundles of

topics that are frequently bought together, collections of courses that are taken in

the same academic terms, events that are often co-located, groups of individuals that

have common interests). Here, it has drawn the attention of many researchers over

the past two decades. The research problem of frequent pattern mining was origi-

nally proposed to analyze shoppers' market basket transaction databases containing

precise data, in which the contents of transactions in the databases are known. Such

a research problem also plays an important role in other data mining tasks, such

as the mining of interesting or unexpected patterns, sequential mining, associative

classification, as well as outlier detection, in various real-life applications. As we

are living in an uncertain world, data in many real-life applications are uncertain.

Recently, researchers have paid more attention to the mining of frequent patterns

from probabilistic datasets of uncertain data.

In this chapter, we presented some recent works on mining frequent patterns

from probabilistic datasets of uncertain data. These include candidate generate-and-

test based, hyperlinked structure based, tree-based, as well as vertical uncertain

frequent pattern mining algorithms. Among them, the U-Apriori algorithm generates

candidate patterns and tests if their expected support meets or exceeds a user-specified

threshold. To avoid such a candidate generate-and-test approach, both UH-mine

and UF-growth algorithms use a pattern-growth mining approach. The UH-mine

algorithm keeps a UH-struct, from which frequent patterns are mined; the UF-growth

algorithm constructs a UF-tree, from which frequent patterns are mined. The UFP-

growth algorithm applies clustering to help reduce the number of nodes in a UFP-tree.

The PUF-growth and CUF-growth algorithms respectively construct a PUF-tree and

Frequent Pattern Mining

Search WWH ::

Custom Search

Home